Forum Discussion

mdang
New Contributor
4 years ago

Does PDF.ConvertToText support PDFs that are protected against text being copied and pasted?

Hi,

I have a requirement to validate the contents of an invoice PDF. The PDF has a protection mechanism that prevents its contents from being copied and pasted (the contents come out as "garbled" text). This means I can't use libraries like Apache PDFBox: I tried it with TestComplete, and the extracted contents were garbled.

 

I wanted to know whether anyone can confirm that TestComplete's PDF.ConvertToText method supports this type of PDF, since it uses OCR to extract the text. My organization has a secured network and port 443 is blocked; the process for getting the port opened is quite lengthy, with numerous approvals. I would hate to go through that process only to find out that the functionality can't extract the text from my PDF.

Thank you!

  • If you were to get port 443 opened, the PDF-to-text conversion should work, since it relies on OCR rather than copy and paste, as you mentioned.

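
Once the text has been extracted, the invoice validation itself could look roughly like the Python sketch below. It is only an illustration under assumptions: the helper names (normalize, missing_fields) and the sample invoice values are made up, and the sample string merely stands in for whatever PDF.ConvertToText returns for the real file. Whitespace is collapsed and case is ignored before comparing, because OCR output often differs from the original in spacing and line breaks.

```python
import re


def normalize(text):
    """Collapse whitespace runs and lowercase the text, since OCR spacing and line breaks can vary."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_fields(extracted_text, expected_values):
    """Return the expected values that do not appear in the extracted text."""
    haystack = normalize(extracted_text)
    return [value for value in expected_values if normalize(value) not in haystack]


# Hypothetical OCR output; in a TestComplete script this string would presumably
# come from PDF.ConvertToText called on the invoice file.
sample = "INVOICE  No. 12345\nTotal   due:\n$1,250.00"

# Hypothetical expected values; the last one is deliberately absent from the sample.
expected = ["Invoice No. 12345", "Total due: $1,250.00", "PO 987"]

print(missing_fields(sample, expected))
```

An empty list would mean every expected value was found. OCR can also misread individual characters (for example 0 versus O), so an exact substring check like this one may need loosening for real invoices.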