Forum Discussion

NisHera
Valued Contributor
8 years ago
Solved

splitting PDF text to array

Hi,

I'm testing a PDF file using the PDFBox Java classes.

Everything is set up properly and I can strip the text from the PDF.

The problem is converting that text to an array. My function is below:

 

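function ABCD(){
  // loadDocument is a helper that is not shown in this post
  var docObj = loadDocument("E:\\Temp\\Report100.pdf");
  // Create a text stripper object to get the text
  var textStripperObj = JavaClasses.org_apache_pdfbox_util.PDFTextStripper.newInstance();
  var text = textStripperObj.getText_2(docObj);
  Log.Message('', text);
  // Attempt to break the text into lines
  var textArray = text.split('\r');
  // JavaScript arrays expose "length" (lowercase); "Length" is undefined
  Log.Message(textArray.length);
  // Log up to the first 25 entries together with their index
  for (var i = 0; i < 25 && i < textArray.length; i++){
    Log.Message(String(textArray[i]) + String(i));
  }
}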

In the log message I can see the correct text,

but not in textArray. When I debug, it looks like the results below.

I also tried split('\n') and split('\b'), but the array values are still not populated,

although I can see that the text is being broken into an array.
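
One thing that may be worth trying here, offered only as a sketch: convert the value to a native JavaScript string before splitting, and split on a pattern that covers all common line endings. The helper name splitLines is hypothetical, and the idea that getText_2 returns a wrapped Java string rather than a native string is an assumption, not something confirmed in this post. Note also that '\b' inside a string literal is a backspace character, so split('\b') would not split on line breaks.

```javascript
// Hypothetical helper: coerce a value to a native string and split it into lines.
function splitLines(text) {
  // Assumption: the value may be a wrapped (non-native) string object,
  // so force it to a native JavaScript string first.
  var jsText = String(text);
  // Split on Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
  var lines = jsText.split(/\r\n|\r|\n/);
  // Drop the empty entry left behind when the text ends with a line break.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}
```

In the function above this would be used as var textArray = splitLines(text); and the number of entries read from textArray.length.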

 

debug results

It is not possible to compare the old PDF with the new PDF directly, because the page structure is different.

But the contents are the same (except dates); e.g. the old PDF has 6 pages but the new PDF has 5 pages.
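
A rough sketch of how such a page-independent comparison might look once both texts are extracted: collapse all whitespace, so that line and page breaks no longer matter, and mask dates before comparing. The function names are hypothetical, and the date pattern (numeric day/month/year separated by "/", "." or "-") is an assumption about how dates appear in the reports. Page numbers or repeated page headers would need similar masking and are not handled here.

```javascript
// Hypothetical helper: reduce extracted text to a form that ignores
// layout differences (line/page breaks) and dates.
function normalizeForCompare(text) {
  return String(text)
    // Assumption: dates look like 01/02/2016, 1.2.16 or 01-02-2016.
    .replace(/\b\d{1,2}[\/.-]\d{1,2}[\/.-]\d{2,4}\b/g, "<DATE>")
    // Collapse every run of whitespace (including line breaks) to one space.
    .replace(/\s+/g, " ")
    .trim();
}

// Returns true when both texts match after normalization.
function sameContent(oldText, newText) {
  return normalizeForCompare(oldText) === normalizeForCompare(newText);
}
```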