Luigi Auriemma

aluigi.org (ARCHIVE-ONLY FORUM!)

All times are UTC [ DST ]





 Post subject: Copying text data from PDF files using XPDF pdftotext
Posted: 03 Jun 2008 21:52

Today my brother needed a way to copy text from a PDF file, and the first strange thing was that no Windows viewer allowed it.

So, after searching on Google and finding the most stupid comments and suggestions I have read in my life (for example, an idiot suggested taking a screenshot, ah ah ah), I finally decided to go the "converting" way.

The open source XPDF project naturally implements everything, so with pdftotext it is possible to dump all the text data of a PDF file in a few milliseconds (for the lazy Windows boys: just drag your PDF onto pdftotext.exe):

http://www.foolabs.com/xpdf/download.html
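
For anyone who prefers scripting the conversion over drag and drop, here is a minimal sketch in Python. It assumes the pdftotext executable is on your PATH. The helper names build_pdftotext_command and pdf_to_text are made up for this example and are not part of XPDF. The only things taken from pdftotext itself are the optional -layout switch and the rule that, when no output name is given, file.pdf is converted to file.txt.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, layout=False, exe="pdftotext"):
    """Build the argument list for one pdftotext call (hypothetical helper)."""
    cmd = [exe]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page
        cmd.append("-layout")
    cmd.append(str(pdf_path))
    if txt_path is not None:
        cmd.append(str(txt_path))
    return cmd


def pdf_to_text(pdf_path, txt_path=None, layout=False):
    """Run pdftotext and return the path of the text file it should have written."""
    # Assumption: the executable is installed and named "pdftotext"
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, layout), check=True)
    # With no output name, pdftotext writes next to the input, swapping .pdf for .txt
    return Path(txt_path) if txt_path is not None else Path(pdf_path).with_suffix(".txt")
```

Calling pdf_to_text("manual.pdf") would then be expected to leave the extracted text in manual.txt, where manual.pdf is just a placeholder file name.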

Nice job, except that this program doesn't let you bypass the DRM restrictions: I mean, it CAN but it doesn't allow it, since its author lives in the USA and so it seems he must respect this rule.

So I have written a quick patch that bypasses the restriction, allowing the dumping of any PDF file with or without DRM. It can be applied to pdftotext, pdftops and pdfimages:

http://aluigi.org/mytoolz.htm#lpatch
http://mirror.aluigi.org/patches/pdftotext_nodrm.lpatch

It's just a one-byte modification.
Hope it helps.

