1. Put your PDF in a folder, then navigate to that folder on the command line.
2. Type the following (pdftoppm is part of the poppler-utils package):
pdftoppm -png [your_file].pdf [prefix]
3. Next, install gocr:
sudo apt-get install gocr
4. Finally, run this command:
for i in *.png; do gocr -i "$i" -o "$i.txt"; done
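If you'd rather run steps 2 through 4 in one go, here is a minimal sketch that wraps them in a Bash function. It assumes pdftoppm (poppler-utils) and gocr are installed and that your gocr build reads the PNG files just as in the steps above; the function name ocr_pdf and the temporary "page" prefix are hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a PDF to PNGs, OCR each page with gocr,
# and concatenate the results. Assumes pdftoppm and gocr are on PATH.
ocr_pdf() {
  local pdf="$1" out="$2"
  if [ -z "$pdf" ] || [ -z "$out" ]; then
    echo "usage: ocr_pdf input.pdf output.txt" >&2
    return 2
  fi
  # Work in a temporary directory so stray .png/.txt files from
  # earlier runs are not picked up by the globs below.
  local tmp
  tmp=$(mktemp -d) || return 1
  # Step 2: one PNG per page, named page-1.png, page-2.png, ...
  pdftoppm -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }
  # Step 4: OCR each page image into its own .txt file.
  local img
  for img in "$tmp"/page-*.png; do
    gocr -i "$img" -o "$img.txt"
  done
  # Concatenate the per-page text into the requested output file.
  cat "$tmp"/page-*.png.txt > "$out"
  rm -rf "$tmp"
}
```

Usage: ocr_pdf book.pdf book.txt. Keeping the intermediate files in a throwaway directory means the glob only ever sees this run's pages.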
You'll end up with a long list of .txt files, one per page image.
Now you can concatenate them all:
cat *.png.txt > [new_file].txt
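The shell expands *.png.txt in alphabetical order, which matches page order here because pdftoppm normally zero-pads the page numbers. If your images came from another tool and are named page-1, page-2, ..., page-10, alphabetical order would put page-10 before page-2. A sketch of a natural-order concatenation, assuming GNU coreutils (sort -z and -V) and GNU xargs (-0); join_pages is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate every *.png.txt file in the current
# directory into one output file in natural (version) order, so that
# page-2 comes before page-10. Assumes GNU sort and xargs.
join_pages() {
  local out="$1"
  # NUL-separate the names so spaces in file names survive the pipeline.
  printf '%s\0' *.png.txt | sort -zV | xargs -0 cat > "$out"
}
```

Usage: join_pages [new_file].txt. The output name does not match *.png.txt, so it is never read back into itself.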
I won't claim the text files are pretty, but you can take them and massage them until you end up with a clean text file to use with a speed-reading app.
If you build something that cleans these up, share it below.
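As a possible starting point for that massaging, here is a rough first-pass cleanup. It is a sketch assuming GNU sed; clean_ocr is a hypothetical helper name, and its three rules (strip trailing whitespace, rejoin words hyphenated across line breaks, squeeze repeated blank lines) are generic guesses rather than fixes for any specific gocr quirk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clean_ocr input.txt output.txt
clean_ocr() {
  # 1) Strip trailing whitespace from every line.
  # 2) While a line ends in "-", append the next line and drop the
  #    "-\n", so "exam-" + "ple" becomes "example".
  # 3) cat -s squeezes runs of blank lines down to a single one.
  sed -e 's/[[:space:]]*$//' "$1" \
    | sed -e ':a' -e '/-$/{' -e 'N' -e 's/-\n//' -e 'ba' -e '}' \
    | cat -s > "$2"
}
```

Usage: clean_ocr [new_file].txt clean.txt. Expect to add your own rules once you see what your scans actually produce.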
Alternatively, if the PDF is not a scanned image but contains real text, a simple pdftotext command should work.
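To tell which case you are in, one rough heuristic is to run pdftotext and see whether anything besides whitespace comes out; a scanned, image-only PDF usually yields nothing. A sketch assuming poppler-utils; pdf_has_text and pdf_to_txt are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if pdftotext extracts at least one
# non-whitespace character. pdftotext writes to stdout when the output
# file name is "-".
pdf_has_text() {
  pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# Hypothetical wrapper: extract the text layer directly when there is
# one, otherwise report that the OCR route is needed.
pdf_to_txt() {
  if pdf_has_text "$1"; then
    pdftotext -layout "$1" "$2"
  else
    echo "No text layer in $1; use the pdftoppm + gocr steps instead." >&2
    return 1
  fi
}
```

Usage: pdf_to_txt book.pdf book.txt. The -layout option tries to keep the original column layout; drop it if you want plain reflowed text.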