User:Mattwj2002/1911 Encyclopedia scripts
From Wikisource
These are some scripts for working on the 1911 Encyclopedia project
Here is version 1 of my script for doing OCR.
#!/bin/bash
ddjvu -format=tiff eb1911-vol01-a-androphagi.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif
sleep 2
let i=1
ls -1 *.tif | while read line; do
echo $i
tesseract $line page$i -l eng
let i++
sleep 1
done
sleep 2
rm *.tif
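Here is version 2 of my script for doing OCR. The first 32 pages are OCRed whole; every later page is split into left and right halves, which are OCRed separately and then concatenated.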
#!/bin/bash
ddjvu -format=tiff volume1.djvu eb1911-vol01-a-androphagi.tif
tiffsplit eb1911-vol01-a-androphagi.tif eb1911
rm eb1911-vol01-a-androphagi.tif
sleep 2
let i=1
ls -1 *.tif | while read line; do
echo $i
if [ $i -le 32 ]; then
tesseract $line page$i -l eng
else
convert $line -crop 50%x100% +repage tmp%02d.tif
tesseract tmp00.tif tmp00 -l eng
tesseract tmp01.tif tmp01 -l eng
cat tmp00.txt tmp01.txt > page$i.txt
fi
mv $line $i.tif
let i++
done
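The loops above number pages by reading the output of `ls`, which breaks if a filename contains spaces. As a minimal sketch only, the same numbering can be done with a shell glob. The helper name `number_pages` is hypothetical and not part of the original scripts; it only prints the page number and filename, so the OCR step would still have to be added.

```shell
#!/bin/bash
# Hypothetical helper (not from the original scripts): print
# "N<TAB>filename" for every .tif file in the given directory,
# numbering from 1 in the shell's glob (sorted) order.
number_pages() {
    local dir=$1 i=1 f
    for f in "$dir"/*.tif; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        printf '%d\t%s\n' "$i" "$f"
        i=$((i + 1))
    done
}

# Possible use (assumption): feed each pair to tesseract.
#   number_pages . | while IFS=$'\t' read -r n file; do
#       tesseract "$file" "page$n" -l eng
#   done
```

Quoting `"$file"` keeps each filename as a single argument even when it contains spaces.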
This is my script for converting TIFF files into a single DjVu file.
#!/bin/bash
let i=1
ls -1 *.TIFF | while read line; do
cjb2 $line $i.djvu
let i++
done
djvm -c volume1.djvu 1.djvu
for((i=2;i<=1029;i+=1)); do
echo $i
djvm -i volume1.djvu $i.djvu
done
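The upper bound of 1029 in the script above is specific to one volume. A possible sketch, assuming the single-page files are named `1.djvu`, `2.djvu`, and so on with no gaps, derives the bound from the files that exist. The function name `count_numbered_pages` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): count how many
# consecutively numbered files 1.djvu, 2.djvu, ... exist in a directory.
# Counting stops at the first missing number.
count_numbered_pages() {
    local dir=$1 n=0
    while [ -e "$dir/$((n + 1)).djvu" ]; do
        n=$((n + 1))
    done
    printf '%d\n' "$n"
}

# Possible use (assumption): replace the fixed 1029 with the count.
#   last=$(count_numbered_pages .)
#   for ((i = 2; i <= last; i++)); do
#       djvm -i volume1.djvu "$i.djvu"
#   done
```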
Here is my script for cropping images.
#!/bin/bash
ls -1 *.TIF | while read line; do
convert +compress -crop 100%x99% -gravity South $line $line.TIFF
done
PNG files to PDF files
#!/bin/bash
let i=1
ls -1 *.png | while read line; do
convert +compress $line $i.pdf
echo $i
let i++
done
mv 1.pdf outputfile.pdf
let i=2
ls -1 *.pdf | while read line; do
pdfjoin outputfile.pdf $i.pdf --outfile outputfile.pdf
echo $i
let i++
done