User:Inductiveload/Scripts/PDF page converter


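This script "bursts" a PDF into page images, one image per page. The images are created in the current directory and named after the zero-based page number, padded to four digits (0000.png, 0001.png and so on with the default settings).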

Parameters

This bash script takes a single parameter, the PDF file to be burst into images:

./pdf_conv.sh file.pdf

Internal parameters

  • CORES is the number of concurrent processes to spawn. The conversions run under "nice", so using every core of your processor should not noticeably affect your interactive session, but there is no benefit to having more processes than cores.
  • EXT is the image file extension to convert to, including the leading dot (".png" by default). ImageMagick chooses the output format from this extension.
  • DENSITY is the resolution, in dots per inch, at which ImageMagick renders the PDF. There is no benefit to exceeding the resolution of the PDF itself, but a lower value will reduce quality.
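
The three settings above are hard-coded near the top of the script. As a possible variation (not something the original script does), they could take their defaults from the environment, with CORES falling back to the detected processor count. This is a minimal sketch: detect_cores is a hypothetical helper name, and it assumes that either nproc (GNU coreutils) or getconf _NPROCESSORS_ONLN is available.

```shell
#!/bin/sh
# Hypothetical variation: read CORES, EXT and DENSITY from the environment,
# falling back to defaults when they are not set.

# Print the number of online processors.
# Assumption: nproc or getconf exists; otherwise fall back to 1.
detect_cores() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

EXT="${EXT:-.png}"                   # output image extension
CORES="${CORES:-$(detect_cores)}"    # number of concurrent conversions
DENSITY="${DENSITY:-300}"            # rendering resolution in dots per inch

echo "Using $CORES processes, density $DENSITY, extension $EXT"
```

With such a change, a one-off run at a lower resolution might look like: DENSITY=150 CORES=2 ./pdf_conv.sh file.pdf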

Requirements

  • pdftk, to query the number of pages in the PDF
  • ImageMagick, for the image conversion

Both are readily available from standard package repositories.
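
A script like this can check for its requirements before doing any work. The following is a minimal sketch rather than part of the original script: require_tools is a hypothetical helper name, and it assumes ImageMagick is installed under the command name convert, as the script expects (ImageMagick 7 installs may provide magick instead).

```shell
#!/bin/sh
# Hypothetical helper: return 0 if every named command is found on PATH.
# Otherwise report each missing command on stderr and return 1.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The script needs pdftk and ImageMagick's convert command.
if ! require_tools pdftk convert; then
    echo "Install the missing tools before running the script." >&2
fi
```

In a real script the final branch would normally also exit with a non-zero status, so that no conversion is attempted.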

Source code

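#!/bin/bash
# Burst a PDF into one image per page, written to the current directory.

EXT=".png"      # output image extension (ImageMagick picks the format from it)
CORES=4         # number of concurrent conversions
DENSITY=300     # rendering resolution passed to convert -density

if [ $# -ne 1 ]; then
    echo "Usage: $0 file.pdf" >&2
    exit 1
fi

FILE=$1
echo "Processing $FILE"

# Get the number of pages in the file.
# pdftk writes the dump to stdout, including a line "NumberOfPages: N".
tmp=$(pdftk "$FILE" dump_data | grep -i "NumberOfPages")
PAGES=${tmp#*: }
if [ -z "$PAGES" ]; then
    echo "Could not determine the number of pages in $FILE" >&2
    exit 1
fi
echo "Processing $PAGES pages"

convert_page() {
    # Takes one argument: the current page number (zero-based)
    local CURRENT_PAGE=$1
    local FILENAME
    FILENAME=$(printf "%04d%s" "$CURRENT_PAGE" "$EXT")

    echo "    Converting page $CURRENT_PAGE to $FILENAME"

    # Run at the lowest priority so interactive work is not starved
    nice -n 19 convert -density "$DENSITY" "${FILE}[${CURRENT_PAGE}]" "$FILENAME"
}

# Start the conversions in batches of $CORES background jobs
THREAD_COUNTER=1
for (( PAGE=0; PAGE<PAGES; PAGE++ ))
do
    convert_page "$PAGE" &
    if [ "$THREAD_COUNTER" -ge "$CORES" ]
    then
        wait    # let the whole batch finish before starting the next one
        THREAD_COUNTER=1
    else
        THREAD_COUNTER=$((THREAD_COUNTER + 1))
    fi
done
wait    # wait for the final, possibly partial, batch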
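
The loop above starts a batch of CORES conversions and then waits for the whole batch, so one slow page delays the start of the next batch. A possible alternative, which the original script does not use, is to let xargs keep a fixed number of jobs running. This is a minimal sketch, assuming an xargs that supports the -P option (the GNU and BSD versions do); run_pool is a hypothetical helper name.

```shell
#!/bin/sh
# run_pool JOBS COUNT CMD...
# Hypothetical helper: runs "CMD... N" once for each N from 0 to COUNT-1,
# keeping at most JOBS processes running at a time.
# Assumption: xargs supports -P, which is not required by POSIX.
run_pool() {
    pool_jobs=$1
    pool_count=$2
    shift 2

    # Nothing to do when there are no pages
    [ "$pool_count" -gt 0 ] || return 0

    pool_i=0
    while [ "$pool_i" -lt "$pool_count" ]; do
        echo "$pool_i"
        pool_i=$((pool_i + 1))
    done | xargs -P "$pool_jobs" -n 1 "$@"
}

# Demonstration with a harmless command in place of the real conversion.
# The lines may appear in any order because the jobs run concurrently.
run_pool 4 3 echo "would convert page"
```

To use this for real, the per-page conversion would have to be a standalone command, for instance a small script that takes the page number as its last argument, because xargs cannot call a shell function directly.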