PDF / PS / EPS tricks (Linux)

Programs used

pdftk: http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

http://pdfhacks.wordpress.com/category/linux/ may be worth a look; I have not checked it.

HowTo’s

http://www.efho.de/fh/linux/vortrag-clt2009.pdf

Convert PDF into svg (2012-08-06)

This can be done with Scribus, which offers EPS import and SVG export. The SVG file can then be cropped using Inkscape.

http://www.ehow.com/how_12112956_convert-eps-svg.html

Convert PDF into jpg (2011-10-17)

convert -density 900 file.pdf file.jpg

The -density switch sets the resolution (in dpi) at which the PDF is rasterized; see http://www.imagemagick.org/script/command-line-options.php
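To get a feeling for what a given density means, the resulting image size can be estimated from the page size. The following Python sketch assumes that the page size is known in points (1 pt = 1/72 inch); the function name raster_size is made up for illustration, and the real output of convert may differ by a pixel or so.

```python
def raster_size(width_pt, height_pt, density):
    """Estimate the pixel size of a page rasterized at `density` dpi.

    Page sizes are given in points, with 72 points per inch.
    """
    width_px = int(round(width_pt * density / 72.0))
    height_px = int(round(height_pt * density / 72.0))
    return (width_px, height_px)


# An A4 page is roughly 595 x 842 pt; compare a low and a high density.
print(raster_size(595, 842, 150))
print(raster_size(595, 842, 900))
```

A density of 900 therefore produces very large images; lower values are often enough for on-screen use.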

Scans, PDFs, DJVUs, OCR, and stuff (2011-08-10)

The tool gscan2pdf (http://gscan2pdf.sourceforge.net/) is very nice. For OCR one needs a backend such as

gocr http://jocr.sourceforge.net/

ocropus http://code.google.com/p/ocropus/

tesseract http://code.google.com/p/tesseract-ocr/

cuneiform http://cognitiveforms.ru/products/cuneiform/

So far I have tried only one document. gocr did not work very well. ocropus only worked together with tesseract, but the results were

PDF to TIFF using Ghostscript (2011-08-10)

http://www.linuxforums.org/forum/graphic-arts-digital-imaging/58248-pdf-tiff-any-other-image-type.html

gs -sDEVICE=tiffg4 -r600x600 -sPAPERSIZE=letter -sOutputFile=NAMEHERE_%04d.tif -dNOPAUSE -dBATCH PDFINPUTFILEHERE

HowTo install pdftk on CentOS 5

Install java-1.4.2-gcj-compat-devel, then follow the build instructions at http://www.pdflabs.com/docs/build-pdftk/ ; after that it worked.

HowTo: Extract pages from pdfs

http://denkenblog.blogspot.com/2008/01/extract-several-pages-from-one-pdf-file.html

pdftk A=test.pdf cat A32-86 output new.pdf

In addition, the keyword end can be used to reference the final page. The general form of a page range is:

[ < input PDF handle > ] [ < begin page number > [ -< end page number > [ < qualifier > ] ] ] [ < page rotation > ]

For example:

pdftk A=in1.pdf B=in2.pdf cat A1 B2-20even Aend output out.pdf

If no handle is given, pdftk uses the first input PDF. Otherwise, the handle identifies one of the input PDF files. The beginning and ending page numbers are one-based references to pages in the PDF file. The qualifier can be even or odd, and the page rotation can be N, S, E, W, L, R, or D.
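When pdftk is called from a script, the page ranges can be assembled from their parts. The following Python sketch follows the syntax given above; the helper names page_range and cat_command are hypothetical, and the sketch only builds the argument list without running pdftk.

```python
def page_range(handle="", begin=None, end=None, qualifier="", rotation=""):
    """Assemble one pdftk page-range token such as 'B2-20even' or 'Aend'.

    `begin` and `end` may be page numbers or the keyword 'end'.
    """
    token = handle
    if begin is not None:
        token += str(begin)
        if end is not None:
            token += "-" + str(end) + qualifier
    return token + rotation


def cat_command(inputs, ranges, output):
    """Build the argument list for 'pdftk ... cat ... output ...'.

    `inputs` is a list of (handle, filename) pairs.
    """
    args = ["pdftk"]
    args += ["%s=%s" % (handle, name) for handle, name in inputs]
    args += ["cat"] + list(ranges) + ["output", output]
    return args


ranges = [
    page_range("A", 1),
    page_range("B", 2, 20, "even"),
    page_range("A", "end"),
]
print(" ".join(cat_command([("A", "in1.pdf"), ("B", "in2.pdf")], ranges, "out.pdf")))
```

The resulting list could be passed to subprocess.call to actually run the command.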

HowTo: Merge several pdfs

Merge Two or More PDFs into a New Document

pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf

Change Color PDF into Gray

http://superuser.com/questions/104656/convert-a-pdf-to-greyscale-on-the-command-line-in-floss

ImageMagick (seems to have problems with vector graphics and did not work for me…):

convert -colorspace GRAY color.pdf gray.pdf

I had no problems using Ghostscript 9.00 (an older version produced the error CRIT: rangecheck in .putdeviceprops):

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -dNOPAUSE -dBATCH -dPDFSETTINGS=/ebook -sOutputFile=output.pdf input.pdf

How to convert a pdf page to a4 format (02/04/2012)

The following LaTeX document converts the pdf into a4 format

\documentclass[a4paper]{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages=-]{input}
\end{document}

Change (reduce) size of a pdf

I had to reduce the size of some PDF files. Under Linux you can use Ghostscript for this purpose (http://www.ehow.com/how_6823473_reduce-pdf-file-size-lin):

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile=output.pdf input.pdf

Instead of /screen you can also use /ebook, /printer or /prepress.
These are just predefined configurations; an overview is given at http://pages.cs.wisc.edu/~ghost/doc/cvs/Ps2pdf.htm (see the table there; note that the individual options can be passed on the command line, e.g. GrayImageResolution -> -dGrayImageResolution=72).
So far I have not found a way to obtain a resolution between 72 and 150 dpi, which is unfortunate, since 150 dpi still gives large files while 72 dpi already looks bad. Nevertheless, this is very useful if a PDF contains only a few (small) pictures of unnecessarily high quality, such as a photo in a CV.
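One thing that might be worth trying for an intermediate resolution is to keep a preset and additionally pass the individual resolution options mentioned above. The following Python sketch only assembles such a command line. The helper name gs_shrink_command and the value of 100 dpi are made up for illustration, and whether the explicit options really take precedence over the preset is an assumption that has not been verified here.

```python
def gs_shrink_command(infile, outfile, dpi=100, preset="/screen"):
    """Build a Ghostscript argument list that starts from a preset and
    then sets the image resolutions explicitly.

    Assumption: options given after -dPDFSETTINGS override the preset.
    """
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=" + preset,
        "-dColorImageResolution=%d" % dpi,
        "-dGrayImageResolution=%d" % dpi,
        "-dMonoImageResolution=%d" % dpi,
        "-sOutputFile=" + outfile,
        infile,
    ]


# Print the command instead of running it, so it can be inspected first.
print(" ".join(gs_shrink_command("input.pdf", "output.pdf")))
```

If the file size does not change with the dpi value, the preset is probably winning and this approach does not work for the installed Ghostscript version.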

Pdf nup

To put several pages onto one sheet, one can use pdfnup (see http://www.efho.de/fh/linux/vortrag-clt2009.pdf)

pdfnup --nup 2x1 --outfile out.pdf infile.pdf

Cut, merge and a lot more: just pdflatex and the package pdfpages

Postprocessing of a scanned PDF using LaTeX and the package pdfpages

To trim the borders one can use this file (one has to compile it using pdflatex)

\documentclass[a5paper,landscape]{article}

\usepackage{pdfpages}
\begin{document}
%trim= 1 2 3 4 ‘crops’ the picture by 1bp at the left, 2bp at the bottom, 3bp on the right and 4bp at the top
\includepdf[pages=-,trim=1.5cm 2.5cm 1.5cm 0.5cm,clip]{book}
\end{document}

Merge PDFs side by side

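Given one PDF with the odd pages (1, 3, 5, 7, …) and another with the even pages (2, 4, 6, 8, …), produce a final PDF with the pages in the correct order (1, 2, 3, 4, …).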

http://www.linuxquestions.org/questions/linux-general-1/how-to-merge-one-pdf-first-side-and-a-second-pdf-second-side-854784/

pdftk A=odd.pdf B=even.pdf shuffle A B output final.pdf
(requires pdftk version 1.44 or later)
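The page order that this interleaving is expected to produce can be modelled in a few lines of Python. This is only a sketch of the intended result, not of pdftk's implementation; the function name shuffle_pages is made up, and the handling of inputs with different page counts is an assumption.

```python
def shuffle_pages(a, b):
    """Interleave two page lists: a[0], b[0], a[1], b[1], ...

    If one list is longer, its remaining pages are appended in order
    (an assumption about the desired behaviour).
    """
    result = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            result.append(a[i])
        if i < len(b):
            result.append(b[i])
    return result


odd = [1, 3, 5, 7]
even = [2, 4, 6, 8]
print(shuffle_pages(odd, even))
```

This is the typical situation after scanning a double-sided document in two passes on a single-sided scanner.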

Modify borders and put several pages onto one

\documentclass[a4paper,landscape]{article}
\usepackage{pdfpages}
\begin{document}
%trim= 1 2 3 4 ‘crops’ the picture by 1bp at the left, 2bp at the bottom, 3bp on the right and 4bp at the top
\includepdf[pages=-,trim=4.5cm 1.5cm 4.5cm 0.5cm,clip,nup=2x1]{book}
\end{document}

PDF to PS

Besides pdf2ps etc. (see http://www.efho.de/fh/linux/vortrag-clt2009.pdf), one can use acroread:

acroread -toPostScript < input.pdf > output.ps

PDF to DJVU

To create a djvu-file one can use the following tools

Install djvulibre and djvu tools (djvu.sourceforge.net/ or sudo apt-get install djvulibre-bin)

The following bash script converts a pdf into a djvu

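#!/bin/bash
#
# pdfs2djvu: convert all PDFs matching a mask into one DjVu file
#

# Check that the required tools are installed
for TOOL in pdftoppm cjb2 djvm; do
  if ! command -v "$TOOL" > /dev/null; then
    echo
    echo "Error: pdftoppm, cjb2 and djvm are needed"
    echo
    exit 1
  fi
done

shopt -s extglob

OUTFILE="#0.djvu"
DEFMASK="*.pdf"
DPI=600

# The file mask can be given as first argument
if [ -n "$1" ]; then
  MASK=$1
else
  MASK=$DEFMASK
fi

# $MASK is deliberately left unquoted so that the shell expands the pattern
for PDF in $MASK; do
  if [ ! -e "$PDF" ]; then
    echo
    echo "Error: current directory must contain files with the mask $MASK"
    echo
    exit 1
  fi
  echo "$PDF"
  # Render every page as a monochrome PBM image
  pdftoppm -mono -r "$DPI" -aa yes "$PDF" "$PDF"
  # Convert each page image to a single-page DjVu file
  for PBM in "$PDF"*.pbm; do
    echo "$PBM"
    cjb2 -dpi "$DPI" "$PBM" "$PBM.djvu"
    rm -f "$PBM"
  done
done

# Bundle all single-page DjVu files into one document
djvm -c "$OUTFILE" $MASK*.pbm.djvu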

Useful links: http://ubuntuforums.org/showthread.php?t=216531, http://www.howtoforge.com/creating_djvu_documents_on_linux

Autocrop pdfs

Based on the script from the following webpage

http://sites.google.com/site/nathanandrewmiller/automaticallycroppdffigures

I wrote this small bash script

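#!/bin/sh
# Pass the PDF to crop as first argument; the original is kept as .backup.pdf
pdftops "$1" .tmp.ps
mv "$1" .backup.pdf
# Recompute the bounding box, ignoring the existing one (-B), and clip to it (-c)
ps2eps -l -f -B -s b0 -c -n -P .tmp.ps
epstopdf .tmp.eps
mv .tmp.pdf "$1"
rm .tmp.ps .tmp.eps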

to auto-crop PDFs. It is useful if one wants to include a figure that was printed to a PDF, or something similar.

Re-calculate the bounding box of an eps-file automatically (2012-05-30)

see the tool epstool (http://pages.cs.wisc.edu/~ghost/gsview/epstool.htm)

epstool --copy --bbox in.eps out.eps

Change Bounding Box in eps-file (2011-06-22)

The Bounding Box of an eps-file can be changed using a text editor. http://www.iam.ubc.ca/old_pages/newbury/tex/figures.html

%%BoundingBox:  0 0 453 216
                | |  |   |_distance from bottom of page to top of figure
                | |  |_distance from left side of page to right side of figure
                | |_distance from bottom of page to bottom of figure
                |_distance from left side of page to left side of figure

The numbers are given in PostScript points, with 1 pt = 1/72 inch.
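To check the figure size that results from an edited bounding box, the four numbers can be converted to centimetres. The following Python sketch assumes a line of the form shown above; the function names are made up for illustration, and a deferred bounding box of the form "%%BoundingBox: (atend)" is not handled.

```python
def parse_bounding_box(line):
    """Parse a '%%BoundingBox: llx lly urx ury' line into four numbers."""
    prefix = "%%BoundingBox:"
    if not line.startswith(prefix):
        raise ValueError("not a BoundingBox line: %r" % line)
    llx, lly, urx, ury = [float(v) for v in line[len(prefix):].split()]
    return (llx, lly, urx, ury)


def figure_size_cm(line):
    """Return (width, height) of the figure in cm; 1 pt = 1/72 inch = 2.54/72 cm."""
    llx, lly, urx, ury = parse_bounding_box(line)
    to_cm = 2.54 / 72.0
    return ((urx - llx) * to_cm, (ury - lly) * to_cm)


width, height = figure_size_cm("%%BoundingBox:  0 0 453 216")
print("%.2f cm x %.2f cm" % (width, height))
```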

Print an A0 poster on A4/A3 pages (2011-07-13)

The Linux tool poster can be used; see http://www-public.it-sudparis.eu/~berger_o/weblog/2007/09/05/making-a0-posters-on-gnulinux-and-previewing-printout/

I first converted the PDF file to an EPS file using pdf2ps and ps2eps, and then I used

poster -m a3 -i a0 -p a0  poster.eps  > poster2.eps

to split the A0 poster into 8 A3 pages. Then I converted back to PDF with epspdf and printed the result.

Since the poster was a landscape one this procedure worked only after I rotated it by 90 degrees using pdftk

pdftk poster.pdf cat 1E output posterrot.pdf
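The count of eight A3 pages follows from the A series itself: each step (A0 to A1, A1 to A2, ...) halves the sheet area. A small Python sketch of this arithmetic is given below; the function name sheets_needed is made up, and margins or overlaps that the poster tool may add are ignored, so the real page count can be higher.

```python
def sheets_needed(poster_size, sheet_size):
    """Number of A-series sheets needed to tile a larger A-series poster.

    Sizes are the numbers in the format names, e.g. 0 for A0 and 3 for A3.
    Each step down the series halves the area, hence the power of two.
    """
    if sheet_size < poster_size:
        raise ValueError("sheet must not be larger than the poster")
    return 2 ** (sheet_size - poster_size)


print(sheets_needed(0, 3))  # A0 poster on A3 sheets
print(sheets_needed(0, 4))  # A0 poster on A4 sheets
```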

Split a pdf into fragments of four pages (2011-11-18)

#!/usr/bin/env python
# Python 2 script using the pyPdf library
import sys
from pyPdf import PdfFileWriter, PdfFileReader

pdf = PdfFileReader(open("/home/theo/smeuren/main.pdf", "rb"))
numpages = pdf.numPages
print(numpages)

# Walk through the document in steps of four pages; the last fragment
# holds the remaining one to three pages, if there are any.
for i, start in enumerate(range(0, numpages, 4)):
    output = PdfFileWriter()
    for page in range(start, min(start + 4, numpages)):
        output.addPage(pdf.getPage(page))
    name = "output" + str(i) + ".pdf"
    print(name)
    outputStream = open(name, "wb")
    output.write(outputStream)
    outputStream.close()
    sys.stdout.flush()
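The grouping of pages can be checked without any PDF library. The following Python sketch isolates that step as a plain function; the name page_groups is made up for illustration, and page indices are zero-based as in the script above.

```python
def page_groups(numpages, size=4):
    """Split the page indices 0..numpages-1 into consecutive groups.

    Every group has `size` pages, except possibly the last one, which
    holds the remaining pages.
    """
    return [
        list(range(start, min(start + size, numpages)))
        for start in range(0, numpages, size)
    ]


# A document with 10 pages gives two full fragments and one with two pages.
for number, group in enumerate(page_groups(10)):
    print("output%d.pdf: pages %s" % (number, group))
```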
