[nycphp-talk] Compressing PDF's
Jerry Kapron
nyphp at NewAgeWeb.com
Thu Jul 10 04:26:30 EDT 2003
>Hans Zaunere wrote:
>Since PDF is just text, why not gzip, bzip2 or even zip?
I know I wasn't specific enough when I said "compress".
The raw PDF format is just text. However, the contents of a PDF file can be optimized (compressed). I'm not looking to create a .zip or .gz file (that would be a no-brainer). I want to compress the PDF file "internally". Most PDFs created with Acrobat/Distiller are already compressed.
If you download this PDF:
http://www.tax.state.ny.us/pdf/2000/wt/nys45mn_100.pdf
and open it in a text editor, you'll see that some parts are binary.
Those are content streams compressed with the FlateDecode filter (zlib/deflate).
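To make that concrete, here is a minimal sketch of locating such streams and inflating them. It is written in Python for brevity; zlib.decompress reads the same zlib format that PHP's gzuncompress does. The function name, the regular expressions and the sample object are all made up for illustration, and the sketch assumes a flat stream dictionary with a direct /Length value and a single /FlateDecode filter, which real files do not always have.

```python
import re
import zlib

# Assumption: the stream dictionary is flat (no nested << >>) and
# /Length is a direct integer, not an indirect reference like "12 0 R".
DICT_STREAM_RE = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
LENGTH_RE = re.compile(rb"/Length\s+(\d+)")

def inflate_streams(pdf_bytes):
    """Return the decompressed body of every FlateDecode stream found."""
    results = []
    for m in DICT_STREAM_RE.finditer(pdf_bytes):
        dictionary = m.group(1)
        length = LENGTH_RE.search(dictionary)
        if length is None or b"/FlateDecode" not in dictionary:
            continue  # uncompressed stream, or one using another filter
        start = m.end()
        raw = pdf_bytes[start:start + int(length.group(1))]
        results.append(zlib.decompress(raw))
    return results

# Hypothetical one-object sample, built here so the sketch runs on its own.
payload = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
sample = (
    (b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload))
    + payload
    + b"\nendstream\nendobj\n"
)
print(inflate_streams(sample))
```

Reading exactly /Length bytes, rather than scanning for "endstream", avoids cutting the binary data short if it happens to end in a newline byte.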
The file I'm working with was created with Adobe Illustrator and saved as raw PDF (text only). I need raw PDF so I can use it as a template (by preg_replacing some "variable text"). The problem is that the file is 700 KB (way too big for this web app). When I open it and save it optimized in Adobe Distiller, the size is reduced to 195 KB, but the compressed file cannot be used directly as a template anymore.
I could take two different routes:
1) use the raw PDF file as a template > preg_replace some text > compress the new PDF > send it to the client
2) use an already compressed PDF file as a template > fetch and uncompress the FlateDecode streams > preg_replace some text > recompress the modified content > send the new PDF to the client.
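Route 2 could be sketched roughly as follows. This is only an illustration under the same assumptions as before (flat stream dictionaries, direct /Length, FlateDecode as the only filter). It is in Python, where zlib.compress and zlib.decompress play the role of PHP's gzcompress and gzuncompress, and a plain byte replacement stands in for preg_replace. The names fill_template and {NAME} are hypothetical.

```python
import re
import zlib

# Assumptions: flat stream dictionaries and a direct integer /Length.
DICT_STREAM_RE = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
LENGTH_RE = re.compile(rb"/Length\s+(\d+)")

def fill_template(pdf_bytes, replacements):
    """Inflate each Flate stream, substitute text, deflate, fix /Length."""
    out = []
    pos = 0
    while True:
        m = DICT_STREAM_RE.search(pdf_bytes, pos)
        if m is None:
            break
        dictionary = m.group(1)
        length = LENGTH_RE.search(dictionary)
        if length is None or b"/FlateDecode" not in dictionary:
            # Not a Flate stream: copy it through untouched.
            out.append(pdf_bytes[pos:m.end()])
            pos = m.end()
            continue
        start = m.end()
        end = start + int(length.group(1))
        text = zlib.decompress(pdf_bytes[start:end])
        for placeholder, value in replacements.items():
            text = text.replace(placeholder, value)
        packed = zlib.compress(text, 9)
        # The dictionary must advertise the new compressed length.
        new_dict = LENGTH_RE.sub(b"/Length %d" % len(packed), dictionary, count=1)
        out.append(pdf_bytes[pos:m.start()])
        out.append(b"<<" + new_dict + b">>\nstream\n" + packed)
        pos = end
    out.append(pdf_bytes[pos:])
    return b"".join(out)

# Hypothetical compressed template containing one placeholder.
body = zlib.compress(b"BT /F1 12 Tf (Dear {NAME}) Tj ET")
template = (
    (b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(body))
    + body
    + b"\nendstream\nendobj\n"
)
filled = fill_template(template, {b"{NAME}": b"Jerry"})
```

Route 1 would be the same loop minus the decompress step, with /Filter /FlateDecode added to each stream dictionary. In both routes the stream lengths change, so the byte offsets stored in the file's cross-reference table no longer match; the sketch does not rebuild that table.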
I know I could also use PDF4PHP to create a compressed PDF file from scratch, but for performance reasons I really want to stick to using a template file. I searched the web but could not find any ready-made code for specifically what I want to do. I'm looking under the hood of the PDF4PHP class (it supports FlateDecode compression) to get an idea of how to uncompress the compressed streams. Any suggestions, pointers or code would be greatly appreciated.
Cheers,
Jerry