[nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX
DeWitt, Michael
mjdewitt at alexcommgrp.com
Thu Mar 4 21:17:35 EST 2004
Dan,
Isn't pdftotext part of the xpdf package?
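If it is, the one-liner quoted below should extend to a whole directory of PDFs. Here is a minimal sketch, assuming pdftotext is on the PATH. The function name search_pdfs and its two arguments are made up for illustration and are not part of xpdf:

```shell
#!/bin/sh
# Sketch: print the names of PDFs in a directory whose extracted text
# contains a given string. search_pdfs is a hypothetical helper name.
search_pdfs() {
    pattern=$1        # string to look for
    dir=${2:-.}       # directory to scan; defaults to the current one
    for f in "$dir"/*.pdf; do
        [ -f "$f" ] || continue   # skip when the glob matched nothing
        # "-" as the text-file sends the extracted text to stdout
        if pdftotext "$f" - 2>/dev/null | grep -q -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling it as "search_pdfs stringicareabout /some/dir" would list only the matching files. It uses grep -q so that just the file name is printed rather than every matching line.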
Mike
> -----Original Message-----
> From: Daniel Convissor [SMTP:danielc at analysisandsolutions.com]
> Sent: Thursday, March 04, 2004 9:12 PM
> To: NYPHP Talk
> Subject: Re: [nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX
>
> On Thu, Mar 04, 2004 at 08:42:55PM -0500, DeWitt, Michael wrote:
> > I am using xpdf to get text out of pdfs.
>
> ... IF you've got X windows going.
>
> Let's see what Panix has...
>
> d> apropos pdf | grep text
>
> latex, elatex, lambda, pdflatex (1) - structured text formatting and typesetting
> pdftotext (1) - Portable Document Format (PDF) to text converter (version 2.02)
>
> That second one looks like it'll fit the bill.
>
> d> man pdftotext
> ... snip ...
> Pdftotext reads the PDF file, PDF-file, and writes a text
> file, text-file. If text-file is not specified, pdftotext
> converts file.pdf to file.txt. If text-file is '-', the
> text is sent to stdout.
> ... snip ...
>
> d> pdftotext afile.pdf - | grep stringicareabout
>
> Works like a charm.
>
> Enjoy,
>
> --Dan
>
> --
> T H E A N A L Y S I S A N D S O L U T I O N S C O M P A N Y
> data intensive web and database programming
> http://www.AnalysisAndSolutions.com/
> 4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409