[nycphp-talk] Stripping formatting from a word document
Anirudh Zala
arzala at gmail.com
Wed Jul 18 02:28:07 EDT 2007
On Wednesday 18 Jul 2007 01:07:49 csnyder wrote:
> On 7/17/07, Jon Baer <jonbaer at jonbaer.com> wrote:
> > I think he was asking about a .doc file directly? I'm surprised that
> > manipulation of Word docs always comes up on the list, and the resources
> > are pretty limited.
> >
> > One project I found a while ago was antiword, whose sources are
> > available:
> >
> > http://www.winfield.demon.nl/
>
> There was a reference earlier to catdoc. The url for that project is
> http://www.wagner.pp.ru/~vitus/software/catdoc/
>
> The changelog shows slightly more recent activity than AntiWord, but I
> suppose it all breaks with Office 2007 (or whatever year they're up to
> in Redmond).
This is a very nice command-line utility for extracting text from Word
documents and PowerPoint slides. It runs on most *nix systems, and it can
also extract text (as CSV) from XLS files.
We have been using it happily for a long time.
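For anyone who wants to call these tools from a script, here is a minimal Python sketch. It assumes the catdoc package is installed and that its programs (`catdoc` for Word, `catppt` for PowerPoint, `xls2csv` for Excel) are on PATH. The helper names `converter_for` and `extract_text` are hypothetical and not part of catdoc itself.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed mapping from file extension to the converter programs shipped
# with the catdoc package (Word -> catdoc, PowerPoint -> catppt,
# Excel -> xls2csv, which emits CSV).
CONVERTERS = {
    ".doc": "catdoc",
    ".ppt": "catppt",
    ".xls": "xls2csv",
}


def converter_for(path):
    """Hypothetical helper: pick the catdoc-suite program for a file by extension."""
    ext = Path(path).suffix.lower()
    try:
        return CONVERTERS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or '(none)'}") from None


def extract_text(path):
    """Hypothetical helper: run the matching converter and return its stdout."""
    program = converter_for(path)
    # Fail early with a clear message if the tool is not installed.
    if shutil.which(program) is None:
        raise FileNotFoundError(f"{program} is not installed or not on PATH")
    result = subprocess.run(
        [program, str(path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the converter fails
    )
    return result.stdout
```

Usage would be something like `print(extract_text("minutes.doc"))`. Newer `.docx`/`.pptx`/`.xlsx` files are deliberately left unmapped, because these tools target the older binary Office formats.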
Thanks
Anirudh Zala