[nycphp-talk] Blog Posts with Embedded Content

John Campbell jcampbell1 at gmail.com
Mon Oct 13 09:47:32 EDT 2008


On Sun, Oct 12, 2008 at 7:19 PM, Hans Zaunere <lists at zaunere.com> wrote:
> Gentlemen,
>
>> > The safest approach is probably to pass the html through tidy, and
>> > then into DOM, and traverse and count the length of text nodes, but
>> > that would be quite slow if you ran it on every request.
>>
>> Right, +1 for Tidy and DOM, it's the "real" way to do it. You won't
>> need to do it on every request -- you can either store the summary
>> itself as a separate text field, or store the length of the summary as
>> an integer.
>
> I tried this, working through using both DOM and Tidy, and
> combinations of each - no luck. The problem is getting the
> differential between the two versions of the text.
>

This is a solvable problem, but it needs to be well defined first.  I
assume you want to snip the HTML to show a preview.  If you leave in
things like YouTube videos and images, the preview could still be
really long without containing much text.  Why do you need the
differential between the two versions?  As soon as you pass something
through Tidy, getting the differential is impossible, because Tidy can
change the HTML in unpredictable ways.  Not cutting in the middle of a
tag is pretty easy to solve: just iterate over the markup and keep
track of the open tags on a stack.
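
The stack idea can be sketched roughly as follows.  This is only an
illustration under a few assumptions: it is written in Python with the
standard-library html.parser rather than PHP, the names snip_html,
_Snipper and VOID_TAGS are made up for the example, comments are
dropped, and text inside script/style elements is counted like any
other text.  It copies tags through verbatim, counts only text
characters against the limit, and closes whatever is still open on the
stack once the limit is hit.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _Snipper(HTMLParser):
    """Hypothetical helper: copies markup until `limit` text characters are seen."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # output fragments
        self.stack = []          # currently open (non-void) tags
        self.done = False        # set once the text budget is used up

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # raw tag, never cut
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return  # ignore stray closing tags
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]
        # Entities were decoded by the parser, so re-escape the text.
        self.out.append(escape(piece, quote=False))
        self.remaining -= len(piece)
        if self.remaining == 0:
            self.done = True


def snip_html(html, limit):
    """Return `html` cut to at most `limit` text characters, with tags balanced."""
    parser = _Snipper(limit)
    parser.feed(html)
    parser.close()
    # Whatever is still on the stack was open at the cut point: close it.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.stack))
    return "".join(parser.out) + closing


print(snip_html("<p>Hello <b>big world</b> and more</p>", 9))
```

Since the cut only ever happens inside text, a tag is never split, and
the result could be stored alongside the post instead of being
recomputed on every request.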

-John Campbell


