[nycphp-talk] session size important?

Hans Zaunere lists at zaunere.com
Fri Apr 22 09:19:18 EDT 2005


> > I saw an example a few months back about the space required by an
> > application that had 20 KB of session data on the file system, times the
> > number of users, times a 20-minute timeout, which brought the space needed
> > to 2 GB for a modest number of users.
> 
> So the answer is, store as much as you need to, and no more.
> 
> But I'm still curious about the performance implications of
> serialize() / unserialize() -- should large sessions be broken up into
> many rows of a table, so that updates only touch one row and not the
> entire structure?
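The 2 GB figure in the first quoted message is easier to reason about as a count of live sessions. At 20 KB each, 2 GB (decimal) corresponds to about 100,000 session files on disk at once. The sketch below is a hypothetical model, not the original example's method. It assumes a session file lingers until it has been idle for the full timeout, so the steady-state count is roughly the rate of new sessions times the timeout. The 5,000 new sessions per minute is an illustrative input chosen to land on 2 GB.

```python
"""Back-of-the-envelope session storage estimate (hypothetical model)."""


def live_sessions(new_sessions_per_minute: float, timeout_minutes: float) -> float:
    # Assumption: a session lingers until it has been idle for the whole
    # timeout, so the steady-state count is arrival rate x timeout.
    return new_sessions_per_minute * timeout_minutes


def storage_bytes(session_bytes: int, sessions: float) -> float:
    # Total disk used by session files, ignoring filesystem block overhead.
    return session_bytes * sessions


KB = 1000
GB = 1000 ** 3

# Illustrative inputs: 20 KB per session (from the quote), a 20-minute
# timeout, and a hypothetical 5,000 new sessions per minute.
sessions = live_sessions(new_sessions_per_minute=5000, timeout_minutes=20)
total = storage_bytes(20 * KB, sessions)
print(f"{sessions:,.0f} live sessions -> {total / GB:.1f} GB")
```

Both factors scale the total linearly, so halving either the per-session payload or the timeout halves the disk needed.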

My understanding (mostly from discussions by Rasmus et al. about serialize()) is that it's slow as heck.  It was never designed for performance or heavy production use, and should be considered a utility function.  That said, I believe there is a PECL extension that provides a much better serializer.

> It seems like if you had A LOT of data being stored in the session,
> you would be better off putting it into a db and only reading/writing
> the rows you need for a given request. Then the only thing stored in
> $_SESSION would be the key(s) to those rows...
> 
> The disk size requirements don't go away, but processing might be more
> efficient.
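The row-per-key idea in the quoted message can be sketched with Python's built-in sqlite3 module standing in for the application database, and JSON standing in for PHP's serialize(). The table name session_data and the put/get helpers are hypothetical. The point is that a request reads or rewrites only the keys it touches, while the session itself would carry only the session id.

```python
import json
import sqlite3

# Hypothetical schema: one row per (session id, key) instead of one
# serialized blob per session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session_data (
    session_id TEXT NOT NULL,
    key        TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (session_id, key)
)
"""


def put(conn: sqlite3.Connection, sid: str, key: str, value) -> None:
    # Only this key's value is serialized and written; other keys are untouched.
    conn.execute(
        "INSERT OR REPLACE INTO session_data (session_id, key, value) VALUES (?, ?, ?)",
        (sid, key, json.dumps(value)),
    )


def get(conn: sqlite3.Connection, sid: str, key: str, default=None):
    # Reads and deserializes a single row rather than the whole session.
    row = conn.execute(
        "SELECT value FROM session_data WHERE session_id = ? AND key = ?",
        (sid, key),
    ).fetchone()
    return json.loads(row[0]) if row else default


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
with conn:  # commit the writes as one transaction
    put(conn, "abc123", "cart", [{"sku": "A1", "qty": 2}])
    put(conn, "abc123", "prefs", {"theme": "dark"})
    put(conn, "abc123", "prefs", {"theme": "light"})  # rewrites one row only

print(get(conn, "abc123", "prefs"))
```

The trade-off is more queries when a page needs many keys at once, where a single blob is one read; the row layout pays off when most requests touch only a small part of a large session.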

At the end of the day, I think it's safe to use this rule:

-- keep session data stored in the database as small as possible
-- keep session data in transit even smaller
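One way to make the first rule enforceable is to check the serialized size of a session before it is saved. This is a minimal sketch under stated assumptions: JSON stands in for PHP's serialize(), and the 4 KB budget and the SessionTooLarge exception are hypothetical choices, not anything from the thread.

```python
import json


class SessionTooLarge(Exception):
    """Raised when a session exceeds the (hypothetical) size budget."""


MAX_SESSION_BYTES = 4 * 1024  # illustrative budget, tune per application


def serialize_session(data: dict, limit: int = MAX_SESSION_BYTES) -> bytes:
    # Serialize compactly (no spaces) and measure the encoded size in bytes.
    payload = json.dumps(data, separators=(",", ":")).encode("utf-8")
    if len(payload) > limit:
        raise SessionTooLarge(f"session is {len(payload)} bytes, limit is {limit}")
    return payload


small = {"user_id": 42, "cart_id": "c-981"}
print(len(serialize_session(small)))
```

Failing loudly at save time tends to surface oversized state while the code is being written, rather than later as disk and I/O growth.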

Session data is basically just the mechanism for keeping state.  If an application requires large amounts of state, perhaps it's time to reconsider its architecture and deployment.  State shouldn't be that big.  If there's absolutely no way to minimize state (although there always should be), use a database/memcache combination to maintain it.  Frankly, though, I'd never want to put it across the wire to a browser, for at least the following reasons:

-- moving large amounts of data across the wire is expensive in network and server resources, not to mention possibly in bandwidth charges

-- if you put things out on the wire, you really shouldn't depend on them coming back unchanged.  That means computing and verifying checksums or similar on every round trip, which is a further performance hit, and one that is likely greater than reading known data from a locally connected database.

-- processing overhead on both the client and the Apache side can be a problem.  It's just a lot of data to have circulating *twice* (out to the browser and back again) on *every* request, plain and simple.
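The checksum point can be made concrete. If state must round-trip through the browser (in a cookie or hidden form field, say), the server has to sign it on the way out and verify it on the way back. This sketch uses HMAC-SHA256 from Python's standard library; the secret key, the token format, and the function names are hypothetical. Signing only detects tampering; it does not hide the data.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-long-random-server-side-secret"  # hypothetical


def sign_state(state: dict) -> str:
    # Encode the state, then append an HMAC tag so tampering can be detected.
    body = base64.urlsafe_b64encode(json.dumps(state, separators=(",", ":")).encode())
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + tag


def verify_state(token: str) -> dict | None:
    # Recompute the HMAC over what came back; reject anything that differs.
    body, _, tag = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


token = sign_state({"user_id": 42, "step": 3})
print(len(token), verify_state(token))
print(verify_state("X" + token[1:]))  # altered first character is rejected
```

Every round trip pays for base64 expansion (about 4/3 of the payload), a 64-character tag, and an HMAC computation on the server, in both directions.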

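Where state genuinely cannot be made small, the database/memcache combination suggested above is commonly built as a read-through, write-through cache: look in the cache first, fall back to the database on a miss, and write to both on save. In this sketch a plain dict with expiry times stands in for memcache and sqlite3 stands in for the database; the SessionStore class and its methods are hypothetical.

```python
from __future__ import annotations

import sqlite3
import time


class SessionStore:
    """Read-through session store: cache in front of a database.

    The dict-based cache is a stand-in for memcache (assumption); a real
    deployment would use a shared cache server so all web nodes see it.
    """

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 1200):
        self.conn = conn
        self.ttl = ttl_seconds  # 1200 s mirrors the 20-minute timeout in the thread
        self.cache: dict[str, tuple[float, str]] = {}
        self.db_reads = 0  # counts cache misses that went to the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def load(self, sid: str) -> str | None:
        hit = self.cache.get(sid)
        if hit and hit[0] > time.monotonic():
            return hit[1]  # fresh cache entry, no database round trip
        self.db_reads += 1
        row = self.conn.execute("SELECT data FROM sessions WHERE id = ?", (sid,)).fetchone()
        if row is None:
            return None
        self.cache[sid] = (time.monotonic() + self.ttl, row[0])  # repopulate cache
        return row[0]

    def save(self, sid: str, data: str) -> None:
        # Write through: the database stays authoritative, the cache stays warm.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)", (sid, data)
            )
        self.cache[sid] = (time.monotonic() + self.ttl, data)


store = SessionStore(sqlite3.connect(":memory:"))
store.save("abc123", '{"user_id": 42}')
store.cache.clear()          # simulate a cache eviction
print(store.load("abc123"))  # miss: read from the database
print(store.load("abc123"))  # hit: served from the cache
print(store.db_reads)
```

Because the database remains authoritative, losing the cache costs only extra reads, not session data.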

---
Hans Zaunere
President, Founder

New York PHP
http://www.nyphp.org

AMP Technology
Supporting Apache, MySQL and PHP






