[nycphp-talk] The user table
Gary Mort
garyamort at gmail.com
Tue Mar 8 13:09:46 EST 2011
On 3/7/2011 2:41 PM, Paul A Houle wrote:
> On 3/5/2011 1:27 PM, Gary Mort wrote:
>> I am no longer convinced that the cpu cycles saved by using a numeric
>> id are sufficient to justify doing so with todays computers. By the
>> time you get to an app that is sizable enough to be looking to shave
>> those few microseconds, your going to be dealing with a large number
>> of ways to save space....and your likely going to want to stick the
>> user mapping data into a cache in memory somewhere anyway, so the
>> time saved is not that big of a deal.
>>
> It's not cpu cycles that matter here, but I/O bandwidth, seek
> time, and RAM for cache on your database server. Although MySQL in
> an impressive product, it's remarkably easy to put enough data into a
> database that you'll have performance problems, particularly if
> you're importing data from public sources or if you develop a
> successful community site.
I highly doubt that unless you are Facebook, Google, or Amazon that you
can put enough "user records" into your database for this to matter.
Note, I am specifically referring to the user table - not any other bit
of data but where you need to identify a user uniquely.
I see no point in placing the added burden of using integer identifiers
for userids on all applications in order to accomodate "but what if
someone has hundreds of thousands of users or more!" - if they have
hundreds of thousands of users, they can afford customizing - why should
everyone else pay for those few people's convenience?
What do I mean by "pay"?
Take any mysql application used by a mom and pop store. For whatever
reason, there is no canned report to retrieve some aspect of data from a
table[could be the order table, could be the payment table, or here is
one that often occurs, the SHOPPING cart table to see how many abandoned
shopping carts there are].
There are a plethora of tools that the owner of the data can use to view
the data, Navicat, PHPMyAdmin, export it to CSV and look in excel,
etc. And the data tends to be very nicely laid out so it the primary
bits are understandable.....and then you come to "what actual person did
this?" - and it all breaks down. Sitting there is some number.... Now
suddenly from viewing an individual table[which is easy] the person
needs to join and link the data to another table. So he has to pay a
programmer everytime he wants a report to figure out these linkages.
So all the little websites of the world pay a premium just so that open
source applications can say their "scalable" and that the few big
businesses which could have afforded to pay to tune the system don't
have to.
That to me is a poor business decision. Making many other people pay so
that your code can be "scalable" and "good coding practices" and "well
engineered" is not the sign of a good coder - it's the sign of an
artistic one.
So, what happens if you do it the other way around? What if you put the
users name in every one of those records and used that as the key?
Well, performance for those little sites doesn't take any significant
hit. On the rare occassion that someone wants to change their name, you
do have to either go and change it in every existing table - or you have
to maintain some sort of history table that you can link through so the
program can say:
This order was placed on December 15th at 2:00pm and at that time, the
userid garyamort referred to the user who is garyamort today.
Again, for the mom and pop's of the world, take the hit and update it in
all tables.
It's a different paradigm. Throwing "art" out the window and replacing
it by practicality. Of course, it also means "know your market". If
you are coding a specific app for a specific person who is planning on
growing exponentially, then designing something for him which has to be
changed later is a poor choice. But if your making an app for tens of
thousands of people to use - make simplicity for the end user the
driving force. Do stupid ass things that would never scale....and then
note what they are and how you change it[if you scale past 10,000 users
convert all usernames to integer keys and always use the aliases
table.... here is the high level overview on how to do it. if your not
a coder, make it your business model to be able to afford to pay for a
coder when you reach that point.]
The key point, for me, is it should not be so difficult for me to change
an application so email addresses can be used instead of usernames. Or
to make it possible for multiple usernames to use the same email
address. Or to change the rules on usernames to allow different
charectors. And truely, it is easy to make that change for the one bit
of core code....it's tracking down all those other bits of code that
worked on the assumption of those rules...merely because those where the
rules....that I find troublesome.... and it bothers me greatly to
charge for this coding time because I am basically fixing the mistaken
design decisions made by people who are applying rules by rote. "Always
use an auto-incrementing primary id"
More information about the talk
mailing list