NYCPHP Meetup

NYPHP.org

[nycphp-talk] The user table

Gary Mort garyamort at gmail.com
Tue Mar 8 13:09:46 EST 2011


On 3/7/2011 2:41 PM, Paul A Houle wrote:
>  On 3/5/2011 1:27 PM, Gary Mort wrote:
>> I am no longer convinced that the cpu cycles saved by using a numeric 
>> id are sufficient to justify doing so with todays computers.  By the 
>> time you get to an app that is sizable enough to be looking to shave 
>> those few microseconds, your going to be dealing with a large number 
>> of ways to save space....and your likely going to want to stick the 
>> user mapping data into a cache in memory somewhere anyway, so the 
>> time saved is not that big of a deal.
>>
>     It's not cpu cycles that matter here,  but I/O bandwidth,  seek 
> time,  and RAM for cache on your  database server.  Although MySQL in 
> an impressive product,  it's remarkably easy to put enough data into a 
> database that you'll have performance problems,  particularly if 
> you're importing data from public sources or if you develop a 
> successful community site.


I highly doubt that unless you are Facebook, Google, or Amazon that you 
can put enough "user records" into your database for this to matter.  
Note, I am specifically referring to the user table - not any other bit 
of data but where you need to identify a user uniquely.

I see no point in placing the added burden of using integer identifiers 
for userids on all applications in order to accomodate "but what if 
someone has hundreds of thousands of users or more!" - if they have 
hundreds of thousands of users, they can afford customizing - why should 
everyone else pay for those few people's convenience?

What do I mean by "pay"?

Take any mysql application used by a mom and pop store.   For whatever 
reason, there is no canned report to retrieve some aspect of data from a 
table[could be the order table, could be the payment table, or here is 
one that often occurs, the SHOPPING cart table to see how many abandoned 
shopping carts there are].

There are a plethora of tools that the owner of the data can use to view 
the data, Navicat, PHPMyAdmin, export it to CSV and look in excel, 
etc.   And the data tends to be very nicely laid out so it the primary 
bits are understandable.....and then you come to "what actual person did 
this?" - and it all breaks down.  Sitting there is some number....  Now 
suddenly from viewing an individual table[which is easy] the person 
needs to join and link the data to another table.   So he has to pay a 
programmer everytime he wants a report to figure out these linkages.

So all the little websites of the world pay a premium just so that open 
source applications can say their "scalable" and that the few big 
businesses which could have afforded to pay to tune the system don't 
have to.

That to me is a poor business decision.  Making many other people pay so 
that your code can be "scalable" and "good coding practices" and "well 
engineered" is not the sign of a good coder - it's the sign of an 
artistic one.

So, what happens if you do it the other way around?  What if you put the 
users name in every one of those records and used that as the key?

Well, performance for those little sites doesn't take any significant 
hit.  On the rare occassion that someone wants to change their name, you 
do have to either go and change it in every existing table - or you have 
to maintain some sort of history table that you can link through so the 
program can say:
This order was placed on December 15th at 2:00pm and at that time, the 
userid garyamort referred to the user who is garyamort today.

Again, for the mom and pop's of the world, take the hit and update it in 
all tables.

It's a different paradigm.  Throwing "art" out the window and replacing 
it by practicality.  Of course, it also means "know your market".  If 
you are coding a specific app for a specific person who is planning on 
growing exponentially, then designing something for him which has to be 
changed later is a poor choice.  But if your making an app for tens of 
thousands of people to use - make simplicity for the end user the 
driving force.   Do stupid ass things that would never scale....and then 
note what they are and how you change it[if you scale past 10,000 users 
convert all usernames to integer keys and always use the aliases 
table.... here is the high level overview on how to do it.  if your not 
a coder, make it your business model to be able to afford to pay for a 
coder when you reach that point.]

The key point, for me, is it should not be so difficult for me to change 
an application so email addresses can be used instead of usernames.  Or 
to make it possible for multiple usernames to use the same email 
address.  Or to change the rules on usernames to allow different 
charectors.   And truely, it is easy to make that change for the one bit 
of core code....it's tracking down all those other bits of code that 
worked on the assumption of those rules...merely because those where the 
rules....that I find troublesome....   and it bothers me greatly to 
charge for this coding time because I am basically fixing the mistaken 
design decisions made by people who are applying rules by rote.  "Always 
use an auto-incrementing primary id"





More information about the talk mailing list