[nycphp-talk] data modelling vs. db design (was: ER Diagram tool for MySQL/OS X)

Mitch Pirtle mitch.pirtle at gmail.com
Tue Oct 4 22:19:43 EDT 2005


On 10/4/05, max goldberg <max.goldberg at gmail.com> wrote:
> I'm actually at the point in an application where my completely
> normalized database design is just killing my database server. It was fine
> during the first few months of my site being up but as my traffic grew, it
> really began to take a toll. The first thing I had to do was move from
> MyISAM to InnoDB, due to the huge number of locks causing the site to just
> screech to a halt. On average I am getting around 5000 hits per minute to
> the PHP pages that connect to the database. During the "slow" time at 7am, I
> get around 2500.  I've managed to lower my average queries per second from
> 300 to 220, but normalization is still killing me.

Without knowing your data model, I cannot really tell you much. My
experience tells me that the more you normalize, the less you will
encounter the locks that MyISAM tables are prone to.
Denormalization will make your problems worse for sure.
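
That said, the engine switch you already made is the right first step
for a write-heavy, normalized schema, since InnoDB gives you row-level
locking where MyISAM locks whole tables. For the archives, it is a
one-liner per table - a rough sketch with a made-up table name:

  -- hypothetical table name; rewrites the table under the InnoDB engine
  ALTER TABLE page_hits ENGINE = InnoDB;

  -- confirm which engine a table is using
  SHOW TABLE STATUS LIKE 'page_hits';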

>  I've considered using InnoDB's cascading update/delete function to keep my
> digest tables up to date without a cron job, but I want to play with it more
> before I just launch into it. I am starting to wonder how much of this is
> just my lack of expertise in database administration and how much is just a
> fundamental flaw with normalized database design. Is it possible to have a
> large scale production database with a fully normalized design without
> extremely expensive hardware?

See below for my comments, but I have personally worked on sites using
both MySQL and PostgreSQL that had daily page views well into the
millions. The PostgreSQL sites usually had no problems, and the one
really huge MySQL site had major ones - but remember that this was
almost ten years ago when MySQL was just a little punk ;-)

I would wager that moving to a more recent version of MySQL would
alleviate some of your pain, but I have no idea what version you are
using at the moment.
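
On the cascading update/delete idea from your first paragraph: with
InnoDB that is just a foreign key clause, so it is cheap to experiment
with before you launch into it. A rough sketch, with made-up table and
column names:

  CREATE TABLE posts (
      id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      title VARCHAR(255) NOT NULL
  ) ENGINE = InnoDB;

  -- child rows follow the parent automatically on update and delete
  CREATE TABLE comments (
      id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      post_id INT UNSIGNED NOT NULL,
      body    TEXT NOT NULL,
      FOREIGN KEY (post_id) REFERENCES posts (id)
          ON UPDATE CASCADE
          ON DELETE CASCADE
  ) ENGINE = InnoDB;

Keep in mind that cascades only propagate changes to rows referencing a
parent key; they will not recompute aggregates, which is where triggers
(see the sketch further down) come in for digest tables.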

>  It seems like it would be worth the extra overhead to keep
> behind-the-scenes digest tables of commonly joined tables, is this something
> that higher-end databases (i.e. Oracle) do?

I worked on a site with over 7.5 million page views daily that ran on
two dual-Xeon servers with PHP and PostgreSQL. The MySQL version of
the app had the exact same problems you had, and I also had to switch
to InnoDB tables to keep indexes from getting blown away on very
active tables. Interestingly enough, PostgreSQL handled the normalized
design without a burp. The switcheroo also included the implementation
of a lot more functionality, like foreign keys, stored procedures and
triggers. I expected a wash in performance, but it was just a tad
faster under heavy load.
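
To make the trigger point concrete for your digest-table question: in
PostgreSQL you can have the database keep a summary row current on
every write instead of running a cron job. A minimal sketch, assuming a
replies table with a thread_id column (all names here are made up):

  -- digest table: one counter row per thread
  CREATE TABLE thread_stats (
      thread_id   integer PRIMARY KEY,
      reply_count integer NOT NULL DEFAULT 0
  );

  -- bump the counter whenever a reply is inserted
  CREATE FUNCTION bump_reply_count() RETURNS trigger AS $$
  BEGIN
      UPDATE thread_stats
         SET reply_count = reply_count + 1
       WHERE thread_id = NEW.thread_id;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER replies_bump_count
      AFTER INSERT ON replies
      FOR EACH ROW EXECUTE PROCEDURE bump_reply_count();

MySQL 5.0 is supposed to bring triggers and stored procedures as well,
so the same idea should carry over once you upgrade.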

I know of a financial application that was using SQL Server, and once
you got to something like 20 joins the queries just failed. A quick
port to PostgreSQL and all those joins ran just fine.

I spend most of my time in PostgreSQL - but I can say that if you have
a sophisticated data model with a lot of foreign key constraints,
PostgreSQL's MVCC model takes care of a lot of the pain that other
databases often experience when doing lots of JOIN operations.

Time to take a look at the impending MySQL 5.0 (it *is* going stable
sometime, right? hehe) as I am not really up to speed with what MySQL
has to offer in its latest versions.

But in the end you can definitely run high-traffic, data-intensive
websites with open source databases, and I would only consider a big
commercial RDBMS if you hired someone with that particular skillset;
and even then it would be more of a business decision than a technical
one.

-- Mitch


