[nycphp-talk] data modelling vs. db design (was: ER Diagram tool for MySQL/OS X)
Hans Zaunere
lists at zaunere.com
Tue Oct 11 22:13:59 EDT 2005
Phil Duffy wrote on Tuesday, October 11, 2005 9:05 AM:
> > -----Original Message-----
> > From: talk-bounces at lists.nyphp.org
> > [mailto:talk-bounces at lists.nyphp.org] On Behalf Of Hans Zaunere
> > Sent: Monday, October 10, 2005 7:09 PM
> > To: 'NYPHP Talk'
> > Subject: Re: [nycphp-talk] data modelling vs. db design (was: ER Diagram
> > tool for MySQL/OS X)
> >
> > Normalization is often one of the most
> > misunderstood aspects of relational databases. Of course, it's what
> > makes a relational database relational, and we're all taught that
> > you're not worth a cent unless you only work in fifth normal
> > form.
> >
> > Well, perhaps if you're a mathematician, then that's correct. The
> > reality is that over-normalization can kill a project just as much as
> > under-normalization. Don't forget - at least from the old SQL books I
> > learned from - why normalization was being preached. "Don't have
> > duplicate data!" was often the mantra. If you had more than one guy
> > named "John", then you need to have a table of first names, with a
> > lookup table relating first names with the other data you're storing.
> > Seems ridiculous, right?
> >
> > The practice of heavy normalization was based on some old, bygone
> > concerns, namely "storage is expensive." Add in academic
> > chest-beating, and pretty soon you have a table for every letter of the
> > alphabet and look-up tables to match.
> >
> > As I think we generally find, with the abundance of hardware resources,
> > normalization is reduced to only an element of good design. Sure,
> > performance can still be a factor, but let's consider the performance
> > of an entire system, and not just what's happening in the database.
>
> <snip>
>
> Hans,
>
> While I agree with your general message, I have a reservation about the
> "storage is expensive" example. Having been around since before Boole and
> Babbage, I can't recall developers in the "old days" being slavish about
> "storage is expensive". We sought opportunities to
> reduce storage, when that was practical, but only in very specialized
> applications did we go out of our way to conserve disk storage. The
> primary issue was data consistency and software maintainability. Lack of
> normalization created situations which encouraged users to create
> inconsistent data. For example, consider a pre-relational system in
> healthcare that recognized two 'pseudo-entities', physicians and patients
> (pseudo in the sense we now recognize these as roles as opposed to
> fundamental entities). Each pseudo-entity had its own basic record to
> contain address, telephone number and other attributes. That worked as
> long as the physician never became a patient. However, a change to a
> physician/patient's record created an inconsistency in the changed
> attribute, e.g., the address.
>
> The second issue was software maintenance. There were situations where
> one record's structure might be modified while the other was not.
Good points, and I think you raise an important issue, namely, normalization
as a mechanism for separation and isolation.
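Phil's physician/patient anecdote can be made concrete. Below is a minimal sketch (Python with SQLite for illustration; every table and column name is hypothetical, not taken from the systems he describes) of folding the two pseudo-entities into one Person entity with role tables, so a single address update serves both roles:

```python
import sqlite3

# Hypothetical remodel: Person holds shared attributes; physician and
# patient become roles that reference it.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY,
                         name TEXT, address TEXT);
    CREATE TABLE physician (person_id INTEGER REFERENCES person(id),
                            license_no TEXT);
    CREATE TABLE patient (person_id INTEGER REFERENCES person(id),
                          insurance_no TEXT);
""")
db.execute("INSERT INTO person VALUES (1, 'Dr. Smith', '12 Elm St')")
db.execute("INSERT INTO physician VALUES (1, 'MD-1001')")
db.execute("INSERT INTO patient VALUES (1, 'INS-9')")  # the doctor becomes a patient

# One UPDATE fixes the address for both roles -- no inconsistency possible.
db.execute("UPDATE person SET address = '99 Oak Ave' WHERE id = 1")
addr = db.execute("""
    SELECT p.address FROM patient pt JOIN person p ON p.id = pt.person_id
""").fetchone()[0]
print(addr)
```

The point is exactly the "separation and isolation" above: the shared attributes live in one place, and the roles carry only what is role-specific.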
> I believe it was a natural progression from normalization of databases to
> object-oriented design, because the latter encouraged all interactions
> with the database to be performed in a single module. I know of at least
> one situation in healthcare where that limitation in the existing systems
> nearly killed a patient.
Ah hah! Separation and isolation. There is always the question of which
perspective we're looking from, and, in the end, who is invoking
normalization. I think the storage reasoning comes from - at least in my
experience when browsing academic SQL references - a largely idealistic way
of storing data. Perhaps storage was or wasn't really expensive, but it was
neat not to store redundant data, as the first few normal forms try to
address:
http://www.datamodel.org/NormalizationRules.html
That's from the data-structures-are-cool-and-I'm-not-an-implementer
department, which begins down the road of:
http://en.wikipedia.org/wiki/Database_normalization#Sixth_normal_form
http://www.bkent.net/Doc/simple5.htm
and eventually to:
http://www.nerdbooks.com/item.php?id=1558608559
Not a bad thing, but not something I'd be ready to recommend to a customer
:)
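To make the redundancy point concrete, here is a small sketch (Python with SQLite; the schema is hypothetical) of the "table of first names" scheme from earlier in the thread, where even listing people's first names requires a join:

```python
import sqlite3

# Over-normalized to absurdity: every distinct first name gets its own
# row, and people reference it through a foreign key.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE first_names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE people (id INTEGER PRIMARY KEY,
                         first_name_id INTEGER REFERENCES first_names(id));
""")
db.execute("INSERT INTO first_names (name) VALUES ('John')")
for _ in range(2):  # two people named John share one stored 'John'
    db.execute("INSERT INTO people (first_name_id) VALUES (1)")

# Even "list everyone's first name" now needs a join.
rows = db.execute("""
    SELECT fn.name FROM people p
    JOIN first_names fn ON fn.id = p.first_name_id
""").fetchall()
print(rows)
```

No redundant bytes are stored, but every query pays a join for data that would cost almost nothing as a plain column.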
From a developer's perspective, however, the more important issue, as you
point out, is a good, clean representation of data. And, as hardware
resources become cheaper, that might mean it's OK to have redundant or
repeating data and still be correctly normalized and architected.
> But your principal point is that database design (and object design by
> extension in my opinion) is still an art form. Both database and object
> design should look upon the 'rules' as significant guidelines. Understand
> when to bypass these rules, and document those reasons (conditions may
> change in the future and others will probably have to clean up the
> situation).
Exactly - it's a balance between what works in a sensible way, and how
you're supposed to do it. As some of the links above mention, they're
discussing "relational database theory." A lot of the work these guys do is
great, and the basis for much of the technology we have today. But as
architects and implementers on the front-lines, it's our job to find that
happy place between theory and practical, cost-effective software.
> I liked your example of normalizing the string 'John', which underlines
> the absurdity of over-normalization. For me, appropriate normalization
> begins with the recognition of the fundamental entities in a system and a
> distinction between real world-modeled entities (e.g., a Person) as
> opposed to system-internal entities (e.g., a web page). If I come to a
> branch in the road at which normalization and de-normalization both offer
> advantages, I am most likely to take the normalization branch and make a
> mental note; if I later discover that creates a significant performance
> issue that can't otherwise be corrected, I de-normalize that area of the
> system intentionally (documenting my reasons and the implementation).
>
> It would be interesting to see if others have approached normalization
> guidelines differently. I won't claim the above is the only way to do
> database design - it is still an art form after all.
I approach it in much the same way, though I often take the
de-normalization branch when all other things are equal. Recognition of
fundamental entities is really the driver behind normalization in most
environments I come across; without it, the total system - code and DB
style, maintainability, performance, and future changes by external
developers - is usually happier with a bit more data consolidation.
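As a closing sketch of the deliberate, documented de-normalization discussed above (Python with SQLite; all names hypothetical): caching a derived total on the parent row so the common read path can skip the join, with the write path responsible for keeping the cache in sync.

```python
import sqlite3

# Intentional de-normalization: orders.total_cents duplicates what
# SUM(order_items.price_cents) would compute, traded for cheaper reads.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         total_cents INTEGER DEFAULT 0);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              price_cents INTEGER);
""")
db.execute("INSERT INTO orders (id) VALUES (1)")
db.executemany("INSERT INTO order_items VALUES (1, ?)", [(250,), (750,)])

# The write path refreshes the cached value; readers never touch order_items.
db.execute("""
    UPDATE orders SET total_cents =
        (SELECT SUM(price_cents) FROM order_items
         WHERE order_id = orders.id)
    WHERE id = 1
""")
total = db.execute("SELECT total_cents FROM orders WHERE id = 1").fetchone()[0]
print(total)
```

The trade-off is the one argued throughout this thread: a documented redundancy, accepted on purpose, in exchange for a simpler and faster overall system.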
Good talk, Phil,
---
Hans Zaunere / President / New York PHP
www.nyphp.org / www.nyphp.com