Terminology confusion abounds
in the software industry and I believe the worst examples are in database
management in particular. So I shouldn't have been surprised to read the
nonsense I discuss here... but still I was.
The following quote opens up
edition 6.14 of Woody’s
Access Watch (“WOW”), a normally fine e-newsletter focused on building
Microsoft Access solutions:
Access
is a normalized database (except for purists who don’t count a database as
normalized unless it adheres to all 13 of the Codd & Date rules), and that
means that you should store linked data in different tables.
If you don’t reel from an
affront to your sensibilities from that one sentence, you should. This is no
different, to me, than saying:
Microsoft
Word is a legal document.
Sounds silly, right?
But did the WOW quote sound as
odd to you? It should. It reveals many misunderstandings about database
technology and practices and is confused and muddled, something we of all
people should avoid.
Ignoring the poor grammar,
here are the technical reasons this WOW statement should offend you as a
technology professional:
1. Microsoft Access is not
a database.
Read that again. Access is not
a database. A database is an organized collection of information accessible for
some purpose. Is Access an organized collection of information? No! It is a
piece of software. Many people are far too cavalier with the word “database.”
Database professionals—those in our society who are responsible for (and
presumably trained for) organizing and classifying the world’s
information—should be the least so.
Fundamentally, Access is a
“database management system” or “DBMS” for short. You could call it “database software”
or even, loosely, a “database system.” But it not a database. “Northwind” and
“pubs” are databases. IMDB.com is (largely)
a database. The card catalog at the library is a database. But Microsoft Access
is a DBMS.
Calling Access a database is
like calling Microsoft Word a “document.”
2. Nothing guarantees that any
particular database implemented within Access will be normalized or not
normalized...
... any more than using Word
to write documents guarantees they will be legally enforceable documents.
In my experience databases
implemented with Access frequently violate all sorts of normalization rules.
One recent example I saw had a single column, “[City, State]” (yes, including
the comma and space!). This clearly violates first normal form (“1NF”).
And don’t mistake this for an
Access limitation or problem. No DBMS forces you to follow good database
design practices. I have seen equally good and bad designs in Access, Oracle,
and SQL Server. (Okay, the more advanced tools are somewhat more likely to be
used by better trained professionals. But the edge is slight.)
To think of Access as a
“normalized” database tool or even a “relational” DBMS would be akin to
thinking of Word as a tool that automatically produced legally enforceable
documents regardless of what you typed.
Being an expert at Access (or
any database tool) no more qualifies you to implement correct, high quality
database solutions than being a Word expert would qualify you to be a Pulitzer
prize-winning author. It helps get the job done faster, but you still have to
know how to do the job. (Most hiring managers do not understand this!)
3. Normalization does not
mean adhering to Codd’s 12 rules.
Note: the guidelines are normally listed as "12" rules and were
proposed by Ted Codd (not "and Date"). They are much quoted,
including of course in Date's standard database text. See this site for a quick
summary. Since there is a "Rule Zero," it would be fine to say the
"13 Rules," but most texts do not call them such.
Codd’s 12 rules for relational
database systems were geared toward those hoping to implement a relational DBMS
and, as such, dictate certain behaviors of those software systems.
Normalization,
in stark contrast, applies to the table-level design of one particular
database.
See why it’s so important to
be clear about “database” v. “DBMS”? I hope so.
(As an aside, being
“normalized” also does not mean being “relational.” Adhering to 1NF is required.
Beyond that they are simply design recommendations. See C.J. Date’s excellent “What First Normal Form
Really Means” article for details.)
4. Normalization does not per
se mean storing data in different tables and then linking back together.
Granted,
such a design is usually a consequence of a normalized design. But
normalization rules are strictly about removing certain types of redundancy
from a certain table. Usually this is accomplished by decomposing the
table in question into two or more related tables, for example splitting out a
“Children” column into a separate “Child” table containing one row per child.
But it’s important to note the difference.
It is especially important to
note that just because you have separated data tables and relationships between
them does not mean you have a normalized database or a relational database.
Legal jargon is probably
necessary to produce a legal document, but just a glut of legalese scattered in
my document does not mean it will be legally enforceable.
But possibly the biggest
problem with the WOW quote is that it comes from a supposed industry “expert.”
Unfortunately such ignorance of our field is rampant. You must be careful and
pay attention to the fundamentals.
Keep in mind:
If
you do not know the history of your own profession you can never hope to be but
an amateur.
—anonymous
Note:
For much deeper and longer treatment of many of the industry's database
confusions, see Fabian Pascal's site, www.dbdebunk.com.
If you regularly work with databases, you owe it to yourself to be familiar
with his and Chris Date’s writings available at this site.