Suppose you have a large collection of compact discs and you want to create a database to track them. The first step is to determine what the data that you are going to store is about. One good way to start is to think about why you want to store the data in the first place. In our case, we most likely want to be able to look up CDs by artist, title, and song. Since we want to look up those items, we know they must be included in the database. In addition, it is often useful to simply list items that should be tracked. One possible list might include: CD title, record label, band name, song title. As a starting point, we will store the data in the table shown in Table 2-1.
Table 2-1. A CD Database Made Up of a Single Table
Band Name | CD Title | Record Label | Songs |
---|---|---|---|
Stevie Wonder | Talking Book | Motown | You Are the Sunshine of My Life, Maybe Your Baby, Superstition, . . . |
Miles Davis Quintet | Miles Smiles | Columbia | Orbits, Circle, . . . |
Wayne Shorter | Speak No Evil | Blue Note | Witch Hunt, Fee-Fi-Fo-Fum |
Herbie Hancock | Headhunters | Columbia | Man, . . . |
Herbie Hancock | Maiden Voyage | Blue Note | Maiden Voyage |
There are basically three
components to design a database: Conceptual, logical and physical.
Logical database design involves modeling your business requirements and data
using database components, such as tables and constraints, without regard for
how or where the data will be physically stored. Physical database design
involves mapping the logical design onto physical media, taking advantage of
the hardware and software features available, which allows the data to be
physically accessed and maintained as quickly as possible, and indexing.
It is important to
correctly design the database to model your business requirements, and to take
advantage of hardware and software features early in the development cycle of a
database application, because it is difficult to make changes to these
components later.
Logical database Design:
A logical schema won't exist in your database. A logical
schema is a design-centric database structure built to meet your business
requirements. It is a model that exists on a white board or in a diagraming
tool. It is like the architect's drawings of your database.
Basic database
logical model objects (DBMS specific) are: Schema, Tables, Columns, Data Types,
Constraints: primary key, foreign key, uniqueness, referential integrity,
default, range
Logical Schema
- is made out of actually what you plan (your initial data model) and it's objects are:
- tablespaces
- tables
- views
- procedures
- packages etc...
- is made out of actually what you plan (your initial data model) and it's objects are:
- tablespaces
- tables
- views
- procedures
- packages etc...
Logical design involves
defining business entities, attributes for each entity, and relationships among
entities. The latter are sometimes referred to as cardinality and optionality
rules. For example, a grocery store
application might include entities for store, employee, customer, item and
sales transaction. Each store can have multiple employees, but each employee
might only be allowed to be a full-time employee in a single store. Similarly
each sales transaction must have a single customer, and can have one or more items
sold.
Logical data models
(LDMs) are often produced in the form of an Entity Relationship Diagram (ERD),
which uses notation for expressing entities as boxes and relationships as
lines. Attributes either appear in the boxes or are listed for each entity.
Relationships are lines between the boxes, with the cardinality and optionality
expressed as a circle for zero (or optional), a vertical line for
"one" and a crow's foot for "many".
Store -||---employs---0|< Employee
(a Store employs zero one or more Employees and an
employee is employed by exactly one Store).
Customer -||-- participates in -|< Sales Transaction
(a Customer participates in one or more Sales Transactions
and a Sales Transaction is participated in by exactly one Customer)
For Logical data modeling:
2. Identify
and model the relationships between the entities.
3. Identify
and model the attributes.
4. Identify
unique identifiers for each entity.
5. Normalize.
Logical modeling is not specific to the database engine or other technical platform. It is organizationally independent as well. Logical models are generally highly normalized. Logical models are often used to derive or generate physical models via a physical design process.
Note that some
methodologies (such as ORM) refer to a technology-independent data models as
Conceptual Data Models (CDM) and a Logical Data Model is constrained by
technology constraints. It is important if reading or discussing CDMs and LDMs
to understand the context of how these terms are used.
Physical Database
Design:-
The I/O subsystem (storage engine) is a key component of any
relational database. A successful database implementation usually requires
careful planning at the early stages of your project. The storage engine of a
relational database requires much of this planning, which includes determining:
- What type of disk hardware to use, such as RAID (redundant array of independent disks) devices.
- How to place your data onto the disks. For more information, see Data Placement Using File groups.
- Which index design to use to improve query performance in accessing data.
- How to set all configuration parameters appropriately for the database to perform well.
Physical design involves generating the schema for tables, indexes, default and check
constraints, and views for a specific physical structure such as a database,
file, or XML document. A physical model is generally specific to the
database engine and version and is often optimized for a specific application
usage of the data. Physical models include data types for each attribute and
can be normalized or de-normalized. Physical
models can change after the application is deployed in the production
environment. For example, indexes might be altered to tune the performance. New
tables, constraints, defaults and rules might be added to enhance the
application's feature set and enforce new set of business rules.
Designing
additional physical level structures are Files: storage files, partitions,
etc. Performance: indexes, views, etc. Functional: scripts (stored procedures),
functions, triggers, transactions, etc. Security: users, roles, permissions, etc.
Physical Schema
-here is where you have your data files, redo logs, control files etc, physical objects that reside on a O.S. and are in row state. These objects are used by logical structures to make our data make sense and readable.
-here is where you have your data files, redo logs, control files etc, physical objects that reside on a O.S. and are in row state. These objects are used by logical structures to make our data make sense and readable.
·
Now the
questions comes into your mind! What was the point in
creating the logical data model? E.g.You want to create a database to store
data about CDs. The data model is
only an intermediate step along the way. Ultimately, you would like to end up
with a MySQL or mSQL database where you can store data. How do you get there?
Physical database design translates your logical data model into a set of SQL
statements that define your MySQL or mSQL database.
Since
MySQL and mSQL are relational database systems, it is relatively easy to
translate from a logical data model, such as the one we described earlier, into
a physical MySQL or mSQL database. Here are the rules for translation:
1.
Entities
become tables in the physical
database.
2.
Attributes
become columns in
the physical database. You have to choose an appropriate datatype for each of
the columns.
3.
Unique
identifiers become columns that
are not allowed to have NULLs. These are
called primary keys in
the physical database. You may also choose to create a unique index on the
identifiers to enforce uniqueness. For your purposes, mSQL does not have a
concept of a primary key. It simply has unique indices. This issue does not
apply to MySQL.
4.
Relationships are
modeled as foreign keys.
We will cover this later.
If we apply these rules to our data model -- minus the Record
Label address
information -- we will end up with the physical database described in Table 2-1.
Table 2-1. Physical Table Definitions for the CD Database
Table
|
Column
|
Datatype
|
Notes
|
CD | CDId | INT | primary key |
CDTitle | TEXT(50) | ||
Artist | ArtistId | INT | primary key |
ArtistName | TEXT(50) | ||
Song | SongId | INT | primary key |
SongName | TEXT(50) | ||
RecordLabel | RecordLabelId | INT | primary key |
RecordLabelName | TEXT(50) | primary key |
The first thing you may notice is that all of the
spaces are gone from the entity names in our physical schema. This is because
these names need to translate into SQL calls to create these tables. Table
names should thus conform to SQL naming rules. Another thing to notice is that
we made all primary keys of type INT.
Because these attributes are complete
inventions on our part, they can be of any indexible datatype. The fact that they
are of type INT here is almost purely
arbitrary. It is almost arbitrary because it is actually
faster to search on numeric fields in many database engines and hence numeric
fields make good primary keys. However, we could have chosen CHAR as the type for the
primary key fields and everything would work just fine. The bottom line is that
this choice should be driven by your criteria for choosing identifiers.
The rest of the
columns are set to be of type TEXT with a length of 50.
This definition works for both MySQL and mSQL. For MySQL, however, VARCHAR would be a better
choice but not important to this example. Picking the right datatype for
columns is very important, but we will not dwell on it here since we have not
yet covered the datatypes for MySQL and mSQL.
The most common
relationship is the 1-to-M relationship. This relationship is mapped by placing
the primary key on the "one" side of the relationship into the table
on the "many" side. In our example, this rule means that we need to
do the following:
·
Place a RecordLabelId column in the CD table.
·
Place a CDId column in the Song table.
·
Place an ArtistId column in the Song table.
Table 2-2 shows the new schema.
Table 2-2. The Physical Data Model for the CD Database
Table
|
Column
|
Datatype
|
Notes
|
CD | CdId | INT | primary key |
CDTitle | TEXT(50) | ||
RecordLabelId | INT | foreign key | |
Artist | ArtistId | INT | primary key |
ArtistName | TEXT(50) | ||
Song | SongId | INT | primary key |
SongName | TEXT(50) | ||
CdId | INT | foreign key | |
ArtistId | INT | foreign key | |
RecordLabel | RecordLabelId | INT | primary key |
RecordLabelName | TEXT(50) |
We now have a complete physical database schema ready
to go. The last remaining task is to translate that schema into SQL. For each
table in the schema, you write one CREATE TABLE statement.
Typically, you will choose to create unique indices on the primary keys to
enforce uniqueness.
We are, in a sense,
jumping ahead at this point. You may not be familiar with SQL yet, and it is
not the purpose of this chapter to introduce the MySQL and mSQL variants of
SQL. Nevertheless, here are two sample scripts to create the CD database. The
first script, Example
2-1 is
for MySQL. Example
2-2 is
for mSQL.
CREATE TABLE CD (CD_ID INT NOT NULL,
RECORD_LABEL_I INT,
CD_TITLE TEXT,
PRIMARY KEY (CD_ID))
CREATE TABLE Artist (ARTIST_ID INT NOT
NULL,
ARTIST_NAME TEXT,
PRIMARY KEY (ARTIST_ID))
CREATE TABLE Song (SONG_ID INT NOT NULL,
CD_ID INT,
SONG_NAME TEXT,
PRIMARY KEY (SONG_ID))
CREATE TABLE RecorLabel (RECORD_LABEL_ID
INT NOT NULL,
RECORD_LABEL_NAME TEXT,
PRIMARY KEY(RECORD_LABEL_ID))
CREATE TABLE CD (CD_ID INT NOT NULL,
RECORD_LABEL_ID INT,
CD_TITLE TEXT(50))
CREATE UNIQUE INDEX CD_IDX ON CD (CD_ID)
CREATE TABLE Artist (ARTIST_ID INT NO NULL,
ARTIST_NAME TEXT(50))
CREATE UNIQUE INDEX Artist_IDX ON Artist (ARTIST_ID)
CREATE TABLE Song (SONG_ID INT NOT NULL,
CD_ID INT,
SONG_NAME TEXT(50))
CREATE UNIQUE INDEX Song_IDX ON Song (SONG_ID)
CREATE TABLE RecordLabel (RECORD_LABEL_ID INT NOT NULL,
RECORD_LABEL_NAME
TEXT(50))
CREATE UNIQUE INDEX RecordLabel_IDX
ON
RecordLabel(RECORD_LABEL_ID)
Data models are
meant to be database independent.
No comments:
Post a Comment