HTML Dropdown

Thursday, 8 October 2015

DataBase Design :Part1



Suppose you have a large collection of compact discs and you want to create a database to track them. The first step is to determine what the data that you are going to store is about. One good way to start is to think about why you want to store the data in the first place. In our case, we most likely want to be able to look up CDs by artist, title, and song. Since we want to look up those items, we know they must be included in the database. In addition, it is often useful to simply list items that should be tracked. One possible list might include: CD title, record label, band name, song title. As a starting point, we will store the data in the table shown in  Table 2-1.

Table 2-1. A CD Database Made Up of a Single Table

Band NameCD TitleRecord LabelSongs
Stevie WonderTalking BookMotownYou Are the Sunshine of My Life, Maybe Your Baby, Superstition, . . .
Miles Davis QuintetMiles SmilesColumbiaOrbits, Circle, . . .
Wayne ShorterSpeak No EvilBlue NoteWitch Hunt, Fee-Fi-Fo-Fum
Herbie HancockHeadhuntersColumbiaMan, . . .
Herbie HancockMaiden VoyageBlue NoteMaiden Voyage


There are basically three components to design a database: Conceptual, logical and physical. Logical database design involves modeling your business requirements and data using database components, such as tables and constraints, without regard for how or where the data will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the hardware and software features available, which allows the data to be physically accessed and maintained as quickly as possible, and indexing.
It is important to correctly design the database to model your business requirements, and to take advantage of hardware and software features early in the development cycle of a database application, because it is difficult to make changes to these components later.



Logical database Design:
A logical schema won't exist in your database. A logical schema is a design-centric database structure built to meet your business requirements. It is a model that exists on a white board or in a diagraming tool. It is like the architect's drawings of your database.
Basic database logical model objects (DBMS specific) are: Schema, Tables, Columns, Data Types, Constraints: primary key, foreign key, uniqueness, referential integrity, default, range
Logical Schema 
- is made out of actually what you plan (your initial data model) and it's objects are:
- tablespaces
- tables
- views
- procedures
- packages etc...
Logical design involves defining business entities, attributes for each entity, and relationships among entities. The latter are sometimes referred to as cardinality and optionality rules. For example, a grocery store application might include entities for store, employee, customer, item and sales transaction. Each store can have multiple employees, but each employee might only be allowed to be a full-time employee in a single store. Similarly each sales transaction must have a single customer, and can have one or more items sold.
Logical data models (LDMs) are often produced in the form of an Entity Relationship Diagram (ERD), which uses notation for expressing entities as boxes and relationships as lines. Attributes either appear in the boxes or are listed for each entity. Relationships are lines between the boxes, with the cardinality and optionality expressed as a circle for zero (or optional), a vertical line for "one" and a crow's foot for "many".
Store -||---employs---0|< Employee 
(a Store employs zero one or more Employees and an employee is employed by exactly one Store).
Customer -||-- participates in -|< Sales Transaction 
(a Customer participates in one or more Sales Transactions and a Sales Transaction is participated in by exactly one Customer)
For Logical data modeling:
2.       Identify and model the relationships between the entities.
3.       Identify and model the attributes.
4.       Identify unique identifiers for each entity.
5.       Normalize.


Logical modeling is not specific to the database engine or other technical platform. It is organizationally independent as well. Logical models are generally highly normalized. Logical models are often used to derive or generate physical models via a physical design process. 
Note that some methodologies (such as ORM) refer to a technology-independent data models as Conceptual Data Models (CDM) and a Logical Data Model is constrained by technology constraints. It is important if reading or discussing CDMs and LDMs to understand the context of how these terms are used.

Physical Database Design:-
The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning at the early stages of your project. The storage engine of a relational database requires much of this planning, which includes determining:
  • What type of disk hardware to use, such as RAID (redundant array of independent disks) devices.
  • How to place your data onto the disks. For more information, see Data Placement Using File groups.
  • Which index design to use to improve query performance in accessing data.
  • How to set all configuration parameters appropriately for the database to perform well.
Physical design involves generating the schema for tables, indexes, default and check constraints, and views for a specific physical structure such as a database, file, or XML document. A physical model is generally specific to the database engine and version and is often optimized for a specific application usage of the data. Physical models include data types for each attribute and can be normalized or de-normalized. Physical models can change after the application is deployed in the production environment. For example, indexes might be altered to tune the performance. New tables, constraints, defaults and rules might be added to enhance the application's feature set and enforce new set of business rules.
Designing additional physical level structures are Files: storage files, partitions, etc. Performance: indexes, views, etc.  Functional: scripts (stored procedures), functions, triggers, transactions, etc.  Security: users, roles, permissions, etc.
Physical Schema
-here is where you have your data files, redo logs, control files etc, physical objects that reside on a O.S. and are in row state. These objects are used by logical structures to make our data make sense and readable.

·        
Now the questions comes into your mind! What was the point in creating the logical data model? E.g.You want to create a database to store data about CDs. The data model is only an intermediate step along the way. Ultimately, you would like to end up with a MySQL or mSQL database where you can store data. How do you get there? Physical database design translates your logical data model into a set of SQL statements that define your MySQL or mSQL database.
Since MySQL and mSQL are relational database systems, it is relatively easy to translate from a logical data model, such as the one we described earlier, into a physical MySQL or mSQL database. Here are the rules for translation:
1.       Entities become tables in the physical database.
2.       Attributes become columns in the physical database. You have to choose an appropriate datatype for each of the columns.
3.       Unique identifiers become columns that are not allowed to have NULLs. These are called primary keys in the physical database. You may also choose to create a unique index on the identifiers to enforce uniqueness. For your purposes, mSQL does not have a concept of a primary key. It simply has unique indices. This issue does not apply to MySQL.
4.       Relationships are modeled as foreign keys. We will cover this later.
If we apply these rules to our data model -- minus the Record Label address information -- we will end up with the physical database described in Table 2-1.

Table 2-1. Physical Table Definitions for the CD Database

Table
Column
Datatype
Notes
CD CDId INT primary key

CDTitle TEXT(50)
Artist ArtistId INT primary key

ArtistName TEXT(50)
Song SongId INT primary key

SongName TEXT(50)
RecordLabel RecordLabelId INT primary key

RecordLabelName TEXT(50) primary key


The first thing you may notice is that all of the spaces are gone from the entity names in our physical schema. This is because these names need to translate into SQL calls to create these tables. Table names should thus conform to SQL naming rules. Another thing to notice is that we made all primary keys of type INT. Because these attributes are complete inventions on our part, they can be of any indexible datatype. The fact that they are of type INT here is almost purely arbitrary. It is almost arbitrary because it is actually faster to search on numeric fields in many database engines and hence numeric fields make good primary keys. However, we could have chosen CHAR as the type for the primary key fields and everything would work just fine. The bottom line is that this choice should be driven by your criteria for choosing identifiers.
The rest of the columns are set to be of type TEXT with a length of 50. This definition works for both MySQL and mSQL. For MySQL, however, VARCHAR would be a better choice but not important to this example. Picking the right datatype for columns is very important, but we will not dwell on it here since we have not yet covered the datatypes for MySQL and mSQL.
The most common relationship is the 1-to-M relationship. This relationship is mapped by placing the primary key on the "one" side of the relationship into the table on the "many" side. In our example, this rule means that we need to do the following:
·         Place a RecordLabelId column in the CD table.
·         Place a CDId column in the Song table.
·         Place an ArtistId column in the Song table.
Table 2-2 shows the new schema.

Table 2-2. The Physical Data Model for the CD Database

Table
Column
Datatype
Notes
CD CdId INT primary key

CDTitle TEXT(50)

RecordLabelId INT foreign key
Artist ArtistId INT primary key

ArtistName TEXT(50)
Song SongId INT primary key

SongName TEXT(50)

CdId INT foreign key

ArtistId INT foreign key
RecordLabel RecordLabelId INT primary key

RecordLabelName TEXT(50)
We do not have any 1-to-1 relationships in this data model. If we did have such a relationship, it should be mapped by picking one of the tables and giving it a foreign key column that matches the primary key from the other table. In theory, it does not matter which table you choose, but practical considerations may dictate which column makes the most sense as a foreign key.
We now have a complete physical database schema ready to go. The last remaining task is to translate that schema into SQL. For each table in the schema, you write one CREATE TABLE statement. Typically, you will choose to create unique indices on the primary keys to enforce uniqueness.
Example 2-1. An Example Script for Creating the CD Database in MySQL
CREATE TABLE CD (CD_ID INT NOT NULL,
                         RECORD_LABEL_I INT,
       CD_TITLE TEXT,
       PRIMARY KEY (CD_ID))
CREATE TABLE Artist (ARTIST_ID INT NOT NULL,
       ARTIST_NAME  TEXT,
                                     PRIMARY KEY (ARTIST_ID))
CREATE TABLE Song (SONG_ID INT NOT NULL,
                   CD_ID INT,
                   SONG_NAME TEXT,
                   PRIMARY KEY (SONG_ID))
CREATE TABLE RecorLabel (RECORD_LABEL_ID INT NOT NULL,
                                                  RECORD_LABEL_NAME TEXT,
                                                  PRIMARY KEY(RECORD_LABEL_ID))
CREATE TABLE CD (CD_ID            INT NOT NULL,
                 RECORD_LABEL_ID  INT,
                 CD_TITLE         TEXT(50))
CREATE UNIQUE INDEX CD_IDX ON CD (CD_ID)
CREATE TABLE Artist (ARTIST_ID     INT NO NULL,
                     ARTIST_NAME   TEXT(50))
CREATE UNIQUE INDEX Artist_IDX ON Artist (ARTIST_ID)
CREATE TABLE Song (SONG_ID         INT NOT NULL,
                   CD_ID           INT,
                   SONG_NAME       TEXT(50))
CREATE UNIQUE INDEX Song_IDX ON Song (SONG_ID)
CREATE TABLE RecordLabel (RECORD_LABEL_ID   INT NOT NULL,
                          RECORD_LABEL_NAME TEXT(50))
CREATE UNIQUE INDEX RecordLabel_IDX
                   ON RecordLabel(RECORD_LABEL_ID)
Data models are meant to be database independent.

 




No comments:

Post a Comment