Data Modeling
Introduction
Data modeling is the process of creating a visual representation of data and its relationships within an information system. It serves as a blueprint for database design and development.
Modeling Levels:
- Conceptual - Business view
- Logical - System-independent view
- Physical - Implementation view
- Schema mapping
- Data flow modeling
- Dimensional modeling
Conceptual Modeling
Entity-Relationship Model
Example of ER diagram notation:
[Entity: Customer]
Attributes:
- CustomerID (PK)
- Name
- Email
- Phone
[Entity: Order]
Attributes:
- OrderID (PK)
- CustomerID (FK)
- OrderDate
- Status
Relationship:
Customer (1) ----< Order (N)
/* One customer can have many orders */
Business Rules
Common Rules:
- Cardinality constraints
- Participation constraints
- Business process rules
- Data integrity rules
Logical Modeling
Relational Schema
Example of logical model:
Customer (
customer_id: INTEGER {PK},
name: VARCHAR(100),
email: VARCHAR(100) {UNIQUE},
status: CHAR(1) {CHECK IN ('A','I')}
)
Order (
order_id: INTEGER {PK},
customer_id: INTEGER {FK},
order_date: DATE {NOT NULL},
total_amount: DECIMAL(10,2)
)
OrderItem (
order_id: INTEGER {PK, FK},
item_no: INTEGER {PK},
product_id: INTEGER {FK},
quantity: INTEGER {> 0},
unit_price: DECIMAL(10,2)
)
Constraints Definition
Types of Constraints:
- Domain constraints
- Key constraints
- Entity integrity
- Referential integrity
Physical Modeling
Storage Structures
Example of physical implementation:
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE,
status CHAR(1) DEFAULT 'A',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT chk_status CHECK (status IN ('A','I'))
) TABLESPACE customer_space;
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_customer_status ON customers(status);
PARTITION TABLE orders BY RANGE (order_date) (
PARTITION orders_2023 VALUES LESS THAN ('2024-01-01'),
PARTITION orders_2024 VALUES LESS THAN ('2025-01-01'),
PARTITION orders_future VALUES LESS THAN (MAXVALUE)
);
Advanced Techniques
Dimensional Modeling
Example of star schema:
/* Fact Table */
CREATE TABLE sales_fact (
date_key INT,
product_key INT,
store_key INT,
customer_key INT,
quantity_sold INT,
amount DECIMAL(10,2),
FOREIGN KEY (date_key) REFERENCES dim_date(date_key),
FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
FOREIGN KEY (store_key) REFERENCES dim_store(store_key),
FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key)
);
/* Dimension Tables */
CREATE TABLE dim_date (
date_key INT PRIMARY KEY,
full_date DATE,
year INT,
quarter INT,
month INT,
week INT,
day_of_week INT
);
NoSQL Modeling
Example of document model:
{
"order_id": "ORD123",
"customer": {
"id": "CUST456",
"name": "John Doe",
"email": "john@example.com"
},
"items": [
{
"product_id": "PROD789",
"name": "Widget",
"quantity": 2,
"price": 29.99
}
],
"shipping_address": {
"street": "123 Main St",
"city": "Springfield",
"state": "IL"
}
}
Best Practices
Modeling Guidelines
Best Practices:
- Start with business requirements
- Use standard notation
- Document assumptions
- Consider future growth
- Validate with stakeholders
- Maintain model consistency
Common Pitfalls
Things to Avoid:
- Over-normalization
- Ignoring scalability
- Missing indexes
- Poor naming conventions
- Inadequate documentation