Database Modeling
Introduction to Database Modeling
Database modeling is the process of designing a database structure that efficiently organizes and stores data while maintaining data integrity and supporting business requirements.
Key Concepts:
- Entity-Relationship Diagrams (ERD)
- Data normalization
- Referential integrity
- Data types and constraints
- Indexes and optimization
- ACID properties
- Schema design
- Query optimization
Design Process
-- Requirements Analysis
/*
1. Identify entities and relationships
2. Define attributes and data types
3. Establish primary and foreign keys
4. Apply normalization rules
5. Consider performance requirements
6. Implement security measures
*/
-- Example Entity
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended'))
);
Conceptual Modeling
Entity-Relationship Diagram
[User] 1 --- * [Order]
| |
| |
* --- 1 *
[Address] [Product]
Entities:
- User (id, username, email)
- Order (id, user_id, total, status)
- Product (id, name, price, stock)
- Address (id, user_id, street, city)
Relationships:
- User has many Orders (1:N)
- User has many Addresses (1:N)
- Order has many Products (M:N)
- Product belongs to many Orders (M:N)
Cardinality Rules
Relationship Types:
- One-to-One (1:1)
- One-to-Many (1:N)
- Many-to-Many (M:N)
- Zero-or-One to One
- Zero-or-One to Many
- Zero-to-Many
Logical Design
Schema Definition
-- User Management
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash CHAR(60) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
street VARCHAR(255) NOT NULL,
city VARCHAR(100) NOT NULL,
state VARCHAR(50) NOT NULL,
postal_code VARCHAR(20) NOT NULL,
is_default BOOLEAN DEFAULT false
);
-- Order Management
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
total DECIMAL(10,2) NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
stock INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
price_at_time DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);
Constraints
-- Primary Keys
ALTER TABLE users ADD PRIMARY KEY (id);
-- Foreign Keys
ALTER TABLE orders
ADD CONSTRAINT fk_user
FOREIGN KEY (user_id)
REFERENCES users(id)
ON DELETE CASCADE;
-- Unique Constraints
ALTER TABLE users
ADD CONSTRAINT unique_email
UNIQUE (email);
-- Check Constraints
ALTER TABLE products
ADD CONSTRAINT positive_price
CHECK (price > 0);
-- Not Null Constraints
ALTER TABLE users
ALTER COLUMN username
SET NOT NULL;
-- Default Values
ALTER TABLE orders
ALTER COLUMN created_at
SET DEFAULT CURRENT_TIMESTAMP;
Physical Implementation
Indexing Strategy
-- B-tree Index (Default)
CREATE INDEX idx_users_email
ON users(email);
-- Unique Index
CREATE UNIQUE INDEX idx_users_username
ON users(username);
-- Composite Index
CREATE INDEX idx_orders_user_date
ON orders(user_id, created_at DESC);
-- Partial Index
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
-- Expression Index
CREATE INDEX idx_lower_email
ON users(LOWER(email));
-- GiST Index (Geometric data)
CREATE INDEX idx_location
ON addresses USING GIST (coordinates);
-- Hash Index (Equality operations)
CREATE INDEX idx_user_lookup
ON users USING HASH (id);
Partitioning
-- Range Partitioning
CREATE TABLE orders (
id SERIAL,
created_at TIMESTAMP,
total DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- List Partitioning
CREATE TABLE users (
id SERIAL,
country_code CHAR(2),
username VARCHAR(50)
) PARTITION BY LIST (country_code);
CREATE TABLE users_us PARTITION OF users
FOR VALUES IN ('US');
CREATE TABLE users_eu PARTITION OF users
FOR VALUES IN ('DE', 'FR', 'ES', 'IT');
Normalization
Normal Forms:
- 1NF: Atomic values, no repeating groups
- 2NF: 1NF + no partial dependencies
- 3NF: 2NF + no transitive dependencies
- BCNF: 3NF + all determinants are candidate keys
- 4NF: BCNF + no multi-valued dependencies
- 5NF: 4NF + no join dependencies
Denormalization Cases:
- Frequently accessed read-only data
- Reporting and analytics
- Aggregate calculations
- Historical data
- Performance optimization
Normalization Example
-- Before Normalization
CREATE TABLE orders_unnormalized (
order_id INTEGER,
customer_name VARCHAR(100),
customer_email VARCHAR(255),
customer_address TEXT,
product_name VARCHAR(255),
product_price DECIMAL(10,2),
quantity INTEGER
);
-- After Normalization
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
address_line TEXT,
city VARCHAR(100),
postal_code VARCHAR(20)
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
price DECIMAL(10,2)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
address_id INTEGER REFERENCES addresses(id),
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
price_at_time DECIMAL(10,2),
PRIMARY KEY (order_id, product_id)
);
Best Practices
Design Guidelines:
- Use appropriate data types
- Implement proper constraints
- Design for scalability
- Consider data access patterns
- Plan for future growth
- Document schema changes
- Backup and recovery strategy
- Monitor performance
Security Measures:
- Implement access control
- Encrypt sensitive data
- Use prepared statements
- Regular security audits
- Backup encryption
- Monitor access logs
- Role-based permissions
- Data masking
Performance Optimization:
- Proper indexing strategy
- Query optimization
- Connection pooling
- Regular maintenance
- Caching strategy
- Partitioning strategy
- Load balancing
- Monitoring and tuning