Database Modeling

Introduction to Database Modeling

Database modeling is the process of designing a database structure that efficiently organizes and stores data while maintaining data integrity and supporting business requirements.

Key Concepts:

  • Entity-Relationship Diagrams (ERD)
  • Data normalization
  • Referential integrity
  • Data types and constraints
  • Indexes and optimization
  • ACID properties
  • Schema design
  • Query optimization

Design Process

-- Requirements Analysis
/*
1. Identify entities and relationships
2. Define attributes and data types
3. Establish primary and foreign keys
4. Apply normalization rules
5. Consider performance requirements
6. Implement security measures
*/

-- Example Entity
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended'))
);

Conceptual Modeling

Entity-Relationship Diagram

[User] 1 --- * [Order]
    |           |
    |           |
    * --- 1     *
[Address]    [Product]

Entities:
- User (id, username, email)
- Order (id, user_id, total, status)
- Product (id, name, price, stock)
- Address (id, user_id, street, city)

Relationships:
- User has many Orders (1:N)
- User has many Addresses (1:N)
- Order has many Products (M:N)
- Product belongs to many Orders (M:N)

Cardinality Rules

Relationship Types:

  • One-to-One (1:1)
  • One-to-Many (1:N)
  • Many-to-Many (M:N)
  • Zero-or-One to One
  • Zero-or-One to Many
  • Zero-to-Many

Logical Design

Schema Definition

-- User Management
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash CHAR(60) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255) NOT NULL,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(50) NOT NULL,
    postal_code VARCHAR(20) NOT NULL,
    is_default BOOLEAN DEFAULT false
);

-- Order Management
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10,2) NOT NULL,
    stock INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price_at_time DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Constraints

-- Primary Keys
ALTER TABLE users ADD PRIMARY KEY (id);

-- Foreign Keys
ALTER TABLE orders 
    ADD CONSTRAINT fk_user 
    FOREIGN KEY (user_id) 
    REFERENCES users(id) 
    ON DELETE CASCADE;

-- Unique Constraints
ALTER TABLE users 
    ADD CONSTRAINT unique_email 
    UNIQUE (email);

-- Check Constraints
ALTER TABLE products 
    ADD CONSTRAINT positive_price 
    CHECK (price > 0);

-- Not Null Constraints
ALTER TABLE users 
    ALTER COLUMN username 
    SET NOT NULL;

-- Default Values
ALTER TABLE orders 
    ALTER COLUMN created_at 
    SET DEFAULT CURRENT_TIMESTAMP;

Physical Implementation

Indexing Strategy

-- B-tree Index (Default)
CREATE INDEX idx_users_email 
    ON users(email);

-- Unique Index
CREATE UNIQUE INDEX idx_users_username 
    ON users(username);

-- Composite Index
CREATE INDEX idx_orders_user_date 
    ON orders(user_id, created_at DESC);

-- Partial Index
CREATE INDEX idx_active_users 
    ON users(email) 
    WHERE status = 'active';

-- Expression Index
CREATE INDEX idx_lower_email 
    ON users(LOWER(email));

-- GiST Index (Geometric data)
CREATE INDEX idx_location 
    ON addresses USING GIST (coordinates);

-- Hash Index (Equality operations)
CREATE INDEX idx_user_lookup 
    ON users USING HASH (id);

Partitioning

-- Range Partitioning
CREATE TABLE orders (
    id SERIAL,
    created_at TIMESTAMP,
    total DECIMAL(10,2)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- List Partitioning
CREATE TABLE users (
    id SERIAL,
    country_code CHAR(2),
    username VARCHAR(50)
) PARTITION BY LIST (country_code);

CREATE TABLE users_us PARTITION OF users
    FOR VALUES IN ('US');

CREATE TABLE users_eu PARTITION OF users
    FOR VALUES IN ('DE', 'FR', 'ES', 'IT');

Normalization

Normal Forms:

  • 1NF: Atomic values, no repeating groups
  • 2NF: 1NF + no partial dependencies
  • 3NF: 2NF + no transitive dependencies
  • BCNF: 3NF + all determinants are candidate keys
  • 4NF: BCNF + no multi-valued dependencies
  • 5NF: 4NF + no join dependencies

Denormalization Cases:

  • Frequently accessed read-only data
  • Reporting and analytics
  • Aggregate calculations
  • Historical data
  • Performance optimization

Normalization Example

-- Before Normalization
CREATE TABLE orders_unnormalized (
    order_id INTEGER,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    customer_address TEXT,
    product_name VARCHAR(255),
    product_price DECIMAL(10,2),
    quantity INTEGER
);

-- After Normalization
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    address_line TEXT,
    city VARCHAR(100),
    postal_code VARCHAR(20)
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    address_id INTEGER REFERENCES addresses(id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    price_at_time DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id)
);

Best Practices

Design Guidelines:

  • Use appropriate data types
  • Implement proper constraints
  • Design for scalability
  • Consider data access patterns
  • Plan for future growth
  • Document schema changes
  • Backup and recovery strategy
  • Monitor performance

Security Measures:

  • Implement access control
  • Encrypt sensitive data
  • Use prepared statements
  • Regular security audits
  • Backup encryption
  • Monitor access logs
  • Role-based permissions
  • Data masking

Performance Optimization:

  • Proper indexing strategy
  • Query optimization
  • Connection pooling
  • Regular maintenance
  • Caching strategy
  • Partitioning strategy
  • Load balancing
  • Monitoring and tuning