Data Modeling

Introduction

Data modeling is the process of creating a visual representation of data and its relationships within an information system. It serves as a blueprint for database design and development.

Modeling Levels:

  • Conceptual - Business view
  • Logical - System-independent view
  • Physical - Implementation view
  • Schema mapping
  • Data flow modeling
  • Dimensional modeling

Conceptual Modeling

Entity-Relationship Model

Example of ER diagram notation:

[Entity: Customer]
Attributes:
- CustomerID (PK)
- Name
- Email
- Phone

[Entity: Order]
Attributes:
- OrderID (PK)
- CustomerID (FK)
- OrderDate
- Status

Relationship:
Customer (1) ----< Order (N)
/* One customer can have many orders */

Business Rules

Common Rules:

  • Cardinality constraints
  • Participation constraints
  • Business process rules
  • Data integrity rules

Logical Modeling

Relational Schema

Example of logical model:

Customer (
    customer_id: INTEGER {PK},
    name: VARCHAR(100),
    email: VARCHAR(100) {UNIQUE},
    status: CHAR(1) {CHECK IN ('A','I')}
)

Order (
    order_id: INTEGER {PK},
    customer_id: INTEGER {FK},
    order_date: DATE {NOT NULL},
    total_amount: DECIMAL(10,2)
)

OrderItem (
    order_id: INTEGER {PK, FK},
    item_no: INTEGER {PK},
    product_id: INTEGER {FK},
    quantity: INTEGER {> 0},
    unit_price: DECIMAL(10,2)
)

Constraints Definition

Types of Constraints:

  • Domain constraints
  • Key constraints
  • Entity integrity
  • Referential integrity

Physical Modeling

Storage Structures

Example of physical implementation:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    status CHAR(1) DEFAULT 'A',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT chk_status CHECK (status IN ('A','I'))
) TABLESPACE customer_space;

CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_customer_status ON customers(status);

PARTITION TABLE orders BY RANGE (order_date) (
    PARTITION orders_2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION orders_2024 VALUES LESS THAN ('2025-01-01'),
    PARTITION orders_future VALUES LESS THAN (MAXVALUE)
);

Advanced Techniques

Dimensional Modeling

Example of star schema:

/* Fact Table */
CREATE TABLE sales_fact (
    date_key INT,
    product_key INT,
    store_key INT,
    customer_key INT,
    quantity_sold INT,
    amount DECIMAL(10,2),
    FOREIGN KEY (date_key) REFERENCES dim_date(date_key),
    FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    FOREIGN KEY (store_key) REFERENCES dim_store(store_key),
    FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key)
);

/* Dimension Tables */
CREATE TABLE dim_date (
    date_key INT PRIMARY KEY,
    full_date DATE,
    year INT,
    quarter INT,
    month INT,
    week INT,
    day_of_week INT
);

NoSQL Modeling

Example of document model:

{
    "order_id": "ORD123",
    "customer": {
        "id": "CUST456",
        "name": "John Doe",
        "email": "john@example.com"
    },
    "items": [
        {
            "product_id": "PROD789",
            "name": "Widget",
            "quantity": 2,
            "price": 29.99
        }
    ],
    "shipping_address": {
        "street": "123 Main St",
        "city": "Springfield",
        "state": "IL"
    }
}

Best Practices

Modeling Guidelines

Best Practices:

  • Start with business requirements
  • Use standard notation
  • Document assumptions
  • Consider future growth
  • Validate with stakeholders
  • Maintain model consistency

Common Pitfalls

Things to Avoid:

  • Over-normalization
  • Ignoring scalability
  • Missing indexes
  • Poor naming conventions
  • Inadequate documentation