DevOps Fundamentals

Introduction

DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality.

Core Values:

Culture of collaboration
Automation of processes
Continuous improvement
Customer-centric action
End-to-end responsibility
Fast feedback loops

Core Principles

The Three Ways

First Way: Flow

Left-to-right flow of work
Small batch sizes
Reduced work in progress
Eliminated constraints
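
One way to see why reduced work in progress improves flow is Little's Law, which relates average lead time to average work in progress and throughput. The sketch below only illustrates that relationship; the function name and the sample numbers are assumptions, not measurements.

```python
def average_lead_time(wip_items: float, throughput_per_day: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team that finishes 4 items per day.
before = average_lead_time(wip_items=20, throughput_per_day=4)
after = average_lead_time(wip_items=8, throughput_per_day=4)
print(f"lead time with 20 items in progress: {before} days")
print(f"lead time with 8 items in progress: {after} days")
```

With throughput held constant, cutting work in progress from 20 items to 8 shortens the average lead time from 5 days to 2.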

Second Way: Feedback

Short feedback loops
Problem detection
Quality at the source
Understanding and responding to customers, internal and external

Third Way: Learning

Continuous experimentation
Risk taking and learning
Practice and repetition
Organizational improvement

Key Practices

Continuous Integration

# Example GitHub Actions workflow
name: CI Pipeline

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    
    steps:
    # Check out the repository so later steps can see the code
    - uses: actions/checkout@v4

    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'

    # npm ci installs exactly what package-lock.json specifies
    - name: Install dependencies
      run: npm ci

    - name: Run tests
      run: npm test

    - name: Build
      run: npm run build

    - name: Run linter
      run: npm run lint

Continuous Deployment

# Example deployment workflow
name: CD Pipeline

on:
  workflow_run:
    workflows: ["CI Pipeline"]
    branches: [main]
    types: [completed]

jobs:
  deploy:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    
    steps:
    - name: Deploy to production
      # Placeholder: GitHub expects owner/repo@ref here; substitute a real deploy action
      uses: your-org/some-deploy-action@v1
      with:
        api_token: ${{ secrets.DEPLOY_TOKEN }}
        environment: production

Essential Tools

Version Control:

Git
GitHub/GitLab/Bitbucket

CI/CD:

Jenkins
GitHub Actions
CircleCI
GitLab CI

Configuration Management:

Ansible
Puppet
Chef

Containerization:

Docker
Kubernetes

Monitoring:

Prometheus
Grafana
ELK Stack

Automation

Infrastructure as Code

# Terraform example
provider "aws" {
  region = "us-west-2"
}

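resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0c55b159cbfafe1f0" # AMI IDs are region-specific
  instance_type = "t2.micro"

  # Attach the security group defined below so HTTP traffic can reach the instances
  vpc_security_group_ids = [aws_security_group.allow_http.id]

  tags = {
    Name        = "web-server-${count.index}"
    Environment = "production"
  }
}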

resource "aws_security_group" "allow_http" {
  name        = "allow_http"
  description = "Allow HTTP inbound traffic"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Configuration Management

# Ansible playbook example
---
- name: Configure web servers
  hosts: webservers
  become: true

  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true  # refresh the package index before installing

    - name: Start nginx service
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

    - name: Copy website files
      ansible.builtin.copy:
        src: files/website/
        dest: /var/www/html/
        mode: '0644'

Metrics & KPIs

Key Performance Indicators

Deployment Metrics:

Deployment Frequency
Lead Time for Changes
Change Failure Rate
Mean Time to Recovery
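
As a rough illustration of how these four metrics can be derived from deployment records, here is a minimal sketch. The `Deployment` record, its field names, and the sample data are hypothetical; a real pipeline would pull this information from its CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def deployment_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarize the four deployment metrics over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Hypothetical week with four deployments, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=8)),
    Deployment(
        t0 + timedelta(days=2),
        t0 + timedelta(days=2, hours=6),
        failed=True,
        restored_at=t0 + timedelta(days=2, hours=7),
    ),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=6)),
]
metrics = deployment_metrics(sample, period_days=7)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

For this sample the mean lead time is 6 hours, the change failure rate is 25%, and the mean time to recovery is 1 hour.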

Operational Metrics:

System Availability
Error Rate
Response Time
Resource Utilization
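
The operational metrics above reduce to simple ratios and percentiles. The following sketch shows one possible way to compute them; the function names and sample values are assumptions, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math


def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Fraction of the period during which the system was up."""
    return uptime_seconds / total_seconds


def error_rate(failed_requests: int, total_requests: int) -> float:
    """Fraction of requests that failed (0.0 when there was no traffic)."""
    return failed_requests / total_requests if total_requests else 0.0


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for a p95 response time."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def utilization(used: float, capacity: float) -> float:
    """Fraction of available capacity currently in use."""
    return used / capacity


# Hypothetical 30-day month with 2592 seconds of downtime.
month_seconds = 30 * 24 * 3600
print(f"availability: {availability(month_seconds - 2592, month_seconds):.3%}")
print(f"error rate: {error_rate(25, 1000):.1%}")

# Hypothetical response times in milliseconds.
response_ms = [120, 95, 310, 150, 101, 98, 450, 130, 110, 105]
print(f"p50 response time: {percentile(response_ms, 50)} ms")
print(f"p95 response time: {percentile(response_ms, 95)} ms")
print(f"CPU utilization: {utilization(6, 8):.0%}")
```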

Quality Metrics:

Test Coverage
Bug Resolution Time
Technical Debt
Code Quality Score

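# Prometheus monitoring example
# Assumes the pipeline exports deployment_timestamp, code_commit_timestamp, and error_rate
groups:
- name: deployment_metrics
  rules:
  # Number of deployment_timestamp series per environment
  - record: deployment_frequency
    expr: count(deployment_timestamp) by (environment)

  # Time between commit and deployment for series with matching labels
  - record: lead_time
    expr: deployment_timestamp - code_commit_timestamp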

  - alert: HighErrorRate
    expr: error_rate > 0.05
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: High error rate detected
      description: Error rate is above 5% for 5 minutes
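
The `for: 5m` clause in the HighErrorRate alert means the condition must hold continuously across evaluations before the alert fires; until then it is only pending. The sketch below is a simplified model of that behavior, not Prometheus code; the `ThresholdAlert` class and the sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class ThresholdAlert:
    """Simplified model of a threshold alert with a hold duration."""

    def __init__(self, threshold: float, hold: timedelta):
        self.threshold = threshold
        self.hold = hold
        self._breach_started: Optional[datetime] = None

    def evaluate(self, value: float, now: datetime) -> str:
        # Condition no longer holds: reset and go back to inactive.
        if value <= self.threshold:
            self._breach_started = None
            return "inactive"
        # First evaluation above the threshold: remember when it started.
        if self._breach_started is None:
            self._breach_started = now
        # Fire only once the breach has lasted for the full hold duration.
        if now - self._breach_started >= self.hold:
            return "firing"
        return "pending"


# Hypothetical error rate of 8%, sampled once a minute for six minutes.
alert = ThresholdAlert(threshold=0.05, hold=timedelta(minutes=5))
t0 = datetime(2024, 1, 1, 12, 0)
states = [alert.evaluate(0.08, t0 + timedelta(minutes=m)) for m in range(6)]
print(states)

# The error rate drops back to 1%, so the alert resets.
recovered = alert.evaluate(0.01, t0 + timedelta(minutes=6))
print(recovered)
```

In this model a brief spike that clears before five minutes never reaches the firing state, which is the point of adding a hold duration to a threshold alert.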