10 - DevOps and CI/CD with Claude Code

A comprehensive guide to DevOps, CI/CD, containerization, deployment, and infrastructure management with the Claude Code CLI.


Table of contents

  1. Docker & Docker Compose
  2. GitHub Actions
  3. GitLab CI / Jenkins
  4. Kubernetes basics
  5. Terraform / Infrastructure as Code
  6. Monitoring & Logging
  7. Secret Management
  8. Using Claude Skills for DevOps
  9. Practical DevOps prompt examples
  10. DevOps checklist

Docker & Docker Compose

Dockerfile for Node.js TypeScript (multi-stage)

# === Build stage ===
FROM node:20-alpine AS builder

WORKDIR /app

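# Install dependencies first (takes advantage of the Docker layer cache).
# The production-only install is set aside, then the full install (with dev
# dependencies) is done for the build step.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && \
    cp -R node_modules prod_node_modules && \
    npm ci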

# Copy source and build
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# === Production stage ===
FROM node:20-alpine AS production

WORKDIR /app

# Create a non-root user
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup

COPY --from=builder /app/prod_node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./

# Switch to the non-root user
USER appuser

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/index.js"]

Sample prompts:

Create a multi-stage Dockerfile for a Node.js TypeScript project
Optimize the current Dockerfile to reduce image size, add a health check, and run as a non-root user

Dockerfile for Go (multi-stage)

# === Build stage ===
FROM golang:1.22-alpine AS builder

WORKDIR /app

# Download dependencies
COPY go.mod go.sum ./
RUN go mod download

# Build binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server ./cmd/server

# === Production stage ===
FROM alpine:3.19 AS production

# Install ca-certificates for HTTPS (and tzdata for time zone data)
RUN apk --no-cache add ca-certificates tzdata

WORKDIR /app

# Create a non-root user
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup

COPY --from=builder /app/server .

USER appuser

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

ENTRYPOINT ["./server"]

Dockerfile for Next.js (multi-stage)

# === Dependencies stage ===
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# === Build stage ===
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .

# Environment variables for the build
ENV NEXT_TELEMETRY_DISABLED=1

RUN npm run build

# === Production stage ===
FROM node:20-alpine AS runner
WORKDIR /app

ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup

# Copy only what the standalone output needs
# (requires output: "standalone" in next.config.js)
COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static

USER appuser

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]

Sample prompt:

Create a multi-stage Dockerfile for Next.js with standalone output, optimized for production

Docker Compose for the development environment

# docker-compose.yml
version: "3.9"

services:
  # === Main application ===
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: builder  # Use the builder stage for dev
    ports:
      - "3000:3000"
    volumes:
      - ./src:/app/src       # Hot reload
      - ./package.json:/app/package.json
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - app-network
    restart: unless-stopped

  # === PostgreSQL ===
  db:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: myapp
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - app-network

  # === Redis ===
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - app-network

  # === Nginx reverse proxy ===
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - app
    networks:
      - app-network
    restart: unless-stopped

volumes:
  postgres-data:
  redis-data:

networks:
  app-network:
    driver: bridge
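
Inside the Compose network, the app reaches PostgreSQL through the service name (`db`), not `localhost`. As a small illustration, the sketch below parses the `DATABASE_URL` value from the compose file with Node's built-in `URL` class. The `describeTarget` helper is hypothetical and exists only for this example.

```typescript
// Connection string copied from the compose file's DATABASE_URL.
const databaseUrl = new URL("postgresql://postgres:postgres@db:5432/myapp");

// Hypothetical helper: pulls out the parts an application would connect to.
function describeTarget(url: URL): { host: string; port: number; database: string } {
  return {
    host: url.hostname,                        // the Compose service name
    port: Number(url.port),                    // the container port, not the host port
    database: url.pathname.replace(/^\//, ""), // strip the leading slash
  };
}

console.log(describeTarget(databaseUrl));
```

The host here is `db` because Compose resolves service names through its internal DNS. The `5432:5432` port mapping only matters for tools running on the host machine.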

Sample prompts:

Create a docker-compose.yml for a dev environment with: app, postgres, redis, nginx
Add an Adminer service to docker-compose to manage the database through a web UI
Create a docker-compose.override.yml for the staging environment with its own environment variables

Common Docker commands

# Build and start all services
docker compose up -d --build

# Follow the logs of a specific service
docker compose logs -f app

# Open a shell inside a container
docker compose exec app sh

# Stop and remove everything (-v also deletes named volumes)
docker compose down -v

# Rebuild a single service
docker compose up -d --build app

# Check image sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"

GitHub Actions

CI Workflow (test, lint, build)

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # === Lint and static checks ===
  lint:
    name: Lint & Format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"

      - run: npm ci

      - name: Run ESLint
        run: npm run lint

      - name: Check formatting
        run: npm run format:check

      - name: Check TypeScript types
        run: npm run type-check

  # === Run tests across a version matrix ===
  test:
    name: Test (Node ${{ matrix.node-version }})
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20, 22]
      fail-fast: false

    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: "npm"

      - run: npm ci

      - name: Run migrations
        run: npm run db:migrate
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test_db

      - name: Run unit tests
        run: npm run test:unit -- --coverage
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379

      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379

      - name: Upload coverage
        if: matrix.node-version == 20
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  # === Build ===
  build:
    name: Build
    runs-on: ubuntu-latest
    needs: [lint, test]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"

      - run: npm ci
      - run: npm run build

      - name: Upload build artifact
        uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 7

Sample prompt:

Create a GitHub Actions CI/CD workflow:
- Run tests on push/PR
- Build a Docker image
- Deploy to the server when merging into main

CD Workflow (deploy on merge into main)

# .github/workflows/cd.yml
name: CD Pipeline

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # === Build and push the Docker image ===
  docker:
    name: Build & Push Docker Image
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
      - uses: actions/checkout@v4

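      - name: Log in to the Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Generate Docker image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}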
          tags: |
            type=sha,prefix=
            type=raw,value=latest

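      - name: Setup QEMU (emulation for the linux/arm64 build)
        uses: docker/setup-qemu-action@v3

      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}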
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

  # === Deploy to the server ===
  deploy:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: docker
    environment:
      name: production
      url: https://myapp.example.com

    steps:
      - uses: actions/checkout@v4

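      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1
        with:
          # Secret names below are placeholders; use the names defined in your repository
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}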
          script: |
            cd /opt/myapp
            docker compose pull
            docker compose up -d --remove-orphans
            docker image prune -f

      - name: Verify health check
        run: |
          for i in $(seq 1 30); do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.example.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "Deploy succeeded!"
              exit 0
            fi
            echo "Waiting... (attempt $i/30)"
            sleep 10
          done
          echo "Health check failed!"
          exit 1

      - name: Notify via Slack
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          fields: repo,message,commit,author,action
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Caching dependencies effectively

# Cache examples for several package managers
steps:
  # Cache npm
  - uses: actions/setup-node@v4
    with:
      node-version: 20
      cache: "npm"

  # Cache Go modules
  - uses: actions/setup-go@v5
    with:
      go-version: "1.22"
      cache: true

  # Cache Docker layers
  - uses: docker/build-push-action@v5
    with:
      cache-from: type=gha
      cache-to: type=gha,mode=max

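  # Custom cache (the hashFiles pattern is an assumption; point it at your lockfile)
  - uses: actions/cache@v4
    with:
      path: |
        ~/.cache/pip
        ~/.local/share/virtualenvs
      key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
      restore-keys: |
        ${{ runner.os }}-pip-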
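
The custom cache entry pairs an exact `key` with a shorter `restore-keys` prefix. To illustrate why that pairing works, here is a minimal sketch that derives a key from the contents of a dependency file. The `cacheKey` and `restorePrefix` helpers are hypothetical, and the hashing shown is a simplification, not the exact algorithm GitHub Actions uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: builds a key of the form "<os>-<tool>-<hash of lockfile>".
function cacheKey(os: string, tool: string, lockfileContents: string): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${os}-${tool}-${digest}`;
}

// Hypothetical helper: the fallback prefix shared by every key for this OS and tool.
function restorePrefix(os: string, tool: string): string {
  return `${os}-${tool}-`;
}

const before = cacheKey("Linux", "pip", "requests==2.31.0\n");
const after = cacheKey("Linux", "pip", "requests==2.32.0\n");

// A dependency change produces a new exact key, so the stale cache is not reused as-is.
console.log(before === after); // false

// The prefix still matches, so an older cache can be restored as a starting point.
console.log(after.startsWith(restorePrefix("Linux", "pip"))); // true
```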

Sample prompts:

Add caching to the GitHub Actions workflow to speed up the CI pipeline
Create a GitHub Actions workflow that runs matrix tests on Node 18, 20, 22 with a PostgreSQL service

GitLab CI / Jenkins

GitLab CI Pipeline

# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  POSTGRES_DB: test_db
  POSTGRES_USER: test
  POSTGRES_PASSWORD: test

# === Shared cache ===
default:
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
      - .npm/

# === Test ===
test:
  stage: test
  image: node:20-alpine
  services:
    - postgres:16-alpine
    - redis:7-alpine
  script:
    - npm ci --cache .npm
    - npm run lint
    - npm run type-check
    - npm run test -- --coverage
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
    when: always

# === Build Docker image ===
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
    - docker tag $DOCKER_IMAGE $CI_REGISTRY_IMAGE:latest
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main

# === Deploy staging ===
deploy_staging:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
  script:
    - ssh -o StrictHostKeyChecking=no $DEPLOY_USER@$STAGING_HOST
        "cd /opt/app && docker compose pull && docker compose up -d"
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - develop

# === Deploy production ===
deploy_production:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
  script:
    - ssh -o StrictHostKeyChecking=no $DEPLOY_USER@$PROD_HOST
        "cd /opt/app && docker compose pull && docker compose up -d"
  environment:
    name: production
    url: https://app.example.com
  only:
    - main
  when: manual  # Requires manual confirmation

Sample prompt:

Create a GitLab CI pipeline with stages: test, build Docker, automatic staging deploy, and manual production deploy

Jenkins Pipeline (Jenkinsfile)

// Jenkinsfile
pipeline {
    agent any

    environment {
        DOCKER_IMAGE = "myapp:${env.BUILD_NUMBER}"
        REGISTRY = "registry.example.com"
    }

    options {
        timeout(time: 30, unit: 'MINUTES')
        disableConcurrentBuilds()
    }

    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
            }
        }

        stage('Checks') {
            parallel {
                stage('Lint') {
                    steps {
                        sh 'npm run lint'
                    }
                }
                stage('Test') {
                    steps {
                        sh 'npm run test -- --coverage'
                    }
                    post {
                        always {
                            junit 'test-results/**/*.xml'
                            publishHTML(target: [
                                reportDir: 'coverage/lcov-report',
                                reportFiles: 'index.html',
                                reportName: 'Coverage Report'
                            ])
                        }
                    }
                }
            }
        }

        stage('Build Docker') {
            when { branch 'main' }
            steps {
                sh "docker build -t ${REGISTRY}/${DOCKER_IMAGE} ."
                sh "docker push ${REGISTRY}/${DOCKER_IMAGE}"
            }
        }

        stage('Deploy') {
            when { branch 'main' }
            input {
                message "Deploy to production?"
                ok "Approve deploy"
            }
            steps {
                sh '''
                    ssh deploy@production "
                        cd /opt/app &&
                        docker compose pull &&
                        docker compose up -d
                    "
                '''
            }
        }
    }

    post {
        failure {
            slackSend(
                color: 'danger',
                message: "Build FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
        success {
            slackSend(
                color: 'good',
                message: "Build SUCCEEDED: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
            )
        }
    }
}

Sample prompt:

Create a Jenkinsfile with parallel stages for lint and test, a Docker image build, and a deploy with manual approval

Kubernetes basics

Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: ghcr.io/myorg/myapp:latest
          ports:
            - containerPort: 3000
              protocol: TCP
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health
              port: 3000
            failureThreshold: 30
            periodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - myapp
                topologyKey: kubernetes.io/hostname
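
The `maxSurge: 1` and `maxUnavailable: 0` settings decide how many pods can exist during a rollout and how many must stay available. A minimal sketch of that arithmetic, assuming both values are absolute pod counts as in the manifest (percentages are not handled here); the `rolloutBounds` helper is hypothetical:

```typescript
interface RollingUpdate {
  replicas: number;
  maxSurge: number;       // extra pods allowed above the desired count
  maxUnavailable: number; // pods allowed to be unavailable during the rollout
}

// Hypothetical helper: upper bound on total pods and lower bound on available pods.
function rolloutBounds(cfg: RollingUpdate): { maxPods: number; minAvailable: number } {
  return {
    maxPods: cfg.replicas + cfg.maxSurge,
    minAvailable: cfg.replicas - cfg.maxUnavailable,
  };
}

console.log(rolloutBounds({ replicas: 3, maxSurge: 1, maxUnavailable: 0 }));
// → { maxPods: 4, minAvailable: 3 }
```

With these values the rollout proceeds one pod at a time: a new pod must become ready before an old one is removed, so capacity never drops below 3.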

Service

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  namespace: production
  labels:
    app: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
    - name: http
      port: 80
      targetPort: 3000
      protocol: TCP

Ingress with SSL

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rpm: "100"  # 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 80

ConfigMap and Secret

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: production
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  REDIS_HOST: "redis-master.redis.svc.cluster.local"
  REDIS_PORT: "6379"

---
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
  namespace: production
type: Opaque
stringData:
  # Placeholder values only; do not commit real secrets to version control
  DATABASE_URL: "postgresql://user:pass@db-host:5432/myapp"
  JWT_SECRET: "super-secret-key-change-me-now"
  API_KEY: "your-api-key"

HorizontalPodAutoscaler

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
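
The autoscaler picks a replica count per metric with the formula from the Kubernetes documentation, `ceil(currentReplicas * currentValue / targetValue)`, takes the largest proposal, and keeps it within `minReplicas` and `maxReplicas`. A simplified sketch of that calculation follows. It ignores the tolerance band and stabilization windows that the real controller applies, and the function names are made up for the example.

```typescript
interface MetricReading {
  current: number; // observed average utilization, in percent
  target: number;  // averageUtilization from the HPA spec, in percent
}

// Core formula: scale the replica count by the ratio of observed to target utilization.
function desiredReplicas(currentReplicas: number, m: MetricReading): number {
  return Math.ceil(currentReplicas * (m.current / m.target));
}

// Take the largest proposal across metrics, then clamp to the configured range.
function hpaDecision(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
): number {
  const proposals = metrics.map((m) => desiredReplicas(currentReplicas, m));
  const desired = Math.max(...proposals);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// 3 replicas, CPU at 140% (target 70), memory at 60% (target 80).
console.log(hpaDecision(3, [{ current: 140, target: 70 }, { current: 60, target: 80 }], 3, 10));
// → 6
```

CPU alone proposes 6 replicas and memory proposes 3, so the higher value wins. A much larger spike would stop at `maxReplicas: 10`.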

Sample prompts:

Create Kubernetes manifests for a Node.js application:
- Deployment with 3 replicas
- Service of type ClusterIP
- Ingress with SSL
- ConfigMap and Secret
Create a Helm chart for a microservice application with values for staging and production

Basic Helm chart

# Create a new Helm chart
helm create myapp-chart

# myapp-chart/values.yaml
replicaCount: 3

image:
  repository: ghcr.io/myorg/myapp
  pullPolicy: IfNotPresent
  tag: "latest"

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - myapp.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  NODE_ENV: production
  LOG_LEVEL: info

# Deploy the Helm chart
helm install myapp ./myapp-chart -f values-production.yaml -n production

# Upgrade
helm upgrade myapp ./myapp-chart -f values-production.yaml -n production

# Roll back to revision 1
helm rollback myapp 1 -n production

# View release history
helm history myapp -n production

Terraform / Infrastructure as Code

Terraform directory structure

terraform/
├── environments/
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ec2/
│   ├── rds/
│   └── s3/
└── backend.tf

VPC, Subnet, Security Group

# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.project}-vpc"
    Environment = var.environment
  }
}

# Public subnets
resource "aws_subnet" "public" {
  count                   = length(var.public_subnet_cidrs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project}-public-${count.index + 1}"
  }
}

# Private subnets
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project}-private-${count.index + 1}"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project}-igw"
  }
}

# NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.project}-nat"
  }
}

# Security Group for the application
resource "aws_security_group" "app" {
  name_prefix = "${var.project}-app-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project}-app-sg"
  }
}

# Security Group for the database
resource "aws_security_group" "db" {
  name_prefix = "${var.project}-db-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "${var.project}-db-sg"
  }
}
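
Every entry in `var.public_subnet_cidrs` and `var.private_subnet_cidrs` has to fall inside `var.vpc_cidr`, otherwise the subnet cannot be created. The sketch below shows that containment check for IPv4. The example CIDR values are assumptions (the module does not define defaults), and `subnetWithin` is a hypothetical helper, not part of Terraform.

```typescript
// Network mask for a prefix length, as an unsigned 32-bit number.
function maskFor(prefix: number): number {
  // Shifting by 32 is a no-op in JavaScript, so /0 is handled separately.
  return prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
}

// Parse "a.b.c.d/n" into its network base address and prefix length.
function parseCidr(cidr: string): { base: number; prefix: number } {
  const [ip, prefixStr] = cidr.split("/");
  const prefix = Number(prefixStr);
  const octets = ip.split(".").map(Number);
  const validOctets =
    octets.length === 4 && octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255);
  if (!validOctets || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid CIDR: ${cidr}`);
  }
  const addr = octets.reduce((acc, o) => acc * 256 + o, 0);
  return { base: (addr & maskFor(prefix)) >>> 0, prefix };
}

// True when the subnet range lies entirely inside the VPC range.
function subnetWithin(vpcCidr: string, subnetCidr: string): boolean {
  const vpc = parseCidr(vpcCidr);
  const subnet = parseCidr(subnetCidr);
  if (subnet.prefix < vpc.prefix) return false; // subnet is larger than the VPC
  return ((subnet.base & maskFor(vpc.prefix)) >>> 0) === vpc.base;
}

console.log(subnetWithin("10.0.0.0/16", "10.0.1.0/24")); // true
console.log(subnetWithin("10.0.0.0/16", "10.1.0.0/24")); // false
```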

EC2 Instance

# modules/ec2/main.tf
resource "aws_instance" "app" {
  ami                    = var.ami_id
  instance_type          = var.instance_type
  subnet_id              = var.subnet_id
  vpc_security_group_ids = [var.security_group_id]
  key_name               = var.key_name

  iam_instance_profile = aws_iam_instance_profile.app.name

  root_block_device {
    volume_type = "gp3"
    volume_size = 30
    encrypted   = true
  }

  user_data = <<-EOF
    #!/bin/bash
    # Install Docker
    yum update -y
    yum install -y docker
    systemctl start docker
    systemctl enable docker
    usermod -aG docker ec2-user

    # Install Docker Compose
    curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" \
      -o /usr/local/bin/docker-compose
    chmod +x /usr/local/bin/docker-compose
  EOF

  tags = {
    Name        = "${var.project}-app"
    Environment = var.environment
  }
}

RDS PostgreSQL

# modules/rds/main.tf
resource "aws_db_subnet_group" "main" {
  name       = "${var.project}-db-subnet"
  subnet_ids = var.private_subnet_ids

  tags = {
    Name = "${var.project}-db-subnet-group"
  }
}

resource "aws_db_instance" "postgres" {
  identifier     = "${var.project}-db"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = var.db_instance_class

  allocated_storage     = 20
  max_allocated_storage = 100
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [var.db_security_group_id]

  multi_az            = var.environment == "production"
  publicly_accessible = false

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:05:00"

  deletion_protection = var.environment == "production"
  skip_final_snapshot = var.environment != "production"

  tags = {
    Name        = "${var.project}-postgres"
    Environment = var.environment
  }
}

S3 Bucket

# modules/s3/main.tf
resource "aws_s3_bucket" "assets" {
  bucket = "${var.project}-assets-${var.environment}"

  tags = {
    Name        = "${var.project}-assets"
    Environment = var.environment
  }
}

resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "assets" {
  bucket = aws_s3_bucket.assets.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  rule {
    id     = "archive-old-objects"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket
    filter {}

    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 180
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}

Sample prompts:

Create a Terraform config for AWS:
- VPC, Subnet, Security Group
- EC2 instance
- RDS PostgreSQL
- S3 bucket
Create a Terraform module for an ECS Fargate cluster running a containerized application

Common Terraform commands

# Initialize
terraform init

# Preview the planned changes
terraform plan -var-file=environments/production/terraform.tfvars

# Apply the changes
terraform apply -var-file=environments/production/terraform.tfvars

# Destroy the infrastructure
terraform destroy -var-file=environments/staging/terraform.tfvars

# Format code
terraform fmt -recursive

# Validate the configuration
terraform validate

# List resources in the current state
terraform state list

Monitoring & Logging

Prometheus + Grafana with Docker Compose

# docker-compose.monitoring.yml
version: "3.9"

services:
  # === Prometheus ===
  prometheus:
    image: prom/prometheus:v2.50.0
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./monitoring/alert-rules.yml:/etc/prometheus/alert-rules.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
    restart: unless-stopped

  # === Grafana ===
  grafana:
    image: grafana/grafana:10.3.0
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources:ro
    depends_on:
      - prometheus
    restart: unless-stopped

  # === Alertmanager ===
  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - "9093:9093"
    volumes:
      - ./monitoring/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    restart: unless-stopped

  # === Node Exporter (system metrics) ===
  node-exporter:
    image: prom/node-exporter:v1.7.0
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    restart: unless-stopped

volumes:
  prometheus-data:
  grafana-data:

Prometheus configuration

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "alert-rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "app"
    metrics_path: /metrics
    static_configs:
      - targets: ["app:3000"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter:9187"]

Alert Rules

# monitoring/alert-rules.yml
groups:
  - name: app-alerts
    rules:
      - alert: HighErrorRate
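        # Aggregate with sum() so the 5xx rate is divided by the rate of all requests,
        # not by the matching 5xx series itself
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05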
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate ({{ $value | humanizePercentage }})"
          description: "The 5xx error rate has exceeded 5% over the last 5 minutes"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time (p95 = {{ $value }}s)"

      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes / 1024 / 1024 > 512
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage ({{ $value }}MB)"

      - alert: PodDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
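
The `HighErrorRate` rule fires when 5xx responses make up more than 5% of all requests. As a rough illustration of that arithmetic, aggregated over all series, the sketch below computes the ratio from per-status request rates. The sample numbers are invented, and returning 0 when there is no traffic is a simplification of how PromQL behaves.

```typescript
interface RateSample {
  status: string;     // HTTP status code label, e.g. "200" or "503"
  ratePerSec: number; // per-second request rate over the window
}

const ERROR_RATE_THRESHOLD = 0.05; // 5%, as in the alert rule

// Share of requests whose status code is 5xx.
function errorRatio(samples: RateSample[]): number {
  const total = samples.reduce((sum, s) => sum + s.ratePerSec, 0);
  if (total === 0) return 0; // simplification: no traffic counts as no errors
  const errors = samples
    .filter((s) => /^5\d\d$/.test(s.status))
    .reduce((sum, s) => sum + s.ratePerSec, 0);
  return errors / total;
}

// Invented sample: 95 req/s OK, 6 req/s server errors (about 5.9% of traffic).
const samples: RateSample[] = [
  { status: "200", ratePerSec: 95 },
  { status: "500", ratePerSec: 4 },
  { status: "503", ratePerSec: 2 },
];

console.log(errorRatio(samples) > ERROR_RATE_THRESHOLD); // true
```

In Prometheus the condition must also hold for the whole `for: 5m` period before the alert fires, which this sketch does not model.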

Health Check Endpoint (Node.js)

// src/routes/health.ts
import { Router } from "express";
import { Pool } from "pg";
import { Redis } from "ioredis";

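const router = Router();

// Shared clients for the readiness checks below; the connection strings come
// from DATABASE_URL and REDIS_URL (see .env.example).
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");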

interface HealthStatus {
  status: "healthy" | "unhealthy";
  timestamp: string;
  uptime: number;
  checks: Record<string, { status: string; latency?: number; error?: string }>;
}

// Simple health check (for the liveness probe)
router.get("/health", (_req, res) => {
  res.json({ status: "ok", timestamp: new Date().toISOString() });
});

// Readiness check (verifies dependencies)
router.get("/ready", async (_req, res) => {
  const health: HealthStatus = {
    status: "healthy",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {},
  };

  // Check the database
  try {
    const start = Date.now();
    await pool.query("SELECT 1");
    health.checks.database = {
      status: "up",
      latency: Date.now() - start,
    };
  } catch (error) {
    health.status = "unhealthy";
    health.checks.database = {
      status: "down",
      error: (error as Error).message,
    };
  }

  // Check Redis
  try {
    const start = Date.now();
    await redis.ping();
    health.checks.redis = {
      status: "up",
      latency: Date.now() - start,
    };
  } catch (error) {
    health.status = "unhealthy";
    health.checks.redis = {
      status: "down",
      error: (error as Error).message,
    };
  }

  const statusCode = health.status === "healthy" ? 200 : 503;
  res.status(statusCode).json(health);
});

export default router;
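The database and Redis checks above repeat the same try/catch shape, and neither has a time limit, so a hung dependency could stall the readiness probe. One possible way to factor this out is sketched below. `checkDependency` and `CheckResult` are hypothetical names, and the 2000 ms default is an arbitrary assumption.

```typescript
type CheckResult = { status: "up" | "down"; latency?: number; error?: string };

// Run one dependency probe and report it as down if it fails
// or does not settle within timeoutMs.
async function checkDependency(
  probe: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<CheckResult> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    await Promise.race([probe(), timeout]);
    return { status: "up", latency: Date.now() - start };
  } catch (error) {
    return { status: "down", error: error instanceof Error ? error.message : String(error) };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Inside the `/ready` handler this could be used as `health.checks.database = await checkDependency(() => pool.query("SELECT 1"))`, with `health.status` set to `"unhealthy"` whenever any check reports `"down"`.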

Loki + Promtail (Log management)

# Add to docker-compose.monitoring.yml (also declare loki-data under the top-level volumes key)
services:
  loki:
    image: grafana/loki:2.9.4
    ports:
      - "3100:3100"
    volumes:
      - ./monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.4
    volumes:
      - ./monitoring/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
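Promtail, as configured above, reads container logs from `/var/lib/docker/containers`, which is where Docker's default `json-file` logging driver stores each container's stdout. Logs are easier to query in Loki when the application emits one JSON object per line. A minimal sketch of such a logger follows; `formatLogLine` and `log` are hypothetical helpers, and the field names are an assumption rather than anything Loki requires.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Serialize one log entry as a single JSON line.
// Fixed keys are written last so caller-supplied fields cannot override them.
function formatLogLine(
  level: LogLevel,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...fields, timestamp: now.toISOString(), level, message });
}

// Write to stdout so the container runtime captures the line.
function log(level: LogLevel, message: string, fields?: Record<string, unknown>): void {
  process.stdout.write(formatLogLine(level, message, fields) + "\n");
}

log("info", "server started", { port: 3000 });
```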

Sample prompts:

Create a Prometheus + Grafana + Alertmanager setup with Docker Compose, and add alert rules for HTTP error rate and response time
Create a health check endpoint for an Express.js application that checks the database and Redis
Create a Loki + Promtail configuration to collect logs from Docker containers

Secret Management

Managing environment variables

# .env.example - Template for environment variables (commit this to git)
NODE_ENV=development
PORT=3000

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/myapp

# Redis
REDIS_URL=redis://localhost:6379

# JWT
JWT_SECRET=change-this-secret
JWT_EXPIRES_IN=7d

# External APIs
STRIPE_SECRET_KEY=sk_test_xxx
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

# .gitignore - NEVER commit real .env files
.env
.env.local
.env.production
*.pem
*.key
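Because the real `.env` file is never committed, a missing or mistyped variable usually only shows up at runtime. One option is to validate the variables from `.env.example` once at startup. The sketch below is a hedged illustration: `AppConfig` and `loadConfig` are hypothetical names, and which variables count as required is an assumption.

```typescript
interface AppConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  redisUrl: string;
  jwtSecret: string;
}

// Validate required variables up front so a missing value fails at startup
// instead of on the first request that needs it.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = ["DATABASE_URL", "REDIS_URL", "JWT_SECRET"] as const;
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    nodeEnv: env.NODE_ENV ?? "development",
    port,
    databaseUrl: env.DATABASE_URL as string,
    redisUrl: env.REDIS_URL as string,
    jwtSecret: env.JWT_SECRET as string,
  };
}
```

At the application entry point this would be called as `const config = loadConfig(process.env);`.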

Docker Secrets (Docker Swarm)

# docker-compose.yml with Docker Secrets
version: "3.9"

services:
  app:
    image: myapp:latest
    secrets:
      - db_password
      - jwt_secret
      - api_key
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - JWT_SECRET_FILE=/run/secrets/jwt_secret

secrets:
  db_password:
    file: ./secrets/db_password.txt   # Development mode
  jwt_secret:
    file: ./secrets/jwt_secret.txt
  api_key:
    external: true                     # Production mode (created beforehand)

// Read a Docker Secret in code
import { readFileSync } from "fs";

function getSecret(name: string): string {
  // Try reading from the Docker Secret first
  const secretPath = `/run/secrets/${name}`;
  try {
    return readFileSync(secretPath, "utf8").trim();
  } catch {
    // Fall back to an environment variable
    const envKey = name.toUpperCase();
    const value = process.env[envKey];
    if (!value) {
      throw new Error(`Secret "${name}" not found`);
    }
    return value;
  }
}

// Usage
const dbPassword = getSecret("db_password");
const jwtSecret = getSecret("jwt_secret");
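The Compose file above sets `DB_PASSWORD_FILE` and `JWT_SECRET_FILE`, but `getSecret` reads the fixed `/run/secrets/` path and never looks at those variables. A possible variant that honors the `<NAME>_FILE` convention is sketched below. `resolveSecret` is a hypothetical helper, and the injectable `readFile` parameter exists only to make it easy to test.

```typescript
import { readFileSync } from "node:fs";

// Resolve a secret from <NAME>_FILE (a path to a file) first,
// then from <NAME> directly as a plain environment variable.
function resolveSecret(
  name: string,
  env: Record<string, string | undefined> = process.env,
  readFile: (path: string) => string = (path) => readFileSync(path, "utf8"),
): string {
  const filePath = env[`${name}_FILE`];
  if (filePath) {
    // Secret files often end with a newline, so trim it.
    return readFile(filePath).trim();
  }
  const direct = env[name];
  if (!direct) {
    throw new Error(`Secret "${name}" is not set via ${name}_FILE or ${name}`);
  }
  return direct;
}
```

With the Compose file above, `resolveSecret("DB_PASSWORD")` would read `/run/secrets/db_password` inside the container and fall back to a plain `DB_PASSWORD` variable elsewhere.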

AWS Secrets Manager

// src/utils/aws-secrets.ts
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: "ap-southeast-1" });

// Cache secrets to avoid calling the API on every lookup
const secretCache = new Map<string, { value: string; expiry: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes

export async function getSecret(secretName: string): Promise<string> {
  // Check the cache
  const cached = secretCache.get(secretName);
  if (cached && cached.expiry > Date.now()) {
    return cached.value;
  }

  const command = new GetSecretValueCommand({ SecretId: secretName });
  const response = await client.send(command);

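  // SecretString is undefined for binary secrets, so fail with a clear error
  const value = response.SecretString;
  if (value === undefined) {
    throw new Error(`Secret "${secretName}" has no SecretString value`);
  }

  // Store in the cache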
  secretCache.set(secretName, {
    value,
    expiry: Date.now() + CACHE_TTL,
  });

  return value;
}

// Load several secrets at once
export async function loadSecrets(): Promise<Record<string, string>> {
  const secretName = `myapp/${process.env.NODE_ENV}/config`;
  const raw = await getSecret(secretName);
  return JSON.parse(raw);
}

# Terraform: create the secret in AWS Secrets Manager
resource "aws_secretsmanager_secret" "app_config" {
  name                    = "myapp/${var.environment}/config"
  recovery_window_in_days = 7

  tags = {
    Environment = var.environment
  }
}

resource "aws_secretsmanager_secret_version" "app_config" {
  secret_id = aws_secretsmanager_secret.app_config.id
  secret_string = jsonencode({
    DATABASE_URL = "postgresql://${var.db_user}:${var.db_password}@${aws_db_instance.postgres.endpoint}/${var.db_name}"
    JWT_SECRET   = var.jwt_secret
    REDIS_URL    = "redis://${aws_elasticache_cluster.redis.cache_nodes[0].address}:6379"
  })
}
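The TTL cache in `getSecret` above avoids repeated API calls once a value is cached, but several concurrent callers arriving on a cold cache would each trigger their own request. The sketch below shows one way to also share the in-flight request. `createCachedLoader` is a hypothetical helper, and the fetcher is injected so the sketch does not depend on the AWS SDK.

```typescript
// Cache async lookups for ttlMs and share one in-flight request per key.
function createCachedLoader<T>(
  fetcher: (key: string) => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => Promise<T> {
  const cache = new Map<string, { value: T; expiry: number }>();
  const inFlight = new Map<string, Promise<T>>();

  return (key: string): Promise<T> => {
    const cached = cache.get(key);
    if (cached && cached.expiry > now()) return Promise.resolve(cached.value);

    // Reuse a request that has started but not finished yet.
    const pending = inFlight.get(key);
    if (pending) return pending;

    const request = fetcher(key).then(
      (value) => {
        inFlight.delete(key);
        cache.set(key, { value, expiry: now() + ttlMs });
        return value;
      },
      (error) => {
        // Failures are not cached, so the next call retries.
        inFlight.delete(key);
        throw error;
      },
    );
    inFlight.set(key, request);
    return request;
  };
}
```

A fetcher that wraps `client.send(new GetSecretValueCommand(...))` could be passed in to obtain the same five-minute caching as the code above.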

HashiCorp Vault basics

# Start a Vault server (development mode)
vault server -dev

# Store a secret
vault kv put secret/myapp/config \
  database_url="postgresql://user:pass@db:5432/myapp" \
  jwt_secret="super-secret-key" \
  api_key="sk-xxx"

# Read the secret
vault kv get secret/myapp/config

# Read a specific field
vault kv get -field=database_url secret/myapp/config

# Docker Compose for Vault
services:
  vault:
    image: hashicorp/vault:1.15
    ports:
      - "8200:8200"
    environment:
      VAULT_DEV_ROOT_TOKEN_ID: "dev-root-token"
      VAULT_DEV_LISTEN_ADDRESS: "0.0.0.0:8200"
    cap_add:
      - IPC_LOCK
    volumes:
      - vault-data:/vault/data
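The CLI commands above have an HTTP equivalent that an application can call directly. The sketch below assumes the dev server's default KV version 2 engine mounted at `secret`, where reads go through `/v1/<mount>/data/<path>` and the stored pairs come back under `data.data`. `buildKvV2ReadRequest`, `parseKvV2Response`, and `readVaultSecret` are hypothetical helper names, and the global `fetch` requires Node 18 or later.

```typescript
interface VaultRequest {
  url: string;
  headers: Record<string, string>;
}

// KV v2 reads go through <mount>/data/<path>.
function buildKvV2ReadRequest(addr: string, token: string, mount: string, path: string): VaultRequest {
  return {
    url: `${addr.replace(/\/+$/, "")}/v1/${mount}/data/${path}`,
    headers: { "X-Vault-Token": token },
  };
}

// KV v2 wraps the stored key/value pairs in data.data, next to data.metadata.
function parseKvV2Response(body: unknown): Record<string, string> {
  const data = (body as { data?: { data?: Record<string, string> } } | null)?.data?.data;
  if (!data) throw new Error("Unexpected Vault response: missing data.data");
  return data;
}

async function readVaultSecret(addr: string, token: string, path: string): Promise<Record<string, string>> {
  const { url, headers } = buildKvV2ReadRequest(addr, token, "secret", path);
  const response = await fetch(url, { headers });
  if (!response.ok) throw new Error(`Vault returned HTTP ${response.status}`);
  return parseKvV2Response(await response.json());
}
```

Against the Compose service above, `readVaultSecret("http://localhost:8200", "dev-root-token", "myapp/config")` would be expected to return the fields stored with `vault kv put`. The root token is only suitable for local development.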

Sample prompts:

Create a secret management system for a Node.js application: support Docker Secrets, AWS Secrets Manager, and a fallback to environment variables
Create a Terraform config to set up AWS Secrets Manager for production environment variables

Using Claude Skills for DevOps

Dedicated DevOps skills

/engineering-skills:senior-devops         # In-depth DevOps guidance

Use when you need:

/engineering-skills:aws-solution-architect # AWS architecture

Use when you need:

/engineering-skills:senior-secops         # Security Operations

Use when you need:

Combining skills in a workflow

# Step 1: Design the architecture
/engineering-skills:senior-architect
> Design a microservices architecture for an e-commerce application

# Step 2: Write Terraform
/engineering-skills:aws-solution-architect
> Create Terraform for the architecture designed above

# Step 3: Set up CI/CD
/engineering-skills:senior-devops
> Create a GitHub Actions pipeline to deploy microservices to ECS

# Step 4: Monitoring
/engineering-skills:senior-devops
> Set up Prometheus + Grafana monitoring for the microservices system

# Step 5: Security audit
/engineering-skills:senior-secops
> Audit the security of the entire infrastructure and CI/CD pipeline

Practical DevOps sample prompts

1. Complete setup for a new project

Create a complete DevOps setup for a Node.js TypeScript project:
1. Multi-stage Dockerfile (dev + production)
2. docker-compose.yml (app + postgres + redis)
3. GitHub Actions CI/CD
4. Nginx reverse proxy config
5. Makefile for frequently used commands

2. Optimizing the Docker build

Analyze the current Dockerfile and optimize it:
- Reduce the image size
- Make better use of the layer cache
- Add security best practices (non-root user, .dockerignore)
- Use a multi-stage build if there isn't one yet

3. Pipeline for a monorepo

Create a GitHub Actions workflow for a monorepo consisting of:
- packages/api (Node.js)
- packages/web (Next.js)
- packages/shared (common code)
Only build/deploy the packages that have changed (path filter)
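The path-filter idea in the prompt above comes down to mapping changed files to the packages that need rebuilding. The sketch below shows only that decision logic, not a GitHub Actions workflow. The `dependents` map is a hypothetical assumption that both `packages/api` and `packages/web` consume `packages/shared`, and `affectedPackages` is an illustrative helper.

```typescript
// Hypothetical dependency map: which packages must also rebuild when a package changes.
const dependents: Record<string, string[]> = {
  "packages/api": [],
  "packages/web": [],
  "packages/shared": ["packages/api", "packages/web"],
};

// Return the sorted list of packages to build for a list of changed file paths.
function affectedPackages(changedFiles: string[]): string[] {
  const affected = new Set<string>();
  for (const file of changedFiles) {
    for (const [pkg, downstream] of Object.entries(dependents)) {
      if (file.startsWith(`${pkg}/`)) {
        affected.add(pkg);
        downstream.forEach((dep) => affected.add(dep));
      }
    }
  }
  return Array.from(affected).sort();
}

console.log(affectedPackages(["packages/web/pages/index.tsx"])); // only packages/web
console.log(affectedPackages(["packages/shared/src/types.ts"])); // shared plus both consumers
```

Files outside `packages/`, such as a top-level README, map to no package, so a pipeline built on this logic would skip the build entirely for them.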

4. Database migration in CI/CD

Add a database migration step to the CI/CD pipeline:
- Run migrations automatically on deploy
- Roll back the migration if the deploy fails
- Back up the database before migrating production

5. Blue-Green Deployment

Design a blue-green deployment for an application on AWS:
- 2 target groups (blue and green)
- ALB switches traffic between the 2 groups
- Health check before switching
- Fast rollback capability

6. Auto-scaling

Configure auto-scaling for Kubernetes:
- HPA based on CPU and memory
- Custom metrics (requests per second)
- Scale-to-zero for the staging environment
- PodDisruptionBudget for high availability
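When reviewing the HPA settings such a prompt produces, it helps to know the core scaling rule: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up and clamped between the minimum and maximum. The sketch below covers only that formula plus the default 10% tolerance. `HpaInput` and `desiredReplicas` are hypothetical names, and the real controller also applies stabilization windows and handles missing metrics.

```typescript
interface HpaInput {
  currentReplicas: number;
  currentMetric: number; // e.g. average CPU utilization in percent
  targetMetric: number;
  minReplicas: number;
  maxReplicas: number;
  tolerance?: number; // Kubernetes defaults to 0.1
}

function desiredReplicas(input: HpaInput): number {
  const { currentReplicas, currentMetric, targetMetric, minReplicas, maxReplicas } = input;
  const ratio = currentMetric / targetMetric;
  // Within tolerance of the target, the replica count is left alone.
  if (Math.abs(ratio - 1) <= (input.tolerance ?? 0.1)) return currentReplicas;
  const raw = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// 4 replicas at 90% CPU with a 60% target: ceil(4 * 1.5) = 6
console.log(desiredReplicas({ currentReplicas: 4, currentMetric: 90, targetMetric: 60, minReplicas: 2, maxReplicas: 10 }));
```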

7. Disaster Recovery

Design a disaster recovery plan:
- Automatic daily database backups (RDS snapshot)
- Cross-region replication for S3
- Recovery time objective (RTO): 1 hour
- Recovery point objective (RPO): 15 minutes
- Runbook for each scenario

8. Log aggregation

Set up a centralized logging system:
- Collect logs from all containers
- Structured logging (JSON format)
- Log rotation and retention policy
- Grafana dashboard for log analysis
- Alert on unusual error patterns

9. Automatic SSL/TLS

Configure automatic SSL/TLS:
- Let's Encrypt with cert-manager on Kubernetes
- Auto-renewal certificates
- Force HTTPS redirect
- HSTS headers

10. Performance monitoring

Set up performance monitoring for a Node.js application:
- APM with Prometheus metrics (http_request_duration, memory, cpu)
- Database query performance tracking
- Redis cache hit/miss ratio
- Grafana dashboard with the key panels
- Alert when p95 response time > 2s

11. Container security scanning

Add security scanning to CI/CD:
- Scan the Docker image with Trivy
- Scan dependencies with npm audit / Snyk
- SAST (Static Application Security Testing)
- Block the deploy if there is a critical vulnerability
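The "block the deploy" step in the prompt above can be a small gate that reads the scanner's JSON report and fails the job when a blocking finding is present. The sketch below assumes a report shaped like Trivy's JSON output (`Results[].Vulnerabilities[].Severity`); treat that shape as an assumption and check it against the Trivy version in use. `ScanReport` and `blockingFindings` are hypothetical names.

```typescript
// Assumed subset of a scanner report; only the fields the gate needs.
interface ScanReport {
  Results?: {
    Target: string;
    Vulnerabilities?: { VulnerabilityID: string; Severity: string }[];
  }[];
}

// Collect the findings whose severity is in the blocking list.
function blockingFindings(report: ScanReport, blockOn: string[] = ["CRITICAL"]): string[] {
  const findings: string[] = [];
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      if (blockOn.includes(vuln.Severity)) {
        findings.push(`${result.Target}: ${vuln.VulnerabilityID}`);
      }
    }
  }
  return findings;
}

// Placeholder report for illustration; the IDs are not real advisories.
const exampleReport: ScanReport = {
  Results: [
    {
      Target: "myapp:latest",
      Vulnerabilities: [
        { VulnerabilityID: "EXAMPLE-0001", Severity: "CRITICAL" },
        { VulnerabilityID: "EXAMPLE-0002", Severity: "LOW" },
      ],
    },
  ],
};

const blocking = blockingFindings(exampleReport);
if (blocking.length > 0) {
  console.error(`Blocking deploy: ${blocking.join(", ")}`);
  // In a CI script, setting process.exitCode = 1 here would fail the job.
}
```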

12. Multi-environment setup

Create configuration for 3 environments: dev, staging, production
- Terraform workspaces or separate directories
- Different environment variables for each env
- GitHub Actions workflow with environment protection rules
- Automatic deploy to staging, manual approval for production

DevOps checklist

Containerization

CI/CD Pipeline

Infrastructure

Monitoring & Logging

Security

Reliability


Tip: Use /engineering-skills:senior-devops when you need in-depth DevOps guidance. Combine it with /engineering-skills:aws-solution-architect for AWS projects and /engineering-skills:senior-secops for infrastructure security audits.