A comprehensive guide to DevOps, CI/CD, containerization, deployment, and infrastructure management with Claude Code CLI.
# === Build stage ===
FROM node:20-alpine AS builder
WORKDIR /app
# Install dependencies first (leverages the Docker layer cache)
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && \
cp -R node_modules prod_node_modules && \
npm ci
# Copy source and build
COPY tsconfig.json ./
COPY src ./src
RUN npm run build
# === Production stage ===
FROM node:20-alpine AS production
WORKDIR /app
# Create a non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
COPY --from=builder /app/prod_node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./
# Switch to the non-root user
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/index.js"]
Sample prompts:
Create a multi-stage Dockerfile for a Node.js TypeScript project
Optimize the current Dockerfile to reduce image size, add a health check, and run as a non-root user
# === Build stage ===
FROM golang:1.22-alpine AS builder
WORKDIR /app
# Install dependencies
COPY go.mod go.sum ./
RUN go mod download
# Build binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/server ./cmd/server
# === Production stage ===
FROM alpine:3.19 AS production
# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates tzdata
WORKDIR /app
# Create a non-root user
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
COPY --from=builder /app/server .
USER appuser
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
ENTRYPOINT ["./server"]
# === Dependencies stage ===
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# === Build stage ===
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
# Build-time environment variables
ENV NEXT_TELEMETRY_DISABLED=1
RUN npm run build
# === Production stage ===
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
RUN addgroup -g 1001 -S appgroup && \
adduser -S appuser -u 1001 -G appgroup
# Copy only what the standalone output needs
COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
USER appuser
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]
Sample prompts:
Create a multi-stage Dockerfile for Next.js with standalone output, optimized for production
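The standalone copy steps in the Dockerfile above only work if Next.js is configured to emit that output. A minimal next.config.js sketch, assuming no other custom configuration:

```javascript
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Required by the Dockerfile above: emits .next/standalone with server.js
  output: "standalone",
};

module.exports = nextConfig;
```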
# docker-compose.yml
version: "3.9"
services:
# === Main application ===
app:
build:
context: .
dockerfile: Dockerfile
target: builder # Use the builder stage for dev
ports:
- "3000:3000"
volumes:
- ./src:/app/src # Hot reload
- ./package.json:/app/package.json
environment:
- NODE_ENV=development
- DATABASE_URL=postgresql://postgres:postgres@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
networks:
- app-network
restart: unless-stopped
# === PostgreSQL ===
db:
image: postgres:16-alpine
ports:
- "5432:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: myapp
volumes:
- postgres-data:/var/lib/postgresql/data
- ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
networks:
- app-network
# === Redis ===
redis:
image: redis:7-alpine
ports:
- "6379:6379"
command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
volumes:
- redis-data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
networks:
- app-network
# === Nginx reverse proxy ===
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
- ./nginx/ssl:/etc/nginx/ssl:ro
depends_on:
- app
networks:
- app-network
restart: unless-stopped
volumes:
postgres-data:
redis-data:
networks:
app-network:
driver: bridge
Sample prompts:
Create a docker-compose.yml for a dev environment with: app, postgres, redis, nginx
Add an Adminer service to docker-compose for managing the database via a web UI
Create a docker-compose.override.yml for staging with its own environment variables
# Build and start all services
docker compose up -d --build
# Tail logs for a specific service
docker compose logs -f app
# Run a command inside a container
docker compose exec app sh
# Stop and remove everything (including volumes)
docker compose down -v
# Rebuild a single service
docker compose up -d --build app
# Check image sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
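Smaller build contexts also make the commands above faster; a typical .dockerignore starting point for a project like this (adjust to your repository):

```
# .dockerignore - keep the build context small
node_modules
dist
coverage
.git
.env*
*.md
Dockerfile
docker-compose*.yml
```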
# .github/workflows/ci.yml
name: CI Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
jobs:
# === Lint and code checks ===
lint:
name: Lint & Format
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 20
cache: "npm"
- run: npm ci
- name: Run ESLint
run: npm run lint
- name: Check formatting
run: npm run format:check
- name: Check TypeScript types
run: npm run type-check
# === Run tests with a matrix ===
test:
name: Test (Node ${{ matrix.node-version }})
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20, 22]
fail-fast: false
services:
postgres:
image: postgres:16-alpine
env:
POSTGRES_USER: test
POSTGRES_PASSWORD: test
POSTGRES_DB: test_db
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7-alpine
ports:
- 6379:6379
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: "npm"
- run: npm ci
- name: Run migrations
run: npm run db:migrate
env:
DATABASE_URL: postgresql://test:test@localhost:5432/test_db
- name: Run unit tests
run: npm run test:unit -- --coverage
env:
DATABASE_URL: postgresql://test:test@localhost:5432/test_db
REDIS_URL: redis://localhost:6379
- name: Run integration tests
run: npm run test:integration
env:
DATABASE_URL: postgresql://test:test@localhost:5432/test_db
REDIS_URL: redis://localhost:6379
- name: Upload coverage
if: matrix.node-version == 20
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
# === Build ===
build:
name: Build
runs-on: ubuntu-latest
needs: [lint, test]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 20
cache: "npm"
- run: npm ci
- run: npm run build
- name: Upload build artifact
uses: actions/upload-artifact@v4
with:
name: build-output
path: dist/
retention-days: 7
Sample prompts:
Create a GitHub Actions CI/CD workflow:
- Run tests on push/PR
- Build a Docker image
- Deploy to the server on merge to main
# .github/workflows/cd.yml
name: CD Pipeline
on:
push:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
# === Build and push Docker image ===
docker:
name: Build & Push Docker Image
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
outputs:
image-tag: ${{ steps.meta.outputs.version }}
image-digest: ${{ steps.build.outputs.digest }}
steps:
- uses: actions/checkout@v4
- name: Log in to the container registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Generate Docker image metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=sha,prefix=
type=raw,value=latest
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and push image
id: build
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64
# === Deploy to the server ===
deploy:
name: Deploy to Production
runs-on: ubuntu-latest
needs: docker
environment:
name: production
url: https://myapp.example.com
steps:
- uses: actions/checkout@v4
- name: Deploy via SSH
uses: appleboy/ssh-action@v1
with:
host: ${{ secrets.SSH_HOST }}
username: ${{ secrets.SSH_USERNAME }}
key: ${{ secrets.SSH_PRIVATE_KEY }}
script: |
cd /opt/myapp
docker compose pull
docker compose up -d --remove-orphans
docker image prune -f
- name: Verify deployment health
run: |
for i in $(seq 1 30); do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.example.com/health)
if [ "$STATUS" = "200" ]; then
echo "Deploy succeeded!"
exit 0
fi
echo "Waiting... (attempt $i/30)"
sleep 10
done
echo "Health check failed!"
exit 1
- name: Notify via Slack
if: always()
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
fields: repo,message,commit,author,action
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
# Cache examples for multiple package managers
steps:
# Cache npm
- uses: actions/setup-node@v4
with:
node-version: 20
cache: "npm"
# Cache Go modules
- uses: actions/setup-go@v5
with:
go-version: "1.22"
cache: true
# Cache Docker layers
- uses: docker/build-push-action@v5
with:
cache-from: type=gha
cache-to: type=gha,mode=max
# Custom cache
- uses: actions/cache@v4
with:
path: |
~/.cache/pip
~/.local/share/virtualenvs
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
restore-keys: |
${{ runner.os }}-pip-
Sample prompts:
Add caching to the GitHub Actions workflow to speed up the CI pipeline
Create a GitHub Actions workflow running matrix tests on Node 18, 20, 22 with a PostgreSQL service
# .gitlab-ci.yml
stages:
- test
- build
- deploy
variables:
DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
POSTGRES_DB: test_db
POSTGRES_USER: test
POSTGRES_PASSWORD: test
# === Shared cache ===
default:
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- node_modules/
- .npm/
# === Test ===
test:
stage: test
image: node:20-alpine
services:
- postgres:16-alpine
- redis:7-alpine
script:
- npm ci --cache .npm
- npm run lint
- npm run type-check
- npm run test -- --coverage
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
when: always
# === Build Docker image ===
build:
stage: build
image: docker:24
services:
- docker:24-dind
before_script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
script:
- docker build -t $DOCKER_IMAGE .
- docker push $DOCKER_IMAGE
- docker tag $DOCKER_IMAGE $CI_REGISTRY_IMAGE:latest
- docker push $CI_REGISTRY_IMAGE:latest
only:
- main
# === Deploy staging ===
deploy_staging:
stage: deploy
image: alpine:latest
before_script:
- apk add --no-cache openssh-client
- eval $(ssh-agent -s)
- echo "$SSH_PRIVATE_KEY" | ssh-add -
script:
- ssh -o StrictHostKeyChecking=no $DEPLOY_USER@$STAGING_HOST
"cd /opt/app && docker compose pull && docker compose up -d"
environment:
name: staging
url: https://staging.example.com
only:
- develop
# === Deploy production ===
deploy_production:
stage: deploy
image: alpine:latest
before_script:
- apk add --no-cache openssh-client
- eval $(ssh-agent -s)
- echo "$SSH_PRIVATE_KEY" | ssh-add -
script:
- ssh -o StrictHostKeyChecking=no $DEPLOY_USER@$PROD_HOST
"cd /opt/app && docker compose pull && docker compose up -d"
environment:
name: production
url: https://app.example.com
only:
- main
when: manual # Requires manual confirmation
Sample prompts:
Create a GitLab CI pipeline with stages: test, build Docker, auto-deploy staging, and manual production deploy
// Jenkinsfile
pipeline {
agent any
environment {
DOCKER_IMAGE = "myapp:${env.BUILD_NUMBER}"
REGISTRY = "registry.example.com"
}
options {
timeout(time: 30, unit: 'MINUTES')
disableConcurrentBuilds()
}
stages {
stage('Install') {
steps {
sh 'npm ci'
}
}
stage('Checks') {
parallel {
stage('Lint') {
steps {
sh 'npm run lint'
}
}
stage('Test') {
steps {
sh 'npm run test -- --coverage'
}
post {
always {
junit 'test-results/**/*.xml'
publishHTML(target: [
reportDir: 'coverage/lcov-report',
reportFiles: 'index.html',
reportName: 'Coverage Report'
])
}
}
}
}
}
stage('Build Docker') {
when { branch 'main' }
steps {
sh "docker build -t ${REGISTRY}/${DOCKER_IMAGE} ."
sh "docker push ${REGISTRY}/${DOCKER_IMAGE}"
}
}
stage('Deploy') {
when { branch 'main' }
input {
message "Deploy to production?"
ok "Approve deploy"
}
steps {
sh '''
ssh deploy@production "
cd /opt/app &&
docker compose pull &&
docker compose up -d
"
'''
}
}
}
post {
failure {
slackSend(
color: 'danger',
message: "Build FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
)
}
success {
slackSend(
color: 'good',
message: "Build SUCCEEDED: ${env.JOB_NAME} #${env.BUILD_NUMBER}"
)
}
}
}
Sample prompts:
Create a Jenkinsfile with parallel lint and test stages, a Docker image build, and deploy with manual approval
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: production
labels:
app: myapp
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: myapp
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: myapp
version: v1
spec:
serviceAccountName: myapp-sa
containers:
- name: myapp
image: ghcr.io/myorg/myapp:latest
ports:
- containerPort: 3000
protocol: TCP
envFrom:
- configMapRef:
name: myapp-config
- secretRef:
name: myapp-secrets
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 15
periodSeconds: 20
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
startupProbe:
httpGet:
path: /health
port: 3000
failureThreshold: 30
periodSeconds: 10
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- myapp
topologyKey: kubernetes.io/hostname
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
name: myapp-service
namespace: production
labels:
app: myapp
spec:
type: ClusterIP
selector:
app: myapp
ports:
- name: http
port: 80
targetPort: 3000
protocol: TCP
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
namespace: production
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/rate-limit-window: "1m"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-service
port:
number: 80
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: myapp-config
namespace: production
data:
NODE_ENV: "production"
LOG_LEVEL: "info"
REDIS_HOST: "redis-master.redis.svc.cluster.local"
REDIS_PORT: "6379"
---
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: myapp-secrets
namespace: production
type: Opaque
stringData:
DATABASE_URL: "postgresql://user:pass@db-host:5432/myapp"
JWT_SECRET: "super-secret-key-change-me"
API_KEY: "your-api-key"
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Sample prompts:
Create Kubernetes manifests for a Node.js application:
- Deployment with 3 replicas
- Service of type ClusterIP
- Ingress with SSL
- ConfigMap and Secret
Create a Helm chart for a microservice with values for staging and production
# Create a new Helm chart
helm create myapp-chart
# myapp-chart/values.yaml
replicaCount: 3
image:
repository: ghcr.io/myorg/myapp
pullPolicy: IfNotPresent
tag: "latest"
service:
type: ClusterIP
port: 80
targetPort: 3000
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: myapp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: myapp-tls
hosts:
- myapp.example.com
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
env:
NODE_ENV: production
LOG_LEVEL: info
# Deploy the Helm chart
helm install myapp ./myapp-chart -f values-production.yaml -n production
# Upgrade
helm upgrade myapp ./myapp-chart -f values-production.yaml -n production
# Roll back
helm rollback myapp 1 -n production
# View release history
helm history myapp -n production
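The -f flag above layers an environment-specific file over values.yaml. A hypothetical values-staging.yaml override (hostnames and tags are placeholders) might look like:

```yaml
# values-staging.yaml - overrides applied on top of values.yaml
replicaCount: 1
image:
  tag: "develop"
ingress:
  hosts:
    - host: staging.myapp.example.com
      paths:
        - path: /
          pathType: Prefix
autoscaling:
  enabled: false
env:
  NODE_ENV: staging
  LOG_LEVEL: debug
```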
terraform/
├── environments/
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ └── production/
│ ├── main.tf
│ ├── variables.tf
│ └── terraform.tfvars
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── ec2/
│ ├── rds/
│ └── s3/
└── backend.tf
# modules/vpc/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project}-vpc"
Environment = var.environment
}
}
# Public subnets
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project}-public-${count.index + 1}"
}
}
# Private subnets
resource "aws_subnet" "private" {
count = length(var.private_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.private_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project}-private-${count.index + 1}"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project}-igw"
}
}
# NAT Gateway
resource "aws_eip" "nat" {
domain = "vpc"
}
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = {
Name = "${var.project}-nat"
}
}
# Security group for the application
resource "aws_security_group" "app" {
name_prefix = "${var.project}-app-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project}-app-sg"
}
}
# Security group for the database
resource "aws_security_group" "db" {
name_prefix = "${var.project}-db-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
tags = {
Name = "${var.project}-db-sg"
}
}
# modules/ec2/main.tf
resource "aws_instance" "app" {
ami = var.ami_id
instance_type = var.instance_type
subnet_id = var.subnet_id
vpc_security_group_ids = [var.security_group_id]
key_name = var.key_name
iam_instance_profile = aws_iam_instance_profile.app.name
root_block_device {
volume_type = "gp3"
volume_size = 30
encrypted = true
}
user_data = <<-EOF
#!/bin/bash
# Install Docker
yum update -y
yum install -y docker
systemctl start docker
systemctl enable docker
usermod -aG docker ec2-user
# Install Docker Compose
curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" \
-o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
EOF
tags = {
Name = "${var.project}-app"
Environment = var.environment
}
}
# modules/rds/main.tf
resource "aws_db_subnet_group" "main" {
name = "${var.project}-db-subnet"
subnet_ids = var.private_subnet_ids
tags = {
Name = "${var.project}-db-subnet-group"
}
}
resource "aws_db_instance" "postgres" {
identifier = "${var.project}-db"
engine = "postgres"
engine_version = "16.1"
instance_class = var.db_instance_class
allocated_storage = 20
max_allocated_storage = 100
storage_type = "gp3"
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [var.db_security_group_id]
multi_az = var.environment == "production" ? true : false
publicly_accessible = false
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "Mon:04:00-Mon:05:00"
deletion_protection = var.environment == "production" ? true : false
skip_final_snapshot = var.environment == "production" ? false : true
tags = {
Name = "${var.project}-postgres"
Environment = var.environment
}
}
# modules/s3/main.tf
resource "aws_s3_bucket" "assets" {
bucket = "${var.project}-assets-${var.environment}"
tags = {
Name = "${var.project}-assets"
Environment = var.environment
}
}
resource "aws_s3_bucket_versioning" "assets" {
bucket = aws_s3_bucket.assets.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "assets" {
bucket = aws_s3_bucket.assets.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
}
}
}
resource "aws_s3_bucket_public_access_block" "assets" {
bucket = aws_s3_bucket.assets.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_lifecycle_configuration" "assets" {
bucket = aws_s3_bucket.assets.id
rule {
id = "archive-old-objects"
status = "Enabled"
transition {
days = 90
storage_class = "STANDARD_IA"
}
transition {
days = 180
storage_class = "GLACIER"
}
expiration {
days = 365
}
}
}
Sample prompts:
Create a Terraform config for AWS:
- VPC, subnets, security groups
- EC2 instance
- RDS PostgreSQL
- S3 bucket
Create a Terraform module for an ECS Fargate cluster running a containerized application
# Initialize
terraform init
# Preview changes
terraform plan -var-file=environments/production/terraform.tfvars
# Apply changes
terraform apply -var-file=environments/production/terraform.tfvars
# Destroy infrastructure
terraform destroy -var-file=environments/staging/terraform.tfvars
# Format code
terraform fmt -recursive
# Validate configuration
terraform validate
# List current state
terraform state list
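The directory tree earlier lists a backend.tf whose contents never appear. A common sketch using an S3 backend with DynamoDB state locking (the bucket and table names are placeholders and must exist beforehand):

```hcl
# backend.tf - remote state in S3 with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "myapp-terraform-state"  # placeholder: pre-created bucket
    key            = "production/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-locks"        # placeholder: pre-created table
    encrypt        = true
  }
}
```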
# docker-compose.monitoring.yml
version: "3.9"
services:
# === Prometheus ===
prometheus:
image: prom/prometheus:v2.50.0
ports:
- "9090:9090"
volumes:
- ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./monitoring/alert-rules.yml:/etc/prometheus/alert-rules.yml:ro
- prometheus-data:/prometheus
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.retention.time=30d"
- "--web.enable-lifecycle"
restart: unless-stopped
# === Grafana ===
grafana:
image: grafana/grafana:10.3.0
ports:
- "3001:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
- grafana-data:/var/lib/grafana
- ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
- ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources:ro
depends_on:
- prometheus
restart: unless-stopped
# === Alertmanager ===
alertmanager:
image: prom/alertmanager:v0.27.0
ports:
- "9093:9093"
volumes:
- ./monitoring/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
restart: unless-stopped
# === Node Exporter (system metrics) ===
node-exporter:
image: prom/node-exporter:v1.7.0
ports:
- "9100:9100"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
- "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
restart: unless-stopped
volumes:
prometheus-data:
grafana-data:
# monitoring/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
rule_files:
- "alert-rules.yml"
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "app"
metrics_path: /metrics
static_configs:
- targets: ["app:3000"]
- job_name: "node-exporter"
static_configs:
- targets: ["node-exporter:9100"]
- job_name: "postgres"
static_configs:
- targets: ["postgres-exporter:9187"]
# monitoring/alert-rules.yml
groups:
- name: app-alerts
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High HTTP error rate ({{ $value | humanizePercentage }})"
description: "5xx error rate has exceeded 5% over the last 5 minutes"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High response time (p95 = {{ $value }}s)"
- alert: HighMemoryUsage
expr: process_resident_memory_bytes / 1024 / 1024 > 512
for: 10m
labels:
severity: warning
annotations:
summary: "High memory usage ({{ $value }}MB)"
- alert: PodDown
expr: up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Service is down"
// src/routes/health.ts
import { Router } from "express";
import { Pool } from "pg";
import { Redis } from "ioredis";
const router = Router();
// Shared connections used by the checks below (connection strings come from env)
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
interface HealthStatus {
status: "healthy" | "unhealthy";
timestamp: string;
uptime: number;
checks: Record<string, { status: string; latency?: number; error?: string }>;
}
// Simple health check (for the liveness probe)
router.get("/health", (_req, res) => {
res.json({ status: "ok", timestamp: new Date().toISOString() });
});
// Readiness check (verifies dependencies)
router.get("/ready", async (_req, res) => {
const health: HealthStatus = {
status: "healthy",
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks: {},
};
// Check the database
try {
const start = Date.now();
await pool.query("SELECT 1");
health.checks.database = {
status: "up",
latency: Date.now() - start,
};
} catch (error) {
health.status = "unhealthy";
health.checks.database = {
status: "down",
error: (error as Error).message,
};
}
// Check Redis
try {
const start = Date.now();
await redis.ping();
health.checks.redis = {
status: "up",
latency: Date.now() - start,
};
} catch (error) {
health.status = "unhealthy";
health.checks.redis = {
status: "down",
error: (error as Error).message,
};
}
const statusCode = health.status === "healthy" ? 200 : 503;
res.status(statusCode).json(health);
});
export default router;
# Add to docker-compose.monitoring.yml
services:
loki:
image: grafana/loki:2.9.4
ports:
- "3100:3100"
volumes:
- ./monitoring/loki-config.yml:/etc/loki/local-config.yaml:ro
- loki-data:/loki
command: -config.file=/etc/loki/local-config.yaml
restart: unless-stopped
promtail:
image: grafana/promtail:2.9.4
volumes:
- ./monitoring/promtail-config.yml:/etc/promtail/config.yml:ro
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
command: -config.file=/etc/promtail/config.yml
restart: unless-stopped
Sample prompts:
Create a Prometheus + Grafana + Alertmanager setup with Docker Compose, plus alert rules for HTTP error rate and response time
Create a health check endpoint for an Express.js application that checks the database and Redis
Create a Loki + Promtail configuration to collect logs from Docker containers
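For the Loki + Promtail prompt above, a minimal promtail-config.yml that tails Docker container logs might look like this (paths assume the default json-file log driver):

```yaml
# monitoring/promtail-config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
```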
# .env.example - environment variable template (safe to commit)
NODE_ENV=development
PORT=3000
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/myapp
# Redis
REDIS_URL=redis://localhost:6379
# JWT
JWT_SECRET=change-this-secret
JWT_EXPIRES_IN=7d
# External APIs
STRIPE_SECRET_KEY=sk_test_xxx
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
# .gitignore - NEVER commit real .env files
.env
.env.local
.env.production
*.pem
*.key
# docker-compose.yml with Docker Secrets
version: "3.9"
services:
app:
image: myapp:latest
secrets:
- db_password
- jwt_secret
- api_key
environment:
- DB_PASSWORD_FILE=/run/secrets/db_password
- JWT_SECRET_FILE=/run/secrets/jwt_secret
secrets:
db_password:
file: ./secrets/db_password.txt # Development mode
jwt_secret:
file: ./secrets/jwt_secret.txt
api_key:
external: true # Production mode (created beforehand)
// Reading a Docker Secret in code
import { readFileSync } from "fs";
function getSecret(name: string): string {
// Try the Docker Secret first
const secretPath = `/run/secrets/${name}`;
try {
return readFileSync(secretPath, "utf8").trim();
} catch {
// Fall back to the environment variable
const envKey = name.toUpperCase();
const value = process.env[envKey];
if (!value) {
throw new Error(`Secret "${name}" not found`);
}
return value;
}
}
// Usage
const dbPassword = getSecret("db_password");
const jwtSecret = getSecret("jwt_secret");
// src/utils/aws-secrets.ts
import {
SecretsManagerClient,
GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";
const client = new SecretsManagerClient({ region: "ap-southeast-1" });
// Cache secrets to avoid repeated API calls
const secretCache = new Map<string, { value: string; expiry: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
export async function getSecret(secretName: string): Promise<string> {
// Check the cache
const cached = secretCache.get(secretName);
if (cached && cached.expiry > Date.now()) {
return cached.value;
}
const command = new GetSecretValueCommand({ SecretId: secretName });
const response = await client.send(command);
const value = response.SecretString!;
// Store in the cache
secretCache.set(secretName, {
value,
expiry: Date.now() + CACHE_TTL,
});
return value;
}
// Load multiple secrets at once
export async function loadSecrets(): Promise<Record<string, string>> {
const secretName = `myapp/${process.env.NODE_ENV}/config`;
const raw = await getSecret(secretName);
return JSON.parse(raw);
}
# Terraform: create the secret in AWS Secrets Manager
resource "aws_secretsmanager_secret" "app_config" {
name = "myapp/${var.environment}/config"
recovery_window_in_days = 7
tags = {
Environment = var.environment
}
}
resource "aws_secretsmanager_secret_version" "app_config" {
secret_id = aws_secretsmanager_secret.app_config.id
secret_string = jsonencode({
DATABASE_URL = "postgresql://${var.db_user}:${var.db_password}@${aws_db_instance.postgres.endpoint}/${var.db_name}"
JWT_SECRET = var.jwt_secret
REDIS_URL = "redis://${aws_elasticache_cluster.redis.cache_nodes[0].address}:6379"
})
}
# Start a Vault server (development mode)
vault server -dev
# Store a secret
vault kv put secret/myapp/config \
database_url="postgresql://user:pass@db:5432/myapp" \
jwt_secret="super-secret-key" \
api_key="sk-xxx"
# Read a secret
vault kv get secret/myapp/config
# Read a specific field
vault kv get -field=database_url secret/myapp/config
# Docker Compose for Vault
services:
vault:
image: hashicorp/vault:1.15
ports:
- "8200:8200"
environment:
VAULT_DEV_ROOT_TOKEN_ID: "dev-root-token"
VAULT_DEV_LISTEN_ADDRESS: "0.0.0.0:8200"
cap_add:
- IPC_LOCK
volumes:
- vault-data:/vault/data
Sample prompts:
Create a secret management layer for a Node.js application: support Docker Secrets, AWS Secrets Manager, and fall back to environment variables
Create a Terraform config to set up AWS Secrets Manager for production environment variables
/engineering-skills:senior-devops          # In-depth DevOps consulting
/engineering-skills:aws-solution-architect # AWS architecture
/engineering-skills:senior-secops          # Security Operations
# Step 1: Design the architecture
/engineering-skills:senior-architect
> Design a microservices architecture for an e-commerce application
# Step 2: Write Terraform
/engineering-skills:aws-solution-architect
> Create Terraform for the architecture designed above
# Step 3: Set up CI/CD
/engineering-skills:senior-devops
> Create a GitHub Actions pipeline to deploy the microservices to ECS
# Step 4: Monitoring
/engineering-skills:senior-devops
> Set up Prometheus + Grafana monitoring for the microservices system
# Step 5: Security audit
/engineering-skills:senior-secops
> Audit the security of the entire infrastructure and CI/CD pipeline
Create a complete DevOps setup for a Node.js TypeScript project:
1. Multi-stage Dockerfile (dev + production)
2. docker-compose.yml (app + postgres + redis)
3. GitHub Actions CI/CD
4. Nginx reverse proxy config
5. Makefile for common commands
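Item 5 in the prompt above asks for a Makefile; a minimal sketch of the kind of targets such a prompt tends to produce (target names and commands are assumptions):

```makefile
.PHONY: dev build test logs deploy

dev:     ## Start the dev environment
	docker compose up -d --build

build:   ## Build the production image
	docker build -t myapp:latest .

test:    ## Run lint and tests
	npm run lint && npm run test

logs:    ## Tail application logs
	docker compose logs -f app

deploy:  ## Deploy via compose on the server
	docker compose pull && docker compose up -d --remove-orphans
```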
Analyze the current Dockerfile and optimize it:
- Reduce image size
- Make better use of the layer cache
- Add security best practices (non-root user, .dockerignore)
- Multi-stage build if not already present
Create a GitHub Actions workflow for a monorepo containing:
- packages/api (Node.js)
- packages/web (Next.js)
- packages/shared (common code)
Only build/deploy packages that changed (path filter)
Add a database migration step to the CI/CD pipeline:
- Run migrations automatically on deploy
- Roll back migrations if the deploy fails
- Back up the database before migrating production
Design a blue-green deployment for an application on AWS:
- 2 target groups (blue and green)
- ALB switches traffic between the two groups
- Health check before switching
- Fast rollback capability
Configure auto-scaling for Kubernetes:
- HPA based on CPU and memory
- Custom metrics (requests per second)
- Scale-to-zero for the staging environment
- PodDisruptionBudget for high availability
Design a disaster recovery plan:
- Automatic daily database backups (RDS snapshots)
- Cross-region replication for S3
- Recovery time objective (RTO): 1 hour
- Recovery point objective (RPO): 15 minutes
- Runbook for each scenario
Set up centralized logging:
- Collect logs from all containers
- Structured logging (JSON format)
- Log rotation and retention policy
- Grafana dashboard for log analysis
- Alert on unusual error patterns
Configure automatic SSL/TLS:
- Let's Encrypt with cert-manager on Kubernetes
- Auto-renewal of certificates
- Force HTTPS redirect
- HSTS headers
Set up performance monitoring for a Node.js application:
- APM with Prometheus metrics (http_request_duration, memory, cpu)
- Database query performance tracking
- Redis cache hit/miss ratio
- Grafana dashboard with the key panels
- Alert when p95 response time > 2s
Add security scanning to CI/CD:
- Scan Docker images with Trivy
- Scan dependencies with npm audit / Snyk
- SAST (Static Application Security Testing)
- Block deploys on critical vulnerabilities
Create configuration for 3 environments: dev, staging, production
- Terraform workspaces or separate directories
- Different environment variables per environment
- GitHub Actions workflow with environment protection rules
- Auto-deploy staging, manual approval for production
Use a .dockerignore to exclude unneeded files from the build context, and avoid the mutable latest tag for production images. Tip: use /engineering-skills:senior-devops when you need in-depth DevOps advice; combine it with /engineering-skills:aws-solution-architect for AWS projects and /engineering-skills:senior-secops for infrastructure security audits.