# Production Readiness Implementation Plan
# Phase 10-07, Task 5: EXECUTION ROADMAP

- **Date:** 2026-03-07
- **Status:** IMPLEMENTATION READY
- **Owner:** Backend-Dev (execution) + Architect (oversight)
- **Target Completion:** +6-8 hours from start (by ~09:30-11:30 CET Saturday)

---

## Executive Summary

Task 5 (Production Readiness Review) has **4 critical blockers** preventing production launch. This document provides the exact implementation steps for each blocker, with pre-written Kubernetes manifests and validation procedures.

**All 4 blockers have templates ready in `/workspace/gravl/k8s/production/`:**

1. `cert-manager-setup.yaml`: TLS automation
2. `sealed-secrets-setup.yaml`: Secrets encryption
3. `network-policy-with-dns.yaml`: Network egress fix
4. `load-test.js` + execution instructions

---

## Critical Path Execution (Ordered by Dependency)

### ✅ Blocker 1: TLS/cert-manager Setup (Dependency: None)

**File:** `k8s/production/cert-manager-setup.yaml`
**Status:** READY FOR IMPLEMENTATION

#### Steps:

```bash
# 1. Install cert-manager controller (official release)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# 2. Verify installation
kubectl rollout status deployment/cert-manager-webhook -n cert-manager --timeout=120s
kubectl rollout status deployment/cert-manager -n cert-manager --timeout=120s

# 3. Apply ClusterIssuers (Let's Encrypt prod + staging)
kubectl apply -f k8s/production/cert-manager-setup.yaml

# 4. Verify issuers created
kubectl get clusterissuer -A
# Expected output:
# NAME                  READY   AGE
# letsencrypt-prod      True    2m
# letsencrypt-staging   True    2m
# selfsigned-issuer     True    2m

# 5. Create Cloudflare API token secret (MANUAL)
kubectl create secret generic cloudflare-api-token \
  --from-literal=api-token=YOUR_CLOUDFLARE_API_TOKEN \
  -n cert-manager

# 6. Update Ingress with cert-manager annotation (already in template)
# Ingress automatically requests certificate once annotation is set
kubectl apply -f k8s/production/cert-manager-setup.yaml

# 7. Verify certificate creation
kubectl get certificate -A
kubectl get secret -A | grep gravl-tls-prod
```

#### Validation Checklist:

- [ ] cert-manager pods running in cert-manager namespace
- [ ] ClusterIssuers show READY=True
- [ ] Certificate created in gravl-prod namespace
- [ ] TLS secret `gravl-tls-prod` exists
- [ ] HTTPS accessible on gravl.app + api.gravl.app
- [ ] cert-manager logs show no errors

**Estimated Duration:** 10-15 minutes (certificate issuance may take 1-2 minutes)

---

### ✅ Blocker 2: Secrets Management (Dependency: None; parallel with TLS)

**File:** `k8s/production/sealed-secrets-setup.yaml`
**Status:** TWO OPTIONS (choose one)

#### OPTION A: sealed-secrets (kubeseal), RECOMMENDED for simplicity

```bash
# 1. Install sealed-secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml

# 2. Verify installation
kubectl rollout status deployment/sealed-secrets-controller -n kube-system --timeout=120s

# 3. Extract sealing key (for backup + disaster recovery)
# The controller labels its key secrets with sealedsecrets.bitnami.com/sealed-secrets-key=active
mkdir -p /secure/location
kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key=active \
  -o jsonpath='{.items[0].data.tls\.crt}' | base64 -d > /secure/location/sealed-secrets-prod.crt
kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key=active \
  -o jsonpath='{.items[0].data.tls\.key}' | base64 -d > /secure/location/sealed-secrets-prod.key

# 4. Create plain secret (temporary)
# cat <… (the rest of this command is missing from the source)
```

*[The source text is truncated at this point. The remainder of Blocker 2, all of Blocker 3 (network policy), and the opening steps of Blocker 4 (load test) are not available.]*

```bash
# … 100 req/s

# 5. Save results to file for documentation
k6 run --out json=load-test-results.json k8s/production/load-test.js

# 6. Upload results to shared documentation
mv load-test-results.json docs/load-test-baseline-2026-03-07.json
git add docs/load-test-baseline-*.json
git commit -m "Load test baseline: p95 <200ms, error rate <0.1%"
```

#### Validation Checklist:

- [ ] k6 installed and executable
- [ ] Load test completes without script errors
- [ ] p95 latency < 200ms ✅
- [ ] p99 latency < 500ms ✅
- [ ] Error rate < 0.1% ✅
- [ ] Results documented in `docs/load-test-baseline-2026-03-07.json`

**Estimated Duration:** 5-10 minutes (test runs for 5 minutes)

---

## Production Readiness Sign-Off Template

Once all blockers are complete, update `PRODUCTION_READINESS.md` with final sign-offs:

```markdown
## Final Sign-Off (2026-03-07)

### Security Review ✅ APPROVED
- [x] RBAC: Least privilege verified
- [x] Network Policies: Default deny + explicit allowlist (DNS egress added)
- [x] Secrets Management: sealed-secrets OR External Secrets Operator deployed
- [x] TLS/Encryption: cert-manager + Let's Encrypt configured
- [x] Image Scanning: Scheduled for [DATE]

### Performance Validation ✅ APPROVED
- [x] Load test baseline: p95 <200ms, error rate <0.1%
- [x] Database performance: Query latency acceptable
- [x] Pod resource limits: Configured and validated

### Operations Readiness ✅ APPROVED
- [x] Monitoring: Prometheus + Grafana operational
- [x] Alerting: AlertManager configured with receivers
- [x] Logging: [Loki workaround OR alternative configured]
- [x] Backup: Daily + weekly jobs validated
- [x] Runbooks: Created and tested

### Go-Live Authorization: ✅ APPROVED
**Authorized by:** [Architect/PM name]
**Date:** 2026-03-07
**Conditions:** All critical path items complete, load test passing, monitoring alerts active
```

---

## Rollback Readiness

If any blocker fails production testing:

```bash
# 1. Immediate rollback to staging-only
#    (kubectl scale needs a resource name or --all; --all scales every Deployment in the namespace)
kubectl scale deployment --all -n gravl-prod --replicas=0

# 2. Disable cert-manager for Ingress (revert to self-signed):
kubectl patch ingress gravl-ingress -n gravl-prod --type json \
  -p='[{"op":"remove","path":"/metadata/annotations/cert-manager.io~1cluster-issuer"}]'

# 3. Restore pre-cert-manager Ingress:
kubectl apply -f k8s/staging/ingress.yaml

# 4. Alert team: "Production deployment rolled back; investigation required"
```

---

## Success Criteria

Phase 10-07 is **COMPLETE** when:

- ✅ All 4 critical blockers resolved
- ✅ Load test baseline documented (p95 <200ms)
- ✅ Security sign-off checklist approved
- ✅ Monitoring + alerting operational
- ✅ Team authorization obtained
- ✅ Go-live procedure documented

**Ready to proceed to production launch.**

---

## Timeline Summary

| Blocker | Duration | Start | End |
|---------|----------|-------|-----|
| 1. cert-manager setup | 10-15 min | 03:40 | 03:55 |
| 2. Secrets mgmt (parallel) | 10-15 min | 03:40 | 03:55 |
| 3. Network policy (parallel) | 5-10 min | 03:40 | 03:50 |
| 4. Load test | 5-10 min | 04:00 | 04:10 |
| **Total** | **6-8 hours** | **03:40** | **~09:30-11:30** |

*(The total includes buffer for kubectl wait times, certificate issuance, etc.)*

---

- **Document Version:** 2.0 (Implementation Ready)
- **Last Updated:** 2026-03-07 03:45
- **Owner:** Gravl PM Autonomy / Architect
- **Next Review:** Before production launch
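
The load-test thresholds in the validation checklist (p95 < 200ms, p99 < 500ms, error rate < 0.1%) could be checked mechanically before sign-off. The following is a minimal sketch and is not one of the prepared templates: `below_limit`, `check_metric`, and `check_baseline` are hypothetical helper names, and the three metric values are assumed to be read off the k6 summary by hand, since the extraction step depends on output that is not shown in this document.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not part of k8s/production/.

# below_limit VALUE LIMIT
# Succeeds (exit 0) when VALUE is strictly less than LIMIT; awk handles decimals.
below_limit() {
  awk -v v="$1" -v l="$2" 'BEGIN { exit !((v + 0) < (l + 0)) }'
}

# check_metric NAME VALUE LIMIT
# Prints a PASS or FAIL line and returns 1 when the threshold is missed.
check_metric() {
  if below_limit "$2" "$3"; then
    echo "PASS $1: $2 < $3"
  else
    echo "FAIL $1: $2 is not below $3"
    return 1
  fi
}

# check_baseline P95_MS P99_MS ERROR_RATE_PERCENT
# Applies the checklist thresholds; returns 1 if any metric fails.
check_baseline() {
  local failed=0
  check_metric "p95 latency (ms)" "$1" 200 || failed=1
  check_metric "p99 latency (ms)" "$2" 500 || failed=1
  check_metric "error rate (%)" "$3" 0.1 || failed=1
  return "$failed"
}
```

For example, `check_baseline 180 450 0.05` (illustrative numbers, not measured results) prints three PASS lines and returns 0, so it could gate the `git commit` of the baseline file.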