chore(07-03): Stage deployment scripts and documentation updates

2026-03-03 19:24:29 +01:00
parent fa766b21f7
commit 1104f6360e
4 changed files with 722 additions and 377 deletions
+380 -172
@@ -1,17 +1,52 @@
# Gravl Deployment Guide
This guide covers how to deploy the Gravl application, verify deployments, and troubleshoot common issues.
This guide covers how to deploy Gravl's backend and frontend services using automated scripts, verify deployment status, and handle troubleshooting and recovery scenarios.
---
## Overview
Gravl uses Docker and Docker Compose for containerization. Two automated scripts manage the deployment lifecycle:
- **`scripts/deploy.sh`**: Pulls latest code, builds fresh images (with `--no-cache` to prevent stale assets), and starts containers with health checks
- **`scripts/build-check.sh`**: Verifies that running containers match the current git HEAD (detects stale deployments)
---
## Prerequisites
- Docker and Docker Compose installed
- Git repository with remote configured
- Access to `/workspace/gravl` directory
- Backend API listening on `http://localhost:3001/api/health`
Before deploying, ensure you have:
## Deployment Script
1. **Docker & Docker Compose** installed and running
```bash
docker --version
docker compose version
```
### Running a Deployment
2. **Git** configured with push/pull access to the repository
```bash
git remote -v
```
3. **Network access** to required ports:
- Backend: `localhost:3001` (health check at `http://localhost:3001/api/health`)
- Frontend: `localhost:3000` (or as configured in `docker-compose.yml`)
4. **Sufficient disk space** for Docker images and volumes
```bash
docker system df
```
5. **No conflicting services** using ports 3000-3001
```bash
lsof -i :3000 -i :3001 # (macOS/Linux only)
```
---
## How to Run `deploy.sh`
### Basic Usage
```bash
cd /workspace/gravl
@@ -20,57 +55,98 @@ scripts/deploy.sh
### What It Does
1. **Pulls latest code:** `git pull`
2. **Captures build metadata:**
- Git commit SHA
1. **Git Pull**: Fetches and merges latest code from remote
- Exits if merge conflicts occur (manual resolution required)
2. **Captures Metadata**:
- Current git commit hash
- Build timestamp
3. **Builds fresh images:** `docker compose build --no-cache`
- `--no-cache` ensures all layers are rebuilt (prevents stale assets)
4. **Restarts containers:** `docker compose up -d --force-recreate`
5. **Health check:** Polls `/api/health` for up to 60 seconds
6. **Logs deployment:** Records all steps to `logs/deploy.log`
- These are stored as Docker image labels for later verification
3. **Builds Docker Images** (`--no-cache`):
- Rebuilds all layers (no caching) to prevent stale assets
- Applies git commit and build timestamp as labels
4. **Starts Containers**:
- Uses `docker compose up -d --force-recreate` to ensure clean start
- Both backend and frontend containers are started
5. **Health Check**:
- Waits up to 60 seconds for backend to respond on `/api/health`
- Retries every 5 seconds (12 attempts max)
- Fails with exit code 1 if health check times out
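The health-check step above amounts to a bounded retry loop. The following is a minimal sketch of that idea, not the actual `deploy.sh` code: the function name `wait_for_health` and the `PROBE` override (a hook for swapping out `curl` when testing) are assumptions for illustration, while the 12 attempts, 5-second interval, and error message come from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the health-check loop; the real deploy.sh may differ.

# wait_for_health URL [ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 as soon as the probe succeeds, 1 once all attempts are used up.
wait_for_health() {
  local url="$1" attempts="${2:-12}" interval="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # PROBE is an assumed testing hook. By default, curl -sf fails on
    # connection errors and on HTTP status codes >= 400.
    if ${PROBE:-curl -sf -o /dev/null} "$url"; then
      echo "Backend healthy after attempt $i"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "ERROR: Health check failed after $((attempts * interval))s" >&2
  return 1
}
```

Under these assumptions, `wait_for_health http://localhost:3001/api/health || exit 1` would reproduce the exit-code-1 behaviour described in step 5.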
### Output Example
### Exit Codes
| Code | Meaning | Next Steps |
|------|---------|-----------|
| 0 | Success | Deployment complete; containers healthy |
| 1 | Failure | See troubleshooting below |
### Logs
All deploy activity is logged to `logs/deploy.log`:
```bash
tail -50 logs/deploy.log # Last 50 lines
grep ERROR logs/deploy.log # Find errors
```
[2026-03-03 18:30:00] === Deploy started ===
[2026-03-03 18:30:01] Pulling latest code...
[2026-03-03 18:30:05] Commit: 53f4df6 | Date: 2026-03-03T18:30:00Z
[2026-03-03 18:30:06] Building images (--no-cache)...
[2026-03-03 18:30:45] Starting containers...
[2026-03-03 18:30:50] Health check...
[2026-03-03 18:30:55] Backend healthy
[2026-03-03 18:30:56] === Deploy complete: 53f4df6 ===
```
### Environment Variables
Optional env vars can be set before running `deploy.sh`:
| Variable | Default | Purpose |
|----------|---------|---------|
| `GIT_COMMIT` | auto-detected | Override git commit label (not recommended) |
| `BUILD_DATE` | auto-detected | Override build timestamp (not recommended) |
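
One plausible way a script could honour these overrides is shell default expansion. This is a sketch under assumptions, not the confirmed `deploy.sh` logic: the function name `resolve_build_metadata` is hypothetical, and the timestamp format is inferred from the example output in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep GIT_COMMIT / BUILD_DATE if already set,
# otherwise auto-detect them.
resolve_build_metadata() {
  # ${VAR:-default} only evaluates the command substitution
  # when VAR is unset or empty.
  GIT_COMMIT="${GIT_COMMIT:-$(git rev-parse --short HEAD)}"
  BUILD_DATE="${BUILD_DATE:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  export GIT_COMMIT BUILD_DATE
}
```

With this pattern, `GIT_COMMIT=abc1234 scripts/deploy.sh` would label the images with `abc1234` instead of the detected commit, which is why the overrides are marked as not recommended.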
---
## Checking Deployment Status
## How to Check Build Status (`build-check.sh`)
### Build Status Check
Run this command anytime to verify deployed containers match your local code:
```bash
cd /workspace/gravl
scripts/build-check.sh
```
### Output Example
**Healthy deployment:**
```
Local HEAD: 53f4df6 (53f4df6f8a5c4d2e1f0a9b8c7d6e5f4a3b2c1d0)
Local HEAD: abc1234 (abc1234567890abcdef1234567890abcdef123456)
[gravl-backend] Built: 53f4df6 on 2026-03-03T18:30:00Z
[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] OK: up to date
[gravl-frontend] Built: 53f4df6 on 2026-03-03T18:30:00Z
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] OK: up to date
```
### What the Check Tells You
**Stale containers (code updated, not redeployed):**
```
Local HEAD: xyz5678 (xyz5678...)
- **OK: up to date** - Containers match the local git commit (everything is current)
- **STALE: container is behind local code** - Code has changed but containers haven't been redeployed yet
- **WARNING: no build label found** - Container predates phase 07-02 and lacks build tracking labels
[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] STALE: container is behind local code — run scripts/deploy.sh
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] STALE: container is behind local code — run scripts/deploy.sh
```
**Missing labels (container built manually, not via deploy.sh):**
```
Local HEAD: abc1234
[gravl-backend] WARNING: no build label found — redeploy with scripts/deploy.sh to add tracking
[gravl-frontend] Not running
```
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All checks completed. Warnings and missing containers are reported in the output but don't cause a non-zero exit |
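
The three statuses shown above come down to comparing the local commit with the commit recorded on the container. The sketch below illustrates that comparison only; the function name `classify_build` and the label key `git.commit` are assumptions, not taken from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the comparison build-check.sh reports on.

# classify_build LOCAL_SHA BUILT_SHA
# BUILT_SHA is the commit read from the container's build label (empty if absent).
classify_build() {
  local local_sha="$1" built_sha="$2"
  if [ -z "$built_sha" ]; then
    echo "WARNING: no build label found"
    return 0
  fi
  case "$local_sha" in
    # Prefix match, so a short label (abc1234) matches the full local SHA.
    "$built_sha"*) echo "OK: up to date" ;;
    *) echo "STALE: container is behind local code" ;;
  esac
}

# Example wiring (the label key "git.commit" is an assumption):
#   built="$(docker inspect --format '{{ index .Config.Labels "git.commit" }}' gravl-backend)"
#   classify_build "$(git rev-parse HEAD)" "$built"
```

The function always returns 0, mirroring the exit-code table: the status is carried in the output, not in the exit code.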
---
@@ -78,215 +154,347 @@ Local HEAD: 53f4df6 (53f4df6f8a5c4d2e1f0a9b8c7d6e5f4a3b2c1d0)
### Health Check Failures
**Symptom:** Deployment fails with "ERROR: Health check failed after 60s"
**Symptom:** `ERROR: Health check failed after 60s`
**Possible Causes & Solutions:**
**Causes & Solutions:**
| Cause | Solution |
|-------|----------|
| Backend not starting | Check logs: `docker compose logs gravl-backend` |
| Health endpoint not implemented | Implement `GET /api/health` in backend (returns `200 OK`) |
| Network issues | Verify network: `docker network inspect gravl` or restart: `docker compose restart` |
| Port already in use | Check: `lsof -i :3001` and kill the process or change port |
| Insufficient resources | Check disk space with `df -h`; free space with `docker system prune` or reduce image size |
1. **Backend service didn't start**
```bash
docker logs gravl-backend | tail -20
# Look for:
# - Port conflicts (EADDRINUSE)
# - Missing dependencies (module not found)
# - Database connection errors
```
2. **Port 3001 is already in use**
```bash
lsof -i :3001 # Find what's using it
docker port gravl-backend # Check exposed port
kill -9 <PID> # Kill conflicting process (if safe)
scripts/deploy.sh # Retry
```
3. **Network issue between host and container**
```bash
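# Containers on a Compose network report their IP per network, not in .NetworkSettings.IPAddress
docker inspect gravl-backend --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'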
curl -sf http://<container-ip>:3001/api/health # Test directly
```
4. **Backend code has syntax error**
```bash
docker logs gravl-backend 2>&1 | grep -i "syntax\|error\|exception"
# Check backend/src/index.js for obvious errors
# Revert recent changes: git log --oneline -5 && git checkout <good-commit>
```
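
The log patterns listed in the causes above can be checked in one pass. This is a hypothetical triage helper, offered as a sketch: the name `diagnose_backend_log` is invented for illustration, and the patterns assume a Node.js backend (suggested by `backend/src/index.js`), where a missing dependency typically surfaces as `Cannot find module`.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: reads backend log text on stdin and prints
# one hint per known failure pattern. Returns 0 if any pattern matched, 1 otherwise.
diagnose_backend_log() {
  local log found=1
  log="$(cat)"
  case "$log" in *EADDRINUSE*)
    echo "port conflict: something else is bound to the backend port"
    found=0 ;;
  esac
  case "$log" in *"Cannot find module"*)
    echo "missing dependency: a required module is not installed in the image"
    found=0 ;;
  esac
  case "$log" in *ECONNREFUSED*)
    echo "connection refused: a dependency such as the database is unreachable"
    found=0 ;;
  esac
  return "$found"
}
```

A typical call would be `docker logs gravl-backend 2>&1 | tail -n 50 | diagnose_backend_log`; a return code of 1 means none of the known patterns appeared and the logs need a manual read.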
**Quick recovery:**
**Manual Restart:**
```bash
docker compose restart gravl-backend
# Wait a few seconds
curl -sf http://localhost:3001/api/health
# 1. Stop everything
docker compose down
# 2. Check backend logs
docker compose up -d gravl-backend
sleep 5
docker logs gravl-backend | tail -50
# 3. If logs show errors, fix code and retry
git diff HEAD~1..HEAD backend/src/
# ... fix issues ...
scripts/deploy.sh
```
---
### Stale Containers
**Symptom:** `build-check.sh` shows "STALE: container is behind local code"
**Symptom:** `build-check.sh` shows `STALE: container is behind local code`
**Cause:** Code has been updated but containers haven't been redeployed.
**Causes:**
- Code was updated (`git pull`) but `deploy.sh` hasn't been run
- Deployment failed partway through
- Manual restart without redeploy
**Solution:**
```bash
scripts/deploy.sh
scripts/build-check.sh # Should now show OK
scripts/build-check.sh # Verify update
```
---
### Missing Docker Labels
### Missing Build Labels
**Symptom:** `build-check.sh` shows "WARNING: no build label found"
**Symptom:** `WARNING: no build label found — redeploy with scripts/deploy.sh`
**Cause:** Containers were built before phase 07-02 (before labels were added).
**Causes:**
- Container was built with `docker compose build` directly (not via `deploy.sh`)
- Container predates the labeling system
**Solution:**
```bash
scripts/deploy.sh # Rebuilds with labels
```
---
### Deployment Hangs
**Symptom:** `scripts/deploy.sh` doesn't complete or appears stuck.
**Possible Causes & Solutions:**
| Symptom | Solution |
|---------|----------|
| Stuck at "Building images" | Docker build is slow (every layer is rebuilt with `--no-cache`). If disk space is low, free build cache with `docker builder prune` |
| Stuck at "Health check" | Backend not responding. Try: `docker compose logs` to see errors |
| Git pull conflicts | Resolve conflicts manually: `cd /workspace/gravl && git status` |
**Force Stop:**
```bash
# Kill the deploy script
pkill -f scripts/deploy.sh
# Manually check status
docker compose ps
docker compose logs
```
---
## Rollback Procedures
### Quick Rollback
If the current deployment is broken:
```bash
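# Revert to previous commit
# Note: deploy.sh runs `git pull`, which would fast-forward back to the
# reverted commit if it is still on the remote branch
git reset --hard HEAD~1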
# Redeploy
scripts/deploy.sh
# Verify
scripts/build-check.sh
```
### Multi-Commit Rollback
If you need to go back several commits:
```bash
# View recent commits
git log --oneline -10
# Rollback to a specific commit (example: abc1234)
git reset --hard abc1234
# Redeploy
# Re-deploy to add labels
scripts/deploy.sh
```
### Rollback Verification
---
After rolling back, verify the system is stable:
### Container Won't Start (Restarting / Exited)
**Symptom:** `docker compose ps` shows container in "Exited" state
**Steps:**
1. **Check container logs**
```bash
docker logs gravl-backend --tail 50
docker logs gravl-frontend --tail 50
```
2. **Check docker-compose.yml for typos**
```bash
docker compose config # Validates syntax
```
3. **Inspect health check endpoint**
```bash
curl -v http://localhost:3001/api/health
# Should see HTTP 200, not 404 or 500
```
4. **If all else fails, clean rebuild**
```bash
docker compose down
docker rmi gravl-backend gravl-frontend
docker system prune -f
scripts/deploy.sh
```
---
### Database Connection Issues
**Symptom:** Backend logs show `Connection refused` or `ECONNREFUSED`
**Causes:**
- Database service not running
- Wrong host/port in `.env` or backend code
- Network issue between containers
**Solutions:**
1. **Check database service status** (if applicable)
```bash
docker compose ps # All services running?
docker network ls # Check gravl network exists
```
2. **Verify connection string in `.env`**
```bash
grep -i database .env
# Should match docker-compose.yml service name (e.g., gravl-db:5432)
```
3. **Test connection from backend container**
```bash
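docker exec gravl-backend ping -c 3 gravl-db      # Name resolution and reachability (needs ping in the image)
docker exec gravl-backend nc -zv gravl-db 5432    # TCP port check (needs nc in the image; a database port does not speak HTTP)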
```
---
### Disk Space Issues
**Symptom:** `no space left on device` during build
**Solution:**
```bash
# Check containers match the previous code
scripts/build-check.sh
# Check disk usage
docker system df
# Check API is healthy
curl -sf http://localhost:3001/api/health | jq .
# Clean up unused images/containers
# WARNING: --volumes also deletes unused volumes; omit it if they may hold data you need
docker system prune -a --volumes
# Check frontend is responsive
curl -sf http://localhost:3000/ | head -c 500
# Then retry deploy
scripts/deploy.sh
```
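
To catch this before a long `--no-cache` build starts, a pre-flight check can compare free space against a threshold. This is a sketch under assumptions: the function name `has_free_space` is hypothetical, and the right threshold depends on your image sizes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: is there at least MIN_KB free
# on the filesystem that holds PATH?
# Usage: has_free_space PATH MIN_KB
has_free_space() {
  local path="$1" min_kb="$2" free_kb
  # df -P prints one line per filesystem; with -k, column 4 is
  # the available space in 1K blocks.
  free_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  [ "$free_kb" -ge "$min_kb" ]
}
```

For example, `has_free_space /var/lib/docker 5242880 || echo "Less than 5 GiB free"` checks Docker's default data directory on Linux; both the path and the 5 GiB figure are illustrative.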
---
## Manual Container Cleanup
## Recovery Procedures
If containers become corrupted or stuck:
### Manual Rollback to Previous Commit
Use this when the deployed code is broken and you need to quickly revert.
```bash
# Stop all containers
# 1. Find the last good commit
git log --oneline -10 # Review recent commits
# 2. Check out the known-good commit
# Note: this leaves a detached HEAD, where a plain `git pull` (step 1 of deploy.sh) can fail
git checkout <commit-hash>
# 3. Redeploy
scripts/deploy.sh
# 4. Verify
scripts/build-check.sh
curl -sf http://localhost:3001/api/health
# 5. Document the incident
echo "Rolled back to <commit-hash> due to <reason>" >> logs/rollback.log
```
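
Step 5 is easier to audit later if every entry carries a timestamp. The helper below is a hypothetical sketch: the function name `log_rollback` and the `ROLLBACK_LOG` override are assumptions, while the default path `logs/rollback.log` and the message wording follow the step above.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 5: append a timestamped rollback entry.
# Usage: log_rollback COMMIT REASON
log_rollback() {
  local commit="$1" reason="$2"
  # ROLLBACK_LOG is an assumed override; the default matches the path used above.
  local log_file="${ROLLBACK_LOG:-logs/rollback.log}"
  mkdir -p "$(dirname "$log_file")"
  printf '[%s] Rolled back to %s due to %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$commit" "$reason" >>"$log_file"
}
```

Calling `log_rollback abc1234 "health check failing after deploy"` appends one line and never overwrites earlier entries.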
### Emergency Container Cleanup
Use this when containers are hung, corrupted, or in an unknown state.
```bash
# 1. Stop all services
docker compose down
# Remove volumes (WARNING: deletes data)
docker compose down -v
# 2. Remove images (forces fresh rebuild)
docker rmi gravl-backend gravl-frontend
# Verify they're gone
docker compose ps
# 3. Clear unused volumes (optional; use with caution!)
# docker volume prune
# Full redeploy
# 4. Rebuild from scratch
scripts/deploy.sh
# 5. Verify all containers running and healthy
docker compose ps
scripts/build-check.sh
curl -sf http://localhost:3001/api/health
```
**Safety Check:** If your data is in Docker volumes, `docker volume prune` will destroy them. Skip this step unless you're sure you don't need the data.
### Staged Rollback (Zero-Downtime)
If you're running a blue-green deployment setup:
```bash
# 1. Deploy to green environment
cd /path/to/green
git pull && docker compose build --no-cache && docker compose up -d
# 2. Test green (health check, smoke tests)
curl -sf http://green-backend:3001/api/health
# 3. Switch traffic to green (via load balancer or DNS)
# (Implementation depends on your infrastructure)
# 4. If green has issues, revert traffic to blue immediately
# (Blue kept serving; no downtime)
# 5. Debug green offline
docker logs gravl-backend
```
---
## Monitoring & Logs
## Monitoring After Deployment
### Deployment Log
### Immediate Checks (after `deploy.sh` completes)
```bash
tail -f logs/deploy.log
# Containers are running
docker compose ps
# Backend is healthy
curl -sf http://localhost:3001/api/health | jq .
# Containers match local code
scripts/build-check.sh
# Logs have no errors
docker logs gravl-backend 2>&1 | grep -i error | head -5
```
### Container Logs
### Ongoing Checks (periodically)
```bash
# Backend logs
docker compose logs gravl-backend
# Run build-check regularly (cron every 30 min, or manual)
scripts/build-check.sh
# Frontend logs
docker compose logs gravl-frontend
# Monitor resource usage
docker stats gravl-backend gravl-frontend
# All logs with timestamps
docker compose logs --timestamps --follow
# Audit logs for issues
docker logs gravl-backend --since 1h 2>&1 | grep ERROR
```
### Build Info
### Example Monitoring Script
```bash
# List deployed images
docker images | grep gravl
#!/bin/bash
# Save as scripts/health-monitor.sh
set -euo pipefail
# Inspect container labels (build metadata)
docker inspect gravl-backend | jq '.Config.Labels'
HEALTHY=true
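# Check containers running
# (grep without -q reads all input, so the producer never hits SIGPIPE under pipefail)
docker compose ps | grep "Up" >/dev/null || HEALTHY=false
# Check health endpoint (discard the response body)
curl -sf -o /dev/null http://localhost:3001/api/health || HEALTHY=false
# Check for stale containers
if scripts/build-check.sh | grep "STALE" >/dev/null; then HEALTHY=false; fi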
if [ "$HEALTHY" = "true" ]; then
echo "[$(date)] Gravl is healthy ✓"
else
echo "[$(date)] Gravl has issues! See above." >&2
exit 1
fi
```
---
## Best Practices
1. **Always test in staging first** — Validate the deploy in a non-production environment
2. **Check status before deploying** — Run `scripts/build-check.sh` to ensure no stale containers
3. **Review logs after deployment** — Check `logs/deploy.log` for warnings or errors
4. **Plan rollbacks** — Know which commits are stable before deploying
5. **Monitor health endpoints** — Regularly ping `/api/health` in production
6. **Backup before major changes** — Tag releases in git before significant deployments
7. **Use semantic commits** — Make it easy to identify which commits introduced changes
1. **Always run `build-check.sh` before deploying changes**
- Ensures you know current state
- Catches stale containers early
2. **Review changes before deploying**
```bash
git log --oneline -5 # Recent commits
git diff origin/main..HEAD # What will be deployed
```
3. **Test in staging first**
- Separate staging environment for pre-production testing
- Deploy to staging, verify, then deploy to production
4. **Keep logs rotated**
- `logs/deploy.log` can grow large
- Use `logrotate` or manual cleanup (keeps the last 1000 lines as a backup, then truncates): `tail -n 1000 logs/deploy.log > logs/deploy.log.1 && : > logs/deploy.log`
5. **Automate regular checks**
- Cron job to run `build-check.sh` every 30 minutes
- Send alerts if "STALE" or "WARNING" found
6. **Document rollbacks**
- Always log why you rolled back
- Review patterns (e.g., "rolled back 3 times this week" = code review process failing)
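
The automated check in item 5 could be built from a small filter over `build-check.sh` output. This is a sketch, not an existing script: the function name `scan_build_check` and the wrapper name `scripts/alert-on-stale.sh` are hypothetical, and how the alert is delivered is left to your environment.

```bash
#!/usr/bin/env bash
# Hypothetical alert filter: reads build-check.sh output on stdin,
# prints lines that need attention to stderr, and returns 1 if any
# were found so a cron wrapper can raise an alert.
scan_build_check() {
  local problems
  # grep exits 1 when nothing matches; "|| true" keeps that from
  # aborting a caller that runs under set -e.
  problems="$(grep -E 'STALE|WARNING' || true)"
  if [ -n "$problems" ]; then
    printf '%s\n' "$problems" >&2
    return 1
  fi
  return 0
}

# Example crontab entry (every 30 minutes). The wrapper script name is an
# assumption; it would source this function and call scan_build_check:
#   */30 * * * * cd /workspace/gravl && scripts/build-check.sh | scripts/alert-on-stale.sh
```

Because `build-check.sh` exits 0 even when containers are stale, filtering its output like this is what turns a status report into something a scheduler can act on.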
---
## FAQ
## See Also
**Q: Can I deploy without building (e.g., just restart containers)?**
A: No. The script always rebuilds to prevent stale code. This is intentional for safety.
**Q: How long should a deployment take?**
A: Typically 60-90 seconds (build time + health check). If longer, check Docker build performance.
**Q: What if I need to deploy a specific commit?**
A: Check it out first, then deploy:
```bash
git checkout <commit-sha>
scripts/deploy.sh
```
**Q: Can I skip the health check?**
A: Not recommended. The health check prevents deploying broken code. Fix the health endpoint instead.
**Q: What data is lost if I rollback?**
A: Container rollback only reverts code. Database data persists unless you run `docker compose down -v`.
- **Testing**: [DEPLOYMENT_TEST_PLAN.md](./DEPLOYMENT_TEST_PLAN.md) — comprehensive test scenarios
- **Code style**: [CODING-CONVENTIONS.md](./CODING-CONVENTIONS.md)
- **Architecture**: Backend README or architecture docs (if available)
---
**Last Updated:** 2026-03-03
**Document Version:** 1.0
**Phase:** 07-03
*Last updated: 2026-03-03 | Maintained by: Gravl Development Team*