# Gravl Deployment Guide
This guide covers how to deploy Gravl's backend and frontend services using automated scripts, verify deployment status, and handle troubleshooting and recovery scenarios.
---
## Overview
Gravl uses Docker and Docker Compose for containerization. Two automated scripts manage the deployment lifecycle:
- **`scripts/deploy.sh`**: Pulls latest code, builds fresh images (with `--no-cache` to prevent stale assets), and starts containers with health checks
- **`scripts/build-check.sh`**: Verifies that running containers match the current git HEAD (detects stale deployments)
---
## Prerequisites
Before deploying, ensure you have:
1. **Docker & Docker Compose** installed and running
```bash
docker --version
docker compose version
```
2. **Git** configured with push/pull access to the repository
```bash
git remote -v
```
3. **Network access** to required ports:
- Backend: `localhost:3001` (health check at `http://localhost:3001/api/health`)
- Frontend: `localhost:3000` (or whichever host port `docker-compose.yml` maps)
4. **Sufficient disk space** for Docker images and volumes
```bash
docker system df
```
5. **No conflicting services** using ports 3000-3001
```bash
lsof -i :3000 -i :3001 # (macOS/Linux only)
```
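The checks above can be combined into one preflight step that runs before `deploy.sh`. This is a minimal sketch, not part of the repository. `require_cmds`, `port_is_free`, and `preflight` are hypothetical helpers. The port probe uses Bash's `/dev/tcp` redirection instead of `lsof`, so it also works on hosts without `lsof`.

```bash
# Hypothetical preflight helpers (not shipped with Gravl); source them into your shell.

# Succeed if every named command is on PATH; report each missing one.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed if nothing accepts TCP connections on 127.0.0.1:<port>.
# Opening Bash's /dev/tcp pseudo-device attempts a connection; refusal means the port is free.
port_is_free() {
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Run all prerequisite checks; non-zero if any fail.
preflight() {
  local ok=0 port
  require_cmds docker git curl || ok=1
  docker compose version >/dev/null 2>&1 || { echo "docker compose plugin not available" >&2; ok=1; }
  for port in 3000 3001; do
    port_is_free "$port" || { echo "port $port is already in use" >&2; ok=1; }
  done
  return "$ok"
}
```

Usage: `preflight && scripts/deploy.sh`. If containers from a previous deploy are already running, ports 3000 and 3001 will show as in use. That result is expected, because `deploy.sh` recreates those containers.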
---
## How to Run `deploy.sh`
### Basic Usage
```bash
cd /workspace/gravl
scripts/deploy.sh
```
### What It Does
1. **Git Pull**: Fetches and merges latest code from remote
- Exits if merge conflicts occur (manual resolution required)
2. **Captures Metadata**:
- Current git commit hash
- Build timestamp
- These are stored as Docker image labels for later verification
3. **Builds Docker Images** (`--no-cache`):
- Rebuilds all layers (no caching) to prevent stale assets
- Applies git commit and build timestamp as labels
4. **Starts Containers**:
- Uses `docker compose up -d --force-recreate` to ensure clean start
- Both backend and frontend containers are started
5. **Health Check**:
- Waits up to 60 seconds for backend to respond on `/api/health`
- Retries every 5 seconds (12 attempts max)
- Fails with exit code 1 if health check times out
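The health-check step behaves like a bounded retry loop. The sketch below illustrates that behavior and is not the actual contents of `deploy.sh`. `wait_for` and `check_backend_health` are hypothetical helpers. The probe command is passed in as arguments, so you can reuse the helper for other checks.

```bash
# Hypothetical retry helper: run a probe command up to <attempts> times,
# sleeping <interval> seconds between failures.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# 12 attempts, 5 s apart: roughly the 60-second budget described above.
# curl -f turns HTTP 4xx/5xx into a non-zero exit; -o /dev/null discards the body.
check_backend_health() {
  wait_for 12 5 curl -sf -o /dev/null http://localhost:3001/api/health \
    || { echo "ERROR: Health check failed after 60s" >&2; return 1; }
}
```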
### Exit Codes
| Code | Meaning | Next Steps |
|------|---------|-----------|
| 0 | Success | Deployment complete; containers healthy |
| 1 | Failure | See troubleshooting below |
### Logs
All deploy activity is logged to `logs/deploy.log`:
```bash
tail -50 logs/deploy.log # Last 50 lines
grep ERROR logs/deploy.log # Find errors
```
### Environment Variables
Optional env vars can be set before running `deploy.sh`:
| Variable | Default | Purpose |
|----------|---------|---------|
| `GIT_COMMIT` | auto-detected | Override git commit label (not recommended) |
| `BUILD_DATE` | auto-detected | Override build timestamp (not recommended) |
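The sketch below shows how these defaults might be resolved. It assumes `deploy.sh` falls back to auto-detection only when a variable is unset or empty; the function name `resolve_build_metadata` is hypothetical. The timestamp format (UTC, ISO 8601) matches the `Built: ... on 2026-03-03T18:21:00Z` lines that `build-check.sh` prints. This guide does not show how `deploy.sh` attaches these values as image labels.

```bash
# Hypothetical: resolve build metadata, honoring values already set in the environment.
resolve_build_metadata() {
  # ${VAR:-default} runs the command substitution only when VAR is unset or empty
  GIT_COMMIT="${GIT_COMMIT:-$(git rev-parse HEAD)}"
  BUILD_DATE="${BUILD_DATE:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  export GIT_COMMIT BUILD_DATE
}
```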
---
## How to Check Build Status (`build-check.sh`)
Run this command anytime to verify deployed containers match your local code:
```bash
scripts/build-check.sh
```
### Output Example
**Healthy deployment:**
```
Local HEAD: abc1234 (abc1234567890abcdef1234567890abcdef123456)
[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] OK: up to date
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] OK: up to date
```
**Stale containers (code updated, not redeployed):**
```
Local HEAD: def5678 (def5678...)
[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] STALE: container is behind local code — run scripts/deploy.sh
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] STALE: container is behind local code — run scripts/deploy.sh
```
**Missing labels (container built manually, not via deploy.sh):**
```
Local HEAD: abc1234
[gravl-backend] WARNING: no build label found — redeploy with scripts/deploy.sh to add tracking
[gravl-frontend] Not running
```
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Checks completed. Stale, unlabeled, or missing containers are reported in the output but do not cause a non-zero exit, so scripts must inspect the output to detect them |
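The per-container comparison can be sketched as a function over two commit hashes. This is an illustration, not the script itself:

- `classify_build` is hypothetical, and its messages only approximate the real output.
- It treats any mismatch as stale instead of checking commit ancestry.
- The label key in the usage comment (`org.opencontainers.image.revision`) is an assumption, because this guide does not name the label `deploy.sh` writes.

Like the real script, the function reports status through its output and not its exit code.

```bash
# Hypothetical: classify one container from its build-label commit and the local HEAD.
# An empty label, or Go-template "<no value>", counts as missing.
classify_build() {
  local name="$1" built="$2" head="$3"
  if [ -z "$built" ] || [ "$built" = "<no value>" ]; then
    echo "[$name] WARNING: no build label found"
    return 0
  fi
  # The label may hold a short or full hash; <head> should be the full hash
  case "$head" in
    "$built"*) echo "[$name] OK: up to date" ;;
    *) echo "[$name] STALE: built from $built, local HEAD is $head" ;;
  esac
}

# Usage (label key is an assumption):
# built=$(docker inspect -f '{{ index .Config.Labels "org.opencontainers.image.revision" }}' gravl-backend)
# classify_build gravl-backend "$built" "$(git rev-parse HEAD)"
```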
---
## Troubleshooting
### Health Check Failures
**Symptom:** `ERROR: Health check failed after 60s`
**Causes & Solutions:**
1. **Backend service didn't start**
```bash
docker logs gravl-backend --tail 20
# Look for:
# - Port conflicts (EADDRINUSE)
# - Missing dependencies (Cannot find module)
# - Database connection errors (ECONNREFUSED)
```
2. **Port 3001 is already in use**
```bash
lsof -i :3001 # Find what's using it
docker port gravl-backend # Check exposed port
kill -9 <PID> # Kill conflicting process (if safe)
scripts/deploy.sh # Retry
```
3. **Network issue between host and container**
```bash
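# Compose attaches containers to a user-defined network, so read the IP from .NetworkSettings.Networks
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' gravl-backend
curl -sf http://<container-ip>:3001/api/health # Test directly (Linux only; Docker Desktop can't route to container IPs)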
```
4. **Backend code has syntax error**
```bash
docker logs gravl-backend 2>&1 | grep -iE "syntax|error|exception"
# Check backend/src/index.js for obvious errors
# To revert, find a good commit with git log --oneline -5, then follow "Manual Rollback to Previous Commit" below
```
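The causes above leave recognizable error strings in Node.js logs. The sketch below assumes the backend is a Node.js service, since its entry point is `backend/src/index.js`. A hypothetical `diagnose_backend_log` filter maps common error strings to a likely cause. It only matches text, so it will miss anything it does not list. Plain `case` matching avoids piping into `grep -q`.

```bash
# Hypothetical: read backend logs on stdin and print a likely cause per known error string.
diagnose_backend_log() {
  local log found=0
  log=$(cat)
  case "$log" in
    *EADDRINUSE*) echo "port conflict: another process holds the backend port"; found=1 ;;
  esac
  case "$log" in
    *"Cannot find module"*) echo "missing dependency: a required module is not installed in the image"; found=1 ;;
  esac
  case "$log" in
    *SyntaxError*) echo "syntax error: the backend code failed to parse"; found=1 ;;
  esac
  case "$log" in
    *ECONNREFUSED*) echo "connection refused: a dependency such as the database is unreachable"; found=1 ;;
  esac
  if [ "$found" -eq 0 ]; then
    echo "no known signature found; read the full log"
  fi
}

# Usage:
# docker logs gravl-backend 2>&1 | diagnose_backend_log
```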
**Quick recovery:**
```bash
# 1. Stop everything
docker compose down
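# 2. Start only the backend and check its logs
#    (use the service name from docker-compose.yml if it differs from the container name)
docker compose up -d gravl-backend
sleep 5
docker logs gravl-backend --tail 50
# 3. If logs show errors, review recent backend changes, fix, and retry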
git diff HEAD~1..HEAD backend/src/
# ... fix issues ...
scripts/deploy.sh
```
---
### Stale Containers
**Symptom:** `build-check.sh` shows `STALE: container is behind local code`
**Causes:**
- Code was updated (`git pull`) but `deploy.sh` hasn't been run
- Deployment failed partway through
- Containers were restarted (e.g., `docker compose restart`) instead of rebuilt, which reuses the old image
**Solution:**
```bash
scripts/deploy.sh
scripts/build-check.sh # Verify update
```
---
### Missing Build Labels
**Symptom:** `WARNING: no build label found — redeploy with scripts/deploy.sh`
**Causes:**
- Container was built with `docker compose build` directly (not via `deploy.sh`)
- Container predates the labeling system
**Solution:**
```bash
# Re-deploy to add labels
scripts/deploy.sh
```
---
### Container Won't Start (Exited / Restarting)
**Symptom:** `docker compose ps` shows container in "Exited" state
**Steps:**
1. **Check container logs**
```bash
docker logs gravl-backend --tail 50
docker logs gravl-frontend --tail 50
```
2. **Check docker-compose.yml for typos**
```bash
docker compose config # Validates syntax
```
3. **Inspect health check endpoint**
```bash
curl -v http://localhost:3001/api/health
# Should see HTTP 200, not 404 or 500
```
4. **If all else fails, clean rebuild**
```bash
docker compose down
docker rmi gravl-backend gravl-frontend
docker system prune -f
scripts/deploy.sh
```
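When a container shows `Exited`, its exit code usually narrows down the cause. `docker inspect` exposes the code as `.State.ExitCode`, and `.State.OOMKilled` reports out-of-memory kills. The hypothetical helper below translates common codes. It uses the shell convention that a code above 128 means "terminated by signal (code - 128)".

```bash
# Hypothetical: describe a container exit code.
explain_exit_code() {
  case "$1" in
    0)   echo "exited normally (the main process finished)" ;;
    1)   echo "application error (check the container logs)" ;;
    126) echo "command found but not executable (check entrypoint permissions)" ;;
    127) echo "command not found (check the image ENTRYPOINT/CMD)" ;;
    137) echo "killed by SIGKILL (often the out-of-memory killer; check .State.OOMKilled)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM (e.g., docker stop)" ;;
    *)
      # Non-numeric input makes the test fail quietly and falls through to the generic message
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $(($1 - 128))"
      else
        echo "exited with code $1 (application-specific)"
      fi
      ;;
  esac
}

# Usage:
# explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' gravl-backend)"
# docker inspect -f '{{.State.OOMKilled}}' gravl-backend
```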
---
### Database Connection Issues
**Symptom:** Backend logs show `Connection refused` or `ECONNREFUSED`
**Causes:**
- Database service not running
- Wrong host/port in `.env` or backend code
- Network issue between containers
**Solutions:**
1. **Check database service status** (if applicable)
```bash
docker compose ps # All services running?
docker network ls # Check gravl network exists
```
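2. **Verify connection string in `.env`**
```bash
grep -i database .env # Note: prints credentials to the terminal
# Host should match the docker-compose.yml service name (e.g., gravl-db:5432); inside a container, localhost is the container itself
```
3. **Test connection from backend container**
```bash
docker exec gravl-backend ping -c 1 gravl-db # ping may not be installed in slim images
# TCP check using Node.js (assumes the backend image ships Node; adjust host/port to your setup)
docker exec gravl-backend node -e "require('net').connect(5432, 'gravl-db').on('connect', () => { console.log('reachable'); process.exit(0); }).on('error', (e) => { console.error(e.message); process.exit(1); })"
```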
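When you compare the connection string against `docker-compose.yml`, it helps to extract only the host and port. The sketch below is minimal and assumes a URL of the form `scheme://user:pass@host:port/db`. The `DATABASE_URL` key in the usage line is hypothetical, so substitute whatever key your `.env` defines. A result of `localhost:<port>` is a common misconfiguration, because inside the backend container `localhost` refers to the backend itself.

```bash
# Hypothetical: print "host:port" from a URL like postgres://user:pass@gravl-db:5432/gravl
db_host_port() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest%%/*}"       # keep the authority: [user[:pass]@]host[:port]
  rest="${rest%%[?]*}"     # drop a query string that follows without a path
  rest="${rest##*@}"       # drop credentials (everything up to the last @)
  printf '%s\n' "$rest"
}

# Usage (variable name is an assumption):
# db_host_port "$(grep '^DATABASE_URL=' .env | cut -d= -f2-)"
```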
---
### Disk Space Issues
**Symptom:** `no space left on device` during build
**Solution:**
```bash
# Check disk usage
docker system df
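# Clean up unused images, containers, networks, and build cache
# WARNING: --volumes also deletes unused volumes, which may hold data; omit it unless you're sure
docker system prune -a --volumes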
# Then retry deploy
scripts/deploy.sh
```
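You can catch this before a `--no-cache` build starts by checking free space first. The sketch below uses POSIX `df -P` output; `has_free_kb` is a hypothetical helper. Two values are assumptions to adjust:

- The 5 GiB threshold.
- The path `/var/lib/docker`, which is Docker's default data root on Linux. Docker Desktop on macOS stores images inside a VM, so a host-side check is less meaningful there.

```bash
# Hypothetical: succeed if the filesystem holding <path> has at least <min_kb> KiB free.
has_free_kb() {
  local path="$1" min_kb="$2" avail
  # -P: POSIX format (one line per filesystem); -k: 1024-byte blocks.
  # Column 4 of the data line is the available space.
  avail=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}

# Usage: require about 5 GiB before deploying (threshold is an assumption)
# has_free_kb /var/lib/docker $((5 * 1024 * 1024)) || echo "low disk space" >&2
```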
---
## Recovery Procedures
### Manual Rollback to Previous Commit
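Use this when the deployed code is broken and you need to quickly revert.

`git checkout <commit-hash>` leaves the repository on a detached HEAD. On a detached HEAD, `git pull` (the first step of `deploy.sh`) fails with "You are not currently on a branch." If `deploy.sh` aborts on that error, revert the bad commit on your branch instead (`git revert <bad-commit>`) and run `scripts/deploy.sh` from the branch. Either way, return to your branch (for example, `git checkout main`) before the next normal deploy.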
```bash
# 1. Find the last good commit
git log --oneline -10 # Review recent commits
# 2. Check out the known-good commit
git checkout <commit-hash>
# 3. Redeploy
scripts/deploy.sh
# 4. Verify
scripts/build-check.sh
curl -sf http://localhost:3001/api/health
# 5. Document the incident
echo "Rolled back to <commit-hash> due to <reason>" >> logs/rollback.log
```
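Step 5 is easier to audit when every entry has the same shape. The hypothetical helper below prefixes each entry with a UTC timestamp and records both the commit you moved away from and the one you rolled back to. The `ROLLBACK_LOG` override exists only so you can point the function at another file.

```bash
# Hypothetical: append a timestamped rollback record.
# Usage: log_rollback <from-commit> <to-commit> <reason...>
log_rollback() {
  local from="$1" to="$2" log="${ROLLBACK_LOG:-logs/rollback.log}"
  shift 2
  mkdir -p "$(dirname "$log")"
  printf '%s rollback from=%s to=%s reason=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$from" "$to" "$*" >> "$log"
}

# Capture the current commit before step 2, e.g.:
#   prev=$(git rev-parse --short HEAD)
#   ...
#   log_rollback "$prev" "<commit-hash>" "health check failing after deploy"
```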
### Emergency Container Cleanup
Use this when containers are hung, corrupted, or in an unknown state.
```bash
# 1. Stop all services
docker compose down
# 2. Remove images (forces fresh rebuild)
docker rmi gravl-backend gravl-frontend
# 3. Clear unused volumes (optional; use with caution!)
# docker volume prune
# 4. Rebuild from scratch
scripts/deploy.sh
# 5. Verify all containers running and healthy
docker compose ps
scripts/build-check.sh
curl -sf http://localhost:3001/api/health
```
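**Safety Check:** If your data is in Docker volumes, `docker volume prune` can destroy it once `docker compose down` has removed the containers that use those volumes. On Docker Engine 23.0 and later, it removes only anonymous volumes unless you add `--all`; older versions remove every unused volume. Skip this step unless you're sure you don't need the data.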
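If you script this procedure, guard the optional volume prune. The sketch below uses a hypothetical `confirm` helper that proceeds only on an exact `yes`, so an accidental Enter keeps the volumes.

```bash
# Hypothetical: ask a question on stderr and read the answer from stdin.
# Returns 0 only if the answer is exactly "yes"; EOF counts as "no".
confirm() {
  local answer
  printf '%s [type yes to continue] ' "$1" >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Usage inside a cleanup script:
# if confirm "Delete unused Docker volumes (possibly including Gravl data)?"; then
#   docker volume prune -f
# fi
```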
### Staged Rollback (Zero-Downtime)
If you're running a blue-green deployment setup:
```bash
# 1. Deploy to green environment
cd /path/to/green
git pull && docker compose build --no-cache && docker compose up -d
# 2. Test green (health check, smoke tests)
curl -sf http://green-backend:3001/api/health
# 3. Switch traffic to green (via load balancer or DNS)
# (Implementation depends on your infrastructure)
# 4. If green has issues, revert traffic to blue immediately
# (Blue kept serving; no downtime)
# 5. Debug green offline
docker logs gravl-backend
```
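Before step 3, note that a single successful probe can be a fluke while green is still starting up. One option is to switch traffic only after the green health check passes several times in a row. The sketch below does this with a hypothetical `require_consecutive` helper; the probe counts and interval are assumptions to tune.

```bash
# Hypothetical: succeed once <needed> consecutive probes pass, giving up after
# <max_tries> total probes. Any failure resets the streak.
require_consecutive() {
  local needed="$1" max_tries="$2" interval="$3" streak=0 tries=0
  shift 3
  while [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    if "$@"; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$needed" ]; then
        return 0
      fi
    else
      streak=0
    fi
    sleep "$interval"
  done
  return 1
}

# Usage (green-backend hostname as in step 2):
# require_consecutive 5 20 3 curl -sf -o /dev/null http://green-backend:3001/api/health \
#   && echo "green is stable; switch traffic"
```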
---
## Monitoring After Deployment
### Immediate Checks (after `deploy.sh` completes)
```bash
# Containers are running
docker compose ps
# Backend is healthy
curl -sf http://localhost:3001/api/health | jq .
# Containers match local code
scripts/build-check.sh
# Logs have no errors
docker logs gravl-backend 2>&1 | grep -i error | head -5
```
### Ongoing Checks (periodically)
```bash
# Run build-check regularly (cron every 30 min, or manual)
scripts/build-check.sh
# Monitor resource usage
docker stats gravl-backend gravl-frontend
# Audit logs for issues
docker logs gravl-backend --since 1h 2>&1 | grep ERROR
```
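You can schedule the 30-minute `build-check.sh` run without editing the crontab by hand. The hypothetical `install_cron_line` helper below appends an entry only if it is not already present, so re-running it is harmless. The repository path matches the `cd /workspace/gravl` used earlier; the schedule and log file are assumptions.

```bash
# Hypothetical: add <line> to the current user's crontab unless it is already there.
install_cron_line() {
  local line="$1" existing
  local nl='
'
  # crontab -l fails when no crontab exists yet; treat that as empty
  existing=$(crontab -l 2>/dev/null || true)
  # Exact whole-line match; the quoted "$line" is compared literally, not as a glob
  case "$nl$existing$nl" in
    *"$nl$line$nl"*) return 0 ;;
  esac
  {
    [ -n "$existing" ] && printf '%s\n' "$existing"
    printf '%s\n' "$line"
  } | crontab -
}

# Usage: run build-check.sh every 30 minutes, appending its output to a log
# install_cron_line '*/30 * * * * cd /workspace/gravl && scripts/build-check.sh >> logs/build-check.log 2>&1'
```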
### Example Monitoring Script
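```bash
#!/bin/bash
# Save as scripts/health-monitor.sh
set -euo pipefail
# Run from the repository root so relative paths resolve
cd "$(dirname "$0")/.."
HEALTHY=true
# Check that each container is running
for c in gravl-backend gravl-frontend; do
  if [ "$(docker inspect -f '{{.State.Running}}' "$c" 2>/dev/null)" != "true" ]; then
    echo "$c is not running" >&2
    HEALTHY=false
  fi
done
# Check health endpoint (discard the response body)
curl -sf -o /dev/null http://localhost:3001/api/health || HEALTHY=false
# build-check.sh exits 0 even when containers are stale or unlabeled, so inspect its output.
# A case match avoids piping into grep -q under pipefail, where grep exiting
# early can make the pipeline report failure and hide a match.
CHECK_OUTPUT="$(scripts/build-check.sh)" || HEALTHY=false
case "$CHECK_OUTPUT" in
  *STALE*|*WARNING*) printf '%s\n' "$CHECK_OUTPUT" >&2; HEALTHY=false ;;
esac
if [ "$HEALTHY" = "true" ]; then
  echo "[$(date)] Gravl is healthy ✓"
else
  echo "[$(date)] Gravl has issues! See above." >&2
  exit 1
fi
```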
---
## Best Practices
1. **Always run `build-check.sh` before deploying changes**
- Ensures you know current state
- Catches stale containers early
2. **Review changes before deploying**
```bash
git log --oneline -5                # Recent local commits
git fetch origin                    # Refresh remote-tracking refs
git diff HEAD...origin/main         # What git pull (and deploy.sh) will bring in
```
3. **Test in staging first**
- Separate staging environment for pre-production testing
- Deploy to staging, verify, then deploy to production
4. **Keep logs rotated**
- `logs/deploy.log` can grow large
- Use `logrotate` or manual cleanup: `tail -1000 logs/deploy.log > logs/deploy.log.1 && > logs/deploy.log`
5. **Automate regular checks**
- Cron job to run `build-check.sh` every 30 minutes
- Send alerts if "STALE" or "WARNING" found
6. **Document rollbacks**
- Always log why you rolled back
- Review patterns (e.g., "rolled back 3 times this week" = code review process failing)
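For item 4, `logrotate` can replace the manual `tail`/truncate cleanup. The hypothetical function below writes a logrotate rule for `logs/deploy.log`; the default path and the retention values are assumptions. The rule uses `copytruncate`, which keeps the log at the same path. That is safe if something is still appending to the file during rotation.

```bash
# Hypothetical: write a logrotate rule for Gravl's deploy log to <dest>.
write_logrotate_rule() {
  local dest="$1" log_path="${2:-/workspace/gravl/logs/deploy.log}"
  cat > "$dest" <<EOF
$log_path {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Usage (installing needs root; logrotate -d performs a dry run):
# write_logrotate_rule /tmp/gravl.logrotate && sudo install -m 0644 /tmp/gravl.logrotate /etc/logrotate.d/gravl
# sudo logrotate -d /etc/logrotate.d/gravl
```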
---
## See Also
- **Testing**: [DEPLOYMENT_TEST_PLAN.md](./DEPLOYMENT_TEST_PLAN.md) — comprehensive test scenarios
- **Code style**: [CODING-CONVENTIONS.md](./CODING-CONVENTIONS.md)
- **Architecture**: Backend README or architecture docs (if available)
---
*Last updated: 2026-03-03 | Maintained by: Gravl Development Team*