Gravl Deployment Guide

This guide covers how to deploy Gravl's backend and frontend services using automated scripts, verify deployment status, and handle troubleshooting and recovery scenarios.


Overview

Gravl uses Docker and Docker Compose for containerization. Two automated scripts manage the deployment lifecycle:

  • scripts/deploy.sh: Pulls latest code, builds fresh images (with --no-cache to prevent stale assets), and starts containers with health checks
  • scripts/build-check.sh: Verifies that running containers match the current git HEAD (detects stale deployments)

Prerequisites

Before deploying, ensure you have:

  1. Docker & Docker Compose installed and running

    docker --version
    docker compose version
    
  2. Git configured with push/pull access to the repository

    git remote -v
    
  3. Network access to required ports:

    • Backend: localhost:3001 (health check at http://localhost:3001/api/health)
    • Frontend: localhost:3000 (or as configured in docker-compose.yml)
  4. Sufficient disk space for Docker images and volumes

    docker system df
    
  5. No conflicting services using ports 3000-3001

    lsof -i :3000 -i :3001  # (macOS/Linux only)
    

How to Run deploy.sh

Basic Usage

cd /workspace/gravl
scripts/deploy.sh

What It Does

  1. Git Pull: Fetches and merges latest code from remote

    • Exits if merge conflicts occur (manual resolution required)
  2. Captures Metadata:

    • Current git commit hash
    • Build timestamp
    • These are stored as Docker image labels for later verification
  3. Builds Docker Images (--no-cache):

    • Rebuilds all layers (no caching) to prevent stale assets
    • Applies git commit and build timestamp as labels
  4. Starts Containers:

    • Uses docker compose up -d --force-recreate to ensure clean start
    • Both backend and frontend containers are started
  5. Health Check:

    • Waits up to 60 seconds for backend to respond on /api/health
    • Retries every 5 seconds (12 attempts max)
    • Fails with exit code 1 if health check times out
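
The health-check step above can be sketched as a small retry loop. This is an illustration only, not the contents of deploy.sh: the function name `wait_for_health` and its argument order are assumptions made for the example.

```shell
# Hypothetical helper (not taken from deploy.sh): run a probe command
# until it succeeds or the attempt budget is used up.
# Usage: wait_for_health <attempts> <interval-seconds> <command...>
wait_for_health() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "ERROR: health check failed after $attempts attempt(s)" >&2
  return 1
}
```

Called as `wait_for_health 12 5 curl -sf -o /dev/null http://localhost:3001/api/health`, this would give the 12 attempts at 5-second intervals described above.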

Exit Codes

| Code | Meaning | Next Steps |
|------|---------|------------|
| 0 | Success | Deployment complete; containers healthy |
| 1 | Failure | See troubleshooting below |

Logs

All deploy activity is logged to logs/deploy.log:

tail -50 logs/deploy.log           # Last 50 lines
grep ERROR logs/deploy.log        # Find errors

Environment Variables

Optional env vars can be set before running deploy.sh:

| Variable | Default | Purpose |
|----------|---------|---------|
| GIT_COMMIT | auto-detected | Override git commit label (not recommended) |
| BUILD_DATE | auto-detected | Override build timestamp (not recommended) |
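
The override-or-detect behaviour of these two variables could look roughly like the sketch below. The function names are hypothetical and the real deploy.sh may resolve the values differently; the only assumption carried over from this guide is that an exported GIT_COMMIT or BUILD_DATE takes precedence over detection.

```shell
# Hypothetical sketch of "use the override if set, otherwise detect".
resolve_git_commit() {
  if [ -n "${GIT_COMMIT:-}" ]; then
    printf '%s\n' "$GIT_COMMIT"
  else
    # Full hash of the commit currently checked out
    git rev-parse HEAD
  fi
}

resolve_build_date() {
  if [ -n "${BUILD_DATE:-}" ]; then
    printf '%s\n' "$BUILD_DATE"
  else
    # UTC timestamp in the same shape as the build-check.sh examples
    date -u +%Y-%m-%dT%H:%M:%SZ
  fi
}
```

Leaving both variables unset keeps the labels tied to the code that was actually built, which is why overriding them is not recommended.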

How to Check Build Status (build-check.sh)

Run this command anytime to verify deployed containers match your local code:

scripts/build-check.sh

Output Example

Healthy deployment:

Local HEAD: abc1234 (abc1234567890abcdef1234567890abcdef123456)

[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] OK: up to date
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] OK: up to date

Stale containers (code updated, not redeployed):

Local HEAD: xyz5678 (xyz5678...)

[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-backend] STALE: container is behind local code — run scripts/deploy.sh
[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z
[gravl-frontend] STALE: container is behind local code — run scripts/deploy.sh

Missing labels (container built manually, not via deploy.sh):

Local HEAD: abc1234

[gravl-backend] WARNING: no build label found — redeploy with scripts/deploy.sh to add tracking
[gravl-frontend] Not running
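
The three outcomes shown above come down to comparing the commit recorded on a container with the local HEAD. A minimal sketch of that comparison follows; `classify_build` is a hypothetical name, the messages are shortened, and it assumes both hashes are given in the same form (both short or both full).

```shell
# Hypothetical sketch of the comparison behind the status lines.
# Usage: classify_build <container-name> <local-head> <label-commit>
classify_build() {
  local name="$1" head="$2" label="$3"
  if [ -z "$label" ]; then
    echo "[$name] WARNING: no build label found"
  elif [ "$label" = "$head" ]; then
    echo "[$name] OK: up to date"
  else
    echo "[$name] STALE: container is behind local code"
  fi
}
```

How the label is read is not shown in this guide. One option is `docker inspect --format '{{ index .Config.Labels "<label-key>" }}' gravl-backend`, where `<label-key>` is whatever key deploy.sh writes.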

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All checks completed (warnings don't fail; see output for status) |

Missing containers are noted in the output but do not cause a non-zero exit.
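
Since the exit code stays 0 even when warnings are printed, automation has to read the output. The sketch below turns the output into an exit status; the function name and the choice of 1 and 2 as status values are assumptions for the example, not part of build-check.sh.

```shell
# Hypothetical wrapper: read build-check.sh output on stdin and return
#   0 if nothing needs attention,
#   1 if a container is unlabeled or not running,
#   2 if a container is stale.
build_check_status() {
  local output
  output=$(cat)
  case $output in
    *STALE:*) return 2 ;;
    *WARNING:*|*"Not running"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Usage would be `scripts/build-check.sh | build_check_status`, with the caller alerting on any non-zero status.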

Troubleshooting

Health Check Failures

Symptom: ERROR: Health check failed after 60s

Causes & Solutions:

  1. Backend service didn't start

    docker logs --tail 20 gravl-backend
    # Look for:
    # - Port conflicts (EADDRINUSE)
    # - Missing dependencies (module not found)
    # - Database connection errors
    
  2. Port 3001 is already in use

    lsof -i :3001  # Find what's using it
    docker port gravl-backend  # Check published port mappings
    kill <PID>  # Stop the conflicting process (if safe); use kill -9 only if it ignores SIGTERM
    scripts/deploy.sh  # Retry
    
  3. Network issue between host and container

    docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' gravl-backend
    curl -sf http://<container-ip>:3001/api/health  # Test directly (Linux hosts only; Docker Desktop does not route to container IPs)
    
  4. Backend code has syntax error

    docker logs gravl-backend 2>&1 | grep -iE "syntax|error|exception"
    # Check backend/src/index.js for obvious errors
    # Revert recent changes: git log --oneline -5 && git checkout <good-commit>
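
The log patterns listed in the causes above can be checked in one pass. The sketch below is a hypothetical triage helper; the patterns are common Node.js error strings and are an assumption about what the Gravl backend actually logs.

```shell
# Hypothetical triage helper: read backend log text on stdin and print
# one hint per known failure pattern.
diagnose_backend_log() {
  local log found=0
  log=$(cat)
  case $log in *EADDRINUSE*)
    echo "port conflict: another process is bound to the backend port"; found=1 ;;
  esac
  case $log in *"Cannot find module"*|*"module not found"*)
    echo "missing dependency: rebuild the image so dependencies are installed"; found=1 ;;
  esac
  case $log in *ECONNREFUSED*|*"Connection refused"*)
    echo "connection refused: check the database service and its host/port"; found=1 ;;
  esac
  case $log in *SyntaxError*)
    echo "syntax error: inspect the most recent backend changes"; found=1 ;;
  esac
  if [ "$found" -eq 0 ]; then
    echo "no known pattern matched"
  fi
}
```

A typical call would be `docker logs gravl-backend 2>&1 | diagnose_backend_log`. The hints only narrow the search; the full log remains the source of truth.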
    

Quick recovery:

# 1. Stop everything
docker compose down

# 2. Start only the backend and check its logs
#    (use the service name from docker-compose.yml if it differs)
docker compose up -d gravl-backend
sleep 5
docker logs --tail 50 gravl-backend

# 3. If logs show errors, fix code and retry
git diff HEAD~1..HEAD -- backend/src/
# ... fix issues ...
scripts/deploy.sh

Stale Containers

Symptom: build-check.sh shows STALE: container is behind local code

Causes:

  • Code was updated (git pull) but deploy.sh hasn't been run
  • Deployment failed partway through
  • Manual restart without redeploy

Solution:

scripts/deploy.sh
scripts/build-check.sh  # Verify update

Missing Build Labels

Symptom: WARNING: no build label found — redeploy with scripts/deploy.sh

Causes:

  • Container was built with docker compose build directly (not via deploy.sh)
  • Container predates the labeling system

Solution:

# Re-deploy to add labels
scripts/deploy.sh

Container Won't Start (Restarting / Exited)

Symptom: docker compose ps -a shows container in "Exited" or "Restarting" state

Steps:

  1. Check container logs

    docker logs gravl-backend --tail 50
    docker logs gravl-frontend --tail 50
    
  2. Check docker-compose.yml for typos

    docker compose config  # Validates syntax
    
  3. Inspect health check endpoint

    curl -v http://localhost:3001/api/health
    # Should see HTTP 200, not 404 or 500
    
  4. If all else fails, clean rebuild

    docker compose down
    docker rmi gravl-backend gravl-frontend
    docker system prune -f
    scripts/deploy.sh
    

Database Connection Issues

Symptom: Backend logs show Connection refused or ECONNREFUSED

Causes:

  • Database service not running
  • Wrong host/port in .env or backend code
  • Network issue between containers

Solutions:

  1. Check database service status (if applicable)

    docker compose ps  # All services running?
    docker network ls  # Check gravl network exists
    
  2. Verify connection string in .env

    grep -i database .env
    # Should match docker-compose.yml service name (e.g., gravl-db:5432)
    
  3. Test connection from backend container

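    docker exec gravl-backend getent hosts gravl-db  # Does the service name resolve?
    docker exec gravl-backend nc -zv gravl-db 5432   # Is the port reachable? (requires nc in the image; the database port does not speak HTTP, so curl is not a valid test)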
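
To compare the connection string against the service name, it helps to extract just the host:port part. The sketch below assumes a URL-style connection string such as `postgres://user:password@gravl-db:5432/gravl`; that format and the function name `db_host_port` are assumptions, since this guide does not show the actual .env contents.

```shell
# Hypothetical helper: print the host:port part of a URL-style
# connection string. Assumes the path contains no "@".
db_host_port() {
  local rest="$1"
  rest=${rest#*://}   # drop the scheme, if present
  rest=${rest##*@}    # drop credentials, if present
  rest=${rest%%/*}    # drop the database path
  rest=${rest%%\?*}   # drop query parameters
  printf '%s\n' "$rest"
}
```

The result should match a service defined in docker-compose.yml, for example `gravl-db:5432`.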
    

Disk Space Issues

Symptom: no space left on device during build

Solution:

# Check disk usage
docker system df

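# Clean up unused images/containers
# Caution: -a removes every image not used by a container, and --volumes
# also deletes unused volumes, which can include database data
docker system prune -a --volumes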

# Then retry deploy
scripts/deploy.sh

Recovery Procedures

Manual Rollback to Previous Commit

Use this when the deployed code is broken and you need to quickly revert.

# 1. Find the last good commit
git log --oneline -10  # Review recent commits

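# 2. Check out the known-good commit
git checkout <commit-hash>

# 3. Redeploy
# Note: deploy.sh begins with a git pull (see What It Does). On the detached
# HEAD left by the checkout above, a plain git pull fails, and a pull of an
# explicit branch would merge the broken code back in. If that happens, return
# to your branch (git switch -), run git revert <bad-commit>, and rerun.
scripts/deploy.sh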

# 4. Verify
scripts/build-check.sh
curl -sf http://localhost:3001/api/health

# 5. Document the incident
echo "Rolled back to <commit-hash> due to <reason>" >> logs/rollback.log
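
Step 5 is easier to keep consistent with a small helper that adds a timestamp to each entry. This is a sketch under assumptions: the function name `log_rollback` and the UTC timestamp prefix are not part of the existing scripts.

```shell
# Hypothetical helper: append a timestamped rollback record.
# Usage: log_rollback <commit-hash> <reason> [logfile]
log_rollback() {
  local commit="$1" reason="$2" logfile="${3:-logs/rollback.log}"
  # Create the log directory on first use
  mkdir -p "$(dirname "$logfile")"
  printf '%s Rolled back to %s due to %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$commit" "$reason" >> "$logfile"
}
```

For example: `log_rollback <commit-hash> "health check failed after deploy"`.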

Emergency Container Cleanup

Use this when containers are hung, corrupted, or in an unknown state.

# 1. Stop all services
docker compose down

# 2. Remove images (forces fresh rebuild)
docker rmi gravl-backend gravl-frontend

# 3. Clear unused volumes (optional; use with caution!)
# docker volume prune

# 4. Rebuild from scratch
scripts/deploy.sh

# 5. Verify all containers running and healthy
docker compose ps
scripts/build-check.sh
curl -sf http://localhost:3001/api/health

Safety Check: docker volume prune deletes volumes that no container is using. After docker compose down, that can include the volumes holding your data. Skip this step unless you're sure you don't need the data.

Staged Rollback (Zero-Downtime)

If you're running a blue-green deployment setup:

# 1. Deploy to green environment
cd /path/to/green
git pull && docker compose build --no-cache && docker compose up -d

# 2. Test green (health check, smoke tests)
curl -sf http://green-backend:3001/api/health

# 3. Switch traffic to green (via load balancer or DNS)
# (Implementation depends on your infrastructure)

# 4. If green has issues, revert traffic to blue immediately
# (Blue kept serving; no downtime)

# 5. Debug green offline
docker logs gravl-backend

Monitoring After Deployment

Immediate Checks (after deploy.sh completes)

# Containers are running
docker compose ps

# Backend is healthy
curl -sf http://localhost:3001/api/health | jq .

# Containers match local code
scripts/build-check.sh

# Logs have no errors
docker logs gravl-backend 2>&1 | grep -i error | head -5

Ongoing Checks (periodically)

# Run build-check regularly (cron every 30 min, or manual)
scripts/build-check.sh

# Monitor resource usage
docker stats gravl-backend gravl-frontend

# Audit logs for issues
docker logs --since 1h gravl-backend 2>&1 | grep ERROR

Example Monitoring Script

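#!/bin/bash
# Save as scripts/health-monitor.sh
set -euo pipefail

HEALTHY=true

# Check containers running. Output is captured first: piping straight into
# grep -q can end the producer with SIGPIPE, which pipefail reports as a failure.
ps_output=$(docker compose ps 2>&1 || true)
grep -q "Up" <<<"$ps_output" || HEALTHY=false

# Check health endpoint (response body discarded)
curl -sf -o /dev/null http://localhost:3001/api/health || HEALTHY=false

# Check for stale containers; show the report if any are found
check_output=$(scripts/build-check.sh 2>&1 || true)
if grep -q "STALE" <<<"$check_output"; then
  echo "$check_output" >&2
  HEALTHY=false
fi

if [ "$HEALTHY" = "true" ]; then
  echo "[$(date)] Gravl is healthy ✓"
else
  echo "[$(date)] Gravl has issues! See above." >&2
  exit 1
fi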

Best Practices

  1. Always run build-check.sh before deploying changes

    • Ensures you know current state
    • Catches stale containers early
  2. Review changes before deploying

    git log --oneline -5  # Recent commits
    git fetch origin && git diff HEAD..origin/main  # What deploy.sh will pull in
    
  3. Test in staging first

    • Separate staging environment for pre-production testing
    • Deploy to staging, verify, then deploy to production
  4. Keep logs rotated

    • logs/deploy.log can grow large
    • Use logrotate or manual cleanup: tail -1000 logs/deploy.log > logs/deploy.log.1 && > logs/deploy.log
  5. Automate regular checks

    • Cron job to run build-check.sh every 30 minutes
    • Send alerts if "STALE" or "WARNING" found
  6. Document rollbacks

    • Always log why you rolled back
    • Review patterns (e.g., "rolled back 3 times this week" = code review process failing)
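
The manual log cleanup from practice 4 can be wrapped in a function so it is safe to run from cron. The name `rotate_log` and the default of 1000 retained lines are assumptions for this sketch; logrotate remains the more robust choice.

```shell
# Hypothetical helper: keep the last N lines of a log in <file>.1,
# then empty the log itself.
# Usage: rotate_log <file> [lines-to-keep]
rotate_log() {
  local file="$1" keep="${2:-1000}"
  # Nothing to do if the log does not exist yet
  [ -f "$file" ] || return 0
  # Truncate only after the tail has been written successfully
  tail -n "$keep" "$file" > "$file.1" && : > "$file"
}
```

For example: `rotate_log logs/deploy.log`. Lines written between the `tail` and the truncation would be lost, so this suits a log that is only written during deploys.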


Last updated: 2026-03-03 | Maintained by: Gravl Development Team