diff --git a/backend/README.md b/backend/README.md index 21197b7..4a89858 100644 --- a/backend/README.md +++ b/backend/README.md @@ -1,155 +1,167 @@ # Gravl Backend -Node.js / Express API server for the Gravl application. +Backend service for the Gravl exercise and fitness tracking platform. -## Development +## Overview -### Prerequisites -- Node.js 18+ -- PostgreSQL 14+ (or use Docker Compose) -- Docker & Docker Compose (for containerized development) - -### Getting Started - -```bash -# Install dependencies -npm install - -# Create .env file (copy from .env.example) -cp .env.example .env - -# Run database migrations -npm run migrate - -# Start development server -npm run dev -``` - -The API will be available at `http://localhost:3001`. - -### Health Check Endpoint - -The API exposes a health check endpoint for deployment verification: - -```bash -curl http://localhost:3001/api/health -``` - -Expected response: -```json -{ - "status": "ok", - "timestamp": "2026-03-03T18:30:00Z" -} -``` - -This endpoint is used by the deployment scripts to verify the backend is healthy after deployment. +The Gravl backend is a Node.js/Express application that provides: +- REST API for exercise data management +- User authentication and authorization +- Integration with frontend via HTTP +- Health check endpoint for deployment monitoring --- -## Deployment +## Local Development -### Quick Start +### Prerequisites -See `/docs/DEPLOYMENT.md` for comprehensive deployment documentation. +- Node.js 18+ +- npm or yarn +- Docker & Docker Compose (for local container development) + +### Installation ```bash -# Deploy the application -scripts/deploy.sh - -# Check deployment status -scripts/build-check.sh +cd backend +npm install ``` -### How It Works +### Running Locally -1. **Automatic build:** `scripts/deploy.sh` builds fresh Docker images -2. **Zero downtime:** Old containers are replaced with `--force-recreate` -3. 
**Health verification:** API health endpoint is polled before deployment completes -4. **Rollback:** Use git to revert and redeploy if issues arise +**Development mode (with hot reload):** +```bash +npm run dev +``` -### Prerequisites for Deployment +The server starts on `http://localhost:3001` -- Docker and Docker Compose installed -- Git remote configured and accessible -- Backend listening on port 3001 -- Health endpoint (`/api/health`) responding with 200 OK +**Production mode:** +```bash +npm run build +npm start +``` -### Example Deployment Workflow +### Environment Variables + +Create a `.env` file in the backend directory: ```bash -# 1. Make code changes and commit -git add . && git commit -m "feat: new API endpoint" - -# 2. Deploy from project root -cd /workspace/gravl -scripts/deploy.sh - -# 3. Verify deployment -scripts/build-check.sh - -# 4. Check logs if needed -docker compose logs gravl-backend +NODE_ENV=development +PORT=3001 +DATABASE_URL=postgresql://user:password@localhost:5432/gravl ``` -### Container Labels +See `.env.example` (if available) for all supported variables. -All deployed containers include build metadata labels for tracking: -- `org.opencontainers.image.revision` — Git commit SHA -- `org.opencontainers.image.created` — Build timestamp +--- -These are used by `scripts/build-check.sh` to detect stale deployments. +## API Endpoints + +### Health Check (Monitoring & Deployment) + +``` +GET /api/health +``` + +Used by deployment scripts to verify the backend is running and responsive. 
+ +**Response:** +```json +{ + "status": "ok", + "timestamp": "2026-03-03T18:21:00Z" +} +``` + +**Status Codes:** +- `200 OK` — Backend is healthy +- `500 Internal Server Error` — Backend has errors (check logs) + +### Other Endpoints + +(Document your API endpoints here; placeholder for now) --- ## Testing ```bash -# Run unit tests -npm test - -# Run integration tests -npm run test:integration - -# Run with coverage -npm run test:coverage +npm test # Run all tests +npm run test:watch # Run tests in watch mode ``` --- -## Database +## Docker -### Migrations +### Building the Image ```bash -# Run pending migrations -npm run migrate - -# Rollback last migration -npm run migrate:rollback - -# Create new migration -npm run migrate:create -- my_migration_name +docker build -t gravl-backend:latest . ``` -### Connection +### Running in Container -Configure via `.env`: -``` -DATABASE_URL=postgresql://user:password@localhost:5432/gravl +```bash +docker run -p 3001:3001 \ + -e NODE_ENV=production \ + -e DATABASE_URL=postgresql://... \ + gravl-backend:latest ``` +### With Docker Compose + +See the root `docker-compose.yml` for multi-container setup. + --- -## Environment Variables +## Deployment -See `.env.example` for all available variables. 
+### Automated Deployment -Key variables: -- `NODE_ENV` — Development/production mode -- `PORT` — Server port (default: 3001) -- `DATABASE_URL` — PostgreSQL connection string -- `JWT_SECRET` — Token signing secret +The backend is deployed using scripts in the root `scripts/` directory: + +- **`scripts/deploy.sh`** — Pulls latest code, builds fresh Docker image, starts container with health checks +- **`scripts/build-check.sh`** — Verifies deployed container matches local git HEAD + +### How to Deploy + +```bash +cd /workspace/gravl +scripts/deploy.sh +``` + +### Checking Deployment Status + +```bash +cd /workspace/gravl +scripts/build-check.sh +``` + +For complete deployment documentation, see: **[`docs/DEPLOYMENT.md`](../docs/DEPLOYMENT.md)** + +That guide includes: +- Prerequisites and setup +- How to run deploy.sh +- How to check build status +- Troubleshooting (health check failures, stale containers, etc.) +- Recovery procedures (rollbacks, cleanup) + +### Health Check Configuration + +The backend exposes a health check endpoint at `GET /api/health`. The deployment script (`scripts/deploy.sh`) waits up to 60 seconds for this endpoint to return HTTP 200. 
+ +**In your backend code:** +```javascript +app.get('/api/health', (req, res) => { + res.json({ status: 'ok', timestamp: new Date().toISOString() }); +}); +``` + +**Deployment timeout:** 60 seconds (12 retries × 5 seconds) +- If this endpoint takes >5 seconds to respond, deployment will time out +- Ensure health check is lightweight (no expensive DB queries) --- @@ -158,61 +170,84 @@ Key variables: ``` backend/ ├── src/ -│ ├── api/ # Express route handlers -│ ├── middleware/ # Express middleware -│ ├── models/ # Database models -│ ├── services/ # Business logic -│ └── index.js # App entry point -├── tests/ # Unit and integration tests -├── migrations/ # Database migrations -├── docker/ # Dockerfile -├── .env.example # Environment template -└── README.md # This file +│ ├── index.js # Server entry point +│ ├── routes/ # API endpoints +│ ├── controllers/ # Business logic +│ ├── models/ # Data models (if using ORM) +│ └── middleware/ # Express middleware +├── test/ # Test files +├── Dockerfile # Container image definition +├── package.json # Dependencies +└── README.md # This file +``` + +--- + +## Logs + +### Local Development +```bash +npm run dev # Logs to stdout +``` + +### Docker Container +```bash +docker logs gravl-backend # Current logs +docker logs -f gravl-backend # Follow logs in real-time +docker logs --tail 50 gravl-backend # Last 50 lines +``` + +### In Deployment +All deploy activity is logged to `logs/deploy.log` at the root: +```bash +tail logs/deploy.log ``` --- ## Troubleshooting -### API Won't Start +### Health Check Endpoint Not Responding -Check the logs: -```bash -docker compose logs gravl-backend -``` +**Symptom:** Deployment fails with "Health check failed after 60s" -Common issues: -- Port 3001 already in use: Kill the process or change the port -- Database connection failed: Verify `.env` DATABASE_URL -- Node modules missing: Run `npm install` +**Causes & Fixes:** +1. 
**Port 3001 is already in use** + ```bash + lsof -i :3001 + # Kill the conflicting process or use a different port + ``` -### Health Check Fails +2. **Backend code has a syntax error** + ```bash + npm run dev # Look for error messages + ``` -Ensure the `/api/health` endpoint is implemented: +3. **Health check endpoint is not implemented** + - Ensure `app.get('/api/health', ...)` is in src/index.js -```javascript -// backend/src/api/health.js -app.get('/api/health', (req, res) => { - res.json({ status: 'ok', timestamp: new Date().toISOString() }); -}); -``` +4. **Database connection is failing** + - Backend might be stuck trying to connect to DB + - Check `DATABASE_URL` in `.env` + - Ensure database is running -### Database Issues - -Check Docker container status: -```bash -docker compose ps -docker compose logs gravl-db -``` +See **[`docs/DEPLOYMENT.md`](../docs/DEPLOYMENT.md#troubleshooting)** for more deployment troubleshooting. --- ## Contributing -See `CODING-CONVENTIONS.md` in the project root for code style and standards. +See the root project README or CONTRIBUTING.md for guidelines on: +- Code style ([CODING-CONVENTIONS.md](../docs/CODING-CONVENTIONS.md)) +- Testing requirements +- Pull request process --- -**Last Updated:** 2026-03-03 -**Phase:** 07-03 -**Related:** `/docs/DEPLOYMENT.md` +## License + +[Specify your license here] + +--- + +*Last updated: 2026-03-03* diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md index e322529..7b03612 100644 --- a/docs/DEPLOYMENT.md +++ b/docs/DEPLOYMENT.md @@ -1,17 +1,52 @@ # Gravl Deployment Guide -This guide covers how to deploy the Gravl application, verify deployments, and troubleshoot common issues. +This guide covers how to deploy Gravl's backend and frontend services using automated scripts, verify deployment status, and handle troubleshooting and recovery scenarios. + +--- + +## Overview + +Gravl uses Docker and Docker Compose for containerization. 
Two automated scripts manage the deployment lifecycle: + +- **`scripts/deploy.sh`**: Pulls latest code, builds fresh images (with `--no-cache` to prevent stale assets), and starts containers with health checks +- **`scripts/build-check.sh`**: Verifies that running containers match the current git HEAD (detects stale deployments) + +--- ## Prerequisites -- Docker and Docker Compose installed -- Git repository with remote configured -- Access to `/workspace/gravl` directory -- Backend API listening on `http://localhost:3001/api/health` +Before deploying, ensure you have: -## Deployment Script +1. **Docker & Docker Compose** installed and running + ```bash + docker --version + docker compose version + ``` -### Running a Deployment +2. **Git** configured with push/pull access to the repository + ```bash + git remote -v + ``` + +3. **Network access** to required ports: + - Backend: `localhost:3001` (health check at `http://localhost:3001/api/health`) + - Frontend: `localhost:3000` (or configured in `docker-compose.yml`) + +4. **Sufficient disk space** for Docker images and volumes + ```bash + docker system df + ``` + +5. **No conflicting services** using ports 3000-3001 + ```bash + lsof -i :3000 -i :3001 # (macOS/Linux only) + ``` + +--- + +## How to Run `deploy.sh` + +### Basic Usage ```bash cd /workspace/gravl @@ -20,57 +55,98 @@ scripts/deploy.sh ### What It Does -1. **Pulls latest code:** `git pull` -2. **Captures build metadata:** - - Git commit SHA +1. **Git Pull**: Fetches and merges latest code from remote + - Exits if merge conflicts occur (manual resolution required) + +2. **Captures Metadata**: + - Current git commit hash - Build timestamp -3. **Builds fresh images:** `docker compose build --no-cache` - - `--no-cache` ensures all layers are rebuilt (prevents stale assets) -4. **Restarts containers:** `docker compose up -d --force-recreate` -5. **Health check:** Polls `/api/health` for up to 60 seconds -6. 
**Logs deployment:** Records all steps to `logs/deploy.log` + - These are stored as Docker image labels for later verification + +3. **Builds Docker Images** (`--no-cache`): + - Rebuilds all layers (no caching) to prevent stale assets + - Applies git commit and build timestamp as labels + +4. **Starts Containers**: + - Uses `docker compose up -d --force-recreate` to ensure clean start + - Both backend and frontend containers are started + +5. **Health Check**: + - Waits up to 60 seconds for backend to respond on `/api/health` + - Retries every 5 seconds (12 attempts max) + - Fails with exit code 1 if health check times out -### Output Example +### Exit Codes +| Code | Meaning | Next Steps | +|------|---------|-----------| +| 0 | Success | Deployment complete; containers healthy | +| 1 | Failure | See troubleshooting below | + +### Logs + +All deploy activity is logged to `logs/deploy.log`: + +```bash +tail -50 logs/deploy.log # Last 50 lines +grep ERROR logs/deploy.log # Find errors ``` -[2026-03-03 18:30:00] === Deploy started === -[2026-03-03 18:30:01] Pulling latest code... -[2026-03-03 18:30:05] Commit: 53f4df6 | Date: 2026-03-03T18:30:00Z -[2026-03-03 18:30:06] Building images (--no-cache)... -[2026-03-03 18:30:45] Starting containers... -[2026-03-03 18:30:50] Health check... 
-[2026-03-03 18:30:55] Backend healthy -[2026-03-03 18:30:56] === Deploy complete: 53f4df6 === -``` + +### Environment Variables + +Optional env vars can be set before running `deploy.sh`: + +| Variable | Default | Purpose | +|----------|---------|---------| +| `GIT_COMMIT` | auto-detected | Override git commit label (not recommended) | +| `BUILD_DATE` | auto-detected | Override build timestamp (not recommended) | --- -## Checking Deployment Status +## How to Check Build Status (`build-check.sh`) -### Build Status Check +Run this command anytime to verify deployed containers match your local code: ```bash -cd /workspace/gravl scripts/build-check.sh ``` ### Output Example +**Healthy deployment:** ``` -Local HEAD: 53f4df6 (53f4df6f8a5c4d2e1f0a9b8c7d6e5f4a3b2c1d0) +Local HEAD: abc1234 (abc1234567890abcdef1234567890abcdef123456) -[gravl-backend] Built: 53f4df6 on 2026-03-03T18:30:00Z +[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z [gravl-backend] OK: up to date - -[gravl-frontend] Built: 53f4df6 on 2026-03-03T18:30:00Z +[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z [gravl-frontend] OK: up to date ``` -### What the Check Tells You +**Stale containers (code updated, not redeployed):** +``` +Local HEAD: xyz5678 (xyz5678...) 
+ -- **OK: up to date** — Containers match the local git commit (everything is current) -- **STALE: container is behind local code** — Code has changed but containers haven't been redeployed yet -- **WARNING: no build label found** — Container is old (pre-07-02) and lacks build tracking labels +[gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z +[gravl-backend] STALE: container is behind local code — run scripts/deploy.sh +[gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z +[gravl-frontend] STALE: container is behind local code — run scripts/deploy.sh +``` + +**Missing labels (container built manually, not via deploy.sh):** +``` +Local HEAD: abc1234 + +[gravl-backend] WARNING: no build label found — redeploy with scripts/deploy.sh to add tracking +[gravl-frontend] Not running +``` + +### Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All checks completed (warnings don't fail; see output for status) | +| 0 | Missing containers are reported as `Not running` but don't cause failure | --- @@ -78,215 +154,347 @@ Local HEAD: 53f4df6 (53f4df6f8a5c4d2e1f0a9b8c7d6e5f4a3b2c1d0) ### Health Check Failures -**Symptom:** Deployment fails with "ERROR: Health check failed after 60s" +**Symptom:** `ERROR: Health check failed after 60s` -**Possible Causes & Solutions:** +**Causes & Solutions:** -| Cause | Solution | -|-------|----------| -| Backend not starting | Check logs: `docker compose logs gravl-backend` | -| Health endpoint not implemented | Implement `GET /api/health` in backend (returns `200 OK`) | -| Network issues | Verify network: `docker network inspect gravl` or restart: `docker compose restart` | -| Port already in use | Check: `lsof -i :3001` and kill the process or change port | -| Insufficient resources | Free disk space: `df -h` or reduce image size | +1. 
**Backend service didn't start** + ```bash + docker logs gravl-backend | tail -20 + # Look for: + # - Port conflicts (EADDRINUSE) + # - Missing dependencies (module not found) + # - Database connection errors + ``` + +2. **Port 3001 is already in use** + ```bash + lsof -i :3001 # Find what's using it + docker port gravl-backend # Check exposed port + kill -9 <PID> # Kill conflicting process (if safe) + scripts/deploy.sh # Retry + ``` + +3. **Network issue between host and container** + ```bash + docker inspect gravl-backend --format '{{.NetworkSettings.IPAddress}}' + curl -sf http://<container-ip>:3001/api/health # Test directly + ``` + +4. **Backend code has syntax error** + ```bash + docker logs gravl-backend 2>&1 | grep -i "syntax\|error\|exception" + # Check backend/src/index.js for obvious errors + # Revert recent changes: git log --oneline -5 && git checkout <commit> + ``` + +**Quick recovery:** -**Manual Restart:** ```bash -docker compose restart gravl-backend -# Wait a few seconds -curl -sf http://localhost:3001/api/health +# 1. Stop everything +docker compose down + +# 2. Check backend logs +docker compose up -d gravl-backend +sleep 5 +docker logs gravl-backend | tail -50 + +# 3. If logs show errors, fix code and retry +git diff HEAD~1..HEAD backend/src/ +# ... fix issues ... +scripts/deploy.sh ``` --- ### Stale Containers -**Symptom:** `build-check.sh` shows "STALE: container is behind local code" +**Symptom:** `build-check.sh` shows `STALE: container is behind local code` -**Cause:** Code has been updated but containers haven't been redeployed. 
+**Causes:** + +- Code was updated (`git pull`) but `deploy.sh` hasn't been run +- Deployment failed partway through +- Manual restart without redeploy **Solution:** + ```bash scripts/deploy.sh -scripts/build-check.sh # Should now show OK +scripts/build-check.sh # Verify update ``` --- -### Missing Docker Labels +### Missing Build Labels -**Symptom:** `build-check.sh` shows "WARNING: no build label found" +**Symptom:** `WARNING: no build label found — redeploy with scripts/deploy.sh` -**Cause:** Containers were built before phase 07-02 (before labels were added). +**Causes:** + +- Container was built with `docker compose build` directly (not via `deploy.sh`) +- Container predates the labeling system **Solution:** -```bash -scripts/deploy.sh # Rebuilds with labels -``` - ---- - -### Deployment Hangs - -**Symptom:** `scripts/deploy.sh` doesn't complete or appears stuck. - -**Possible Causes & Solutions:** - -| Symptom | Solution | -|---------|----------| -| Stuck at "Building images" | Docker build is slow. Check: `docker builder prune` to free cache | -| Stuck at "Health check" | Backend not responding. 
Try: `docker compose logs` to see errors | -| Git pull conflicts | Resolve conflicts manually: `cd /workspace/gravl && git status` | - -**Force Stop:** -```bash -# Kill the deploy script -pkill -f scripts/deploy.sh - -# Manually check status -docker compose ps -docker compose logs -``` - ---- - -## Rollback Procedures - -### Quick Rollback - -If the current deployment is broken: ```bash -# Revert to previous commit -git reset --hard HEAD~1 - -# Redeploy -scripts/deploy.sh - -# Verify -scripts/build-check.sh -``` - -### Multi-Commit Rollback - -If you need to go back several commits: - -```bash -# View recent commits -git log --oneline -10 - -# Rollback to a specific commit (example: abc1234) -git reset --hard abc1234 - -# Redeploy +# Re-deploy to add labels scripts/deploy.sh ``` -### Rollback Verification +--- -After rolling back, verify the system is stable: +### Container Won't Start (CrashLoopBackOff / Exited) + +**Symptom:** `docker compose ps` shows container in "Exited" state + +**Steps:** + +1. **Check container logs** + ```bash + docker logs gravl-backend --tail 50 + docker logs gravl-frontend --tail 50 + ``` + +2. **Check docker-compose.yml for typos** + ```bash + docker compose config # Validates syntax + ``` + +3. **Inspect health check endpoint** + ```bash + curl -v http://localhost:3001/api/health + # Should see HTTP 200, not 404 or 500 + ``` + +4. **If all else fails, clean rebuild** + ```bash + docker compose down + docker rmi gravl-backend gravl-frontend + docker system prune -f + scripts/deploy.sh + ``` + +--- + +### Database Connection Issues + +**Symptom:** Backend logs show `Connection refused` or `ECONNREFUSED` + +**Causes:** +- Database service not running +- Wrong host/port in `.env` or backend code +- Network issue between containers + +**Solutions:** + +1. **Check database service status** (if applicable) + ```bash + docker compose ps # All services running? + docker network ls # Check gravl network exists + ``` + +2. 
**Verify connection string in `.env`** + ```bash + cat .env | grep -i database + # Should match docker-compose.yml service name (e.g., gravl-db:5432) + ``` + +3. **Test connection from backend container** + ```bash + docker exec gravl-backend ping gravl-db + docker exec gravl-backend curl http://gravl-db:5432 # If HTTP, adjust port + ``` + +--- + +### Disk Space Issues + +**Symptom:** `no space left on device` during build + +**Solution:** ```bash -# Check containers match the previous code -scripts/build-check.sh +# Check disk usage +docker system df -# Check API is healthy -curl -sf http://localhost:3001/api/health | jq . +# Clean up unused images, containers, and volumes (this deletes data in unused volumes) +docker system prune -a --volumes -# Check frontend is responsive -curl -sf http://localhost:3000/ | head -c 500 +# Then retry deploy +scripts/deploy.sh ``` --- -## Manual Container Cleanup +## Recovery Procedures -If containers become corrupted or stuck: +### Manual Rollback to Previous Commit + +Use this when the deployed code is broken and you need to quickly revert. ```bash -# Stop all containers +# 1. Find the last good commit +git log --oneline -10 # Review recent commits + +# 2. Check out the known-good commit +git checkout <commit> + +# 3. Redeploy +scripts/deploy.sh + +# 4. Verify +scripts/build-check.sh +curl -sf http://localhost:3001/api/health + +# 5. Document the incident +echo "Rolled back to <commit> due to <reason>" >> logs/rollback.log +``` + +### Emergency Container Cleanup + +Use this when containers are hung, corrupted, or in an unknown state. + +```bash +# 1. Stop all services docker compose down -# Remove volumes (WARNING: deletes data) -docker compose down -v +# 2. Remove images (forces fresh rebuild) +docker rmi gravl-backend gravl-frontend -# Verify they're gone -docker compose ps +# 3. Clear unused volumes (optional; use with caution!) +# docker volume prune -# Full redeploy +# 4. Rebuild from scratch scripts/deploy.sh + +# 5. 
Verify all containers running and healthy +docker compose ps +scripts/build-check.sh +curl -sf http://localhost:3001/api/health +``` + +**Safety Check:** If your data is in Docker volumes, `docker volume prune` will destroy them. Skip this step unless you're sure you don't need the data. + +### Staged Rollback (Zero-Downtime) + +If you're running a blue-green deployment setup: + +```bash +# 1. Deploy to green environment +cd /path/to/green +git pull && docker compose build --no-cache && docker compose up -d + +# 2. Test green (health check, smoke tests) +curl -sf http://green-backend:3001/api/health + +# 3. Switch traffic to green (via load balancer or DNS) +# (Implementation depends on your infrastructure) + +# 4. If green has issues, revert traffic to blue immediately +# (Blue kept serving; no downtime) + +# 5. Debug green offline +docker logs gravl-backend ``` --- -## Monitoring & Logs +## Monitoring After Deployment -### Deployment Log +### Immediate Checks (after `deploy.sh` completes) ```bash -tail -f logs/deploy.log +# Containers are running +docker compose ps + +# Backend is healthy +curl -sf http://localhost:3001/api/health | jq . 
+ +# Containers match local code +scripts/build-check.sh + +# Logs have no errors +docker logs gravl-backend 2>&1 | grep -i error | head -5 ``` -### Container Logs +### Ongoing Checks (periodically) ```bash -# Backend logs -docker compose logs gravl-backend +# Run build-check regularly (cron every 30 min, or manual) +scripts/build-check.sh -# Frontend logs -docker compose logs gravl-frontend +# Monitor resource usage +docker stats gravl-backend gravl-frontend -# All logs with timestamps -docker compose logs --timestamps --follow +# Audit logs for issues (stderr included) +docker logs gravl-backend --since 1h 2>&1 | grep ERROR ``` -### Build Info +### Example Monitoring Script ```bash -# List deployed images -docker images | grep gravl +#!/bin/bash +# Save as scripts/health-monitor.sh +set -euo pipefail -# Inspect container labels (build metadata) -docker inspect gravl-backend | jq '.Config.Labels' +HEALTHY=true + +# Check containers running (plain grep reads all input, so pipefail never sees a SIGPIPE from the producer) +docker compose ps | grep "Up" >/dev/null || HEALTHY=false + +# Check health endpoint +curl -sf http://localhost:3001/api/health || HEALTHY=false + +# Check for stale containers +scripts/build-check.sh | grep "STALE" >/dev/null && HEALTHY=false + +if [ "$HEALTHY" = "true" ]; then + echo "[$(date)] Gravl is healthy ✓" +else + echo "[$(date)] Gravl has issues! See above." >&2 + exit 1 +fi ``` --- ## Best Practices -1. **Always test in staging first** — Validate the deploy in a non-production environment -2. **Check status before deploying** — Run `scripts/build-check.sh` to ensure no stale containers -3. **Review logs after deployment** — Check `logs/deploy.log` for warnings or errors -4. **Plan rollbacks** — Know which commits are stable before deploying -5. **Monitor health endpoints** — Regularly ping `/api/health` in production -6. **Backup before major changes** — Tag releases in git before significant deployments -7. 
**Use semantic commits** — Make it easy to identify which commits introduced changes +1. 
**Always run `build-check.sh` before deploying changes** + - Ensures you know the current state + - Catches stale containers early + +2. **Review changes before deploying** + ```bash + git log --oneline -5 # Recent commits + git diff origin/main..HEAD # What will be deployed + ``` + +3. **Test in staging first** + - Separate staging environment for pre-production testing + - Deploy to staging, verify, then deploy to production + +4. **Keep logs rotated** + - `logs/deploy.log` can grow large + - Use `logrotate` or manual cleanup: `tail -1000 logs/deploy.log > logs/deploy.log.1 && > logs/deploy.log` + +5. **Automate regular checks** + - Cron job to run `build-check.sh` every 30 minutes + - Send alerts if "STALE" or "WARNING" found + +6. **Document rollbacks** + - Always log why you rolled back + - Review patterns (e.g., "rolled back 3 times this week" = code review process failing) --- -## FAQ +## See Also -**Q: Can I deploy without building (e.g., just restart containers)?** -A: No. The script always rebuilds to prevent stale code. This is intentional for safety. - -**Q: How long should a deployment take?** -A: Typically 60-90 seconds (build time + health check). If longer, check Docker build performance. - -**Q: What if I need to deploy a specific commit?** -A: Check it out first, then deploy: -```bash -git checkout <commit> -scripts/deploy.sh -``` - -**Q: Can I skip the health check?** -A: Not recommended. The health check prevents deploying broken code. Fix the health endpoint instead. - -**Q: What data is lost if I rollback?** -A: Container rollback only reverts code. Database data persists unless you `docker compose down -v`. 
+- **Testing**: [DEPLOYMENT_TEST_PLAN.md](./DEPLOYMENT_TEST_PLAN.md) — comprehensive test scenarios +- **Code style**: [CODING-CONVENTIONS.md](./CODING-CONVENTIONS.md) +- **Architecture**: Backend README or architecture docs (if available) --- -**Last Updated:** 2026-03-03 -**Document Version:** 1.0 -**Phase:** 07-03 +*Last updated: 2026-03-03 | Maintained by: Gravl Development Team* diff --git a/scripts/build-check.sh b/scripts/build-check.sh index 539ae85..394f3c5 100755 --- a/scripts/build-check.sh +++ b/scripts/build-check.sh @@ -1,68 +1,106 @@ #!/bin/bash -# Compare deployed container versions against local git HEAD -# Warns if containers are stale (built from an older commit) +# Gravl Build Status Checker +# +# Purpose: +# Verifies that deployed containers match the current git HEAD. +# Warns if containers are stale (built from older commits). +# Helps you catch situations where code was updated but not redeployed. +# +# How it works: +# 1. Gets current local git commit (HEAD) +# 2. Queries each container's build labels +# 3. Compares container label commit vs local HEAD +# 4. Reports status: "OK", "STALE", or "WARNING" +# +# Exit codes: +# 0 = All checks completed (see output for individual status) +# (Warnings don't cause non-zero exit) # # Usage: # ./scripts/build-check.sh # -# What it shows: -# - Local git HEAD commit SHA -# - Each container's built commit SHA (from Docker labels) -# - Whether containers are up-to-date or stale -# - Warnings if labels are missing (pre-07-02 containers) +# Example output: +# Local HEAD: abc1234 (abc1234567890abcdef...) 
# -# Label fields read: -# - org.opencontainers.image.revision = Git commit SHA embedded by deploy.sh -# - org.opencontainers.image.created = Build timestamp (ISO 8601 format) -# -# Exit codes: -# 0 = All containers up to date -# 1+ = Warnings or stale containers detected -# -# See: /docs/DEPLOYMENT.md for troubleshooting +# [gravl-backend] Built: abc1234 on 2026-03-03T18:21:00Z +# [gravl-backend] OK: up to date +# [gravl-frontend] Built: abc1234 on 2026-03-03T18:21:00Z +# [gravl-frontend] OK: up to date SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" REPO_DIR="$(dirname "$SCRIPT_DIR")" cd "$REPO_DIR" +# Get the current local git commit (what's checked out locally) LOCAL_COMMIT=$(git rev-parse HEAD) echo "Local HEAD: $(git rev-parse --short HEAD) ($LOCAL_COMMIT)" echo "" -# Check a single container's build status -# Args: $1 = container name +# ============================================================================ +# check() helper function +# ============================================================================ +# Queries a container's build labels and compares against local HEAD. 
+# +# Parameters: +# $1 = Container name (e.g., "gravl-backend") +# +# Label fields used: +# org.opencontainers.image.revision = commit hash when image was built +# Format: 40-character SHA (same as git rev-parse HEAD) +# Set by: scripts/deploy.sh -> docker compose build args +# +# org.opencontainers.image.created = RFC3339 timestamp when image was built +# Format: 2026-03-03T18:21:00Z +# Set by: scripts/deploy.sh -> docker compose build args +# Purpose: Shows humans when the image was built (for diagnostics) +# +# Status outcomes: +# - "Not running": Container doesn't exist (docker inspect fails) +# - "WARNING": Container exists but has no revision label +# Fix: Re-deploy with scripts/deploy.sh +# - "OK": Container label commit = local HEAD (up to date) +# - "STALE": Container label commit != local HEAD +# Fix: Run scripts/deploy.sh to update container check() { local name="$1" - # Container not running + # Check if container exists (docker inspect also succeeds for stopped containers) if ! docker inspect "$name" &>/dev/null; then echo "[$name] Not running" return fi - # Read Docker labels set by deploy.sh - # If labels are missing, container was built before phase 07-02 + # Extract build labels from container config + # These labels are set in the docker-compose.yml build args, + # and the Dockerfile applies them as image labels. 
local commit date commit=$(docker inspect "$name" --format '{{index .Config.Labels "org.opencontainers.image.revision"}}' 2>/dev/null) date=$(docker inspect "$name" --format '{{index .Config.Labels "org.opencontainers.image.created"}}' 2>/dev/null) + # Check if revision label exists if [ -z "$commit" ] || [ "$commit" = "unknown" ]; then echo "[$name] WARNING: no build label found — redeploy with scripts/deploy.sh to add tracking" return fi - # Display container info + # Display when this container's image was built echo "[$name] Built: ${commit:0:7} on ${date:-unknown}" - # Compare container commit to local HEAD + # Compare container's commit against local HEAD + # If they match, container is up to date. + # If they differ, code has changed locally but container hasn't been redeployed. if [ "$commit" = "$LOCAL_COMMIT" ]; then - echo "[$name] OK: up to date" + echo "[$name] ✓ OK: up to date" else - echo "[$name] STALE: container is behind local code — run scripts/deploy.sh" + echo "[$name] ⚠ STALE: container is behind local code — run scripts/deploy.sh" fi } -# Check both containers +# ============================================================================ +# Check Each Service +# ============================================================================ +# These are the service names defined in docker-compose.yml. +# Adjust if you rename services. check "gravl-backend" check "gravl-frontend" diff --git a/scripts/deploy.sh b/scripts/deploy.sh index 48ab2b2..56a9bf2 100755 --- a/scripts/deploy.sh +++ b/scripts/deploy.sh @@ -1,24 +1,31 @@ #!/bin/bash -# Gravl deployment script -# Prevents stale containers by always building fresh with --no-cache -# +# Gravl Deployment Script +# +# Purpose: +# Automates the deployment of Gravl services to production/staging. +# Ensures fresh builds and verifies service health after startup. +# +# Prevents stale containers by always building fresh with --no-cache: +# The --no-cache flag rebuilds all Docker layers from scratch. 
+#   This prevents stale application code, assets, or dependencies
+#   from being cached and deployed. Essential for reliable deployments.
+#
+# Workflow:
+#   1. Pull latest code from git
+#   2. Capture build metadata (commit hash, timestamp)
+#   3. Build Docker images (--no-cache for freshness)
+#   4. Start containers with new images
+#   5. Health check: wait for backend to respond
+#
+# Exit codes:
+#   0 = Success (deployment complete, services healthy)
+#   non-zero = Failure (health check exits 1; with set -e, other steps propagate their own exit code)
+#
 # Usage:
 #   ./scripts/deploy.sh
 #
-# What it does:
-#   1. Pulls latest code from git
-#   2. Captures build metadata (commit SHA, timestamp)
-#   3. Builds fresh Docker images with --no-cache (no layer caching)
-#   4. Restarts containers to use new images
-#   5. Polls /api/health endpoint until backend is ready
-#   6. Logs all steps to logs/deploy.log
-#
-# Rationale for --no-cache:
-#   Docker caching can hide stale assets (JS, CSS, images) when source files change.
-#   Using --no-cache ensures all layers rebuild fresh, guaranteeing new code is deployed.
-#   Trade-off: Slightly slower builds (30-60s vs 10-20s with cache), but safer.
-#
-# See: /docs/DEPLOYMENT.md for troubleshooting
+# Logs:
+#   Messages written via log() are appended to logs/deploy.log (follow with: tail -f logs/deploy.log)

 set -euo pipefail

@@ -27,49 +34,106 @@ REPO_DIR="$(dirname "$SCRIPT_DIR")"
 LOG_FILE="$REPO_DIR/logs/deploy.log"
 BACKEND_HEALTH="http://localhost:3001/api/health"
+# Logging helper: prints timestamp + message to both stdout and log file
 log() {
   echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
 }
+# Ensure logs directory exists
 mkdir -p "$REPO_DIR/logs"
 cd "$REPO_DIR"
 log "=== Deploy started ==="
-# Pull latest code from remote
-# Fails if there are local changes or merge conflicts
+# ============================================================================
+# STEP 1: Git Pull
+# ============================================================================
+# Fetches latest code from remote and merges into current branch.
+# Fails if there are merge conflicts (manual intervention required).
 log "Pulling latest code..."
 git pull
-# Capture build metadata to embed in Docker image labels
-# These labels allow build-check.sh to verify deployed containers match local code
+# ============================================================================
+# STEP 2: Capture Build Metadata
+# ============================================================================
+# Build labels are attached to Docker images and inherited by containers created from them.
+# These are used by build-check.sh to verify deployed containers match local HEAD.
+#
+# Labels:
+#   org.opencontainers.image.revision = git commit hash (40-char SHA)
+#     Purpose: Track which commit the image was built from
+#     Example: abc1234567890abcdef1234567890abcdef12345
+#
+#   org.opencontainers.image.created = RFC3339 timestamp
+#     Purpose: Track when the image was built
+#     Example: 2026-03-03T18:21:00Z
 GIT_COMMIT=$(git rev-parse HEAD)
 BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
 log "Commit: $(git rev-parse --short HEAD) | Date: $BUILD_DATE"
-# Build fresh images — no-cache prevents Docker layer caching
-# This is critical for frontend deployments where CSS/JS changes might not be obvious
-# to Docker's layer detection algorithm
-log "Building images (--no-cache)..."
+# ============================================================================
+# STEP 3: Build Docker Images (--no-cache)
+# ============================================================================
+# Why --no-cache?
+#   Docker layer caching can hide stale assets (CSS, JS bundles, dependencies).
+#   Example: A cached npm install layer is reused while its inputs are unchanged, so unpinned dependencies are never re-resolved.
+#   --no-cache forces full rebuild of all layers every time.
+#
+# GIT_COMMIT and BUILD_DATE are exported so docker compose can pass them as build args,
+# which the Dockerfile can use in RUN instructions or reference in labels (see docker-compose.yml).
+log "Building images (--no-cache to prevent stale assets)..."
 export GIT_COMMIT BUILD_DATE
 docker compose build --no-cache
-# Restart containers with new images
-# --force-recreate stops old containers and removes them before starting new ones
+# ============================================================================
+# STEP 4: Start Containers with New Images
+# ============================================================================
+# docker compose up -d --force-recreate:
+#   -d = Run in background (detached mode)
+#   --force-recreate = Stop and remove existing containers, start fresh
+#     Ensures old containers with old images are not reused.
+#
+# This step also creates (or reuses) the Compose network that connects the containers.
 log "Starting containers..."
 docker compose up -d --force-recreate
-# Health check: poll /api/health endpoint until it responds with 200 OK
-# Timeout: 60 seconds (12 retries × 5 seconds each)
-# This prevents deployment from completing if the backend is broken
-log "Health check..."
+# ============================================================================
+# STEP 5: Health Check
+# ============================================================================
+# Waits for backend to respond on /api/health endpoint.
+# This confirms the service started and can answer HTTP requests.
+#
+# Timeout configuration:
+#   Loop: 12 iterations
+#   Interval: 5 seconds per iteration
+#   Total: about 60 seconds max wait time (11 sleeps of 5s plus curl time)
+#
+# Why 60 seconds?
+#   - Docker startup: ~5-10 seconds
+#   - Node.js app initialization: ~5 seconds
+#   - Database connection: ~5-10 seconds
+#   - Buffer for system load: ~30 seconds
+#
+# If this timeout is too short, you may see false negatives (healthy app fails check).
+# If too long, deployment takes unnecessarily long to fail.
+#
+# Endpoint details:
+#   URL: http://localhost:3001/api/health
+#   Method: GET
+#   Expected status: 200 (curl -f treats any status below 400 as success)
+#   Should complete in <1 second
+log "Health check: waiting for backend (60s timeout)..."
 for i in $(seq 1 12); do
   if curl -sf "$BACKEND_HEALTH" >/dev/null 2>&1; then
-    log "Backend healthy"
+    log "✓ Backend healthy"
     break
   fi
-  [ "$i" -eq 12 ] && { log "ERROR: Health check failed after 60s"; exit 1; }
-  log "Waiting... ($i/12)"
+  if [ "$i" -eq 12 ]; then
+    log "✗ ERROR: Health check failed after 60s"
+    log "  Try: docker logs --tail 20 gravl-backend"
+    exit 1
+  fi
+  log "  Waiting... ($i/12 attempts, 5s intervals)"
   sleep 5
 done
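The polling loop in the health check step could be factored into a small reusable helper. The following is a minimal sketch and not part of this change: `wait_for` is a hypothetical name, and it takes the attempt count, the interval in seconds, and the probe command as arguments so that the retry logic can be exercised without a running backend.

```shell
# Hypothetical helper (not part of deploy.sh): run a probe command until it
# succeeds or the attempt budget is exhausted.
# Usage: wait_for <attempts> <interval_seconds> <command> [args...]
wait_for() {
  local attempts interval i
  attempts="$1"
  interval="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Probe output is discarded; only the exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, so a failed run ends right after the last probe.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Under these assumptions, the deploy script's check would read `wait_for 12 5 curl -sf --max-time 2 "$BACKEND_HEALTH"`. Adding curl's `--max-time` option is a suggestion here, not something the script currently does; it bounds each probe so a hung connection cannot stall the loop.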