Operations
Deploy, monitor, and maintain the NorthBuilt RAG System in production.
Overview
This section covers day-to-day operations: deployment, data ingestion, monitoring, troubleshooting, and incident response.
Documents
1. Deployment
Deploy via GitHub Actions or manually with Terraform.
- GitHub Actions CI/CD setup
- Manual Terraform deployment
- OIDC authentication
- Monitoring deployments
Audience: DevOps, Infrastructure
Time: 15 minutes
2. Data Ingestion
Add documents to the system via webhooks or manual upload.
- Webhook integrations (Fathom, HelpScout, Linear)
- Manual document upload API
- Bulk import from S3
- Document management (update, delete)
Audience: Operations, Engineers
Time: 25 minutes
3. Monitoring
CloudWatch metrics, logs, alarms, and dashboards.
- Key metrics to track
- CloudWatch dashboards
- Alarm configuration
- Log queries
- Health checks
Audience: DevOps, SRE
Time: 20 minutes
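As a rough illustration of the kind of log query the Monitoring document covers, the sketch below lists recent error lines from the chat Lambda. The log group name is the one used in the Quick Reference on this page. The `ERROR` filter term, the one-hour default window, and the helper names `start_time_ms` and `recent_errors` are assumptions made for illustration and are not taken from the Monitoring document itself.

```shell
#!/usr/bin/env bash
# Sketch: list recent ERROR lines from the chat Lambda's log group.
# Assumptions: the "ERROR" filter term and the 60-minute default window are
# illustrative; adjust them to match the actual log format.

LOG_GROUP="/aws/lambda/nb-rag-sys-chat"

# Convert "N minutes ago" to epoch milliseconds, the unit --start-time expects.
# The optional second argument overrides "now" (epoch seconds).
start_time_ms() {
  local minutes="$1" now="${2:-$(date +%s)}"
  echo $(( (now - minutes * 60) * 1000 ))
}

# recent_errors [minutes]: print matching log messages from the last N minutes.
recent_errors() {
  local minutes="${1:-60}"
  aws logs filter-log-events \
    --log-group-name "$LOG_GROUP" \
    --filter-pattern "ERROR" \
    --start-time "$(start_time_ms "$minutes")" \
    --query 'events[].message' \
    --output text
}
```

Calling `recent_errors 15` would query the last 15 minutes. It needs AWS credentials with read access to CloudWatch Logs.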
4. Operations Runbook
Day-to-day operational procedures and tasks.
- Daily health checks
- API key rotation
- Lambda code updates
- Backup and recovery
- Incident response
Audience: Operations, DevOps, On-call Engineers
Time: 30 minutes
5. Troubleshooting
Common issues and how to resolve them.
- Deployment issues
- Runtime errors (401, 500, timeouts)
- Performance problems
- Data ingestion issues
- Security problems
Audience: All Engineers
Time: As needed (reference guide)
Recommended Reading Order
For Initial Deployment: Deployment → Monitoring → Runbook
For Operations Team: Runbook → Monitoring → Troubleshooting
For Data Management: Data Ingestion → Troubleshooting
Quick Reference
# Deploy
cd terraform && terraform apply
# View logs
aws logs tail /aws/lambda/nb-rag-sys-chat --follow
# Check health
./scripts/morning-health-check.sh
# Rotate API key (example: Fathom)
aws secretsmanager update-secret \
  --secret-id nb-rag-sys-fathom-api-key \
  --secret-string '{"api_key": "new-key"}'
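The key rotation command above can be wrapped so the JSON payload is not written by hand. This is a minimal sketch, not the project's own tooling. It assumes other secrets follow the same `nb-rag-sys-<service>-api-key` naming and single `api_key` field as the Fathom example, which this page only confirms for Fathom. The function names and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rotate a webhook API key without hand-writing the JSON payload.
# Assumption: secrets are named nb-rag-sys-<service>-api-key and hold a single
# "api_key" field, as in the Fathom example (unverified for other services).

# Build the JSON secret string, escaping backslashes and double quotes.
build_secret_payload() {
  local key="$1"
  key=${key//\\/\\\\}   # escape backslashes first
  key=${key//\"/\\\"}   # then escape double quotes
  printf '{"api_key": "%s"}' "$key"
}

# rotate_api_key <service> <new-key>
# Set DRY_RUN=1 to print the command (key redacted) instead of calling AWS.
rotate_api_key() {
  local service="$1" new_key="$2"
  local secret_id="nb-rag-sys-${service}-api-key"
  local payload
  payload="$(build_secret_payload "$new_key")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'aws secretsmanager update-secret --secret-id %s --secret-string <redacted>\n' "$secret_id"
  else
    aws secretsmanager update-secret --secret-id "$secret_id" --secret-string "$payload"
  fi
}
```

Running `DRY_RUN=1 rotate_api_key fathom new-key` first shows which secret would be updated without touching Secrets Manager.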
Next: See the Development Guide to contribute code.