Operations

Deploy, monitor, and maintain the NorthBuilt RAG System in production.

Overview

This section covers day-to-day operations, including deployment, data ingestion, monitoring, troubleshooting, and incident response.

Documents

1. Deployment

Deploy via GitHub Actions or manually with Terraform.

  • GitHub Actions CI/CD setup
  • Manual Terraform deployment
  • OIDC authentication
  • Monitoring deployments

Audience: DevOps, Infrastructure
Time: 15 minutes

2. Data Ingestion

Add documents to the system via webhooks or manual upload.

  • Webhook integrations (Fathom, HelpScout, Linear)
  • Manual document upload API
  • Bulk import from S3
  • Document management (update, delete)

Audience: Operations, Engineers
Time: 25 minutes

3. Monitoring

CloudWatch metrics, logs, alarms, and dashboards.

  • Key metrics to track
  • CloudWatch dashboards
  • Alarm configuration
  • Log queries
  • Health checks

Audience: DevOps, SRE
Time: 20 minutes
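
As a rough illustration of the "Log queries" item above, the following sketch wraps `aws logs tail` to show only recent error lines. The function name `tail_errors`, the default one-hour window, and the `ERROR` filter pattern are assumptions made for illustration; the real log format may call for a different pattern.

```shell
# Hypothetical helper (not part of the repository): show recent error lines
# from a CloudWatch log group.
# Usage: tail_errors <log-group> [since]
tail_errors() {
  log_group=$1
  since=${2:-1h}   # assumed default look-back window
  # "ERROR" is an assumed filter pattern; adjust it to the actual log format.
  aws logs tail "$log_group" --since "$since" --filter-pattern ERROR
}
```

For example, `tail_errors /aws/lambda/nb-rag-sys-chat 30m` would list matching lines from the last 30 minutes of the chat Lambda's log group.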

4. Operations Runbook

Day-to-day operational procedures and tasks.

  • Daily health checks
  • API key rotation
  • Lambda code updates
  • Backup and recovery
  • Incident response

Audience: Operations, DevOps, On-call Engineers
Time: 30 minutes

5. Troubleshooting

Common issues and how to resolve them.

  • Deployment issues
  • Runtime errors (401, 500, timeouts)
  • Performance problems
  • Data ingestion issues
  • Security problems

Audience: All Engineers
Time: As needed (reference guide)

  • For Initial Deployment: Deployment → Monitoring → Runbook
  • For Operations Team: Runbook → Monitoring → Troubleshooting
  • For Data Management: Data Ingestion → Troubleshooting

Quick Reference

# Deploy
cd terraform && terraform apply

# View logs
aws logs tail /aws/lambda/nb-rag-sys-chat --follow

# Check health
./scripts/morning-health-check.sh

# Rotate API key (example: Fathom)
aws secretsmanager update-secret \
  --secret-id nb-rag-sys-fathom-api-key \
  --secret-string '{"api_key": "new-key"}'
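
When the new key comes from a variable rather than a literal, quoting the JSON by hand is easy to get wrong. The sketch below builds the payload in the same `{"api_key": ...}` shape used above and escapes backslashes and double quotes. Both function names are hypothetical, and the escaping does not cover control characters such as newlines.

```shell
# Hypothetical helper: build the JSON payload for a rotated API key.
build_secret_payload() {
  key=$1
  # Escape backslashes first, then double quotes, so the value stays valid JSON.
  escaped=$(printf '%s' "$key" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"api_key": "%s"}' "$escaped"
}

# Hypothetical wrapper around the rotation command shown above.
# Usage: rotate_fathom_key "$NEW_KEY"
rotate_fathom_key() {
  aws secretsmanager update-secret \
    --secret-id nb-rag-sys-fathom-api-key \
    --secret-string "$(build_secret_payload "$1")"
}
```

Running `build_secret_payload 'new-key'` on its own prints the payload without contacting AWS, which makes it easy to check before rotating.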

Next: See the Development Guide to learn how to contribute code.

