System Architecture
Detailed technical architecture of the NorthBuilt RAG System.
Key AWS Services: This system is built on Amazon Bedrock Knowledge Bases with S3 Vectors for vector storage. For a complete list of AWS documentation references, see AWS Documentation References.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                            │
│                     (Global Edge Distribution)                      │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
    ┌────▼────┐              ┌───────▼────────┐
    │   S3    │              │  API Gateway   │
    │  (Web)  │              │   (HTTP API)   │
    └─────────┘              └───────┬────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     │               │               │
              ┌──────▼─────┐   ┌─────▼──────┐   ┌────▼─────┐
              │  Cognito   │   │   Lambda   │   │  Lambda  │
              │   (Auth)   │   │   (Chat)   │   │(Webhooks)│
              └────────────┘   └─────┬──────┘   └────┬─────┘
                                     │               │
                      ┌──────────────┼───────────────┘
                      │              │
               ┌──────▼─────┐  ┌─────▼──────┐
               │  Bedrock   │  │ S3 Vectors │
               │  (Claude)  │  │ (Bedrock)  │
               └────────────┘  └────────────┘
                      │
               ┌──────▼─────┐
               │  DynamoDB  │
               │ (Classify) │
               └────────────┘
Component Details
1. Frontend Layer
CloudFront Distribution
- Purpose: Global CDN for low-latency access
- Origin: S3 bucket with static web assets
- Features:
- HTTPS-only with TLS 1.2+
- HTTP/2 and HTTP/3 support
- Gzip/Brotli compression
- Custom domain support
- Origin Access Identity (OAI) for S3 security
S3 Web Bucket
- Contents: HTML, CSS, JavaScript, images
- Access: Private (CloudFront only via OAI)
- Versioning: Enabled for rollback capability
- Lifecycle: Old versions expire after 90 days
Web Application
- Framework: React 19 + TypeScript + Vite
- Styling: Tailwind CSS 4
- Authentication: Google OAuth via Cognito
- Features:
- Real-time chat interface with clarification prompts
- Source citation display with relevance scores
- Query understanding integration
- Dark/light mode
- Mobile responsive PWA
2. API Layer
API Gateway (HTTP API)
- Type: HTTP API (not REST API - lower cost, better performance)
- Routes:
- POST /chat - Main query endpoint
- POST /webhooks/fathom - Fathom video webhook
- POST /webhooks/helpscout - HelpScout ticket webhook
- POST /webhooks/linear - Linear issue webhook
- Authentication:
- /chat: JWT authorizer (Cognito)
- Webhooks: API key validation in Lambda
- Throttling:
- Rate: 10 requests/second
- Burst: 20 requests
- CORS: Enabled for web UI origin
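The rate and burst settings above can be read as a token bucket: the bucket holds at most 20 tokens (burst) and refills at 10 tokens per second (rate). The sketch below only illustrates that arithmetic; the `TokenBucket` class is hypothetical and is not how API Gateway is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; not API Gateway's actual implementation."""

    rate: float = 10.0    # tokens refilled per second (steady-state rate)
    burst: float = 20.0   # bucket capacity (maximum burst)
    tokens: float = 20.0  # bucket starts full
    last: float = 0.0     # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
# 25 requests arriving at the same instant: only the burst capacity is admitted.
admitted_at_once = sum(bucket.allow(now=0.0) for _ in range(25))
# One second later the bucket has refilled by `rate` tokens.
admitted_later = sum(bucket.allow(now=1.0) for _ in range(25))
```

Requests beyond the available tokens would be rejected by the gateway rather than queued.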
3. Authentication Layer
Amazon Cognito
- User Pool: Manages user accounts and sessions
- Identity Provider: Google OAuth 2.0 federation
- Token Validation: JWT tokens with 1-hour expiration
- Features:
- Email/password fallback (optional)
- MFA support (optional)
- Account recovery
- User attributes (email, name, picture)
Authorization Flow
1. User clicks "Sign in with Google"
2. Redirected to Cognito hosted UI
3. Cognito redirects to Google OAuth
4. User authorizes → Google returns code
5. Cognito exchanges code for tokens
6. User redirected to app with JWT
7. App stores JWT in localStorage
8. JWT sent in Authorization header for API calls
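Step 8 can be sketched as follows. The web app itself is TypeScript; Python is used here only for consistency with the Lambda code. The base URL, the `Bearer` prefix, and the `{"query": ...}` body shape are assumptions for illustration, not confirmed details of the API.

```python
import json
import urllib.request


def build_chat_request(api_base_url: str, jwt: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request carrying the Cognito JWT."""
    body = json.dumps({"query": query}).encode("utf-8")  # hypothetical body shape
    return urllib.request.Request(
        url=f"{api_base_url.rstrip('/')}/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt}",  # assumed "Bearer" scheme
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real token would come from the Cognito sign-in flow.
request = build_chat_request("https://api.example.com", "eyJ...", "What changed last week?")
```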
4. Compute Layer
Lambda Functions
Chat Lambda (lambda/chat/)
- Runtime: Python 3.13
- Memory: 1024 MB
- Timeout: 60 seconds
- Purpose: Main orchestrator for chat queries
- Flow:
- Validate JWT token
- Extract query from request
- Retrieve relevant docs via the Bedrock Knowledge Base Retrieve API
- Call Bedrock with context + query
- Stream response back to client
- Return sources with citations
- Environment Variables:
- QUERY_LAMBDA_ARN - Query Lambda function ARN
- BEDROCK_MODEL_ID - Claude model identifier
- AWS_REGION - Deployment region
Note: Document retrieval is handled directly by the Chat Lambda via Bedrock Knowledge Base APIs.
The Chat Lambda calls bedrock-agent-runtime:retrieve which:
- Generates embeddings via Titan Embeddings v2
- Searches S3 Vectors for similar documents
- Optionally reranks results using Bedrock Reranking
- Returns top K results (configurable, default 5)
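A minimal sketch of the request the Chat Lambda might pass to the Retrieve API. The parameter names follow the `bedrock-agent-runtime` Retrieve request shape; the helper name `build_retrieve_kwargs`, the placeholder knowledge base ID, and the optional filter on a `client` metadata key are assumptions for illustration.

```python
from typing import Any, Optional


def build_retrieve_kwargs(
    knowledge_base_id: str,
    query: str,
    top_k: int = 5,
    client_filter: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for a Knowledge Base retrieve call."""
    vector_config: dict[str, Any] = {"numberOfResults": top_k}
    if client_filter is not None:
        # Restrict results to documents whose "client" metadata matches.
        vector_config["filter"] = {"equals": {"key": "client", "value": client_filter}}
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_config},
    }


kwargs = build_retrieve_kwargs("KB_ID_PLACEHOLDER", "What did we decide about LIMS?", client_filter="acme")
# In the Lambda this would be passed to boto3, for example:
#   boto3.client("bedrock-agent-runtime").retrieve(**kwargs)
```

Keeping the request construction separate from the boto3 call makes it testable without AWS credentials.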
Classify Lambda (lambda/classify/)
- Runtime: Python 3.13
- Memory: 512 MB
- Timeout: 30 seconds
- Purpose: Determine client/project for incoming documents using email domain matching
- Flow:
- Receive source type and document data
- Select classification strategy (Fathom or HelpScout)
- Extract email domain(s) from document
- Query DynamoDB for DOMAIN# record matching the domain
- Fetch CLIENT# and PROJECT# records for metadata
- Return client name and project name for S3 metadata tagging
- Strategies:
- FathomStrategy: Extracts domains from meeting participant emails, selects highest-count match
- HelpScoutStrategy: Extracts domain from primary customer email
- Input Format:
{"source": "fathom", "data": {"document": {"participants": ["user@client.com"]}}}
- Output Format:
{"clientId": "uuid", "clientName": "Client Name", "projectId": null, "projectName": null}
- Environment Variables:
- DYNAMODB_TABLE_NAME - Classify table name (nb-rag-sys-classify)
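The two strategies can be sketched as below. This is an illustration under stated assumptions, not the Lambda's actual code: `DOMAIN_TO_CLIENT` is a hypothetical in-memory stand-in for the DOMAIN# records in DynamoDB, and the function names are invented for the example.

```python
from collections import Counter
from typing import Optional

# Hypothetical stand-in for the DOMAIN# records stored in DynamoDB.
DOMAIN_TO_CLIENT = {"client.com": "client-uuid-1", "other.org": "client-uuid-2"}


def extract_domain(email: str) -> Optional[str]:
    """Return the lowercased domain of an email address, or None if malformed."""
    _, sep, domain = email.strip().lower().rpartition("@")
    return domain if sep and domain else None


def classify_fathom(participants: list[str]) -> Optional[str]:
    """Pick the known client whose domain appears most often among participants."""
    counts = Counter(
        d for d in map(extract_domain, participants) if d in DOMAIN_TO_CLIENT
    )
    if not counts:
        return None
    domain, _ = counts.most_common(1)[0]
    return DOMAIN_TO_CLIENT[domain]


def classify_helpscout(customer_email: str) -> Optional[str]:
    """Look up the client for the primary customer's email domain."""
    return DOMAIN_TO_CLIENT.get(extract_domain(customer_email) or "")
```

Unknown domains (for example personal email providers) simply do not count toward any client, so a meeting with no known domains stays unclassified.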
Fathom Webhook Lambda (lambda/webhooks/fathom/)
- Runtime: Python 3.13
- Memory: 512 MB
- Timeout: 300 seconds (5 minutes)
- Purpose: Process Fathom video recordings
- Flow:
- Receive webhook from Fathom
- Validate API key
- Fetch video metadata and transcript
- Write document and .metadata.json sidecar to S3
- Trigger Bedrock KB sync (via scheduled ingestion)
- Return 200 OK
- Environment Variables:
- FATHOM_API_KEY - Stored in Secrets Manager
- DOCUMENTS_BUCKET - S3 bucket for documents
HelpScout Webhook Lambda (lambda/webhooks/helpscout/)
- Runtime: Python 3.13
- Memory: 512 MB
- Timeout: 60 seconds
- Purpose: Process HelpScout support tickets
- Flow:
- Receive webhook from HelpScout
- Validate API key
- Extract ticket content
- Call Classify Lambda
- Write document to S3
- Return 200 OK
- Environment Variables:
- HELPSCOUT_API_KEY - Stored in Secrets Manager
- CLASSIFY_LAMBDA_ARN - Classify Lambda function ARN
Linear Webhook Lambda (lambda/webhooks/linear/)
- Runtime: Python 3.13
- Memory: 512 MB
- Timeout: 60 seconds
- Purpose: Sync Linear Teams (clients) and Projects to DynamoDB for entity registry
- Supported Events:
- Team.create, Team.update, Team.remove, Project.create, Project.update, Project.remove
- Flow (Team events):
- Receive webhook from Linear with HMAC signature
- Verify signature using webhook secret
- Create/update/delete CLIENT record in DynamoDB
- Uses update_item to preserve manual fields (Domains, Notes, Aliases)
- Flow (Project events):
- Verify webhook signature
- Fetch full project details from Linear API (includes team info)
- Ensure parent CLIENT exists via ensure_client_exists()
- Create/update/delete PROJECT record in DynamoDB
- Uses if_not_exists(CreatedAt, :val) to preserve original timestamps
- Environment Variables:
- LINEAR_WEBHOOK_SECRET_ARN - Webhook secret in Secrets Manager
- LINEAR_API_KEY_SECRET_ARN - API key in Secrets Manager
- DYNAMODB_TABLE_NAME - Classify table name
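The signature check in the flows above might look like the following sketch. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, which this document does not state explicitly; `verify_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_hex: str, secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw (unparsed) request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

The check has to run on the exact bytes received, before any JSON parsing or re-serialization, since re-encoding can change the payload and invalidate the signature.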
Linear Sync Lambda (lambda/sync/linear/)
- Runtime: Python 3.13
- Memory: 512 MB
- Timeout: 300 seconds (5 minutes)
- Purpose: Full sync of all Linear Teams and Projects to DynamoDB
- Architecture: Handler + Worker pattern (fire-and-forget for long-running sync)
- Flow:
- Handler receives request, invokes worker asynchronously, returns 202 Accepted
- Worker fetches all teams from Linear GraphQL API
- For each team: creates/updates CLIENT record
- For each project: creates/updates PROJECT record with CLIENT parent
- Uses update_item with if_not_exists(CreatedAt) for idempotent updates
- Invocation:
aws lambda invoke --function-name nb-rag-sys-linear-sync --payload '{}' response.json
- Environment Variables:
- LINEAR_API_KEY_SECRET_ARN - API key in Secrets Manager
- DYNAMODB_TABLE_NAME - Classify table name
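The handler + worker pattern can be sketched as follows. The `invoke` arguments (`FunctionName`, `InvocationType="Event"`, `Payload`) are standard boto3 Lambda parameters; the injected client, the worker name, and the payload contents are assumptions made so the example runs without AWS access.

```python
import json
from typing import Any, Protocol


class LambdaInvoker(Protocol):
    """Minimal interface satisfied by boto3.client("lambda")."""

    def invoke(self, **kwargs: Any) -> Any: ...


def handler(event: dict, lambda_client: LambdaInvoker, worker_name: str) -> dict:
    """Start the long-running sync asynchronously and return immediately."""
    lambda_client.invoke(
        FunctionName=worker_name,
        InvocationType="Event",  # asynchronous: does not wait for the worker to finish
        Payload=json.dumps({"action": "full_sync"}).encode("utf-8"),  # hypothetical payload
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}


class RecordingClient:
    """Test double standing in for the real Lambda client."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def invoke(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"StatusCode": 202}


client = RecordingClient()
response = handler({}, client, worker_name="linear-sync-worker")  # hypothetical name
```

Because the handler returns before the worker finishes, the caller only learns that the sync was accepted; progress and failures would surface in the worker's logs.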
5. AI Layer
Amazon Bedrock
For complete Bedrock documentation, see Amazon Bedrock User Guide.
Claude Sonnet 4.5 (us.anthropic.claude-sonnet-4-5-20250929-v1:0)
- Purpose: Response generation
- Input: System prompt + context + user query
- Output: Natural language response with citations
- Pricing: $3.00 per million input tokens, $15.00 per million output tokens
- Performance: ~2-3 seconds for typical query
- Documentation: Claude on Amazon Bedrock
Titan Embeddings V2 (amazon.titan-embed-text-v2:0)
- Purpose: Generate vector embeddings for semantic search
- Input: Text chunks (up to 8192 tokens)
- Output: 1024-dimensional vector (also supports 256, 512)
- Pricing: $0.0001 per 1000 tokens
- Performance: ~200ms per embedding
- Documentation: Amazon Titan Text Embeddings
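As a rough illustration of the pricing figures above, the per-query cost can be estimated as below. The token counts in the example are hypothetical, and the prices are the ones listed in this section rather than values fetched from AWS.

```python
CLAUDE_INPUT_USD_PER_MTOK = 3.00    # per million input tokens
CLAUDE_OUTPUT_USD_PER_MTOK = 15.00  # per million output tokens
TITAN_USD_PER_KTOK = 0.0001         # per thousand embedded tokens


def generation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated Claude cost in USD for one response."""
    return (
        input_tokens * CLAUDE_INPUT_USD_PER_MTOK
        + output_tokens * CLAUDE_OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def embedding_cost(tokens: int) -> float:
    """Estimated Titan Embeddings cost in USD for a given token count."""
    return tokens / 1000 * TITAN_USD_PER_KTOK


# Hypothetical query: 4,000 prompt tokens (context + question), 500 answer
# tokens, and a 20-token question to embed.
query_cost = generation_cost(4_000, 500) + embedding_cost(20)
```

With these assumed token counts the generation step dominates; the query embedding contributes a negligible share.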
System Prompt Strategy
You are a helpful assistant with access to company documentation.
Context:
{retrieved_documents}
User Query: {query}
Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough
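One way the template could be filled in is sketched below. The helper names and the choice to number documents from 1 are assumptions; only the template text comes from this section.

```python
SYSTEM_PROMPT_TEMPLATE = """You are a helpful assistant with access to company documentation.
Context:
{retrieved_documents}
User Query: {query}
Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough"""


def format_context(chunks: list[str]) -> str:
    """Label each retrieved chunk so the model can cite it as [doc_N]."""
    return "\n\n".join(
        f"[doc_{i}] {text.strip()}" for i, text in enumerate(chunks, start=1)
    )


def build_prompt(chunks: list[str], query: str) -> str:
    """Fill the template with labeled context and the user's question."""
    return SYSTEM_PROMPT_TEMPLATE.format(
        retrieved_documents=format_context(chunks), query=query
    )


prompt = build_prompt(["Alpha launch moved to March. ", "Beta scope unchanged."], "When is the Alpha launch?")
```

Labeling chunks in the context is what lets the [doc_N] citations in the answer be mapped back to the source list shown in the UI.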
6. Data Layer
S3 Vectors
S3 Vectors is purpose-built vector storage that provides cost-optimized, durable storage for AI workloads. See Working with S3 Vectors for complete documentation.
- Index Configuration:
- Dimension: 1024 (matches Titan Embeddings v2)
- Metric: Cosine similarity
- Data Type: FLOAT32
- Storage: Fully managed by AWS
- Non-filterable keys:
- AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA (required for 100% ingestion success)
- See Vector indexes for configuration details
- Integration with Bedrock Knowledge Base:
- Native integration via S3_VECTORS storage type - see Using S3 Vectors with Bedrock KB
- Automatic chunking (512 tokens, 20% overlap) - see Chunking strategies
- LLM parsing: Disabled (sidecar .metadata.json files used instead)
- Data deletion policy: DELETE (vectors auto-removed when S3 docs deleted)
Document Metadata Architecture: Two metadata systems:
1. Bedrock KB Metadata (.metadata.json sidecar files for filtering): Each document has a companion metadata sidecar file stored alongside it in S3. For example, fathom/acme/alpha/meeting_12345.md has the sidecar fathom/acme/alpha/meeting_12345.md.metadata.json:
{
  "metadataAttributes": {
    "source": {"value": {"type": "STRING", "stringValue": "fathom"}},
    "client": {"value": {"type": "STRING", "stringValue": "acme"}},
    "project": {"value": {"type": "STRING", "stringValue": "alpha"}},
    "category": {"value": {"type": "STRING", "stringValue": "meeting-transcript"}}
  }
}
These enable multi-tenant filtering in RAG queries (filter by client). Project metadata is preserved for display purposes but not used for filtering. See S3 Vectors metadata filtering and RetrievalFilter API for filter syntax. For multi-tenancy patterns, see Multi-tenancy with metadata filtering.
2. S3 Object Metadata (HTTP headers, for S3 operations only): Technical metadata stored as S3 object headers. NOT used by Bedrock KB for filtering.
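A sidecar like the one described in item 1 could be generated with a helper along these lines. The function name `build_sidecar` is hypothetical; the JSON layout mirrors the sidecar example in this section.

```python
import json


def build_sidecar(
    doc_key: str, source: str, client: str, project: str, category: str
) -> tuple[str, str]:
    """Return the sidecar's S3 key and JSON body for a given document key."""

    def string_attr(value: str) -> dict:
        return {"value": {"type": "STRING", "stringValue": value}}

    body = {
        "metadataAttributes": {
            "source": string_attr(source),
            "client": string_attr(client),
            "project": string_attr(project),
            "category": string_attr(category),
        }
    }
    # The sidecar sits next to the document: <document key>.metadata.json
    return f"{doc_key}.metadata.json", json.dumps(body, indent=2)


sidecar_key, sidecar_body = build_sidecar(
    "fathom/acme/alpha/meeting_12345.md", "fathom", "acme", "alpha", "meeting-transcript"
)
```

Writing the document and its sidecar together keeps the two in step, so the next Knowledge Base sync picks up both.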
- Query Performance: Fast retrieval via Bedrock KB API
- Capacity: Scales automatically with usage
DynamoDB Tables
Classify Table (nb-rag-sys-classify)
The classify table is the central entity registry for multi-tenant document classification. It stores clients (from Linear teams), projects, and domain mappings used by the classification service to tag documents with the correct metadata.
Primary Key Structure:
- PK (Partition Key): Entity type prefix + ID (e.g., CLIENT#uuid, PROJECT#uuid, DOMAIN#example.com)
- SK (Sort Key): Relationship context (e.g., METADATA, CLIENT#parent-uuid)
Record Types:
| Record Type | PK | SK | Purpose |
|---|---|---|---|
| CLIENT | CLIENT#<team-id> | METADATA | Client/Team from Linear |
| PROJECT | PROJECT#<project-id> | CLIENT#<team-id> | Project with parent client |
| DOMAIN | DOMAIN#<domain> | METADATA | Email domain to client mapping |
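The key layout in the table can be captured in small helper functions. This is a sketch: the function names are hypothetical, and lowercasing the domain is an assumption rather than documented behavior.

```python
def client_key(team_id: str) -> dict[str, str]:
    """Primary key of a CLIENT record."""
    return {"PK": f"CLIENT#{team_id}", "SK": "METADATA"}


def project_key(project_id: str, team_id: str) -> dict[str, str]:
    """Primary key of a PROJECT record; the sort key points at the parent client."""
    return {"PK": f"PROJECT#{project_id}", "SK": f"CLIENT#{team_id}"}


def domain_key(domain: str) -> dict[str, str]:
    """Primary key of a DOMAIN record (domain lowercased by assumption)."""
    return {"PK": f"DOMAIN#{domain.lower()}", "SK": "METADATA"}
```

Centralizing key construction in one place helps the webhook, sync, and classify code agree on the same prefixes.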
CLIENT Record (from Linear Teams):
{
"PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
"SK": "METADATA",
"EntityType": "CLIENT",
"Name": "American Cell Technology",
"LinearTeamId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
"LinearTeamKey": "ACT",
"Description": "Biotech client",
"CreatedAt": "2025-12-30T23:09:26.580013",
"UpdatedAt": "2025-12-30T23:09:26.580022"
}
PROJECT Record (from Linear Projects):
{
"PK": "PROJECT#97ae8b49-c653-4538-a526-ba4d5e91f79a",
"SK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
"EntityType": "PROJECT",
"Name": "LIMS",
"LinearProjectId": "97ae8b49-c653-4538-a526-ba4d5e91f79a",
"ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
"Description": "Laboratory Information Management System",
"State": "started",
"CreatedAt": "2025-12-30T23:09:26.716056",
"UpdatedAt": "2025-12-30T23:09:26.716068"
}
DOMAIN Record (for classification):
{
"PK": "DOMAIN#americancelltechnology.com",
"SK": "METADATA",
"ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
"ProjectId": null,
"CreatedAt": "2025-12-30T23:15:00.000000",
"UpdatedAt": "2025-12-30T23:15:00.000000"
}
Global Secondary Index (GSI):
- EntityTypeIndex: Query all entities of a specific type
- Hash Key: EntityType (CLIENT or PROJECT)
- Used by query understanding to load all known entities for LLM context
Features:
- Point-in-Time Recovery (PITR) enabled
- On-demand billing
- update_item pattern preserves manual fields across syncs
- if_not_exists(CreatedAt, :val) preserves original creation timestamps
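The update_item pattern and the EntityTypeIndex query could be expressed as request builders like the ones below. They target the low-level boto3 DynamoDB client's `update_item` and `query` calls; the builder names and the exact set of synced attributes are assumptions for illustration.

```python
from typing import Any


def build_client_upsert(table_name: str, team_id: str, name: str, now_iso: str) -> dict[str, Any]:
    """Arguments for update_item that touch only synced fields.

    Attributes not named in the SET clause (such as Domains, Notes, Aliases)
    are left as they are, and CreatedAt is written only on first insert.
    """
    return {
        "TableName": table_name,
        "Key": {"PK": {"S": f"CLIENT#{team_id}"}, "SK": {"S": "METADATA"}},
        "UpdateExpression": (
            "SET #name = :name, UpdatedAt = :now, "
            "CreatedAt = if_not_exists(CreatedAt, :now)"
        ),
        # "Name" is a DynamoDB reserved word, so it needs a placeholder.
        "ExpressionAttributeNames": {"#name": "Name"},
        "ExpressionAttributeValues": {":name": {"S": name}, ":now": {"S": now_iso}},
    }


def build_entity_type_query(table_name: str, entity_type: str) -> dict[str, Any]:
    """Arguments for query against the EntityTypeIndex GSI."""
    return {
        "TableName": table_name,
        "IndexName": "EntityTypeIndex",
        "KeyConditionExpression": "EntityType = :t",
        "ExpressionAttributeValues": {":t": {"S": entity_type}},
    }


upsert = build_client_upsert("nb-rag-sys-classify", "team-1", "Acme", "2026-01-01T00:00:00")
clients_query = build_entity_type_query("nb-rag-sys-classify", "CLIENT")
```

Running the same upsert twice yields the same record apart from UpdatedAt, which is what makes repeated syncs safe.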
AWS Secrets Manager
- Purpose: Secure storage of API keys and credentials
- Secrets:
- fathom-api-key - Fathom API key
- helpscout-api-key - HelpScout API key
- linear-api-key - Linear API key
- google-oauth-client-secret - Google OAuth secret
- Rotation: Manual (can be automated)
- Access: IAM role-based (Lambda execution roles)
7. Infrastructure Layer
Terraform State
- Backend: S3 + DynamoDB
- State File: s3://nb-rag-sys-terraform-state/terraform.tfstate
- Locking: DynamoDB table nb-rag-sys-terraform-locks
- Encryption: AES-256 at rest
- Versioning: Enabled for state recovery
IAM Roles
Lambda Execution Roles
- Chat Lambda: Invoke Query Lambda, Bedrock access
- Query Lambda: Secrets Manager read, Bedrock access
- Classify Lambda: DynamoDB write, Bedrock access
- Webhook Lambdas: Secrets Manager read, invoke other Lambdas
GitHub Actions OIDC Role
- Trust policy: GitHub OIDC provider
- Permissions: Full Terraform deployment access
- Session duration: 1 hour
- MFA: Not required (OIDC provides strong authentication)
Data Flow
Query Flow
The query flow uses Bedrock Knowledge Base Retrieve API with optional reranking.
1. User enters query in web UI
2. Web UI sends POST /chat with JWT token
3. API Gateway validates JWT with Cognito
4. Chat Lambda invoked:
a. Calls Bedrock Knowledge Base retrieve API
b. Knowledge Base:
- Generates embedding via Bedrock Titan
- Searches S3 Vectors for similar documents
- Optionally reranks results using Bedrock Reranking (Cohere Rerank 3.5)
- Returns top K documents (adaptive retrieval)
c. Formats context from retrieved documents
d. Calls Bedrock Claude with system prompt + context + query
e. Returns response
5. Chat Lambda returns JSON response with answer + sources
6. Web UI displays answer with source citations
Ingestion Flow (Fathom Example)
1. Fathom video completes processing
2. Fathom sends webhook to API Gateway
3. API Gateway routes to Fathom Webhook Lambda
4. Lambda validates API key from Secrets Manager
5. Lambda fetches video metadata and transcript
6. Lambda classifies content (client, project) via classify Lambda
7. Lambda writes document and .metadata.json sidecar to S3
8. Bedrock Knowledge Base sync triggers (scheduled or manual)
9. Knowledge Base:
a. Reads .metadata.json for filtering attributes
b. Chunks document (~512 tokens, 20% overlap)
c. Generates embeddings via Bedrock Titan
d. Stores vectors in S3 Vectors with metadata
10. Lambda returns 200 OK to Fathom
11. Video content searchable after next sync cycle
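The chunking arithmetic in step 9b can be illustrated with a simple sliding window. This is only a sketch of the window math: the Knowledge Base performs chunking itself with its own tokenizer and boundary rules, and the pre-split token list here is an assumption.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 512, overlap_ratio: float = 0.20
) -> list[list[str]]:
    """Split tokens into fixed-size windows that overlap by `overlap_ratio`."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)  # 102 tokens for 512 at 20%
    step = chunk_size - overlap                # each window starts 410 tokens later
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# A hypothetical 1,000-token document yields windows starting at 0, 410, and 820.
example_chunks = chunk_tokens([str(i) for i in range(1000)])
```

The overlap means text near a chunk boundary appears in both neighbors, which reduces the chance that a relevant passage is split across two embeddings.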
Security Architecture
Network Security
- All traffic over HTTPS/TLS 1.2+
- API Gateway with WAF (optional)
- CloudFront with AWS Shield Standard
- VPC endpoints for AWS service communication (optional)
Authentication & Authorization
- Google OAuth 2.0 for user authentication
- Cognito JWT tokens for API authorization
- API keys for webhook validation
- IAM roles for Lambda execution
Data Security
- Secrets Manager for sensitive credentials
- S3 encryption at rest (AES-256)
- DynamoDB encryption at rest
- Lambda environment variable encryption (optional)
Compliance
- GDPR: EU user data can be kept in-region by deploying to eu-west-1
- SOC 2: AWS services are SOC 2 compliant
- HIPAA: Not HIPAA compliant (would require BAA)
Scalability
Current Capacity
- Concurrent users: ~100
- Queries per second: 10 (API Gateway limit)
- Vector storage: 100K documents
- Lambda concurrency: 10 reserved
Scaling Strategies
Horizontal Scaling
- Increase API Gateway rate limit
- Increase Lambda reserved concurrency
- S3 Vectors scales automatically with usage
- Enable CloudFront caching
Vertical Scaling
- Increase Lambda memory allocation
- Adjust Bedrock Knowledge Base retrieval limits
- Switch to provisioned DynamoDB capacity
Performance Optimization
- Cache frequent queries in Lambda
- Optimize chunk size/overlap in Knowledge Base
- Tune adaptive retrieval multiplier
- Enable Bedrock reranking for better relevance
Disaster Recovery
Backup Strategy
- Terraform State: S3 versioning enabled
- S3 Documents: S3 versioning + cross-region replication (optional)
- S3 Vectors: Automatically backed up with S3 data protection
- DynamoDB: Point-in-Time Recovery (PITR) enabled
- Secrets: Replicate to backup region
- Web Assets: S3 versioning + lifecycle
Recovery Procedures
Complete Infrastructure Loss
- Deploy from Terraform (terraform apply)
- Restore S3 documents from backup/versioning
- Trigger Bedrock Knowledge Base sync to rebuild vectors
- Restore DynamoDB from PITR
- Update secrets in Secrets Manager
- Deploy web assets to S3
- Invalidate CloudFront cache
RTO: ~30 minutes | RPO: Near-zero (S3 durability)
Region Failure
- Update Terraform region variable
- Deploy to new region
- Update DNS to point to new CloudFront
- Restore data from backups
RTO: ~1 hour | RPO: ~24 hours
Monitoring & Observability
CloudWatch Metrics
- Lambda: Invocations, Duration, Errors, Throttles
- API Gateway: 4xx, 5xx, Latency, Request Count
- Bedrock: Model invocations, token usage
- DynamoDB: Read/write capacity, throttles
CloudWatch Logs
- Lambda function logs (7-day retention)
- API Gateway access logs (optional)
- Structured JSON logging in Lambda
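Structured JSON logging in a Lambda can be set up with the standard library alone. The sketch below is one possible shape, not the project's actual logger; the field names and the optional `request_id` attribute are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical extra field, set via logger.info(..., extra={"request_id": ...}).
        request_id = getattr(record, "request_id", None)
        if request_id is not None:
            payload["request_id"] = request_id
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that emits JSON lines at INFO level and above."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on warm invocations
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger("chat").info("query received", extra={"request_id": "abc"})` would then emit one JSON line that CloudWatch Logs Insights can filter by field.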
Alarms
- Lambda error rate > 1%
- API Gateway 5xx rate > 0.5%
- Lambda duration > 30 seconds
- DynamoDB throttling events
Distributed Tracing
- X-Ray integration for Lambda (optional)
- Request ID propagation across services
Last updated: 2026-01-01