System Architecture
Detailed technical architecture of the NorthBuilt RAG System.
Key AWS Services: This system is built on Amazon Bedrock Knowledge Bases with S3 Vectors for vector storage. For a complete list of AWS documentation references, see AWS Documentation References.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                            │
│                     (Global Edge Distribution)                      │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
    ┌────▼────┐              ┌───────▼────────┐
    │   S3    │              │  API Gateway   │
    │  (Web)  │              │   (REST API)   │
    └─────────┘              └───────┬────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     │               │               │
              ┌──────▼─────┐   ┌─────▼──────┐   ┌────▼─────┐
              │  Cognito   │   │   Lambda   │   │  Lambda  │
              │   (Auth)   │   │   (Chat)   │   │(Webhooks)│
              └────────────┘   └─────┬──────┘   └────┬─────┘
                                     │               │
                      ┌──────────────┼───────────────┘
                      │              │
               ┌──────▼─────┐  ┌─────▼──────┐
               │  Bedrock   │  │ S3 Vectors │
               │  (Claude)  │  │ (Bedrock)  │
               └────────────┘  └────────────┘
                      │
             ┌────────▼─────────┐
             │     DynamoDB     │
             │ (Classification) │
             └──────────────────┘
Component Details
1. Frontend Layer
CloudFront Distribution
- Purpose: Global CDN for low-latency access
- Origin: S3 bucket with static web assets
- Features:
- HTTPS-only with TLS 1.2+
- HTTP/2 and HTTP/3 support
- Gzip/Brotli compression
- Custom domain support
- Origin Access Identity (OAI) for S3 security
S3 Web Bucket
- Contents: HTML, CSS, JavaScript, images
- Access: Private (CloudFront only via OAI)
- Versioning: Enabled for rollback capability
- Lifecycle: Old versions expire after 90 days
Web Application
- Framework: React 19 + TypeScript + Vite
- Styling: Tailwind CSS 4
- Authentication: Google OAuth via Cognito
- Features:
- Real-time chat interface with clarification prompts
- Source citation display with relevance scores
- Query understanding integration
- Dark/light mode
- Mobile responsive PWA
2. API Layer
API Gateway (REST API for Streaming, HTTP API for Webhooks)
REST API (Chat endpoints with Lambda Response Streaming):
- Routes:
- POST /chat - Streaming chat endpoint (SSE)
- GET /chat/{session_id} - Retrieve conversation history
- Authentication: Cognito JWT authorizer
- Features:
- Lambda Response Streaming for real-time token delivery
- Server-Sent Events (SSE) protocol
- response_transfer_mode = "STREAM" for all endpoints
- CORS: Enabled for web UI origin
HTTP API (Webhooks - lower cost):
- Routes:
- POST /webhooks/fathom - Fathom video webhook
- POST /webhooks/helpscout - HelpScout ticket webhook
- Authentication: API key/signature validation in Lambda
- Throttling:
- Rate: 10 requests/second
- Burst: 20 requests
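API Gateway throttling follows a token bucket model, so the rate and burst limits above can be reasoned about with a small simulation. The following is a minimal sketch, not the gateway's implementation; the class and function names are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `burst` tokens."""
    rate: float = 10.0    # steady-state requests per second (limit listed above)
    burst: float = 20.0   # bucket capacity (limit listed above)
    tokens: float = 20.0  # start with a full bucket
    last: float = 0.0     # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a real gateway would answer with HTTP 429


def admitted(bucket: TokenBucket, arrivals: list[float]) -> int:
    """Count how many of the given arrival times are admitted."""
    return sum(bucket.allow(t) for t in arrivals)
```

With these settings, a webhook sender that fires 25 requests at once gets 20 through, and one second later the bucket has refilled enough for 10 more.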
3. Authentication Layer
Amazon Cognito
- User Pool: Manages user accounts and sessions
- Identity Provider: Google OAuth 2.0 federation
- Token Validation: JWT tokens with 1-hour expiration
- Features:
- Email/password fallback (optional)
- MFA support (optional)
- Account recovery
- User attributes (email, name, picture)
Authorization Flow
1. User clicks "Sign in with Google"
2. Redirected to Cognito hosted UI
3. Cognito redirects to Google OAuth
4. User authorizes → Google returns code
5. Cognito exchanges code for tokens
6. User redirected to app with JWT
7. App stores JWT in localStorage
8. JWT sent in Authorization header for API calls
4. Compute Layer
Lambda Functions
Chat Lambda (lambda/node/chat/)
- Runtime: Node.js 22
- Memory: 1024 MB
- Timeout: 60 seconds
- Purpose: Main orchestrator for streaming chat queries
- Features:
- Lambda Response Streaming via awslambda.streamifyResponse()
- Server-Sent Events (SSE) for real-time token delivery
- Conversation history via DynamoDB
- Query understanding with client extraction
- Clarification prompts for ambiguous queries
- Pre-signed S3 URLs for source documents
- Flow:
- Parse request (POST for new messages, GET for history)
- For POST: Extract query and run query understanding
- If clarification needed: Return JSON clarification response
- Retrieve documents from Bedrock Knowledge Base
- Stream LLM response via SSE (token by token)
- Save conversation to DynamoDB
- Return sources with pre-signed URLs
- Environment Variables:
- KNOWLEDGE_BASE_ID - Bedrock Knowledge Base ID
- BEDROCK_LLM_MODEL - Claude model identifier
- DYNAMODB_TABLE - Conversation history table
- ENABLE_QUERY_UNDERSTANDING - Toggle query understanding
- AWS_REGION - Deployment region
Note: Document retrieval is handled directly by the Chat Lambda via Bedrock Knowledge Base APIs.
The Chat Lambda calls bedrock-agent-runtime:retrieve which:
- Generates embeddings via Titan Embeddings v2
- Searches S3 Vectors for similar documents
- Optionally reranks results using Bedrock Reranking
- Returns top K results (configurable, default 5)
Classification Lambda (lambda/python/classification/)
- Runtime: Python 3.14
- Memory: 512 MB
- Timeout: 30 seconds
- Purpose: Determine client for incoming documents using tiered content-based classification
- Classification Tiers:
- Tier 0 - Contact Match: Match source emails (meeting participants, customer) against client Contacts field
- Tier 1 - Name Match: Search for client names/aliases in document text (word boundary matching)
- Tier 2 - Keyword Match: TF-IDF scoring against client Keywords field
- Tier 3 - Semantic Match: Claude Haiku via Bedrock for contextual analysis (fallback)
- Confidence Thresholds:
- Contact match: 1.0 (definitive)
- Name match: 1.0 (single match accepted)
- Keyword match: >= 0.7 (unambiguous)
- Semantic match: >= 0.8
- Pending Classification: Documents with low confidence are flagged as PENDING_CLASSIFICATION for manual review. These auto-expire after 30 days (TTL).
- Metrics: Emitted to CloudWatch namespace RAG/Classification
- Flow:
- Receive source type and document data
- Extract content and source emails via strategy (Fathom or HelpScout)
- Load client entities from DynamoDB (with keywords, contacts, aliases)
- Run tiered classification (contact → name → keyword → semantic)
- If low confidence, create PENDING_CLASSIFICATION record
- Return classification result with client, confidence, and tier
- Input Format:
{
  "source": "fathom",
  "data": {
    "document": {"transcript": "...", "participants": ["user@client.com"]},
    "documentS3Key": "...",
    "documentTitle": "..."
  }
}
- Output Format:
{
  "success": true,
  "clientId": "uuid",
  "clientName": "Client Name",
  "confidence": 0.95,
  "tier": "name_match",
  "flaggedForReview": false,
  "pendingId": null
}
- Environment Variables:
- DYNAMODB_TABLE_NAME - Classification table name (nb-rag-sys)
- AWS_REGION - AWS region for Bedrock calls
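The tiered classification above can be sketched as a chain of increasingly fuzzy checks that stops at the first confident match. This is a simplified illustration, not the Lambda's code: tier 2 uses the fraction of matched keywords as a stand-in for TF-IDF scoring, tier 3 (the LLM fallback) is omitted, the ambiguity check for keyword matches is left out, and the tier labels other than `name_match` are assumed.

```python
import re

CONTACT, NAME, KEYWORD = "contact_match", "name_match", "keyword_match"
KEYWORD_THRESHOLD = 0.7  # keyword-match threshold listed above


def classify(text: str, emails: list[str], clients: list[dict]) -> dict:
    """Run tiers 0-2 in order and return the first confident match."""
    lowered = text.lower()
    senders = {e.lower() for e in emails}

    # Tier 0: any source email listed in a client's Contacts field.
    for c in clients:
        if senders & {e.lower() for e in c.get("Contacts", [])}:
            return _result(c, 1.0, CONTACT)

    # Tier 1: client name or alias as a whole word; accept only a single match.
    named = [c for c in clients
             if any(re.search(rf"\b{re.escape(n.lower())}\b", lowered)
                    for n in [c["Name"], *c.get("Aliases", [])])]
    if len(named) == 1:
        return _result(named[0], 1.0, NAME)

    # Tier 2: fraction of a client's keywords present (simplified, not TF-IDF).
    scored = []
    for c in clients:
        keywords = c.get("Keywords", [])
        if keywords:
            hits = sum(k.lower() in lowered for k in keywords)
            scored.append((hits / len(keywords), c))
    if scored:
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= KEYWORD_THRESHOLD:
            return _result(best, score, KEYWORD)

    # Tier 3 (semantic match via an LLM) is out of scope for this sketch.
    return {"success": True, "clientId": None, "clientName": None,
            "confidence": 0.0, "tier": None, "flaggedForReview": True,
            "pendingId": None}


def _result(client: dict, confidence: float, tier: str) -> dict:
    """Shape a match like the output format shown above."""
    return {"success": True, "clientId": client["PK"].removeprefix("CLIENT#"),
            "clientName": client["Name"], "confidence": confidence,
            "tier": tier, "flaggedForReview": False, "pendingId": None}
```

Ordering the tiers from cheapest and most certain to most expensive means the LLM is only consulted for documents that the deterministic checks cannot place.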
Fathom Webhook Lambda (lambda/python/webhooks/fathom/)
- Runtime: Python 3.14
- Memory: 512 MB
- Timeout: 300 seconds (5 minutes)
- Purpose: Process Fathom video recordings
- Flow:
- Receive webhook from Fathom
- Validate API key
- Fetch video metadata and transcript
- Write document and .metadata.json sidecar to S3
- Trigger Bedrock KB sync (via scheduled ingestion)
- Return 200 OK
- Environment Variables:
- FATHOM_API_KEY - Stored in Secrets Manager
- DOCUMENTS_BUCKET - S3 bucket for documents
HelpScout Webhook Lambda (lambda/python/webhooks/helpscout/)
- Runtime: Python 3.14
- Memory: 512 MB
- Timeout: 60 seconds
- Purpose: Process HelpScout support tickets
- Flow:
- Receive webhook from HelpScout
- Validate API key
- Extract ticket content
- Call Classification Lambda
- Write document to S3
- Return 200 OK
- Environment Variables:
- HELPSCOUT_API_KEY - Stored in Secrets Manager
- CLASSIFICATION_LAMBDA_ARN - Classification Lambda function ARN
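Both webhook Lambdas start by validating an API key. A minimal sketch of that check follows, assuming the key arrives in a request header; the header name `x-api-key` and the function name are hypothetical. The point of the example is the constant-time comparison, which avoids leaking how many leading characters of a guessed key were correct.

```python
import hmac


def is_valid_api_key(headers: dict[str, str], expected_key: str,
                     header_name: str = "x-api-key") -> bool:
    """Compare the presented key to the expected one in constant time."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    presented = normalized.get(header_name.lower(), "")
    return hmac.compare_digest(presented.encode(), expected_key.encode())
```

In a handler, `expected_key` would be the value read from Secrets Manager, and a failed check would return a 401 before any payload processing.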
5. AI Layer
Amazon Bedrock
For complete Bedrock documentation, see Amazon Bedrock User Guide.
Claude Sonnet 4.5 (us.anthropic.claude-sonnet-4-5-20250929-v1:0)
- Purpose: Response generation
- Input: System prompt + context + user query
- Output: Natural language response with citations
- Pricing: $3.00 per million input tokens, $15.00 per million output tokens
- Performance: ~2-3 seconds for typical query
- Documentation: Claude on Amazon Bedrock
Titan Embeddings V2 (amazon.titan-embed-text-v2:0)
- Purpose: Generate vector embeddings for semantic search
- Input: Text chunks (up to 8192 tokens)
- Output: 1024-dimensional vector (also supports 256, 512)
- Pricing: $0.00002 per 1,000 input tokens
- Performance: ~200ms per embedding
- Documentation: Amazon Titan Text Embeddings
System Prompt Strategy
You are a helpful assistant with access to company documentation.
Context:
{retrieved_documents}
User Query: {query}
Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough
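To show how the `{retrieved_documents}` and `{query}` placeholders might be filled, here is a minimal sketch that labels each retrieved chunk so the model can cite it with the `[doc_N]` notation. The function name, the 1-based numbering, and the blank-line separator are assumptions.

```python
PROMPT_TEMPLATE = """You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough"""


def build_prompt(query: str, documents: list[str]) -> str:
    """Label each retrieved chunk as [doc_N] and fill the template."""
    context = "\n\n".join(
        f"[doc_{i}] {text.strip()}" for i, text in enumerate(documents, start=1)
    )
    return PROMPT_TEMPLATE.format(retrieved_documents=context, query=query)
```

Because the labels are generated in retrieval order, the UI can map a cited `[doc_2]` back to the second entry of the source list.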
6. Data Layer
S3 Vectors
S3 Vectors is purpose-built vector storage that provides cost-optimized, durable storage for AI workloads. See Working with S3 Vectors for complete documentation.
- Index Configuration:
- Dimension: 1024 (matches Titan Embeddings v2)
- Metric: Cosine similarity
- Data Type: FLOAT32
- Storage: Fully managed by AWS
- Non-filterable keys: AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA (required for 100% ingestion success)
- See Vector indexes for configuration details
- Integration with Bedrock Knowledge Base:
- Native integration via S3_VECTORS storage type - see Using S3 Vectors with Bedrock KB
- Automatic chunking (512 tokens, 20% overlap) - see Chunking strategies
- LLM parsing: Disabled (sidecar .metadata.json files used instead)
- Data deletion policy: DELETE (vectors auto-removed when S3 docs deleted)
- Document Metadata Architecture: Two metadata systems:
1. Bedrock KB Metadata (.metadata.json sidecar files for filtering): Each document has a companion metadata sidecar file stored alongside it in S3. For example, fathom/acme/alpha/meeting_12345.md has the sidecar fathom/acme/alpha/meeting_12345.md.metadata.json:
{
  "metadataAttributes": {
    "source": {"value": {"type": "STRING", "stringValue": "fathom"}},
    "client": {"value": {"type": "STRING", "stringValue": "acme"}},
    "project": {"value": {"type": "STRING", "stringValue": "alpha"}},
    "category": {"value": {"type": "STRING", "stringValue": "meeting-transcript"}}
  }
}
These enable multi-tenant filtering in RAG queries (filter by client). Project metadata is preserved for display purposes but not used for filtering. See S3 Vectors metadata filtering and RetrievalFilter API for filter syntax. For multi-tenancy patterns, see Multi-tenancy with metadata filtering.
2. S3 Object Metadata (HTTP headers, for S3 operations only): Technical metadata stored as S3 object headers. NOT used by Bedrock KB for filtering.
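The sidecar format described above is regular enough to generate from four strings. The sketch below, with hypothetical helper names, produces the sidecar body and the S3 key it would be written to next to the document.

```python
import json


def build_sidecar(source: str, client: str, project: str, category: str) -> str:
    """Return the JSON body of a .metadata.json sidecar file."""
    def string_attr(value: str) -> dict:
        return {"value": {"type": "STRING", "stringValue": value}}

    attributes = {"source": source, "client": client,
                  "project": project, "category": category}
    body = {"metadataAttributes": {k: string_attr(v) for k, v in attributes.items()}}
    return json.dumps(body, indent=2)


def sidecar_key(document_key: str) -> str:
    """S3 key of the sidecar that accompanies `document_key`."""
    return f"{document_key}.metadata.json"
```

A webhook Lambda would upload the document and then this body under `sidecar_key(...)`, so that the next Knowledge Base sync picks up both.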
- Query Performance: Fast retrieval via Bedrock KB API
- Capacity: Scales automatically with usage
DynamoDB Tables
Classification Table (nb-rag-sys)
The classification table is the central entity registry for multi-tenant document classification. It stores clients, projects, and domain mappings used by the classification service to tag documents with the correct metadata. Clients are managed via the Management UI in the web application.
Primary Key Structure:
- PK (Partition Key): Entity type prefix + ID (e.g., CLIENT#uuid, PROJECT#uuid, DOMAIN#example.com)
- SK (Sort Key): Relationship context (e.g., METADATA, CLIENT#parent-uuid)
Record Types:
| Record Type | PK | SK | Purpose |
|---|---|---|---|
| CLIENT | CLIENT#<uuid> | METADATA | Client entity with classification data |
| PROJECT | PROJECT#<uuid> | CLIENT#<client-uuid> | Project with parent client |
| PENDING | PENDING#<uuid> | METADATA | Pending classification for manual review |
CLIENT Record (with classification fields):
{
  "PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "American Cell Technology",
  "Aliases": ["ACT", "AmCell"],
  "Keywords": ["cell therapy", "biotech", "LIMS"],
  "Contacts": ["john@act.com", "support@act.com"],
  "Description": "Biotech client specializing in cell therapy",
  "CreatedAt": "2025-12-30T23:09:26.580013",
  "UpdatedAt": "2025-12-30T23:09:26.580022"
}
PROJECT Record:
{
  "PK": "PROJECT#97ae8b49-c653-4538-a526-ba4d5e91f79a",
  "SK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "EntityType": "PROJECT",
  "Name": "LIMS",
  "ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "Description": "Laboratory Information Management System",
  "State": "started",
  "CreatedAt": "2025-12-30T23:09:26.716056",
  "UpdatedAt": "2025-12-30T23:09:26.716068"
}
PENDING_CLASSIFICATION Record (for manual review):
{
  "PK": "PENDING#a1b2c3d4-5678-90ab-cdef-123456789abc",
  "SK": "METADATA",
  "EntityType": "PENDING_CLASSIFICATION",
  "DocumentS3Key": "clients/unknown/fathom/meeting-123.md",
  "DocumentTitle": "Meeting with potential client",
  "Source": "fathom",
  "SuggestedClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SuggestedClientName": "American Cell Technology",
  "Confidence": "0.65",
  "Tier": "semantic_match",
  "Status": "pending",
  "ResolvedClientId": null,
  "CreatedAt": "2025-12-30T23:15:00.000000Z",
  "TTL": 1769728500
}
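DynamoDB TTL attributes hold an expiry time in Unix epoch seconds, so the 30-day retention for pending classifications comes down to a small date calculation. A minimal sketch follows; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PENDING_TTL = timedelta(days=30)  # retention stated for pending classifications


def ttl_epoch(created_at: datetime, ttl: timedelta = PENDING_TTL) -> int:
    """Epoch seconds after which DynamoDB TTL may delete the item."""
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return int((created_at + ttl).timestamp())
```

For a record created at 2025-12-30T23:15:00Z this yields 1769728500, which corresponds to 2026-01-29T23:15:00Z. DynamoDB deletes expired items in the background, so a record can remain visible for a while after that moment.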
Global Secondary Index (GSI):
- EntityTypeIndex: Query all entities of a specific type
- Hash Key: EntityType (CLIENT, PROJECT)
- Used by query understanding to load all known entities for LLM context
Features:
- Point-in-Time Recovery (PITR) enabled
- On-demand billing
- if_not_exists(CreatedAt, :val) preserves original creation timestamps
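The `if_not_exists` behaviour noted above can be illustrated with the arguments of a DynamoDB `update_item` call: the first write sets `CreatedAt`, and later writes leave it untouched while still refreshing `UpdatedAt`. The helper name and the exact set of attributes are assumptions made for this sketch.

```python
def client_upsert_kwargs(client_id: str, name: str, now_iso: str) -> dict:
    """Build keyword arguments for a DynamoDB `update_item` upsert of a CLIENT record."""
    return {
        "Key": {"PK": f"CLIENT#{client_id}", "SK": "METADATA"},
        # `Name` is a DynamoDB reserved word, so it is referenced via a placeholder.
        "UpdateExpression": (
            "SET #name = :name, EntityType = :type, UpdatedAt = :now, "
            "CreatedAt = if_not_exists(CreatedAt, :now)"
        ),
        "ExpressionAttributeNames": {"#name": "Name"},
        "ExpressionAttributeValues": {":name": name, ":type": "CLIENT", ":now": now_iso},
    }


# With boto3 installed and AWS credentials configured, usage would look like:
#   import boto3
#   table = boto3.resource("dynamodb").Table("nb-rag-sys")
#   table.update_item(**client_upsert_kwargs("some-uuid", "Acme", "2026-01-17T00:00:00Z"))
```

Using one upsert expression for both create and update avoids a read-before-write just to decide whether the creation timestamp should be set.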
AWS Secrets Manager
- Purpose: Secure storage of API keys and credentials
- Secrets:
- fathom-api-key - Fathom API key
- helpscout-api-key - HelpScout API key
- google-oauth-client-secret - Google OAuth secret
- Rotation: Manual (can be automated)
- Access: IAM role-based (Lambda execution roles)
7. Infrastructure Layer
Terraform State
- Backend: S3 + DynamoDB
- State File: s3://nb-rag-sys-terraform-state/terraform.tfstate
- Locking: DynamoDB table nb-rag-sys-terraform-locks
- Encryption: AES-256 at rest
- Versioning: Enabled for state recovery
IAM Roles
Lambda Execution Roles
- Chat Lambda: Bedrock access (model invocation and Knowledge Base retrieval), DynamoDB read/write for conversation history, S3 read for pre-signed source URLs
- Classification Lambda: DynamoDB write, Bedrock access
- Webhook Lambdas: Secrets Manager read, invoke other Lambdas
GitHub Actions OIDC Role
- Trust policy: GitHub OIDC provider
- Permissions: Full Terraform deployment access
- Session duration: 1 hour
- MFA: Not required (OIDC provides strong authentication)
Data Flow
Query Flow
The query flow uses Bedrock Knowledge Base Retrieve API with optional reranking.
1. User enters query in web UI
2. Web UI sends POST /chat with JWT token
3. API Gateway validates JWT with Cognito
4. Chat Lambda invoked:
a. Calls Bedrock Knowledge Base retrieve API
b. Knowledge Base:
- Generates embedding via Bedrock Titan
- Searches S3 Vectors for similar documents
- Optionally reranks results using Bedrock Reranking (Cohere Rerank 3.5)
- Returns top K documents (adaptive retrieval)
c. Formats context from retrieved documents
d. Calls Bedrock Claude with system prompt + context + query
e. Streams the response token by token via SSE
5. Chat Lambda saves the conversation to DynamoDB and returns sources with pre-signed URLs
6. Web UI displays answer with source citations
Ingestion Flow (Fathom Example)
1. Fathom video completes processing
2. Fathom sends webhook to API Gateway
3. API Gateway routes to Fathom Webhook Lambda
4. Lambda validates API key from Secrets Manager
5. Lambda fetches video metadata and transcript
6. Lambda classifies content (client, project) via classification Lambda
7. Lambda writes document and .metadata.json sidecar to S3
8. Bedrock Knowledge Base sync triggers (scheduled or manual)
9. Knowledge Base:
a. Reads .metadata.json for filtering attributes
b. Chunks document (~512 tokens, 20% overlap)
c. Generates embeddings via Bedrock Titan
d. Stores vectors in S3 Vectors with metadata
10. Lambda returns 200 OK to Fathom
11. Video content searchable after next sync cycle
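The chunking step in the ingestion flow (about 512 tokens with 20% overlap) amounts to a sliding window whose start advances by roughly 410 tokens each time. The sketch below shows that arithmetic on a pre-tokenized list; it is an approximation for intuition, since Bedrock's tokenizer and boundary handling are not specified here, and the function name is hypothetical.

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 512,
                 overlap_pct: int = 20) -> list[list[str]]:
    """Split tokens into windows of `max_tokens` that overlap by `overlap_pct` percent."""
    if max_tokens <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("max_tokens must be positive and overlap_pct in [0, 100)")
    overlap = max_tokens * overlap_pct // 100   # 102 tokens for 512 at 20%
    stride = max_tokens - overlap               # 410 new tokens per chunk
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

For a 1,000-token transcript this produces three chunks of 512, 512, and 180 tokens, and the last 102 tokens of each chunk reappear at the start of the next one, which helps a sentence split at a boundary remain retrievable.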
Security Architecture
Network Security
- All traffic over HTTPS/TLS 1.2+
- API Gateway with WAF (optional)
- CloudFront with AWS Shield Standard
- VPC endpoints for AWS service communication (optional)
Authentication & Authorization
- Google OAuth 2.0 for user authentication
- Cognito JWT tokens for API authorization
- API keys for webhook validation
- IAM roles for Lambda execution
Data Security
- Secrets Manager for sensitive credentials
- S3 encryption at rest (AES-256)
- DynamoDB encryption at rest
- Lambda environment variable encryption (optional)
Compliance
- GDPR: EU user data can be kept in-region by deploying to eu-west-1
- SOC 2: AWS services are SOC 2 compliant
- HIPAA: Not HIPAA compliant (would require BAA)
Scalability
Current Capacity
- Concurrent users: ~100
- Queries per second: 10 (API Gateway limit)
- Vector storage: 100K documents
- Lambda concurrency: 10 reserved
Scaling Strategies
Horizontal Scaling
- Increase API Gateway rate limit
- Increase Lambda reserved concurrency
- S3 Vectors scales automatically with usage
- Enable CloudFront caching
Vertical Scaling
- Increase Lambda memory allocation
- Adjust Bedrock Knowledge Base retrieval limits
- Switch to provisioned DynamoDB capacity
Performance Optimization
- Cache frequent queries in Lambda
- Optimize chunk size/overlap in Knowledge Base
- Tune adaptive retrieval multiplier
- Enable Bedrock reranking for better relevance
Disaster Recovery
Backup Strategy
- Terraform State: S3 versioning enabled
- S3 Documents: S3 versioning + cross-region replication (optional)
- S3 Vectors: Durable managed storage; vectors can also be rebuilt from the S3 documents via a Knowledge Base sync
- DynamoDB: Point-in-Time Recovery (PITR) enabled
- Secrets: Replicate to backup region
- Web Assets: S3 versioning + lifecycle
Recovery Procedures
Complete Infrastructure Loss
- Deploy from Terraform (terraform apply)
- Restore S3 documents from backup/versioning
- Trigger Bedrock Knowledge Base sync to rebuild vectors
- Restore DynamoDB from PITR
- Update secrets in Secrets Manager
- Deploy web assets to S3
- Invalidate CloudFront cache
RTO: ~30 minutes | RPO: Near-zero (S3 durability)
Region Failure
- Update Terraform region variable
- Deploy to new region
- Update DNS to point to new CloudFront
- Restore data from backups
RTO: ~1 hour | RPO: ~24 hours
Monitoring & Observability
CloudWatch Metrics
- Lambda: Invocations, Duration, Errors, Throttles
- API Gateway: 4xx, 5xx, Latency, Request Count
- Bedrock: Model invocations, token usage
- DynamoDB: Read/write capacity, throttles
CloudWatch Logs
- Lambda function logs (7-day retention)
- API Gateway access logs (optional)
- Structured JSON logging in Lambda
Alarms
- Lambda error rate > 1%
- API Gateway 5xx rate > 0.5%
- Lambda duration > 30 seconds
- DynamoDB throttling events
Distributed Tracing
- X-Ray integration for Lambda (optional)
- Request ID propagation across services
Last updated: 2026-01-17