System Architecture

Detailed technical architecture of the NorthBuilt RAG System.

Key AWS Services: This system is built on Amazon Bedrock Knowledge Bases with S3 Vectors for vector storage. For a complete list of AWS documentation references, see AWS Documentation References.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                             │
│                    (Global Edge Distribution)                        │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
    ┌────▼────┐              ┌───────▼────────┐
    │   S3    │              │  API Gateway   │
    │  (Web)  │              │   (REST API)   │
    └─────────┘              └───────┬────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     │               │               │
              ┌──────▼─────┐  ┌─────▼──────┐  ┌────▼─────┐
              │  Cognito   │  │   Lambda   │  │  Lambda  │
              │   (Auth)   │  │   (Chat)   │  │(Webhooks)│
              └────────────┘  └─────┬──────┘  └────┬─────┘
                                    │              │
                     ┌──────────────┼──────────────┘
                     │              │
              ┌──────▼─────┐  ┌─────▼──────┐
              │  Bedrock   │  │ S3 Vectors │
              │  (Claude)  │  │  (Bedrock) │
              └────────────┘  └────────────┘
                     │
              ┌──────▼───────────┐
              │     DynamoDB     │
              │ (Classification) │
              └──────────────────┘

Component Details

1. Frontend Layer

CloudFront Distribution

Purpose: Global CDN for low-latency access
Origin: S3 bucket with static web assets
Features:
- HTTPS-only with TLS 1.2+
- HTTP/2 and HTTP/3 support
- Gzip/Brotli compression
- Custom domain support
- Origin Access Identity (OAI) for S3 security

S3 Web Bucket

Contents: HTML, CSS, JavaScript, images
Access: Private (CloudFront only via OAI)
Versioning: Enabled for rollback capability
Lifecycle: Old versions expire after 90 days

Web Application

Framework: React 19 + TypeScript + Vite
Styling: Tailwind CSS 4
Authentication: Google OAuth via Cognito
Features:
- Real-time chat interface with clarification prompts
- Source citation display with relevance scores
- Query understanding integration
- Dark/light mode
- Mobile responsive PWA

2. API Layer

API Gateway (REST API for Streaming, HTTP API for Webhooks)

REST API (Chat endpoints with Lambda Response Streaming):

Routes:
- POST /chat - Streaming chat endpoint (SSE)
- GET /chat/{session_id} - Retrieve conversation history
Authentication: Cognito JWT authorizer
Features:
- Lambda Response Streaming for real-time token delivery
- Server-Sent Events (SSE) protocol
- response_transfer_mode = "STREAM" for all endpoints
CORS: Enabled for web UI origin
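
The SSE framing used by the streaming endpoint can be sketched as follows. This is a minimal illustration in Python rather than the Chat Lambda's Node.js code, and the event names `token` and `done` as well as the payload fields are assumptions, not the system's actual wire format.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event: optional 'event:' line, one 'data:' line, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps never emits raw newlines, so a single data: line is sufficient.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], sources: list) -> Iterator[str]:
    """Yield one frame per generated token, then a final frame carrying the sources."""
    for token in tokens:
        yield sse_event({"token": token}, event="token")  # hypothetical event name
    yield sse_event({"sources": sources}, event="done")    # hypothetical event name
```

A browser client would read these frames incrementally and append each token to the chat view as it arrives.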

HTTP API (Webhooks - lower cost):

Routes:
- POST /webhooks/fathom - Fathom video webhook
- POST /webhooks/helpscout - HelpScout ticket webhook
Authentication: API key/signature validation in Lambda
Throttling:
- Rate: 10 requests/second
- Burst: 20 requests
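
To show how the rate and burst settings interact, here is a small token-bucket sketch (API Gateway throttling is based on a token bucket). The class name and the explicit `now` parameter are illustrative choices made for testability, not part of the deployed system.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float = 10.0, burst: int = 20) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would answer with HTTP 429
```

With the defaults, an idle endpoint absorbs a spike of 20 requests at once, after which admissions settle at 10 per second.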

3. Authentication Layer

Amazon Cognito

User Pool: Manages user accounts and sessions
Identity Provider: Google OAuth 2.0 federation
Token Validation: JWT tokens with 1-hour expiration
Features:
- Email/password fallback (optional)
- MFA support (optional)
- Account recovery
- User attributes (email, name, picture)

Authorization Flow

1. User clicks "Sign in with Google"
2. Redirected to Cognito hosted UI
3. Cognito redirects to Google OAuth
4. User authorizes → Google returns code
5. Cognito exchanges code for tokens
6. User redirected to app with JWT
7. App stores JWT in localStorage
8. JWT sent in Authorization header for API calls
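
Because tokens expire after one hour, a client may want to check the `exp` claim before sending a stored JWT. The sketch below only decodes the payload and does not verify the signature (signature verification is the job of the Cognito authorizer on API Gateway). The function names and the 30-second clock-skew allowance are assumptions.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None, skew_seconds: int = 30) -> bool:
    """True if the token's `exp` claim falls within `skew_seconds` of the current time."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] <= current + skew_seconds
```

When `is_expired` returns True, the app would send the user back through the sign-in flow instead of making an API call that is bound to be rejected.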

4. Compute Layer

Lambda Functions

Chat Lambda (lambda/node/chat/)

Runtime: Node.js 22
Memory: 1024 MB
Timeout: 60 seconds
Purpose: Main orchestrator for streaming chat queries
Features:
- Lambda Response Streaming via awslambda.streamifyResponse()
- Server-Sent Events (SSE) for real-time token delivery
- Conversation history via DynamoDB
- Query understanding with client extraction
- Clarification prompts for ambiguous queries
- Pre-signed S3 URLs for source documents
Flow:
1. Parse request (POST for new messages, GET for history)
2. For POST: Extract query and run query understanding
3. If clarification needed: Return JSON clarification response
4. Retrieve documents from Bedrock Knowledge Base
5. Stream LLM response via SSE (token by token)
6. Save conversation to DynamoDB
7. Return sources with pre-signed URLs
Environment Variables:
- KNOWLEDGE_BASE_ID - Bedrock Knowledge Base ID
- BEDROCK_LLM_MODEL - Claude model identifier
- DYNAMODB_TABLE - Conversation history table
- ENABLE_QUERY_UNDERSTANDING - Toggle query understanding
- AWS_REGION - Deployment region

Note: Document retrieval is handled directly by the Chat Lambda via Bedrock Knowledge Base APIs. The Chat Lambda calls bedrock-agent-runtime:retrieve which:

Generates embeddings via Titan Embeddings v2
Searches S3 Vectors for similar documents
Optionally reranks results using Bedrock Reranking
Returns top K results (configurable, default 5)
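
A retrieval call of this kind might look like the following sketch, which uses the boto3 `bedrock-agent-runtime` client in Python (the Chat Lambda itself runs on Node.js). The helper names are hypothetical, and the `client` filter key assumes the metadata attribute of the same name written to the sidecar files.

```python
from typing import Any, Optional


def build_retrieve_request(
    knowledge_base_id: str,
    query: str,
    client: Optional[str] = None,
    top_k: int = 5,
) -> dict:
    """Build keyword arguments for the Knowledge Base Retrieve API."""
    vector_config: dict = {"numberOfResults": top_k}
    if client is not None:
        # Restrict results to one tenant via the `client` metadata attribute.
        vector_config["filter"] = {"equals": {"key": "client", "value": client}}
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_config},
    }


def retrieve(knowledge_base_id: str, query: str, client: Optional[str] = None) -> Any:
    """Call Bedrock and return the list of retrieval results."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    runtime = boto3.client("bedrock-agent-runtime")
    response = runtime.retrieve(**build_retrieve_request(knowledge_base_id, query, client))
    return response["retrievalResults"]
```

Keeping the request builder separate from the network call makes the filter logic easy to unit test without AWS credentials.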

Classification Lambda (lambda/python/classification/)

Runtime: Python 3.14
Memory: 512 MB
Timeout: 30 seconds
Purpose: Determine client for incoming documents using tiered content-based classification
Classification Tiers:
1. Tier 0 - Contact Match: Match source emails (meeting participants, customer) against client Contacts field
2. Tier 1 - Name Match: Search for client names/aliases in document text (word boundary matching)
3. Tier 2 - Keyword Match: TF-IDF scoring against client Keywords field
4. Tier 3 - Semantic Match: Claude Haiku via Bedrock for contextual analysis (fallback)
Confidence Thresholds:
- Contact match: 1.0 (definitive)
- Name match: 1.0 (single match accepted)
- Keyword match: >= 0.7 (unambiguous)
- Semantic match: >= 0.8
Pending Classification: Documents with low confidence are flagged as PENDING_CLASSIFICATION for manual review. These auto-expire after 30 days (TTL).
Metrics: Emitted to CloudWatch namespace RAG/Classification
Flow:
1. Receive source type and document data
2. Extract content and source emails via strategy (Fathom or HelpScout)
3. Load client entities from DynamoDB (with keywords, contacts, aliases)
4. Run tiered classification (contact → name → keyword → semantic)
5. If low confidence, create PENDING_CLASSIFICATION record
6. Return classification result with client, confidence, and tier

Input Format:

{
  "source": "fathom",
  "data": {
    "document": {
      "transcript": "...",
      "participants": ["user@client.com"]
    },
    "documentS3Key": "...",
    "documentTitle": "..."
  }
}

Output Format:

{
  "success": true,
  "clientId": "uuid",
  "clientName": "Client Name",
  "confidence": 0.95,
  "tier": "name_match",
  "flaggedForReview": false,
  "pendingId": null
}

Environment Variables:
- DYNAMODB_TABLE_NAME - Classification table name (nb-rag-sys)
- AWS_REGION - AWS region for Bedrock calls
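
The tier ordering and thresholds above can be sketched roughly as follows. This is not the service's implementation: the keyword tier uses a simple fraction of matched keywords as a stand-in for TF-IDF, the semantic tier is an injected callable standing in for the Claude Haiku call, and the tier labels `contact_match` and `keyword_match` are assumed by analogy with `name_match` and `semantic_match`.

```python
import re
from typing import Callable, Optional


def _result(client: dict, confidence: float, tier: str, flagged: bool = False) -> dict:
    return {
        "success": True,
        "clientId": client["PK"].split("#", 1)[1],
        "clientName": client["Name"],
        "confidence": confidence,
        "tier": tier,
        "flaggedForReview": flagged,
    }


def classify(
    text: str,
    source_emails: list,
    clients: list,
    semantic_match: Optional[Callable] = None,
) -> dict:
    """Run the tiers in order and stop at the first confident match."""
    emails = {e.lower() for e in source_emails}
    lowered = text.lower()

    # Tier 0: a source email appears in a client's Contacts.
    for c in clients:
        if emails & {e.lower() for e in c.get("Contacts", [])}:
            return _result(c, 1.0, "contact_match")

    # Tier 1: name or alias appears as a whole word; accept only a single match.
    named = [
        c for c in clients
        if any(re.search(rf"\b{re.escape(n.lower())}\b", lowered)
               for n in [c["Name"], *c.get("Aliases", [])])
    ]
    if len(named) == 1:
        return _result(named[0], 1.0, "name_match")

    # Tier 2: share of a client's keywords found in the text (simplified, not TF-IDF).
    scored = []
    for c in clients:
        keywords = c.get("Keywords", [])
        if keywords:
            score = sum(k.lower() in lowered for k in keywords) / len(keywords)
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if scored and scored[0][0] >= 0.7 and (len(scored) == 1 or scored[1][0] < 0.7):
        return _result(scored[0][1], scored[0][0], "keyword_match")

    # Tier 3: LLM fallback; results below 0.8 are flagged for manual review.
    if semantic_match is not None:
        candidate, confidence = semantic_match(text, clients)
        if candidate is not None:
            return _result(candidate, confidence, "semantic_match", flagged=confidence < 0.8)

    # Nothing matched: hypothetical "unclassified" shape, flagged for review.
    return {"success": True, "clientId": None, "clientName": None,
            "confidence": 0.0, "tier": None, "flaggedForReview": True}
```

Ordering the tiers from cheapest to most expensive means the Bedrock call is only made for documents that the deterministic tiers cannot place.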

Fathom Webhook Lambda (lambda/python/webhooks/fathom/)

Runtime: Python 3.14
Memory: 512 MB
Timeout: 300 seconds (5 minutes)
Purpose: Process Fathom video recordings
Flow:
1. Receive webhook from Fathom
2. Validate API key
3. Fetch video metadata and transcript
4. Write document and .metadata.json sidecar to S3
5. Trigger Bedrock KB sync (via scheduled ingestion)
6. Return 200 OK
Environment Variables:
- FATHOM_API_KEY - Stored in Secrets Manager
- DOCUMENTS_BUCKET - S3 bucket for documents

HelpScout Webhook Lambda (lambda/python/webhooks/helpscout/)

Runtime: Python 3.14
Memory: 512 MB
Timeout: 60 seconds
Purpose: Process HelpScout support tickets
Flow:
1. Receive webhook from HelpScout
2. Validate API key
3. Extract ticket content
4. Call Classification Lambda
5. Write document to S3
6. Return 200 OK
Environment Variables:
- HELPSCOUT_API_KEY - Stored in Secrets Manager
- CLASSIFICATION_LAMBDA_ARN - Classification Lambda function ARN

5. AI Layer

Amazon Bedrock

For complete Bedrock documentation, see Amazon Bedrock User Guide.

Claude Sonnet 4.5 (us.anthropic.claude-sonnet-4-5-20250929-v1:0)

Purpose: Response generation
Input: System prompt + context + user query
Output: Natural language response with citations
Pricing: $3.00 per million input tokens, $15.00 per million output tokens
Performance: ~2-3 seconds for typical query
Documentation: Claude on Amazon Bedrock

Titan Embeddings V2 (amazon.titan-embed-text-v2:0)

Purpose: Generate vector embeddings for semantic search
Input: Text chunks (up to 8192 tokens)
Output: 1024-dimensional vector (also supports 256, 512)
Pricing: $0.00002 per 1,000 input tokens
Performance: ~200ms per embedding
Documentation: Amazon Titan Text Embeddings

System Prompt Strategy

You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough
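
As a rough illustration of how the template might be filled in, the sketch below numbers each retrieved chunk so the model can cite it with the `[doc_N]` notation. The function name and the way chunks are joined are assumptions.

```python
PROMPT_TEMPLATE = """You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough"""


def build_prompt(query: str, documents: list) -> str:
    """Label each retrieved chunk as [doc_1], [doc_2], ... and fill the template."""
    context = "\n\n".join(
        f"[doc_{i}] {text}" for i, text in enumerate(documents, start=1)
    )
    return PROMPT_TEMPLATE.format(retrieved_documents=context, query=query)
```

Numbering from 1 keeps the labels aligned with the order in which sources are listed in the UI, assuming the sources are displayed in retrieval order.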

6. Data Layer

S3 Vectors

S3 Vectors is purpose-built vector storage that provides cost-optimized, durable storage for AI workloads. See Working with S3 Vectors for complete documentation.

Index Configuration:
- Dimension: 1024 (matches Titan Embeddings v2)
- Metric: Cosine similarity
- Data Type: FLOAT32
- Storage: Fully managed by AWS
- Non-filterable keys: AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA (required for 100% ingestion success)
- See Vector indexes for configuration details
Integration with Bedrock Knowledge Base:
- Native integration via S3_VECTORS storage type - see Using S3 Vectors with Bedrock KB
- Automatic chunking (512 tokens, 20% overlap) - see Chunking strategies
- LLM parsing: Disabled (sidecar .metadata.json files used instead)
- Data deletion policy: DELETE (vectors auto-removed when S3 docs deleted)
Document Metadata Architecture: Each document carries two separate kinds of metadata:

1. Bedrock KB Metadata (.metadata.json sidecar files for filtering): Each document has a companion metadata sidecar file stored alongside it in S3. For example: fathom/acme/alpha/meeting_12345.md has fathom/acme/alpha/meeting_12345.md.metadata.json
```
{
  "metadataAttributes": {
    "source": {"value": {"type": "STRING", "stringValue": "fathom"}},
    "client": {"value": {"type": "STRING", "stringValue": "acme"}},
    "project": {"value": {"type": "STRING", "stringValue": "alpha"}},
    "category": {"value": {"type": "STRING", "stringValue": "meeting-transcript"}}
  }
}
```
These enable multi-tenant filtering in RAG queries (filter by client). Project metadata is preserved for display purposes but not used for filtering. See S3 Vectors metadata filtering and RetrievalFilter API for filter syntax. For multi-tenancy patterns, see Multi-tenancy with metadata filtering.
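
A webhook Lambda could generate the sidecar with a helper along these lines. The function names are hypothetical; the JSON shape and the `.metadata.json` suffix follow the example above.

```python
import json


def sidecar_key(document_key: str) -> str:
    """S3 key of the sidecar that accompanies `document_key`."""
    return f"{document_key}.metadata.json"


def build_sidecar(source: str, client: str, project: str, category: str) -> str:
    """Serialize the metadata attributes in the sidecar layout shown above."""
    attributes = {
        "source": source,
        "client": client,
        "project": project,
        "category": category,
    }
    return json.dumps(
        {
            "metadataAttributes": {
                name: {"value": {"type": "STRING", "stringValue": value}}
                for name, value in attributes.items()
            }
        },
        indent=2,
    )
```

For example, `sidecar_key("fathom/acme/alpha/meeting_12345.md")` yields `fathom/acme/alpha/meeting_12345.md.metadata.json`, and both objects would be written to the documents bucket together.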

2. S3 Object Metadata (HTTP headers, for S3 operations only): Technical metadata stored as S3 object headers. NOT used by Bedrock KB for filtering.
Query Performance: Fast retrieval via Bedrock KB API
Capacity: Scales automatically with usage

DynamoDB Tables

Classification Table (nb-rag-sys)

The classification table is the central entity registry for multi-tenant document classification. It stores clients, projects, and domain mappings used by the classification service to tag documents with the correct metadata. Clients are managed via the Management UI in the web application.

Primary Key Structure:

PK (Partition Key): Entity type prefix + ID (e.g., CLIENT#uuid, PROJECT#uuid, DOMAIN#example.com)
SK (Sort Key): Relationship context (e.g., METADATA, CLIENT#parent-uuid)

Record Types:

| Record Type | PK | SK | Purpose |
|-------------|----|----|---------|
| CLIENT | `CLIENT#<uuid>` | `METADATA` | Client entity with classification data |
| PROJECT | `PROJECT#<uuid>` | `CLIENT#<client-uuid>` | Project with parent client |
| PENDING | `PENDING#<uuid>` | `METADATA` | Pending classification for manual review |

CLIENT Record (with classification fields):

{
  "PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "American Cell Technology",
  "Aliases": ["ACT", "AmCell"],
  "Keywords": ["cell therapy", "biotech", "LIMS"],
  "Contacts": ["john@act.com", "support@act.com"],
  "Description": "Biotech client specializing in cell therapy",
  "CreatedAt": "2025-12-30T23:09:26.580013",
  "UpdatedAt": "2025-12-30T23:09:26.580022"
}

PROJECT Record:

{
  "PK": "PROJECT#97ae8b49-c653-4538-a526-ba4d5e91f79a",
  "SK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "EntityType": "PROJECT",
  "Name": "LIMS",
  "ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "Description": "Laboratory Information Management System",
  "State": "started",
  "CreatedAt": "2025-12-30T23:09:26.716056",
  "UpdatedAt": "2025-12-30T23:09:26.716068"
}

PENDING_CLASSIFICATION Record (for manual review):

{
  "PK": "PENDING#a1b2c3d4-5678-90ab-cdef-123456789abc",
  "SK": "METADATA",
  "EntityType": "PENDING_CLASSIFICATION",
  "DocumentS3Key": "clients/unknown/fathom/meeting-123.md",
  "DocumentTitle": "Meeting with potential client",
  "Source": "fathom",
  "SuggestedClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SuggestedClientName": "American Cell Technology",
  "Confidence": "0.65",
  "Tier": "semantic_match",
  "Status": "pending",
  "ResolvedClientId": null,
  "CreatedAt": "2025-12-30T23:15:00.000000Z",
  "TTL": 1769728500
}
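
The `TTL` attribute holds an epoch timestamp in seconds. A minimal sketch of deriving it from the creation time, assuming the 30-day expiry described for pending classifications (the constant and function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

PENDING_TTL_DAYS = 30  # pending classifications expire 30 days after creation


def pending_ttl(created_at: datetime) -> int:
    """Epoch seconds after which DynamoDB may delete the pending record."""
    return int((created_at + timedelta(days=PENDING_TTL_DAYS)).timestamp())


created = datetime(2025, 12, 30, 23, 15, tzinfo=timezone.utc)
ttl = pending_ttl(created)
```

DynamoDB removes expired items in the background rather than at the exact second, so readers of the table should still treat a record whose `TTL` is in the past as expired.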

Global Secondary Index (GSI):

EntityTypeIndex: Query all entities of a specific type
- Hash Key: EntityType (CLIENT, PROJECT)
- Used by query understanding to load all known entities for LLM context

Features:

Point-in-Time Recovery (PITR) enabled
On-demand billing
Upserts use if_not_exists(CreatedAt, :val) in the update expression, so the original creation timestamp is preserved when a record is updated
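
An upsert that keeps the original `CreatedAt` might be built as in this sketch. The helper name is hypothetical and only a few attributes are shown; the resulting dictionary is meant to be passed to `update_item` on a boto3 DynamoDB `Table` resource.

```python
from datetime import datetime, timezone
from typing import Optional


def build_client_upsert(client_id: str, name: str, now: Optional[datetime] = None) -> dict:
    """Build update_item arguments that set CreatedAt only on first write."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "Key": {"PK": f"CLIENT#{client_id}", "SK": "METADATA"},
        "UpdateExpression": (
            "SET #name = :name, EntityType = :type, UpdatedAt = :now, "
            # if_not_exists keeps the stored value when CreatedAt is already present.
            "CreatedAt = if_not_exists(CreatedAt, :now)"
        ),
        # "Name" is a DynamoDB reserved word, so it needs a placeholder.
        "ExpressionAttributeNames": {"#name": "Name"},
        "ExpressionAttributeValues": {":name": name, ":type": "CLIENT", ":now": timestamp},
    }
```

Usage would look like `boto3.resource("dynamodb").Table("nb-rag-sys").update_item(**build_client_upsert(client_id, name))`; running it twice changes `UpdatedAt` but leaves `CreatedAt` untouched.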

AWS Secrets Manager

Purpose: Secure storage of API keys and credentials
Secrets:
- fathom-api-key - Fathom API key
- helpscout-api-key - HelpScout API key
- google-oauth-client-secret - Google OAuth secret
Rotation: Manual (can be automated)
Access: IAM role-based (Lambda execution roles)

7. Infrastructure Layer

Terraform State

Backend: S3 + DynamoDB
State File: s3://nb-rag-sys-terraform-state/terraform.tfstate
Locking: DynamoDB table nb-rag-sys-terraform-locks
Encryption: AES-256 at rest
Versioning: Enabled for state recovery

IAM Roles

Lambda Execution Roles

Chat Lambda: Bedrock model invocation and Knowledge Base retrieval, DynamoDB read/write (conversation history), S3 read (pre-signed source URLs)
Classification Lambda: DynamoDB read/write, Bedrock access
Webhook Lambdas: Secrets Manager read, S3 write (documents), invoke Classification Lambda

GitHub Actions OIDC Role

Trust policy: GitHub OIDC provider
Permissions: Full Terraform deployment access
Session duration: 1 hour
MFA: Not required (OIDC provides strong authentication)

Data Flow

Query Flow

The query flow uses Bedrock Knowledge Base Retrieve API with optional reranking.

1. User enters query in web UI
2. Web UI sends POST /chat with JWT token
3. API Gateway validates JWT with Cognito
4. Chat Lambda invoked:
   a. Calls Bedrock Knowledge Base retrieve API
   b. Knowledge Base:
      - Generates embedding via Bedrock Titan
      - Searches S3 Vectors for similar documents
      - Optionally reranks results using Bedrock Reranking (Cohere Rerank 3.5)
      - Returns top K documents (adaptive retrieval)
   c. Formats context from retrieved documents
   d. Calls Bedrock Claude with system prompt + context + query
   e. Streams the response to the client via SSE, token by token
5. After the stream completes, Chat Lambda returns the sources with pre-signed URLs
6. Web UI displays answer with source citations

Ingestion Flow (Fathom Example)

1. Fathom video completes processing
2. Fathom sends webhook to API Gateway
3. API Gateway routes to Fathom Webhook Lambda
4. Lambda validates API key from Secrets Manager
5. Lambda fetches video metadata and transcript
6. Lambda classifies content (client, project) via classification Lambda
7. Lambda writes document and .metadata.json sidecar to S3
8. Bedrock Knowledge Base sync triggers (scheduled or manual)
9. Knowledge Base:
   a. Reads .metadata.json for filtering attributes
   b. Chunks document (~512 tokens, 20% overlap)
   c. Generates embeddings via Bedrock Titan
   d. Stores vectors in S3 Vectors with metadata
10. Lambda returns 200 OK to Fathom
11. Video content searchable after next sync cycle
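
To make the chunking parameters concrete, the sketch below splits a token list into fixed-size windows with 20% overlap. Chunking is actually performed by the Knowledge Base with its own tokenizer, so this only illustrates the arithmetic; the function name and the pre-tokenized input are assumptions.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap_pct: int = 20) -> list:
    """Split `tokens` into windows of `size`, each sharing `overlap_pct`% with the previous one."""
    overlap = size * overlap_pct // 100   # 102 tokens for the defaults
    step = size - overlap                 # each window starts 410 tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With the defaults, a 1,000-token transcript produces three chunks starting at tokens 0, 410, and 820, so text near a chunk boundary appears in two chunks and stays retrievable from either.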

Security Architecture

Network Security

All traffic over HTTPS/TLS 1.2+
API Gateway with WAF (optional)
CloudFront with AWS Shield Standard
VPC endpoints for AWS service communication (optional)

Authentication & Authorization

Google OAuth 2.0 for user authentication
Cognito JWT tokens for API authorization
API keys for webhook validation
IAM roles for Lambda execution

Data Security

Secrets Manager for sensitive credentials
S3 encryption at rest (AES-256)
DynamoDB encryption at rest
Lambda environment variable encryption (optional)

Compliance

GDPR: EU user data can be kept in-region by deploying to eu-west-1
SOC 2: AWS services are SOC 2 compliant
HIPAA: Not HIPAA compliant (would require BAA)

Scalability

Current Capacity

Concurrent users: ~100
Queries per second: 10 (API Gateway limit)
Vector storage: 100K documents
Lambda concurrency: 10 reserved

Scaling Strategies

Horizontal Scaling

Increase API Gateway rate limit
Increase Lambda reserved concurrency
S3 Vectors scales automatically with usage
Enable CloudFront caching

Vertical Scaling

Increase Lambda memory allocation
Adjust Bedrock Knowledge Base retrieval limits
Switch to provisioned DynamoDB capacity

Performance Optimization

Cache frequent queries in Lambda
Optimize chunk size/overlap in Knowledge Base
Tune adaptive retrieval multiplier
Enable Bedrock reranking for better relevance

Disaster Recovery

Backup Strategy

Terraform State: S3 versioning enabled
S3 Documents: S3 versioning + cross-region replication (optional)
S3 Vectors: Automatically backed up with S3 data protection
DynamoDB: Point-in-Time Recovery (PITR) enabled
Secrets: Replicate to backup region
Web Assets: S3 versioning + lifecycle

Recovery Procedures

Complete Infrastructure Loss

1. Deploy from Terraform (terraform apply)
2. Restore S3 documents from backup/versioning
3. Trigger Bedrock Knowledge Base sync to rebuild vectors
4. Restore DynamoDB from PITR
5. Update secrets in Secrets Manager
6. Deploy web assets to S3
7. Invalidate CloudFront cache

RTO: ~30 minutes | RPO: Near-zero (S3 durability)

Region Failure

1. Update Terraform region variable
2. Deploy to new region
3. Update DNS to point to new CloudFront
4. Restore data from backups

RTO: ~1 hour | RPO: ~24 hours

Monitoring & Observability

CloudWatch Metrics

Lambda: Invocations, Duration, Errors, Throttles
API Gateway: 4xx, 5xx, Latency, Request Count
Bedrock: Model invocations, token usage
DynamoDB: Read/write capacity, throttles

CloudWatch Logs

Lambda function logs (7-day retention)
API Gateway access logs (optional)
Structured JSON logging in Lambda
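
Structured JSON logging in a Python Lambda could be set up with a formatter like the one below. The field names and the `request_id` extra attribute are assumptions, not the system's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Present only when the caller passes extra={"request_id": ...}.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to the default stream."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on warm invocations
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A handler would then call something like `get_logger("classification").info("tier selected", extra={"request_id": context.aws_request_id})`, which keeps each entry searchable by request ID in CloudWatch Logs.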

Alarms

Lambda error rate > 1%
API Gateway 5xx rate > 0.5%
Lambda duration > 30 seconds
DynamoDB throttling events

Distributed Tracing

X-Ray integration for Lambda (optional)
Request ID propagation across services

Last updated: 2026-01-17