System Architecture

Detailed technical architecture of the NorthBuilt RAG System.

Key AWS Services: This system is built on Amazon Bedrock Knowledge Bases with S3 Vectors for vector storage. For a complete list of AWS documentation references, see AWS Documentation References.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                             │
│                    (Global Edge Distribution)                        │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
    ┌────▼────┐              ┌───────▼────────┐
    │   S3    │              │  API Gateway   │
    │  (Web)  │              │   (REST API)   │
    └─────────┘              └───────┬────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     │               │               │
              ┌──────▼─────┐  ┌──────▼──────┐  ┌─────▼──────┐
              │  Cognito   │  │   Lambda    │  │   Lambda   │
              │   (Auth)   │  │   (Chat)    │  │ (Webhooks) │
              └────────────┘  └──────┬──────┘  └─────┬──────┘
                                     │               │
                     ┌───────────────┼───────────────┘
                     │               │
              ┌──────▼─────┐  ┌──────▼──────┐
              │  Bedrock   │  │ S3 Vectors  │
              │  (Claude)  │  │  (Bedrock)  │
              └──────┬─────┘  └─────────────┘
                     │
            ┌────────▼─────────┐
            │     DynamoDB     │
            │ (Classification) │
            └──────────────────┘

Component Details

1. Frontend Layer

CloudFront Distribution

  • Purpose: Global CDN for low-latency access
  • Origin: S3 bucket with static web assets
  • Features:
    • HTTPS-only with TLS 1.2+
    • HTTP/2 and HTTP/3 support
    • Gzip/Brotli compression
    • Custom domain support
    • Origin Access Identity (OAI) for S3 security

S3 Web Bucket

  • Contents: HTML, CSS, JavaScript, images
  • Access: Private (CloudFront only via OAI)
  • Versioning: Enabled for rollback capability
  • Lifecycle: Old versions expire after 90 days

Web Application

  • Framework: React 19 + TypeScript + Vite
  • Styling: Tailwind CSS 4
  • Authentication: Google OAuth via Cognito
  • Features:
    • Real-time chat interface with clarification prompts
    • Source citation display with relevance scores
    • Query understanding integration
    • Dark/light mode
    • Mobile responsive PWA

2. API Layer

API Gateway (REST API for Streaming, HTTP API for Webhooks)

REST API (Chat endpoints with Lambda Response Streaming):

  • Routes:
    • POST /chat - Streaming chat endpoint (SSE)
    • GET /chat/{session_id} - Retrieve conversation history
  • Authentication: Cognito JWT authorizer
  • Features:
    • Lambda Response Streaming for real-time token delivery
    • Server-Sent Events (SSE) protocol
    • response_transfer_mode = "STREAM" for all endpoints
  • CORS: Enabled for web UI origin

HTTP API (Webhooks - lower cost):

  • Routes:
    • POST /webhooks/fathom - Fathom video webhook
    • POST /webhooks/helpscout - HelpScout ticket webhook
  • Authentication: API key/signature validation in Lambda
  • Throttling:
    • Rate: 10 requests/second
    • Burst: 20 requests
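
To make the rate and burst settings concrete, here is a small token-bucket sketch using the values above (10 requests/second, burst of 20). It only models the semantics; the class name and the explicit `now` parameter are illustrative and not part of the system.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `burst` tokens."""

    def __init__(self, rate: float = 10.0, burst: int = 20, now: float = 0.0):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so a burst is allowed at once
        self.updated = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a real gateway would answer 429 Too Many Requests
```

With these settings a sender can fire 20 requests at once, after which requests are admitted at a steady 10 per second.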

3. Authentication Layer

Amazon Cognito

  • User Pool: Manages user accounts and sessions
  • Identity Provider: Google OAuth 2.0 federation
  • Token Validation: JWT tokens with 1-hour expiration
  • Features:
    • Email/password fallback (optional)
    • MFA support (optional)
    • Account recovery
    • User attributes (email, name, picture)

Authorization Flow

1. User clicks "Sign in with Google"
2. Redirected to Cognito hosted UI
3. Cognito redirects to Google OAuth
4. User authorizes → Google returns code
5. Cognito exchanges code for tokens
6. User redirected to app with JWT
7. App stores JWT in localStorage
8. JWT sent in Authorization header for API calls

4. Compute Layer

Lambda Functions

Chat Lambda (lambda/node/chat/)

  • Runtime: Node.js 22
  • Memory: 1024 MB
  • Timeout: 60 seconds
  • Purpose: Main orchestrator for streaming chat queries
  • Features:
    • Lambda Response Streaming via awslambda.streamifyResponse()
    • Server-Sent Events (SSE) for real-time token delivery
    • Conversation history via DynamoDB
    • Query understanding with client extraction
    • Clarification prompts for ambiguous queries
    • Pre-signed S3 URLs for source documents
  • Flow:
    1. Parse request (POST for new messages, GET for history)
    2. For POST: Extract query and run query understanding
    3. If clarification needed: Return JSON clarification response
    4. Retrieve documents from Bedrock Knowledge Base
    5. Stream LLM response via SSE (token by token)
    6. Save conversation to DynamoDB
    7. Return sources with pre-signed URLs
  • Environment Variables:
    • KNOWLEDGE_BASE_ID - Bedrock Knowledge Base ID
    • BEDROCK_LLM_MODEL - Claude model identifier
    • DYNAMODB_TABLE - Conversation history table
    • ENABLE_QUERY_UNDERSTANDING - Toggle query understanding
    • AWS_REGION - Deployment region

Note: Document retrieval is handled directly by the Chat Lambda via Bedrock Knowledge Base APIs. The Chat Lambda calls bedrock-agent-runtime:retrieve, which:

  1. Generates embeddings via Titan Embeddings v2
  2. Searches S3 Vectors for similar documents
  3. Optionally reranks results using Bedrock Reranking
  4. Returns top K results (configurable, default 5)

Classification Lambda (lambda/python/classification/)

  • Runtime: Python 3.14
  • Memory: 512 MB
  • Timeout: 30 seconds
  • Purpose: Determine client for incoming documents using tiered content-based classification
  • Classification Tiers:
    1. Tier 0 - Contact Match: Match source emails (meeting participants, customer) against client Contacts field
    2. Tier 1 - Name Match: Search for client names/aliases in document text (word boundary matching)
    3. Tier 2 - Keyword Match: TF-IDF scoring against client Keywords field
    4. Tier 3 - Semantic Match: Claude Haiku via Bedrock for contextual analysis (fallback)
  • Confidence Thresholds:
    • Contact match: 1.0 (definitive)
    • Name match: 1.0 (single match accepted)
    • Keyword match: >= 0.7 (unambiguous)
    • Semantic match: >= 0.8
  • Pending Classification: Documents with low confidence are flagged as PENDING_CLASSIFICATION for manual review. These auto-expire after 30 days (TTL).
  • Metrics: Emitted to CloudWatch namespace RAG/Classification
  • Flow:
    1. Receive source type and document data
    2. Extract content and source emails via strategy (Fathom or HelpScout)
    3. Load client entities from DynamoDB (with keywords, contacts, aliases)
    4. Run tiered classification (contact → name → keyword → semantic)
    5. If low confidence, create PENDING_CLASSIFICATION record
    6. Return classification result with client, confidence, and tier
  • Input Format:
    {"source": "fathom", "data": {"document": {"transcript": "...", "participants": ["user@client.com"]}, "documentS3Key": "...", "documentTitle": "..."}}
    
  • Output Format:
    {"success": true, "clientId": "uuid", "clientName": "Client Name", "confidence": 0.95, "tier": "name_match", "flaggedForReview": false, "pendingId": null}
    
  • Environment Variables:
    • DYNAMODB_TABLE_NAME - Classification table name (nb-rag-sys)
    • AWS_REGION - AWS region for Bedrock calls

Fathom Webhook Lambda (lambda/python/webhooks/fathom/)

  • Runtime: Python 3.14
  • Memory: 512 MB
  • Timeout: 300 seconds (5 minutes)
  • Purpose: Process Fathom video recordings
  • Flow:
    1. Receive webhook from Fathom
    2. Validate API key
    3. Fetch video metadata and transcript
    4. Write document and .metadata.json sidecar to S3
    5. Trigger Bedrock KB sync (via scheduled ingestion)
    6. Return 200 OK
  • Environment Variables:
    • FATHOM_API_KEY - Stored in Secrets Manager
    • DOCUMENTS_BUCKET - S3 bucket for documents

HelpScout Webhook Lambda (lambda/python/webhooks/helpscout/)

  • Runtime: Python 3.14
  • Memory: 512 MB
  • Timeout: 60 seconds
  • Purpose: Process HelpScout support tickets
  • Flow:
    1. Receive webhook from HelpScout
    2. Validate API key
    3. Extract ticket content
    4. Call Classification Lambda
    5. Write document to S3
    6. Return 200 OK
  • Environment Variables:
    • HELPSCOUT_API_KEY - Stored in Secrets Manager
    • CLASSIFICATION_LAMBDA_ARN - Classification Lambda function ARN
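
Where a provider signs its webhook payloads, the validation step typically looks like the sketch below. The hash algorithm (SHA-256) and base64 encoding here are assumptions for illustration; each provider documents its own scheme and header name, so check those before reusing this.

```python
import base64
import hashlib
import hmac


def is_valid_signature(body: bytes, signature_b64: str, secret: str) -> bool:
    """Compare the provided signature with an HMAC of the raw request body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected, signature_b64.encode())
```

The HMAC must be computed over the raw body bytes exactly as received, before any JSON parsing or re-serialization.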

5. AI Layer

Amazon Bedrock

For complete Bedrock documentation, see Amazon Bedrock User Guide.

Claude Sonnet 4.5 (us.anthropic.claude-sonnet-4-5-20250929-v1:0)

  • Purpose: Response generation
  • Input: System prompt + context + user query
  • Output: Natural language response with citations
  • Pricing: $3.00 per million input tokens, $15.00 per million output tokens
  • Performance: ~2-3 seconds for typical query
  • Documentation: Claude on Amazon Bedrock

Titan Embeddings V2 (amazon.titan-embed-text-v2:0)

  • Purpose: Generate vector embeddings for semantic search
  • Input: Text chunks (up to 8192 tokens)
  • Output: 1024-dimensional vector (also supports 256, 512)
  • Pricing: $0.0001 per 1000 tokens
  • Performance: ~200ms per embedding
  • Documentation: Amazon Titan Text Embeddings

System Prompt Strategy

You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough
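
Filling this template from retrieved chunks could look like the following. The sketch assumes citations are numbered from 1 in retrieval order, which the template implies but does not state; the function names are illustrative.

```python
PROMPT_TEMPLATE = """You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough"""


def format_context(chunks: list[str]) -> str:
    """Label each chunk so the model can cite it as [doc_N]."""
    return "\n\n".join(
        f"[doc_{i}] {text.strip()}" for i, text in enumerate(chunks, start=1)
    )


def build_prompt(query: str, chunks: list[str]) -> str:
    """Insert the labelled context and the user query into the template."""
    return PROMPT_TEMPLATE.format(
        retrieved_documents=format_context(chunks), query=query
    )
```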

6. Data Layer

S3 Vectors

S3 Vectors is purpose-built vector storage that provides cost-optimized, durable storage for AI workloads. See Working with S3 Vectors for complete documentation.

  • Index Configuration:
    • Dimension: 1024 (matches Titan Embeddings v2)
    • Metric: Cosine similarity
    • Data Type: FLOAT32
    • Storage: Fully managed by AWS
    • Non-filterable keys: AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA (required for 100% ingestion success)
    • See Vector indexes for configuration details
  • Integration with Bedrock Knowledge Base:
    • Native integration via S3_VECTORS storage type - see Using S3 Vectors with Bedrock KB
    • Automatic chunking (512 tokens, 20% overlap) - see Chunking strategies
    • LLM parsing: Disabled (sidecar .metadata.json files used instead)
    • Data deletion policy: DELETE (vectors auto-removed when S3 docs deleted)
  • Document Metadata Architecture: Two metadata systems:

    1. Bedrock KB Metadata (.metadata.json sidecar files for filtering): Each document has a companion metadata sidecar file stored alongside it in S3. For example: fathom/acme/alpha/meeting_12345.md has fathom/acme/alpha/meeting_12345.md.metadata.json

    {
      "metadataAttributes": {
        "source": {"value": {"type": "STRING", "stringValue": "fathom"}},
        "client": {"value": {"type": "STRING", "stringValue": "acme"}},
        "project": {"value": {"type": "STRING", "stringValue": "alpha"}},
        "category": {"value": {"type": "STRING", "stringValue": "meeting-transcript"}}
      }
    }
    

    These enable multi-tenant filtering in RAG queries (filter by client). Project metadata is preserved for display purposes but not used for filtering. See S3 Vectors metadata filtering and RetrievalFilter API for filter syntax. For multi-tenancy patterns, see Multi-tenancy with metadata filtering.

    2. S3 Object Metadata (HTTP headers, for S3 operations only): Technical metadata stored as S3 object headers. NOT used by Bedrock KB for filtering.

  • Query Performance: Fast retrieval via Bedrock KB API
  • Capacity: Scales automatically with usage
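
A sidecar file in the format shown above can be generated with a small helper. This is a minimal sketch that handles STRING attributes only; the function names are illustrative.

```python
import json


def sidecar_key(document_key: str) -> str:
    """S3 key of the metadata sidecar that accompanies a document."""
    return f"{document_key}.metadata.json"


def build_sidecar(attributes: dict[str, str]) -> str:
    """Render string attributes in the Bedrock KB sidecar layout shown above."""
    body = {
        "metadataAttributes": {
            key: {"value": {"type": "STRING", "stringValue": value}}
            for key, value in attributes.items()
        }
    }
    return json.dumps(body, indent=2)
```

For the example document, `sidecar_key("fathom/acme/alpha/meeting_12345.md")` yields `fathom/acme/alpha/meeting_12345.md.metadata.json`.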

DynamoDB Tables

Classification Table (nb-rag-sys)

The classification table is the central entity registry for multi-tenant document classification. It stores clients, projects, and domain mappings used by the classification service to tag documents with the correct metadata. Clients are managed via the Management UI in the web application.

Primary Key Structure:

  • PK (Partition Key): Entity type prefix + ID (e.g., CLIENT#uuid, PROJECT#uuid, DOMAIN#example.com)
  • SK (Sort Key): Relationship context (e.g., METADATA, CLIENT#parent-uuid)

Record Types:

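| Record Type | PK             | SK                   | Purpose                                  |
|-------------|----------------|----------------------|------------------------------------------|
| CLIENT      | CLIENT#<uuid>  | METADATA             | Client entity with classification data   |
| PROJECT     | PROJECT#<uuid> | CLIENT#<client-uuid> | Project with parent client               |
| PENDING     | PENDING#<uuid> | METADATA             | Pending classification for manual review |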

CLIENT Record (with classification fields):

{
  "PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "American Cell Technology",
  "Aliases": ["ACT", "AmCell"],
  "Keywords": ["cell therapy", "biotech", "LIMS"],
  "Contacts": ["john@act.com", "support@act.com"],
  "Description": "Biotech client specializing in cell therapy",
  "CreatedAt": "2025-12-30T23:09:26.580013",
  "UpdatedAt": "2025-12-30T23:09:26.580022"
}

PROJECT Record:

{
  "PK": "PROJECT#97ae8b49-c653-4538-a526-ba4d5e91f79a",
  "SK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "EntityType": "PROJECT",
  "Name": "LIMS",
  "ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "Description": "Laboratory Information Management System",
  "State": "started",
  "CreatedAt": "2025-12-30T23:09:26.716056",
  "UpdatedAt": "2025-12-30T23:09:26.716068"
}

PENDING_CLASSIFICATION Record (for manual review):

{
  "PK": "PENDING#a1b2c3d4-5678-90ab-cdef-123456789abc",
  "SK": "METADATA",
  "EntityType": "PENDING_CLASSIFICATION",
  "DocumentS3Key": "clients/unknown/fathom/meeting-123.md",
  "DocumentTitle": "Meeting with potential client",
  "Source": "fathom",
  "SuggestedClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SuggestedClientName": "American Cell Technology",
  "Confidence": "0.65",
  "Tier": "semantic_match",
  "Status": "pending",
  "ResolvedClientId": null,
  "CreatedAt": "2025-12-30T23:15:00.000000Z",
  "TTL": 1769728500
}

Global Secondary Index (GSI):

  • EntityTypeIndex: Query all entities of a specific type
    • Hash Key: EntityType (CLIENT or PROJECT)
    • Used by query understanding to load all known entities for LLM context
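
Loading every entity of one type through this index might look like the sketch below. It targets the low-level boto3 DynamoDB client interface (`query` with typed attribute values) and follows pagination; the function names are illustrative, and the client is passed in so the logic can be exercised without AWS access.

```python
def entity_type_query(table_name: str, entity_type: str) -> dict:
    """Keyword arguments for a DynamoDB `query` against EntityTypeIndex."""
    return {
        "TableName": table_name,
        "IndexName": "EntityTypeIndex",
        "KeyConditionExpression": "EntityType = :t",
        "ExpressionAttributeValues": {":t": {"S": entity_type}},
    }


def load_entities(dynamodb_client, table_name: str, entity_type: str) -> list[dict]:
    """Collect all items of one entity type, following result pages."""
    kwargs = entity_type_query(table_name, entity_type)
    items: list[dict] = []
    while True:
        page = dynamodb_client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key  # resume where the page ended
```

In a Lambda this would be called as `load_entities(boto3.client("dynamodb"), "nb-rag-sys", "CLIENT")`.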

Features:

  • Point-in-Time Recovery (PITR) enabled
  • On-demand billing
  • if_not_exists(CreatedAt, :val) preserves original creation timestamps
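
The `if_not_exists` pattern can be illustrated with an upsert that sets `CreatedAt` only on first write. This is a sketch of the arguments for a boto3 DynamoDB client `update_item` call (add `TableName` before calling); the function name and the choice of fields are assumptions. `Name` is aliased because it is a DynamoDB reserved word.

```python
def upsert_client_update(client_id: str, name: str, now_iso: str) -> dict:
    """Arguments for `update_item` that keep the original CreatedAt value."""
    return {
        "Key": {"PK": {"S": f"CLIENT#{client_id}"}, "SK": {"S": "METADATA"}},
        "UpdateExpression": (
            "SET #name = :name, UpdatedAt = :now, "
            # Only written when the attribute is absent, i.e. on first insert.
            "CreatedAt = if_not_exists(CreatedAt, :now)"
        ),
        "ExpressionAttributeNames": {"#name": "Name"},
        "ExpressionAttributeValues": {":name": {"S": name}, ":now": {"S": now_iso}},
    }
```

Repeating the call with a later timestamp updates `UpdatedAt` while `CreatedAt` stays at its first value.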

AWS Secrets Manager

  • Purpose: Secure storage of API keys and credentials
  • Secrets:
    • fathom-api-key - Fathom API key
    • helpscout-api-key - HelpScout API key
    • google-oauth-client-secret - Google OAuth secret
  • Rotation: Manual (can be automated)
  • Access: IAM role-based (Lambda execution roles)

7. Infrastructure Layer

Terraform State

  • Backend: S3 + DynamoDB
  • State File: s3://nb-rag-sys-terraform-state/terraform.tfstate
  • Locking: DynamoDB table nb-rag-sys-terraform-locks
  • Encryption: AES-256 at rest
  • Versioning: Enabled for state recovery

IAM Roles

Lambda Execution Roles

  • Chat Lambda: Bedrock access (Knowledge Base retrieval and model invocation), DynamoDB read/write for conversation history
  • Classification Lambda: DynamoDB read/write, Bedrock access
  • Webhook Lambdas: Secrets Manager read, invoke other Lambdas

GitHub Actions OIDC Role

  • Trust policy: GitHub OIDC provider
  • Permissions: Full Terraform deployment access
  • Session duration: 1 hour
  • MFA: Not required (OIDC provides strong authentication)

Data Flow

Query Flow

The query flow uses the Bedrock Knowledge Base Retrieve API with optional reranking.

1. User enters query in web UI
2. Web UI sends POST /chat with JWT token
3. API Gateway validates JWT with Cognito
4. Chat Lambda invoked:
   a. Calls Bedrock Knowledge Base retrieve API
   b. Knowledge Base:
      - Generates embedding via Bedrock Titan
      - Searches S3 Vectors for similar documents
      - Optionally reranks results using Bedrock Reranking (Cohere Rerank 3.5)
      - Returns top K documents (adaptive retrieval)
   c. Formats context from retrieved documents
   d. Calls Bedrock Claude with system prompt + context + query
   e. Streams the response token by token via SSE
5. Chat Lambda saves the conversation to DynamoDB and returns sources with pre-signed URLs
6. Web UI displays answer with source citations

Ingestion Flow (Fathom Example)

1. Fathom video completes processing
2. Fathom sends webhook to API Gateway
3. API Gateway routes to Fathom Webhook Lambda
4. Lambda validates API key from Secrets Manager
5. Lambda fetches video metadata and transcript
6. Lambda classifies content (client, project) via classification Lambda
7. Lambda writes document and .metadata.json sidecar to S3
8. Lambda returns 200 OK to Fathom
9. Bedrock Knowledge Base sync triggers (scheduled or manual)
10. Knowledge Base:
   a. Reads .metadata.json for filtering attributes
   b. Chunks document (~512 tokens, 20% overlap)
   c. Generates embeddings via Bedrock Titan
   d. Stores vectors in S3 Vectors with metadata
11. Video content searchable after the next sync cycle
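
The chunking step (512 tokens, 20% overlap) can be pictured as a sliding window. The sketch below works on an already-tokenized list and is only an approximation of the idea: Bedrock's own tokenizer and boundary handling are managed by the service and may differ.

```python
def chunk_tokens(tokens: list, max_tokens: int = 512, overlap_pct: int = 20) -> list[list]:
    """Split tokens into windows of `max_tokens` that overlap by `overlap_pct` percent."""
    if max_tokens <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("max_tokens must be > 0 and overlap_pct in [0, 100)")
    overlap = max_tokens * overlap_pct // 100
    step = max_tokens - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With the defaults, each window advances by 410 tokens, so consecutive chunks share 102 tokens of context.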

Security Architecture

Network Security

  • All traffic over HTTPS/TLS 1.2+
  • API Gateway with WAF (optional)
  • CloudFront with AWS Shield Standard
  • VPC endpoints for AWS service communication (optional)

Authentication & Authorization

  • Google OAuth 2.0 for user authentication
  • Cognito JWT tokens for API authorization
  • API keys for webhook validation
  • IAM roles for Lambda execution

Data Security

  • Secrets Manager for sensitive credentials
  • S3 encryption at rest (AES-256)
  • DynamoDB encryption at rest
  • Lambda environment variable encryption (optional)

Compliance

  • GDPR: Deploy to an EU region such as eu-west-1 to keep EU user data in-region
  • SOC 2: AWS services are SOC 2 compliant
  • HIPAA: Not HIPAA compliant as deployed (would require a BAA with AWS)

Scalability

Current Capacity

  • Concurrent users: ~100
  • Queries per second: 10 (API Gateway limit)
  • Vector storage: 100K documents
  • Lambda concurrency: 10 reserved

Scaling Strategies

Horizontal Scaling

  • Increase API Gateway rate limit
  • Increase Lambda reserved concurrency
  • S3 Vectors scales automatically with usage
  • Enable CloudFront caching

Vertical Scaling

  • Increase Lambda memory allocation
  • Adjust Bedrock Knowledge Base retrieval limits
  • Switch to provisioned DynamoDB capacity

Performance Optimization

  • Cache frequent queries in Lambda
  • Optimize chunk size/overlap in Knowledge Base
  • Tune adaptive retrieval multiplier
  • Enable Bedrock reranking for better relevance

Disaster Recovery

Backup Strategy

  • Terraform State: S3 versioning enabled
  • S3 Documents: S3 versioning + cross-region replication (optional)
  • S3 Vectors: Automatically backed up with S3 data protection
  • DynamoDB: Point-in-Time Recovery (PITR) enabled
  • Secrets: Replicate to backup region
  • Web Assets: S3 versioning + lifecycle

Recovery Procedures

Complete Infrastructure Loss

  1. Deploy from Terraform (terraform apply)
  2. Restore S3 documents from backup/versioning
  3. Trigger Bedrock Knowledge Base sync to rebuild vectors
  4. Restore DynamoDB from PITR
  5. Update secrets in Secrets Manager
  6. Deploy web assets to S3
  7. Invalidate CloudFront cache

RTO: ~30 minutes | RPO: Near-zero (S3 durability)

Region Failure

  1. Update Terraform region variable
  2. Deploy to new region
  3. Update DNS to point to new CloudFront
  4. Restore data from backups

RTO: ~1 hour | RPO: ~24 hours

Monitoring & Observability

CloudWatch Metrics

  • Lambda: Invocations, Duration, Errors, Throttles
  • API Gateway: 4xx, 5xx, Latency, Request Count
  • Bedrock: Model invocations, token usage
  • DynamoDB: Read/write capacity, throttles

CloudWatch Logs

  • Lambda function logs (7-day retention)
  • API Gateway access logs (optional)
  • Structured JSON logging in Lambda
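
Structured JSON logging in the Python Lambdas could be set up along these lines. This is a minimal sketch using only the standard library; the field names and the `context` convention for extra fields are assumptions, not the project's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields are passed as logger.info(..., extra={"context": {...}}).
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    """Attach the JSON formatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

One JSON object per line keeps the logs queryable by field in CloudWatch Logs Insights.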

Alarms

  • Lambda error rate > 1%
  • API Gateway 5xx rate > 0.5%
  • Lambda duration > 30 seconds
  • DynamoDB throttling events

Distributed Tracing

  • X-Ray integration for Lambda (optional)
  • Request ID propagation across services

Last updated: 2026-01-17