Project Overview

What is the NorthBuilt RAG System?

The NorthBuilt RAG System is a production-ready serverless Retrieval-Augmented Generation (RAG) platform built entirely on AWS infrastructure with S3 Vectors for vector storage. It combines document storage, semantic search, and large language models to provide intelligent, context-aware responses to user queries.

Key AWS Services:

Amazon Bedrock Knowledge Bases: Managed RAG orchestration
Amazon S3 Vectors: Purpose-built vector storage
Amazon Titan Embeddings: Vector embeddings
Claude on Amazon Bedrock: Response generation

Core Capabilities

Document Intelligence

Automated Ingestion: Documents uploaded via API are automatically chunked, embedded, and indexed
Multi-Format Support: Handles text documents, PDFs (via Bedrock parsing), and structured data
Smart Chunking: Bedrock automatically chunks documents for optimal retrieval
Metadata Preservation: Maintains document metadata for filtering and organization
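
As a rough illustration of how metadata might travel with a document: Bedrock Knowledge Bases can read a sidecar file stored next to the source object in S3, named after the document with ".metadata.json" appended. The sketch below only builds that sidecar. The helper name, the document key, and the attribute names (client_id, source) are assumptions for illustration, not taken from this project.

```python
import json


def build_metadata_sidecar(document_key: str, attributes: dict) -> tuple[str, str]:
    """Return (s3_key, json_body) for a Bedrock Knowledge Base metadata sidecar.

    The sidecar sits next to the source object and reuses its key
    with ".metadata.json" appended.
    """
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": attributes}, sort_keys=True)
    return sidecar_key, body


# Hypothetical document key and attributes, for illustration only.
key, body = build_metadata_sidecar(
    "clients/acme/kickoff-notes.txt",
    {"client_id": "acme", "source": "fathom"},
)
print(key)
print(body)
```

Uploading both the document and the sidecar under the same prefix would let the attributes be used as retrieval filters later.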

Semantic Search

Vector Embeddings: Uses Amazon Titan Text Embeddings V2 (1024 dimensions) for high-quality representations
Fast Retrieval: S3 Vectors keeps query latency low as the index grows
Relevance Scoring: Returns confidence scores for each retrieved chunk
Multi-Tenant Support: Isolates documents by client for data segregation; all projects under a client are accessible for richer context
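
One way client isolation and relevance scoring could fit together is a metadata filter on the Bedrock Retrieve call plus a small post-processing step on the scored results. This is a minimal sketch under assumptions: the metadata key client_id, the helper names, and the sample response are hypothetical, and the project's actual filter may differ.

```python
def build_retrieval_config(client_id: str, top_k: int = 5) -> dict:
    """Build a retrievalConfiguration that only matches one client's chunks."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            # "client_id" is an assumed metadata key, not confirmed by the source.
            "filter": {"equals": {"key": "client_id", "value": client_id}},
        }
    }


def ranked_chunks(retrieve_response: dict, min_score: float = 0.0) -> list[dict]:
    """Flatten a Retrieve response into chunks sorted by descending score."""
    chunks = [
        {
            "text": result["content"]["text"],
            "score": result.get("score", 0.0),
            "uri": result.get("location", {}).get("s3Location", {}).get("uri"),
        }
        for result in retrieve_response.get("retrievalResults", [])
    ]
    kept = [chunk for chunk in chunks if chunk["score"] >= min_score]
    return sorted(kept, key=lambda chunk: chunk["score"], reverse=True)


# Hand-written response in the shape returned by the Retrieve API.
sample = {
    "retrievalResults": [
        {
            "content": {"text": "Kickoff notes"},
            "score": 0.42,
            "location": {"s3Location": {"uri": "s3://example-bucket/acme/kickoff.txt"}},
        },
        {
            "content": {"text": "Support thread"},
            "score": 0.87,
            "location": {"s3Location": {"uri": "s3://example-bucket/acme/thread.txt"}},
        },
    ]
}

config = build_retrieval_config("acme", top_k=3)
top = ranked_chunks(sample, min_score=0.5)
print(config)
print(top)
```

Filtering at the client level rather than the project level matches the behavior described above, where all projects under a client stay visible.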

AI-Powered Responses

Claude Sonnet 4.5: State-of-the-art language model for response generation
Context-Aware: Grounds responses in retrieved documents to reduce hallucinations
Source Citations: Every response includes source documents with relevance scores
Conversational: Maintains chat history for follow-up questions
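
To make the citation and follow-up behavior concrete, here is a hedged sketch of turning a Bedrock RetrieveAndGenerate response into an answer with a source list and a session ID to send back on the next turn. The helper names and the sample response are hypothetical. The field names follow the public response shape, but how the chat Lambda actually formats its output is not shown in this document.

```python
def extract_sources(rag_response: dict) -> list[str]:
    """Collect unique S3 source URIs from the citations, in first-seen order."""
    seen: list[str] = []
    for citation in rag_response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in seen:
                seen.append(uri)
    return seen


def format_answer(rag_response: dict) -> dict:
    """Shape the payload a chat endpoint could return to the UI."""
    return {
        "answer": rag_response["output"]["text"],
        "sources": extract_sources(rag_response),
        # Passing this ID on the next request lets Bedrock keep the conversation.
        "session_id": rag_response.get("sessionId"),
    }


# Hand-written response for illustration only.
sample_response = {
    "output": {"text": "The kickoff is scheduled for Monday."},
    "sessionId": "session-123",
    "citations": [
        {
            "retrievedReferences": [
                {"location": {"s3Location": {"uri": "s3://example-bucket/acme/kickoff.txt"}}},
                {"location": {"s3Location": {"uri": "s3://example-bucket/acme/kickoff.txt"}}},
            ]
        },
        {
            "retrievedReferences": [
                {"location": {"s3Location": {"uri": "s3://example-bucket/acme/plan.txt"}}},
            ]
        },
    ],
}

result = format_answer(sample_response)
print(result)
```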

Enterprise Integrations

Fathom: Automatically ingests and indexes video transcripts
HelpScout: Indexes support conversations for better customer service
Google OAuth: Secure authentication via Amazon Cognito

Architecture Overview

User Query
    ↓
API Gateway (+ Cognito Auth)
    ↓
Chat Lambda
    ↓
Bedrock Knowledge Base
    ↓
S3 Vectors ← Titan Embeddings
    ↓
Claude Sonnet 4.5 (Response Generation)
    ↓
Response with Sources
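
The middle of this flow (Chat Lambda to Knowledge Base to Claude) could be sketched as a single RetrieveAndGenerate call. The function below takes the client as a parameter so it runs offline against a stand-in. In a Lambda the client would presumably come from boto3.client("bedrock-agent-runtime"). The function name, the placeholder IDs, and the fake client are assumptions, not the project's code.

```python
def answer_query(client, question: str, kb_id: str, model_arn: str, session_id=None) -> dict:
    """Send one user question through a Bedrock Knowledge Base and return the answer."""
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id:
        # Reusing the session ID continues an existing conversation.
        request["sessionId"] = session_id
    response = client.retrieve_and_generate(**request)
    return {"answer": response["output"]["text"], "session_id": response.get("sessionId")}


class FakeBedrockAgentRuntime:
    """Offline stand-in that records calls and returns a canned response."""

    def __init__(self):
        self.calls = []

    def retrieve_and_generate(self, **kwargs):
        self.calls.append(kwargs)
        return {"output": {"text": "stub answer"}, "sessionId": "session-1"}


fake = FakeBedrockAgentRuntime()
first = answer_query(fake, "What changed last week?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER")
follow_up = answer_query(
    fake, "Who owns it?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER",
    session_id=first["session_id"],
)
print(first)
print(follow_up)
```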

Technology Stack

Infrastructure

Terraform: v1.13+ for infrastructure as code
AWS Services: Lambda, S3, S3 Vectors, API Gateway, Cognito, Bedrock, Secrets Manager, CloudWatch
GitHub Actions: CI/CD with OIDC authentication

Backend

Python 3.14: Lambda runtime
Boto3: AWS SDK for Python
Flask: Local development server

Frontend

React 19: Modern React with TypeScript
Tailwind CSS 4: Utility-first CSS framework
CloudFront: Global CDN with custom domain

AI/ML

Amazon Bedrock: Managed AI service
Claude Sonnet 4.5: LLM for response generation
Titan Embeddings v2: 1024-dimensional embeddings
S3 Vectors: Purpose-built vector storage with native Bedrock integration
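
In this stack the Knowledge Base generates embeddings on the system's behalf, so calling Titan directly is not required. For readers who want to see what a 1024-dimensional embedding request looks like, the sketch below calls the Titan Text Embeddings V2 model through an injected client. The fake client is a stand-in so the example runs without AWS credentials. With a real client it would presumably be boto3.client("bedrock-runtime").

```python
import io
import json

TITAN_V2_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(client, text: str, dimensions: int = 1024) -> list[float]:
    """Request one embedding vector from Titan Text Embeddings V2."""
    response = client.invoke_model(
        modelId=TITAN_V2_MODEL_ID,
        body=json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True}),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream of JSON bytes.
    return json.loads(response["body"].read())["embedding"]


class FakeBedrockRuntime:
    """Offline stand-in returning a zero vector of the requested size."""

    def invoke_model(self, modelId, body, contentType, accept):
        dims = json.loads(body)["dimensions"]
        payload = json.dumps({"embedding": [0.0] * dims}).encode("utf-8")
        return {"body": io.BytesIO(payload)}


vector = embed_text(FakeBedrockRuntime(), "How do I rotate an API key?")
print(len(vector))
```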

Key Features

Serverless Architecture

Auto-Scaling: Automatically scales to handle traffic spikes
Pay-Per-Use: Pay only for what you use
High Availability: Built on AWS managed services (99.9%+ uptime)
Global Performance: CloudFront CDN serves the web UI from edge locations for <100ms latency worldwide

Cost-Optimized

Significant Cost Reduction: Migrated vector storage from OpenSearch to S3 Vectors (fully AWS-native)
True Serverless: Scales to zero when not in use
Efficient Storage: S3 Glacier Instant Retrieval for backups
Optimized Compute: Right-sized Lambda functions

Developer-Friendly

Infrastructure as Code: 100% Terraform with modular design
One-Command Deploy: GitHub Actions CI/CD
Local Development: Run the entire stack locally
Comprehensive Docs: Extensive documentation for every component

Production-Ready

Automated Backups: Continuous S3 replication with 14-day retention
Point-in-Time Recovery: DynamoDB PITR for 35-day recovery window
Security: IAM least-privilege, encrypted secrets, JWT authentication
Monitoring: CloudWatch logs, metrics, and alarms

System Components

Data Layer

S3 Buckets: Document storage with versioning
S3 Vectors: Serverless vector storage with Bedrock integration
DynamoDB: Classification data and entity relationships
Secrets Manager: Encrypted API keys and credentials
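
A common pattern for reading credentials from Secrets Manager inside a Lambda is to cache the decoded secret per execution environment, so warm invocations skip the network call. The sketch below assumes JSON-encoded secrets and uses a hypothetical secret name. The client is injected so the example runs against a stand-in instead of boto3.client("secretsmanager").

```python
import json

_secret_cache: dict = {}


def get_secret(client, secret_id: str) -> dict:
    """Fetch and decode a JSON secret, caching it for later invocations."""
    if secret_id not in _secret_cache:
        response = client.get_secret_value(SecretId=secret_id)
        _secret_cache[secret_id] = json.loads(response["SecretString"])
    return _secret_cache[secret_id]


class FakeSecretsManager:
    """Offline stand-in that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def get_secret_value(self, SecretId):
        self.calls += 1
        return {"SecretString": json.dumps({"api_key": "not-a-real-key"})}


fake_secrets = FakeSecretsManager()
# "example/integration-api-key" is a hypothetical secret name.
first_read = get_secret(fake_secrets, "example/integration-api-key")
second_read = get_secret(fake_secrets, "example/integration-api-key")
print(first_read["api_key"], fake_secrets.calls)
```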

For complete AWS documentation references, see AWS Documentation References.

Compute Layer

9 Lambda Functions:
- chat: User-facing chat interface
- classification: Document classification engine
- ingest: Knowledge base ingestion
- fathom-sync, fathom-sync-worker, fathom-webhook: Fathom integration
- helpscout-sync, helpscout-sync-worker, helpscout-webhook: HelpScout integration

API Layer

API Gateway: HTTP API with JWT authorization
CloudFront: CDN for web UI with custom domain
Cognito: User authentication with Google OAuth
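
With a JWT authorizer on an HTTP API, API Gateway validates the token before the Lambda runs and passes the verified claims in the event. A handler could read the caller's identity as sketched below. The helper name and the choice of claims to return are assumptions, and the sample event is trimmed to the relevant keys.

```python
def get_caller(event: dict) -> dict:
    """Read the verified JWT claims from an HTTP API (payload v2) event."""
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
    if "sub" not in claims:
        # Should not happen behind a JWT authorizer; fail closed if it does.
        raise PermissionError("request has no verified JWT claims")
    return {"user_id": claims["sub"], "email": claims.get("email")}


# Trimmed sample event for illustration only.
sample_event = {
    "requestContext": {
        "authorizer": {
            "jwt": {"claims": {"sub": "user-123", "email": "dev@example.com"}}
        }
    }
}

caller = get_caller(sample_event)
print(caller)
```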

AI/ML Layer

Bedrock Knowledge Base: Managed RAG orchestration
Titan Embeddings v2: Vector embeddings
Claude Sonnet 4.5: Response generation
Bedrock Parsing: Document parsing

Performance Characteristics

Query Performance

Vector Search: Fast retrieval via S3 Vectors + Bedrock KB
LLM Response: 2-4 seconds for a typical query
End-to-End: 2.5-5 seconds from user query to response
Concurrent Users: Scales to 1000+ simultaneous queries

Ingestion Performance

Single Document: 1-2 seconds to upload to S3, 30-60 seconds to index
Batch Processing: 100 documents in ~5 minutes
Large Documents: Up to 50 pages/document supported
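
Because indexing takes noticeably longer than the upload itself, callers usually have to wait for the Knowledge Base ingestion job to finish before a document is searchable. A minimal polling sketch follows, assuming the ingest path starts a job through the bedrock-agent client. The function name, the placeholder IDs, the timings, and the fake client are all hypothetical.

```python
import time

TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(client, kb_id, data_source_id,
                       poll_seconds=5.0, timeout_seconds=300.0, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    waited = 0.0
    while job["status"] not in TERMINAL_STATUSES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"ingestion job {job['ingestionJobId']} still running")
        sleep(poll_seconds)
        waited += poll_seconds
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]


class FakeBedrockAgent:
    """Offline stand-in that walks through a fixed list of statuses."""

    def __init__(self):
        self.statuses = ["IN_PROGRESS", "COMPLETE"]

    def start_ingestion_job(self, knowledgeBaseId, dataSourceId):
        return {"ingestionJob": {"ingestionJobId": "job-1", "status": "STARTING"}}

    def get_ingestion_job(self, knowledgeBaseId, dataSourceId, ingestionJobId):
        return {"ingestionJob": {"ingestionJobId": ingestionJobId,
                                 "status": self.statuses.pop(0)}}


sleeps = []
final_status = wait_for_ingestion(
    FakeBedrockAgent(), "KB_ID_PLACEHOLDER", "DATA_SOURCE_ID_PLACEHOLDER",
    sleep=sleeps.append,  # record the waits instead of actually sleeping
)
print(final_status, sleeps)
```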

Scalability

Documents: Tested with 100K+ documents
Vectors: Supports millions (S3 Vectors scales automatically)
Queries: 10K+ queries/day without degradation
Storage: Effectively unlimited (S3)

Cost Structure

Monthly Operating Cost: Variable based on usage (pay-per-use model)

S3 Vectors: Pay per storage and query (no fixed monthly cost)
Bedrock: ~$18/month (LLM + embeddings at typical usage)
Lambda: ~$8.50/month (compute)
Other AWS: ~$10/month (S3, API Gateway, DynamoDB, etc.)
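
As a quick worked example, the three estimated line items above can be summed to get a baseline, with S3 Vectors added on top as a usage-dependent amount. The function name is hypothetical, and the S3 Vectors figure in the second call is an arbitrary example value, not a measured cost.

```python
# Estimated monthly figures copied from the list above (USD).
ESTIMATED_MONTHLY_USD = {
    "bedrock": 18.00,
    "lambda": 8.50,
    "other_aws": 10.00,
}


def estimate_monthly_cost(s3_vectors_usd: float = 0.0) -> float:
    """Sum the estimated line items plus a usage-based S3 Vectors amount."""
    return round(sum(ESTIMATED_MONTHLY_USD.values()) + s3_vectors_usd, 2)


baseline = estimate_monthly_cost()
with_vectors = estimate_monthly_cost(s3_vectors_usd=2.00)  # arbitrary example value
print(f"${baseline:.2f}")      # prints "$36.50"
print(f"${with_vectors:.2f}")  # prints "$38.50"
```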

See Cost Analysis for detailed breakdown.

Use Cases

Internal Knowledge Base

Enable employees to query company documentation, policies, and procedures using natural language.

Customer Support

Provide support agents with instant access to product documentation and past support cases.

Engineering Documentation

Index code repositories, technical specs, and architecture docs for engineering teams.

Sales Enablement

Give sales teams quick access to product information, pricing, and competitive intelligence.

Getting Started

For New Engineers

Initial Setup - Configure your environment
Local Development - Run the system locally

For Platform Engineers

Bootstrap Guide - Set up AWS infrastructure
Deployment Guide - Deploy via GitHub Actions

Last updated: 2026-01-01