Project Overview
What is the NorthBuilt RAG System?
The NorthBuilt RAG System is a production-ready serverless Retrieval-Augmented Generation (RAG) platform built entirely on AWS infrastructure with S3 Vectors for vector storage. It combines document storage, semantic search, and large language models to provide intelligent, context-aware responses to user queries.
Key AWS Services:
- Amazon Bedrock Knowledge Bases - Managed RAG orchestration
- Amazon S3 Vectors - Purpose-built vector storage
- Amazon Titan Embeddings - Vector embeddings
- Claude on Amazon Bedrock - Response generation
Core Capabilities
Document Intelligence
- Automated Ingestion: Documents uploaded via API are automatically chunked, embedded, and indexed
- Multi-Format Support: Handles text documents, PDFs (via Bedrock parsing), and structured data
- Smart Chunking: Bedrock automatically chunks documents for optimal retrieval
- Metadata Preservation: Maintains document metadata for filtering and organization
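One way metadata can travel with a document is the sidecar-file convention that Bedrock Knowledge Bases use for S3 data sources: a `<file>.metadata.json` object stored next to the document, containing a `metadataAttributes` map. The sketch below assumes that convention applies here. The `client_id` attribute, the bucket layout, and the function names are hypothetical and not taken from this project's code.

```python
import json


def metadata_sidecar(document_key: str, attributes: dict) -> tuple[str, str]:
    """Return the (key, body) of the metadata file that accompanies a document.

    Bedrock Knowledge Bases look for "<document key>.metadata.json" in the
    same S3 prefix as the document itself.
    """
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": attributes}, sort_keys=True)
    return sidecar_key, body


def upload_with_metadata(bucket: str, document_key: str, data: bytes, attributes: dict) -> None:
    """Upload a document and its metadata sidecar (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works without AWS access

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=document_key, Body=data)
    sidecar_key, body = metadata_sidecar(document_key, attributes)
    s3.put_object(Bucket=bucket, Key=sidecar_key, Body=body.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical key and attribute, for illustration only.
    key, body = metadata_sidecar("acme/kickoff-notes.txt", {"client_id": "acme"})
    print(key)
    print(body)
```

After upload, the knowledge base still has to sync the data source before the document becomes searchable, which matches the indexing delay described under Ingestion Performance.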
Semantic Search
- Vector Embeddings: Uses Amazon Titan Embeddings v2 (1024 dimensions) for high-quality representations
- Fast Retrieval: S3 Vectors keeps query latency low at scale
- Relevance Scoring: Returns confidence scores for each retrieved chunk
- Multi-Tenant Support: Isolates documents by client for data segregation; all projects under a client are accessible for richer context
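Client isolation and relevance scoring can be illustrated with the Bedrock Knowledge Base `retrieve` API, which accepts a metadata filter and returns a `score` for each chunk. This is a minimal sketch, assuming documents carry a `client_id` metadata attribute. That attribute name, the helper names, and the default `top_k` are assumptions rather than details from this project.

```python
def client_filter(client_id: str) -> dict:
    """Metadata filter that restricts retrieval to one client's documents."""
    # "client_id" is a hypothetical metadata key.
    return {"equals": {"key": "client_id", "value": client_id}}


def retrieval_config(client_id: str, top_k: int = 5) -> dict:
    """Build the retrievalConfiguration argument for the retrieve call."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": client_filter(client_id),
        }
    }


def top_chunks(response: dict, min_score: float = 0.0) -> list[tuple[float, str]]:
    """Extract (score, text) pairs from a retrieve response, best match first."""
    pairs = []
    for result in response.get("retrievalResults", []):
        score = result.get("score", 0.0)
        if score >= min_score:
            pairs.append((score, result["content"]["text"]))
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def search(kb_id: str, query: str, client_id: str, top_k: int = 5) -> list[tuple[float, str]]:
    """Query the knowledge base for one client (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration=retrieval_config(client_id, top_k),
    )
    return top_chunks(response)
```

Filtering on the client rather than the project reflects the behavior described above: every project under a client stays visible to queries for that client.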
AI-Powered Responses
- Claude Sonnet 4.5: State-of-the-art language model for response generation
- Context-Aware: Grounds responses in retrieved documents to reduce hallucinations
- Source Citations: Every response includes source documents with relevance scores
- Conversational: Maintains chat history for follow-up questions
Enterprise Integrations
- Fathom: Automatically ingests and indexes video transcripts
- HelpScout: Indexes support conversations for better customer service
- Linear: Integrates project management data for team knowledge sharing
- Google OAuth: Secure authentication via Amazon Cognito
Architecture Overview
User Query
↓
API Gateway (+ Cognito Auth)
↓
Chat Lambda
↓
Bedrock Knowledge Base
↓
S3 Vectors ← Titan Embeddings
↓
Claude Sonnet 4.5 (Response Generation)
↓
Response with Sources
Technology Stack
Infrastructure
- Terraform: v1.13+ for infrastructure as code
- AWS Services: Lambda, S3, S3 Vectors, API Gateway, Cognito, Bedrock, Secrets Manager, CloudWatch
- GitHub Actions: CI/CD with OIDC authentication
Backend
- Python 3.13: Lambda runtime
- Boto3: AWS SDK for Python
- Flask: Local development server
Frontend
- React 19: Modern React with TypeScript
- Tailwind CSS 4: Utility-first CSS framework
- CloudFront: Global CDN with custom domain
AI/ML
- Amazon Bedrock: Managed AI service
- Claude Sonnet 4.5: LLM for response generation
- Titan Embeddings v2: 1024-dimensional embeddings
- S3 Vectors: Purpose-built vector storage with native Bedrock integration
Key Features
Serverless Architecture
- Auto-Scaling: Automatically scales to handle traffic spikes
- Pay-Per-Use: Only pay for what you actually use
- High Availability: Built on AWS managed services (99.9%+ uptime)
- Global Performance: CloudFront CDN for <100ms latency worldwide
Cost-Optimized
- Significant Cost Reduction: Migrated from OpenSearch to S3 Vectors (fully AWS-native)
- True Serverless: Scales to zero when not in use
- Efficient Storage: S3 Glacier Instant Retrieval for backups
- Optimized Compute: Right-sized Lambda functions
Developer-Friendly
- Infrastructure as Code: 100% Terraform with modular design
- One-Command Deploy: GitHub Actions CI/CD
- Local Development: Run entire stack locally
- Comprehensive Docs: Extensive documentation for every component
Production-Ready
- Automated Backups: Continuous S3 replication with 14-day retention
- Point-in-Time Recovery: DynamoDB PITR for 35-day recovery window
- Security: IAM least-privilege, encrypted secrets, JWT authentication
- Monitoring: CloudWatch logs, metrics, and alarms
System Components
Data Layer
- S3 Buckets: Document storage with versioning
- S3 Vectors: Serverless vector storage with Bedrock integration
- DynamoDB: Classification data and entity relationships
- Secrets Manager: Encrypted API keys and credentials
For links to the official documentation for each service, see AWS Documentation References.
Compute Layer
- 12 Lambda Functions:
  - chat: User-facing chat interface
  - classify: Document classification engine
  - ingest: Knowledge base ingestion
  - fathom-sync, fathom-sync-worker, fathom-webhook: Fathom integration
  - helpscout-sync, helpscout-sync-worker, helpscout-webhook: HelpScout integration
  - linear-sync, linear-sync-worker, linear-webhook: Linear integration
API Layer
- API Gateway: HTTP API with JWT authorization
- CloudFront: CDN for web UI with custom domain
- Cognito: User authentication with Google OAuth
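From a client's point of view, JWT authorization means sending the Cognito-issued token in the `Authorization` header of each request. The following sketch uses only the Python standard library. The `/chat` path, the `question` field, and the example URL are hypothetical placeholders, not the project's documented API.

```python
import json
import urllib.request


def build_chat_request(api_url: str, id_token: str, question: str) -> urllib.request.Request:
    """Build an authenticated POST request for the chat endpoint."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/chat",  # assumed route
        data=payload,
        headers={
            # The API Gateway JWT authorizer validates this token before
            # the request reaches the Lambda function.
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(api_url: str, id_token: str, question: str) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_chat_request(api_url, id_token, question)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A request with a missing or expired token should be rejected by the authorizer at the gateway, so the Lambda function never runs for unauthenticated callers.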
AI/ML Layer
- Bedrock Knowledge Base: Managed RAG orchestration
- Titan Embeddings v2: Vector embeddings
- Claude Sonnet 4.5: Response generation
- Bedrock Parsing: Document parsing
Performance Characteristics
Query Performance
- Vector Search: Fast retrieval via S3 Vectors + Bedrock KB
- LLM Response: 2-4 seconds for typical query
- End-to-End: 2.5-5 seconds user query to response
- Concurrent Users: Scales to 1000+ simultaneous queries
Ingestion Performance
- Single Document: 1-2 seconds to S3, 30-60 seconds to index
- Batch Processing: 100 documents in ~5 minutes
- Large Documents: Up to 50 pages/document supported
Scalability
- Documents: Tested with 100K+ documents
- Vectors: Supports millions (S3 Vectors scales automatically)
- Queries: 10K+ queries/day without degradation
- Storage: Effectively unlimited (S3)
Cost Structure
Monthly Operating Cost: Variable based on usage (pay-per-use model)
- S3 Vectors: Pay per storage and query (no fixed monthly cost)
- Bedrock: ~$18/month (LLM + embeddings at typical usage)
- Lambda: ~$8.50/month (compute)
- Other AWS: ~$10/month (S3, API Gateway, DynamoDB, etc.)
See Cost Analysis for detailed breakdown.
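As a quick sanity check, the three fixed estimates above can be summed to a baseline, with S3 Vectors added on top as a usage-dependent amount. The figures are the ones listed in this section. The S3 Vectors value in the example is a made-up placeholder, since the document gives no fixed number for it.

```python
from decimal import Decimal

# Approximate monthly figures quoted in the Cost Structure list (USD).
BASELINE = {
    "Bedrock": Decimal("18.00"),
    "Lambda": Decimal("8.50"),
    "Other AWS": Decimal("10.00"),
}


def monthly_estimate(s3_vectors_cost: Decimal = Decimal("0")) -> Decimal:
    """Sum the baseline estimates plus a usage-dependent S3 Vectors cost."""
    return sum(BASELINE.values(), Decimal("0")) + s3_vectors_cost


if __name__ == "__main__":
    print(monthly_estimate())  # baseline before S3 Vectors: 36.50
    # Hypothetical S3 Vectors spend, for illustration only.
    print(monthly_estimate(Decimal("4.25")))
```

The baseline works out to roughly $36.50 per month at the typical usage assumed above, before any S3 Vectors storage and query charges.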
Use Cases
Internal Knowledge Base
Enable employees to query company documentation, policies, and procedures using natural language.
Customer Support
Provide support agents with instant access to product documentation and past support cases.
Engineering Documentation
Index code repositories, technical specs, and architecture docs for engineering teams.
Sales Enablement
Give sales teams quick access to product information, pricing, and competitive intelligence.
Getting Started
For New Engineers
- Initial Setup - Configure your environment
- Local Development - Run the system locally
For Platform Engineers
- Bootstrap Guide - Set up AWS infrastructure
- Deployment Guide - Deploy via GitHub Actions
Last updated: 2026-01-01