Project Overview

What is the NorthBuilt RAG System?

The NorthBuilt RAG System is a production-ready serverless Retrieval-Augmented Generation (RAG) platform built entirely on AWS infrastructure with S3 Vectors for vector storage. It combines document storage, semantic search, and large language models to provide intelligent, context-aware responses to user queries.

Core Capabilities

Document Intelligence

  • Automated Ingestion: Documents uploaded via API are automatically chunked, embedded, and indexed
  • Multi-Format Support: Handles text documents, PDFs (via Bedrock parsing), and structured data
  • Smart Chunking: Bedrock automatically chunks documents for optimal retrieval
  • Metadata Preservation: Maintains document metadata for filtering and organization
  • Vector Embeddings: Uses Amazon Titan Text Embeddings V2 (1024 dimensions) for high-quality representations
  • Fast Retrieval: S3 Vectors delivers fast query latency at scale
  • Relevance Scoring: Returns confidence scores for each retrieved chunk
  • Multi-Tenant Support: Isolates documents by client for data segregation; all projects under a client are accessible for richer context
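
As a rough illustration of the client-level isolation and relevance scoring described above, the sketch below builds a metadata filter for a Bedrock Knowledge Base query and sorts the returned chunks by score. The metadata key `client_id` and the helper names are assumptions made for this example, not names taken from the NorthBuilt codebase.

```python
from typing import Any


def build_retrieval_config(client_id: str, top_k: int = 5) -> dict[str, Any]:
    """Build a vector search configuration restricted to one client's documents."""
    if not client_id:
        raise ValueError("client_id is required for tenant isolation")
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            # Assumed metadata key: only chunks tagged with this client match.
            "filter": {"equals": {"key": "client_id", "value": client_id}},
        }
    }


def top_chunks(
    results: list[dict[str, Any]], min_score: float = 0.0
) -> list[tuple[str, float]]:
    """Return (text, score) pairs at or above min_score, highest score first."""
    pairs = [
        (r["content"]["text"], r["score"])
        for r in results
        if r["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The dictionary has the shape accepted by the `retrievalConfiguration` parameter of the `retrieve` call on the boto3 `bedrock-agent-runtime` client, and `top_chunks` expects items shaped like that call's `retrievalResults`. Because the filter key is not scoped to a project, every project under the same client would remain searchable.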

AI-Powered Responses

  • Claude Sonnet 4.5: State-of-the-art language model for response generation
  • Context-Aware: Grounds responses in retrieved documents to reduce hallucinations
  • Source Citations: Every response includes source documents with relevance scores
  • Conversational: Maintains chat history for follow-up questions
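
One way grounding and source citations might fit together is sketched below: retrieved chunks are numbered in the prompt so the model can cite them, and the same numbering is returned to the caller with relevance scores. The `Chunk` fields and the prompt wording are hypothetical; the actual prompt used by the chat function is not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str   # hypothetical field, e.g. a document title or S3 key
    score: float  # relevance score returned by retrieval


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each chunk so the model can cite it as [1], [2], and so on."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def format_sources(chunks: list[Chunk]) -> list[dict]:
    """Build the source list returned with the answer, with rounded scores."""
    return [
        {"index": i, "source": c.source, "score": round(c.score, 3)}
        for i, c in enumerate(chunks, start=1)
    ]
```

Keeping the citation index identical in the prompt and in the returned source list lets a UI resolve a `[2]` in the answer to the matching document.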

Enterprise Integrations

  • Fathom: Automatically ingests and indexes video transcripts
  • HelpScout: Indexes support conversations for better customer service
  • Linear: Integrates project management data for team knowledge sharing
  • Google OAuth: Secure authentication via AWS Cognito

Architecture Overview

User Query
    ↓
API Gateway (+ Cognito Auth)
    ↓
Chat Lambda
    ↓
Bedrock Knowledge Base
    ↓
S3 Vectors ← Titan Embeddings
    ↓
Claude Sonnet 4.5 (Response Generation)
    ↓
Response with Sources
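
The flow above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the actual Chat Lambda: `retrieve` and `generate` are hypothetical callables standing in for the Bedrock Knowledge Base lookup and the Claude call, injected so the flow can be exercised without AWS access.

```python
from typing import Callable

# Hypothetical interfaces: a retriever maps a query to chunk dicts, and a
# generator maps (query, chunks) to an answer string.
Retriever = Callable[[str], list[dict]]
Generator = Callable[[str, list[dict]], str]


def handle_chat(query: str, retrieve: Retriever, generate: Generator) -> dict:
    """Run retrieval, then generation, and return the answer with its sources."""
    query = query.strip()
    if not query:
        return {"statusCode": 400, "error": "query must not be empty"}
    chunks = retrieve(query)          # Bedrock Knowledge Base / S3 Vectors step
    answer = generate(query, chunks)  # response generation step
    sources = [{"source": c["source"], "score": c["score"]} for c in chunks]
    return {"statusCode": 200, "answer": answer, "sources": sources}


def fake_retrieve(query: str) -> list[dict]:
    """Stub retriever returning one canned chunk."""
    return [{"source": "handbook.md", "score": 0.87, "text": "PTO is 20 days."}]


def fake_generate(query: str, chunks: list[dict]) -> str:
    """Stub generator that echoes the first chunk."""
    return f"Based on {len(chunks)} source(s): {chunks[0]['text']}"


if __name__ == "__main__":
    print(handle_chat("How much PTO do we get?", fake_retrieve, fake_generate))
```

Injecting the two steps keeps the handler testable locally; in a deployed function they would wrap the corresponding Bedrock calls.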

Technology Stack

Infrastructure

  • Terraform: v1.13+ for infrastructure as code
  • AWS Services: Lambda, S3, S3 Vectors, DynamoDB, API Gateway, CloudFront, Cognito, Bedrock, Secrets Manager, CloudWatch
  • GitHub Actions: CI/CD with OIDC authentication

Backend

  • Python 3.13: Lambda runtime
  • Boto3: AWS SDK for Python
  • Flask: Local development server

Frontend

  • React 19: Modern React with TypeScript
  • Tailwind CSS 4: Utility-first CSS framework
  • CloudFront: Global CDN with custom domain

AI/ML

Key Features

Serverless Architecture

  • Auto-Scaling: Automatically scales to handle traffic spikes
  • Pay-Per-Use: Only pay for what you actually use
  • High Availability: Built on AWS managed services (99.9%+ uptime)
  • Global Performance: CloudFront CDN for <100ms latency worldwide

Cost-Optimized

  • Significant Cost Reduction: Migrated from OpenSearch to S3 Vectors (fully AWS-native), which bills per storage and query with no fixed monthly cost
  • True Serverless: Scales to zero when not in use
  • Efficient Storage: S3 Glacier Instant Retrieval for backups
  • Optimized Compute: Right-sized Lambda functions

Developer-Friendly

  • Infrastructure as Code: 100% Terraform with modular design
  • One-Command Deploy: GitHub Actions CI/CD
  • Local Development: Run entire stack locally
  • Comprehensive Docs: Extensive documentation for every component

Production-Ready

  • Automated Backups: Continuous S3 replication with 14-day retention
  • Point-in-Time Recovery: DynamoDB PITR for 35-day recovery window
  • Security: IAM least-privilege, encrypted secrets, JWT authentication
  • Monitoring: CloudWatch logs, metrics, and alarms

System Components

Data Layer

  • S3 Buckets: Document storage with versioning
  • S3 Vectors: Serverless vector storage with Bedrock integration
  • DynamoDB: Classification data and entity relationships
  • Secrets Manager: Encrypted API keys and credentials

For complete AWS documentation references, see AWS Documentation References.

Compute Layer

  • 12 Lambda Functions:
    • chat: User-facing chat interface
    • classify: Document classification engine
    • ingest: Knowledge base ingestion
    • fathom-sync, fathom-sync-worker, fathom-webhook: Fathom integration
    • helpscout-sync, helpscout-sync-worker, helpscout-webhook: HelpScout integration
    • linear-sync, linear-sync-worker, linear-webhook: Linear integration

API Layer

  • API Gateway: HTTP API with JWT authorization
  • CloudFront: CDN for web UI with custom domain
  • Cognito: User authentication with Google OAuth
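
When an API Gateway HTTP API validates a token with a JWT authorizer, the decoded claims reach the Lambda function under `requestContext.authorizer.jwt.claims` in the event. The sketch below shows how a handler might read them. The handler body and response shape are illustrative assumptions, and the `email` claim is only present if the token carries it.

```python
import json
from typing import Any


def get_claims(event: dict[str, Any]) -> dict[str, Any]:
    """Return the JWT claims API Gateway attached to the request, or {}."""
    return (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Illustrative handler: reject requests that carry no subject claim."""
    claims = get_claims(event)
    user_id = claims.get("sub")
    if not user_id:
        return {"statusCode": 401, "body": json.dumps({"error": "unauthorized"})}
    body = {"user": user_id, "email": claims.get("email")}
    return {"statusCode": 200, "body": json.dumps(body)}
```

API Gateway rejects invalid tokens before the function runs, so the check in the handler is a defensive fallback rather than the primary authorization step.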

AI/ML Layer

Performance Characteristics

Query Performance

  • Vector Search: Fast retrieval via S3 Vectors + Bedrock KB
  • LLM Response: 2-4 seconds for a typical query
  • End-to-End: 2.5-5 seconds from user query to response
  • Concurrent Users: Scales to 1000+ simultaneous queries

Ingestion Performance

  • Single Document: 1-2 seconds to upload to S3, 30-60 seconds to index
  • Batch Processing: 100 documents in ~5 minutes
  • Large Documents: Supports up to 50 pages per document

Scalability

  • Documents: Tested with 100K+ documents
  • Vectors: Supports millions (S3 Vectors scales automatically)
  • Queries: 10K+ queries/day without degradation
  • Storage: Effectively unlimited (S3)

Cost Structure

Monthly Operating Cost: Variable based on usage (pay-per-use model)

  • S3 Vectors: Pay per storage and query (no fixed monthly cost)
  • Bedrock: ~$18/month (LLM + embeddings at typical usage)
  • Lambda: ~$8.50/month (compute)
  • Other AWS: ~$10/month (S3, API Gateway, DynamoDB, etc.)
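
To make the arithmetic explicit, the listed estimates can be summed as in the sketch below. The three fixed figures come from the list above; the S3 Vectors amount is usage-dependent, so it is a parameter here, and any value passed in is a placeholder rather than a quoted price.

```python
# Approximate monthly estimates from the list above, in USD.
FIXED_ESTIMATES_USD = {
    "bedrock": 18.00,    # LLM + embeddings at typical usage
    "lambda": 8.50,      # compute
    "other_aws": 10.00,  # S3, API Gateway, DynamoDB, etc.
}


def monthly_estimate(s3_vectors_usd: float) -> float:
    """Sum the listed estimates plus a caller-supplied S3 Vectors charge."""
    if s3_vectors_usd < 0:
        raise ValueError("s3_vectors_usd must be non-negative")
    return round(sum(FIXED_ESTIMATES_USD.values()) + s3_vectors_usd, 2)


if __name__ == "__main__":
    # Baseline before any S3 Vectors storage or query charges.
    print(monthly_estimate(0.0))
```

With no S3 Vectors charges the listed items come to about $36.50 per month; actual totals vary with usage.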

See Cost Analysis for a detailed breakdown.

Use Cases

Internal Knowledge Base

Enable employees to query company documentation, policies, and procedures using natural language.

Customer Support

Provide support agents with instant access to product documentation and past support cases.

Engineering Documentation

Index code repositories, technical specs, and architecture docs for engineering teams.

Sales Enablement

Give sales teams quick access to product information, pricing, and competitive intelligence.

Getting Started

For New Engineers

  1. Initial Setup - Configure your environment
  2. Local Development - Run the system locally

For Platform Engineers

  1. Bootstrap Guide - Set up AWS infrastructure
  2. Deployment Guide - Deploy via GitHub Actions

Last updated: 2026-01-01