System Limitations

Known limitations, constraints, and considerations for the NorthBuilt RAG System.

Overview
Search and Retrieval Limitations
Query Understanding Limitations
Document and Content Limitations
Performance Limitations
Infrastructure Limitations
Security and Compliance Limitations
Cost and Usage Limitations
Integration Limitations
1. Supported Data Sources
2. Webhook Reliability
User Interface Limitations
Known Issues
1. Reranking Requires Marketplace Agreement
2. LLM Parsing Disabled
Comparison with Alternatives
Requesting Changes

Overview

This document describes the known limitations of the NorthBuilt RAG System. Understanding these constraints helps set appropriate expectations and informs architectural decisions for future enhancements.

Search and Retrieval Limitations

Semantic Search Only

Limitation: The system uses semantic (vector) search exclusively. Keyword-based or hybrid search is not supported.

Impact:

Exact phrase matching is not guaranteed
Queries relying on specific terminology may return semantically similar but not exact matches
Acronyms and technical jargon may not match as expected

Workaround:

Include context in queries to improve semantic matching
Register important acronyms as client aliases in the entity registry
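
As a complement to registering aliases, a caller could expand known acronyms in the query text before it is embedded, so the vector search sees both the short and the long form. The following is a minimal sketch, not the system's implementation: the `ALIASES` table and the `expand_aliases` function are hypothetical, and in practice the alias data would come from the entity registry.

```python
import re

# Hypothetical alias table for illustration; real aliases live in the entity registry.
ALIASES = {
    "ACME": "Acme Manufacturing",
    "NB": "NorthBuilt",
}

def expand_aliases(query: str, aliases: dict[str, str]) -> str:
    """Append the full name after each known acronym found in the query."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        full_name = aliases.get(token.upper())
        return f"{token} ({full_name})" if full_name else token

    # Look at every alphabetic word of two or more letters.
    return re.sub(r"\b[A-Za-z]{2,}\b", replace, query)
```

For example, `expand_aliases("status of the ACME rollout", ALIASES)` yields a query that contains both "ACME" and "Acme Manufacturing", which gives the embedding model more context to match on.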

Future Option: Export to OpenSearch Serverless for hybrid search (see RAG Improvements)

Maximum Retrieval Results

Limitation: Maximum 100 results per query (S3 Vectors TopK limit)

Impact: For very broad queries, some relevant documents may not be retrieved

Current Setting: Default is 5 results, configurable up to 20 via API
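
The default and ceiling above can be enforced with a small clamp on the requested result count. This is a sketch under the assumption that the API accepts an optional integer; the function name `resolve_top_k` is hypothetical.

```python
DEFAULT_TOP_K = 5   # default result count stated in this document
MAX_TOP_K = 20      # API ceiling stated in this document

def resolve_top_k(requested=None) -> int:
    """Return the number of results to retrieve, clamped to the allowed range."""
    if requested is None:
        return DEFAULT_TOP_K
    # Never go below 1 or above the configured ceiling.
    return max(1, min(int(requested), MAX_TOP_K))
```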

No Real-Time Document Updates

Limitation: New documents become searchable after the next ingestion job, not immediately

Impact: Documents uploaded via webhook may take up to 5 minutes, plus ingestion processing time, to appear in search results

Technical Details:

Webhooks upload to S3 immediately
Scheduled Lambda triggers ingestion every 5 minutes
Ingestion processes the document and adds vectors to S3 Vectors

Workaround: For time-critical documents, trigger manual ingestion via AWS Console
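
The same manual trigger can likely be scripted instead of using the console. The sketch below assumes the ingestion job is a Bedrock Knowledge Base ingestion job and uses the boto3 `bedrock-agent` client's `start_ingestion_job` call; the function name and the ID values are placeholders, not values from this system.

```python
def trigger_ingestion(client, knowledge_base_id: str, data_source_id: str):
    """Start an ingestion job and return its ID and initial status.

    `client` is expected to be a boto3 "bedrock-agent" client, for example:
        import boto3
        client = boto3.client("bedrock-agent", region_name="us-east-1")
    """
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    return job["ingestionJobId"], job["status"]
```

Passing the client in as a parameter keeps the function testable without AWS credentials.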

Query Understanding Limitations

Client Recognition Only

Limitation: The system only extracts and filters by client, not by project or other entities

Impact:

Cannot filter queries to a specific project within a client
All documents from all projects under a client are included in search results

Rationale: This is by design to provide richer context. See Query Understanding Architecture

Entity Recognition Depends on Registry

Limitation: Query understanding only recognizes clients that exist in the DynamoDB entity registry

Impact:

New clients must be added via the Management UI before queries can filter by them
Misspellings or unknown aliases will not match

Workaround: Ensure clients are created in the Management UI with appropriate aliases

Clarification Prompts

Limitation: When multiple clients match or confidence is low, the system prompts for clarification rather than guessing

Impact: Some queries require a follow-up interaction to specify the intended client

Threshold: Confidence score below 0.8 triggers clarification
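
The clarification rule can be sketched as a small decision function. The 0.8 threshold comes from this document; the `ClientMatch` type, the `decide` function, and the behavior when no client matches are assumptions made for illustration.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # threshold stated in this document

@dataclass
class ClientMatch:
    name: str
    confidence: float

def decide(matches: list[ClientMatch]) -> str:
    """Return 'filter', 'clarify', or 'no_client' for a list of candidate clients."""
    if not matches:
        # Assumption: with no recognized client there is nothing to filter on.
        return "no_client"
    if len(matches) > 1 or matches[0].confidence < CONFIDENCE_THRESHOLD:
        return "clarify"
    return "filter"
```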

Document and Content Limitations

Document Size Limits

Limit	Value	Source
Maximum file size	50 MB	Bedrock KB Prerequisites
Maximum pages (PDF)	50 pages	Bedrock parsing limit
Maximum chunk size	512 tokens	Configured in data source
Maximum metadata size	2 KB filterable, 40 KB total	S3 Vectors Metadata

Supported Document Formats

Format	Support	Notes
Markdown (.md)	Full	Primary format
Plain text (.txt)	Full
PDF	Partial	Bedrock parsing, max 50 pages
Word (.docx)	Not supported	Convert to Markdown first
Excel (.xlsx)	Not supported	Export to CSV/text
HTML	Not supported	Convert to Markdown
Images	Not supported	No OCR or image analysis
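
A pre-upload check can catch oversized or unsupported files before they reach ingestion. This is a minimal sketch: the `check_document` function is hypothetical, "50 MB" is assumed to mean 50 MiB, and the PDF page limit is not checked because that would require a PDF parser.

```python
from pathlib import Path

MAX_FILE_BYTES = 50 * 1024 * 1024          # assumes "50 MB" means 50 MiB
SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf"}  # full or partial support per the table

def check_document(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'none'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    return problems
```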

Content Processing

Limitation: Documents are chunked by token count, not by semantic boundaries

Impact:

Important context may be split across chunks
Conversations in meeting transcripts may be divided mid-discussion

Mitigation: 20% chunk overlap helps preserve context at boundaries
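
To make the effect of the overlap concrete, the sketch below splits a token list into fixed-size chunks where each chunk repeats the tail of the previous one. It illustrates fixed-size chunking in general, not Bedrock's internal chunker; the token list stands in for whatever tokenizer the service actually uses.

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap_ratio: float = 0.2) -> list:
    """Split tokens into chunks of chunk_size that overlap by overlap_ratio."""
    overlap = int(chunk_size * overlap_ratio)  # 102 tokens for a 512-token chunk
    step = chunk_size - overlap                # how far each chunk start advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks
```

With the default settings, a 1,000-token document produces three chunks, and the last 102 tokens of each chunk are also the first 102 tokens of the next. A sentence that straddles a boundary therefore appears whole in at least one chunk, provided it is shorter than the overlap.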

Performance Limitations

Response Latency

Operation	Typical Latency	Notes
Query understanding	200-500ms	Claude Haiku call
Vector search	100-300ms	S3 Vectors query
Reranking	200-400ms	Cohere rerank
Response generation	2-4 seconds	Claude Sonnet 4.5
Total end-to-end	2.5-5.2 seconds	Sum of the stages above with reranking enabled

Impact: Real-time or low-latency applications may find response times too slow

No Streaming: Responses are returned only after full generation completes

Concurrency Limits

Resource	Limit	Notes
Lambda concurrent executions	Account default (1000)	Can be increased
Reserved concurrency (chat)	10	Configurable
API Gateway rate limit	10 req/s, 20 burst	Configurable
Bedrock model throughput	On-demand	AWS manages
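
API Gateway throttling follows a token bucket model, so the "10 req/s, 20 burst" setting means a client can send up to 20 requests at once and is then limited to a steady 10 per second. The class below is an illustrative simulation of that behavior, not API Gateway code; the class and method names are hypothetical.

```python
class TokenBucket:
    """Simulates a rate limit of `rate` requests/second with a burst of `burst`."""

    def __init__(self, rate: float = 10.0, burst: int = 20):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = now - self.last
        # Refill at the steady rate, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In this simulation, 25 simultaneous requests against a full bucket admit 20 and reject 5; one second later, 10 more are admitted.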

Cold Start Latency

Limitation: Lambda cold starts add 1-2 seconds to the first request after an idle period

Impact: The first query after a period of inactivity is slower

Mitigation Options:

Provisioned concurrency (additional cost)
Warm-up pings (not implemented)
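
If warm-up pings were added, the handler would need to recognize them and return early without running the chat pipeline. The sketch below assumes the ping comes from an EventBridge schedule, whose events carry `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; the handler structure and `handle_chat` are hypothetical placeholders.

```python
def is_warmup_event(event: dict) -> bool:
    """Detect an EventBridge scheduled event used as a warm-up ping."""
    return (event.get("source") == "aws.events"
            and event.get("detail-type") == "Scheduled Event")

def handle_chat(event: dict) -> dict:
    # Placeholder for the real chat logic, which this document does not describe.
    return {"statusCode": 200, "body": "chat response"}

def handler(event: dict, context: object) -> dict:
    if is_warmup_event(event):
        # Return immediately so the ping costs almost nothing.
        return {"statusCode": 200, "body": "warm"}
    return handle_chat(event)
```

A single scheduled ping typically keeps only one execution environment warm, so this approach helps low-traffic periods more than traffic spikes.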

Infrastructure Limitations

Single Region Deployment

Limitation: System runs in us-east-1 only

Impact:

Higher latency for users in other regions
No failover if us-east-1 has an outage

Future Option: Multi-region deployment (see RAG Improvements)

No Local Development Environment

Limitation: Full system cannot run locally; requires AWS services

Impact:

Developers need AWS credentials for testing
Integration tests run against production AWS resources
No offline development possible

Workaround: Use mocked responses for unit tests; use AWS for integration tests
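
A unit test with mocked dependencies might look like the sketch below. The `answer_question` function and its `retriever` and `generator` collaborators are hypothetical stand-ins for the real retrieval and generation code; only `unittest.mock` from the standard library is used.

```python
from unittest.mock import MagicMock

def answer_question(question, retriever, generator):
    """Hypothetical chat flow: retrieve passages, then generate an answer from them."""
    passages = retriever.retrieve(question)
    return generator.generate(question, passages)

def test_answer_question_uses_retrieved_passages():
    # Mocks replace the AWS-backed components, so no credentials are needed.
    retriever = MagicMock()
    retriever.retrieve.return_value = ["Kickoff is on Monday."]
    generator = MagicMock()
    generator.generate.return_value = "The kickoff is on Monday."

    result = answer_question("When is kickoff?", retriever, generator)

    assert result == "The kickoff is on Monday."
    generator.generate.assert_called_once_with(
        "When is kickoff?", ["Kickoff is on Monday."]
    )
```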

Immutable Configuration

These settings cannot be changed after creation. See Bedrock KB documentation and S3 Vectors limitations.

Configuration	Scope	To Change
Vector dimensions	Knowledge Base	Recreate KB
Embedding model	Knowledge Base	Recreate KB
Chunking strategy	Data Source	Recreate data source
Distance metric	S3 Vectors Index	Recreate index
Non-filterable keys	S3 Vectors Index	Recreate index

Security and Compliance Limitations

Authentication

Limitation: Only Google OAuth is supported for user authentication

Impact: Organizations using other identity providers cannot integrate directly

Workaround: Configure additional identity providers in Cognito (manual setup)

Data Residency

Limitation: All data stored in US (us-east-1)

Impact: May not meet data residency requirements for EU (GDPR) or other jurisdictions

Future Option: Deploy in EU region (eu-west-1)

Audit Logging

Limitation: Vector operation logging (QueryVectors, PutVectors) not enabled by default

Impact: Cannot audit who queried what content

Enable: Configure CloudTrail data events for S3 Vectors (see RAG Improvements)

Cost and Usage Limitations

Pay-Per-Use Model

Limitation: No fixed monthly cost; costs scale with usage

Impact: Unpredictable costs for variable workloads

Mitigation: Set CloudWatch billing alarms (see Cost Analysis)
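
A billing alarm of this kind is usually defined on the `EstimatedCharges` metric in the `AWS/Billing` namespace, which is published in us-east-1 once billing alerts are enabled for the account. The sketch below only builds the parameters for CloudWatch's `put_metric_alarm`; the function name, alarm name, and SNS topic ARN are placeholders.

```python
def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm parameters for an estimated-charges alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{int(threshold_usd)}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours; billing metrics update only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   cloudwatch.put_metric_alarm(**billing_alarm_params(100, "<your SNS topic ARN>"))
```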

Token Limits

Model	Max Input	Max Output	Source
Claude Sonnet 4.5	200K tokens	8K tokens	Claude on Bedrock
Claude Haiku	200K tokens	4K tokens	Claude on Bedrock
Titan Embeddings v2	8K tokens	N/A	Titan Embeddings

Impact: Very long documents or conversations may be truncated

Rate Limits

Service	Limit	Source
S3 Vectors writes	1,000 requests/second	S3 Vectors Quotas
S3 Vectors reads	Hundreds/second	S3 Vectors Quotas
Bedrock inference	Account-specific	Bedrock Quotas

Integration Limitations

Supported Data Sources

Source	Integration	Notes
Fathom	Webhook + Sync	Video transcripts
HelpScout	Webhook + Sync	Support conversations
Manual upload	S3 only	No API endpoint for direct upload

Webhook Reliability

Limitation: Webhook delivery is at-most-once; the scheduled sync picks up any webhooks that were missed

Impact: If a webhook fails, the document is delayed until the next scheduled sync

Sync Frequency: Every 5 minutes (configurable)
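
Conceptually, the scheduled sync is a reconciliation step: it compares what the source system has against what is already stored and fetches the difference. The function below is a sketch of that idea only; the name `find_missing` and the ID formats are hypothetical, and the real sync may track state differently.

```python
def find_missing(source_ids, stored_ids) -> list:
    """Return IDs present in the source system but not yet stored, in sorted order."""
    return sorted(set(source_ids) - set(stored_ids))
```

For example, if the source reports conversations `c1`, `c2`, and `c3` and only `c1` and `c3` arrived by webhook, `find_missing` returns `["c2"]`, which the sync would then upload.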

User Interface Limitations

Web UI Only

Limitation: Only a web browser interface is provided

Impact: No mobile app, desktop app, or CLI tool for end users

No Conversation Export

Limitation: Chat history cannot be exported

Impact: Users cannot save or share conversations

No Feedback Loop

Limitation: No mechanism for users to rate response quality

Impact: Cannot systematically improve responses based on user feedback

Known Issues

Reranking Requires Marketplace Agreement

Issue: Cohere Rerank 3.5 requires accepting the AWS Marketplace agreement before use

Current Solution: Reranking is disabled by default via ENABLE_RERANKING=false

To Enable:

Go to AWS Bedrock console
Navigate to Model access
Accept the Cohere Rerank 3.5 marketplace agreement
Set ENABLE_RERANKING=true in Lambda environment variables

Status: Optional feature. The system works without reranking; enabling it improves relevance.

Reference: Bedrock Reranking Supported Models and Regions
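
Inside the Lambda function, the flag would typically be read from the environment with a safe default. This is a sketch of one way to do that; the function name is hypothetical, and the accepted values beyond `true` and `false` are an assumption.

```python
import os

def reranking_enabled(env=None) -> bool:
    """Return True only when ENABLE_RERANKING is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    # Default to disabled, matching the behavior described in this document.
    return env.get("ENABLE_RERANKING", "false").strip().lower() == "true"
```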

LLM Parsing Disabled

Issue: Metadata produced by Bedrock LLM parsing exceeds the S3 Vectors 2 KB filterable metadata limit

Current Solution: LLM parsing disabled; metadata provided via sidecar files

Status: 100% ingestion success rate with current configuration

Reference: S3 Vectors Metadata Filtering - 2 KB filterable, 40 KB total limits
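
Bedrock Knowledge Bases read sidecar metadata from a file named after the source document with a `.metadata.json` suffix, containing a `metadataAttributes` object. The sketch below builds such a file body and rejects metadata over 2 KB. The function name is hypothetical, and measuring the limit against the UTF-8 size of the JSON-encoded attributes is an assumption about how the service counts bytes.

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024  # 2 KB filterable metadata limit from this document

def build_sidecar(attributes: dict) -> str:
    """Return the JSON body for a <document>.metadata.json sidecar file."""
    encoded = json.dumps(attributes, ensure_ascii=False).encode("utf-8")
    if len(encoded) > FILTERABLE_LIMIT_BYTES:
        raise ValueError(
            f"metadata is {len(encoded)} bytes; limit is {FILTERABLE_LIMIT_BYTES}"
        )
    return json.dumps({"metadataAttributes": attributes}, ensure_ascii=False)
```

For a document `kickoff.md`, the returned string would be written to `kickoff.md.metadata.json` in the same S3 location.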

Comparison with Alternatives

Feature	NorthBuilt RAG	OpenSearch	Pinecone
Hybrid search	No	Yes	Yes
Multi-region	No	Yes	Yes
Real-time indexing	No (5 min)	Near real-time	Near real-time
Managed service	Fully	Serverless option	Fully
Cost model	Pay-per-use	OCU-based	Pod-based

Requesting Changes

For feature requests or to address limitations:

Check if the limitation is documented in RAG Improvements
Create a GitHub issue with the feature request
Include use case and business justification

Last updated: 2026-01-01
