System Limitations
Known limitations, constraints, and considerations for the NorthBuilt RAG System.
Table of Contents
- Overview
- Search and Retrieval Limitations
- Query Understanding Limitations
- Document and Content Limitations
- Performance Limitations
- Infrastructure Limitations
- Security and Compliance Limitations
- Cost and Usage Limitations
- Integration Limitations
- User Interface Limitations
- Known Issues
- Comparison with Alternatives
- Requesting Changes
Overview
This document describes the known limitations of the NorthBuilt RAG System. Understanding these constraints helps set appropriate expectations and informs architectural decisions for future enhancements.
Search and Retrieval Limitations
Semantic Search Only
Limitation: The system uses semantic (vector) search exclusively. Keyword-based or hybrid search is not supported.
Impact:
- Exact phrase matching is not guaranteed
- Queries relying on specific terminology may return semantically similar but not exact matches
- Acronyms and technical jargon may not match as expected
Workaround:
- Include context in queries to improve semantic matching
- Register important acronyms as client aliases in the entity registry
Future Option: Export to OpenSearch Serverless for hybrid search (see RAG Improvements)
Maximum Retrieval Results
Limitation: Maximum 100 results per query (S3 Vectors TopK limit)
Impact: For very broad queries, some relevant documents may not be retrieved
Current Setting: Default is 5 results, configurable up to 20 via API
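The relationship between the default, the API ceiling, and the S3 Vectors limit can be sketched as below. This is a minimal illustration, not the system's actual code: the function name `resolve_result_count` is hypothetical, and clamping an oversized request (rather than rejecting it) is an assumption.

```python
DEFAULT_RESULTS = 5          # documented default
API_MAX_RESULTS = 20         # documented API ceiling
S3_VECTORS_TOP_K_MAX = 100   # documented S3 Vectors TopK limit


def resolve_result_count(requested=None):
    """Return how many results to retrieve for a query.

    Assumption: oversized requests are clamped to the API ceiling
    instead of being rejected.
    """
    if requested is None:
        return DEFAULT_RESULTS
    if requested < 1:
        raise ValueError("requested result count must be at least 1")
    return min(requested, API_MAX_RESULTS, S3_VECTORS_TOP_K_MAX)
```

Under these assumptions a request for 50 results resolves to 20, well below the underlying TopK limit of 100.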
No Real-Time Document Updates
Limitation: New documents become searchable only after the next ingestion job completes, not immediately on upload
Impact: Documents uploaded via webhook may wait up to 5 minutes for the next scheduled ingestion run, plus processing time, before they appear in search results
Technical Details:
- Webhooks upload to S3 immediately
- Scheduled Lambda triggers ingestion every 5 minutes
- Ingestion processes the document and adds vectors to S3 Vectors
Workaround: For time-critical documents, trigger manual ingestion via AWS Console
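To estimate when an uploaded document is picked up, the 5-minute schedule can be modelled as follows. This is a rough sketch: the function name is hypothetical, and it assumes runs are aligned to 5-minute clock boundaries, which the actual schedule may not be. Ingestion processing time comes on top of the returned start time.

```python
from datetime import datetime, timedelta, timezone

INGESTION_INTERVAL = timedelta(minutes=5)  # documented schedule
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_ingestion_run(uploaded_at):
    """Return the first scheduled run at or after uploaded_at.

    Assumption: runs fire on 5-minute clock boundaries (12:00, 12:05, ...).
    """
    elapsed = uploaded_at - EPOCH
    # Ceiling division: number of whole intervals needed to reach uploaded_at.
    intervals = -(-elapsed // INGESTION_INTERVAL)
    return EPOCH + intervals * INGESTION_INTERVAL
```

For example, an upload at 12:01:30 UTC would wait 3.5 minutes for the 12:05:00 run under this model.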
Query Understanding Limitations
Client Recognition Only
Limitation: The system only extracts and filters by client, not by project or other entities
Impact:
- Cannot filter queries to a specific project within a client
- All documents from all projects under a client are included in search results
Rationale: This is by design to provide richer context. See Query Understanding Architecture
Entity Recognition Depends on Registry
Limitation: Query understanding only recognizes clients that exist in the DynamoDB entity registry
Impact:
- New clients must be synced from Linear before queries can filter by them
- Misspellings or unknown aliases will not match
Workaround: Ensure Linear is properly configured and syncing team data
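The registry dependency can be illustrated with a small lookup sketch. Everything here is hypothetical: the in-memory `REGISTRY` dictionary stands in for the DynamoDB entity registry, the client names are invented, and exact whole-word matching is an assumption about how aliases are compared.

```python
import re

# Hypothetical stand-in for the DynamoDB entity registry: client -> aliases.
REGISTRY = {
    "Acme Corporation": ["acme", "acme corp"],
    "Globex": ["globex", "glx"],
}


def build_alias_index(registry):
    """Map every case-folded client name and alias to its canonical client."""
    index = {}
    for client, aliases in registry.items():
        for name in [client, *aliases]:
            index[name.casefold()] = client
    return index


def match_clients(query, alias_index):
    """Return clients whose name or alias appears as whole words in the query."""
    text = query.casefold()
    found = set()
    for alias, client in alias_index.items():
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            found.add(client)
    return sorted(found)
```

With this sketch, a query mentioning "ACME Corp" resolves to the registered client, while the misspelling "Acmee" matches nothing, which mirrors the impact described above.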
Clarification Prompts
Limitation: When multiple clients match or confidence is low, the system prompts for clarification rather than guessing
Impact: Some queries require a follow-up interaction to specify the intended client
Threshold: Confidence score below 0.8 triggers clarification
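The clarification rule can be sketched as a small decision function. The names `ClientMatch` and `decide` are hypothetical, and the handling of queries with no client match is an assumption, since this document does not specify it.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # documented threshold


@dataclass
class ClientMatch:
    client: str
    confidence: float


def decide(matches):
    """Return (action, clients) for a list of candidate client matches.

    - exactly one match at or above the threshold -> filter by that client
    - several matches, or a low-confidence match  -> ask for clarification
    - no matches -> "no_match" (assumption: behavior not documented here)
    """
    if not matches:
        return ("no_match", [])
    if len(matches) == 1 and matches[0].confidence >= CONFIDENCE_THRESHOLD:
        return ("filter", [matches[0].client])
    return ("clarify", [m.client for m in matches])
```

A single match with confidence 0.75 therefore yields a clarification prompt, as do two matches even when both score above 0.8.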
Document and Content Limitations
Document Size Limits
| Limit | Value | Source |
|---|---|---|
| Maximum file size | 50 MB | Bedrock KB Prerequisites |
| Maximum pages (PDF) | 50 pages | Bedrock parsing limit |
| Maximum chunk size | 512 tokens | Configured in data source |
| Maximum metadata size | 2 KB filterable, 40 KB total | S3 Vectors Metadata |
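A pre-upload check against the metadata limits might look like the sketch below. The function names are hypothetical, and measuring size as compact UTF-8 JSON is an approximation: how S3 Vectors actually counts metadata bytes is not stated in this document.

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024   # documented: 2 KB filterable
TOTAL_LIMIT_BYTES = 40 * 1024       # documented: 40 KB total


def metadata_size(metadata):
    """Approximate metadata size as compact JSON encoded in UTF-8 (assumption)."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def check_metadata(metadata, non_filterable_keys):
    """Return a list of limit violations; an empty list means the metadata fits."""
    filterable = {k: v for k, v in metadata.items() if k not in non_filterable_keys}
    problems = []
    if metadata_size(filterable) > FILTERABLE_LIMIT_BYTES:
        problems.append("filterable metadata exceeds 2 KB")
    if metadata_size(metadata) > TOTAL_LIMIT_BYTES:
        problems.append("total metadata exceeds 40 KB")
    return problems
```

A long field such as a summary can push the filterable portion over 2 KB; declaring that key as non-filterable moves it under the larger total limit instead.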
Supported Document Formats
| Format | Support | Notes |
|---|---|---|
| Markdown (.md) | Full | Primary format |
| Plain text (.txt) | Full | |
| PDF (.pdf) | Partial | Bedrock parsing, max 50 pages |
| Word (.docx) | Not supported | Convert to Markdown first |
| Excel (.xlsx) | Not supported | Export to CSV/text |
| HTML | Not supported | Convert to Markdown |
| Images | Not supported | No OCR or image analysis |
Content Processing
Limitation: Documents are chunked by token count, not by semantic boundaries
Impact:
- Important context may be split across chunks
- Conversations in meeting transcripts may be divided mid-discussion
Mitigation: 20% chunk overlap helps preserve context at boundaries
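Fixed-size chunking with overlap can be sketched as a sliding window over a token list. This is an illustration of the general technique using the documented values (512 tokens, 20% overlap); the function name is hypothetical, and how Bedrock tokenizes and applies overlap internally may differ.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.2):
    """Split a token list into fixed-size chunks that share an overlap.

    With the defaults, each chunk repeats the last 102 tokens of the
    previous chunk, so the window advances 410 tokens at a time.
    """
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the document
    return chunks
```

Because the window ignores sentence and speaker boundaries, a discussion that spans more than the overlap can still be split across chunks, which is the limitation described above.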
Performance Limitations
Response Latency
| Operation | Typical Latency | Notes |
|---|---|---|
| Query understanding | 200-500ms | Claude Haiku call |
| Vector search | 100-300ms | S3 Vectors query |
| Reranking | 200-400ms | Cohere rerank |
| Response generation | 2-4 seconds | Claude Sonnet 4.5 |
| Total end-to-end | 2.5-5 seconds | |
Impact: Real-time or low-latency applications may find response times too slow
No Streaming: Responses are returned only after full generation completes
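The end-to-end figure in the latency table follows from summing the per-stage ranges, assuming the stages run sequentially (an assumption; this document does not state whether any stages overlap). A minimal sketch with hypothetical names:

```python
# Typical per-stage latency ranges in milliseconds, taken from the table above.
STAGES_MS = {
    "query_understanding": (200, 500),
    "vector_search": (100, 300),
    "reranking": (200, 400),
    "response_generation": (2000, 4000),
}


def end_to_end_range_ms(stages):
    """Sum the lower and upper bounds, assuming stages run one after another."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


print(end_to_end_range_ms(STAGES_MS))  # (2500, 5200)
```

The sum is 2.5 to 5.2 seconds, which the table rounds to 2.5-5 seconds. Response generation accounts for most of the total.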
Concurrency Limits
| Resource | Limit | Notes |
|---|---|---|
| Lambda concurrent executions | Account default (1000) | Can be increased |
| Reserved concurrency (chat) | 10 | Configurable |
| API Gateway rate limit | 10 req/s, 20 burst | Configurable |
| Bedrock model throughput | On-demand | AWS manages |
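The "10 req/s, 20 burst" setting behaves like a token bucket: the bucket holds up to 20 tokens and refills at 10 per second. The class below is a simplified model for reasoning about client behavior, not API Gateway's implementation; the class name and the explicit `now` parameter are illustrative choices.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` tokens/s up to `burst` tokens."""

    def __init__(self, rate=10.0, burst=20):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In this model, 25 simultaneous requests against a full bucket admit 20 and throttle 5; one second later, 10 more requests can be admitted.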
Cold Start Latency
Limitation: Lambda cold starts add 1-2 seconds to the first request after an idle period
Impact: The first query after a period of inactivity is slower
Mitigation Options:
- Provisioned concurrency (additional cost)
- Warm-up pings (not implemented)
Infrastructure Limitations
Single Region Deployment
Limitation: System runs in us-east-1 only
Impact:
- Higher latency for users in other regions
- No failover if us-east-1 has an outage
Future Option: Multi-region deployment (see RAG Improvements)
No Local Development Environment
Limitation: Full system cannot run locally; requires AWS services
Impact:
- Developers need AWS credentials for testing
- Integration tests run against production AWS resources
- No offline development possible
Workaround: Use mocked responses for unit tests; use AWS for integration tests
Immutable Configuration
These settings cannot be changed after creation. See Bedrock KB documentation and S3 Vectors limitations.
| Configuration | Scope | To Change |
|---|---|---|
| Vector dimensions | Knowledge Base | Recreate KB |
| Embedding model | Knowledge Base | Recreate KB |
| Chunking strategy | Data Source | Recreate data source |
| Distance metric | S3 Vectors Index | Recreate index |
| Non-filterable keys | S3 Vectors Index | Recreate index |
Security and Compliance Limitations
Authentication
Limitation: Only Google OAuth is supported for user authentication
Impact: Organizations using other identity providers cannot integrate directly
Workaround: Configure additional identity providers in Cognito (manual setup)
Data Residency
Limitation: All data stored in US (us-east-1)
Impact: May not meet data residency requirements for EU (GDPR) or other jurisdictions
Future Option: Deploy in EU region (eu-west-1)
Audit Logging
Limitation: Vector operation logging (QueryVectors, PutVectors) not enabled by default
Impact: Cannot audit who queried what content
Enable: Configure CloudTrail data events for S3 Vectors (see RAG Improvements)
Cost and Usage Limitations
Pay-Per-Use Model
Limitation: No fixed monthly cost; costs scale with usage
Impact: Unpredictable costs for variable workloads
Mitigation: Set CloudWatch billing alarms (see Cost Analysis)
Token Limits
| Model | Max Input | Max Output | Source |
|---|---|---|---|
| Claude Sonnet 4.5 | 200K tokens | 8K tokens | Claude on Bedrock |
| Claude Haiku | 200K tokens | 4K tokens | Claude on Bedrock |
| Titan Embeddings v2 | 8K tokens | N/A | Titan Embeddings |
Impact: Very long documents or conversations may be truncated
Rate Limits
| Service | Limit | Source |
|---|---|---|
| S3 Vectors writes | 1,000 requests/second | S3 Vectors Quotas |
| S3 Vectors reads | Hundreds/second | S3 Vectors Quotas |
| Bedrock inference | Account-specific | Bedrock Quotas |
Integration Limitations
Supported Data Sources
| Source | Integration | Notes |
|---|---|---|
| Fathom | Webhook + Sync | Video transcripts |
| HelpScout | Webhook + Sync | Support conversations |
| Linear | Webhook + Sync | Teams/Projects (entity registry only) |
| Manual upload | S3 only | No API endpoint for direct upload |
Webhook Reliability
Limitation: Webhook delivery is at-most-once; missed webhooks are recovered by the scheduled sync
Impact: If a webhook fails, the document is delayed until the next scheduled sync picks it up
Sync Frequency: Every 5 minutes (configurable)
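The way a scheduled sync can recover missed webhooks is sketched below. This is a hypothetical reconciliation step, not the system's actual sync logic: it assumes each source document has a stable ID and a comparable last-updated value, and that the sync can list both sides.

```python
def documents_to_sync(source_docs, stored_docs):
    """Return IDs of documents that are missing or stale in storage.

    Both arguments map document ID -> last-updated value (for example a
    Unix timestamp). A document needs syncing if it was never stored or
    if the source copy is newer than the stored copy.
    """
    return sorted(
        doc_id
        for doc_id, updated in source_docs.items()
        if doc_id not in stored_docs or stored_docs[doc_id] < updated
    )
```

Because the comparison is by ID, a document that arrived through both the webhook and the sync is written at most once per version under this scheme.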
User Interface Limitations
Web UI Only
Limitation: Only web browser interface is provided
Impact: No mobile app, desktop app, or CLI tool for end users
No Conversation Export
Limitation: Chat history cannot be exported
Impact: Users cannot save or share conversations
No Feedback Loop
Limitation: No mechanism for users to rate response quality
Impact: Cannot systematically improve responses based on user feedback
Known Issues
Reranking Availability
Issue: Amazon Rerank 1.0 is not available in us-east-1
Current Solution: Using Cohere Rerank 3.5 instead
Status: Working as expected with Cohere
Reference: Bedrock Reranking Supported Models and Regions
LLM Parsing Disabled
Issue: Bedrock LLM parsing produces metadata that exceeds the S3 Vectors 2 KB filterable metadata limit
Current Solution: LLM parsing disabled; metadata provided via sidecar files
Status: 100% ingestion success rate with current configuration
Reference: S3 Vectors Metadata Filtering - limits are 2 KB filterable, 40 KB total
Comparison with Alternatives
| Feature | NorthBuilt RAG | OpenSearch | Pinecone |
|---|---|---|---|
| Hybrid search | No | Yes | Yes |
| Multi-region | No | Yes | Yes |
| Real-time indexing | No (5 min) | Near real-time | Near real-time |
| Managed service | Fully | Serverless option | Fully |
| Cost model | Pay-per-use | OCU-based | Pod-based |
Requesting Changes
For feature requests or to address limitations:
- Check if the limitation is documented in RAG Improvements
- Create a GitHub issue with the feature request
- Include use case and business justification
Last updated: 2026-01-01