System Limitations

Known limitations, constraints, and considerations for the NorthBuilt RAG System.

Table of Contents

  1. Overview
  2. Search and Retrieval Limitations
    1. Semantic Search Only
    2. Maximum Retrieval Results
    3. No Real-Time Document Updates
  3. Query Understanding Limitations
    1. Client Recognition Only
    2. Entity Recognition Depends on Registry
    3. Clarification Prompts
  4. Document and Content Limitations
    1. Document Size Limits
    2. Supported Document Formats
    3. Content Processing
  5. Performance Limitations
    1. Response Latency
    2. Concurrency Limits
    3. Cold Start Latency
  6. Infrastructure Limitations
    1. Single Region Deployment
    2. No Local Development Environment
    3. Immutable Configuration
  7. Security and Compliance Limitations
    1. Authentication
    2. Data Residency
    3. Audit Logging
  8. Cost and Usage Limitations
    1. Pay-Per-Use Model
    2. Token Limits
    3. Rate Limits
  9. Integration Limitations
    1. Supported Data Sources
    2. Webhook Reliability
  10. User Interface Limitations
    1. Web UI Only
    2. No Conversation Export
    3. No Feedback Loop
  11. Known Issues
    1. Reranking Availability
    2. LLM Parsing Disabled
  12. Comparison with Alternatives
  13. Requesting Changes

Overview

This document describes the known limitations of the NorthBuilt RAG System. Understanding these constraints helps set appropriate expectations and informs architectural decisions for future enhancements.


Search and Retrieval Limitations

Semantic Search Only

Limitation: The system uses semantic (vector) search exclusively. Keyword-based or hybrid search is not supported.

Impact:

  • Exact phrase matching is not guaranteed
  • Queries relying on specific terminology may return semantically similar but not exact matches
  • Acronyms and technical jargon may not match as expected

Workaround:

  • Include context in queries to improve semantic matching
  • Register important acronyms as client aliases in the entity registry

Future Option: Export to OpenSearch Serverless for hybrid search (see RAG Improvements)

Maximum Retrieval Results

Limitation: Maximum 100 results per query (S3 Vectors TopK limit)

Impact: For very broad queries, some relevant documents may not be retrieved

Current Setting: The API returns 5 results by default and accepts up to 20 per query, well below the 100-result S3 Vectors ceiling
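The interaction between the default and the API ceiling can be sketched as below. This is a minimal illustration, not the system's actual code: the function name `resolve_top_k` is hypothetical, and whether the real API clamps an oversized value or rejects it is an assumption.

```python
DEFAULT_TOP_K = 5    # default stated in this section
MAX_API_TOP_K = 20   # API ceiling stated in this section


def resolve_top_k(requested=None):
    """Return the number of results to retrieve for a query."""
    if requested is None:
        return DEFAULT_TOP_K
    if requested < 1:
        raise ValueError("top_k must be at least 1")
    # Clamp rather than reject; the real API's behavior for
    # oversized values is an assumption of this sketch.
    return min(requested, MAX_API_TOP_K)
```

Under these assumptions a request for 50 results is served with 20, so broad queries cannot reach the 100-result S3 Vectors limit through the API.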

No Real-Time Document Updates

Limitation: New documents become searchable after the next ingestion job, not immediately

Impact: Documents uploaded via webhook may wait up to 5 minutes for the next scheduled ingestion run, plus the time that run takes to process them, before they appear in search results

Technical Details:

  • Webhooks upload to S3 immediately
  • Scheduled Lambda triggers ingestion every 5 minutes
  • Ingestion processes the document and adds vectors to S3 Vectors

Workaround: For time-critical documents, trigger manual ingestion via AWS Console
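The workaround above uses the AWS Console; the same job can also be started from a script. The following is a minimal sketch, assuming the ingestion job is a Bedrock Knowledge Base ingestion job and that the knowledge base and data source IDs are known. The helper name `trigger_ingestion` is hypothetical; `start_ingestion_job` is the boto3 `bedrock-agent` client call.

```python
def trigger_ingestion(client, knowledge_base_id, data_source_id):
    """Start a Bedrock Knowledge Base ingestion job.

    `client` is expected to be a boto3 "bedrock-agent" client, e.g.
    boto3.client("bedrock-agent", region_name="us-east-1").
    Returns (job_id, status) from the service response.
    """
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    return job["ingestionJobId"], job["status"]
```

Passing the client in as an argument keeps the helper testable without AWS credentials; the IDs themselves are deployment-specific and are not given in this document.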


Query Understanding Limitations

Client Recognition Only

Limitation: The system only extracts and filters by client, not by project or other entities

Impact:

  • Cannot filter queries to a specific project within a client
  • All documents from all projects under a client are included in search results

Rationale: This is by design to provide richer context. See Query Understanding Architecture

Entity Recognition Depends on Registry

Limitation: Query understanding only recognizes clients that exist in the DynamoDB entity registry

Impact:

  • New clients must be synced from Linear before queries can filter by them
  • Misspellings or unknown aliases will not match

Workaround: Ensure Linear is properly configured and syncing team data

Clarification Prompts

Limitation: When multiple clients match or confidence is low, the system prompts for clarification rather than guessing

Impact: Some queries require a follow-up interaction to specify the intended client

Threshold: Confidence score below 0.8 triggers clarification
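The clarification rule can be sketched as a small decision function. This is an illustration under assumptions: the representation of matches as `(client_name, confidence)` pairs and the handling of zero matches (search without a client filter) are hypothetical and not confirmed by this document.

```python
CLARIFICATION_THRESHOLD = 0.8  # from the Threshold line above


def needs_clarification(matches, threshold=CLARIFICATION_THRESHOLD):
    """Decide whether to ask the user which client they mean.

    matches: list of (client_name, confidence) candidate pairs.
    """
    if not matches:
        # Assumption: no recognized client means no client filter,
        # so there is nothing to clarify.
        return False
    if len(matches) > 1:
        return True
    _, confidence = matches[0]
    # "Below 0.8" is read strictly: exactly 0.8 does not trigger.
    return confidence < threshold
```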


Document and Content Limitations

Document Size Limits

| Limit | Value | Source |
| --- | --- | --- |
| Maximum file size | 50 MB | Bedrock KB Prerequisites |
| Maximum pages (PDF) | 50 pages | Bedrock parsing limit |
| Maximum chunk size | 512 tokens | Configured in data source |
| Maximum metadata size | 2 KB filterable, 40 KB total | S3 Vectors Metadata |

Supported Document Formats

| Format | Support | Notes |
| --- | --- | --- |
| Markdown (.md) | Full | Primary format |
| Plain text (.txt) | Full | |
| PDF | Partial | Bedrock parsing, max 50 pages |
| Word (.docx) | Not supported | Convert to Markdown first |
| Excel (.xlsx) | Not supported | Export to CSV/text |
| HTML | Not supported | Convert to Markdown |
| Images | Not supported | No OCR or image analysis |
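The size and format limits above lend themselves to a pre-upload check. The sketch below is illustrative only: `check_document` is a hypothetical helper, 50 MB is interpreted as 50 × 1024 × 1024 bytes, and the caller is assumed to supply the PDF page count.

```python
import os

MAX_FILE_BYTES = 50 * 1024 * 1024   # 50 MB, read as MiB in this sketch
MAX_PDF_PAGES = 50
SUPPORTED_EXTENSIONS = {".md", ".txt", ".pdf"}


def check_document(filename, size_bytes, pdf_pages=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    # The page count is only checked when the caller provides it.
    if ext == ".pdf" and pdf_pages is not None and pdf_pages > MAX_PDF_PAGES:
        problems.append("PDF exceeds 50 pages")
    return problems
```

For example, `check_document("report.docx", 1024)` reports the unsupported format, which signals that the file should be converted to Markdown before upload.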

Content Processing

Limitation: Documents are chunked by token count, not by semantic boundaries

Impact:

  • Important context may be split across chunks
  • Conversations in meeting transcripts may be divided mid-discussion

Mitigation: 20% chunk overlap helps preserve context at boundaries
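To make the effect of fixed-size chunking with overlap concrete, here is a simplified sketch using the 512-token chunk size and 20% overlap from this document. It is not Bedrock's chunker: the function name is hypothetical and tokens are represented as a plain list.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.2):
    """Split a token list into fixed-size chunks that overlap at the edges."""
    overlap = int(chunk_size * overlap_ratio)   # 102 tokens at the defaults
    step = chunk_size - overlap                 # each chunk starts 410 tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks
```

With these defaults, a 1,000-token document yields three chunks starting at tokens 0, 410, and 820. Boundaries fall wherever the count dictates, which is why a discussion can be cut mid-thought; the 102 shared tokens only soften the cut.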


Performance Limitations

Response Latency

| Operation | Typical Latency | Notes |
| --- | --- | --- |
| Query understanding | 200-500ms | Claude Haiku call |
| Vector search | 100-300ms | S3 Vectors query |
| Reranking | 200-400ms | Cohere rerank |
| Response generation | 2-4 seconds | Claude Sonnet 4.5 |
| Total end-to-end | 2.5-5 seconds | |

Impact: Real-time or low-latency applications may find response times too slow

No Streaming: Responses are returned only after full generation completes
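The end-to-end figure can be reproduced by summing the per-stage ranges. This sketch assumes the stages run strictly one after another with no network or Lambda overhead, which this document does not state explicitly; the dictionary keys are illustrative names.

```python
# Per-stage (low, high) latency in milliseconds, from the table above.
STAGE_LATENCY_MS = {
    "query_understanding": (200, 500),
    "vector_search": (100, 300),
    "reranking": (200, 400),
    "response_generation": (2000, 4000),
}


def end_to_end_range_ms(stages=STAGE_LATENCY_MS):
    """Sum the low and high bounds of every stage."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high
```

With the table's figures this gives 2,500 ms to 5,200 ms, which the table rounds to 2.5-5 seconds. Response generation accounts for most of the total, so without streaming the user waits for nearly the whole range before seeing any output.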

Concurrency Limits

| Resource | Limit | Notes |
| --- | --- | --- |
| Lambda concurrent executions | Account default (1000) | Can be increased |
| Reserved concurrency (chat) | 10 | Configurable |
| API Gateway rate limit | 10 req/s, 20 burst | Configurable |
| Bedrock model throughput | On-demand | AWS manages |
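API Gateway throttling follows a token bucket model, so "10 req/s, 20 burst" can be illustrated with a small simulation. This is a sketch for intuition, not API Gateway's implementation; the class and the explicit `now` timestamps (in seconds) are choices made here to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket: `rate` tokens refill per second, up to `burst` tokens."""

    def __init__(self, rate=10.0, burst=20):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)   # start full, so a burst is possible
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # API Gateway would answer 429 Too Many Requests
```

In this model, 25 simultaneous requests against an idle API see 20 accepted and 5 throttled; half a second later, 5 more tokens have accumulated.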

Cold Start Latency

Limitation: Lambda cold starts add 1-2 seconds to the first request after an idle period

Impact: The first query after a period of inactivity is slower

Mitigation Options:

  • Provisioned concurrency (additional cost)
  • Warm-up pings (not implemented)

Infrastructure Limitations

Single Region Deployment

Limitation: System runs in us-east-1 only

Impact:

  • Higher latency for users in other regions
  • No failover if us-east-1 has an outage

Future Option: Multi-region deployment (see RAG Improvements)

No Local Development Environment

Limitation: Full system cannot run locally; requires AWS services

Impact:

  • Developers need AWS credentials for testing
  • Integration tests run against production AWS resources
  • No offline development possible

Workaround: Use mocked responses for unit tests; use AWS for integration tests

Immutable Configuration

These settings cannot be changed after creation. See Bedrock KB documentation and S3 Vectors limitations.

| Configuration | Scope | To Change |
| --- | --- | --- |
| Vector dimensions | Knowledge Base | Recreate KB |
| Embedding model | Knowledge Base | Recreate KB |
| Chunking strategy | Data Source | Recreate data source |
| Distance metric | S3 Vectors Index | Recreate index |
| Non-filterable keys | S3 Vectors Index | Recreate index |

Security and Compliance Limitations

Authentication

Limitation: Only Google OAuth is supported for user authentication

Impact: Organizations using other identity providers cannot integrate directly

Workaround: Configure additional identity providers in Cognito (manual setup)

Data Residency

Limitation: All data is stored in the US (us-east-1)

Impact: May not meet data residency requirements for EU (GDPR) or other jurisdictions

Future Option: Deploy in EU region (eu-west-1)

Audit Logging

Limitation: Logging of vector operations (QueryVectors, PutVectors) is not enabled by default

Impact: Cannot audit who queried what content

Enable: Configure CloudTrail data events for S3 Vectors (see RAG Improvements)


Cost and Usage Limitations

Pay-Per-Use Model

Limitation: No fixed monthly cost; costs scale with usage

Impact: Unpredictable costs for variable workloads

Mitigation: Set CloudWatch billing alarms (see Cost Analysis)
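A billing alarm of the kind mentioned above can be defined with CloudWatch's `put_metric_alarm`. The sketch below only builds the parameters so it can be inspected without AWS access. The helper name, alarm name, and SNS topic ARN are hypothetical; the `AWS/Billing` namespace and `EstimatedCharges` metric are the standard CloudWatch billing metric, which is published in us-east-1 and requires billing alerts to be enabled on the account.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,              # 6 hours; billing data updates infrequently
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

# Usage (requires boto3 and credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   cloudwatch.put_metric_alarm(**billing_alarm_params(100, topic_arn))
```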

Token Limits

| Model | Max Input | Max Output | Source |
| --- | --- | --- | --- |
| Claude Sonnet 4.5 | 200K tokens | 8K tokens | Claude on Bedrock |
| Claude Haiku | 200K tokens | 4K tokens | Claude on Bedrock |
| Titan Embeddings v2 | 8K tokens | N/A | Titan Embeddings |

Impact: Very long documents or conversations may be truncated

Rate Limits

| Service | Limit | Source |
| --- | --- | --- |
| S3 Vectors writes | 1,000 requests/second | S3 Vectors Quotas |
| S3 Vectors reads | Hundreds/second | S3 Vectors Quotas |
| Bedrock inference | Account-specific | Bedrock Quotas |

Integration Limitations

Supported Data Sources

| Source | Integration | Notes |
| --- | --- | --- |
| Fathom | Webhook + Sync | Video transcripts |
| HelpScout | Webhook + Sync | Support conversations |
| Linear | Webhook + Sync | Teams/Projects (entity registry only) |
| Manual upload | S3 only | No API endpoint for direct upload |

Webhook Reliability

Limitation: Webhook delivery is at-most-once; missed webhooks are picked up by the scheduled sync

Impact: If a webhook fails, the document is delayed until the next sync

Sync Frequency: Every 5 minutes (configurable)
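The scheduled sync acts as a reconciliation pass over whatever the webhooks missed. The core comparison can be sketched as below; the function name and the idea of comparing ID lists are illustrative assumptions, since this document does not describe how the sync detects missing items.

```python
def find_missed_items(source_ids, ingested_ids):
    """Return IDs present in the source system but not yet ingested.

    The order of `source_ids` is preserved so items can be fetched
    in the order the source system lists them.
    """
    seen = set(ingested_ids)
    return [item_id for item_id in source_ids if item_id not in seen]
```

For example, if the source lists items `a`, `b`, and `c` and webhooks delivered only `a` and `c`, the sync would fetch `b`.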


User Interface Limitations

Web UI Only

Limitation: Only a web browser interface is provided

Impact: No mobile app, desktop app, or CLI tool for end users

No Conversation Export

Limitation: Chat history cannot be exported

Impact: Users cannot save or share conversations

No Feedback Loop

Limitation: No mechanism for users to rate response quality

Impact: Cannot systematically improve responses based on user feedback


Known Issues

Reranking Availability

Issue: Amazon Rerank 1.0 is not available in us-east-1

Current Solution: Using Cohere Rerank 3.5 instead

Status: Working as expected with Cohere

Reference: Bedrock Reranking Supported Models and Regions

LLM Parsing Disabled

Issue: Metadata produced by Bedrock LLM parsing exceeds the S3 Vectors 2 KB filterable metadata limit

Current Solution: LLM parsing disabled; metadata provided via sidecar files

Status: 100% ingestion success rate with current configuration

Reference: S3 Vectors Metadata Filtering (limits: 2 KB filterable, 40 KB total)
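A sidecar file for a Bedrock Knowledge Base document is a JSON file named after the document with a `.metadata.json` suffix and a top-level `metadataAttributes` object. The sketch below builds one and guards against the 2 KB limit. The helper name is hypothetical, and two points are assumptions: that every attribute is filterable, and that size is measured as the UTF-8 length of the JSON-encoded attributes (S3 Vectors may count bytes differently).

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024   # 2 KB filterable metadata limit


def build_sidecar(document_key, attributes):
    """Return (sidecar_key, sidecar_body) for a document's metadata file."""
    encoded = json.dumps(attributes, sort_keys=True).encode("utf-8")
    if len(encoded) > FILTERABLE_LIMIT_BYTES:
        raise ValueError(
            f"metadata is {len(encoded)} bytes; limit is {FILTERABLE_LIMIT_BYTES}"
        )
    body = json.dumps({"metadataAttributes": attributes}, sort_keys=True)
    return f"{document_key}.metadata.json", body
```

Keeping the attributes small and explicit, for example only a client name and a source, is what lets ingestion stay under the limit that LLM-generated metadata exceeded.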


Comparison with Alternatives

| Feature | NorthBuilt RAG | OpenSearch | Pinecone |
| --- | --- | --- | --- |
| Hybrid search | No | Yes | Yes |
| Multi-region | No | Yes | Yes |
| Real-time indexing | No (5 min) | Near real-time | Near real-time |
| Managed service | Fully | Serverless option | Fully |
| Cost model | Pay-per-use | OCU-based | Pod-based |

Requesting Changes

For feature requests or to address limitations:

  1. Check if the limitation is documented in RAG Improvements
  2. Create a GitHub issue with the feature request
  3. Include use case and business justification

Last updated: 2026-01-01