RAG System Changelog
Chronological record of implemented improvements, optimizations, and fixes for the NorthBuilt RAG system. For pending improvements and backlog items, see RAG Improvements.
2026
2026-01-11: GET /chat/{id} Empty Response Fix
Fixed critical bug where GET /chat/{id} endpoint returned empty responses despite Lambda executing successfully.
Root Cause:
The Terraform deployment triggers for aws_api_gateway_deployment.streaming only included resource IDs, not the full integration objects. When response_transfer_mode = "STREAM" was added to the GET integration on 2026-01-10, Terraform didn’t detect this as a change requiring redeployment because the integration ID remained the same.
This caused the GET endpoint to use responseTransferMode: "BUFFERED" (the default) instead of "STREAM", resulting in empty responses. API Gateway access logs confirmed this discrepancy.
Fix:
Updated terraform/modules/api-gateway-rest/main.tf to include full integration objects in the deployment triggers instead of just their IDs:
```hcl
resource "aws_api_gateway_deployment" "streaming" {
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_integration.chat_stream_lambda,     # Full object, not .id
      aws_api_gateway_integration.chat_stream_get_lambda, # Full object, not .id
      # ... other resources
    ]))
  }
}
```
Debugging Approach:
- Invoked the Lambda directly via the AWS CLI; it returned the correct data
- Checked API Gateway access logs with `aws logs tail /aws/apigateway/nb-rag-sys-rest`; GET requests showed `"responseTransferMode":"BUFFERED"` while POST showed `"STREAM"`
- Checked Terraform state; `response_transfer_mode = "STREAM"` was configured, but a deployment was never triggered
Files Modified:
| File | Change |
|---|---|
| `terraform/modules/api-gateway-rest/main.tf` | Changed deployment triggers to include full integration objects |
Lesson Learned:
When API Gateway integration settings are changed, ensure the deployment triggers reference the full integration resource (not just ID) to detect setting changes like response_transfer_mode.
2026-01-10: Source Citation Improvements
Enhanced source citations with secure document links and improved snippet formatting.
New Features:
- Pre-signed S3 URLs for Source Documents
  - Each source now includes a "View Document" link
  - Generates secure, time-limited URLs (1-hour expiry) for direct document access
  - Uses the AWS S3 pre-signed URL mechanism via `@aws-sdk/s3-request-presigner`
  - Added S3 `GetObject` permission to the chat Lambda IAM role
- Markdown Snippet Rendering
  - Source snippets are now rendered as Markdown using ReactMarkdown with remark-gfm
  - Supports headers, bold/italic, lists, code blocks, and tables
  - Improves readability of structured content from meeting notes and documentation
- Intelligent Markdown Preprocessing
  - Added a `preprocessMarkdownSnippet()` utility to restore logical line breaks
  - Bedrock KB strips newlines during indexing for vector similarity optimization
  - Preprocessing detects headers, bold labels, and list items to insert line breaks
  - Ensures proper rendering of structured Markdown content
- Intelligent Snippet Truncation
  - Increased snippet size from 300 to 500 characters for better context
  - Truncates at sentence boundaries when possible (looking for `.`, `?`, `!`)
  - Falls back to the character limit with an ellipsis if no sentence boundary is found
- URL Regeneration on History Retrieval
  - Pre-signed URLs expire after 1 hour, so viewing old conversations showed broken links
  - GET /chat/{id} now regenerates fresh pre-signed URLs for all sources
  - Uses a `regenerateSourceUrls()` helper with `Promise.all` for parallel URL generation
  - Fixed the API Gateway integration to use the streaming invocation endpoint with `response_transfer_mode = "STREAM"` (required for `InvokeWithResponseStream`)
- URL Validation for Security
  - Added an `isValidUrl()` utility function
  - Only allows `http:` and `https:` protocols
  - Blocks dangerous protocols such as `javascript:`, `file:`, and `data:`
  - Validates URLs before rendering links in the UI
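A rough sketch of the three snippet/link utilities above. The function names follow this changelog, but the signatures, regex heuristics, and defaults are assumptions, not the actual `web/src/lib/utils.ts` or chat Lambda code:

```javascript
// Sketches of the utilities described above. Names follow the changelog,
// but signatures, regexes, and edge-case handling are assumptions.

// URL validation: only http/https links are rendered in the UI.
function isValidUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // not parseable as a URL at all
  }
}

// Snippet truncation: prefer a sentence boundary, fall back to a hard cut.
function createSnippet(text, maxLength = 500) {
  if (text.length <= maxLength) return text;
  const window = text.slice(0, maxLength);
  const lastBoundary = Math.max(
    window.lastIndexOf("."),
    window.lastIndexOf("?"),
    window.lastIndexOf("!")
  );
  // Cut just after the sentence-ending punctuation if one was found.
  if (lastBoundary > 0) return window.slice(0, lastBoundary + 1);
  return window.trimEnd() + "…"; // no boundary: hard cut with ellipsis
}

// Markdown preprocessing: reinsert line breaks that Bedrock KB stripped,
// so headers, list items, and bold labels render on their own lines.
function preprocessMarkdownSnippet(text) {
  return text
    .replace(/\s+(#{1,6} )/g, "\n\n$1") // headers start a new paragraph
    .replace(/\s+([-*] )/g, "\n$1") // list items start a new line
    .replace(/\s+(\*\*[^*]+:\*\*)/g, "\n$1"); // bold "Label:" gets its own line
}
```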
Files Modified:
| File | Change |
|---|---|
| `web/src/types/index.ts` | Added `document_url` field to `Source` interface |
| `web/src/components/Chat/ChatMessage.tsx` | Markdown rendering for snippets, View Document link |
| `web/src/lib/utils.ts` | Added `preprocessMarkdownSnippet()` and `isValidUrl()` |
| `web/src/lib/utils.test.ts` | 22 new tests for preprocessing and URL validation |
| `web/src/components/Chat/ChatMessage.test.tsx` | 4 new tests for `document_url` feature |
| `lambda/node/chat/index.js` | Pre-signed URL generation, intelligent snippet truncation, URL regeneration |
| `lambda/node/chat/__tests__/helpers.test.js` | 41 tests for `createSnippet`, `parseS3Uri`, `regenerateSourceUrls` |
| `lambda/node/chat/package.json` | Added S3 SDK dependencies |
| `terraform/modules/lambda/main.tf` | Added S3 `GetObject` IAM permission |
| `terraform/modules/api-gateway-rest/main.tf` | Fixed GET /chat/{id} to use streaming invocation endpoint |
| `docs/reference/api.md` | Source Object Schema, history response documentation |
| `docs/reference/user-guide.md` | Updated source citation documentation |
Removed Dead Code:
- Deleted orphaned `SourceCard.tsx` component (unused; ChatMessage handles sources inline)
- Deleted the corresponding `SourceCard.test.tsx` test file
2026-01-08: Streaming Response Support
Implemented real-time streaming responses using Lambda Response Streaming and Server-Sent Events (SSE). This was a significant architectural change from the batch-response Python Lambda to a streaming Node.js Lambda.
Architecture:
| Component | Technology | Purpose |
|---|---|---|
| Lambda Runtime | Node.js 22 | Native streaming support via `awslambda.streamifyResponse()` |
| API Gateway | REST API (not HTTP API) | Response streaming URIs for SSE |
| Protocol | Server-Sent Events (SSE) | Real-time token delivery |
| Frontend | React EventSource API | Progressive response rendering |
Implementation Details:
- Node.js Chat Lambda (`lambda/node/chat/`):
  - Uses `@aws-sdk/client-bedrock-agent-runtime` for Knowledge Base retrieval
  - Uses `@aws-sdk/client-bedrock-runtime` with `invokeModelWithResponseStream` for streaming
  - Implements conversation history via DynamoDB
  - Supports query understanding and clarification prompts
  - GET endpoint for history retrieval (`GET /chat/{id}`)
- SSE Event Types:

  ```
  event: sources
  data: {"sources": [...]}

  event: token
  data: {"text": "partial response..."}

  event: done
  data: {"done": true, "message_id": "..."}

  event: error
  data: {"error": "message"}
  ```

- Clarification Handling:
  - Pre-flight check before streaming begins
  - Returns JSON (not SSE) when clarification is needed
  - Frontend detects `Content-Type: application/json` vs `text/event-stream`
- Terraform Configuration:
  - REST API Gateway with Lambda proxy integration
  - Response streaming URI: `arn:aws:apigateway:{region}:lambda:path/.../invocations/response-streaming`
  - Cognito authorizer for authentication
  - CORS configuration for web access
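For illustration, the SSE events listed above can be serialized with a small helper before being written to the Lambda response stream. This is a hypothetical sketch showing only the wire format; the helper name `sseEvent` is not from the codebase:

```javascript
// Hypothetical helper (not the actual handler code): serialize one SSE event.
// Each event is an "event:" line naming the type, a "data:" line carrying a
// JSON payload, and a blank line terminating the event.
function sseEvent(name, payload) {
  return `event: ${name}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Inside a streamifyResponse handler this would be written to the stream, e.g.:
// responseStream.write(sseEvent("token", { text: "partial response..." }));
// responseStream.write(sseEvent("done", { done: true, message_id: id }));
```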
Files Created/Modified:
| File | Change |
|---|---|
| `lambda/node/chat/index.js` | Main streaming handler |
| `lambda/node/chat/lib/dynamodb.js` | Conversation history operations |
| `lambda/node/chat/lib/entities.js` | Entity loading from DynamoDB |
| `lambda/node/chat/lib/query-understanding.js` | Client extraction with LLM |
| `lambda/node/chat/lib/clarification.js` | Clarification prompt building |
| `lambda/node/chat/lib/logger.js` | Structured logging with redaction |
| `terraform/modules/api-gateway-rest/` | New REST API module |
| `terraform/modules/lambda/main.tf` | Chat Lambda configuration |
| `web/src/lib/api.ts` | SSE streaming client |
| `web/src/hooks/useChat.ts` | Streaming state management |
Testing:
- 150+ Jest unit tests for the Node.js Lambda (`npm test` in `lambda/node/chat/`)
- Integration tests in `tests/test_chat_integration.py`
- Snyk security scanning in the CI/CD pipeline
Performance Improvement:
- Time to first token: < 2 seconds (previously 3-5 seconds for full response)
- Users see response progressively instead of waiting for completion
- Streaming enabled by default in production
Related: ADR-011: Lambda Response Streaming
2026-01-08: Bedrock Knowledge Base Documentation Audit
Comprehensive audit of all 36 Amazon Bedrock Knowledge Base documentation pages against current implementation. This audit complements the S3 Vectors audit from 2026-01-07.
Documentation Reviewed:
- Knowledge Base overview, architecture, and how it works
- Data ingestion process and retrieval mechanisms
- Chunking strategies (NONE, FIXED_SIZE, HIERARCHICAL, SEMANTIC)
- Advanced parsing options (default, Bedrock Data Automation, Foundation Model)
- Custom transformation Lambda functions
- Metadata configuration and filtering
- S3 data source connector configuration
- Direct ingestion API (IngestKnowledgeBaseDocuments)
- Security configuration and deployment best practices
- Supported models, regions, and vector stores
Confirmed Correct Implementation:
| Configuration | Current Value | Documentation Alignment |
|---|---|---|
| Storage type | S3 Vectors | Supported, cost-effective option |
| Embedding model | Titan Text Embeddings V2 (1024 dim) | Correct for text-only content |
| Embedding data type | FLOAT32 | Standard precision (BINARY available for cost savings) |
| Chunking strategy | FIXED_SIZE (512 tokens, 20% overlap) | Recommended for S3 Vectors |
| Parsing strategy | Disabled (sidecar metadata) | Correct - avoids 2KB metadata limit |
| Data deletion policy | DELETE | Enables automatic vector cleanup |
| Reranking | Cohere Rerank v3.5 (optional) | Correct model for us-east-1 |
| Multi-tenant isolation | Client metadata filtering | Proper implementation with defense-in-depth |
| IAM role configuration | Bedrock service role with S3/S3Vectors permissions | Correct trust policy with conditions |
| Retry configuration | Adaptive retry with 3 attempts | Best practice for Bedrock API reliability |
Key Documentation Insights:
- Immutable Configurations: Confirmed that chunking strategy, parsing strategy, embedding model, and vector store type cannot be changed after creation.
- Sidecar Metadata Files: Documentation confirms our approach of using `.metadata.json` sidecar files instead of LLM parsing. The current implementation in `build_bedrock_metadata_json()` follows the correct format.
- Incremental Sync: Documentation confirms that `StartIngestionJob` only processes documents added, modified, or deleted since the last sync.
- Direct Ingestion API: Documentation reveals an `IngestKnowledgeBaseDocuments` API for immediate ingestion (up to 25 documents per call).
- `includeForEmbedding` Option: Documentation shows metadata can optionally be included in vector embeddings for improved semantic search.
2026-01-07: S3 Vectors Documentation Audit
Comprehensive audit of all 39 S3 Vectors documentation pages against current implementation.
Confirmed Correct Implementation:
| Configuration | Current Value | Status |
|---|---|---|
| Non-filterable metadata keys | `AMAZON_BEDROCK_TEXT`, `AMAZON_BEDROCK_METADATA` | Correct - avoids 2KB limit |
| Fixed-size chunking | 512 tokens, 20% overlap | Optimal for S3 Vectors |
| LLM parsing | Disabled | Correct - sidecar metadata used |
| Distance metric | Cosine | Appropriate for text embeddings |
| Dimensions | 1024 (Titan v2) | Matches embedding model |
| Data deletion policy | DELETE | Enables automatic cleanup |
| Encryption | SSE-S3 (AES256) | Sufficient for current needs |
Defense-in-Depth Validation:
The chat handler implements post-filtering validation (handler.py:366) that rejects documents where metadata client doesn’t match the filter. This provides a second layer of security beyond vector search filtering.
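As a sketch of that second layer: the real check lives in Python at `handler.py:366`; this JavaScript version is illustrative only, and the result/metadata shapes are assumptions:

```javascript
// Illustrative post-filter: drop any retrieved document whose metadata
// "client" does not match the client the vector search was filtered on.
// The result shape ({ metadata: { client } }) is an assumption.
function postFilterByClient(results, expectedClient) {
  return results.filter(
    (doc) => doc.metadata && doc.metadata.client === expectedClient
  );
}
```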
Sidecar Metadata Files:
The build_bedrock_metadata_json() utility (validation.py:360-448) correctly generates Bedrock KB metadata sidecar files with proper type annotations (STRING/NUMBER). Includes validation for Bedrock KB limits (35 keys max, 1KB per value).
Filter Validation:
The validate_filter_for_s3_vectors() function (filters.py:205-238) correctly rejects unsupported operators (startsWith, stringContains).
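The operator check can be sketched as follows; the real validation is Python (`filters.py:205-238`), so this JavaScript version and its filter shape are illustrative assumptions:

```javascript
// Illustrative sketch: reject filter operators that S3 Vectors does not
// support. The operator names come from the changelog entry above.
const UNSUPPORTED_S3_VECTORS_OPERATORS = new Set(["startsWith", "stringContains"]);

function validateFilterOperator(operator) {
  if (UNSUPPORTED_S3_VECTORS_OPERATORS.has(operator)) {
    throw new Error(`Operator not supported by S3 Vectors: ${operator}`);
  }
  return operator;
}
```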
Recommendations Implemented:
- Metadata Value Size Validation: added to `build_bedrock_metadata_json()` in `validation.py`
- Query Latency Metrics: added a `RAGMetrics` class in `metrics.py`, integrated into the chat handler
- S3 Vectors Debugging Runbook: added to `docs/operations/runbook.md`
2025
2025-12-31: S3 Vectors Metadata Fix (Critical)
Fixed critical ingestion failures caused by S3 Vectors 2KB filterable metadata limit. Previously 97% of documents failed to index.
| Improvement | File | Change |
|---|---|---|
| Non-filterable metadata keys | `terraform/main.tf` | Added `AMAZON_BEDROCK_TEXT` and `AMAZON_BEDROCK_METADATA` as non-filterable |
| LLM parsing disabled | `terraform/modules/bedrock/main.tf` | Removed parsing configuration (sidecar metadata is sufficient) |
| Data deletion policy | `terraform/modules/bedrock/main.tf` | Changed from RETAIN to DELETE for automatic vector cleanup |
Root Cause Analysis:
- S3 Vectors has a 2KB limit on filterable metadata per vector
- Bedrock KB stores `AMAZON_BEDROCK_TEXT` (chunk content) and `AMAZON_BEDROCK_METADATA` (extracted metadata) as filterable by default
- For larger document chunks, this exceeded the 2KB limit
Solution:
Configure these as non-filterable in the S3 Vectors index. They can still use the 40KB non-filterable limit while our custom fields (source, client, category) remain filterable for multi-tenant isolation. Project metadata is stored for display purposes but not used for filtering.
Result: 100% ingestion success rate (594/594 documents indexed)
2025-12-29: RAG Retrieval Quality Improvements
Based on Bedrock Knowledge Base documentation audit, the following HIGH priority improvements were implemented:
| Improvement | File | Change |
|---|---|---|
| Reranking support | `lambda/chat/handler.py` | Optional Bedrock reranking model (`cohere.rerank-v3-5:0`) to re-score retrieved documents |
| Adaptive retrieval multiplier | `lambda/chat/handler.py` | Dynamic multiplier: 2x base, 3x with client filter |
| Query sanitization | `lambda/shared/utils/validation.py` | Added `sanitize_query()` function for input safety |
| Query sanitization applied | `lambda/chat/handler.py` | Sanitizes user queries before RAG retrieval |
Reranking Details:
- Uses Cohere Rerank 3.5 (`cohere.rerank-v3-5:0`) to re-score documents against the actual query
- Disabled by default; requires accepting the Cohere AWS Marketplace agreement
- Enable via the `ENABLE_RERANKING=true` environment variable after accepting the agreement
- Note: Amazon Rerank 1.0 is not available in us-east-1; Cohere is the supported option
- Significantly improves relevance for ambiguous queries when enabled
- Configured to rerank down to the top `max_results` after initial retrieval
- See the Bedrock Reranking Guide for supported models and regions
Adaptive Retrieval Multiplier:
```python
# No filters: 2x (standard over-retrieval)
# With client filter: 3x (compensates for filtering)
```
This ensures sufficient candidates are retrieved even when client metadata filtering reduces the result set. Note: The system uses client-only filtering (not project-level) to allow all documents from a client’s projects to be accessible for richer context.
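The multiplier logic itself is simple; here is a sketch in JavaScript for consistency with the other sketches in this changelog, though the actual handler is Python and the function name is hypothetical:

```javascript
// Hypothetical sketch of the adaptive over-retrieval multiplier:
// retrieve 2x the requested results normally, 3x when a client metadata
// filter will discard some candidates after the vector search.
function retrievalCandidateCount(maxResults, hasClientFilter) {
  const multiplier = hasClientFilter ? 3 : 2;
  return maxResults * multiplier;
}
```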
2025-12-29: S3 Vectors Optimization
The following improvements were implemented based on a comprehensive audit of S3 Vectors documentation and codebase analysis:
| Improvement | File | Change |
|---|---|---|
| Chunk size optimization | `terraform/modules/bedrock/main.tf` | Increased from 300 to 512 tokens |
| Chunk overlap | `terraform/modules/bedrock/main.tf` | Increased from 10% to 20% |
| Parsing prompt robustness | `terraform/modules/bedrock/main.tf` | Added multi-step extraction with fallbacks |
| Bedrock retry logic | `lambda/chat/handler.py` | Added adaptive retry with 3 attempts |
| Configurable temperature | `lambda/chat/handler.py` | Added `LLM_TEMPERATURE` env var |
| Configurable max tokens | `lambda/chat/handler.py` | Added `LLM_MAX_TOKENS` env var |
Last updated: 2026-01-11