Query-Driven Metadata Filtering
This document describes the query understanding system that extracts client filters from natural language queries to improve RAG retrieval accuracy.
Overview
Instead of requiring users to select a client from a dropdown menu, the system extracts client mentions from natural language queries and applies them as metadata filters on Bedrock Knowledge Base retrieval. Filtering is by client only, so documents from all of that client's projects remain available as context.
Example:
User: "What did we discuss with Acme?"
|
System extracts: client=Acme Corporation
|
Filtered retrieval: Searches all documents for that client (across all projects)
Architecture
+---------------------------------------------------------------------+
| User Query |
| "What did we discuss with Acme?" |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 1: Load Known Clients |
| (DynamoDB Entity Registry) |
| |
| Fetch list of known clients with aliases |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 2: Query Understanding |
| (Claude via Bedrock Invoke) |
| |
| Input: User query + Known clients + Conversation history |
| Output: { |
| "cleaned_query": "What did we discuss?", |
| "filters": {"client": "Acme Corporation"}, |
| "confidence": 0.95, |
| "ambiguous": false, |
| "resolved_from_context": false |
| } |
+---------------------------------------------------------------------+
|
+-----------+-----------+
| |
confidence >= 0.8 confidence < 0.8
no ambiguity OR ambiguous
| |
v v
+---------------------------+ +---------------------------+
| STEP 3a: Build Filter | | STEP 3b: Prompt User |
| Apply to KB Retrieve | | Ask for clarification |
+---------------------------+ +---------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 4: Filtered Retrieval |
| (Bedrock KB Retrieve API) |
| |
| retrieve(filter={"equals": {"key": "client", "value": "Acme Corporation"}}) |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 5: Response Generation |
| (Claude via Bedrock) |
+---------------------------------------------------------------------+
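The branch between Step 3a and Step 3b can be sketched as a small routing function. This is a minimal illustration, assuming the Step 2 output arrives as a plain dict with the keys shown in the diagram; the function name `next_step` and its return labels are hypothetical.

```python
from typing import Any

CONFIDENCE_THRESHOLD = 0.8  # threshold stated in this document


def next_step(understanding: dict[str, Any]) -> str:
    """Return "build_filter" (Step 3a) or "prompt_user" (Step 3b).

    Hypothetical helper: mirrors the diagram's branch on confidence
    and ambiguity, nothing more.
    """
    confident = float(understanding.get("confidence", 0.0)) >= CONFIDENCE_THRESHOLD
    ambiguous = bool(understanding.get("ambiguous", False))
    if confident and not ambiguous:
        return "build_filter"
    return "prompt_user"
```

A missing `confidence` key is treated as 0.0 here, so incomplete model output falls through to the clarification path rather than producing a filter.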
Components
1. Entity Registry (DynamoDB)
Stores known clients with aliases for fuzzy matching. Clients are managed via the Management UI in the web application.
Table: nb-rag-sys
| Attribute | Type | Description |
|---|---|---|
| PK | String (PK) | CLIENT#<id> |
| SK | String (SK) | METADATA |
| EntityType | String (GSI) | CLIENT (for EntityTypeIndex GSI) |
| Name | String | Display name (human-readable) |
| Aliases | List | Alternative names for matching |
| CreatedAt | String | ISO8601 timestamp |
| UpdatedAt | String | ISO8601 timestamp |
The EntityTypeIndex GSI enables efficient queries by entity type for the query understanding module.
Example Client Item:
{
"PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
"SK": "METADATA",
"EntityType": "CLIENT",
"Name": "Acme Corporation",
"Aliases": ["Acme", "ACME"],
"Description": "Technology company"
}
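Loading the known clients (Step 1) might look like the sketch below. It assumes `table` is a boto3 DynamoDB `Table` resource for `nb-rag-sys` and that `EntityTypeIndex` uses `EntityType` as its partition key; the function name and the shape of the returned dicts are hypothetical.

```python
from typing import Any


def load_known_clients(table: Any) -> list[dict[str, Any]]:
    """Fetch all CLIENT items via the EntityTypeIndex GSI.

    `table` is assumed to behave like a boto3 Table resource
    (anything with a compatible .query() method works).
    """
    params: dict[str, Any] = {
        "IndexName": "EntityTypeIndex",
        "KeyConditionExpression": "EntityType = :t",
        "ExpressionAttributeValues": {":t": "CLIENT"},
    }
    clients: list[dict[str, Any]] = []
    while True:
        page = table.query(**params)
        for item in page.get("Items", []):
            clients.append(
                {"name": item["Name"], "aliases": list(item.get("Aliases", []))}
            )
        # DynamoDB paginates query results; continue until no key is returned.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return clients
        params["ExclusiveStartKey"] = last_key
```

In a Lambda this would typically be called with `boto3.resource("dynamodb").Table("nb-rag-sys")`, and the result could be cached between invocations since the client list changes rarely.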
2. Query Understanding Module
Location: lambda/node/chat/lib/query-understanding.js (Node.js) or lambda/shared/utils/query_understanding.py (Python)
Uses Claude to analyze queries and extract client references, with support for conversation context to resolve pronoun references.
Input:
- User query string
- List of known clients
- Conversation history (optional, for context-aware resolution)
Output:
class QueryUnderstanding {
originalQuery: string; // Original query for reference
cleanedQuery: string; // Query with client mentions removed
filters: { client?: string }; // Extracted filters {client: "..."}
confidence: number; // 0.0-1.0 confidence score
ambiguous: boolean; // True if multiple possible matches
ambiguousMatches: Array; // List of ambiguous matches for user prompt
noEntitiesFound: boolean; // True if client mentioned but not found
}
Confidence Threshold: 0.8 (a conventional cutoff for entity extraction; extractions scoring below it trigger a clarification prompt instead of a filter)
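On the Python side, the model's JSON reply could be parsed into an equivalent structure as sketched below. The snake_case keys `cleaned_query`, `filters`, `confidence`, and `ambiguous` come from the architecture diagram; `ambiguous_matches` and `no_entities_found` are assumed key names, and the zero-confidence fallback behavior is an illustrative choice rather than the module's documented behavior.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryUnderstanding:
    original_query: str
    cleaned_query: str
    filters: dict = field(default_factory=dict)
    confidence: float = 0.0
    ambiguous: bool = False
    ambiguous_matches: list = field(default_factory=list)
    no_entities_found: bool = False


def parse_understanding(original_query: str, raw_json: str) -> QueryUnderstanding:
    """Parse the model's JSON reply; fall back to zero confidence on bad output."""
    fallback = QueryUnderstanding(original_query, original_query)
    try:
        data = json.loads(raw_json)
        confidence = float(data.get("confidence", 0.0))
    except (json.JSONDecodeError, AttributeError, TypeError, ValueError):
        return fallback
    return QueryUnderstanding(
        original_query=original_query,
        cleaned_query=data.get("cleaned_query") or original_query,
        filters=dict(data.get("filters") or {}),
        confidence=min(1.0, max(0.0, confidence)),  # clamp to the 0.0-1.0 range
        ambiguous=bool(data.get("ambiguous", False)),
        ambiguous_matches=list(data.get("ambiguous_matches") or []),  # assumed key
        no_entities_found=bool(data.get("no_entities_found", False)),  # assumed key
    )
```

Returning a zero-confidence result on malformed output means a bad model reply leads to a clarification prompt instead of an unhandled exception.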
Context-Aware Resolution
When conversation history is available, the system can resolve pronouns and references like “their”, “they”, “the company” to the client being discussed in the conversation. This improves UX by allowing natural follow-up questions without repeating the client name.
Example:
User: "Tell me about Acme Corporation"
Assistant: "Acme Corporation is a technology company..."
User: "What are their main challenges?"
System: Resolves "their" → "Acme Corporation" (confidence: 0.95)
The LLM prompt includes:
- Recent conversation messages (last 4 turns)
- The most recent client filter used in the conversation
- Instructions to resolve pronouns with high confidence (0.9+) when context is clear
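The three prompt ingredients listed above could be assembled roughly as follows. This is a sketch only: the actual prompt wording is not given in this document, each message is treated as one turn (an assumption), and the function name and message shape (`role`, `content`) are hypothetical.

```python
from typing import Optional


def build_context_block(
    history: list[dict[str, str]],
    last_client: Optional[str],
    max_turns: int = 4,
) -> str:
    """Render recent history and the last client filter as prompt text."""
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = ["Recent conversation:"]
    for message in recent:
        lines.append(f"{message['role'].capitalize()}: {message['content']}")
    if last_client:
        lines.append(f"Most recent client filter: {last_client}")
    # Illustrative instruction text, not the production prompt.
    lines.append(
        "If the query refers to a client by pronoun and the context is clear, "
        "resolve it to that client with confidence 0.9 or higher."
    )
    return "\n".join(lines)
```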
3. Filter Builder
Location: lambda/shared/utils/filters.py
Converts extracted metadata into Bedrock KB RetrievalFilter format.
Supported Operators (S3 Vectors compatible):
- equals - Exact match
- notEquals - Exclusion
- in - Multiple values
- notIn - Multiple exclusions
- greaterThan, lessThan - Numeric/date comparisons
- andAll, orAll - Logical combinations
Note: startsWith and stringContains are NOT supported with S3 Vectors.
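A minimal sketch of such a builder is shown below, using only the operators listed above. The function names are hypothetical and are not taken from `filters.py`; the dict shapes follow the Bedrock `RetrievalFilter` and `vectorSearchConfiguration` structures.

```python
from typing import Any, Optional


def build_client_filter(clients: list[str]) -> Optional[dict[str, Any]]:
    """Return an `equals` filter for one client, `in` for several, else None."""
    unique = list(dict.fromkeys(c for c in clients if c))  # dedupe, keep order
    if not unique:
        return None
    if len(unique) == 1:
        return {"equals": {"key": "client", "value": unique[0]}}
    return {"in": {"key": "client", "value": unique}}


def combine_all(filters: list[Optional[dict[str, Any]]]) -> Optional[dict[str, Any]]:
    """AND filters together; `andAll` needs at least two members."""
    present = [f for f in filters if f]
    if not present:
        return None
    if len(present) == 1:
        return present[0]
    return {"andAll": present}


def build_retrieval_config(
    number_of_results: int, metadata_filter: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Build the retrievalConfiguration argument for the KB Retrieve call."""
    vector_config: dict[str, Any] = {"numberOfResults": number_of_results}
    if metadata_filter:
        vector_config["filter"] = metadata_filter
    return {"vectorSearchConfiguration": vector_config}
```

The returned dict would be passed as `retrievalConfiguration` to the `bedrock-agent-runtime` client's `retrieve` call. Omitting the `filter` key entirely when there is nothing to filter on avoids sending an empty filter object.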
4. User Prompting
When the system cannot confidently extract filters, it prompts the user.
Scenarios requiring prompts:
- No client found - Show available clients and ask user to specify
- Low confidence (< 0.8) - Ask user to confirm or clarify
- Ambiguous match - Multiple clients match, ask user to choose
Prompt Response Format:
{
"type": "clarification_needed",
"message": "I found multiple clients that match 'Acme'. Which one did you mean?",
"options": [
{"id": "Acme Corporation", "display_name": "Acme Corporation", "entity_type": "client"},
{"id": "Acme Labs", "display_name": "Acme Labs", "entity_type": "client"}
],
"original_query": "What did Acme say about pricing?"
}
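A payload in this format could be produced by a helper along these lines. The function name is hypothetical; the field names and the use of the client name as `id` follow the example above.

```python
from typing import Any


def build_clarification(
    original_query: str, message: str, client_names: list[str]
) -> dict[str, Any]:
    """Build a clarification_needed payload listing candidate clients."""
    return {
        "type": "clarification_needed",
        "message": message,
        "options": [
            {"id": name, "display_name": name, "entity_type": "client"}
            for name in client_names
        ],
        "original_query": original_query,
    }
```

For the ambiguous case, `client_names` would hold the ambiguous matches; for the "no client found" case, it would hold the full list of available clients.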
API Changes
Chat Lambda Request
{
"query": "What did Acme say about pricing?",
"session_id": "session-id",
"client": "optional-explicit-client"
}
Chat Lambda Response
Normal response:
{
"answer": "Based on the meeting notes...",
"sources": [...],
"filters_applied": {"client": "Acme Corporation"}
}
Clarification needed:
{
"type": "clarification_needed",
"message": "Which client did you mean?",
"options": [...],
"original_query": "..."
}
Why Client-Only Filtering
The system filters by client only (not project) for these reasons:
- Richer context: All projects under a client are accessible in RAG queries, providing more comprehensive answers
- Simpler queries: Users only need to specify the client, not both client and project
- Fewer clarification prompts: No project ambiguity to resolve
- Faster extraction: A simpler LLM prompt means a faster response from Claude Haiku
- Security boundary: Client-level isolation is the real security boundary
Project metadata is still stored on documents for display purposes and auditing, but not used for filtering.
Cost Considerations
Query understanding adds one Claude API call per chat message for client extraction.
Estimated additional cost per query:
- Input tokens: ~300 (query + client list + prompt)
- Output tokens: ~50 (structured JSON response)
- Cost: ~$0.001 per query (Claude Haiku)
Recommendation: Use Claude Haiku for query understanding (fast, cost-effective for structured extraction).
Monthly cost estimate (1000 queries/month):
- Haiku: ~$1/month additional
Security Considerations
- Tenant isolation: Filters are applied server-side; users cannot bypass them
- Client validation: Only known clients from DynamoDB can be used as filters
- No client-side filter construction: Filters are built server-side from the LLM extraction; the client application never submits a raw filter expression
- Defense-in-depth: Retrieved documents are validated post-retrieval to ensure client match
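The client validation and post-retrieval checks above can be sketched as two small functions. This is an illustration under assumptions: known clients are dicts with `name` and `aliases` keys (a hypothetical shape), matching is exact and case-insensitive rather than fuzzy, and each retrieved result carries its metadata under a `metadata` key with a `client` entry, as in the Bedrock KB Retrieve response.

```python
from typing import Any, Optional


def resolve_known_client(
    candidate: str, known_clients: list[dict[str, Any]]
) -> Optional[str]:
    """Map an extracted name or alias to a registered client name, else None."""
    wanted = candidate.strip().casefold()
    for client in known_clients:
        names = [client["name"], *client.get("aliases", [])]
        if any(wanted == name.casefold() for name in names):
            return client["name"]
    return None  # unknown clients must never become filters


def drop_mismatched_results(
    results: list[dict[str, Any]], client: str
) -> list[dict[str, Any]]:
    """Keep only results whose metadata names the expected client."""
    return [r for r in results if r.get("metadata", {}).get("client") == client]
```

Results with missing metadata are dropped as well, which errs on the side of isolation.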
Adaptive Retrieval Multiplier
To compensate for potential filter mismatches in metadata, the system retrieves more candidates when filtering:
- No filter: 2x multiplier (standard over-retrieval for quality)
- With client filter: 3x multiplier (some results may be filtered out)
This ensures sufficient high-quality results even after filtering.
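As a rough sketch, the multiplier translates into a candidate count, and the over-retrieved set is trimmed back afterwards. The function names and the `score` key used for ranking are assumptions; `top_k` stands for the number of results the response generator actually needs.

```python
from typing import Any

UNFILTERED_MULTIPLIER = 2  # from this document: no filter
FILTERED_MULTIPLIER = 3    # from this document: with client filter


def candidates_to_retrieve(top_k: int, has_client_filter: bool) -> int:
    """Number of candidates to request from the knowledge base."""
    multiplier = FILTERED_MULTIPLIER if has_client_filter else UNFILTERED_MULTIPLIER
    return top_k * multiplier


def trim_to_top_k(results: list[dict[str, Any]], top_k: int) -> list[dict[str, Any]]:
    """Keep the top_k highest-scoring results after any post-filtering."""
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    return ranked[:top_k]
```

For example, with `top_k=5` the system would request 10 candidates for an unfiltered query and 15 for a client-filtered one, then keep the best 5.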
References
- Multi-tenancy in RAG with metadata filtering - AWS Blog
- RetrievalFilter API Reference - AWS Docs
Last updated: 2026-01-09