Query-Driven Metadata Filtering

This document describes the query understanding system that extracts client filters from natural language queries to improve RAG retrieval accuracy.

Overview

Instead of requiring users to select a client from a dropdown menu, the system extracts client mentions from natural language queries and applies them as metadata filters to Bedrock Knowledge Base retrieval. Filtering happens at the client level, so documents from all of that client's projects remain available as context.

Example:

User: "What did we discuss with Acme?"
       |
System extracts: client=Acme Corporation
       |
Filtered retrieval: Searches all documents for that client (across all projects)

Architecture

+---------------------------------------------------------------------+
|                        User Query                                    |
|              "What did we discuss with Acme?"                        |
+---------------------------------------------------------------------+
                                |
                                v
+---------------------------------------------------------------------+
|                   STEP 1: Load Known Clients                         |
|                   (DynamoDB Entity Registry)                         |
|                                                                      |
|  Fetch list of known clients with aliases                            |
+---------------------------------------------------------------------+
                                |
                                v
+---------------------------------------------------------------------+
|                   STEP 2: Query Understanding                        |
|                   (Claude via Bedrock Invoke)                        |
|                                                                      |
|  Input: User query + Known clients + Conversation history            |
|  Output: {                                                           |
|    "cleaned_query": "What did we discuss?",                          |
|    "filters": {"client": "Acme Corporation"},                        |
|    "confidence": 0.95,                                               |
|    "ambiguous": false,                                               |
|    "resolved_from_context": false                                    |
|  }                                                                   |
+---------------------------------------------------------------------+
                                |
                    +-----------+-----------+
                    |                       |
            confidence >= 0.8        confidence < 0.8
            no ambiguity             OR ambiguous
                    |                       |
                    v                       v
+---------------------------+   +---------------------------+
|  STEP 3a: Build Filter    |   |  STEP 3b: Prompt User     |
|  Apply to KB Retrieve     |   |  Ask for clarification    |
+---------------------------+   +---------------------------+
                    |
                    v
+---------------------------------------------------------------------+
|                   STEP 4: Filtered Retrieval                         |
|                   (Bedrock KB Retrieve API)                          |
|                                                                      |
|  retrieve(filter={"equals": {"key": "client", "value": "acme"}})     |
+---------------------------------------------------------------------+
                                |
                                v
+---------------------------------------------------------------------+
|                   STEP 5: Response Generation                        |
|                   (Claude via Bedrock)                               |
+---------------------------------------------------------------------+

Components

1. Entity Registry (DynamoDB)

Stores known clients with aliases for fuzzy matching. Clients are managed via the Management UI in the web application.

Table: nb-rag-sys

Attribute     Type     Key                  Description
PK            String   Table partition key  CLIENT#<id>
SK            String   Table sort key       METADATA
EntityType    String   GSI partition key    CLIENT (queried via the EntityTypeIndex GSI)
Name          String                        Display name (human-readable)
Aliases       List                          Alternative names used for matching
Description   String                        Free-text description
CreatedAt     String                        ISO 8601 timestamp
UpdatedAt     String                        ISO 8601 timestamp

The EntityTypeIndex GSI lets the query understanding module fetch every client with a single Query on EntityType = CLIENT instead of scanning the whole table.

Example Client Item:

{
  "PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "Acme Corporation",
  "Aliases": ["Acme", "ACME"],
  "Description": "Technology company"
}
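Step 1 could be implemented as a paginated Query against the EntityTypeIndex GSI. The sketch below is illustrative, not the project's code: `load_known_clients` is a hypothetical name, and it assumes the GSI's partition key is `EntityType` and that the index projects `Name` and `Aliases`. The table is passed in (for example a boto3 `Table` resource for `nb-rag-sys`) so the function can be exercised without AWS.

```python
from typing import Any


def load_known_clients(table: Any, index_name: str = "EntityTypeIndex") -> list:
    """Return [{"name": ..., "aliases": [...]}, ...] for every CLIENT entity.

    `table` is assumed to be a boto3 DynamoDB Table resource; any object with a
    compatible .query(**kwargs) method works.
    """
    clients: list = []
    kwargs = {
        "IndexName": index_name,
        # Query the CLIENT partition of the GSI instead of scanning the table.
        "KeyConditionExpression": "EntityType = :type",
        "ExpressionAttributeValues": {":type": "CLIENT"},
    }
    while True:
        page = table.query(**kwargs)
        for item in page.get("Items", []):
            clients.append({"name": item["Name"], "aliases": list(item.get("Aliases", []))})
        # DynamoDB returns LastEvaluatedKey while more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return clients
        kwargs["ExclusiveStartKey"] = last_key
```

Usage, assuming boto3 and AWS credentials are available: `load_known_clients(boto3.resource("dynamodb").Table("nb-rag-sys"))`.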

2. Query Understanding Module

Location: lambda/node/chat/lib/query-understanding.js (Node.js) or lambda/shared/utils/query_understanding.py (Python)

Uses Claude to analyze queries and extract client references, with support for conversation context to resolve pronoun references.

Input:

  • User query string
  • List of known clients
  • Conversation history (optional, for context-aware resolution)

Output:

interface QueryUnderstanding {
  originalQuery: string;          // Original query, kept for reference
  cleanedQuery: string;           // Query with client mentions removed
  filters: { client?: string };   // Extracted filters, e.g. { client: "Acme Corporation" }
  confidence: number;             // Confidence score from 0.0 to 1.0
  ambiguous: boolean;             // True if more than one client could match
  ambiguousMatches: string[];     // Candidate clients to offer in the clarification prompt
  noEntitiesFound: boolean;       // True if a client was mentioned but not found
}
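On the Python side, the same structure can be sketched as a dataclass, together with a parser that enforces the rule that only registry clients become filters. This is a minimal sketch under assumptions: the model replies with the snake_case JSON shown in the architecture diagram, plus hypothetical `ambiguous_matches` and `no_entities_found` keys; `parse_understanding` and `build_alias_index` are illustrative names; and matching against names and aliases is case-insensitive.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryUnderstanding:
    original_query: str
    cleaned_query: str
    filters: dict = field(default_factory=dict)  # e.g. {"client": "Acme Corporation"}
    confidence: float = 0.0
    ambiguous: bool = False
    ambiguous_matches: list = field(default_factory=list)
    no_entities_found: bool = False


def build_alias_index(known_clients: list) -> dict:
    """Map each lower-cased name or alias to the set of client Names it can mean."""
    index: dict = {}
    for client in known_clients:
        for label in [client["name"], *client.get("aliases", [])]:
            index.setdefault(label.strip().lower(), set()).add(client["name"])
    return index


def parse_understanding(raw: str, original_query: str, known_clients: list) -> QueryUnderstanding:
    """Parse the model's JSON reply and keep only filters that name a registry client."""
    data = json.loads(raw)
    result = QueryUnderstanding(
        original_query=original_query,
        cleaned_query=data.get("cleaned_query") or original_query,
        confidence=float(data.get("confidence", 0.0)),
        ambiguous=bool(data.get("ambiguous", False)),
        ambiguous_matches=list(data.get("ambiguous_matches", [])),
        no_entities_found=bool(data.get("no_entities_found", False)),
    )
    client = (data.get("filters") or {}).get("client")
    if not client:
        return result

    matches = build_alias_index(known_clients).get(client.strip().lower(), set())
    if len(matches) == 1:
        # Always store the canonical Name, even if the model echoed an alias.
        result.filters = {"client": next(iter(matches))}
    elif matches:
        # An alias shared by several clients cannot be used as a filter.
        result.ambiguous = True
        result.ambiguous_matches = sorted(matches)
    else:
        # Unknown client: never turn it into a filter.
        result.no_entities_found = True
        result.confidence = 0.0
    return result
```

Mapping an alias back to the canonical `Name` keeps filter values consistent even when the model echoes an alias such as "ACME".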

Confidence Threshold: 0.8. An extraction scoring below this value, or one flagged as ambiguous, triggers a clarification prompt instead of a filtered retrieval.

Context-Aware Resolution

When conversation history is available, the system can resolve pronouns and references like “their”, “they”, “the company” to the client being discussed in the conversation. This improves UX by allowing natural follow-up questions without repeating the client name.

Example:

User: "Tell me about Acme Corporation"
Assistant: "Acme Corporation is a technology company..."

User: "What are their main challenges?"
System: Resolves "their" → "Acme Corporation" (confidence: 0.95)

The LLM prompt includes:

  • Recent conversation messages (last 4 turns)
  • The most recent client filter used in the conversation
  • Instructions to resolve pronouns with high confidence (0.9+) when context is clear
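A prompt with those three ingredients might be assembled as follows. This is a sketch only: the instruction wording is illustrative rather than the production prompt, `build_understanding_prompt` is a hypothetical helper, history messages are assumed to be `{"role", "content"}` dicts, and "last 4 turns" is interpreted here as the last four messages.

```python
from typing import Optional


def build_understanding_prompt(
    query: str,
    known_clients: list,
    history: Optional[list] = None,
    last_client: Optional[str] = None,
    max_turns: int = 4,
) -> str:
    """Assemble the query-understanding prompt (illustrative wording)."""
    lines = [
        "Identify which known client, if any, the user's query refers to.",
        "Known clients (name: aliases):",
    ]
    for client in known_clients:
        aliases = ", ".join(client.get("aliases", [])) or "(none)"
        lines.append(f"- {client['name']}: {aliases}")

    # Keep only the most recent messages to bound prompt size.
    recent = (history or [])[-max_turns:]
    if recent:
        lines.append("Recent conversation:")
        lines.extend(f"{m['role']}: {m['content']}" for m in recent)

    if last_client:
        lines.append(f"Most recent client filter in this conversation: {last_client}")
        lines.append(
            "If the query refers to a client with words like 'their', 'they' or "
            "'the company' and the context clearly points to that client, "
            "resolve it with confidence 0.9 or higher."
        )

    lines.append(
        'Reply with JSON only: {"cleaned_query": str, "filters": {"client": str}, '
        '"confidence": float, "ambiguous": bool, "resolved_from_context": bool}'
    )
    lines.append(f"Query: {query}")
    return "\n".join(lines)
```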

3. Filter Builder

Location: lambda/shared/utils/filters.py

Converts extracted metadata into Bedrock KB RetrievalFilter format.

Supported Operators (S3 Vectors compatible):

  • equals - Exact match
  • notEquals - Exclusion
  • in - Multiple values
  • notIn - Multiple exclusions
  • greaterThan, lessThan - Numeric/date comparisons
  • andAll, orAll - Logical combinations

Note: startsWith and stringContains are NOT supported with S3 Vectors.
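A minimal version of the filter builder might look like this. It is a sketch, assuming extracted filters arrive as a flat dict of metadata key to value (a list meaning "any of these values"); `build_retrieval_filter` is a hypothetical name, and only the `equals`, `in`, and `andAll` operators listed above are used.

```python
from typing import Optional


def build_retrieval_filter(filters: dict) -> Optional[dict]:
    """Convert extracted metadata filters into a Bedrock KB RetrievalFilter dict.

    Scalar values map to "equals", lists/tuples to "in"; several clauses are
    combined with "andAll". Returns None when there is nothing to filter on.
    """
    clauses: list = []
    for key, value in filters.items():
        if value is None or value == "" or (isinstance(value, (list, tuple)) and not value):
            continue  # skip empty values rather than filtering on them
        if isinstance(value, (list, tuple)):
            clauses.append({"in": {"key": key, "value": list(value)}})
        else:
            clauses.append({"equals": {"key": key, "value": value}})

    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    # andAll takes two or more member filters, so only wrap when needed.
    return {"andAll": clauses}
```

The result would be passed to the bedrock-agent-runtime `retrieve` call as `retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": n, "filter": f}}`, omitting `"filter"` when the builder returns `None`.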

4. User Prompting

When the system cannot confidently extract filters, it prompts the user.

Scenarios requiring prompts:

  1. No client found - Show the available clients and ask the user to specify one
  2. Low confidence (< 0.8) - Ask the user to confirm or clarify
  3. Ambiguous match - Multiple clients match; ask the user to choose

Prompt Response Format:

{
  "type": "clarification_needed",
  "message": "I found multiple clients that match 'Acme'. Which one did you mean?",
  "options": [
    {"id": "Acme Corporation", "display_name": "Acme Corporation", "entity_type": "client"},
    {"id": "Acme Labs", "display_name": "Acme Labs", "entity_type": "client"}
  ],
  "original_query": "What did Acme say about pricing?"
}
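The routing between STEP 3a and STEP 3b, and the three prompt scenarios, can be sketched as a function that returns either `None` (apply the filter) or a payload in the format above. Assumptions: the understanding result is a dict with snake_case keys mirroring the QueryUnderstanding fields, `build_clarification` is a hypothetical name, and the message wording is illustrative.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # below this, ask the user instead of filtering


def _option(name: str) -> dict:
    return {"id": name, "display_name": name, "entity_type": "client"}


def build_clarification(understanding: dict, known_clients: list) -> Optional[dict]:
    """Return a clarification_needed payload, or None if the filter can be applied."""
    client = (understanding.get("filters") or {}).get("client")

    if understanding.get("ambiguous"):
        # Scenario 3: several clients match; let the user pick one.
        names = understanding.get("ambiguous_matches", [])
        message = "I found multiple clients that match your query. Which one did you mean?"
    elif understanding.get("no_entities_found") or not client:
        # Scenario 1: no known client identified; list every known client.
        names = list(known_clients)
        message = "I couldn't identify a client in your question. Which client do you mean?"
    elif understanding.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        # Scenario 2: a likely client, but not confident enough to filter silently.
        names = [client]
        message = f"Did you mean {client}?"
    else:
        return None  # STEP 3a: confident, unambiguous match

    return {
        "type": "clarification_needed",
        "message": message,
        "options": [_option(name) for name in names],
        "original_query": understanding.get("original_query", ""),
    }
```

A confidence of exactly 0.8 returns `None`, matching the `confidence >= 0.8` branch in the architecture diagram.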

API Changes

Chat Lambda Request

{
  "query": "What did Acme say about pricing?",
  "session_id": "session-id",
  "client": "optional-explicit-client"
}

Chat Lambda Response

Normal response:

{
  "answer": "Based on the meeting notes...",
  "sources": [...],
  "filters_applied": {"client": "Acme Corporation"}
}

Clarification needed:

{
  "type": "clarification_needed",
  "message": "Which client did you mean?",
  "options": [...],
  "original_query": "..."
}

Why Client-Only Filtering

The system filters by client only (not project) for these reasons:

  1. Richer context: All projects under a client are accessible in RAG queries, providing more comprehensive answers
  2. Simpler queries: Users only need to specify the client, not both client and project
  3. Fewer clarification prompts: No project ambiguity to resolve
  4. Faster extraction: A simpler LLM prompt means faster responses from Claude Haiku
  5. Security boundary: Isolation is enforced between clients, not between projects, so client-level filtering matches the actual security boundary

Project metadata is still stored on documents for display purposes and auditing, but not used for filtering.

Cost Considerations

Query understanding adds one Claude API call per chat message for client extraction.

Estimated additional cost per query:

  • Input tokens: ~300 (query + client list + prompt)
  • Output tokens: ~50 (structured JSON response)
  • Cost: ~$0.001 per query (Claude Haiku)

Recommendation: Use Claude Haiku for query understanding (fast, cost-effective for structured extraction).

Monthly cost estimate (1000 queries/month):

  • Haiku: ~$1/month additional
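The estimate is simple arithmetic: input tokens times the input price plus output tokens times the output price, multiplied by monthly volume. The sketch below uses hypothetical prices of $1.00 per million input tokens and $5.00 per million output tokens purely for illustration; substitute the current Amazon Bedrock price for whichever Haiku model is deployed. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    queries_per_month: int,
) -> tuple:
    """Return (dollars per query, dollars per month) for the extraction call."""
    per_query = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return per_query, per_query * queries_per_month


# Hypothetical prices in USD per million tokens; replace with current Bedrock pricing.
per_query, monthly = estimate_cost(300, 50, 1.00, 5.00, 1000)
print(f"${per_query:.5f} per query, ${monthly:.2f} per month")
```

With those placeholder prices the result is about $0.00055 per query and $0.55 per month, below the rounded ~$0.001 and ~$1 figures, which leaves room for longer prompts that include conversation history.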

Security Considerations

  1. Tenant isolation: Filters are applied server-side; users cannot bypass them
  2. Client validation: Only known clients from DynamoDB can be used as filters
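  3. No client-side filter construction: The browser never sends a filter expression; the server builds the filter from the extracted client (or the optional explicit client in the request), which must pass the registry check above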
  4. Defense-in-depth: Retrieved documents are validated post-retrieval to ensure client match
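The post-retrieval check in item 4 can be sketched as a filter over the Retrieve API's `retrievalResults` list. This assumes each result's `metadata` stores the client under the key `client`, using the same value the filter was built with; `validate_results` is a hypothetical name.

```python
import logging

logger = logging.getLogger(__name__)


def validate_results(results: list, expected_client: str) -> list:
    """Drop retrieved chunks whose metadata does not name the expected client.

    `results` is assumed to be the "retrievalResults" list of a Bedrock KB
    Retrieve response; chunks with no client metadata are dropped too.
    """
    kept = [
        r for r in results
        if (r.get("metadata") or {}).get("client") == expected_client
    ]
    dropped = len(results) - len(kept)
    if dropped:
        # A mismatch means the filter or the document metadata is wrong; surface it.
        logger.warning("Dropped %d result(s) not belonging to client %r", dropped, expected_client)
    return kept
```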

Adaptive Retrieval Multiplier

To compensate for results lost to filtering (for example, chunks whose client metadata is missing or does not match exactly), the system requests more candidates than it ultimately uses, and more still when a filter is applied:

  • No filter: 2x multiplier (standard over-retrieval for quality)
  • With client filter: 3x multiplier (some results may be filtered out)

This helps leave enough high-quality results after filtering.
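The multiplier logic might be expressed as a helper that builds the Retrieve API's `retrievalConfiguration`. This is a sketch: `retrieval_config` and `top_k` (the number of results ultimately used) are hypothetical names, and the cap of 100 reflects our understanding of the maximum `numberOfResults`; check the current API reference.

```python
from typing import Optional

MAX_RESULTS = 100  # assumed Retrieve API maximum for numberOfResults


def retrieval_config(top_k: int, retrieval_filter: Optional[dict] = None) -> dict:
    """Build retrievalConfiguration, over-retrieving 2x unfiltered and 3x filtered."""
    multiplier = 3 if retrieval_filter else 2
    vector_config: dict = {"numberOfResults": min(top_k * multiplier, MAX_RESULTS)}
    if retrieval_filter:
        vector_config["filter"] = retrieval_filter
    return {"vectorSearchConfiguration": vector_config}
```

Usage, with `kb_id`, `cleaned_query`, and `flt` as placeholders: `boto3.client("bedrock-agent-runtime").retrieve(knowledgeBaseId=kb_id, retrievalQuery={"text": cleaned_query}, retrievalConfiguration=retrieval_config(5, flt))`.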

Last updated: 2026-01-09