API Reference

Complete API documentation for the NorthBuilt RAG System.

Overview
Authentication
Endpoints
Rate Limits
Error Codes
Query Understanding
Best Practices
SDK Examples
Changelog

Overview

The NorthBuilt RAG System exposes REST APIs via AWS API Gateway. User-facing endpoints (chat and sync) require JWT authentication via AWS Cognito; webhook endpoints use API key validation instead.

Chat API: https://{api-id}.execute-api.us-east-1.amazonaws.com/v1

Authentication

All chat and sync requests require a valid JWT from AWS Cognito in the Authorization header.

Obtaining a Token

Users authenticate via Google OAuth through the Cognito Hosted UI:

Navigate to the web application
Click “Sign in with Google”
Complete Google OAuth flow
The token is stored in the browser and included in API requests automatically

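For programmatic access, call Cognito with boto3. This works only for users who have a Cognito password (not Google-federated users), and the app client must allow the USER_PASSWORD_AUTH flow: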

import boto3

cognito = boto3.client('cognito-idp', region_name='us-east-1')
response = cognito.initiate_auth(
    ClientId='your-client-id',
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={
        'USERNAME': 'user@example.com',
        'PASSWORD': 'password'
    }
)
token = response['AuthenticationResult']['IdToken']

Using the Token

Include the JWT in the Authorization header:

Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

Token Expiration

ID Token: 1 hour
Refresh Token: 30 days

When tokens expire, the web application automatically refreshes them using the refresh token.
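Programmatic clients have to handle the 1-hour ID token lifetime themselves. A minimal sketch, assuming you want to refresh shortly before expiry: read the token's exp claim (decoded without signature verification, since API Gateway performs the actual validation) and refresh when fewer than 60 seconds remain. The helper names and the 60-second margin are illustrative assumptions.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def needs_refresh(id_token: str, margin_seconds: int = 60, now: Optional[float] = None) -> bool:
    """Hypothetical helper: True if the token expires within margin_seconds."""
    current = time.time() if now is None else now
    return jwt_claims(id_token)["exp"] - current <= margin_seconds
```

When needs_refresh returns True, call cognito.initiate_auth with AuthFlow='REFRESH_TOKEN_AUTH' and AuthParameters={'REFRESH_TOKEN': refresh_token} to obtain a new ID token.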

Endpoints

POST /chat

Query the RAG system and receive the answer as a real-time stream of Server-Sent Events (SSE), giving immediate feedback while the response is generated.

URL: POST /chat

Headers:

Header	Required	Description
`Authorization`	Yes	Bearer token from Cognito
`Content-Type`	Yes	`application/json`
`Accept`	No	`text/event-stream`

Request Body:

{
  "session_id": "unique-session-id",
  "message": "What did we discuss with Acme about pricing?",
  "client": "Optional explicit client filter",
  "max_results": 5
}

Field	Type	Required	Default	Description
`session_id`	string	Yes	-	Session ID for conversation history
`message`	string	Yes	-	Natural language question
`client`	string	No	extracted from query	Explicit client filter
`max_results`	integer	No	5	Number of documents to retrieve (1-20)
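The field rules above can be enforced on the client before a request is sent. This is a sketch with a hypothetical helper name; it assumes that omitting client is how you let the server extract it from the message, as the Default column suggests.

```python
from typing import Optional


def build_chat_request(
    session_id: str,
    message: str,
    client: Optional[str] = None,
    max_results: int = 5,
) -> dict:
    """Build a POST /chat body that satisfies the documented field rules."""
    if not session_id or not message:
        raise ValueError("session_id and message are required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    body = {"session_id": session_id, "message": message, "max_results": max_results}
    # Leave client out entirely so the server extracts it from the message
    if client is not None:
        body["client"] = client
    return body
```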

Response (SSE Stream):

The response is a stream of Server-Sent Events:

event: sources
data: {"sources": [{"document_number": 1, "relevance_score": 0.8542, "metadata": {"client": "Acme Corp"}, ...}]}

event: token
data: {"text": "Based on"}

event: token
data: {"text": " the meeting"}

event: token
data: {"text": " notes..."}

event: done
data: {"done": true, "message_id": "msg_abc123"}

Event Types:

Event	Description
`sources`	Retrieved documents used for the response
`token`	Partial response text (streamed incrementally)
`done`	Stream completion with message ID and filters applied
`error`	Error information if something fails
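Each event arrives as an event: line, one or more data: lines, and a terminating blank line. The parser below is a minimal sketch of that framing, assuming every data: payload is JSON as in the examples above. It accepts any iterable of decoded lines; unlike a browser EventSource, it also dispatches a final event that lacks a trailing blank line.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group decoded SSE lines into (event_type, payload) pairs."""
    event_type, data_lines = "message", []  # type: str, List[str]
    for line in lines:
        if line == "":
            # A blank line dispatches the accumulated event
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        # Lines starting with ":" are SSE comments and fall through unhandled
    if data_lines:
        yield event_type, json.loads("\n".join(data_lines))
```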

Source Object Schema:

Each source in the sources event contains:

{
  "document_number": 1,
  "relevance_score": 0.8542,
  "snippet": "Meeting notes excerpt with context...",
  "metadata": {
    "client": "Acme Corp",
    "project": "Q1 Planning",
    "source": "fathom_meeting_12345.md"
  },
  "location": {
    "type": "S3",
    "s3Location": {
      "uri": "s3://bucket/path/to/document.md"
    }
  },
  "document_url": "https://bucket.s3.amazonaws.com/path/to/document.md?X-Amz-..."
}

Field	Type	Description
`document_number`	integer	Sequential number for display (1-indexed)
`relevance_score`	float	Relevance score from 0.0 to 1.0
`snippet`	string	First 500 characters of document content
`metadata`	object	Document metadata (client, project, source, category)
`location`	object	S3 location information from Bedrock KB
`document_url`	string	Pre-signed S3 URL for secure document access (1-hour expiry)

The document_url field provides a time-limited secure link to the original source document, allowing users to view the full document context.
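Because document_url expires after one hour, a client that caches sources can check a link before opening it and re-fetch the session history (which regenerates the URLs) once it has lapsed. This sketch assumes the links are SigV4 pre-signed URLs, which carry X-Amz-Date and X-Amz-Expires query parameters; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Compute when a SigV4 pre-signed URL expires from its query parameters."""
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(
        tzinfo=timezone.utc
    )
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True once the signing time plus X-Amz-Expires has passed."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= presigned_url_expiry(url)
```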

Response (Clarification Needed):

When the system needs clarification, it returns a JSON body (Content-Type: application/json) instead of an SSE stream, so check the Content-Type response header before parsing:

{
  "type": "clarification_needed",
  "message": "I found multiple clients matching 'Acme'. Which one?",
  "options": [
    {"id": "acme-corp", "display_name": "Acme Corporation"},
    {"id": "acme-labs", "display_name": "Acme Labs"}
  ],
  "original_query": "What about Acme?",
  "clarification_type": "ambiguous",
  "session_id": "sess_abc123"
}
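A client can detect this response from its Content-Type and, once the user picks an option, resend the original query with an explicit client filter. This is a sketch under one loudly stated assumption: that the chosen option's id is the value the client field accepts. The function names are hypothetical.

```python
from typing import Optional


def is_clarification(content_type: Optional[str]) -> bool:
    """SSE responses are text/event-stream; clarification requests are JSON."""
    return (content_type or "").split(";")[0].strip().lower() == "application/json"


def resolve_clarification(clarification: dict, choice: int) -> dict:
    """Build a follow-up POST /chat body for the option the user picked.

    ASSUMPTION: the option's "id" is accepted as the explicit client filter.
    """
    option = clarification["options"][choice]
    return {
        "session_id": clarification["session_id"],
        "message": clarification["original_query"],
        "client": option["id"],
    }
```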

GET /chat/{id}

Retrieve the conversation history for a session. {id} is the session_id used in POST /chat.

URL: GET /chat/{id}

Headers:

Header	Required	Description
`Authorization`	Yes	Bearer token from Cognito

Response:

{
  "session_id": "sess_abc123",
  "messages": [
    {
      "role": "user",
      "content": "What did we discuss with Acme?",
      "timestamp": 1736334600
    },
    {
      "role": "assistant",
      "content": "Based on the meeting notes...",
      "timestamp": 1736334605,
      "sources": [
        {
          "document_number": 1,
          "relevance_score": 0.8542,
          "snippet": "Meeting notes excerpt...",
          "metadata": {"client": "Acme Corp"},
          "location": {"s3Location": {"uri": "s3://bucket/doc.md"}},
          "document_url": "https://bucket.s3.amazonaws.com/doc.md?X-Amz-..."
        }
      ]
    }
  ]
}

Note: The document_url field contains a freshly generated pre-signed URL with a 1-hour expiry. URLs are regenerated on each history retrieval, so even older conversations return working links; each link still expires one hour after it is retrieved.
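The history response can be rendered as a readable transcript. This sketch assumes timestamp values are Unix epoch seconds (consistent with the example values) and percent-encodes the session ID in the path in case it contains reserved characters; both helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def history_path(session_id: str) -> str:
    """Path for GET /chat/{id}, with the session ID percent-encoded."""
    return f"/chat/{quote(session_id, safe='')}"


def format_transcript(history: dict) -> str:
    """Render a GET /chat/{id} response as plain text, one line per message."""
    lines = []
    for msg in history["messages"]:
        # ASSUMPTION: timestamps are Unix epoch seconds in UTC
        ts = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)
        lines.append(f"[{ts:%Y-%m-%d %H:%M:%S} UTC] {msg['role']}: {msg['content']}")
        for src in msg.get("sources", []):
            lines.append(f"    [{src['document_number']}] {src.get('document_url', '')}")
    return "\n".join(lines)
```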

Webhook Endpoints

Webhook endpoints receive data from external services. These are internal endpoints secured by API key validation.

POST /webhooks/fathom

Receives video transcript webhooks from Fathom.

Authentication: API key in request header or body (configured in Fathom)

Payload: Fathom webhook format (see Fathom documentation)

POST /webhooks/helpscout

Receives conversation webhooks from HelpScout.

Authentication: API key validation (configured in HelpScout)

Payload: HelpScout webhook format (see HelpScout documentation)
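The API key validation described for these endpoints can be sketched as a server-side check that accepts the key from either a header or the body and compares it in constant time. This is illustrative only, not the system's handler; the header name x-api-key and body field api_key are hypothetical placeholders for whatever the Fathom and HelpScout configurations actually send.

```python
import hmac
from typing import Mapping


def api_key_valid(
    headers: Mapping[str, str],
    body: Mapping[str, object],
    expected_key: str,
    header_name: str = "x-api-key",  # hypothetical header name
    body_field: str = "api_key",     # hypothetical body field
) -> bool:
    """Accept the key from a header (case-insensitive) or the body."""
    lowered = {k.lower(): v for k, v in headers.items()}
    supplied = lowered.get(header_name.lower()) or body.get(body_field)
    if not isinstance(supplied, str):
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```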

Sync Endpoints

Manual sync endpoints for triggering data ingestion. All sync endpoints require JWT authentication.

POST /sync/fathom

Trigger a manual sync of Fathom recordings.

Authentication: JWT Bearer token (Cognito)

Response: 202 Accepted with worker invocation confirmation

POST /sync/helpscout

Trigger a manual sync of HelpScout conversations.

Authentication: JWT Bearer token (Cognito)

Response: 202 Accepted with worker invocation confirmation
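Both sync endpoints take the same JWT and no documented request body, so one builder covers them. A standard-library sketch; the API_URL placeholder and the function name are assumptions for illustration.

```python
import urllib.request

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1"  # placeholder
SYNC_SOURCES = {"fathom", "helpscout"}


def build_sync_request(source: str, token: str) -> urllib.request.Request:
    """Prepare a POST /sync/{source} request authenticated with a Cognito JWT."""
    if source not in SYNC_SOURCES:
        raise ValueError(f"unknown sync source: {source}")
    return urllib.request.Request(
        f"{API_URL}/sync/{source}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Send it with urllib.request.urlopen(req) and expect resp.status to be 202; urlopen raises HTTPError for 4xx and 5xx responses such as 401 or 429.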

Rate Limits

The API enforces rate limits to ensure fair usage:

Limit	Value
Rate	10 requests/second
Burst	20 requests

When rate limited, the API returns HTTP 429:

{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Please wait and try again.",
  "retry_after": 1
}
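API Gateway throttling uses a token-bucket model: the bucket holds up to the burst size and refills at the steady rate. The sketch below applies the same model on the client so that bursts stay under 10 requests/second with a burst of 20. The class is hypothetical, and whether the limits apply per user or per API is not stated here.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket mirroring the documented rate (10/s) and burst (20)."""

    def __init__(self, rate: float = 10.0, burst: int = 20,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means wait before sending."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```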

Error Codes

HTTP Status	Error Code	Description	Resolution
200	-	Success	N/A
400	`BAD_REQUEST`	Invalid request	Check request body format
401	`UNAUTHORIZED`	Authentication failed	Refresh JWT token
403	`FORBIDDEN`	Access denied	Check user permissions
429	`TOO_MANY_REQUESTS`	Rate limited	Wait and retry
500	`INTERNAL_ERROR`	Server error	Check CloudWatch logs
503	`SERVICE_UNAVAILABLE`	Bedrock/KB unavailable	Retry after delay

Query Understanding

The system automatically extracts client context from natural language queries. This enables users to ask questions naturally without needing to select filters from dropdowns.

How It Works

User submits query: “What did we discuss with Acme?”
System extracts client mention: “Acme”
System matches to known client: “Acme Corporation”
System applies filter and retrieves relevant documents

Supported Patterns

The query understanding recognizes various ways users might mention clients:

Direct name: “What about Acme Corporation?”
Partial name: “What did Acme say?”
Aliases: “Tell me about AC” (if “AC” is registered as an alias)

Confidence and Clarification

When the system has low confidence or finds ambiguous matches, it returns a clarification request rather than potentially incorrect results.
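The matching and clarification behavior above can be illustrated with a deliberately simple sketch: exact alias lookup first, then an exact display-name match, then case-insensitive partial name matching, returning an ambiguous result when more than one client fits. The registry, IDs, and rules are hypothetical; the production system's matching and confidence scoring may work differently.

```python
from typing import Dict, List, Tuple


def match_client(
    mention: str,
    clients: Dict[str, str],  # client ID -> display name
    aliases: Dict[str, str],  # alias -> client ID
) -> Tuple[str, List[str]]:
    """Return ("match", [id]), ("ambiguous", [ids...]) or ("none", [])."""
    needle = mention.strip().lower()
    # 1. Registered aliases ("AC") are exact, case-insensitive matches
    alias_hits = {cid for alias, cid in aliases.items() if alias.lower() == needle}
    # 2. An exact display-name match beats partial matches
    exact = {cid for cid, name in clients.items() if name.lower() == needle}
    # 3. Partial names ("Acme" is contained in "Acme Corporation")
    partial = {cid for cid, name in clients.items() if needle and needle in name.lower()}
    hits = sorted(alias_hits or exact or partial)
    if len(hits) == 1:
        return "match", hits
    if hits:
        return "ambiguous", hits  # would trigger a clarification request
    return "none", []
```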

Best Practices

Query Formulation

Be specific: “What was discussed in the December meeting with Acme?” is better than “meetings”
Include context: Mention client names, project names, or dates when relevant
Ask one question at a time: Complex multi-part questions may yield less focused answers

Session Management

Reuse the same session_id for follow-up questions in a conversation
Start a new session for unrelated queries
Sessions maintain context for 24 hours
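The session guidance above can be wrapped in a small helper that reuses one session_id for follow-ups and starts a new one on request or after the 24-hour window. This is a sketch: the class name and the sess_ ID format are hypothetical, and it assumes the 24 hours are measured from the last message, which this document does not specify.

```python
import time
import uuid
from typing import Callable, Optional


class ChatSession:
    """Track the session_id to send with each POST /chat request."""

    TTL_SECONDS = 24 * 60 * 60  # ASSUMPTION: measured from the last message

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.session_id: Optional[str] = None
        self.last_used = 0.0

    def new(self) -> str:
        """Start a new session, e.g. for an unrelated question."""
        self.session_id = f"sess_{uuid.uuid4().hex}"  # hypothetical ID format
        self.last_used = self.clock()
        return self.session_id

    def current(self) -> str:
        """Return the active session_id, starting a new one if context has lapsed."""
        if self.session_id is None or self.clock() - self.last_used >= self.TTL_SECONDS:
            return self.new()
        self.last_used = self.clock()
        return self.session_id
```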

Error Handling

Implement exponential backoff for 429 and 503 errors
Cache responses when appropriate to reduce API calls
Log correlation IDs from responses for debugging
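A minimal sketch of the backoff recommendation, using the "full jitter" variant (a random delay between zero and an exponentially growing ceiling). The base delay, cap, and attempt limit are illustrative assumptions rather than documented values; the retry_after field from the 429 body is honored as a minimum wait.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {429, 503}  # the statuses this guide recommends retrying


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 20.0,
    max_attempts: int = 5,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
        return None
    # Full jitter: uniform delay up to min(cap, base * 2^attempt)
    delay = random.uniform(0, min(cap, base * 2 ** attempt))
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```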

SDK Examples

Python (Streaming)

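import json

import requests

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1"
TOKEN = "your-jwt-token"

def query_rag_streaming(message: str, session_id: str):
    """Query the RAG system and yield (event_type, data) pairs from the SSE stream."""
    with requests.post(
        f"{API_URL}/chat",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream"
        },
        json={
            "message": message,
            "session_id": session_id
        },
        stream=True
    ) as response:
        response.raise_for_status()

        # Clarification requests arrive as plain JSON rather than an SSE stream
        if response.headers.get("Content-Type", "").startswith("application/json"):
            yield "clarification_needed", response.json()
            return

        # Parse SSE events: "event:" names the type, "data:" carries the JSON payload,
        # and a blank line ends the event
        event_type = "message"
        for raw_line in response.iter_lines():
            line = raw_line.decode("utf-8")
            if not line:
                event_type = "message"
            elif line.startswith("event:"):
                event_type = line[6:].strip()
            elif line.startswith("data:"):
                yield event_type, json.loads(line[5:].strip())

# Example usage
for event_type, data in query_rag_streaming("What did we discuss with Acme?", "session-123"):
    if event_type == "sources":
        print(f"Sources: {len(data['sources'])} documents\n")
    elif event_type == "token":
        print(data["text"], end="", flush=True)
    elif event_type == "error":
        print(f"\nError: {data}")
    elif event_type == "clarification_needed":
        print(data["message"], [opt["display_name"] for opt in data["options"]])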

JavaScript/TypeScript (Streaming)

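const API_URL = 'https://your-api-id.execute-api.us-east-1.amazonaws.com/v1';
const token = 'your-jwt-token';

// One parsed SSE event: its type (sources, token, done, error) and JSON payload
interface StreamEvent {
  event: string;
  data: any;
}

async function* queryRAGStreaming(
  message: string,
  sessionId: string
): AsyncGenerator<StreamEvent> {
  const response = await fetch(`${API_URL}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream',
    },
    body: JSON.stringify({
      message,
      session_id: sessionId,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  // Clarification requests arrive as plain JSON rather than an SSE stream
  if (response.headers.get('Content-Type')?.startsWith('application/json')) {
    yield { event: 'clarification_needed', data: await response.json() };
    return;
  }
  if (!response.body) {
    throw new Error('Response has no body');
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let eventType = 'message';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters intact across chunks
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep any partial line for the next chunk

    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, '');
      if (line === '') {
        eventType = 'message'; // a blank line ends the current event
      } else if (line.startsWith('event:')) {
        eventType = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        yield { event: eventType, data: JSON.parse(line.slice(5).trim()) };
      }
    }
  }
}

// Example usage (Node.js 18+ ES module, for global fetch and top-level await)
for await (const { event, data } of queryRAGStreaming('What about Acme?', 'session-123')) {
  if (event === 'token') process.stdout.write(data.text);
}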

cURL (Streaming)

curl -N -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "session_id": "test-session",
    "message": "What did we discuss with Acme about pricing?",
    "max_results": 5
  }'

Changelog

Date	Change
2026-01-12	Added Sync Endpoints section for manual data ingestion
2026-01-09	Removed the batch (non-streaming) chat endpoint; use the streaming `/chat` endpoint for all chat operations
2026-01-08	Added streaming endpoint documentation (`/chat`)
2026-01-01	Updated for client-only filtering (removed project filter)
2025-12-31	Added clarification response format
2025-12-29	Added reranking for improved relevance

Last updated: 2026-01-17