API Reference

Complete API documentation for the NorthBuilt RAG System.

Table of Contents

  1. Overview
  2. Authentication
    1. Obtaining a Token
    2. Using the Token
    3. Token Expiration
  3. Endpoints
    1. POST /chat (Recommended)
    2. GET /chat/{id}
    3. Webhook Endpoints
      1. POST /webhooks/fathom
      2. POST /webhooks/helpscout
    4. Sync Endpoints
      1. POST /sync/fathom
      2. POST /sync/helpscout
  4. Rate Limits
  5. Error Codes
  6. Query Understanding
    1. How It Works
    2. Supported Patterns
    3. Confidence and Clarification
  7. Best Practices
    1. Query Formulation
    2. Session Management
    3. Error Handling
  8. SDK Examples
    1. Python (Streaming)
    2. JavaScript/TypeScript (Streaming)
    3. cURL (Streaming)
  9. Changelog

Overview

The NorthBuilt RAG System exposes REST APIs via AWS API Gateway. Chat and sync endpoints require JWT authentication via AWS Cognito; webhook endpoints are secured by API key validation instead.

Chat API: https://{api-id}.execute-api.us-east-1.amazonaws.com/v1


Authentication

Chat and sync requests require a valid JWT from AWS Cognito in the Authorization header. Webhook endpoints use API key validation (see Webhook Endpoints).

Obtaining a Token

Users authenticate via Google OAuth through the Cognito Hosted UI:

  1. Navigate to the web application
  2. Click “Sign in with Google”
  3. Complete the Google OAuth flow
  4. The token is stored in the browser and included in API requests automatically

For programmatic access, use the Cognito SDK:

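import boto3

# Cognito user pool client in the region where the API is deployed
cognito = boto3.client('cognito-idp', region_name='us-east-1')

# USER_PASSWORD_AUTH must be enabled on the app client for this call to succeed
response = cognito.initiate_auth(
    ClientId='your-client-id',  # placeholder: your app client ID
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={
        'USERNAME': 'user@example.com',
        'PASSWORD': 'password'
    }
)

# The ID token is the JWT to send in the Authorization header
token = response['AuthenticationResult']['IdToken']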

Using the Token

Include the JWT in the Authorization header:

Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

Token Expiration

  • ID Token: 1 hour
  • Refresh Token: 30 days

When tokens expire, the web application automatically refreshes them using the refresh token.
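
For programmatic clients that manage their own tokens, a minimal sketch of checking the expiry locally might look like the following. It reads the standard `exp` claim from the JWT payload without verifying the signature, so it is only suitable for deciding when to refresh, not for trusting the token. The function names and the 60-second safety margin are assumptions made for illustration.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_refresh(token: str, now: Optional[float] = None, skew_seconds: int = 60) -> bool:
    """True when the token expires within `skew_seconds` (hypothetical margin)."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= skew_seconds
```

Calling `needs_refresh(token)` before each request lets a client refresh slightly ahead of the 1-hour ID token lifetime instead of reacting to a 401.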


Endpoints

POST /chat (Recommended)

Query the RAG system with real-time streaming responses via Server-Sent Events (SSE). Response text is delivered token by token as it is generated, so clients can render output immediately.

URL: POST /chat

Headers:

Header         Required  Description
Authorization  Yes       Bearer token from Cognito
Content-Type   Yes       application/json
Accept         No        text/event-stream (optional)

Request Body:

{
  "session_id": "unique-session-id",
  "message": "What did we discuss with Acme about pricing?",
  "client": "Optional explicit client filter",
  "max_results": 5
}

Field        Type     Required  Default               Description
session_id   string   Yes       -                     Session ID for conversation history
message      string   Yes       -                     Natural language question
client       string   No        extracted from query  Explicit client filter
max_results  integer  No        5                     Number of documents to retrieve (1-20)

Response (SSE Stream):

The response is a stream of Server-Sent Events:

event: sources
data: {"sources": [{"title": "Meeting Notes", "client": "Acme Corp", ...}]}

event: token
data: {"text": "Based on"}

event: token
data: {"text": " the meeting"}

event: token
data: {"text": " notes..."}

event: done
data: {"done": true, "message_id": "msg_abc123"}

Event Types:

Event    Description
sources  Retrieved documents used for the response
token    Partial response text (streamed incrementally)
done     Stream completion with message ID and filters applied
error    Error information if something fails
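
Because each event carries a name as well as a payload, a client usually needs both. The sketch below pairs every `event:` line with its `data:` line, following the standard SSE framing in which a blank line ends an event. The function name `parse_sse` is an assumption for illustration, and the sample lines are adapted from the stream shown above.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event_name = "message"  # SSE default when no `event:` field is sent
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
    if data_parts:  # stream ended without a trailing blank line
        yield event_name, json.loads("\n".join(data_parts))


sample = [
    "event: sources",
    'data: {"sources": []}',
    "",
    "event: token",
    'data: {"text": "Based on"}',
    "",
    "event: done",
    'data: {"done": true, "message_id": "msg_abc123"}',
]
for name, payload in parse_sse(sample):
    print(name, payload)
```

Dispatching on the event name rather than on which keys happen to be present keeps the client correct if two event types ever share a field.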

Source Object Schema:

Each source in the sources event contains:

{
  "document_number": 1,
  "relevance_score": 0.8542,
  "snippet": "Meeting notes excerpt with context...",
  "metadata": {
    "client": "Acme Corp",
    "project": "Q1 Planning",
    "source": "fathom_meeting_12345.md"
  },
  "location": {
    "type": "S3",
    "s3Location": {
      "uri": "s3://bucket/path/to/document.md"
    }
  },
  "document_url": "https://bucket.s3.amazonaws.com/path/to/document.md?X-Amz-..."
}

Field            Type     Description
document_number  integer  Sequential number for display (1-indexed)
relevance_score  float    Relevance score from 0.0 to 1.0
snippet          string   First 500 characters of document content
metadata         object   Document metadata (client, project, source, category)
location         object   S3 location information from Bedrock KB
document_url     string   Pre-signed S3 URL for secure document access (1-hour expiry)

The document_url field provides a time-limited secure link to the original source document, allowing users to view the full document context.
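
As an illustration of consuming source objects, the sketch below turns a list of sources into display-ready citation lines, highest relevance first. The helper name `format_citations` and the `min_score` cutoff are assumptions for this example and are not part of the API.

```python
from typing import List


def format_citations(sources: List[dict], min_score: float = 0.0) -> List[str]:
    """Return one citation line per source, sorted by descending relevance."""
    lines = []
    for src in sorted(sources, key=lambda s: s["relevance_score"], reverse=True):
        if src["relevance_score"] < min_score:
            continue  # hypothetical client-side cutoff
        meta = src.get("metadata", {})
        lines.append(
            f"[{src['document_number']}] {meta.get('client', 'unknown client')} "
            f"({src['relevance_score']:.2f}) {src['document_url']}"
        )
    return lines
```

Since `document_url` expires after one hour, lines like these are best rendered when the response arrives rather than stored for later use.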

Response (Clarification Needed):

When clarification is needed, the endpoint returns a JSON body instead of an SSE stream. Check the Content-Type response header to tell the two apart:

{
  "type": "clarification_needed",
  "message": "I found multiple clients matching 'Acme'. Which one?",
  "options": [
    {"id": "acme-corp", "display_name": "Acme Corporation"},
    {"id": "acme-labs", "display_name": "Acme Labs"}
  ],
  "original_query": "What about Acme?",
  "clarification_type": "ambiguous",
  "session_id": "sess_abc123"
}
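
A client can branch on the response type before it starts reading the stream. The sketch below is one way to do that under two assumptions that the source does not confirm: that the clarification response is served as `application/json`, and that the follow-up request may pass the chosen option `id` in the `client` field. Both function names are hypothetical.

```python
import json
from typing import Optional


def parse_clarification(content_type: str, body: str) -> Optional[dict]:
    """Return the clarification payload, or None if this is not one."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":  # assumed JSON media type
        return None
    payload = json.loads(body)
    if payload.get("type") != "clarification_needed":
        return None
    return payload


def build_follow_up(clarification: dict, chosen_id: str) -> dict:
    """Re-ask the original query with an explicit client filter."""
    valid_ids = {opt["id"] for opt in clarification["options"]}
    if chosen_id not in valid_ids:
        raise ValueError(f"unknown option: {chosen_id}")
    return {
        "session_id": clarification["session_id"],
        "message": clarification["original_query"],
        "client": chosen_id,  # assumption: the filter accepts the option id
    }
```

If `parse_clarification` returns None, the response can be handed to the normal SSE reader.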

GET /chat/{id}

Retrieve conversation history for a session.

URL: GET /chat/{id}

Headers:

Header         Required  Description
Authorization  Yes       Bearer token from Cognito

Response:

{
  "session_id": "sess_abc123",
  "messages": [
    {
      "role": "user",
      "content": "What did we discuss with Acme?",
      "timestamp": 1736334600
    },
    {
      "role": "assistant",
      "content": "Based on the meeting notes...",
      "timestamp": 1736334605,
      "sources": [
        {
          "document_number": 1,
          "relevance_score": 0.8542,
          "snippet": "Meeting notes excerpt...",
          "metadata": {"client": "Acme Corp"},
          "location": {"s3Location": {"uri": "s3://bucket/doc.md"}},
          "document_url": "https://bucket.s3.amazonaws.com/doc.md?X-Amz-..."
        }
      ]
    }
  ]
}

Note: The document_url field contains a freshly-generated pre-signed URL (1-hour expiry). URLs are regenerated on each history retrieval to ensure they are always valid, even for older conversations.
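
To show how a history response might be consumed, the sketch below renders it as a plain transcript. It assumes that `timestamp` is Unix epoch seconds, which matches the sample values but is not stated explicitly; `render_history` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import List


def render_history(history: dict) -> List[str]:
    """Render each message as 'YYYY-MM-DD HH:MM:SS role: content'."""
    lines = []
    for msg in history["messages"]:
        # Assumption: `timestamp` is Unix epoch seconds, shown here in UTC.
        when = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)
        n_sources = len(msg.get("sources", []))
        suffix = f" [{n_sources} source(s)]" if n_sources else ""
        lines.append(f"{when:%Y-%m-%d %H:%M:%S} {msg['role']}: {msg['content']}{suffix}")
    return lines
```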


Webhook Endpoints

Webhook endpoints receive data from external services. These are internal endpoints secured by API key validation.

POST /webhooks/fathom

Receives video transcript webhooks from Fathom.

Authentication: API key in request header or body (configured in Fathom)

Payload: Fathom webhook format (see Fathom documentation)

POST /webhooks/helpscout

Receives conversation webhooks from HelpScout.

Authentication: API key validation (configured in HelpScout)

Payload: HelpScout webhook format (see HelpScout documentation)


Sync Endpoints

Manual sync endpoints for triggering data ingestion. All sync endpoints require JWT authentication.

POST /sync/fathom

Trigger a manual sync of Fathom recordings.

Authentication: JWT Bearer token (Cognito)

Response: 202 Accepted with worker invocation confirmation

POST /sync/helpscout

Trigger a manual sync of HelpScout conversations.

Authentication: JWT Bearer token (Cognito)

Response: 202 Accepted with worker invocation confirmation
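
A manual sync could be triggered with the standard library alone. The sketch below only builds the request, so it can be inspected before it is sent. The placeholder base URL and the empty request body are assumptions, since the source does not document a payload for these endpoints.

```python
import urllib.request

# Placeholder base URL; replace with your deployment's API ID.
API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1"


def build_sync_request(source: str, token: str) -> urllib.request.Request:
    """Build a POST /sync/{source} request authenticated with a Cognito JWT."""
    if source not in ("fathom", "helpscout"):
        raise ValueError(f"unsupported sync source: {source}")
    return urllib.request.Request(
        f"{API_URL}/sync/{source}",
        data=b"",  # assumption: no request body is required
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def sync_accepted(status_code: int) -> bool:
    """The documented success response is 202 Accepted."""
    return status_code == 202


# To send the request:
#   with urllib.request.urlopen(build_sync_request("fathom", token)) as resp:
#       print(sync_accepted(resp.status))
```

A 202 only confirms that the worker was invoked; the sync itself completes asynchronously.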


Rate Limits

The API enforces rate limits to ensure fair usage:

Limit  Value
Rate   10 requests/second
Burst  20 requests

When rate limited, the API returns HTTP 429:

{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Please wait and try again.",
  "retry_after": 1
}
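
Clients that issue many requests can avoid most 429 responses by throttling themselves to the same limits. The following is a minimal client-side token bucket sketch that uses the documented rate and burst as defaults. The class name and the injectable clock are illustrative choices, and the server's actual accounting may differ.

```python
import time


class TokenBucket:
    """Client-side throttle: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float = 10.0, burst: int = 20, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

When `try_acquire()` returns False, sleeping for roughly `1 / rate` seconds before trying again keeps the client under the sustained limit.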

Error Codes

HTTP Status  Error Code           Description             Resolution
200          -                    Success                 N/A
400          BAD_REQUEST          Invalid request         Check request body format
401          UNAUTHORIZED         Authentication failed   Refresh JWT token
403          FORBIDDEN            Access denied           Check user permissions
429          TOO_MANY_REQUESTS    Rate limited            Wait and retry
500          INTERNAL_ERROR       Server error            Check CloudWatch logs
503          SERVICE_UNAVAILABLE  Bedrock/KB unavailable  Retry after delay
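
The table above can be encoded directly so that client code decides what to do from the status alone. In the sketch below, the action labels (`retry`, `refresh_token`, and so on) are hypothetical names chosen for illustration; only the status codes and error codes come from the table.

```python
# HTTP status -> (error code from the table, hypothetical client action)
RESOLUTIONS = {
    400: ("BAD_REQUEST", "fix_request"),
    401: ("UNAUTHORIZED", "refresh_token"),
    403: ("FORBIDDEN", "check_permissions"),
    429: ("TOO_MANY_REQUESTS", "retry"),
    500: ("INTERNAL_ERROR", "report"),
    503: ("SERVICE_UNAVAILABLE", "retry"),
}


def next_action(status: int) -> str:
    """Map an HTTP status to the action a client should take next."""
    if status == 200:
        return "ok"
    # Undocumented statuses are treated as unexpected and reported.
    _code, action = RESOLUTIONS.get(status, ("UNKNOWN", "report"))
    return action
```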

Query Understanding

The system automatically extracts client context from natural language queries. This enables users to ask questions naturally without needing to select filters from dropdowns.

How It Works

  1. User submits query: “What did we discuss with Acme?”
  2. System extracts client mention: “Acme”
  3. System matches to known client: “Acme Corporation”
  4. System applies filter and retrieves relevant documents

Supported Patterns

Query understanding recognizes several ways users might mention a client:

  • Direct name: “What about Acme Corporation?”
  • Abbreviation: “What did Acme say?”
  • Aliases: “Tell me about AC” (if “AC” is registered as an alias)
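
To make the patterns above concrete, here is a deliberately simple sketch of name and alias matching. It is not the system's actual implementation, which the source does not describe. The `CLIENTS` registry and the `match_clients` function are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical registry: client id -> full name, short form, and aliases.
CLIENTS: Dict[str, List[str]] = {
    "acme-corp": ["Acme Corporation", "Acme", "AC"],
    "globex": ["Globex"],
}


def match_clients(query: str, registry: Dict[str, List[str]] = CLIENTS) -> List[str]:
    """Return ids of clients whose name or alias appears as a whole word."""
    matches = []
    for client_id, names in registry.items():
        for name in names:
            # Word boundaries keep the alias "AC" from matching inside "account".
            if re.search(rf"\b{re.escape(name)}\b", query, flags=re.IGNORECASE):
                matches.append(client_id)
                break
    return matches
```

With this registry, "What about Acme Corporation?", "What did Acme say?", and "Tell me about AC" all resolve to `acme-corp`.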

Confidence and Clarification

When the system has low confidence or finds ambiguous matches, it returns a clarification request rather than potentially incorrect results.
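
One way to picture this decision is as a threshold over per-client match scores. The sketch below is an assumption-heavy illustration: the 0.7 threshold, the `low_confidence` clarification type, the internal `filter` result shape, and the message wording are all hypothetical. Only the clarification payload fields and the `ambiguous` type follow the format documented for POST /chat.

```python
from typing import Dict

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cutoff, not taken from the source


def resolve_client(query: str, session_id: str,
                   scores: Dict[str, float],
                   display_names: Dict[str, str]) -> dict:
    """Return a client filter, or a clarification request when unsure."""
    confident = [cid for cid, score in scores.items() if score >= CONFIDENCE_THRESHOLD]
    if len(confident) == 1:
        return {"type": "filter", "client": confident[0]}
    if not scores:
        return {"type": "filter", "client": None}  # no client mentioned
    # Several confident matches are ambiguous; only weak matches means low confidence.
    kind = "ambiguous" if len(confident) > 1 else "low_confidence"
    option_ids = confident if confident else list(scores)
    return {
        "type": "clarification_needed",
        "message": "Which client did you mean?",
        "options": [{"id": cid, "display_name": display_names[cid]} for cid in option_ids],
        "original_query": query,
        "clarification_type": kind,
        "session_id": session_id,
    }
```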


Best Practices

Query Formulation

  • Be specific: “What was discussed in the December meeting with Acme?” is better than “meetings”
  • Include context: Mention client names, project names, or dates when relevant
  • Ask one question at a time: Complex multi-part questions may yield less focused answers

Session Management

  • Use a consistent session_id for follow-up questions in the same conversation
  • Start a new session for unrelated queries
  • Sessions maintain context for 24 hours
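
These rules can be wrapped in a small client-side helper. The sketch below assumes that the 24-hour window is measured from the last request in the session, which the source does not specify. `SessionTracker` is a hypothetical class.

```python
import time
import uuid
from typing import Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # documented 24-hour context window


class SessionTracker:
    """Reuse one session_id per conversation and rotate it once it goes stale."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._session_id: Optional[str] = None
        self._last_used = 0.0

    def current(self) -> str:
        """Return the active session_id, starting a new one after 24 idle hours."""
        now = self._clock()
        # Assumption: the window is counted from the most recent request.
        if self._session_id is None or now - self._last_used >= SESSION_TTL_SECONDS:
            self._session_id = str(uuid.uuid4())
        self._last_used = now
        return self._session_id

    def reset(self) -> str:
        """Start a fresh session, for example before an unrelated query."""
        self._session_id = str(uuid.uuid4())
        self._last_used = self._clock()
        return self._session_id
```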

Error Handling

  • Implement exponential backoff for 429 and 503 errors
  • Cache responses when appropriate to reduce API calls
  • Log correlation IDs from responses for debugging
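
The backoff advice can be sketched as follows, using exponential growth with full jitter. The base delay, cap, and retry count are illustrative defaults rather than documented values, and `send` stands in for whatever function performs the request and returns a `(status, body)` pair.

```python
import random
import time

RETRYABLE = {429, 503}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng=random.random) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, max_retries: int = 5, sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-retryable status or retries run out."""
    attempt = 0
    while True:
        status, body = send()
        if status not in RETRYABLE or attempt >= max_retries:
            return status, body
        sleep(backoff_delay(attempt, rng=rng))
        attempt += 1
```

Jitter spreads retries from many clients over time, so they do not all hit the API again at the same instant after a shared rate-limit or outage.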

SDK Examples

Python (Streaming)

import requests
import json

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1"
TOKEN = "your-jwt-token"

def query_rag_streaming(message: str, session_id: str):
    """Query the RAG system with streaming response."""
    response = requests.post(
        f"{API_URL}/chat",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream"
        },
        json={
            "message": message,
            "session_id": session_id
        },
        stream=True
    )
    response.raise_for_status()

    # Parse SSE events
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data:'):
                data = json.loads(line[5:].strip())
                yield data

# Example usage
for event in query_rag_streaming("What did we discuss with Acme?", "session-123"):
    if 'text' in event:
        print(event['text'], end='', flush=True)
    elif 'sources' in event:
        print(f"\n\nSources: {len(event['sources'])} documents")

JavaScript/TypeScript (Streaming)

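// Placeholders: replace with your deployment's values.
const API_URL = 'https://your-api-id.execute-api.us-east-1.amazonaws.com/v1';
const token = 'your-jwt-token';

// Loose event shape covering the documented SSE payloads.
interface StreamEvent {
  text?: string;
  sources?: unknown[];
  done?: boolean;
  message_id?: string;
}

async function* queryRAGStreaming(
  message: string,
  sessionId: string
): AsyncGenerator<StreamEvent> {
  const response = await fetch(`${API_URL}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream',
    },
    body: JSON.stringify({
      message,
      session_id: sessionId,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body?.getReader();
  if (!reader) return;
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network chunk can end mid-line, so buffer and parse only complete lines.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (line.startsWith('data:')) {
        yield JSON.parse(line.slice(5).trim());
      }
    }
  }

  // Flush a final line that arrived without a trailing newline.
  if (buffer.startsWith('data:')) {
    yield JSON.parse(buffer.slice(5).trim());
  }
}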

// Example usage
for await (const event of queryRAGStreaming('What about Acme?', 'session-123')) {
  if (event.text) process.stdout.write(event.text);
}

cURL (Streaming)

curl -N -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "session_id": "test-session",
    "message": "What did we discuss with Acme about pricing?",
    "max_results": 5
  }'

Changelog

Date Change
2026-01-12 Added Sync Endpoints section for manual data ingestion
2026-01-09 Removed batch /chat endpoint - use /chat for all chat operations
2026-01-08 Added streaming endpoint documentation (/chat)
2026-01-01 Updated for client-only filtering (removed project filter)
2025-12-31 Added clarification response format
2025-12-29 Added reranking for improved relevance

Last updated: 2026-01-17