Codebase Audit Findings

Comprehensive security, code quality, and architecture audit of the NorthBuilt RAG System. This document tracks issues identified during code reviews and their resolution status.

Table of Contents

  1. Audit Overview
    1. Summary Statistics
  2. Critical Issues
    1. CRIT-001: Hardcoded AWS Account ID in Repository
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
    2. CRIT-002: Missing DynamoDB State Locking for Terraform
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
    3. CRIT-003: Overly Permissive CloudWatch IAM Policy
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
    4. CRIT-004: Bare Exception Handlers Silently Suppress Errors
      1. Description
      2. Current Code (Fathom)
      3. Current Code (HelpScout)
      4. Recommended Fix
      5. Impact
      6. Resolution Steps
    5. CRIT-005: Missing JSON Decode Error Handling
      1. Description
      2. Current Code
      3. Example Failure
      4. Recommended Fix
      5. Impact
      6. Resolution Steps
    6. CRIT-006: Missing Request Timeouts in HelpScout Connection
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
    7. CRIT-007: Hardcoded Dates in Integration Tests
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
    8. CRIT-008: Untested Classification Strategies
      1. Description
      2. Current Code
      3. Recommended Fix
      4. Impact
      5. Resolution Steps
  3. High Severity Issues
    1. HIGH-001: Cognito Security Features Disabled
      1. Description
      2. Impact
      3. Resolution Steps
    2. HIGH-002: CORS Wildcard Default in API Gateway
      1. Description
      2. Recommended Fix
    3. HIGH-003: Missing DynamoDB Encryption Configuration
      1. Description
      2. Recommended Fix
    4. HIGH-004: S3 Buckets Using AWS-Managed Encryption Keys
      1. Description
      2. Recommended Fix
    5. HIGH-005: CloudFront Missing Security Headers
      1. Description
      2. Recommended Fix
    6. HIGH-006: No Lambda Reserved Concurrency Limits
      1. Description
      2. Recommended Fix
    7. HIGH-007: Missing Lambda Dead Letter Queues
      1. Description
      2. Recommended Fix
    8. HIGH-008: Short Log Retention (7 Days)
      1. Description
      2. Recommended Fix
    9. HIGH-009: No Retry Logic for External API Calls
      1. Description
      2. Recommended Fix
    10. HIGH-010: Tokens Stored in localStorage (Web)
      1. Description
      2. Recommended Fix
  4. Medium Severity Issues
    1. MED-002: No MFA Configuration
    2. MED-003: S3 Backup Immediate Glacier Transition
    3. MED-004: Unimplemented Strategies Return Null Silently
    4. MED-005: DynamoDB Client Created Per Request
    5. MED-006: Hardcoded Rate Limiting Sleep Values
    6. MED-007: DEBUG Mode Logs Sensitive Data (Web)
    7. MED-008: Error Messages Expose Backend Details (Web)
    8. MED-009: Incomplete Test Body
    9. MED-010 through MED-025: Additional Medium Issues
  5. Low Severity Issues
    1. LOW-001 through LOW-020: Minor Issues
  6. Positive Findings
    1. Security Strengths
    2. Code Quality Strengths
    3. Infrastructure Strengths
  7. Resolution Tracking
    1. Sprint 1 (Immediate)
    2. Sprint 2 (Short-Term)
    3. Sprint 3 (Medium-Term)
    4. Future
  8. Appendix: Related Documents
  9. Audit Findings - 2025-12-30
    1. NEW-001: Frontend Debug Mode Defaults to True
      1. Description
      2. Recommended Fix
    2. NEW-002: Missing EntityType in Classify Lambda Response
      1. Description
      2. Recommended Fix
    3. NEW-003: Query Understanding Not Documented in CLAUDE.md
      1. Description
      2. Recommended Fix
    4. NEW-004: CLASSIFY_TABLE Environment Variable Undocumented
      1. Description
      2. Recommended Fix
    5. NEW-005: System Architecture Doc Says “Vanilla JS”
      1. Description
      2. Recommended Fix
    6. NEW-006: No Entity Loader Tests
      1. Description
      2. Recommended Fix
    7. NEW-007: Integration Tests Not Run on PRs
      1. Description
      2. Recommended Fix
    8. NEW-008: No Post-Deployment Smoke Tests
      1. Description
      2. Recommended Fix
    9. NEW-009: Security Gates Don’t Fail Builds
      1. Description
      2. Recommended Fix
    10. NEW-010: No Token Refresh Logic in Frontend
      1. Description
      2. Recommended Fix

Audit Overview

Audit Date: 2025-12-29
Scope: Full codebase including Lambda functions, Terraform infrastructure, React web application, shared utilities, and test suite.

Summary Statistics

Severity   Count   Status
Critical   8       Open
High       17      1 Resolved, 16 Open
Medium     25      Open
Low        20+     Open

Critical Issues

Issues that pose significant security risks, could cause data loss, or will cause system failures.


CRIT-001: Hardcoded AWS Account ID in Repository

Severity: Critical
Component: Terraform
File: terraform/variables.tf (lines 196-197)
Status: Open

Description

The AWS account ID and ACM certificate ARN are hardcoded as default values in the Terraform variables file. This is a security anti-pattern because:

  1. Account ID Exposure: While not a secret per se, the AWS account ID provides attackers with targeting information for social engineering, phishing, or crafting valid-looking IAM principal ARNs.

  2. Certificate Tracking: The ACM certificate ARN reveals the exact certificate being used, which could be useful for reconnaissance.

  3. Version Control Persistence: Once committed, these values exist in Git history permanently, even if removed later.

Current Code

variable "acm_certificate_arn" {
  description = "ACM certificate ARN for CloudFront distribution"
  type        = string
  default     = "arn:aws:acm:us-east-1:554192034098:certificate/883924c2-0006-476f-bce6-05e25ebcb9e7"
}

Recommended Fix

Remove the hardcoded default and require explicit configuration:

variable "acm_certificate_arn" {
  description = "ACM certificate ARN for CloudFront distribution. Must be in us-east-1 for CloudFront."
  type        = string
  # No default - must be provided via terraform.tfvars or environment variable
}

Then create a terraform.tfvars.example file (committed) and terraform.tfvars (gitignored):

# terraform.tfvars.example
acm_certificate_arn = "arn:aws:acm:us-east-1:YOUR_ACCOUNT_ID:certificate/YOUR_CERT_ID"

Impact

  • Security: Reduces attack surface
  • Operations: Requires explicit configuration per environment

Resolution Steps

  1. Create terraform.tfvars.example with placeholder values
  2. Add terraform.tfvars to .gitignore
  3. Remove default values from variables.tf
  4. Update deployment documentation
  5. Consider using AWS Secrets Manager or Parameter Store for sensitive configurations

CRIT-002: Missing DynamoDB State Locking for Terraform

Severity: Critical
Component: Terraform
File: terraform/backend.tf (lines 12-19)
Status: Open

Description

The Terraform backend configuration uses S3 native locking (use_lockfile = true) instead of DynamoDB state locking. S3 native locking is:

  1. Less Reliable: S3’s conditional writes are eventually consistent in some edge cases
  2. Not Recommended: AWS and HashiCorp recommend DynamoDB for production state locking
  3. Race Condition Risk: Concurrent terraform apply operations could corrupt state

Current Code

terraform {
  backend "s3" {
    bucket       = "nb-rag-sys-terraform-state"
    key          = "terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true  # S3 native locking - NOT RECOMMENDED
  }
}

Recommended Fix

Add DynamoDB state locking:

terraform {
  backend "s3" {
    bucket         = "nb-rag-sys-terraform-state"
    key            = "terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "nb-rag-sys-terraform-lock"
    # Remove use_lockfile when using DynamoDB
  }
}

Create the DynamoDB table:

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "nb-rag-sys-terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name    = "Terraform State Lock"
    Purpose = "terraform-state-locking"
  }
}

Impact

  • Reliability: Prevents state corruption from concurrent operations
  • Safety: CI/CD pipelines and multiple developers can safely run Terraform

Resolution Steps

  1. Create DynamoDB table for locking (can be done manually or via bootstrap script)
  2. Update backend configuration
  3. Run terraform init -migrate-state to apply new backend config
  4. Test with concurrent plan operations to verify locking works

CRIT-003: Overly Permissive CloudWatch IAM Policy

Severity: Critical
Component: Terraform
File: terraform/modules/lambda/main.tf (lines 536-542)
Status: Open

Description

Lambda functions are granted cloudwatch:PutMetricData permission with a wildcard resource (Resource = "*"). This violates the principle of least privilege because:

  1. Metric Pollution: Lambda could write metrics to any namespace, including AWS-reserved namespaces
  2. Cost Implications: Malicious or buggy code could create thousands of custom metrics (each costing $0.30/month)
  3. Monitoring Confusion: Metrics could be written to incorrect namespaces, confusing dashboards

Current Code

{
  Effect = "Allow"
  Action = [
    "cloudwatch:PutMetricData"
  ]
  Resource = "*"  # Too permissive
}

Recommended Fix

Unfortunately, CloudWatch PutMetricData does not support resource-level permissions. The mitigation is to add a condition limiting writes to specific namespaces:

{
  Effect = "Allow"
  Action = [
    "cloudwatch:PutMetricData"
  ]
  Resource = "*"
  Condition = {
    StringEquals = {
      "cloudwatch:namespace" = [
        "NorthBuilt/RAG",
        "NorthBuilt/Sync",
        "NorthBuilt/Webhooks"
      ]
    }
  }
}

Update Lambda code to use only these approved namespaces:

cloudwatch.put_metric_data(
    Namespace='NorthBuilt/RAG',  # Must match IAM condition
    MetricData=[...]
)

Impact

  • Security: Limits metric write scope to approved namespaces
  • Cost: Prevents accidental metric creation in wrong namespaces

Resolution Steps

  1. Audit all Lambda functions for current namespace usage
  2. Define approved namespace list
  3. Update IAM policy with namespace condition
  4. Update Lambda code to use approved namespaces
  5. Test metric publishing still works

CRIT-004: Bare Exception Handlers Silently Suppress Errors

Severity: Critical
Component: Lambda Shared Utilities
Files:

  • lambda/shared/connections/fathom.py (line 260)
  • lambda/shared/connections/helpscout.py (line 536)

Status: Open

Description

Bare except: clauses (without specifying exception types) catch ALL exceptions, including critical system exceptions like SystemExit, KeyboardInterrupt, and MemoryError. This can:

  1. Mask Real Bugs: Programming errors like NameError, TypeError, or AttributeError are silently swallowed
  2. Hide Memory Issues: MemoryError exceptions go unnoticed, leading to silent failures
  3. Prevent Debugging: No logging or stack traces for unexpected exceptions
  4. Break Signal Handling: KeyboardInterrupt and SystemExit are caught, preventing graceful shutdown

Current Code (Fathom)

# lambda/shared/connections/fathom.py:260
try:
    # Date calculation logic
    from datetime import datetime
    if isinstance(recording_start, str):
        recording_start = datetime.fromisoformat(recording_start.replace('Z', '+00:00'))
    if isinstance(recording_end, str):
        recording_end = datetime.fromisoformat(recording_end.replace('Z', '+00:00'))
    duration = int((recording_end - recording_start).total_seconds() / 60)
except:  # DANGEROUS: Catches everything including MemoryError, SystemExit
    duration = 0

Current Code (HelpScout)

# lambda/shared/connections/helpscout.py:536
try:
    dt = datetime.fromisoformat(dt_string.replace('Z', '+00:00'))
    return dt.strftime('%B %d, %Y %I:%M %p')
except:  # DANGEROUS: Silently returns original string for ANY error
    return dt_string

Recommended Fix

Catch only the specific exceptions that are expected:

# Fathom duration calculation
try:
    from datetime import datetime
    if isinstance(recording_start, str):
        recording_start = datetime.fromisoformat(recording_start.replace('Z', '+00:00'))
    if isinstance(recording_end, str):
        recording_end = datetime.fromisoformat(recording_end.replace('Z', '+00:00'))
    duration = int((recording_end - recording_start).total_seconds() / 60)
except (ValueError, TypeError, AttributeError) as e:
    logger.warning(f"Failed to calculate meeting duration: {e}")
    duration = 0

# HelpScout datetime formatting
try:
    dt = datetime.fromisoformat(dt_string.replace('Z', '+00:00'))
    return dt.strftime('%B %d, %Y %I:%M %p')
except (ValueError, TypeError) as e:
    logger.warning(f"Failed to parse datetime '{dt_string}': {e}")
    return dt_string

Impact

  • Reliability: Unexpected errors are properly logged and can be debugged
  • Observability: CloudWatch logs will show warnings for parsing failures
  • Safety: System exceptions (MemoryError, SystemExit) propagate correctly

Resolution Steps

  1. Replace all bare except: clauses in the codebase
  2. Identify specific exception types that should be caught
  3. Add logging for caught exceptions
  4. Run linting with pylint or flake8 to catch future occurrences
  5. Add bare-except check to CI/CD pipeline
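
The lint check in steps 4-5 can also be prototyped directly. This illustrative sketch (not part of the codebase) uses Python's ast module to locate bare except handlers - the same construct flake8 reports as E722:

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` handlers in the given source."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

sample = """try:
    risky()
except:
    pass
"""
print(find_bare_excepts(sample))  # → [3]
```

Running a scan like this (or simply flake8 --select=E722) as a CI step prevents new bare excepts from landing.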

CRIT-005: Missing JSON Decode Error Handling

Severity: Critical
Component: Lambda Shared Utilities
Files:

  • lambda/shared/utils/utils.py (line 48)
  • lambda/shared/common_utils.py (line 48)

Status: Open

Description

The parse_event_body() function calls json.loads() without catching json.JSONDecodeError. When API Gateway passes a malformed JSON body, the Lambda crashes with an unhandled exception instead of returning a proper 400 Bad Request response.

This is a critical issue because:

  1. Poor User Experience: Users see a generic 500 error instead of helpful “Invalid JSON” message
  2. Confusing Error Logs: Stack traces for invalid JSON look like bugs
  3. Security Information Leak: Some JSON parsers include the malformed input in error messages

Current Code

def parse_event_body(event: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the event body from API Gateway."""
    if 'body' in event:
        body = event['body']
        if isinstance(body, str):
            return json.loads(body)  # CRASHES on invalid JSON!
        return body
    return event

Example Failure

Request with invalid JSON:

curl -X POST https://api.example.com/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "hello'  # Missing closing brace and quote

Result: Lambda crashes with json.JSONDecodeError, API Gateway returns 500.

Recommended Fix

def parse_event_body(event: Dict[str, Any]) -> Dict[str, Any]:
    """
    Parse the event body from API Gateway.

    Args:
        event: API Gateway event dictionary

    Returns:
        Parsed body as dictionary

    Raises:
        ValueError: If body contains invalid JSON
    """
    if 'body' in event:
        body = event['body']
        if isinstance(body, str):
            try:
                return json.loads(body)
            except json.JSONDecodeError as e:
                # Log the error but don't include the malformed input (security)
                logger.warning(f"Invalid JSON in request body at position {e.pos}")
                raise ValueError(f"Invalid JSON in request body: {e.msg}") from e
        return body
    return event

Then update Lambda handlers to catch ValueError and return 400:

try:
    body = parse_event_body(event)
except ValueError as e:
    return {
        'statusCode': 400,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({
            'error': 'Invalid request',
            'message': str(e)
        })
    }

Impact

  • User Experience: Clear error messages for malformed requests
  • Security: Prevents information leakage in error messages
  • Operations: Cleaner logs without stack traces for user errors

Resolution Steps

  1. Update parse_event_body() to catch and re-raise as ValueError
  2. Update all Lambda handlers to catch ValueError and return 400
  3. Add unit tests for malformed JSON handling
  4. Test with various malformed payloads
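
The malformed-payload path in steps 3-4 can be exercised without API Gateway. This self-contained sketch uses a simplified copy of the fixed parser for illustration (the real helper lives in lambda/shared/utils/utils.py):

```python
import json

def parse_event_body(event):
    """Simplified copy of the fixed parser, for illustration only."""
    if 'body' in event:
        body = event['body']
        if isinstance(body, str):
            try:
                return json.loads(body)
            except json.JSONDecodeError as e:
                raise ValueError(f"Invalid JSON in request body: {e.msg}") from e
        return body
    return event

# Valid JSON parses; malformed JSON surfaces as ValueError for the 400 path.
assert parse_event_body({'body': '{"message": "hello"}'}) == {'message': 'hello'}
try:
    parse_event_body({'body': '{"message": "hello'})
except ValueError as e:
    print("rejected:", e)
```

Note that json.JSONDecodeError is itself a subclass of ValueError, so handlers that catch ValueError will cover both.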

CRIT-006: Missing Request Timeouts in HelpScout Connection

Severity: Critical
Component: Lambda Shared Connections
File: lambda/shared/connections/helpscout.py (lines 82, 181, 227)
Status: Open

Description

Three HTTP requests in the HelpScout connection class are missing the timeout parameter. Without explicit timeouts, requests can hang indefinitely, causing:

  1. Lambda Timeout: The Lambda function hits its 15-minute maximum and is killed
  2. Resource Exhaustion: Connection pools are exhausted by hanging requests
  3. Cost Increase: Lambda is billed for the full hanging duration
  4. No Error Recovery: No opportunity for retry logic to kick in

Current Code

# Line 82 - authenticate() method
response = requests.get(f'{self.base_url}/users/me', headers=headers)
# NO TIMEOUT - can hang forever

# Line 181 - fetch_all_items() method
response = requests.get(url, headers=self._get_headers(), params=params)
# NO TIMEOUT - can hang forever

# Line 227 - fetch_conversation() method
response = requests.get(url, headers=self._get_headers(), params=params)
# NO TIMEOUT - can hang forever

Note: The token request (line 103) correctly passes timeout=10; the three requests above omit the parameter entirely.

Recommended Fix

Add explicit timeouts to all HTTP requests:

# Define timeout constants at module level
HELPSCOUT_CONNECT_TIMEOUT = 5   # seconds to establish connection
HELPSCOUT_READ_TIMEOUT = 30     # seconds to receive response
HELPSCOUT_TIMEOUT = (HELPSCOUT_CONNECT_TIMEOUT, HELPSCOUT_READ_TIMEOUT)

# In authenticate() method
response = requests.get(
    f'{self.base_url}/users/me',
    headers=headers,
    timeout=HELPSCOUT_TIMEOUT
)

# In fetch_all_items() method
response = requests.get(
    url,
    headers=self._get_headers(),
    params=params,
    timeout=HELPSCOUT_TIMEOUT
)

# In fetch_conversation() method
response = requests.get(
    url,
    headers=self._get_headers(),
    params=params,
    timeout=HELPSCOUT_TIMEOUT
)

Impact

  • Reliability: Failed requests timeout quickly instead of hanging
  • Cost: Prevents Lambda from running for 15 minutes on a single hung request
  • Recovery: Allows retry logic to attempt alternative actions

Resolution Steps

  1. Define timeout constants in the module
  2. Add timeout to all requests.get() and requests.post() calls
  3. Add similar timeouts to fathom.py and linear.py connections
  4. Consider adding exponential backoff retry logic
  5. Add unit tests for timeout behavior
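
The exponential backoff suggested in step 4 can be layered on top of the timeout constants. A minimal stdlib-only sketch (a real implementation would wrap requests.get and likely also retry on 429/5xx responses):

```python
import time

def with_retry(fetch, *, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `fetch()` with exponential backoff on timeouts (illustrative)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, ...

# Example: a fetch that times out twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated hang")
    return "ok"

print(with_retry(flaky, sleep=lambda s: None))  # → ok
```

Injecting the sleep function keeps the backoff schedule unit-testable without real delays.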

CRIT-007: Hardcoded Dates in Integration Tests

Severity: Critical
Component: Test Suite
Files:

  • tests/test_fathom_webhook.py (lines 30-31)
  • tests/test_helpscout_webhook.py (lines 41, 58, 75, 92)
  • tests/test_linear_webhook.py (lines 35-36, 50-51, 57)

Status: Open

Description

Integration tests use hardcoded future dates like “2025-11-05” and “2025-12-06”. These tests will start failing when the current date passes these hardcoded values, because:

  1. Date Validation Logic: Some code validates that dates are not in the past
  2. TTL Calculations: DynamoDB TTL values may become negative
  3. Sorting Logic: Tests expecting chronological order may fail

This is a time bomb that will cause CI/CD failures on or after November 5, 2025.

Current Code

# test_fathom_webhook.py:30-31
"recording_start_time": "2025-11-05T10:00:00Z",  # WILL FAIL after Nov 5, 2025
"recording_end_time": "2025-11-05T11:00:00Z",

# test_helpscout_webhook.py:41
"createdAt": "2025-11-05T10:00:00Z",  # WILL FAIL after Nov 5, 2025

# test_linear_webhook.py:35-36
"createdAt": "2025-12-06T14:00:00Z",  # WILL FAIL after Dec 6, 2025
"updatedAt": "2025-12-06T14:30:00Z",

Recommended Fix

Use dynamic date generation:

from datetime import datetime, timedelta, timezone

def get_future_timestamp(days_ahead: int = 30) -> str:
    """Generate an ISO timestamp N days in the future."""
    future_date = datetime.now(timezone.utc) + timedelta(days=days_ahead)
    return future_date.strftime("%Y-%m-%dT%H:%M:%SZ")

# In test fixtures
@pytest.fixture
def sample_fathom_event():
    start = datetime.now(timezone.utc) + timedelta(days=30)
    return {
        "recording_start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "recording_end_time": (start + timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # ...
    }

Or use freezegun to freeze time during tests:

from freezegun import freeze_time

@freeze_time("2025-06-15T12:00:00Z")
def test_webhook_processing():
    event = {
        "recording_start_time": "2025-11-05T10:00:00Z",  # Always in the future
        # ...
    }
    # Test logic

Impact

  • CI/CD Reliability: Tests won’t randomly fail based on current date
  • Maintainability: No need to update tests periodically

Resolution Steps

  1. Install freezegun as dev dependency: pip install freezegun
  2. Update all test files with hardcoded dates
  3. Either use dynamic dates or freeze time with @freeze_time
  4. Add CI/CD check for hardcoded date patterns
  5. Document the chosen approach in test conventions
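
The CI check in step 4 can be as simple as a regex scan. An illustrative sketch (the pattern and scope are assumptions to tune per repository):

```python
import re

# Matches quoted ISO-8601 timestamps like "2025-11-05T10:00:00Z"
HARDCODED_DATE = re.compile(r'"20\d{2}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"')

def find_hardcoded_dates(source: str) -> list[int]:
    """Return 1-based line numbers containing hardcoded ISO timestamps."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_DATE.search(line)
    ]

sample = 'event = {"createdAt": "2025-11-05T10:00:00Z"}\nok = get_future_timestamp(30)\n'
print(find_hardcoded_dates(sample))  # → [1]
```

Running this over tests/ in CI and failing on any match would catch new hardcoded dates before they land.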

CRIT-008: Untested Classification Strategies

Severity: Critical
Component: Test Suite
Files:

  • lambda/classify/strategies/slack.py - 0% test coverage
  • lambda/classify/strategies/basecamp.py - 0% test coverage
  • lambda/classify/strategies/googledrive.py - 0% test coverage

Status: Open

Description

Three classification strategy classes are registered in the StrategyFactory but have absolutely no unit tests:

  1. SlackStrategy - Returns null_result() silently
  2. BasecampStrategy - Returns null_result() silently
  3. GoogleDriveStrategy - Returns null_result() silently

The test file lambda/classify/tests/test_strategies.py only tests FathomStrategy and HelpScoutStrategy.

This is critical because:

  1. Silent Failures: If someone tries to use these integrations, classification silently returns null
  2. No Error Indication: No exception, no log, just empty results
  3. Factory Mismatch: test_handler.py line 229 loops through all 6 sources but only 2 have actual test coverage
  4. Future Bug Risk: When implemented, no tests will catch regressions

Current Code

# lambda/classify/strategies/slack.py:12
class SlackStrategy(BaseStrategy):
    def classify(self, data: Any, table_name: str) -> Dict[str, Optional[str]]:
        """Classify Slack data."""
        # TODO: Implement when Slack integration is ready.
        return self.null_result()  # Silent failure!

Recommended Fix

Option A: Remove from factory until implemented

# lambda/classify/factory.py
STRATEGIES = {
    'fathom': FathomStrategy,
    'helpscout': HelpScoutStrategy,
    'linear': LinearStrategy,
    # Removed until implemented:
    # 'slack': SlackStrategy,
    # 'basecamp': BasecampStrategy,
    # 'googledrive': GoogleDriveStrategy,
}

Option B: Raise NotImplementedError

class SlackStrategy(BaseStrategy):
    def classify(self, data: Any, table_name: str) -> Dict[str, Optional[str]]:
        """Classify Slack data."""
        raise NotImplementedError(
            "Slack classification strategy is not yet implemented. "
            "See docs/roadmap.md for planned integrations."
        )

Option C: Add placeholder tests

# lambda/classify/tests/test_strategies.py
class TestSlackStrategy:
    """Tests for SlackStrategy - currently a placeholder."""

    def test_slack_returns_null_result_when_not_implemented(self):
        """Verify Slack classification returns null until implemented."""
        strategy = SlackStrategy()
        result = strategy.classify({'channel': 'general'}, 'test-table')

        # Document expected behavior
        assert result == {'client_name': None, 'project_name': None}

    def test_slack_placeholder_reminder(self):
        """Skipped reminder - replace with real tests once Slack is implemented."""
        pytest.skip("Update this test when Slack integration is implemented")

Impact

  • Reliability: Clear indication when unimplemented features are used
  • Maintainability: Tests document expected behavior
  • Developer Experience: New developers understand feature status

Resolution Steps

  1. Decide on approach (A, B, or C above)
  2. If Option A: Remove from factory, update handler to return helpful error
  3. If Option B: Add NotImplementedError, update handler to catch and return 501
  4. If Option C: Add placeholder tests documenting current behavior
  5. Update documentation about integration status
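
If Option B is chosen, the handler-side change in step 3 might look like this self-contained sketch (the STRATEGIES dict and handler shape are stand-ins for the real StrategyFactory and Lambda handler):

```python
import json

class SlackStrategy:
    def classify(self, data, table_name):
        raise NotImplementedError("Slack classification is not yet implemented")

STRATEGIES = {'slack': SlackStrategy}  # stand-in for the real factory

def handle_classify(source, data):
    """Map NotImplementedError to a 501 response instead of a silent null."""
    try:
        result = STRATEGIES[source]().classify(data, 'classify-table')
        return {'statusCode': 200, 'body': json.dumps(result)}
    except NotImplementedError as e:
        return {'statusCode': 501, 'body': json.dumps({'error': str(e)})}

print(handle_classify('slack', {'channel': 'general'})['statusCode'])  # → 501
```

A 501 Not Implemented makes the feature gap visible to callers and to CloudWatch, instead of returning empty classification results.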

High Severity Issues

Issues that affect security, reliability, or code quality but have workarounds or lower immediate impact.


HIGH-001: Cognito Security Features Disabled

Severity: High
Component: Terraform
File: terraform/main.tf (lines 152-153)
Status: Open

Description

AWS Cognito advanced security features and deletion protection are explicitly disabled with comments indicating they should be enabled for production:

enable_advanced_security   = false # Set to true for production (has additional costs)
enable_deletion_protection = false # Set to true for production

Advanced Security provides:

  • Compromised credentials detection
  • Adaptive authentication based on risk
  • Account takeover protection
  • Bot detection

Deletion Protection prevents:

  • Accidental deletion of user pool
  • Loss of all user data and sessions
  • Authentication outage

Impact

  • Security: Missing protection against credential stuffing and account takeover
  • Reliability: User pool could be accidentally deleted

Resolution Steps

  1. Enable deletion protection immediately (no cost)
  2. Evaluate cost of advanced security ($0.05 per MAU)
  3. Enable advanced security if budget allows
  4. Add environment-specific configuration

HIGH-002: CORS Wildcard Default in API Gateway

Severity: High
Component: Terraform
File: terraform/modules/api-gateway/variables.tf (lines 8-12)
Status: Open

Description

The API Gateway CORS configuration defaults to allowing requests from any origin:

variable "cors_allow_origins" {
  description = "CORS allowed origins"
  type        = list(string)
  default     = ["*"]  # Allows all origins
}

While the API uses Cognito authentication, CORS wildcard:

  • Allows any website to make authenticated requests
  • Increases attack surface for XSS-based attacks
  • May not meet security compliance requirements

Recommended Fix

variable "cors_allow_origins" {
  description = "CORS allowed origins"
  type        = list(string)
  default     = []  # Force explicit configuration
}

# In main.tf - compact() drops the empty string outside development
cors_allow_origins = compact([
  "https://compass.northbuilt.com",
  "https://www.compass.northbuilt.com",
  var.environment == "development" ? "http://localhost:5173" : ""
])

HIGH-003: Missing DynamoDB Encryption Configuration

Severity: High
Component: Terraform
File: terraform/modules/dynamodb/main.tf (lines 10-72)
Status: Open

Description

DynamoDB tables do not explicitly configure server-side encryption with customer-managed KMS keys. While AWS enables encryption by default with AWS-managed keys, customer-managed keys provide:

  • Key rotation control
  • Key usage auditing via CloudTrail
  • Cross-account access control
  • Compliance with security standards (SOC2, HIPAA)

Recommended Fix

resource "aws_dynamodb_table" "main" {
  # ... existing config ...

  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.dynamodb.arn
  }
}

resource "aws_kms_key" "dynamodb" {
  description             = "KMS key for DynamoDB encryption"
  deletion_window_in_days = 30
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "Enable IAM policies"
        Effect    = "Allow"
        Principal = { AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" }
        Action    = "kms:*"
        Resource  = "*"
      }
    ]
  })
}

HIGH-004: S3 Buckets Using AWS-Managed Encryption Keys

Severity: High
Component: Terraform
File: terraform/modules/s3/main.tf (lines 34-41, 116-124, 169-176)
Status: Open

Description

All S3 buckets use AES256 encryption (AWS-managed keys) instead of customer-managed KMS keys:

rule {
  apply_server_side_encryption_by_default {
    sse_algorithm = "AES256"  # AWS-managed
  }
}

Customer-managed KMS keys provide:

  • Fine-grained access control via key policies
  • Key usage auditing
  • Cross-account sharing capabilities
  • Compliance with security frameworks

Recommended Fix

resource "aws_s3_bucket_server_side_encryption_configuration" "documents" {
  bucket = aws_s3_bucket.documents.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true  # Reduces KMS API costs
  }
}

HIGH-005: CloudFront Missing Security Headers

Severity: High
Component: Terraform
File: terraform/modules/cloudfront/main.tf
Status: Open

Description

The CloudFront distribution does not configure security headers. Missing headers leave the application vulnerable to:

  • XSS Attacks: No Content-Security-Policy (CSP)
  • Clickjacking: No X-Frame-Options
  • MIME Sniffing: No X-Content-Type-Options
  • Downgrade Attacks: No Strict-Transport-Security (HSTS)

Recommended Fix

Create a response headers policy:

resource "aws_cloudfront_response_headers_policy" "security_headers" {
  name = "${var.resource_prefix}-security-headers"

  security_headers_config {
    strict_transport_security {
      access_control_max_age_sec = 31536000  # 1 year
      include_subdomains         = true
      override                   = true
      preload                    = true
    }

    content_type_options {
      override = true  # Adds X-Content-Type-Options: nosniff
    }

    frame_options {
      frame_option = "DENY"
      override     = true
    }

    referrer_policy {
      referrer_policy = "strict-origin-when-cross-origin"
      override        = true
    }

    content_security_policy {
      content_security_policy = "default-src 'self'; script-src 'self' 'wasm-unsafe-eval'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; font-src 'self' https://fonts.gstatic.com; img-src 'self' https: data:; connect-src 'self' ${var.api_endpoint} ${var.cognito_domain};"
      override                = true
    }
  }
}

HIGH-006: No Lambda Reserved Concurrency Limits

Severity: High
Component: Terraform
File: terraform/modules/lambda/main.tf (lines 866-1333)
Status: Open

Description

No Lambda functions have reserved_concurrent_executions configured. All functions share the default AWS account limit (typically 1000 concurrent executions). This means:

  • One function could monopolize all concurrency during traffic spike
  • Other critical functions could be throttled
  • No isolation between function workloads

Recommended Fix

Add reserved concurrency to each function based on expected traffic:

resource "aws_lambda_function" "chat" {
  # ... existing config ...
  reserved_concurrent_executions = 100  # Chat gets 100 concurrent
}

resource "aws_lambda_function" "webhook_fathom" {
  # ... existing config ...
  reserved_concurrent_executions = 10  # Webhooks get 10 each
}

resource "aws_lambda_function" "sync_worker" {
  # ... existing config ...
  reserved_concurrent_executions = 5  # Sync workers get 5 each
}

HIGH-007: Missing Lambda Dead Letter Queues

Severity: High
Component: Terraform
File: terraform/modules/lambda/main.tf (lines 866-1333)
Status: Open

Description

Async/scheduled Lambda functions don’t have Dead Letter Queue (DLQ) configuration. When these functions fail, the events are silently dropped:

  • ingest - EventBridge triggered
  • fathom_sync_scheduler - Scheduled
  • helpscout_sync_scheduler - Scheduled
  • fathom_sync_worker - Async invocation

Without DLQs:

  • Failed events are lost forever
  • No visibility into failure patterns
  • No ability to replay failed events

Recommended Fix

Create a shared DLQ and attach it to the async functions:

resource "aws_sqs_queue" "lambda_dlq" {
  name                       = "${var.resource_prefix}-lambda-dlq"
  message_retention_seconds  = 1209600  # 14 days
  visibility_timeout_seconds = 300

  tags = var.tags
}

resource "aws_lambda_function" "sync_worker" {
  # ... existing config ...

  dead_letter_config {
    target_arn = aws_sqs_queue.lambda_dlq.arn
  }
}

Also create a CloudWatch alarm that fires when the DLQ receives messages:

resource "aws_cloudwatch_metric_alarm" "dlq_messages" {
  alarm_name          = "${var.resource_prefix}-dlq-messages"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 300
  statistic           = "Sum"
  threshold           = 0
  alarm_description   = "Lambda DLQ has messages - investigate failures"

  dimensions = {
    QueueName = aws_sqs_queue.lambda_dlq.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
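
Once a DLQ is in place, failed events can also be recovered for replay. For async-invoke failures, Lambda writes the original invocation event JSON to the SQS message body and the error details to message attributes; the sketch below assumes that shape, which should be verified against the actual queue contents:

```python
import json

def recover_event(sqs_message: dict) -> dict:
    """Extract the original Lambda invocation event from a DLQ message.

    For async-invoke failures, the SQS message Body holds the original
    event JSON; the error message arrives as a message attribute.
    """
    event = json.loads(sqs_message["Body"])
    error = (
        sqs_message.get("MessageAttributes", {})
        .get("ErrorMessage", {})
        .get("StringValue")
    )
    if error:
        print(f"Replaying event that originally failed with: {error}")
    return event
```

The recovered event can then be re-invoked against the fixed function or inspected for bad input before redriving.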

HIGH-008: Short Log Retention (7 Days)

Severity: High Component: Terraform File: terraform/modules/lambda/main.tf (line 229) Status: Open

Description

CloudWatch log retention is set to only 7 days:

log_retention_days = 7

This is insufficient for:

  • Security incident investigation (often discovered weeks later)
  • Compliance audits (typically require 90+ days)
  • Performance trend analysis
  • Debugging intermittent issues

Recommended Fix

Increase retention based on environment:

# For development environments
log_retention_days = 14

# For production environments
log_retention_days = 90  # Or 365 for compliance

Alternatively, archive logs to S3 for longer retention at lower cost:

resource "aws_cloudwatch_log_subscription_filter" "to_s3" {
  name            = "${var.resource_prefix}-log-archive"
  log_group_name  = aws_cloudwatch_log_group.lambda.name
  filter_pattern  = ""  # All logs
  destination_arn = aws_kinesis_firehose_delivery_stream.log_archive.arn
  role_arn        = aws_iam_role.logs_to_firehose.arn  # Required for Firehose destinations; role name illustrative
}

HIGH-009: No Retry Logic for External API Calls

Severity: High Component: Lambda Connections Files:

  • lambda/shared/connections/helpscout.py
  • lambda/shared/connections/linear.py
  • lambda/shared/connections/fathom.py Status: Open

Description

API calls to external services use basic error handling without retry logic. A single transient failure (network timeout, 429 rate limit, 503 service unavailable) causes the entire operation to fail:

response = requests.get(url, headers=self._get_headers(), params=params)
response.raise_for_status()  # Fails immediately on 429, 503, etc.

In production, transient failures are common:

  • 429 Too Many Requests (rate limiting)
  • 503 Service Unavailable (maintenance)
  • Connection timeouts (network issues)
  • 500 Internal Server Error (temporary backend issues)

Recommended Fix

Implement retry with exponential backoff:

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import requests

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

class RetryableHTTPError(Exception):
    """HTTP error that should be retried."""
    pass

def raise_for_retryable_status(response):
    """Raise RetryableHTTPError for status codes that warrant retry."""
    if response.status_code in RETRYABLE_STATUS_CODES:
        raise RetryableHTTPError(f"HTTP {response.status_code}: {response.reason}")
    response.raise_for_status()

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception_type((RetryableHTTPError, requests.exceptions.Timeout))
)
def fetch_with_retry(url, headers, params=None, timeout=30):
    """Fetch URL with automatic retry on transient failures."""
    response = requests.get(url, headers=headers, params=params, timeout=timeout)
    raise_for_retryable_status(response)
    return response

HIGH-010: Tokens Stored in localStorage (Web)

Severity: High Component: React Web App File: web/src/hooks/useAuth.ts (lines 26-28, 51-55, 65-67) Status: Open

Description

JWT tokens (id_token, access_token, refresh_token) are stored in localStorage:

const idToken = localStorage.getItem('id_token');
const accessToken = localStorage.getItem('access_token');
const refreshToken = localStorage.getItem('refresh_token');

localStorage is vulnerable to XSS attacks. If an attacker injects JavaScript (through any XSS vulnerability), they can:

  • Read all tokens from localStorage
  • Exfiltrate tokens to attacker-controlled server
  • Impersonate the user indefinitely (until token expiry)

Recommended Fix

Migrate to HTTP-only cookies:

  1. Backend Changes:
    # In Cognito callback handler
    response = make_response(redirect('/'))
    response.set_cookie(
     'access_token',
     access_token,
     httponly=True,      # Can't be read by JavaScript
     secure=True,        # HTTPS only
     samesite='Strict',  # CSRF protection
     max_age=3600        # 1 hour
    )
    
  2. Frontend Changes:
    // Remove localStorage token management.
    // Cookies are automatically sent with requests.
    const response = await fetch(endpoint, {
      credentials: 'include',  // Send cookies; no Authorization header needed
    });

Note: This requires coordination with backend/Cognito configuration.

HIGH-011: Missing Content Security Policy (Web)

Severity: High Component: React Web App File: Application-wide (CloudFront configuration needed) Status: Open

Description

No Content Security Policy (CSP) headers are configured. CSP prevents:

  • Inline script injection (XSS)
  • Loading scripts from unauthorized domains
  • Unauthorized form submissions
  • Clickjacking via framing

Recommended Fix

Configure CSP at the CloudFront level (see HIGH-005) with this policy:

Content-Security-Policy: default-src 'self'; script-src 'self' 'wasm-unsafe-eval'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; font-src 'self' https://fonts.gstatic.com; img-src 'self' https: data:; connect-src 'self' https://s0vt4k7o47.execute-api.us-east-1.amazonaws.com https://nb-rag-sys-auth.auth.us-east-1.amazoncognito.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self';

As in HIGH-005, prefer the var.api_endpoint and var.cognito_domain variables over hardcoding environment-specific hostnames.

HIGH-012: Missing Connection Tests (Fathom, HelpScout)

Severity: High Component: Test Suite Files:

  • lambda/shared/connections/fathom.py
  • lambda/shared/connections/helpscout.py Status: Resolved

Description

The FathomConnection and HelpScoutConnection classes contain roughly 350 and 550 lines of code respectively, each with multiple API calls, and previously had no unit tests. Unit tests have since been added.

Connection Lines of Code Test Coverage
Base ~100 100% (test_base.py)
Linear ~300 ~100% (test_linear.py)
Fathom ~350 ~100% (test_fathom.py)
HelpScout ~550 ~100% (test_helpscout.py)

Resolution

Tests were added in lambda/shared/tests/connections/:

  • test_fathom.py - 42 tests covering initialization, auth, fetch, S3 keys, transforms
  • test_helpscout.py - 45 tests covering OAuth2, pagination, conversation fetch, transforms

Medium Severity Issues

Issues that affect maintainability, performance, or developer experience.


MED-001: Cognito Temporary Password Validity Too Long

File: terraform/modules/cognito/main.tf (lines 20-27)

Temporary passwords are valid for 7 days, which gives attackers a longer window to brute-force new accounts. Reduce to 3-5 days:

temporary_password_validity_days = 3

MED-002: No MFA Configuration

File: terraform/modules/cognito/main.tf

No Multi-Factor Authentication is configured for Cognito. Add optional MFA:

mfa_configuration = "OPTIONAL"

software_token_mfa_configuration {
  enabled = true
}

MED-003: S3 Backup Immediate Glacier Transition

File: terraform/modules/s3/main.tf (lines 222-226)

Backups transition to Glacier immediately (day 0), making quick recovery expensive:

transition {
  days          = 0  # Immediate - expensive to retrieve
  storage_class = "GLACIER_IR"
}

Change to:

transition {
  days          = 30  # Keep accessible for 30 days
  storage_class = "GLACIER_IR"
}

MED-004: Unimplemented Strategies Return Null Silently

Files: lambda/classify/strategies/slack.py, basecamp.py, googledrive.py

These strategies are registered in the factory but return null results silently:

def classify(self, data: Any, table_name: str) -> Dict[str, Optional[str]]:
    return self.null_result()  # No error, no log

Should either:

  • Raise NotImplementedError
  • Log a warning
  • Be removed from factory until implemented
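
A minimal sketch of the log-a-warning option, keeping the existing null_result contract (the class body and logger wiring here are illustrative, not the actual module):

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

class SlackStrategy:
    """Registered but unimplemented strategy: warn instead of failing silently."""

    def null_result(self) -> Dict[str, Optional[str]]:
        return {"client": None, "project": None}

    def classify(self, data: Any, table_name: str) -> Dict[str, Optional[str]]:
        logger.warning(
            "SlackStrategy.classify is not implemented; "
            "returning null result for table %s", table_name
        )
        return self.null_result()
```

The warning makes the gap visible in CloudWatch logs without changing the caller-facing behavior.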

MED-005: DynamoDB Client Created Per Request

File: lambda/chat/handler.py (lines 128-147)

Each Lambda invocation creates new boto3 clients:

def create_aws_clients(config: Config) -> AWSClients:
    dynamodb = boto3.resource('dynamodb', region_name=config.aws_region)  # New instance

Should create clients at module level for reuse across warm invocations:

_clients = None

def get_clients(config):
    global _clients
    if _clients is None:
        _clients = create_aws_clients(config)
    return _clients

MED-006: Hardcoded Rate Limiting Sleep Values

Files:

  • lambda/shared/connections/helpscout.py (line 200)
  • lambda/shared/connections/fathom.py (lines 173, 184)

Fixed time.sleep() values that can’t be tuned:

time.sleep(0.5)  # Hardcoded

Should be configurable and respect API rate limit headers:

RATE_LIMIT_DELAY = float(os.environ.get('FATHOM_RATE_LIMIT_DELAY', '0.5'))

def respect_rate_limit(response):
    """Check response headers and sleep if rate limited."""
    retry_after = response.headers.get('Retry-After')
    if retry_after:
        time.sleep(int(retry_after))
    else:
        time.sleep(RATE_LIMIT_DELAY)

MED-007: DEBUG Mode Logs Sensitive Data (Web)

File: web/src/config.ts (line 64), multiple hooks

DEBUG mode enabled in config logs sensitive information:

console.log('OAuth callback successful:', parsedUser);

Should gate by both DEBUG and development mode:

DEBUG: import.meta.env.VITE_DEBUG === 'true' && import.meta.env.MODE === 'development'

MED-008: Error Messages Expose Backend Details (Web)

File: web/src/components/ChatPage.tsx (line 142)

Error messages from API displayed directly to users:

<p>Error: {error}</p>

Should map to user-friendly messages:

const getUserFriendlyError = (error: string) => {
    if (error.includes('Authentication')) return 'Please sign in again';
    if (error.includes('Network')) return 'Connection lost. Please try again.';
    return 'An error occurred. Please try again.';
};

MED-009: Incomplete Test Body

File: lambda/ingest/tests/test_handler.py (line 99)

The test invokes the code under test but makes no assertions:

def test_handles_metric_send_failure_gracefully(self):
    """Should not raise exception if metric send fails."""
    mock_cloudwatch = Mock()
    mock_cloudwatch.put_metric_data.side_effect = Exception("CloudWatch error")

    handler.send_metric(mock_cloudwatch, 'TestMetric')
    # NO ASSERTIONS!

Should assert the expected behavior.
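
A corrected version would confirm both that no exception escapes and that the metric call was actually attempted. A sketch (the inlined send_metric stands in for the real handler, whose exact signature is assumed from the snippet above):

```python
from unittest.mock import Mock

def send_metric(cloudwatch, metric_name):
    """Stand-in for handler.send_metric: errors must not propagate."""
    try:
        cloudwatch.put_metric_data(
            Namespace='App',
            MetricData=[{'MetricName': metric_name, 'Value': 1}],
        )
    except Exception:
        pass  # Metrics are best-effort

def test_handles_metric_send_failure_gracefully():
    """Should not raise if the metric send fails, and should attempt the send."""
    mock_cloudwatch = Mock()
    mock_cloudwatch.put_metric_data.side_effect = Exception("CloudWatch error")

    send_metric(mock_cloudwatch, 'TestMetric')  # Must not raise

    mock_cloudwatch.put_metric_data.assert_called_once()
```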


MED-010 through MED-025: Additional Medium Issues

ID Component File Issue
MED-010 Terraform api-gateway/variables.tf API throttle defaults may be too high (10/sec)
MED-011 Terraform cloudfront/main.tf CloudFront min TTL is 0 (no caching by default)
MED-012 Terraform cloudfront/main.tf Price class limits to NA/Europe only
MED-013 Terraform cognito/main.tf 30-day refresh token validity is long
MED-014 Lambda chat/handler.py Async Lambda invocation not verified
MED-015 Lambda shared/connections S3 permission errors treated as “not found”
MED-016 Lambda shared/connections/fathom.py Type coercion issues in data extraction
MED-017 Lambda All handlers Generic exception handling hides specific errors
MED-018 Lambda shared/utils Overly permissive CORS * in responses
MED-019 Lambda connections No logging context for multi-client scenarios
MED-020 Web hooks/useAuth.ts OAuth tokens visible in URL hash
MED-021 Web components/Chat Missing React.memo on ChatMessage
MED-022 Web lib/api.ts No runtime validation of API responses
MED-023 Tests Multiple Tests depend on real AWS infrastructure
MED-024 Tests Multiple Timestamp assertions without tolerance
MED-025 Tests Multiple Duplicate test utilities (common_utils.py)

Low Severity Issues

Minor issues that affect code cleanliness and maintainability.


LOW-001 through LOW-020: Minor Issues

ID Component Issue
LOW-001 Terraform Missing version constraints on modules
LOW-002 Terraform Missing README.md in module directories
LOW-003 Terraform Some variables lack descriptions
LOW-004 Terraform Unused local variables
LOW-005 Terraform EventBridge rules missing disable option
LOW-006 Lambda Unused sys import in chat handler
LOW-007 Lambda Inconsistent type hints (tuple vs Tuple)
LOW-008 Lambda Hardcoded Lambda function name fallback
LOW-009 Lambda Implicit None return in worker
LOW-010 Lambda Missing docstrings in some utilities
LOW-011 Lambda Information leakage in error messages
LOW-012 Web Composite key with index in MessageList
LOW-013 Web Missing image lazy loading
LOW-014 Web Gravatar URL generated on each render
LOW-015 Tests Missing test factories for varied data
LOW-016 Tests Tests that test multiple things
LOW-017 Tests Overly permissive assertions
LOW-018 Tests Missing test docstrings
LOW-019 Tests Scattered conftest files
LOW-020 Docs Missing inline code documentation

Positive Findings

The audit also identified many security and quality strengths:

Security Strengths

  • Comprehensive input validation and sanitization
  • Proper use of AWS Secrets Manager (no hardcoded secrets)
  • Webhook signature verification with HMAC and constant-time comparison
  • Multi-tenant isolation at vector search level with defense-in-depth
  • No eval(), exec(), or unsafe string formatting
  • Proper CORS headers on API responses

Code Quality Strengths

  • Extensive type hints throughout Python code
  • Strict TypeScript with ~100% type coverage
  • Good React patterns with proper error boundaries
  • Well-structured test organization (2.6:1 test-to-code ratio)
  • Consistent coding style with linting

Infrastructure Strengths

  • S3 versioning enabled for state protection
  • CloudFront HTTPS redirect configured
  • Proper IAM role separation per Lambda
  • Environment-specific tagging

Resolution Tracking

Sprint 1 (Immediate)

  • CRIT-001: Remove hardcoded AWS account ID
  • CRIT-002: Add DynamoDB state locking
  • CRIT-004: Replace bare except clauses
  • CRIT-005: Add JSON decode error handling
  • CRIT-006: Add request timeouts
  • CRIT-007: Fix hardcoded test dates

Sprint 2 (Short-Term)

  • CRIT-003: Restrict CloudWatch IAM permissions
  • CRIT-008: Add tests for unimplemented strategies
  • HIGH-001: Enable Cognito deletion protection
  • HIGH-002: Restrict CORS origins
  • HIGH-005: Add CloudFront security headers
  • HIGH-008: Increase log retention

Sprint 3 (Medium-Term)

  • HIGH-003: Add DynamoDB KMS encryption
  • HIGH-004: Add S3 KMS encryption
  • HIGH-006: Configure Lambda concurrency limits
  • HIGH-007: Add Lambda DLQs
  • HIGH-009: Implement retry logic
  • HIGH-012: Add missing connection tests

Future

  • HIGH-010: Migrate tokens to HTTP-only cookies
  • HIGH-011: Configure CSP headers
  • All Medium and Low severity items


Audit Findings - 2025-12-30

Additional findings from comprehensive RAG system audit.

NEW-001: Frontend Debug Mode Defaults to True

Severity: Medium Component: Web Application File: web/src/config.ts (line 64) Status: Open

Description

DEBUG mode defaults to true which logs sensitive information (session IDs, query content, entity matches) in production.

DEBUG: import.meta.env.VITE_DEBUG === 'true' || true,  // Defaults to true

Change default to false:

DEBUG: import.meta.env.VITE_DEBUG === 'true',

NEW-002: Missing EntityType in Classify Lambda Response

Severity: High Component: Lambda File: lambda/classify/handler.py Status: Open

Description

The Classify Lambda returns classification results but doesn’t include an entityType field. The query understanding entity loader requires the EntityType field for GSI queries, so when classify is used to update DynamoDB records, EntityType may not be set.

Add entityType to the classification response to match what the entity loader expects.


NEW-003: Query Understanding Not Documented in CLAUDE.md

Severity: Medium Component: Documentation File: CLAUDE.md Status: Open

Description

The query understanding feature (entity extraction, clarification prompts, metadata filtering) is implemented but not documented in CLAUDE.md. Developers won’t know about:

  • Automatic entity extraction from natural language queries
  • Clarification responses for ambiguous queries
  • Adaptive retrieval multiplier based on filter count

Add Query Understanding section to CLAUDE.md documenting these features.


NEW-004: CLASSIFY_TABLE Environment Variable Undocumented

Severity: Low Component: Documentation File: CLAUDE.md (line 81-82) Status: Open

Description

Chat handler expects CLASSIFY_TABLE environment variable for entity loading, but it’s not documented in the Lambda environment variables section.

Add CLASSIFY_TABLE to the documented environment variables.


NEW-005: System Architecture Doc Says “Vanilla JS”

Severity: Medium Component: Documentation File: docs/architecture/system-architecture.md (line 68) Status: Open

Description

System architecture documentation describes the web application as “Vanilla JavaScript (no framework dependencies)” but the actual implementation uses React 19 + TypeScript + Vite.

Update line 68 to accurately reflect the React implementation.


NEW-006: No Entity Loader Tests

Severity: High Component: Test Suite File: lambda/shared/utils/entities.py Status: Open

Description

The entity loader module (entities.py) has zero test coverage. Tests here are critical for verifying that multi-tenant isolation works correctly.

Create comprehensive tests for load_entities_from_dynamodb() covering:

  • DynamoDB load and pagination
  • Entity validation and schema errors
  • EntityType GSI queries
  • Edge cases (empty results, missing fields)
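
The pagination case in particular is easy to pin down with a fake table. Since load_entities_from_dynamodb's exact signature isn't shown here, the loader below is a sketch of the loop the real function should implement:

```python
class FakeTable:
    """Minimal stand-in for a boto3 DynamoDB Table that returns two scan pages."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = []

    def scan(self, **kwargs):
        self.calls.append(kwargs)
        index = 1 if "ExclusiveStartKey" in kwargs else 0
        page = {"Items": self._pages[index]}
        if index == 0 and len(self._pages) > 1:
            page["LastEvaluatedKey"] = {"pk": "cursor"}  # More pages remain
        return page

def load_all_items(table):
    """Sketch of the pagination loop the entity loader should implement."""
    items, kwargs = [], {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs = {"ExclusiveStartKey": page["LastEvaluatedKey"]}

def test_pagination_follows_last_evaluated_key():
    table = FakeTable([[{"name": "Acme"}], [{"name": "Globex"}]])
    assert load_all_items(table) == [{"name": "Acme"}, {"name": "Globex"}]
    assert len(table.calls) == 2
```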

NEW-007: Integration Tests Not Run on PRs

Severity: Medium Component: CI/CD File: .github/workflows/test.yml Status: Open

Description

Integration tests run only on the main branch, not on PRs. Real AWS service issues won’t surface until after merge and deployment.

Add integration test stage for PRs against a staging environment, or run a subset of integration tests.
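
One way to get partial coverage on PRs is a dedicated job gated to pull_request events that runs only a marked subset against staging. The job name, pytest markers, and Python version below are assumptions, not the repo's actual workflow:

```yaml
integration-smoke:
  if: github.event_name == 'pull_request'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - run: pip install -r requirements-dev.txt
    # Run only tests marked as both integration and smoke against staging
    - run: pytest -m "integration and smoke" --maxfail=1
      env:
        AWS_REGION: us-east-1
```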


NEW-008: No Post-Deployment Smoke Tests

Severity: Medium Component: CI/CD File: .github/workflows/main.yml Status: Open

Description

No validation runs after Lambda deployment. Bad deployments could go unnoticed.

Add health check endpoint and sanity tests after deployment completes.


NEW-009: Security Gates Don’t Fail Builds

Severity: Medium Component: CI/CD File: .github/workflows/test.yml Status: Open

Description

Snyk, Semgrep, and Checkov security scans all use continue-on-error: true. Security issues are logged but don’t block the build.

Change to fail on HIGH/CRITICAL findings to ensure security issues are addressed before merge.


NEW-010: No Token Refresh Logic in Frontend

Severity: High Component: Web Application File: web/src/hooks/useAuth.ts Status: Open

Description

When Cognito tokens expire (1 hour default), users get authentication errors and must manually refresh the page. No automatic token refresh mechanism exists.

Implement token refresh on 401 responses or proactively before expiration.


Last updated: 2025-12-30