Architecture

This document describes the high-level system design and component architecture of the OpenCHS AI Service.

High-Level System Design

┌──────────────────────────────────────────────────────────────┐
│                     Load Balancer (NGINX)                    │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                   FastAPI Gateway (Port 8125)                │
│  • Health checks                                             │
│  • Request routing                                           │
│  • WebSocket connections                                     │
│  • Streaming endpoints                                       │
│  • API documentation (Swagger UI)                            │
└──────────────────────────────────────────────────────────────┘

         ┌────────────────────┴─────────────────────┐
         ↓                                          ↓
┌────────────────────────┐            ┌──────────────────────┐
│  Celery Task Queue     │            │   Redis Store        │
│  (model_processing)    │            │  • Task results      │
│  • Model inference     │            │  • Session state     │
│  • Async processing    │            │  • Caching           │
│  • Retry logic         │            │  • Rate limiting     │
└────────────────────────┘            └──────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                    Celery Workers Pool                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ GPU Worker  │  │ CPU Worker  │  │ CPU Worker  │  ...     │
│  │             │  │             │  │             │          │
│  │ • Whisper   │  │ • NER       │  │ • QA        │          │
│  │ • Translate │  │ • Classify  │  │ • Summarize │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└──────────────────────────────────────────────────────────────┘

                   ┌──────────────────┐
                   │  ML Models       │
                   │  • Whisper v3    │
                   │  • Transformers  │
                   │  • spaCy         │
                   │  • Custom Models │
                   └──────────────────┘

┌──────────────────────────────────────────────────────────────┐
│               Streaming Layer (TCP + WebSocket)              │
│  • Asterisk Integration (Port 8301)                          │
│  • Real-time transcription                                   │
│  • Progressive result delivery                               │
└──────────────────────────────────────────────────────────────┘

Component Layers

1. API Layer (app/api/)

The REST and WebSocket interface for the service. It handles:

Responsibility              Description
──────────────────────────  ──────────────────────────────────────────────
Audio file uploads          Accept and validate audio files for processing
Batch processing            Queue multiple files for background processing
Real-time streaming         WebSocket connections for live transcription
Individual model endpoints  Direct access to specific AI models
Health monitoring           System health and readiness checks

Key Files:

  • audio_routes.py - Audio processing endpoints
  • whisper_routes.py - Transcription endpoints
  • translator_routes.py - Translation endpoints
  • classifier_route.py - Classification endpoints
  • health_routes.py - Health check endpoints
  • call_session_routes.py - Live call management
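
A minimal sketch of the upload entry point, assuming a hypothetical process_audio Celery task plus an illustrative type whitelist and size cap; it is not a copy of audio_routes.py:

```python
from pathlib import Path
from uuid import uuid4

from fastapi import FastAPI, HTTPException, UploadFile

# Hypothetical task name; the real one lives in app/tasks/audio_tasks.py.
from app.tasks.audio_tasks import process_audio

app = FastAPI()
UPLOAD_DIR = Path("/tmp/uploads")                         # assumed shared storage
ALLOWED_TYPES = {"audio/wav", "audio/mpeg", "audio/ogg"}  # assumed whitelist
MAX_BYTES = 50 * 1024 * 1024                              # assumed 50 MB cap

@app.post("/audio/process", status_code=202)
async def submit_audio(file: UploadFile):
    # Validate before queueing so bad uploads fail fast.
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(status_code=415, detail="Unsupported audio format")
    data = await file.read()
    if len(data) > MAX_BYTES:
        raise HTTPException(status_code=413, detail="File too large")
    # Persist to storage and queue by path (raw bytes don't JSON-serialize).
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    path = UPLOAD_DIR / f"{uuid4().hex}.bin"
    path.write_bytes(data)
    task = process_audio.delay(str(path))
    return {"task_id": task.id, "status": "queued"}
```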

2. Task Processing Layer (app/tasks/)

Asynchronous task execution via Celery:

Component                   Purpose
──────────────────────────  ──────────────────────────────────────────────
Audio preprocessing         Validate and normalize audio files
Model inference scheduling  Queue tasks for GPU/CPU workers
Progressive result updates  Stream partial results as processing continues
Error handling and retries  Automatic retry with exponential backoff

Key Files:

  • audio_tasks.py - Audio processing tasks
  • model_tasks.py - Model inference tasks
  • health_tasks.py - Health check tasks
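
Retries with exponential backoff map directly onto standard Celery task options. A sketch of how a task on the model_processing queue might declare them; the actual signatures in these files may differ:

```python
from celery import Celery

celery_app = Celery(
    "ai_service",
    broker="redis://localhost:6379/0",   # assumed broker/backend URLs
    backend="redis://localhost:6379/1",
)

@celery_app.task(
    bind=True,
    queue="model_processing",
    autoretry_for=(RuntimeError,),  # assumption: transient inference failures
    retry_backoff=True,             # exponential backoff: 1 s, 2 s, 4 s, ...
    retry_backoff_max=600,          # cap the delay at 10 minutes
    retry_jitter=True,              # randomize delays to avoid thundering herds
    max_retries=3,
)
def transcribe_audio(self, audio_path: str) -> dict:
    # Placeholder for the actual Whisper call; run_whisper is hypothetical.
    text = run_whisper(audio_path)
    return {"transcript": text}
```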

3. Model Layer (app/model_scripts/)

ML model management and inference:

Component                      Purpose
─────────────────────────────  ────────────────────────────────────────────
Model loading                  Initialize models on startup or on demand
GPU/CPU resource management    Allocate compute resources efficiently
Batch and streaming inference  Support both batch and real-time processing
Model versioning and caching   Cache loaded models for fast inference

Key Files:

  • model_loader.py - Central model management
  • whisper_model.py - Speech-to-text model
  • translator_model.py - Translation model
  • ner_model.py - Named entity recognition
  • classifier_model.py - Case classification
  • summarizer_model.py - Text summarization
  • qa_model.py - Question answering
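
Loading once and caching is the core of this layer. A sketch of lazy, thread-safe loading along the lines model_loader.py might take; the registry contents and device choices are assumptions:

```python
import threading

class ModelLoader:
    """Load each model at most once and reuse it across requests."""

    def __init__(self):
        self._models: dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, name: str):
        with self._lock:  # prevent two requests loading the same model twice
            if name not in self._models:
                self._models[name] = self._load(name)
            return self._models[name]

    def _load(self, name: str):
        if name == "whisper":
            from faster_whisper import WhisperModel
            # "large-v3" matches the Whisper v3 entry above; device is illustrative.
            return WhisperModel("large-v3", device="cuda", compute_type="float16")
        if name == "ner":
            import spacy
            return spacy.load("en_core_web_sm")  # assumed pipeline name
        raise KeyError(f"Unknown model: {name}")
```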

4. Core Services (app/core/)

Infrastructure services that support the application:

Service                      Purpose
───────────────────────────  ─────────────────────────────────────
Resource Manager             GPU/CPU monitoring and allocation
Processing Strategy Manager  Mode selection (real-time vs. batch)
Celery Monitor               Worker health and task monitoring
Notification Manager         Send alerts to agent dashboards
Metrics                      Prometheus metrics collection

Key Files:

  • resource_manager.py - GPU/CPU management
  • celery_monitor.py - Celery monitoring
  • notification_manager.py - Notification system
  • processing_strategy_manager.py - Processing mode logic
  • metrics.py - Prometheus metrics
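
For the metrics service, a sketch of what Prometheus instrumentation with prometheus_client could look like; the metric names and label sets are illustrative, not those of metrics.py:

```python
import time

from prometheus_client import Counter, Histogram

TASKS_TOTAL = Counter(
    "ai_service_tasks_total", "Tasks processed", ["task_type", "status"]
)
INFERENCE_SECONDS = Histogram(
    "ai_service_inference_seconds", "Model inference latency", ["model"]
)

def timed_inference(model_name: str, fn, *args, **kwargs):
    """Run an inference callable while recording latency and outcome."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        TASKS_TOTAL.labels(task_type=model_name, status="success").inc()
        return result
    except Exception:
        TASKS_TOTAL.labels(task_type=model_name, status="error").inc()
        raise
    finally:
        INFERENCE_SECONDS.labels(model=model_name).observe(
            time.perf_counter() - start
        )
```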

5. Streaming Layer (app/streaming/)

Real-time audio processing for live calls:

Component              Purpose
─────────────────────  ────────────────────────────────────────────────────
TCP Server             Asterisk PBX integration for live calls (port 8301)
WebSocket Server       Browser-based real-time updates
Audio Buffer           Manage streaming audio chunks
Call Session Manager   Track live call state and metadata
Progressive Processor  Run NLP analysis on partial transcripts

Key Files:

  • tcp_server.py - Asterisk TCP integration
  • websocket_server.py - WebSocket server
  • call_session_manager.py - Call state management
  • audio_buffer.py - Audio buffering
  • progressive_processor.py - Progressive analysis
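
The buffer's job is simple: accumulate bytes and release a fixed-size chunk once enough audio has arrived. A sketch assuming 8 kHz 16-bit mono PCM (typical for telephony) and the 5-second cadence described under Data Flow below; audio_buffer.py may use different parameters:

```python
SAMPLE_RATE = 8000       # assumed telephony sample rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_SECONDS = 5        # matches the transcription cadence below

class AudioBuffer:
    """Accumulate streamed PCM and emit fixed-duration chunks."""

    def __init__(self):
        self._buf = bytearray()
        self._threshold = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS

    def feed(self, data: bytes) -> bytes | None:
        """Append incoming audio; return a full chunk once the threshold is met."""
        self._buf.extend(data)
        if len(self._buf) >= self._threshold:
            chunk = bytes(self._buf[: self._threshold])
            del self._buf[: self._threshold]
            return chunk
        return None
```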

6. Data Layer (app/db/)

Data persistence and storage:

Component                Purpose
───────────────────────  ─────────────────────────────────────
SQLAlchemy ORM           Database abstraction
Session management       Database connection pooling
Call history storage     Persist call records and transcripts
Agent feedback tracking  Store agent corrections and ratings

Key Files:

  • session.py - Database session management
  • models.py - SQLAlchemy models
  • repositories/feedback_repository.py - Feedback CRUD operations
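
A sketch of the persistence pieces in SQLAlchemy 2.0 style; the table and column names are assumptions, not a copy of models.py:

```python
from sqlalchemy import String, Text, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker

class Base(DeclarativeBase):
    pass

class CallRecord(Base):
    __tablename__ = "call_records"  # hypothetical table

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[str] = mapped_column(String(64), index=True)
    transcript: Mapped[str] = mapped_column(Text)

# pool_size reflects the connection-pooling responsibility above.
engine = create_engine(
    "postgresql://user:pass@localhost/openchs",  # illustrative URL
    pool_size=10,
)
SessionLocal = sessionmaker(bind=engine)
```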

Data Flow

Batch Processing Flow

1. Client uploads audio file via POST /audio/process
2. API validates file format and size
3. Task queued to Celery with unique task_id
4. Client receives 202 Accepted with task_id
5. Celery worker picks up task
6. Worker runs pipeline: Whisper → Translation → NER → Classification → Summary
7. Results stored in Redis
8. Client polls GET /audio/task/{task_id} for results
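
From the client's side, this flow reduces to one upload and a polling loop. A sketch against the endpoints above; the base URL, response fields, and polling interval are assumptions:

```python
import time

import requests

BASE = "http://localhost:8125"  # gateway port from the diagram above

with open("call.wav", "rb") as f:
    resp = requests.post(f"{BASE}/audio/process", files={"file": f})
resp.raise_for_status()  # expect 202 Accepted
task_id = resp.json()["task_id"]

# Poll until the worker finishes; a fixed 2 s interval keeps the sketch simple.
while True:
    status = requests.get(f"{BASE}/audio/task/{task_id}").json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

print(status)
```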

Real-time Processing Flow

1. Asterisk sends audio stream via TCP (port 8301)
2. TCP Server receives audio chunks
3. Audio is buffered until the chunk threshold is reached
4. Whisper transcribes chunk (every 5 seconds)
5. Progressive NLP runs on accumulated transcript (every 30 seconds)
6. Results pushed via WebSocket to agent dashboard
7. Call ends → trigger post-call processing (optional)
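
Steps 1-4 can be sketched as an asyncio TCP server that reuses the buffer from section 5; framing, codec negotiation, and session lookup are omitted, and transcribe_chunk is a hypothetical blocking helper:

```python
import asyncio

async def handle_call(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    buf = AudioBuffer()  # the chunk-threshold buffer sketched in section 5
    while data := await reader.read(4096):
        chunk = buf.feed(data)
        if chunk is not None:
            # Run blocking Whisper inference off the event loop.
            text = await asyncio.to_thread(transcribe_chunk, chunk)
            # ... push the partial transcript to the WebSocket layer here ...
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 8301)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```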

Technology Stack

Layer           Technology
──────────────  ──────────────────────────────────────────
API Framework   FastAPI 0.116+
Task Queue      Celery 5.5+
Message Broker  Redis 7.0+
ML Framework    PyTorch 2.7+, Transformers 4.53+
Speech-to-Text  OpenAI Whisper (faster-whisper)
NLP             spaCy 3.8+, custom Transformer models
Database        SQLAlchemy 2.0+ (SQLite/PostgreSQL/MySQL)
Monitoring      Prometheus, Grafana
Container       Docker 20.10+, Kubernetes
Load Balancer   NGINX 1.20+

Scalability Considerations

Horizontal Scaling

  • API Servers: Scale horizontally behind load balancer
  • Celery Workers: Add workers to increase throughput
  • Redis: Use Redis Cluster for high availability
  • GPU Workers: Add GPU nodes for faster inference
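
Scaling GPU and CPU workers independently hinges on routing tasks to separate queues. A sketch using Celery's task_routes setting; the queue and task names are illustrative, not the service's actual configuration:

```python
# celeryconfig.py sketch: route GPU-bound and CPU-bound tasks separately
# so each worker pool can be sized to its hardware.
task_routes = {
    "app.tasks.model_tasks.transcribe_audio": {"queue": "gpu"},
    "app.tasks.model_tasks.translate_text":   {"queue": "gpu"},
    "app.tasks.model_tasks.extract_entities": {"queue": "cpu"},
    "app.tasks.model_tasks.classify_case":    {"queue": "cpu"},
}

# Workers then subscribe per queue, e.g.:
#   celery -A app worker -Q gpu --concurrency=1   # one inference per GPU
#   celery -A app worker -Q cpu --concurrency=8   # scale with available cores
```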

Vertical Scaling

  • GPU Memory: Larger GPUs allow larger models and batch sizes
  • CPU Cores: More cores for parallel CPU-bound tasks
  • RAM: More memory for model caching

Bottleneck Analysis

Component  Bottleneck       Solution
─────────  ───────────────  ───────────────────────────────────────────────
Whisper    GPU memory       Use a smaller model or batch processing
Redis      Memory           Configure an eviction policy; use Redis Cluster
Database   Connection pool  Increase pool size; use read replicas
Network    Bandwidth        CDN for audio files; compression

Security Architecture

Network Security

  • HTTPS/TLS for all API traffic
  • Internal services communicate via private network
  • Redis protected by authentication
  • Firewall rules limit exposed ports

Data Security

  • PII detection and masking
  • Encryption at rest (database, files)
  • Encryption in transit (TLS)
  • Audit logging for all operations

Access Control

  • JWT-based authentication (configurable)
  • Role-based access control
  • API rate limiting
  • Request validation and sanitization
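
A sketch of how JWT verification and a role check can hang off FastAPI dependencies, using PyJWT; the secret handling, claim layout, and role model are assumptions (the mechanism is configurable, per the list above):

```python
import jwt  # PyJWT
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()
SECRET = "change-me"  # assumption: loaded from configuration in practice

def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    """Decode and verify the bearer token, returning its claims."""
    try:
        return jwt.decode(creds.credentials, SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")

def require_role(role: str):
    """Dependency factory for role-based access control."""
    def check(user: dict = Depends(current_user)) -> dict:
        if role not in user.get("roles", []):
            raise HTTPException(status_code=403, detail="Insufficient role")
        return user
    return check
```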

Next Steps