Translator Model Documentation

1. Model Overview

The opus-mt-sw-en-finetuned translation model is a MarianMT-based sequence-to-sequence model fine-tuned for Swahili-to-English translation in the context of child helpline services. It is trained on conversations from Tanzania's 116 Child Helpline, making it domain-specific for sensitive topics including abuse reporting, emergency situations, emotional distress, and general child welfare inquiries.

Key Features

  • Base Model: Helsinki-NLP/opus-mt-mul-en (MarianMT architecture)
  • Domain: Child helpline conversations
  • Deployment: Available via AI Service API and Hugging Face Hub
  • Repository: openchs/sw-en-opus-mt-mul-en-v1
  • Special Capabilities: Handles code-switching, fragmented trauma narratives, emergency language, and emotional distress expressions

Integration in AI Service Pipeline

The translation model is a core component of the BITZ AI Service pipeline:

Audio Input → Transcription Model → Swahili Text → Translation Model → English Text → NER/Classification/Summarization

The model receives Swahili transcripts from the transcription model and outputs English translations for downstream processing by Named Entity Recognition, classification, and other analysis components.


2. Integration in AI Service Architecture

The translation model is deeply integrated into the AI service through multiple layers:

2.1. Configuration Layer

The translation model is configured through the central settings system (app/config/settings.py):

python
class Settings(BaseSettings):
    # Hugging Face configuration
    hf_token: Optional[str] = None  # No token needed for public models
    translation_hf_repo_id: Optional[str] = "openchs/sw-en-opus-mt-mul-en-v1"
    hf_translator_model: str = "openchs/sw-en-opus-mt-mul-en-v1"
    
    # Model paths
    models_path: str = "./models"  # Auto-adjusted for Docker
    
    # Enable HuggingFace models
    use_hf_models: bool = True
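
Because Settings extends pydantic's BaseSettings, these values can typically be overridden through environment variables or a .env file rather than code changes. The snippet below is a minimal sketch; the variable names assume pydantic's default case-insensitive field-name mapping and may differ if the project defines custom aliases or a prefix.

python
import os

# Hypothetical overrides, set before the Settings object is created
# (assumes pydantic's default env-var mapping for these fields).
os.environ["USE_HF_MODELS"] = "true"
os.environ["HF_TRANSLATOR_MODEL"] = "openchs/sw-en-opus-mt-mul-en-v1"
os.environ["TRANSLATION_HF_REPO_ID"] = "openchs/sw-en-opus-mt-mul-en-v1"

from app.config.settings import settings  # module path per app/config/settings.py above
print(settings.get_translator_model_id())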

Configuration Helper Methods:

python
def get_translator_model_id(self) -> str:
    """Return the HuggingFace translator model id from configuration"""
    if self.hf_translator_model:
        return self.hf_translator_model
    return self._get_hf_model_id("translator")

def translation_backend(self, include_translation: bool) -> str:
    """Decide which backend should perform translation"""
    if not include_translation:
        return "none"
    if self.use_hf_models and self.hf_translator_model:
        return "hf"
    return "whisper"  # Fallback to Whisper built-in

def resolve_translation_target(self, include_translation: bool) -> Dict[str, str]:
    """Resolve and return the translation target information"""
    backend = self.translation_backend(include_translation)
    if backend == "hf":
        return {"backend": "hf", "model": self.get_translator_model_id()}
    if backend == "whisper":
        return {"backend": "whisper", "model": self.whisper_model_variant}
    return {"backend": "none", "model": ""}

Environment Detection:

The system automatically detects whether it's running in Docker or local development and adjusts paths accordingly:

python
def initialize_paths(self):
    """Initialize paths - automatically detect Docker environment"""
    self.docker_container = os.getenv("DOCKER_CONTAINER") is not None or os.path.exists("/.dockerenv")
    
    if self.docker_container:
        self.models_path = "/app/models"
        self.logs_path = "/app/logs"
        self.temp_path = "/app/temp"
    else:
        # Use relative paths for local development
        self.models_path = "./models"

2.2. Model Loading and Initialization

The translation model is loaded through the TranslationModel class during FastAPI application startup:

python
class TranslationModel:
    """Translation model with intelligent chunking support"""
    
    def __init__(self, model_path: str = None):
        from ..config.settings import settings
        
        self.settings = settings
        self.tokenizer = None
        self.model = None
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.loaded = False
        self.max_length = 512  # Model's maximum token limit
        
        # Hugging Face repo support
        self.hf_repo_id = settings.translation_hf_repo_id
        
    def load(self) -> bool:
        """Load model from Hugging Face Hub"""
        try:
            logger.info(f"Loading translation model from HF Hub: {self.hf_repo_id}")
            
            self.tokenizer = AutoTokenizer.from_pretrained(
                self.hf_repo_id, 
                local_files_only=False
            )
            self.model = AutoModelForSeq2SeqLM.from_pretrained(
                self.hf_repo_id, 
                local_files_only=False
            )
            
            self.model.to(self.device)
            self.loaded = True
            
            logger.info(f" Translation model loaded on {self.device}")
            return True
            
        except Exception as e:
            logger.error(f" Failed to load translation model: {e}")
            return False
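
The translate() method invoked by the API routes is not shown in the excerpt above. A minimal sketch of what such a method typically looks like for a MarianMT model is given below; it is illustrative only, and the production implementation adds chunk-level memory cleanup and retry logic.

python
def translate(self, text: str) -> str:
    """Minimal translation sketch (method of TranslationModel; illustrative only)."""
    if not self.loaded:
        raise RuntimeError("Translation model not loaded")

    # Tokenize on the model's device; truncate defensively at max_length
    inputs = self.tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=self.max_length,
    ).to(self.device)

    with torch.no_grad():
        generated = self.model.generate(**inputs, max_length=self.max_length)

    return self.tokenizer.decode(generated[0], skip_special_tokens=True)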

Model Loader Integration:

The translation model is managed by the central model_loader which handles:

  • Startup initialization
  • Readiness checks
  • Dependency tracking
  • Health monitoring
python
# During FastAPI startup
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize translation model via model_loader
    model_loader.load_model("translator")
    yield
    # Cleanup on shutdown

2.3. API Endpoints Layer

Translation functionality is exposed through FastAPI routes (app/api/endpoints/translator_routes.py):

python
router = APIRouter(prefix="/translate", tags=["translation"])

class TranslationRequest(BaseModel):
    text: str

class TranslationResponse(BaseModel):
    translated: str
    processing_time: float
    model_info: Dict
    timestamp: str

@router.post("/", response_model=TranslationResponse)
async def translate_text(request: TranslationRequest):
    """Translate text with automatic chunking for long inputs"""
    
    # Check model readiness
    if not model_loader.is_model_ready("translator"):
        raise HTTPException(
            status_code=503,
            detail="Translation model not ready. Check /health/models for status."
        )
    
    # Validate input
    if not request.text.strip():
        raise HTTPException(
            status_code=400,
            detail="Text input cannot be empty"
        )
    
    # Get translator model
    translator_model = model_loader.models.get("translator")
    
    # Initialize chunker
    tokenizer_name = "openchs/sw-en-opus-mt-mul-en-v1"
    chunker = TranslationChunker(
        tokenizer_name=tokenizer_name,
        max_tokens=512
    )
    
    # Count tokens and decide on chunking
    token_count = chunker.count_tokens(request.text)
    MAX_SOURCE_LENGTH = 512
    
    if token_count <= MAX_SOURCE_LENGTH:
        # Direct translation for short text
        translated = translator_model.translate(request.text)
    else:
        # Chunking needed for long text
        chunks = chunker.chunk_transcript(request.text)
        
        # Translate each chunk
        translated_chunks = []
        for chunk_info in chunks:
            chunk_translated = translator_model.translate(chunk_info['text'])
            translated_chunks.append(chunk_translated)
        
        # Reconstruct final translation
        translated = chunker.reconstruct_translation(translated_chunks)
    
    processing_time = time.time() - start_time
    return TranslationResponse(
        translated=translated,
        processing_time=processing_time,
        model_info=translator_model.get_model_info(),
        timestamp=datetime.now().isoformat()
    )

Information Endpoint:

python
@router.get("/info")
async def get_translation_info():
    """Get translation service information"""
    if not model_loader.is_model_ready("translator"):
        return {
            "status": "not_ready",
            "message": "Translation model not loaded"
        }
    
    translator_model = model_loader.models.get("translator")
    return {
        "status": "ready",
        "model_info": translator_model.get_model_info()
    }

2.4. Translation Chunking Strategy

The TranslationChunker utility handles automatic segmentation of long texts to respect the model's 512-token limit:

Why Chunking is Needed:

  • MarianMT models have a maximum context length of 512 tokens
  • Long helpline conversations often exceed this limit
  • Direct truncation would lose critical information
  • Chunking with overlap preserves context across segments

Chunking Implementation:

python
from typing import Dict, List

from transformers import AutoTokenizer


class TranslationChunker:
    """Utility for chunking long texts for translation"""
    
    def __init__(self, tokenizer_name: str, max_tokens: int = 512):
        self.tokenizer_name = tokenizer_name
        self.max_tokens = max_tokens
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    
    def count_tokens(self, text: str) -> int:
        """Count tokens in text"""
        return len(self.tokenizer.encode(text, add_special_tokens=True))
    
    def chunk_transcript(self, text: str, segment_length: int = 450, 
                        overlap: int = 50) -> List[Dict]:
        """
        Create overlapping chunks from text
        
        Args:
            text: Input text to chunk
            segment_length: Target tokens per segment (default: 450)
            overlap: Token overlap between segments (default: 50)
            
        Returns:
            List of chunk dictionaries with 'text' and 'tokens' keys
        """
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        chunks = []
        
        start = 0
        while start < len(tokens):
            end = min(start + segment_length, len(tokens))
            chunk_tokens = tokens[start:end]
            chunk_text = self.tokenizer.decode(chunk_tokens, skip_special_tokens=True)
            
            chunks.append({
                'text': chunk_text,
                'tokens': len(chunk_tokens)
            })
            
            # Move start forward, accounting for overlap
            start = end - overlap if end < len(tokens) else end
        
        return chunks
    
    def reconstruct_translation(self, translated_chunks: List[str]) -> str:
        """
        Reconstruct complete translation from chunks
        
        Args:
            translated_chunks: List of translated chunk strings
            
        Returns:
            Combined translation as single string
        """
        return " ".join(chunk.strip() for chunk in translated_chunks if chunk.strip())

Chunking Process Flow:

  1. Token Counting: Count tokens in input text
  2. Decision: If ≤ 512 tokens → direct translation; if > 512 → chunking
  3. Segmentation: Split into 450-token segments with 50-token overlap
  4. Translation: Translate each chunk independently
  5. Reconstruction: Combine translated chunks with intelligent spacing

Overlap Strategy:

The 50-token overlap between chunks ensures:

  • Context preservation across segment boundaries
  • Better handling of sentence splits
  • Improved translation quality for long conversations
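
To make the arithmetic concrete, the sketch below walks through the chunk boundaries produced for a hypothetical 1,000-token input with the default parameters (long_swahili_text and translator_model are placeholder names):

python
# Hypothetical 1,000-token input, segment_length=450, overlap=50.
# chunk_transcript() slides a window over the token sequence:
#
#   chunk 1: tokens [0, 450)    -> next start = 450 - 50 = 400
#   chunk 2: tokens [400, 850)  -> next start = 850 - 50 = 800
#   chunk 3: tokens [800, 1000) -> end of input
#
# Each consecutive pair of chunks shares 50 tokens of context.
chunker = TranslationChunker("openchs/sw-en-opus-mt-mul-en-v1", max_tokens=512)
chunks = chunker.chunk_transcript(long_swahili_text)
translated_chunks = [translator_model.translate(c["text"]) for c in chunks]
final_text = chunker.reconstruct_translation(translated_chunks)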

2.5. Health Monitoring

The translation model integrates with the AI service health monitoring system (app/api/endpoints/health_routes.py):

Model Status Endpoint:

python
@router.get("/health/models")
async def models_health():
    """Get detailed model status with dependency info"""
    model_status = model_loader.get_model_status()
    ready_models = model_loader.get_ready_models()
    implementable_models = model_loader.get_implementable_models()
    blocked_models = model_loader.get_blocked_models()
    
    return {
        "timestamp": datetime.now().isoformat(),
        "summary": {
            "total": len(model_status),
            "ready": len(ready_models),
            "implementable": len(implementable_models),
            "blocked": len(blocked_models)
        },
        "ready_models": ready_models,
        "implementable_models": implementable_models,
        "blocked_models": blocked_models,
        "details": model_status
    }

Model Readiness States:

  • Ready: Model loaded and available for translation
  • Implementable: Model can be loaded but not yet initialized
  • Blocked: Missing dependencies preventing model loading

Health Check Example Response:

json
{
  "ready_models": ["translator", "classifier", "ner"],
  "implementable_models": [],
  "blocked_models": [],
  "details": {
    "translator": {
      "loaded": true,
      "device": "cuda",
      "hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
      "max_length": 512,
      "chunking_supported": true
    }
  }
}

2.6. Pipeline Integration

The translation model integrates into two processing modes:

Real-time Processing

For live calls, translation works progressively:

python
# Progressive translation during active call
@router.get("/{call_id}/translation")
async def get_call_translation(call_id: str):
    """Get cumulative translation for active call"""
    analysis = progressive_processor.get_call_analysis(call_id)
    
    return {
        "call_id": call_id,
        "cumulative_translation": analysis.cumulative_translation,
        "translation_windows": len([w for w in analysis.windows if w.translation]),
        "latest_entities": analysis.latest_entities,
        "latest_classification": analysis.latest_classification
    }

Real-time Flow:

  1. Audio stream → Whisper transcription (Swahili)
  2. Transcript chunks → Translation (English)
  3. Translated text → NER extraction
  4. Entities + classification → Agent notifications

Post-call Processing

For completed calls, full pipeline execution:

Audio File → Transcription (full) → Translation (chunked if needed) → NER → Classification → Summarization → Insights

Configuration for Pipeline Modes:

python
# Settings control pipeline behavior
realtime_enable_progressive_translation: bool = True
postcall_enable_full_pipeline: bool = True

2.7. Memory Management

The translation model implements automatic GPU memory management:

python
def _cleanup_memory(self):
    """Clean up GPU memory after translation"""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()

# Cleanup occurs:
# - After each translation request
# - Every 5 chunks during long translations
# - Before retry attempts
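
The retry behaviour referenced above is not shown in the excerpt; a simplified sketch of how cleanup and retries might be combined is given below (illustrative only, not the exact production logic):

python
def _translate_with_retry(self, text: str, max_retries: int = 2) -> str:
    """Illustrative retry wrapper: free GPU memory before each retry."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return self.translate(text)
        except RuntimeError as e:  # e.g. CUDA out-of-memory
            last_error = e
            self._cleanup_memory()
    raise last_error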

3. Using the Model

3.1. Via AI Service API (Production Use)

The translation model is deployed as part of the AI Service and accessible via REST API.

Endpoint

POST /translate/

Request Format

Headers:

Content-Type: application/json

Request Body:

json
{
  "text": "string"
}

Response Format

Success Response (200):

json
{
  "translated": "string",
  "processing_time": 0,
  "model_info": {
    "model_path": "string",
    "hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
    "device": "cuda",
    "loaded": true,
    "max_length": 512,
    "chunking_supported": true
  },
  "timestamp": "string"
}

Validation Error (422):

json
{
  "detail": [
    {
      "loc": ["string"],
      "msg": "string",
      "type": "string"
    }
  ]
}

Example cURL Request

Basic Translation:

bash
curl -X POST "https://your-api-domain.com/translate/" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Watoto wanahitaji msaada wa haraka. Kuna tatizo kubwa hapa."
  }'

Response:

json
{
  "translated": "Children need help urgently. There is a big problem here.",
  "processing_time": 1.23,
  "model_info": {
    "model_path": "./models/translation",
    "hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
    "device": "cuda",
    "loaded": true,
    "max_length": 512,
    "chunking_supported": true
  },
  "timestamp": "2025-10-15T10:30:45Z"
}

Getting Translation Info

Endpoint:

GET /translate/info

Response:

json
{
  "status": "ready",
  "model_info": {
    "hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
    "device": "cuda",
    "loaded": true,
    "max_length": 512,
    "chunking_supported": true,
    "fallback_strategies": [
      "chunked_translation",
      "memory_cleanup",
      "retry_logic"
    ]
  }
}

Features

  • Automatic Chunking: Long inputs are automatically split into segments so full conversations are translated without truncation
  • Segment Processing: Segments are capped at 450 tokens with a 50-token overlap between consecutive segments
  • Intelligent Reconstruction: Translated chunks are combined with proper spacing and punctuation handling
  • GPU Memory Management: Memory is cleaned up automatically between requests and during long translations
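
For programmatic use, the same endpoint can be called from Python. The snippet below is a minimal client sketch; the base URL is a placeholder (as in the cURL examples) and it assumes the requests package is installed.

python
import requests

BASE_URL = "https://your-api-domain.com"  # placeholder

# Optionally confirm the translator is loaded before sending work
info = requests.get(f"{BASE_URL}/translate/info", timeout=10).json()
if info.get("status") != "ready":
    raise RuntimeError("Translation model not ready")

resp = requests.post(
    f"{BASE_URL}/translate/",
    json={"text": "Watoto wanahitaji msaada wa haraka."},
    timeout=60,
)
resp.raise_for_status()  # 503 = model not ready, 400 = empty input

result = resp.json()
print(result["translated"])        # English translation
print(result["processing_time"])   # seconds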

3.2. Via Hugging Face Hub

The model is publicly available on Hugging Face for direct inference and fine-tuning.

Model Repository

The model is hosted on the Hugging Face Hub at https://huggingface.co/openchs/sw-en-opus-mt-mul-en-v1.

Installation

bash
pip install transformers torch

Inference Example

Using Pipeline (Recommended):

python
from transformers import pipeline

# Load the translation pipeline
translator = pipeline(
    "translation",
    model="openchs/sw-en-opus-mt-mul-en-v1"
)

# Translate Swahili to English
text = "Ninajisikia vibaya sana. Nahitaji mtu wa kuongea naye."
result = translator(text)

print(result[0]["translation_text"])
# Output: "I feel very bad. I need someone to talk to."

Using Model and Tokenizer Directly:

python
from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "openchs/sw-en-opus-mt-mul-en-v1"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Prepare input
text = ["Mtoto wangu amepotea. Tafadhali nisaidieni."]
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Generate translation
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)

print(output)
# Output: "My child is missing. Please help me."

Batch Translation

python
from transformers import pipeline

translator = pipeline(
    "translation",
    model="openchs/sw-en-opus-mt-mul-en-v1"
)

texts = [
    "Habari za asubuhi",
    "Ninahitaji usaidizi",
    "Je, mnaweza kunisaidia?"
]

results = translator(texts)

for original, result in zip(texts, results):
    print(f"{original}{result['translation_text']}")

Note: When using the model directly from Hugging Face, you'll need to implement your own chunking logic for texts longer than 512 tokens. The AI Service API handles this automatically.
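
If you do need to handle longer inputs yourself, a simple token-window approach that mirrors the service's chunker might look like the sketch below (illustrative; tune the segment length and overlap for your data):

python
from transformers import MarianMTModel, MarianTokenizer

model_name = "openchs/sw-en-opus-mt-mul-en-v1"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate_long(text: str, segment_length: int = 450, overlap: int = 50) -> str:
    """Translate arbitrarily long text using overlapping token windows."""
    tokens = tokenizer.encode(text, add_special_tokens=False)
    pieces = []
    start = 0
    while start < len(tokens):
        end = min(start + segment_length, len(tokens))
        chunk_text = tokenizer.decode(tokens[start:end], skip_special_tokens=True)
        inputs = tokenizer(chunk_text, return_tensors="pt", truncation=True, max_length=512)
        generated = model.generate(**inputs)
        pieces.append(tokenizer.decode(generated[0], skip_special_tokens=True))
        start = end - overlap if end < len(tokens) else end
    return " ".join(p.strip() for p in pieces if p.strip())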


4. Production Considerations

Token Limits

  • Maximum Input Length: 512 tokens
  • Segment Size: 450 tokens per segment (for automatic chunking)
  • Segment Overlap: 50 tokens between segments

Processing Time

  • Short texts (< 50 tokens): ~0.5-1.0 seconds
  • Medium texts (50-200 tokens): ~1.0-2.5 seconds
  • Long texts (> 200 tokens): ~2.5-5.0 seconds (with chunking)

Processing times vary based on:

  • GPU availability (CUDA vs CPU)
  • Text length and complexity
  • Number of chunks required
  • System load

Automatic Chunking

The API automatically handles long conversations by:

  1. Token Counting: Checking if input exceeds 512 tokens
  2. Segmentation: Creating 450-token chunks with 50-token overlap
  3. Translation: Processing each segment independently
  4. Reconstruction: Combining translations with intelligent spacing

Error Handling

Common Error Scenarios:

  1. Empty Input:

json
{
  "detail": [
    {
      "loc": ["body", "text"],
      "msg": "Text input cannot be empty",
      "type": "value_error"
    }
  ]
}

  2. Model Not Ready:

json
{
  "detail": "Translation model not ready. Check /health/models for status."
}

  3. Service Unavailable:

json
{
  "detail": "Translator model not available"
}

Health Checks

Monitor translation service health:

bash
# Check if translator is ready
curl -X GET "https://your-api-domain.com/health/models"

# Get detailed system status
curl -X GET "https://your-api-domain.com/health/detailed"

5. Model Limitations

Domain Specificity

  • Optimized for: Child helpline conversations, emotional support language, emergency situations
  • May require adaptation for: General Swahili-English translation, formal documents, technical content
  • Performance varies on: Out-of-distribution data, highly specialized terminology

Technical Constraints

  • Maximum Context: 512 tokens (handled automatically via chunking)
  • Memory Requirements: GPU recommended for production (CPU fallback available)
  • Processing Speed: Dependent on hardware and text length

Known Considerations

  • Code-Switching: Generally handled well, though complex code-switching patterns may reduce accuracy
  • Cultural Context: Some culturally-specific Tanzanian expressions may not translate perfectly
  • Long Conversations: Chunking may occasionally affect cohesion across segment boundaries

Recommendations

  • Monitor translation quality for edge cases in your domain
  • Use health check endpoints to verify model readiness before critical operations
  • Consider implementing translation quality feedback loops
  • Review translations for sensitive or critical content

6. Citation

If you use this model in your research or application, please cite:

bibtex
@misc{opus_mt_sw_en_finetuned,
  title={opus-mt-sw-en-finetuned},
  author={OpenCHS Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/openchs/sw-en-opus-mt-mul-en-v1}
}

7. Support and Contact

Issues and Questions

Questions and issue reports can be directed to the BITZ AI Team (see Contributing below).

Contributing

For dataset contributions, model improvements, or bug reports, please contact the BITZ AI Team at info@bitz-itc.com.


8. License

This model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use with proper attribution.


Documentation last updated: October 15, 2025