Translator Model Documentation
1. Model Overview
The opus-mt-sw-en-finetuned translation model is a MarianMT-based sequence-to-sequence model fine-tuned for Swahili-to-English translation in the context of child helpline services. This model is specifically trained on conversations from the 116 Child Helpline in Tanzania, making it domain-specific for sensitive topics including abuse reporting, emergency situations, emotional distress, and general child welfare inquiries.
Key Features
- Base Model: Helsinki-NLP/opus-mt-mul-en (MarianMT architecture)
- Domain: Child helpline conversations
- Deployment: Available via AI Service API and Hugging Face Hub
- Repository: openchs/sw-en-opus-mt-mul-en-v1
- Special Capabilities: Handles code-switching, fragmented trauma narratives, emergency language, and emotional distress expressions
Integration in AI Service Pipeline
The translation model is a core component of the BITZ AI Service pipeline:
Audio Input → Transcription Model → Swahili Text → Translation Model → English Text → NER/Classification/Summarization

The model receives Swahili transcripts from the transcription model and outputs English translations for downstream processing by Named Entity Recognition, classification, and other analysis components.
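To illustrate where translation sits in this flow, the following minimal sketch chains a transcription output into the translation endpoint over HTTP; the base URL is a placeholder and the Swahili transcript is assumed to come from the transcription model:

import requests

API_BASE = "https://your-api-domain.com"  # placeholder; replace with your deployment

# Swahili transcript assumed to be produced by the transcription model
swahili_transcript = "Watoto wanahitaji msaada wa haraka."

# Translate before running NER, classification, and summarization downstream
response = requests.post(f"{API_BASE}/translate/", json={"text": swahili_transcript})
response.raise_for_status()
english_text = response.json()["translated"]
print(english_text)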
2. Integration in AI Service Architecture
The translation model is deeply integrated into the AI service through multiple layers:
2.1. Configuration Layer
The translation model is configured through the central settings system (app/config/settings.py):
class Settings(BaseSettings):
# Hugging Face configuration
hf_token: Optional[str] = None # No token needed for public models
translation_hf_repo_id: Optional[str] = "openchs/sw-en-opus-mt-mul-en-v1"
hf_translator_model: str = "openchs/sw-en-opus-mt-mul-en-v1"
# Model paths
models_path: str = "./models" # Auto-adjusted for Docker
# Enable HuggingFace models
use_hf_models: bool = True

Configuration Helper Methods:
def get_translator_model_id(self) -> str:
"""Return the HuggingFace translator model id from configuration"""
if self.hf_translator_model:
return self.hf_translator_model
return self._get_hf_model_id("translator")
def translation_backend(self, include_translation: bool) -> str:
"""Decide which backend should perform translation"""
if not include_translation:
return "none"
if self.use_hf_models and self.hf_translator_model:
return "hf"
return "whisper" # Fallback to Whisper built-in
def resolve_translation_target(self, include_translation: bool) -> Dict[str, str]:
"""Resolve and return the translation target information"""
backend = self.translation_backend(include_translation)
if backend == "hf":
return {"backend": "hf", "model": self.get_translator_model_id()}
if backend == "whisper":
return {"backend": "whisper", "model": self.whisper_model_variant}
return {"backend": "none", "model": ""}Environment Detection:
Environment Detection:
The system automatically detects whether it is running in Docker or local development and adjusts paths accordingly:
def initialize_paths(self):
"""Initialize paths - automatically detect Docker environment"""
self.docker_container = os.getenv("DOCKER_CONTAINER") is not None or os.path.exists("/.dockerenv")
if self.docker_container:
self.models_path = "/app/models"
self.logs_path = "/app/logs"
self.temp_path = "/app/temp"
else:
# Use relative paths for local development
self.models_path = "./models"2.2. Model Loading and Initialization
The translation model is loaded through the TranslationModel class during FastAPI application startup:
class TranslationModel:
"""Translation model with intelligent chunking support"""
def __init__(self, model_path: str = None):
from ..config.settings import settings
self.settings = settings
self.tokenizer = None
self.model = None
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.loaded = False
self.max_length = 512 # Model's maximum token limit
# Hugging Face repo support
self.hf_repo_id = settings.translation_hf_repo_id
def load(self) -> bool:
"""Load model from Hugging Face Hub"""
try:
logger.info(f"Loading translation model from HF Hub: {self.hf_repo_id}")
self.tokenizer = AutoTokenizer.from_pretrained(
self.hf_repo_id,
local_files_only=False
)
self.model = AutoModelForSeq2SeqLM.from_pretrained(
self.hf_repo_id,
local_files_only=False
)
self.model.to(self.device)
self.loaded = True
logger.info(f" Translation model loaded on {self.device}")
return True
except Exception as e:
logger.error(f" Failed to load translation model: {e}")
return False
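The API routes shown later call translator_model.translate(), which is not included in the excerpt above. A minimal sketch of such a method, assuming the tokenizer and model loaded by load(), might look like this; the production implementation additionally applies the retry and memory-cleanup logic described in section 2.7:

def translate(self, text: str) -> str:
    """Translate a single Swahili string to English (illustrative sketch)."""
    inputs = self.tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=self.max_length
    ).to(self.device)
    with torch.no_grad():
        generated = self.model.generate(**inputs, max_length=self.max_length)
    return self.tokenizer.decode(generated[0], skip_special_tokens=True)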
Model Loader Integration:
The translation model is managed by the central model_loader which handles:
- Startup initialization
- Readiness checks
- Dependency tracking
- Health monitoring
# During FastAPI startup
@asynccontextmanager
async def lifespan(app: FastAPI):
# Initialize translation model via model_loader
model_loader.load_model("translator")
yield
# Cleanup on shutdown

2.3. API Endpoints Layer
Translation functionality is exposed through FastAPI routes (app/api/endpoints/translator_routes.py):
router = APIRouter(prefix="/translate", tags=["translation"])
class TranslationRequest(BaseModel):
text: str
class TranslationResponse(BaseModel):
translated: str
processing_time: float
model_info: Dict
timestamp: str
@router.post("/", response_model=TranslationResponse)
async def translate_text(request: TranslationRequest):
"""Translate text with automatic chunking for long inputs"""
# Check model readiness
if not model_loader.is_model_ready("translator"):
raise HTTPException(
status_code=503,
detail="Translation model not ready. Check /health/models for status."
)
# Validate input
if not request.text.strip():
raise HTTPException(
status_code=400,
detail="Text input cannot be empty"
)
# Get translator model
translator_model = model_loader.models.get("translator")
# Initialize chunker
tokenizer_name = "openchs/sw-en-opus-mt-mul-en-v1"
chunker = TranslationChunker(
tokenizer_name=tokenizer_name,
max_tokens=512
)
# Count tokens and decide on chunking
token_count = chunker.count_tokens(request.text)
MAX_SOURCE_LENGTH = 512
if token_count <= MAX_SOURCE_LENGTH:
# Direct translation for short text
translated = translator_model.translate(request.text)
else:
# Chunking needed for long text
chunks = chunker.chunk_transcript(request.text)
# Translate each chunk
translated_chunks = []
for chunk_info in chunks:
chunk_translated = translator_model.translate(chunk_info['text'])
translated_chunks.append(chunk_translated)
# Reconstruct final translation
translated = chunker.reconstruct_translation(translated_chunks)
processing_time = time.time() - start_time
return TranslationResponse(
translated=translated,
processing_time=processing_time,
model_info=translator_model.get_model_info(),
timestamp=datetime.now().isoformat()
)

Information Endpoint:
@router.get("/info")
async def get_translation_info():
"""Get translation service information"""
if not model_loader.is_model_ready("translator"):
return {
"status": "not_ready",
"message": "Translation model not loaded"
}
translator_model = model_loader.models.get("translator")
return {
"status": "ready",
"model_info": translator_model.get_model_info()
}

2.4. Translation Chunking Strategy
The TranslationChunker utility handles automatic segmentation of long texts to respect the model's 512-token limit:
Why Chunking is Needed:
- MarianMT models have a maximum context length of 512 tokens
- Long helpline conversations often exceed this limit
- Direct truncation would lose critical information
- Chunking with overlap preserves context across segments
Chunking Implementation:
class TranslationChunker:
"""Utility for chunking long texts for translation"""
def __init__(self, tokenizer_name: str, max_tokens: int = 512):
self.tokenizer_name = tokenizer_name
self.max_tokens = max_tokens
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
def count_tokens(self, text: str) -> int:
"""Count tokens in text"""
return len(self.tokenizer.encode(text, add_special_tokens=True))
def chunk_transcript(self, text: str, segment_length: int = 450,
overlap: int = 50) -> List[Dict]:
"""
Create overlapping chunks from text
Args:
text: Input text to chunk
segment_length: Target tokens per segment (default: 450)
overlap: Token overlap between segments (default: 50)
Returns:
List of chunk dictionaries with 'text' and 'tokens' keys
"""
tokens = self.tokenizer.encode(text, add_special_tokens=False)
chunks = []
start = 0
while start < len(tokens):
end = min(start + segment_length, len(tokens))
chunk_tokens = tokens[start:end]
chunk_text = self.tokenizer.decode(chunk_tokens, skip_special_tokens=True)
chunks.append({
'text': chunk_text,
'tokens': len(chunk_tokens)
})
# Move start forward, accounting for overlap
start = end - overlap if end < len(tokens) else end
return chunks
def reconstruct_translation(self, translated_chunks: List[str]) -> str:
"""
Reconstruct complete translation from chunks
Args:
translated_chunks: List of translated chunk strings
Returns:
Combined translation as single string
"""
return " ".join(chunk.strip() for chunk in translated_chunks if chunk.strip())Chunking Process Flow:
- Token Counting: Count tokens in input text
- Decision: If ≤ 512 tokens → direct translation; if > 512 → chunking
- Segmentation: Split into 450-token segments with 50-token overlap
- Translation: Translate each chunk independently
- Reconstruction: Combine translated chunks with intelligent spacing
Overlap Strategy:
The 50-token overlap between chunks ensures:
- Context preservation across segment boundaries
- Better handling of sentence splits
- Improved translation quality for long conversations
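Putting these pieces together, the sketch below shows the count → chunk → translate → reconstruct cycle outside the API layer; translator_model is assumed to be a loaded TranslationModel instance and long_transcript is a placeholder for a long Swahili conversation:

chunker = TranslationChunker(
    tokenizer_name="openchs/sw-en-opus-mt-mul-en-v1",
    max_tokens=512
)

long_transcript = "..."  # placeholder for a long Swahili transcript

if chunker.count_tokens(long_transcript) <= 512:
    translation = translator_model.translate(long_transcript)
else:
    chunks = chunker.chunk_transcript(long_transcript, segment_length=450, overlap=50)
    translated_chunks = [translator_model.translate(c["text"]) for c in chunks]
    translation = chunker.reconstruct_translation(translated_chunks)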
2.5. Health Monitoring
The translation model integrates with the AI service health monitoring system (app/api/endpoints/health_routes.py):
Model Status Endpoint:
@router.get("/health/models")
async def models_health():
"""Get detailed model status with dependency info"""
model_status = model_loader.get_model_status()
ready_models = model_loader.get_ready_models()
implementable_models = model_loader.get_implementable_models()
blocked_models = model_loader.get_blocked_models()
return {
"timestamp": datetime.now().isoformat(),
"summary": {
"total": len(model_status),
"ready": len(ready_models),
"implementable": len(implementable_models),
"blocked": len(blocked_models)
},
"ready_models": ready_models,
"implementable_models": implementable_models,
"blocked_models": blocked_models,
"details": model_status
}

Model Readiness States:
- Ready: Model loaded and available for translation
- Implementable: Model can be loaded but not yet initialized
- Blocked: Missing dependencies preventing model loading
Health Check Example Response:
{
"ready_models": ["translator", "classifier", "ner"],
"implementable_models": [],
"blocked_models": [],
"details": {
"translator": {
"loaded": true,
"device": "cuda",
"hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
"max_length": 512,
"chunking_supported": true
}
}
}

2.6. Pipeline Integration
The translation model integrates into two processing modes:
Real-time Processing
For live calls, translation works progressively:
# Progressive translation during active call
@router.get("/{call_id}/translation")
async def get_call_translation(call_id: str):
"""Get cumulative translation for active call"""
analysis = progressive_processor.get_call_analysis(call_id)
return {
"call_id": call_id,
"cumulative_translation": analysis.cumulative_translation,
"translation_windows": len([w for w in analysis.windows if w.translation]),
"latest_entities": analysis.latest_entities,
"latest_classification": analysis.latest_classification
}

Real-time Flow:
- Audio stream → Whisper transcription (Swahili)
- Transcript chunks → Translation (English)
- Translated text → NER extraction
- Entities + classification → Agent notifications
Post-call Processing
For completed calls, full pipeline execution:
# Complete pipeline after call ends
Audio File → Transcription (full) → Translation (chunked if needed)
→ NER → Classification → Summarization → Insights

Configuration for Pipeline Modes:
# Settings control pipeline behavior
realtime_enable_progressive_translation: bool = True
postcall_enable_full_pipeline: bool = True

2.7. Memory Management
The translation model implements automatic GPU memory management:
def _cleanup_memory(self):
"""Clean up GPU memory after translation"""
if torch.cuda.is_available():
torch.cuda.empty_cache()
gc.collect()
# Cleanup occurs:
# - After each translation request
# - Every 5 chunks during long translations
# - Before retry attempts
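As a simplified illustration of that cadence, a chunk-translation loop could invoke cleanup as follows; chunks is assumed to come from TranslationChunker.chunk_transcript and translator_model is a loaded TranslationModel:

translated_chunks = []
for i, chunk_info in enumerate(chunks, start=1):
    translated_chunks.append(translator_model.translate(chunk_info["text"]))
    if i % 5 == 0:
        translator_model._cleanup_memory()  # periodic cleanup during long translations
translator_model._cleanup_memory()  # final cleanup after the request completes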
3. Using the Model

3.1. Via AI Service API (Production Use)
The translation model is deployed as part of the AI Service and accessible via REST API.
Endpoint
POST /translate/

Request Format
Headers:
Content-Type: application/json

Request Body:
{
"text": "string"
}

Response Format
Success Response (200):
{
"translated": "string",
"processing_time": 0,
"model_info": {
"model_path": "string",
"hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
"device": "cuda",
"loaded": true,
"max_length": 512,
"chunking_supported": true
},
"timestamp": "string"
}

Validation Error (422):
{
"detail": [
{
"loc": ["string"],
"msg": "string",
"type": "string"
}
]
}

Example cURL Request
Basic Translation:
curl -X POST "https://your-api-domain.com/translate/" \
-H "Content-Type: application/json" \
-d '{
"text": "Watoto wanahitaji msaada wa haraka. Kuna tatizo kubwa hapa."
}'

Response:
{
"translated": "Children need help urgently. There is a big problem here.",
"processing_time": 1.23,
"model_info": {
"model_path": "./models/translation",
"hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
"device": "cuda",
"loaded": true,
"max_length": 512,
"chunking_supported": true
},
"timestamp": "2025-10-15T10:30:45Z"
}

Getting Translation Info
Endpoint:
GET /translate/info

Response:
{
"status": "ready",
"model_info": {
"hf_repo_id": "openchs/sw-en-opus-mt-mul-en-v1",
"device": "cuda",
"loaded": true,
"max_length": 512,
"chunking_supported": true,
"fallback_strategies": [
"chunked_translation",
"memory_cleanup",
"retry_logic"
]
}
}

Features
- Automatic Chunking: Long inputs are automatically chunked to handle conversations without truncation
- Segment Processing: Configured with 450 max tokens per segment with 50 token overlap
- Intelligent Reconstruction: Translated chunks are combined with proper spacing and punctuation handling
- GPU Memory Management: Automatic cleanup between requests and during long translations
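The endpoint can also be called from Python. The sketch below checks readiness via GET /translate/info before posting a translation request; the base URL is a placeholder:

import requests

API_BASE = "https://your-api-domain.com"  # placeholder; replace with your deployment

info = requests.get(f"{API_BASE}/translate/info", timeout=10).json()
if info.get("status") == "ready":
    resp = requests.post(
        f"{API_BASE}/translate/",
        json={"text": "Ninahitaji usaidizi"},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["translated"])
else:
    print("Translation model not ready:", info)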
3.2. Via Hugging Face Hub
The model is publicly available on Hugging Face for direct inference and fine-tuning.
Model Repository
- Organization: openchs
- Model: openchs/sw-en-opus-mt-mul-en-v1
Installation
pip install transformers torch

Inference Example
Using Pipeline (Recommended):
from transformers import pipeline
# Load the translation pipeline
translator = pipeline(
"translation",
model="openchs/sw-en-opus-mt-mul-en-v1"
)
# Translate Swahili to English
text = "Ninajisikia vibaya sana. Nahitaji mtu wa kuongea naye."
result = translator(text)
print(result[0]["translation_text"])
# Output: "I feel very bad. I need someone to talk to."Using Model and Tokenizer Directly:
from transformers import MarianMTModel, MarianTokenizer
# Load model and tokenizer
model_name = "openchs/sw-en-opus-mt-mul-en-v1"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
# Prepare input
text = ["Mtoto wangu amepotea. Tafadhali nisaidieni."]
inputs = tokenizer(text, return_tensors="pt", padding=True)
# Generate translation
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)
print(output)
# Output: "My child is missing. Please help me."Batch Translation
from transformers import pipeline
translator = pipeline(
"translation",
model="openchs/sw-en-opus-mt-mul-en-v1"
)
texts = [
"Habari za asubuhi",
"Ninahitaji usaidizi",
"Je, mnaweza kunisaidia?"
]
results = translator(texts)
for original, result in zip(texts, results):
print(f"{original} → {result['translation_text']}")Note: When using the model directly from Hugging Face, you'll need to implement your own chunking logic for texts longer than 512 tokens. The AI Service API handles this automatically.
4. Production Considerations
Token Limits
- Maximum Input Length: 512 tokens
- Segment Size: 450 tokens per segment (for automatic chunking)
- Segment Overlap: 50 tokens between segments
Processing Time
- Short texts (< 50 tokens): ~0.5-1.0 seconds
- Medium texts (50-200 tokens): ~1.0-2.5 seconds
- Long texts (> 200 tokens): ~2.5-5.0 seconds (with chunking)
Processing times vary based on:
- GPU availability (CUDA vs CPU)
- Text length and complexity
- Number of chunks required
- System load
Automatic Chunking
The API automatically handles long conversations by:
- Token Counting: Checking if input exceeds 512 tokens
- Segmentation: Creating 450-token chunks with 50-token overlap
- Translation: Processing each segment independently
- Reconstruction: Combining translations with intelligent spacing
Error Handling
Common Error Scenarios:
- Empty Input (400, raised by the endpoint's empty-text check):
{
"detail": "Text input cannot be empty"
}
- Model Not Ready (503):
{
"detail": "Translation model not ready. Check /health/models for status."
}
- Service Unavailable:
{
"detail": "Translator model not available"
}

Health Checks
Monitor translation service health:
# Check if translator is ready
curl -X GET "https://your-api-domain.com/health/models"
# Get detailed system status
curl -X GET "https://your-api-domain.com/health/detailed"5. Model Limitations
Domain Specificity
- Optimized for: Child helpline conversations, emotional support language, emergency situations
- May require adaptation for: General Swahili-English translation, formal documents, technical content
- Performance varies on: Out-of-distribution data, highly specialized terminology
Technical Constraints
- Maximum Context: 512 tokens (handled automatically via chunking)
- Memory Requirements: GPU recommended for production (CPU fallback available)
- Processing Speed: Dependent on hardware and text length
Known Considerations
- Code-Switching: While generally successful, complex code-switching patterns may reduce accuracy
- Cultural Context: Some culturally-specific Tanzanian expressions may not translate perfectly
- Long Conversations: Chunking may occasionally affect cohesion across segment boundaries
Recommendations
- Monitor translation quality for edge cases in your domain
- Use health check endpoints to verify model readiness before critical operations
- Consider implementing translation quality feedback loops
- Review translations for sensitive or critical content
6. Citation
If you use this model in your research or application, please cite:
@misc{opus_mt_sw_en_finetuned,
title={opus-mt-sw-en-finetuned},
author={OpenCHS Team},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/openchs/sw-en-opus-mt-mul-en-v1}
}

7. Support and Contact
Issues and Questions
- Email: info@bitz-itc.com
- Hugging Face: openchs organization
Contributing
For dataset contributions, model improvements, or bug reports, please contact the BITZ AI Team at info@bitz-itc.com.
8. License
This model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use with proper attribution.
Documentation last updated: October 15, 2025