Named Entity Recognition Model Documentation

1. Model Overview

The ner_distillbert model is a DistilBERT-based Named Entity Recognition system fine-tuned to identify sensitive entities in child helpline conversations. DistilBERT retains about 97% of BERT's language-understanding performance while being 40% smaller and 60% faster, which makes the model well suited to real-time production environments.

Key Features

  • Architecture: DistilBERT (distilled BERT)
  • Domain: Child helpline and protection services
  • Deployment: Available via AI Service API and Hugging Face Hub
  • Repository: openchs/ner_distillbert_v1
  • Special Capabilities: Identifies perpetrators, victims, locations, landmarks, contact information, and incident types in sensitive conversations

Integration in AI Service Pipeline

The NER model is a core component of the BITZ AI Service pipeline:

Translated Text → NER Model → Entity Extraction → Structured Data → Analysis

The model receives English text (either direct input or output from the translation model) and produces structured entity information for downstream processing by the classification and summarization components.
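
As a concrete sketch of that hand-off, the NER stage can be reproduced with the published checkpoint; the surrounding stages are represented here only by comments:

python
from transformers import pipeline

# NER stage of the pipeline; translation happens upstream,
# classification and summarization downstream.
ner = pipeline("ner", model="openchs/ner_distillbert_v1", aggregation_strategy="simple")

translated_text = "John reported an incident at Central Park."  # output of the translation stage
entities = ner(translated_text)

# Structured entity data handed to the classification and summarization components
structured = {"text": translated_text, "entities": entities}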

Supported Entity Types

The model can identify the following entity types:

  • PERPETRATOR - Individuals accused or suspected of committing harmful acts
  • VICTIM - Individuals who have experienced harm or abuse
  • NAME - General person names that don't fall into specific roles
  • LOCATION - Geographic locations (cities, countries, regions)
  • LANDMARK - Specific places or buildings (schools, hospitals, parks)
  • PHONE_NUMBER - Telephone numbers
  • GENDER - Gender identifiers
  • AGE - Age-related information
  • INCIDENT_TYPE - Types of incidents or events reported
  • O - Outside of named entity (non-entity tokens)

2. Integration in AI Service Architecture

The NER model is deeply integrated into the AI service through multiple layers:

2.1. Configuration Layer

The NER model is configured through the central settings system (app/config/settings.py):

python
import os
from typing import Optional

from pydantic import BaseSettings  # pydantic v1; with pydantic v2 use pydantic_settings

class Settings(BaseSettings):
    # Hugging Face configuration
    hf_token: Optional[str] = None  
    ner_hf_repo_id: Optional[str] = "openchs/ner_distillbert_v1"
    hf_ner_model: str = "openchs/ner_distillbert_v1"
    
    # Model paths
    models_path: str = "./models" 
    
    # Fallback configuration
    ner_fallback_model: str = "en_core_web_lg"  # spaCy fallback
    
    # Enable HuggingFace models
    use_hf_models: bool = True

Configuration Helper Methods:

python
def get_ner_model_id(self) -> str:
    """Return the HuggingFace NER model id from configuration"""
    if self.hf_ner_model:
        return self.hf_ner_model
    return self._get_hf_model_id("ner")

def ner_backend(self) -> str:
    """Decide which backend should perform NER"""
    if self.use_hf_models and self.hf_ner_model:
        return "hf"
    return "spacy"  # Fallback to spaCy

Environment Detection:

The system automatically detects whether it's running in Docker or local development and adjusts paths accordingly:

python
def initialize_paths(self):
    """Initialize paths - automatically detect Docker environment"""
    self.docker_container = os.getenv("DOCKER_CONTAINER") is not None or os.path.exists("/.dockerenv")
    
    if self.docker_container:
        self.models_path = "/app/models"
        self.logs_path = "/app/logs"
        self.temp_path = "/app/temp"
    else:
        # Use relative paths for local development
        self.models_path = "./models"

2.2. Model Loading and Initialization

The NER model is loaded through the NERModel class during FastAPI application startup:

python
import logging

import spacy
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

logger = logging.getLogger(__name__)

class NERModel:
    """Named Entity Recognition model with confidence scoring"""
    
    def __init__(self, model_path: str = None):
        from ..config.settings import settings
        
        self.settings = settings
        self.tokenizer = None
        self.model = None
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.loaded = False
        
        # Hugging Face repo support
        self.hf_repo_id = settings.ner_hf_repo_id
        
        # Fallback to spaCy if needed
        self.fallback_model_name = settings.ner_fallback_model
        self.nlp = None  # spaCy model
        
    def load(self) -> bool:
        """Load model from Hugging Face Hub with spaCy fallback"""
        try:
            logger.info(f"Loading NER model from HF Hub: {self.hf_repo_id}")
            
            self.tokenizer = AutoTokenizer.from_pretrained(
                self.hf_repo_id, 
                local_files_only=False
            )
            self.model = AutoModelForTokenClassification.from_pretrained(
                self.hf_repo_id, 
                local_files_only=False
            )
            
            self.model.to(self.device)
            self.loaded = True
            
            logger.info(f"✓ NER model loaded on {self.device}")
            return True
            
        except Exception as e:
            logger.error(f"✗ Failed to load NER model: {e}")
            logger.info(f"Attempting to load fallback spaCy model: {self.fallback_model_name}")
            
            try:
                self.nlp = spacy.load(self.fallback_model_name)
                self.loaded = True
                logger.info(f"✓ Fallback spaCy model loaded successfully")
                return True
            except Exception as spacy_error:
                logger.error(f"✗ Failed to load fallback model: {spacy_error}")
                return False
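
A minimal usage sketch of this class (the sample sentence is illustrative; extract_entities is shown in section 2.4):

python
# Illustrative usage of NERModel
ner_model = NERModel()

if ner_model.load():  # tries the HF Hub first, then the spaCy fallback
    entities = ner_model.extract_entities("The victim was seen near Central Park.")
    for entity in entities:
        print(entity["text"], entity["label"], entity["confidence"])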

Model Loader Integration:

The NER model is managed by the central model_loader which handles:

  • Startup initialization
  • Readiness checks
  • Dependency tracking
  • Health monitoring

python
# During FastAPI startup
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize NER model via model_loader
    model_loader.load_model("ner")
    yield
    # Cleanup on shutdown

2.3. API Endpoints Layer

NER functionality is exposed through FastAPI routes (app/api/endpoints/ner_routes.py):

python
import time
from datetime import datetime
from typing import Dict, List

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
# model_loader is the service's central model loader (app-internal import)

router = APIRouter(prefix="/ner", tags=["ner"])

class NERRequest(BaseModel):
    text: str
    flat: bool = True

class EntityResponse(BaseModel):
    text: str
    label: str
    start: int
    end: int
    confidence: float

class NERResponse(BaseModel):
    entities: List[EntityResponse]
    processing_time: float
    model_info: Dict
    timestamp: str

@router.post("/extract", response_model=NERResponse)
async def extract_entities(request: NERRequest):
    """Extract named entities from text with confidence scores"""
    
    # Check model readiness
    if not model_loader.is_model_ready("ner"):
        raise HTTPException(
            status_code=503,
            detail="NER model not ready. Check /health/models for status."
        )
    
    # Validate input
    if not request.text.strip():
        raise HTTPException(
            status_code=400,
            detail="Text input cannot be empty"
        )
    
    # Get NER model
    ner_model = model_loader.models.get("ner")
    
    # Extract entities
    start_time = time.time()
    entities = ner_model.extract_entities(request.text, flat=request.flat)
    processing_time = time.time() - start_time
    
    return NERResponse(
        entities=entities,
        processing_time=processing_time,
        model_info=ner_model.get_model_info(),
        timestamp=datetime.now().isoformat()
    )

Information Endpoint:

python
@router.get("/info")
async def get_ner_info():
    """Get NER service information"""
    if not model_loader.is_model_ready("ner"):
        return {
            "status": "not_ready",
            "message": "NER model not loaded"
        }
    
    ner_model = model_loader.models.get("ner")
    return {
        "status": "ready",
        "model_info": ner_model.get_model_info(),
        "supported_entities": [
            "PERPETRATOR", "VICTIM", "NAME", "LOCATION", 
            "LANDMARK", "PHONE_NUMBER", "GENDER", "AGE", 
            "INCIDENT_TYPE"
        ]
    }

Demo Endpoint:

python
@router.post("/demo")
async def demo_extraction():
    """Demonstrate NER extraction using sample text"""
    sample_text = "John reported an incident at Central Park. The victim was a 12-year-old girl. Contact number: 555-0123."
    
    if not model_loader.is_model_ready("ner"):
        raise HTTPException(status_code=503, detail="NER model not ready")
    
    ner_model = model_loader.models.get("ner")
    entities = ner_model.extract_entities(sample_text)
    
    return {
        "sample_text": sample_text,
        "entities": entities,
        "model_info": ner_model.get_model_info()
    }

2.4. Entity Extraction Strategy

The NER model couples entity extraction with per-entity confidence scoring:

Why Confidence Scoring is Important:

  • Child protection contexts require high-confidence entity identification
  • False positives can lead to inappropriate alerts or actions
  • Confidence scores enable downstream filtering and prioritization
  • Different entity types may require different confidence thresholds

Extraction Implementation:

python
def extract_entities(self, text: str, flat: bool = True) -> List[Dict]:
    """
    Extract entities with confidence scores
    
    Args:
        text: Input text to analyze
        flat: If True, return flat list; if False, return grouped entities
        
    Returns:
        List of entity dictionaries with text, label, position, and confidence
    """
    if not self.loaded:
        return []
    
    # Use transformer model if available
    if self.model and self.tokenizer:
        return self._extract_with_transformers(text, flat)
    
    # Fallback to spaCy
    if self.nlp:
        return self._extract_with_spacy(text, flat)
    
    return []

def _extract_with_transformers(self, text: str, flat: bool) -> List[Dict]:
    """Extract entities using DistilBERT model"""
    inputs = self.tokenizer(
        text, 
        return_tensors="pt", 
        truncation=True, 
        padding=True
    ).to(self.device)
    
    with torch.no_grad():
        outputs = self.model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_labels = torch.argmax(predictions, dim=-1)
    
    # Extract entities with confidence scores
    entities = []
    tokens = self.tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    
    for idx, (token, label_id, probs) in enumerate(zip(
        tokens, 
        predicted_labels[0], 
        predictions[0]
    )):
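        # Keep only non-'O' predictions; note this assumes label id 0 maps to the 'O' tag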
        if label_id != 0:  # Not 'O' (outside entity)
            entity_label = self.model.config.id2label[label_id.item()]
            confidence = probs[label_id].item()
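            # Note: start/end below are token indices, not character offsets;
            # character positions would require the tokenizer's offset mapping.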
            
            entities.append({
                "text": token,
                "label": entity_label,
                "start": idx,
                "end": idx + 1,
                "confidence": round(confidence, 4)
            })
    
    # Group consecutive entities if flat=False
    if not flat:
        entities = self._group_entities(entities)
    
    return entities

Entity Grouping:

The model can optionally group consecutive tokens into complete entities:

python
def _group_entities(self, entities: List[Dict]) -> List[Dict]:
    """Group consecutive B- and I- tagged entities"""
    grouped = []
    current_entity = None
    
    for entity in entities:
        label = entity["label"]
        
        # Handle BIO tagging
        if label.startswith("B-"):
            # Start new entity
            if current_entity:
                grouped.append(current_entity)
            current_entity = entity.copy()
            current_entity["label"] = label[2:]  # Remove B- prefix
            
        elif label.startswith("I-") and current_entity:
            # Continue current entity
            current_entity["text"] += " " + entity["text"]
            current_entity["end"] = entity["end"]
            # Running pairwise average (later tokens weigh more than earlier ones)
            current_entity["confidence"] = (
                current_entity["confidence"] + entity["confidence"]
            ) / 2
        else:
            # Single token entity or O tag
            if current_entity:
                grouped.append(current_entity)
                current_entity = None
    
    if current_entity:
        grouped.append(current_entity)
    
    return grouped
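
For example, two B-/I- tagged tokens merge into one span (note that the branch above silently drops labels without a B- or I- prefix, so the model's label set is assumed to follow BIO tagging):

python
# Illustrative input/output for _group_entities
tokens = [
    {"text": "Central", "label": "B-LANDMARK", "start": 5, "end": 6, "confidence": 0.90},
    {"text": "Park",    "label": "I-LANDMARK", "start": 6, "end": 7, "confidence": 0.86},
]
# _group_entities(tokens) returns:
# [{"text": "Central Park", "label": "LANDMARK", "start": 5, "end": 7, "confidence": 0.88}]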

2.5. Health Monitoring

The NER model integrates with the AI service health monitoring system (app/api/endpoints/health_routes.py):

Model Status Endpoint:

python
@router.get("/health/models")
async def models_health():
    """Get detailed model status with dependency info"""
    model_status = model_loader.get_model_status()
    ready_models = model_loader.get_ready_models()
    
    return {
        "timestamp": datetime.now().isoformat(),
        "summary": {
            "total": len(model_status),
            "ready": len(ready_models)
        },
        "ready_models": ready_models,
        "details": model_status
    }

Model Readiness States:

  • Ready: Model loaded and available for entity extraction
  • Implementable: Model can be loaded but not yet initialized
  • Blocked: Missing dependencies preventing model loading

Health Check Example Response:

json
{
  "ready_models": ["ner", "translator", "classifier"],
  "details": {
    "ner": {
      "loaded": true,
      "device": "cuda",
      "hf_repo_id": "openchs/ner_distillbert_v1",
      "fallback_available": true,
      "fallback_model": "en_core_web_lg",
      "supported_entities": 9
    }
  }
}

2.6. Pipeline Integration

The NER model integrates into two processing modes:

Real-time Processing

For live calls, NER works progressively on translated text:

python
# Progressive NER during active call
@router.get("/{call_id}/entities")
async def get_call_entities(call_id: str):
    """Get cumulative entities for active call"""
    analysis = progressive_processor.get_call_analysis(call_id)
    
    return {
        "call_id": call_id,
        "latest_entities": analysis.latest_entities,
        "entity_windows": len([w for w in analysis.windows if w.entities]),
        "high_confidence_entities": [
            e for e in analysis.latest_entities 
            if e["confidence"] > 0.85
        ]
    }

Real-time Flow:

  1. Audio stream → Transcription → Translation (English)
  2. Translated text → NER extraction
  3. Entities (with confidence) → Classification
  4. High-priority entities → Agent alerts

Post-call Processing

For completed calls, full pipeline execution:

Complete Transcript → Translation → NER → Entity Extraction
→ Classification → Summarization → Structured Report
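
At the NER step, post-call processing can reuse the same /ner/extract endpoint on the completed transcript; a minimal client sketch (the base URL is a placeholder):

python
import requests

BASE_URL = "https://your-api-domain.com"  # placeholder

transcript = "Complete translated transcript of the call..."
response = requests.post(
    f"{BASE_URL}/ner/extract",
    json={"text": transcript, "flat": False},  # grouped entities for reporting
    timeout=30,
)
response.raise_for_status()

entities = response.json()["entities"]
# `entities` then feeds the classification and summarization stages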

Configuration for Pipeline Modes:

python
# Settings control pipeline behavior
realtime_enable_ner: bool = True
postcall_enable_full_pipeline: bool = True
ner_confidence_threshold: float = 0.7  # Filter low-confidence entities

2.7. Memory Management

The NER model implements automatic GPU memory management:

python
def _cleanup_memory(self):
    """Clean up GPU memory after extraction"""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()

# Cleanup occurs:
# - After each extraction request
# - During batch processing
# - Before fallback attempts

3. Using the Model

3.1. Via AI Service API (Production Use)

The NER model is deployed as part of the AI Service and accessible via REST API.

Endpoint

POST /ner/extract

Request Format

Headers:

Content-Type: application/json

Request Body:

json
{
  "text": "string",
  "flat": true
}

Parameters:

  • text (required, string): The input text to analyze for named entities
  • flat (optional, boolean): Controls output format. If true, returns individual tokens; if false, groups consecutive entities. Default: true

Response Format

Success Response (200):

json
{
  "entities": [
    {
      "text": "string",
      "label": "string",
      "start": 0,
      "end": 0,
      "confidence": 0.95
    }
  ],
  "processing_time": 0,
  "model_info": {
    "model_path": "string",
    "hf_repo_id": "openchs/ner_distillbert_v1",
    "device": "cuda",
    "loaded": true,
    "fallback_model_name": "en_core_web_lg",
    "use_hf": true
  },
  "timestamp": "string"
}

Validation Error (422):

json
{
  "detail": [
    {
      "loc": ["string"],
      "msg": "string",
      "type": "string"
    }
  ]
}

Example cURL Request

Basic Entity Extraction:

bash
curl -X POST "https://your-api-domain.com/ner/extract" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John reported an incident at Central Park. The victim was a 12-year-old girl. Contact number: 555-0123."
  }'

Response:

json
{
  "entities": [
    {
      "text": "John",
      "label": "NAME",
      "start": 0,
      "end": 4,
      "confidence": 0.92
    },
    {
      "text": "Central Park",
      "label": "LANDMARK",
      "start": 30,
      "end": 42,
      "confidence": 0.88
    },
    {
      "text": "12-year-old",
      "label": "AGE",
      "start": 61,
      "end": 72,
      "confidence": 0.85
    },
    {
      "text": "girl",
      "label": "GENDER",
      "start": 73,
      "end": 77,
      "confidence": 0.79
    },
    {
      "text": "555-0123",
      "label": "PHONE_NUMBER",
      "start": 95,
      "end": 103,
      "confidence": 0.94
    }
  ],
  "processing_time": 0.156,
  "model_info": {
    "hf_repo_id": "openchs/ner_distillbert_v1",
    "device": "cuda",
    "loaded": true
  },
  "timestamp": "2025-10-17T15:45:30.123456"
}

Grouped Entity Extraction:

bash
curl -X POST "https://your-api-domain.com/ner/extract" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Maria Garcia lives in New York City and works at Children Hospital.",
    "flat": false
  }'
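
With "flat": false, consecutive tokens are merged into complete entities. An illustrative (not verbatim) response for the request above:

json
{
  "entities": [
    {
      "text": "Maria Garcia",
      "label": "NAME",
      "start": 0,
      "end": 12,
      "confidence": 0.91
    },
    {
      "text": "New York City",
      "label": "LOCATION",
      "start": 22,
      "end": 35,
      "confidence": 0.89
    },
    {
      "text": "Children Hospital",
      "label": "LANDMARK",
      "start": 49,
      "end": 66,
      "confidence": 0.84
    }
  ],
  "processing_time": 0.18
}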

Getting NER Info

Endpoint:

GET /ner/info

Response:

json
{
  "status": "ready",
  "model_info": {
    "hf_repo_id": "openchs/ner_distillbert_v1",
    "device": "cuda",
    "loaded": true,
    "fallback_available": true
  },
  "supported_entities": [
    "PERPETRATOR",
    "VICTIM",
    "NAME",
    "LOCATION",
    "LANDMARK",
    "PHONE_NUMBER",
    "GENDER",
    "AGE",
    "INCIDENT_TYPE"
  ]
}

Demo Endpoint

Endpoint:

POST /ner/demo

Response:

json
{
  "sample_text": "John reported an incident at Central Park...",
  "entities": [...],
  "model_info": {...}
}

Features

  • Confidence Scoring: Every entity includes a confidence score (0-1)
  • Flexible Output: Flat or grouped entity formats
  • Fallback Support: Automatic fallback to spaCy if transformer model unavailable
  • Real-time Processing: Optimized for low-latency extraction

3.2. Via Hugging Face Hub

The model is publicly available on Hugging Face for direct inference and fine-tuning.

Model Repository

The model is hosted on the Hugging Face Hub at https://huggingface.co/openchs/ner_distillbert_v1.

Installation

bash
pip install transformers torch

Inference Example

Using Pipeline (Recommended):

python
from transformers import pipeline

# Load the NER pipeline
ner = pipeline(
    "ner",
    model="openchs/ner_distillbert_v1",
    aggregation_strategy="simple"
)

# Extract entities
text = "John reported an incident at Central Park. The victim was a 12-year-old girl."
entities = ner(text)

for entity in entities:
    print(f"{entity['word']} -> {entity['entity_group']} (confidence: {entity['score']:.2f})")

Using Model and Tokenizer Directly:

python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "openchs/ner_distillbert_v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Prepare input
text = "Maria lives in Nairobi and called 0712345678 for help."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[p.item()] for p in predictions[0]]

# Display results
for token, label in zip(tokens, predicted_labels):
    if label != "O":
        print(f"{token} -> {label}")

Batch Processing

python
from transformers import pipeline

ner = pipeline("ner", model="openchs/ner_distillbert_v1")

texts = [
    "John reported abuse at school.",
    "The victim is a 10-year-old boy from Nairobi.",
    "Contact emergency services at 999."
]

results = ner(texts)

for text, entities in zip(texts, results):
    print(f"\nText: {text}")
    for entity in entities:
        print(f"  {entity['word']} -> {entity['entity']}")

Note: When using the model directly from Hugging Face, you'll need to implement your own entity grouping and post-processing logic. The AI Service API handles this automatically with the flat parameter.
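
As a minimal sketch of such post-processing, assuming the checkpoint ships a fast tokenizer (so return_offsets_mapping is available) and BIO-style labels:

python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "openchs/ner_distillbert_v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Maria lives in Nairobi and called 0712345678 for help."
inputs = tokenizer(text, return_tensors="pt", return_offsets_mapping=True, truncation=True)
offsets = inputs.pop("offset_mapping")[0]  # per-token (start, end) character offsets

with torch.no_grad():
    logits = model(**inputs).logits
labels = [model.config.id2label[p.item()] for p in logits.argmax(dim=-1)[0]]

# Merge consecutive B-/I- tokens into (label, start, end) character spans
spans, current = [], None
for (start, end), label in zip(offsets.tolist(), labels):
    if start == end:                     # special tokens like [CLS]/[SEP]
        continue
    if label.startswith("B-"):
        if current:
            spans.append(current)
        current = [label[2:], start, end]
    elif label.startswith("I-") and current and label[2:] == current[0]:
        current[2] = end                 # extend the current span
    else:
        if current:
            spans.append(current)
        current = None
if current:
    spans.append(current)

for label, start, end in spans:
    print(f"{text[start:end]} -> {label}")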


4. Production Considerations

Confidence Thresholds

  • High Priority Entities (PERPETRATOR, VICTIM): Recommend threshold ≥ 0.8
  • Contact Information (PHONE_NUMBER): Recommend threshold ≥ 0.85
  • General Entities (NAME, LOCATION): Recommend threshold ≥ 0.7
  • Contextual Entities (AGE, GENDER): Recommend threshold ≥ 0.6
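
These recommendations translate directly into a post-extraction filter; a sketch, with the entity-dict shape matching the /ner/extract response:

python
# Per-type confidence thresholds, mirroring the recommendations above
THRESHOLDS = {
    "PERPETRATOR": 0.8, "VICTIM": 0.8,
    "PHONE_NUMBER": 0.85,
    "NAME": 0.7, "LOCATION": 0.7,
    "AGE": 0.6, "GENDER": 0.6,
}
DEFAULT_THRESHOLD = 0.7

def filter_entities(entities):
    """Keep only entities that meet their type-specific confidence threshold."""
    return [
        e for e in entities
        if e["confidence"] >= THRESHOLDS.get(e["label"], DEFAULT_THRESHOLD)
    ]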

Processing Time

  • Short texts (< 50 tokens): ~0.1-0.3 seconds
  • Medium texts (50-200 tokens): ~0.3-0.8 seconds
  • Long texts (> 200 tokens): ~0.8-1.5 seconds

Processing times vary based on:

  • GPU availability (CUDA vs CPU)
  • Text length and entity density
  • Flat vs grouped output format
  • System load

Entity Extraction Strategy

For Sensitive Content:

  1. Initial Extraction: Extract all entities with confidence scores
  2. Filtering: Apply domain-specific confidence thresholds
  3. Validation: Cross-reference high-priority entities (PERPETRATOR, VICTIM)
  4. Alerting: Trigger alerts only for high-confidence critical entities
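
A condensed sketch of steps 2-4, reusing the filter_entities helper sketched in the Confidence Thresholds section above:

python
CRITICAL_LABELS = {"PERPETRATOR", "VICTIM"}
ALERT_THRESHOLD = 0.8

def triage_entities(entities):
    """Split filtered entities into alert-worthy items and those needing human review."""
    alerts, review = [], []
    for entity in filter_entities(entities):  # thresholds from the sketch above
        if entity["label"] in CRITICAL_LABELS:
            if entity["confidence"] >= ALERT_THRESHOLD:
                alerts.append(entity)         # step 4: trigger an alert
            else:
                review.append(entity)         # route to human validation
    return alerts, review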

For General Content:

  1. Use lower confidence thresholds
  2. Accept more entity types
  3. Apply post-processing validation

Error Handling

Common Error Scenarios:

  1. Empty Input:

json
{
  "detail": [
    {
      "loc": ["body", "text"],
      "msg": "Text input cannot be empty",
      "type": "value_error"
    }
  ]
}

  2. Model Not Ready:

json
{
  "detail": "NER model not ready. Check /health/models for status."
}

  3. Service Unavailable:

json
{
  "detail": "NER model not loaded"
}

Health Checks

Monitor NER service health:

bash
# Check if NER is ready
curl -X GET "https://your-api-domain.com/health/models"

# Get detailed system status
curl -X GET "https://your-api-domain.com/health/detailed"

# Test with demo endpoint
curl -X POST "https://your-api-domain.com/ner/demo"

5. Model Limitations

Domain Specificity

  • Optimized for: Child protection conversations, abuse reporting, helpline transcripts
  • May require adaptation for: General NER tasks, formal documents, non-English text
  • Performance varies on: Out-of-distribution data, highly technical terminology

Technical Constraints

  • Maximum Context: 512 tokens (DistilBERT limit)
  • Memory Requirements: GPU recommended for production (CPU fallback available)
  • Processing Speed: Dependent on hardware and text complexity

Known Considerations

  • Entity Ambiguity: Some names may be misclassified (e.g., NAME vs PERPETRATOR)
  • Context Dependency: Entity types depend on surrounding context
  • Overlapping Entities: Model may struggle with entities spanning multiple categories
  • Code-Switching: Mixed-language text may reduce accuracy

Recommendations

  • Monitor confidence scores and adjust thresholds based on your use case
  • Use health check endpoints to verify model readiness before critical operations
  • Implement entity validation for high-stakes scenarios (e.g., perpetrator identification)
  • Consider human review for entities with confidence scores below 0.8 in sensitive contexts
  • Review extractions for false positives in PERPETRATOR and VICTIM categories

6. Citation

If you use this model in your research or application, please cite:

bibtex
@misc{ner_distillbert_1,
  title={ner_distillbert},
  author={BITZ-AI TEAM},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/openchs/ner_distillbert_v1}
}

7. Support and Contact

Issues and Questions

For issues and questions, contact the BITZ AI Team at info@bitz-itc.com.

Contributing

For dataset contributions, model improvements, or bug reports, please contact the BITZ AI Team at info@bitz-itc.com.


8. License

This model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use with proper attribution.


Documentation last updated: October 17, 2025