1) What We Did
- Built zero-dose risk models using 10 predictors.
- Trained Logistic Regression (L1, class-weighted) and Neural Network (focal loss + class weights).
- Modeled regions separately: North, South, East, West, Northeast.
- Used stratified splits and PR-curve thresholding for the minority class.
- Generated feature importance using LR coefficients, LOFO, and permutation analysis.
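The stratified split noted in the list above can be sketched as follows. This is a minimal standard-library illustration, not the project's pipeline: the `stratified_split` helper, the 20% test fraction, and the seed are assumptions, and the labels are rebuilt from the South row of the prevalence table only to show that the split keeps the class-1 share close to the regional rate.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Labels rebuilt from the South row of the prevalence table: 268 zero-dose of 5662.
labels = [1] * 268 + [0] * (5662 - 268)
train_idx, test_idx = stratified_split(labels)  # 20% test share is an assumption

train_prev = sum(labels[i] for i in train_idx) / len(train_idx)
test_prev = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"train prevalence={train_prev:.4f}  test prevalence={test_prev:.4f}")
```

Splitting within each class separately is what keeps the rare zero-dose class from being under-represented in the test fold by chance.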
Operational Definition
Zero-dose means a child has not received the first dose of a DTP-containing vaccine (no-DTP1; IA2030-aligned proxy).
2) Predictor Set
Outcome:
3) Class Prevalence by Region
| Region | ZD=1 / Total | Prevalence |
|---|---|---|
| North | 1015 / 14303 | 7.10% |
| South | 268 / 5662 | 4.73% |
| East | 515 / 10069 | 5.11% |
| West | 463 / 6883 | 6.73% |
| Northeast | 715 / 6134 | 11.66% |
| Overall | 2976 / 43051 | 6.91% |
4) Logistic Baseline
L1-regularized Logistic Regression with class_weight='balanced' (minority up-weighting).
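To make the minority up-weighting concrete, the sketch below computes the weights that the 'balanced' setting would produce, assuming it follows the scikit-learn convention `n_samples / (n_classes * class_count)`. The `balanced_class_weights` helper is hypothetical, and the counts are the full-region totals from the prevalence table; weights computed on the actual training folds would differ slightly.

```python
def balanced_class_weights(n_negative, n_positive):
    """'balanced' heuristic: n_samples / (n_classes * class_count)."""
    n_samples = n_negative + n_positive
    n_classes = 2
    return {
        0: n_samples / (n_classes * n_negative),
        1: n_samples / (n_classes * n_positive),
    }

# (zero-dose count, total) per region, taken from the prevalence table.
region_counts = {
    "North": (1015, 14303),
    "South": (268, 5662),
    "East": (515, 10069),
    "West": (463, 6883),
    "Northeast": (715, 6134),
}

region_weights = {
    region: balanced_class_weights(total - zd, zd)
    for region, (zd, total) in region_counts.items()
}

for region, w in region_weights.items():
    print(f"{region:<10} w0={w[0]:.3f}  w1={w[1]:.3f}  ratio={w[1] / w[0]:.1f}")
```

Under this heuristic each class contributes the same total weight to the loss, so the lower a region's prevalence, the larger the weight placed on each zero-dose example.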
5) Neural Comparator
Training uses focal-loss emphasis and class weights to prioritize hard minority examples.
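As an illustration of how focal loss and class weights combine, here is a minimal binary focal loss in plain Python. It is a sketch under assumptions, not the network's training code: the focusing parameter `gamma=2.0` and the example weights `(1.0, 5.0)` are illustrative values that the text above does not specify.

```python
import math

def focal_loss(y_true, p_pred, gamma=2.0, class_weights=(1.0, 1.0), eps=1e-7):
    """Mean binary focal loss: -w_y * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)      # keep log() finite
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        total += -class_weights[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)

# One zero-dose child scored confidently (easy) vs. scored poorly (hard).
easy = focal_loss([1], [0.9])
hard = focal_loss([1], [0.1])
hard_weighted = focal_loss([1], [0.1], class_weights=(1.0, 5.0))  # assumed weights

print(f"easy={easy:.5f}  hard={hard:.5f}  hard with class weight={hard_weighted:.5f}")
```

With `gamma=0` the expression reduces to class-weighted cross-entropy; raising `gamma` shrinks the contribution of well-classified examples so that hard minority cases dominate the gradient.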
6) Threshold Selection and Validation
The threshold is chosen on the training PR curve by maximizing an F-beta objective, then applied unchanged to the held-out test sets.
Implemented betas: LR about 2.0, NN about 1.85. Both exceed 1, so class-1 recall is weighted above precision.
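The threshold search can be sketched as below. This is a small standard-library version that assumes every distinct training score is tried as a candidate threshold; the `best_threshold` helper and the toy scores are hypothetical and only illustrate the mechanics.

```python
def fbeta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall above precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def best_threshold(y_true, scores, beta):
    """Return (threshold, F-beta) maximizing F-beta over the observed scores."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = fbeta(precision, recall, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy training scores (synthetic, for illustration only).
y_train = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
s_train = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.80, 0.90]

t_recall, f_recall = best_threshold(y_train, s_train, beta=2.0)
t_precision, f_precision = best_threshold(y_train, s_train, beta=0.5)

# The threshold is frozen here and reused on held-out scores.
s_test = [0.15, 0.37, 0.70]
flags = [int(s >= t_recall) for s in s_test]
print(t_recall, t_precision, flags)
```

On this toy data the recall-weighted objective settles on a lower threshold than the precision-weighted one, which is the behaviour a beta near 2 is meant to produce for the minority class.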
7) Metrics Driving Model Choice
Primary emphasis: minority class precision-recall tradeoff, not plain accuracy.
8) Explainability
Global explainability:
- LR coefficient direction/magnitude.
- LOFO importance.
- Permutation importance.
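Of the three methods listed, permutation importance is the simplest to sketch without retraining: shuffle one feature column, re-score, and record how much the metric drops. The version below is a hedged standard-library illustration; the `permutation_importance` helper, the toy data, the class-1 recall metric, and the stand-in `predict` rule are all assumptions, not the project's models.

```python
import random

def recall_class1(y_true, y_pred):
    """Share of true class-1 rows that were predicted as class 1."""
    positives = sum(y_true)
    hits = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    return hits / positives

def permutation_importance(predict, X, y, metric, n_repeats=20, seed=0):
    """Mean metric drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic rows: column 0 stands in for Rural, column 1 for Deprived.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [row[0] for row in X]          # outcome driven entirely by column 0

def predict(row):                  # stand-in model that reads only column 0
    return row[0]

baseline, importances = permutation_importance(predict, X, y, recall_class1)
print(baseline, importances)
```

A feature the model never reads gets an importance of exactly zero, while shuffling a feature the model depends on lowers recall. LOFO differs in that it retrains the model with the feature removed instead of shuffling it.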
9) Region-Specific Generalization and Practical Insights
Separate regional models prevent a single pooled model from masking heterogeneity. Stable high-signal factors include Rural, Deprived, Maternal_Illiteracy, NoAntenatalCare, and UnassistedBirth.
Failure modes where class-1 support is low include threshold instability and precision loss.
10) Current Gaps to Acknowledge
- No full formal fairness audit yet (e.g., equal opportunity gaps across protected subgroups).
- Need ongoing calibration and drift monitoring for repeated deployment.
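For the fairness item above, an equal opportunity gap is the spread in class-1 true positive rate across subgroups. The sketch below shows one way such a check could be computed; the `equal_opportunity_gap` helper, the group labels, and the toy predictions are hypothetical and do not represent audit results.

```python
def equal_opportunity_gap(y_true, y_pred, groups):
    """Per-group true positive rate and the max-min gap between groups."""
    tp, pos = {}, {}
    for y, y_hat, g in zip(y_true, y_pred, groups):
        if y == 1:
            pos[g] = pos.get(g, 0) + 1
            tp[g] = tp.get(g, 0) + (1 if y_hat == 1 else 0)
    tpr = {g: tp[g] / pos[g] for g in pos}
    return tpr, max(tpr.values()) - min(tpr.values())

# Synthetic example with two subgroups, A and B.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

tpr_by_group, gap = equal_opportunity_gap(y_true, y_pred, groups)
print(tpr_by_group, gap)
```

A large gap would mean zero-dose children in one subgroup are missed more often than in another at the same threshold. Groups with very few class-1 cases give noisy rates, so the gap is best read alongside the per-group support.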
Drift Monitoring
Track prevalence drift, feature drift, PR/F1 drift, calibration drift, and subgroup gap drift.
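One possible way to quantify prevalence or feature drift is the population stability index (PSI), which compares category shares between a baseline and a later sample. PSI is a technique swapped in here for illustration and is not stated as the project's method. The `psi` helper and the later-wave counts are hypothetical; the baseline counts are the overall row of the prevalence table.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index over matching categories or bins."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)   # floor avoids log(0) for empty bins
        a_share = max(a / a_total, eps)
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value

# Baseline outcome counts [not zero-dose, zero-dose] from the overall table row.
baseline_outcome = [43051 - 2976, 2976]
later_outcome = [9300, 700]               # hypothetical later survey wave

no_drift = psi(baseline_outcome, baseline_outcome)
prevalence_drift = psi(baseline_outcome, later_outcome)
print(f"no drift={no_drift:.4f}  prevalence drift={prevalence_drift:.6f}")
```

The same function applies to any binned predictor, so one helper could cover both prevalence drift and feature drift; PR/F1, calibration, and subgroup gap drift need labelled outcomes from the later period.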
11) Intervention Workflow After High-Risk Flag
- Generate district/block high-risk list.
- ASHA/ANM verification (card + recall).
- Household outreach and catch-up scheduling.
- Reminder/defaulter follow-up.
- Closure logging and feedback loop to model retraining.
12) Why This Is Operationally Useful
The model identifies children who never entered the immunization pathway, allowing programs to prioritize first-contact outreach before schedule-completion interventions.
Extension project: this work is expanded in Insights into NN Across National and Regional Immunisation Outcomes.