1) Scope and Relation to Previous Work
This project extends Zero-Dose Prediction Using ML from a single-outcome setup to a unified three-outcome framework:
- ZD (zero dose)
- BCG-to-MCV dropout
- DPT1-to-DPT3 dropout
Only fresh outputs from 202_final/retrain_outputs/*_fresh are used.
2) Data Selection and Sample Definition
The analytical framing follows the NFHS-style 12-23 month cohort logic established in the earlier single-outcome work.
Sample Filtering (Notation)
The working sample is S2, obtained after complete-case filtering on the model variables.
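A minimal sketch of the complete-case rule that defines S2: any record missing a value on a model variable is dropped. The variable names below are hypothetical, not the project's actual schema.

```python
MODEL_VARS = ["age_months", "wealth_quintile", "maternal_edu"]   # hypothetical names

def complete_case(records, variables=MODEL_VARS):
    """Keep only records with no missing value on any model variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

s2 = complete_case([
    {"age_months": 14, "wealth_quintile": 2, "maternal_edu": 1},
    {"age_months": 18, "wealth_quintile": None, "maternal_edu": 0},   # dropped
])
```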
Data Sources
3) Predictor Definitions
Shared feature vector for each record:
Binary/categorical encodings follow the operational definitions used in retraining scripts.
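As an illustration of the kind of encoding involved (the actual operational definitions live in the retraining scripts), a categorical predictor can be expanded into indicator columns:

```python
def one_hot(value, categories):
    """Illustrative one-hot encoding; the real mappings live in the retrain scripts."""
    return [1 if value == c else 0 for c in categories]

encoded = one_hot("rural", ["urban", "rural"])   # hypothetical residence variable
```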
4) Regional Partition
Let R = {india, north, east, west, south, northeast}.
Each outcome o is trained and evaluated separately in each region r.
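The outcome-by-region design implies 3 x 6 = 18 independent model cells. A sketch of the driving loop, with hypothetical outcome keys:

```python
REGIONS = ["india", "north", "east", "west", "south", "northeast"]
OUTCOMES = ["zd", "bcg_mcv_dropout", "dpt1_dpt3_dropout"]   # hypothetical keys

cells = {}
for outcome in OUTCOMES:
    for region in REGIONS:
        # placeholder for the full per-cell train/eval pipeline
        cells[(outcome, region)] = f"fit({outcome}, {region})"
```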
5) Train/Eval Protocol
Per outcome and region, splits are generated using stratified shuffling:
Class proportions are preserved in each split to stabilize minority-class evaluation.
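A self-contained sketch of stratified shuffling: indices are shuffled within each class and the test fraction is drawn per class, so class proportions carry over to both splits. The 20% test fraction and seed are illustrative defaults.

```python
import random

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 80 + [1] * 20          # 20% minority class
train_idx, test_idx = stratified_split(labels)
```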
Scaling
Continuous predictors are standardized on train statistics:
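The key point is that the mean and standard deviation come from the training split only and are then applied to held-out data; a minimal sketch:

```python
def standardize(train, held_out):
    """z = (x - mu_train) / sd_train, applied to both splits with train statistics."""
    mu = sum(train) / len(train)
    sd = (sum((x - mu) ** 2 for x in train) / len(train)) ** 0.5 or 1.0
    return [(x - mu) / sd for x in train], [(x - mu) / sd for x in held_out]

train_z, eval_z = standardize([10.0, 20.0, 30.0], [25.0])
```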
6) Logistic Regression Baseline
LR with L1 regularization and balanced class weighting:
with class-balance weights:
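Under scikit-learn's standard `class_weight="balanced"` convention, which the retraining scripts presumably use, each class is weighted by w_c = n_samples / (n_classes * n_c). A sketch of that weighting:

```python
from collections import Counter

def balanced_weights(labels):
    """scikit-learn-style 'balanced' weights: w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

weights = balanced_weights([0] * 90 + [1] * 10)   # 10% minority class
```

With a 90/10 class split, the minority class receives weight 5.0 and the majority class roughly 0.56, so both classes contribute equally to the weighted loss.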
7) Neural Network Comparator
Tuned MLP classifier:
The hyperparameter search spans depth, width, regularization, learning rate, batch size, and training iterations.
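A sketch of how such a search space can be enumerated; the grid values below are hypothetical, not the project's actual search ranges.

```python
import itertools

# Hypothetical search space mirroring the dimensions named above.
GRID = {
    "hidden_layer_sizes": [(64,), (128, 64), (256, 128, 64)],   # depth and width
    "alpha": [1e-4, 1e-3, 1e-2],                                # L2 regularization
    "learning_rate_init": [1e-3, 1e-2],
    "batch_size": [64, 256],
    "max_iter": [200, 500],
}

candidates = [dict(zip(GRID, vals)) for vals in itertools.product(*GRID.values())]
```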
8) Class-1-Focused Threshold Optimization
For each trained model candidate, predicted probabilities are converted to class labels using a threshold tau:
Class-1 objective:
This ensures optimization is aligned to missed-child detection rather than majority-class accuracy.
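A sketch of threshold tuning under the assumption that the class-1 objective is F1 on the positive class (the exact objective follows the definition above):

```python
def class1_f1(y_true, y_prob, tau):
    """F1 for the positive (missed-child) class at threshold tau."""
    y_hat = [1 if p >= tau else 0 for p in y_prob]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(y_true, y_prob):
    """Grid-search tau to maximize class-1 F1 on validation data."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: class1_f1(y_true, y_prob, t))
```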
9) NN Candidate Selection Score
NN candidates are ranked with minority-class emphasis and LR comparison:
The LR-margin bonus prioritizes NN candidates that genuinely exceed LR on class-1 performance.
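The exact scoring formula is not reproduced here; one plausible form consistent with the description is minority-class performance plus a bonus proportional to the positive margin over the LR baseline. The margin weight is an assumed illustrative value.

```python
def nn_selection_score(nn_f1_class1, lr_f1_class1, margin_weight=0.5):
    """Assumed form: class-1 F1 plus a bonus for genuinely exceeding the LR baseline."""
    margin = max(0.0, nn_f1_class1 - lr_f1_class1)
    return nn_f1_class1 + margin_weight * margin
```

Under this form, an NN that beats LR on class 1 (0.60 vs 0.50) scores 0.65, while one that trails LR scores only its own class-1 F1 with no bonus.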
10) Evaluation Metrics
11) National vs Regional Aggregation
Split-level metric:
Region summary:
National summary over regions (macro):
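The two-level aggregation can be sketched directly: region summaries are means over splits, and the national macro summary is the unweighted mean over region summaries.

```python
def mean(xs):
    return sum(xs) / len(xs)

def aggregate(split_metrics):
    """Region summary = mean over splits; national = macro mean over regions."""
    region_summary = {r: mean(ms) for r, ms in split_metrics.items()}
    national = mean(list(region_summary.values()))
    return region_summary, national
```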
12) NN Feature Importance Mathematics
Permutation AP-drop for feature j:
where AP is average precision on held-out data and X_perm(j) shuffles feature j to destroy its signal.
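A self-contained sketch of the permutation AP-drop, with a pure-Python average precision and a stand-in scoring function in place of the trained NN:

```python
import random

def average_precision(y_true, scores):
    """AP = mean of precision evaluated at each true positive, scores descending."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    total = 0.0
    for i in order:
        if y_true[i]:
            tp += 1
            total += tp / (tp + fp)
        else:
            fp += 1
    return total / sum(y_true)

def permutation_ap_drop(model, X, y, j, seed=0):
    """Delta AP_j: baseline AP minus AP after shuffling feature column j."""
    base = average_precision(y, [model(x) for x in X])
    col = [x[j] for x in X]
    random.Random(seed).shuffle(col)
    X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
    return base - average_precision(y, [model(x) for x in X_perm])

X = [[float(i), 0.0] for i in range(10)]    # column 0 predictive, column 1 constant
y = [1 if i >= 5 else 0 for i in range(10)]
model = lambda x: x[0]                      # stand-in for a trained NN's score
```

Shuffling the constant column leaves AP unchanged (drop of zero), while shuffling the predictive column can only reduce AP from its baseline.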
Cross-Outcome Alignment
13) Required Comparison Views and Generated Charts
- National LR vs NN comparison for all outcomes.
- Inter-regional NN feature-importance heatmaps.
- Region-wise cross-outcome NN importance comparison.
- National cross-outcome NN importance comparison.
- Outcome-similarity by region.
14) Feature-Importance Story Pack Outputs
Generated under retrain_outputs/feature_importance_story/:
15) Reproducibility and Change Log
New/Updated Scripts (2026-02-25)
- 202_final/retrain_zd.py
- 202_final/retrain_mcv.py
- 202_final/retrain_dpt.py
- 202_final/retrain_core.py
- 202_final/build_cross_outcome_visuals.py
- 202_final/build_feature_importance_story.py
Run Commands
This page is intentionally math-explicit so the project logic is transparent from data definition through modeling, metric design, and interpretation.