
Project 04 · Completed · Geospatial Vaccine Modelling

Geo-Spatial Modelling of Vaccine Coverage

Cluster-level NFHS-4 and NFHS-5 Bayesian geospatial pipeline with INLA-SPDE and spGLM, including an explicit maternal-recall sensitivity analysis (MR=0 vs MR=1), uncertainty mapping, and dropout-focused planning outputs.

NFHS-4 / NFHS-5 INLA-SPDE spGLM MR Sensitivity

1) Project Recap

This pipeline trains on DHS/NFHS cluster points and predicts onto raster grids to produce hyper-local coverage surfaces, uncertainty layers, dropout maps, and state-level evaluation summaries.

Primary modeling targets include BCG, DPT, and MCV dose outcomes across NFHS-4 (2015-2016) and NFHS-5 (2019-2021), with direct comparison between card-only and card+maternal-recall scenarios.

2) Interactive Mesh and Sensitivity Lab

Use this panel to interactively vary state, mesh granularity, model family, and maternal-recall regime. The mesh preview and metrics update in real time to show how uncertainty and fit move with each design choice.

The 0/1 vaccination point layer is generated from the outputs of the DHS children's recode (KR) pipeline (summary export in data/cluster-vax-summary.json).

State boundaries are loaded from GeoJSON boundary files in data/india-states/.

  • DHS children's recode input: KidsRecode/IAKR7DFL.SAV, filtered to living children aged 12-23 months.
  • Per-child 0/1 vaccination status is coded as vaccinated by card or maternal recall (the MR=1 definition) for each vaccine indicator.
  • Cluster GPS coordinates are merged by matching DHSCLUST in ShapeFile/IAGE7AFL.shp to ClusterID, and the result is exported to Cluster_data.csv.
  • This page combines the generated NFHS4_IndividualData.csv / NFHS5_IndividualData.csv with Cluster_data.csv to build cluster-level 0/1 summaries.
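A minimal sketch of the per-child coding step, assuming the children's records have already been read from the .SAV file. The field names (cluster, alive, age_months, mcv1) are hypothetical, and the vaccine codes assume the usual DHS recode convention (0 = no, 1 = date on card, 2 = reported by mother, 3 = marked on card, 8 = don't know), which should be checked against the recode manual for these files. The mr argument switches between card-only (MR=0) and card-or-recall (MR=1) coding.

```python
# Sketch: per-child 0/1 vaccination coding under the two maternal-recall regimes.
# Field names are hypothetical; codes assume the usual DHS convention:
# 0 = no, 1 = date on card, 2 = reported by mother, 3 = marked on card, 8 = don't know.
CARD_CODES = {1, 3}
RECALL_CODES = {2}


def eligible(child: dict) -> bool:
    """Keep living children aged 12-23 months (inclusive)."""
    return child["alive"] == 1 and 12 <= child["age_months"] <= 23


def vaccinated(code: int, mr: int) -> int:
    """1 if the code counts as vaccinated: card only for MR=0, card or recall for MR=1.

    "Don't know" (8) and any other code fall outside both sets and are coded 0
    here, which is an assumption.
    """
    accepted = (CARD_CODES | RECALL_CODES) if mr == 1 else CARD_CODES
    return int(code in accepted)


def code_children(children, vaccine: str, mr: int):
    """Filter eligible children and attach a 0/1 outcome for one vaccine indicator."""
    return [
        {"cluster": c["cluster"], "vaccinated": vaccinated(c[vaccine], mr)}
        for c in children
        if eligible(c)
    ]


# Toy records (not survey data).
children = [
    {"cluster": 101, "alive": 1, "age_months": 14, "mcv1": 1},
    {"cluster": 101, "alive": 1, "age_months": 20, "mcv1": 2},
    {"cluster": 102, "alive": 0, "age_months": 18, "mcv1": 0},
    {"cluster": 102, "alive": 1, "age_months": 30, "mcv1": 3},
    {"cluster": 102, "alive": 1, "age_months": 12, "mcv1": 8},
]
print(code_children(children, "mcv1", mr=0))
print(code_children(children, "mcv1", mr=1))
```

Only the recall-coded child changes status between regimes, which is exactly the contrast the MR=0 vs MR=1 sensitivity analysis measures.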

Point legend: green = majority vaccinated (Vaccinated=1) in the cluster; red = majority unvaccinated (Vaccinated=0). Point radius scales with cluster sample size.
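The cluster-level summary behind the point layer can be sketched as follows, assuming per-child records that carry a cluster id and a 0/1 outcome, plus a cluster-to-coordinate lookup built from the GPS merge. The page does not say how exact ties are colored, so the grey tie color is an assumption. The cluster mean coverage here is the unweighted mean of cluster rates, which differs from the pooled child-level rate whenever cluster sizes differ.

```python
from collections import defaultdict


def cluster_summary(records, coords):
    """Aggregate per-child 0/1 outcomes to clusters and attach coordinates.

    records: iterable of {"cluster": id, "vaccinated": 0 or 1}
    coords:  {cluster_id: (lon, lat)} from the cluster GPS merge
    """
    n = defaultdict(int)
    k = defaultdict(int)
    for r in records:
        n[r["cluster"]] += 1
        k[r["cluster"]] += r["vaccinated"]

    summary = []
    for cid in sorted(n):
        if cid not in coords:
            continue  # unmatched clusters cannot be placed on the map
        rate = k[cid] / n[cid]
        if rate > 0.5:
            color = "green"  # vaccinated majority
        elif rate < 0.5:
            color = "red"  # unvaccinated majority
        else:
            color = "grey"  # exact tie: handling assumed, not specified on the page
        # n drives the point radius on the map.
        summary.append({"cluster": cid, "lonlat": coords[cid], "n": n[cid],
                        "vaccinated": k[cid], "rate": rate, "color": color})
    return summary


def cluster_mean_coverage(summary):
    """Unweighted mean of cluster-level rates (not the pooled child-level rate)."""
    return sum(c["rate"] for c in summary) / len(summary)


# Toy inputs: cluster 3 has no coordinates and is dropped.
records = [{"cluster": 1, "vaccinated": v} for v in (1, 1, 0)]
records += [{"cluster": 2, "vaccinated": v} for v in (0, 1)]
records += [{"cluster": 3, "vaccinated": 1}]
coords = {1: (77.0, 28.0), 2: (88.0, 22.0)}
summary = cluster_summary(records, coords)
print(summary)
print(cluster_mean_coverage(summary))
```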

Panel readouts for the current design: Mesh Nodes (Est.), Mean Edge (km), Expected RMSE, Uncertainty Index U, Coldspot Stability, and Expected PPC95. Actual-data readouts for the selected survey, state, and vaccine: Clusters (geolocated survey clusters), Children (n), Vaccinated (Vaccinated=1), Unvaccinated (Vaccinated=0), and Cluster Mean Coverage (mean of cluster-level rates).

[Chart: Uncertainty by Mesh Resolution (Sparse, Medium, Fine) for the current state, model, and regime]

What Varies and Why

    3) Spatial Scale and Why It Works

    Scale

    • Training: survey cluster coordinates.
    • Prediction: boundary-constrained raster grid.
    • Reporting: state-level aggregated summaries.
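    The reporting step, aggregating boundary-constrained grid predictions into state summaries, might look like the sketch below. Whether the page's state summaries are population-weighted is not stated; the optional weight argument (for example, a target-population raster) is a hypothetical extension, and equal cell weights are used when it is omitted.

```python
import numpy as np


def state_summary(pred, state_id, weight=None):
    """Aggregate raster-cell predictions to state-level means.

    pred:     predicted coverage per cell (NaN = outside the prediction domain)
    state_id: state label per cell
    weight:   optional per-cell weights (hypothetical, e.g. a target-population
              raster); equal weights are used when omitted
    """
    pred = np.asarray(pred, dtype=float)
    state_id = np.asarray(state_id)
    w = np.ones_like(pred) if weight is None else np.asarray(weight, dtype=float)
    valid = ~np.isnan(pred)
    out = {}
    for s in np.unique(state_id[valid]):
        m = valid & (state_id == s)
        out[str(s)] = float(np.sum(w[m] * pred[m]) / np.sum(w[m]))
    return out


pred = [0.8, 0.6, np.nan, 0.4]
states = ["A", "A", "A", "B"]
print(state_summary(pred, states))                       # equal weights
print(state_summary(pred, states, weight=[1, 3, 5, 2]))  # weighted
```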

    Signal vs Noise Tradeoff

    Cluster-level inputs preserve observed survey signal. Latent spatial smoothing regularizes sparse/noisy areas before rasterized outputs are exported for planning.

    4) Core Statistical Model

    For cluster i, vaccine outcome v, period t, maternal-recall regime m in {0,1}:

    y_{i,v,t}^{(m)} ~ Binomial(n_{i,v,t}, p_{i,v,t}^{(m)})
    logit(p_{i,v,t}^{(m)}) = x_i^T beta_{v,t,m} + w_{v,t,m}(s_i) + b_{state(i)} + b_{district(i)}

    where s_i = (lon_i, lat_i) and w(·) is the latent spatial field.
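    To make the likelihood concrete, the sketch below simulates one vaccine/period/regime slice of this model with NumPy. All inputs are placeholders, the random effects are assumed to be already indexed to each cluster, and fitting in the pipeline is done with INLA-SPDE or spGLM, not with this simulator.

```python
import numpy as np


def simulate_clusters(X, beta, w_at_s, b_state, b_district, n, rng):
    """Draw y_i ~ Binomial(n_i, p_i) with
    logit(p_i) = x_i^T beta + w(s_i) + b_state(i) + b_district(i).

    X: (N, P) covariates; beta: (P,) coefficients; w_at_s: (N,) latent field at
    cluster locations; b_state, b_district: (N,) random effects already indexed
    to each cluster; n: (N,) children per cluster.
    """
    eta = X @ beta + w_at_s + b_state + b_district  # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))                  # inverse logit
    return rng.binomial(n, p), p


rng = np.random.default_rng(0)
N = 5
X = np.column_stack([np.ones(N), np.linspace(-1.0, 1.0, N)])  # intercept + one covariate
beta = np.array([0.5, 1.0])
w_at_s = rng.normal(0.0, 0.5, N)         # placeholder field values
b_state = np.repeat([0.2, -0.2], [3, 2])  # two toy states
b_district = rng.normal(0.0, 0.1, N)
n = np.array([10, 12, 8, 15, 9])
y, p = simulate_clusters(X, beta, w_at_s, b_state, b_district, n, rng)
print(y, np.round(p, 3))
```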

    5) Spatial Dependence and Mesh Construction

    Adjacency is encoded through coordinate-driven mesh projection, not polygon-neighbor lists.

    M = (V, T), V = mesh vertices, T = triangles
    A = A(loc, M) (projection matrix from cluster/raster locations to mesh basis)
    w(s) approx sum_{k=1}^{K} psi_k(s) * omega_k
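    With piecewise-linear basis functions, psi_k(s) equals the barycentric weight of vertex k inside the triangle containing s, so each row of A has at most three non-zero entries. A minimal sketch for a single point, assuming the containing triangle is already known (R-INLA typically builds the full sparse matrix with inla.spde.make.A):

```python
import numpy as np


def projection_row(point, vertices, triangle, n_vertices):
    """One row of A for a point lying in a known mesh triangle.

    For piecewise-linear (hat) basis functions, psi_k(point) is the barycentric
    weight of vertex k in the containing triangle and 0 for all other vertices.
    """
    i, j, k = triangle
    a, b, c = vertices[i], vertices[j], vertices[k]
    edges = np.column_stack([b - a, c - a])     # 2x2 matrix of edge vectors
    l1, l2 = np.linalg.solve(edges, point - a)  # local coordinates of the point
    row = np.zeros(n_vertices)
    row[[i, j, k]] = [1.0 - l1 - l2, l1, l2]
    return row


# Unit square split into triangles (0, 1, 2) and (1, 3, 2).
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
row = projection_row(np.array([0.25, 0.25]), vertices, (0, 1, 2), len(vertices))
omega = np.array([1.0, 2.0, 3.0, 4.0])  # field values at the vertices
print(row, row @ omega)                 # w(s) ~ sum_k psi_k(s) * omega_k
```

    Because the weights sum to 1 and vanish outside the containing triangle, w(s) is a local linear interpolation of the vertex values, which is what keeps A sparse.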

    State and district labels for raster cells, needed for the multilevel effects, are assigned by nearest-neighbor lookup.
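    A brute-force version of this nearest-neighbor assignment, assuming each cell takes the label of the nearest labelled cluster point and that planar distance on the coordinates is an acceptable approximation. For a full national grid, a KD-tree such as scipy.spatial.cKDTree would avoid the cells-by-points distance matrix built here.

```python
import numpy as np


def nearest_labels(cell_xy, point_xy, point_labels):
    """Give each raster cell the label of its nearest labelled point.

    Brute force: builds a (cells x points) matrix of squared planar distances.
    """
    cell_xy = np.asarray(cell_xy, dtype=float)
    point_xy = np.asarray(point_xy, dtype=float)
    d2 = ((cell_xy[:, None, :] - point_xy[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(point_labels)[d2.argmin(axis=1)]


cluster_xy = [(0.0, 0.0), (10.0, 0.0)]
cluster_district = ["District A", "District B"]  # hypothetical labels
cells = [(1.0, 1.0), (9.0, -1.0), (4.0, 0.0)]
print(nearest_labels(cells, cluster_xy, cluster_district))
```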

    6) Matérn-SPDE Formulation

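    (kappa^2 - Delta)^(alpha/2) (tau * w(s)) = W(s)
    omega ~ N(0, Q(kappa, tau)^{-1})

    Here Delta is the Laplacian and W(s) is Gaussian white noise; kappa governs the spatial range and tau scales the marginal variance.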

    This produces a sparse Gaussian Markov random field representation compatible with INLA.
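    For alpha = 2 in two dimensions the Matérn smoothness is nu = alpha - d/2 = 1. Under the tau-scaled form (kappa^2 - Delta)(tau * w(s)) = W(s), the marginal variance is sigma^2 = 1 / (4 pi kappa^2 tau^2) and the practical range is sqrt(8 nu) / kappa. The sketch converts between (kappa, tau) and (range, sigma), which helps when setting or reading priors; alpha = 2 is an assumption, since the page leaves alpha unspecified.

```python
import math

NU = 1.0  # Matérn smoothness for alpha = 2 in d = 2: nu = alpha - d/2


def spde_to_matern(kappa, tau):
    """(kappa, tau) -> (practical range, marginal SD) for alpha = 2, d = 2.

    Assumes (kappa^2 - Delta)(tau * w) = W, giving
    sigma^2 = 1 / (4 * pi * kappa^2 * tau^2) and range = sqrt(8 * nu) / kappa.
    """
    practical_range = math.sqrt(8.0 * NU) / kappa
    sigma = 1.0 / (math.sqrt(4.0 * math.pi) * kappa * tau)
    return practical_range, sigma


def matern_to_spde(practical_range, sigma):
    """Inverse map: the (kappa, tau) that give a target range and marginal SD."""
    kappa = math.sqrt(8.0 * NU) / practical_range
    tau = 1.0 / (math.sqrt(4.0 * math.pi) * kappa * sigma)
    return kappa, tau


# Example: a 200-unit range (in mesh coordinate units) and marginal SD 0.8.
kappa, tau = matern_to_spde(200.0, 0.8)
print(kappa, tau, spde_to_matern(kappa, tau))
```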

    Boundary/Sparse Handling

    • Mesh clipped to raster boundary polygon.
    • offset, cutoff, and max.edge tuned for stable triangulation.
    • NA raster cells excluded from prediction domain.
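    The prediction-domain rule (non-NA cells inside the boundary) can be sketched with an even-odd ray-casting point-in-polygon test. This is an illustrative stand-in: the pipeline's actual clipping tool is not named here, mesh generation itself (offset, cutoff, max.edge) is out of scope, and raster rows are assumed to align with y_coords.

```python
import numpy as np


def point_in_polygon(x, y, poly):
    """Even-odd ray casting: True if (x, y) lies inside the simple polygon `poly`."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def prediction_cells(raster, x_coords, y_coords, boundary):
    """Centres of cells that are non-NA and inside the boundary polygon.

    Assumes raster[r, c] is the cell centred at (x_coords[c], y_coords[r]).
    """
    keep = []
    for r, y in enumerate(y_coords):
        for c, x in enumerate(x_coords):
            if not np.isnan(raster[r, c]) and point_in_polygon(x, y, boundary):
                keep.append((x, y))
    return np.array(keep)


raster = np.ones((3, 3))
raster[0, 0] = np.nan                      # an NA cell
square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # toy boundary polygon
print(prediction_cells(raster, [0.5, 1.5, 2.5], [0.5, 1.5, 2.5], square))
```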

    7) Covariates: Spatial vs Non-Spatial

    Spatial component: latent field driven by longitude/latitude through SPDE mesh.

    Non-spatial component: cluster-aggregated socioeconomic and service-use predictors (wealth, education, access barriers, maternal-care proxies), with optional iid hierarchical effects for State/District/Region.
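    A minimal sketch of building cluster-aggregated covariates: average record-level features within each cluster, then standardize each column. The z-scoring step and the example feature columns are assumptions, not documented pipeline choices.

```python
import numpy as np


def cluster_covariates(cluster_ids, features):
    """Average record-level features within clusters, then z-score each column.

    cluster_ids: (N,) cluster id per record; features: (N, P) numeric matrix.
    Returns (unique cluster ids, Z) with Z of shape (n_clusters, P).
    """
    cluster_ids = np.asarray(cluster_ids)
    features = np.asarray(features, dtype=float)
    ids, inverse = np.unique(cluster_ids, return_inverse=True)
    sums = np.zeros((len(ids), features.shape[1]))
    np.add.at(sums, inverse, features)  # per-cluster column sums
    counts = np.bincount(inverse, minlength=len(ids))
    means = sums / counts[:, None]
    sd = means.std(axis=0)
    sd[sd == 0] = 1.0                   # constant columns stay at 0 after centring
    return ids, (means - means.mean(axis=0)) / sd


# Toy columns: a wealth score and years of maternal education (hypothetical).
ids, Z = cluster_covariates([1, 1, 2, 2], [[1, 8], [3, 8], [5, 8], [7, 8]])
print(ids, Z)
```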

    8) Uncertainty Mapping

    Prediction exports include:

    • posterior mean
    • standard deviation
    • 2.5th and 97.5th posterior quantiles
    • relative uncertainty ratio, U_g = sd_g / mu_g (posterior SD over posterior mean for grid cell g)

    High U_g indicates unstable estimates requiring cautious interpretation.
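    Given posterior draws of coverage for each grid cell, the exported layers reduce to column summaries. The sketch assumes draws on the probability scale with shape (samples, cells); the eps guard is an added assumption, because U_g becomes unstable as mu_g approaches 0, which is exactly where low-coverage pockets sit.

```python
import numpy as np


def uncertainty_layers(draws, eps=1e-6):
    """Summarise posterior coverage draws of shape (S samples, G grid cells)."""
    draws = np.asarray(draws, dtype=float)
    mu = draws.mean(axis=0)
    sd = draws.std(axis=0, ddof=1)
    q025, q975 = np.quantile(draws, [0.025, 0.975], axis=0)
    u = sd / np.maximum(mu, eps)  # U_g = sd_g / mu_g, guarded near mu_g = 0
    return {"mean": mu, "sd": sd, "q025": q025, "q975": q975, "U": u}


draws = np.array([[0.2, 0.8], [0.4, 0.8], [0.6, 0.8]])  # 3 toy draws, 2 cells
layers = uncertainty_layers(draws)
print({k: np.round(v, 3) for k, v in layers.items()})
```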

    9) Diagnostics and Fit Comparison

    Current diagnostics include coverage/error/correlation summaries and CI checks; explicit residual Moran's I/variogram diagnostics are not yet fully automated in the committed pipeline.

    RMSE = sqrt((1/N) * sum_i (y_i - y_hat_i)^2)
    Corr = cor(y, y_hat)
    Coverage95 = (1/N) * sum_i 1[y_i in [q0.025_i, q0.975_i]]
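    These three diagnostics translate directly into NumPy. The sketch assumes y holds observed cluster-level coverage at evaluation points (for example, held-out clusters) and that y_hat, q025, and q975 are the matching posterior means and quantiles; the page does not state whether evaluation is in-sample or cross-validated.

```python
import numpy as np


def fit_diagnostics(y, y_hat, q025, q975):
    """RMSE, Pearson correlation, and empirical 95% interval coverage."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    q025, q975 = np.asarray(q025, dtype=float), np.asarray(q975, dtype=float)
    rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
    corr = float(np.corrcoef(y, y_hat)[0, 1])
    coverage95 = float(np.mean((y >= q025) & (y <= q975)))  # share inside [q0.025, q0.975]
    return {"RMSE": rmse, "Corr": corr, "Coverage95": coverage95}


# Toy evaluation set: observed cluster coverage, posterior means, 95% bounds.
y = [0.2, 0.5, 0.9]
y_hat = [0.3, 0.5, 0.7]
q025 = [0.1, 0.6, 0.5]
q975 = [0.4, 0.8, 1.0]
print(fit_diagnostics(y, y_hat, q025, q975))
```

    A well-calibrated model should give Coverage95 close to 0.95 on a large evaluation set; values well below it indicate intervals that are too narrow.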

    10) Maternal Recall Sensitivity (MR=0 vs MR=1)

    The pipeline explicitly tests data-source sensitivity by toggling maternal recall inclusion. For MCV1, MR=1 improved fit and reduced uncertainty in both NFHS rounds:

    Survey   Metric        MR=0    MR=1
    NFHS-4   RMSE          0.388   0.295
    NFHS-4   Uncertainty   0.443   0.241
    NFHS-5   RMSE          0.272   0.253
    NFHS-5   Uncertainty   0.254   0.173
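    Relative changes make the two rounds easier to compare. This short calculation uses only the MCV1 values from the table above:

```python
# MCV1 values from the table: (MR=0, MR=1)
results = {
    ("NFHS-4", "RMSE"): (0.388, 0.295),
    ("NFHS-4", "Uncertainty"): (0.443, 0.241),
    ("NFHS-5", "RMSE"): (0.272, 0.253),
    ("NFHS-5", "Uncertainty"): (0.254, 0.173),
}


def relative_reduction(mr0, mr1):
    """Percentage drop from the card-only (MR=0) to the card+recall (MR=1) value."""
    return 100.0 * (mr0 - mr1) / mr0


for (survey, metric), (mr0, mr1) in results.items():
    print(f"{survey} {metric}: {relative_reduction(mr0, mr1):.1f}% lower with MR=1")
```

    This gives reductions of about 24.0% (RMSE) and 45.6% (uncertainty) for NFHS-4, versus 7.0% and 31.9% for NFHS-5, so the gain from including maternal recall is larger in the earlier round.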

    11) Persistent Coldspots Across Survey Rounds

    MCV1 Persistent Low-Coverage States (MR=1)

    Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.

    DPT3 Persistent Low-Coverage States (MR=1)

    Arunachal Pradesh, Assam, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.
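    The page does not spell out the persistence rule. One plausible definition, sketched here under that assumption, is "below a coverage threshold in every survey round". The threshold and the coverage values in the example are hypothetical and do not correspond to any real state.

```python
def persistent_coldspots(coverage_by_round, threshold):
    """States below `threshold` in every survey round.

    coverage_by_round: {round_name: {state: coverage}}. States missing from any
    round are excluded, since persistence cannot be judged for them.
    """
    rounds = list(coverage_by_round.values())
    common = set(rounds[0]).intersection(*rounds[1:])
    return sorted(s for s in common if all(r[s] < threshold for r in rounds))


# Hypothetical coverage values and threshold, for illustration only.
coverage = {
    "NFHS-4": {"State A": 0.55, "State B": 0.80, "State C": 0.60},
    "NFHS-5": {"State A": 0.65, "State B": 0.90, "State C": 0.75},
}
print(persistent_coldspots(coverage, threshold=0.70))
```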

    These persistent patterns are useful for continuity-focused policy planning across survey waves.

    12) Current Gaps and Risk Sources

    • No fully automated prior-sensitivity sweep in the current committed workflow.
    • No full posterior variance-partition report for spatial-effect share yet.
    • Largest data-quality risk: cluster geolocation uncertainty and key-harmonization mismatches.

    13) Planning Integration and First-Use Map

    The pipeline already exports planning-ready raster (TIFF) products: coverage, uncertainty, missed-cluster, and error maps.

    First map to use: DPT1 to DPT3 dropout probability map under MR=1, with uncertainty overlay. This is operationally high-value because it isolates service-continuity failure after first contact.
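    A common way to quantify this is the dropout rate (DPT1 - DPT3) / DPT1. The sketch computes it per posterior draw, assuming paired DPT1 and DPT3 coverage draws from a joint posterior, so the resulting spread reflects their correlation and can serve as the uncertainty overlay. The pipeline's exact dropout definition is not given here, and negative values (a draw where DPT3 exceeds DPT1) are left unclipped.

```python
import numpy as np


def dropout_layers(dpt1_draws, dpt3_draws, eps=1e-6):
    """Per-cell DPT1-to-DPT3 dropout (DPT1 - DPT3) / DPT1, computed draw by draw.

    Inputs are paired posterior coverage draws of shape (S samples, G cells).
    Returns the posterior mean and SD of dropout for each cell; negative values
    (DPT3 above DPT1 in a draw) are kept rather than clipped.
    """
    d1 = np.asarray(dpt1_draws, dtype=float)
    d3 = np.asarray(dpt3_draws, dtype=float)
    drop = (d1 - d3) / np.maximum(d1, eps)
    return drop.mean(axis=0), drop.std(axis=0, ddof=1)


dpt1 = np.array([[0.8, 0.5], [0.9, 0.5]])  # 2 toy draws, 2 cells
dpt3 = np.array([[0.6, 0.5], [0.6, 0.4]])
mean_drop, sd_drop = dropout_layers(dpt1, dpt3)
print(np.round(mean_drop, 3), np.round(sd_drop, 3))
```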

    14) One-Sentence Policy Contribution

    The project provides uncertainty-aware, cluster-to-grid geospatial estimates that distinguish stable versus unstable low-coverage pockets, enabling state programs to prioritize continuity gaps and maternal-recall-sensitive hotspots with greater confidence.