
Project 04 · Completed · Geospatial Vaccine Modelling

Geo-Spatial Modelling of Vaccine Coverage

Cluster-level NFHS-4 and NFHS-5 Bayesian geospatial pipeline with INLA-SPDE and spGLM, including explicit maternal recall sensitivity (MR=0 vs MR=1), uncertainty mapping, and dropout-focused planning outputs.

NFHS-4 / NFHS-5 · INLA-SPDE · spGLM · MR Sensitivity

Project Recap


Basic Method Interpretation

Pipeline Outputs

What the modeling pipeline produces for policy and program action

Spatial Bayesian Pipeline

DHS/NFHS clusters -> raster predictions

Coverage Surfaces

Continuous vaccination coverage maps

Uncertainty Layers

Posterior standard deviation layers

Dropout Mapping

Multi-dose dropout maps

State Diagnostics

State-level evaluation summaries

Scenario Contrast

Card-only vs card + maternal recall comparisons


This framework moves beyond aggregate coverage metrics to deliver localized, uncertainty-calibrated insights for targeted immunization action.

2) Interactive Mesh and Sensitivity Lab

Use this lab to test how modelling choices change spatial fit, uncertainty, and interpretation in real time.

The point layer is built from child-level vaccination responses linked with mapped survey clusters for children aged 12-23 months who were alive at interview. Vaccination status is computed from documented card evidence and, when selected, maternal recall.

State boundaries come from official geographic polygons. Child records and cluster coordinates are joined through the shared cluster identifier, then summarized into cluster-level vaccinated versus unvaccinated counts for each survey round.
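The join-and-summarize step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the field names `cluster_id` and `vaccinated`, the helper `summarize_clusters`, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict

def summarize_clusters(children, cluster_coords):
    """Join child records to cluster coordinates and count outcomes per cluster."""
    counts = defaultdict(lambda: [0, 0])  # cluster_id -> [vaccinated, total]
    for child in children:
        cid = child["cluster_id"]
        if cid not in cluster_coords:
            continue  # skip children whose cluster has no mapped location
        counts[cid][0] += 1 if child["vaccinated"] == 1 else 0
        counts[cid][1] += 1
    summary = []
    for cid in sorted(counts):
        lon, lat = cluster_coords[cid]
        y, n = counts[cid]
        summary.append({"cluster_id": cid, "lon": lon, "lat": lat,
                        "vaccinated": y, "unvaccinated": n - y, "n": n})
    return summary

# Toy records (illustrative only, not survey data).
children = [
    {"cluster_id": 101, "vaccinated": 1},
    {"cluster_id": 101, "vaccinated": 0},
    {"cluster_id": 101, "vaccinated": 1},
    {"cluster_id": 102, "vaccinated": 0},
    {"cluster_id": 999, "vaccinated": 1},  # no coordinates, dropped by the join
]
cluster_coords = {101: (77.2, 28.6), 102: (80.9, 26.8)}
cluster_summary = summarize_clusters(children, cluster_coords)
```

Running the same function once per survey round would give the per-round cluster counts that feed the models.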


Point legend: green = cluster where most children are vaccinated (vaccinated=1); red = cluster where most children are unvaccinated (vaccinated=0). Point radius scales with cluster sample size.

Lab readouts (values populate in the interactive view): Mesh Nodes (est.), Mean Edge (km), Expected RMSE, Uncertainty Index U, Coldspot Stability, Expected PPC95.

Data summary for the selected survey, state, and vaccine: Clusters (geolocated clusters in the selected survey), Children (n), Vaccinated (vaccinated=1), Unvaccinated (vaccinated=0), Cluster Mean Coverage (mean of cluster-level rates).

Uncertainty by Mesh Resolution (current state/model/regime): Sparse, Medium, Fine.

What Varies and Why

    3) Spatial Scale and Why It Works

    This pipeline is interpretable because each scale answers a different operational question, from observed evidence to planning-level allocation.

    Cluster Evidence Scale

    At survey-cluster points, we observe children with binary vaccine status. This is the direct field signal.

    Observed cluster proportion

    $$\hat{p}_i = \frac{y_i}{n_i}$$

    Interpretation: where ni is small, raw proportions are noisy and should not be used alone.
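To make the small-sample point concrete, the sketch below computes the raw cluster proportion alongside a Wilson score interval. The Wilson interval is a standard binomial interval chosen here purely for illustration; the source does not say the pipeline uses it, and the function names are hypothetical.

```python
import math

def cluster_proportion(y, n):
    """Raw observed proportion y / n for one cluster."""
    if n <= 0:
        raise ValueError("cluster must contain at least one child")
    return y / n

def wilson_interval(y, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = cluster_proportion(y, n)
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Same observed proportion (0.8), very different precision.
small = wilson_interval(4, 5)
large = wilson_interval(40, 50)
print("n=5 :", small)
print("n=50:", large)
```

The interval for the five-child cluster is several times wider than for the fifty-child cluster, which is why the models pool information across space instead of mapping raw proportions.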

    Grid Prediction Scale

    Predictions are generated on dense grid cells inside state boundaries, combining covariates and spatial structure.

    Predicted grid probability

    $$\hat{p}_g = \operatorname{logit}^{-1}\left( x_g^{\top}\beta + w(s_g) + u_{\text{state}(g)} + u_{\text{district}(g)} \right)$$

    Interpretation: this surface reveals hyper-local continuity gaps that state averages hide.
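The grid-level formula can be read as "sum the linear predictor, then map it to a probability". A minimal sketch, assuming point values for the coefficients, spatial effect, and random effects are already available (in a Bayesian fit these would be posterior draws or summaries); `predict_cell` and its arguments are illustrative names.

```python
import math

def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def predict_cell(x, beta, w, u_state, u_district):
    """Predicted probability for one grid cell from its linear predictor."""
    eta = sum(xj * bj for xj, bj in zip(x, beta)) + w + u_state + u_district
    return inv_logit(eta)

# Illustrative values only: intercept plus one covariate.
p_base = predict_cell([1.0, 0.4], [0.5, 1.2], w=0.0, u_state=0.0, u_district=0.0)
p_cold = predict_cell([1.0, 0.4], [0.5, 1.2], w=-1.0, u_state=0.0, u_district=0.0)
```

A negative spatial effect `w` lowers the predicted probability for otherwise identical covariates, which is how the surface can show local gaps that covariates alone do not explain.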

    State Summary Scale

    Grid outputs are aggregated into state-level summaries for planning and inter-state comparison.

    Population-weighted state estimate

    $$\hat{P}_{\text{state}} = \frac{\sum_{g \in \text{state}} N_g \, \hat{p}_g}{\sum_{g \in \text{state}} N_g}$$

    Interpretation: summary values are policy-facing, while maps preserve local targeting fidelity.
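The population-weighted aggregation is a weighted mean over grid cells. A short sketch, with `state_estimate` as a hypothetical helper and made-up cell values:

```python
def state_estimate(cells):
    """Population-weighted mean of grid-cell probabilities.

    cells: iterable of (population N_g, predicted probability p_g) pairs.
    """
    cells = list(cells)
    total_pop = sum(n for n, _ in cells)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(n * p for n, p in cells) / total_pop

# Two illustrative cells: the more populous cell dominates the summary.
estimate = state_estimate([(100, 0.5), (300, 0.9)])
```

Here the estimate is 0.8, compared with 0.7 for an unweighted mean of the two cells, which shows why the weighting matters when low-coverage cells are sparsely populated.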

    Why This Scale Design Works

    • Signal retention: cluster-level observations preserve real survey variation.
    • Noise control: spatial pooling reduces instability in sparse areas.
    • Actionability: grid maps guide micro-targeting; state summaries guide budgeting and monitoring.
    • Interpretability: each output can be traced back to observed child-level evidence.

    4) Core Statistical Model: spGLM vs INLA-SPDE

    Each method is introduced with a basic explanation and its core predictor form, followed by a side-by-side comparison of modeling logic, strengths, and limitations.

    spGLM (Gaussian Process Logistic Model)

    Basic explanation: spGLM models vaccination probability with a logistic link and a spatial Gaussian process that captures distance-driven similarity between clusters.

    Core predictor form

    $$\operatorname{logit}(p_i) = x_i^{\top}\beta + w(s_i)$$

    INLA-SPDE (Mesh-Based Bayesian Approximation)

    Basic explanation: INLA-SPDE represents the spatial field on a triangulated mesh and performs fast approximate Bayesian inference using sparse precision matrices.

    Core predictor form

    $$\eta_i = x_i^{\top}\beta + A_{i\cdot}\,\omega, \qquad p_i = \operatorname{logit}^{-1}(\eta_i)$$

    Compare Both Methods

    Comparing both is essential for robustness. Agreement indicates stable signal, while disagreement signals model-sensitive or data-sparse areas that need cautious interpretation.

    | Dimension | spGLM | INLA-SPDE |
    | --- | --- | --- |
    | Spatial representation | Dense Gaussian process covariance | Matérn field via sparse mesh representation |
    | Inference style | Posterior sampling / optimization over GP structure | Integrated Nested Laplace Approximation |
    | Scaling profile | Heavier for large cluster counts | Typically faster for large geographies |
    | Primary risk | Compute burden | Approximation and mesh-choice sensitivity |

    Decision rule: prioritize hotspots that remain high-risk under both models; route high-disagreement locations to uncertainty-aware field validation.
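The decision rule can be expressed as a small triage function. The thresholds `low_cov` and `max_gap`, the label strings, and the function name are hypothetical; the source states the rule but not the cut-offs used.

```python
def triage(p_spglm, p_inla, low_cov=0.7, max_gap=0.1):
    """Classify one location from the two models' predicted coverage.

    low_cov and max_gap are illustrative thresholds, not pipeline settings.
    """
    gap = abs(p_spglm - p_inla)
    if gap > max_gap:
        return "field_validation"  # models disagree: verify before acting
    if p_spglm < low_cov and p_inla < low_cov:
        return "priority"          # low coverage under both models
    return "monitor"

labels = [triage(0.55, 0.60), triage(0.55, 0.80), triage(0.85, 0.90)]
```

In this sketch disagreement is checked first, so a location counts as a priority hotspot only when the two models both agree with each other and both predict low coverage.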

    5) Spatial Dependence and Mesh Construction

    Adjacency is encoded through coordinate-driven mesh projection, not polygon-neighbor lists.

    $$M = (V, T), \qquad V = \text{mesh vertices}, \qquad T = \text{triangles}$$
    $$A = A(\text{loc}, M)$$
    $$w(s) \approx \sum_{k=1}^{K} \psi_k(s)\,\omega_k$$
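With piecewise-linear basis functions on a triangulated mesh, the projection of a location onto the mesh reduces to barycentric weights on the three vertices of the enclosing triangle; those weights are the non-zero entries of the corresponding row of the projector matrix. The sketch below illustrates that idea for a single triangle. It assumes the enclosing triangle is already known and is not a substitute for a mesh library.

```python
def barycentric_weights(p, a, b, c):
    """Weights of point p on triangle vertices a, b, c (they sum to 1)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def field_at(p, triangle, omega):
    """Interpolate vertex weights omega to location p: sum_k psi_k(p) * omega_k."""
    return sum(w * o for w, o in zip(barycentric_weights(p, *triangle), omega))

triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
weights = barycentric_weights((0.25, 0.25), *triangle)
value = field_at((0.25, 0.25), triangle, omega=(1.0, 2.0, 3.0))
```

Because each location touches only three vertices, the projector matrix is sparse, and no polygon-neighbor list is needed to encode spatial dependence.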

    Raster multilevel labels are propagated through nearest-neighbor assignment for state/district factors.

    6) Matérn-SPDE Formulation

    $$(\kappa^2 - \Delta)^{\alpha/2}\, w(s) = \mathcal{W}(s)$$
    $$\omega \sim \mathcal{N}\left( 0,\; Q^{-1}(\kappa, \tau) \right)$$

    This produces a sparse Gaussian Markov random field representation compatible with INLA.
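The scale parameter κ is easier to interpret once converted to a practical correlation range. In the usual SPDE parameterization the Matérn smoothness is ν = α − d/2, and the distance at which correlation falls to roughly 0.1 is approximately √(8ν)/κ. The sketch below applies that conversion; the defaults α = 2 and d = 2 are assumptions, since the source does not state the value of α used.

```python
import math

def matern_smoothness(alpha=2.0, d=2):
    """Matern smoothness nu implied by the SPDE exponent alpha in d dimensions."""
    return alpha - d / 2.0

def practical_range(kappa, alpha=2.0, d=2):
    """Approximate distance at which spatial correlation drops to about 0.1."""
    nu = matern_smoothness(alpha, d)
    if kappa <= 0 or nu <= 0:
        raise ValueError("kappa and nu must be positive")
    return math.sqrt(8.0 * nu) / kappa

# With nu = 1, kappa = sqrt(8) / 100 corresponds to a range of 100 coordinate units.
range_units = practical_range(math.sqrt(8.0) / 100.0)
```

The range is expressed in the same units as the mesh coordinates, so it can be compared directly with the mesh edge lengths when judging whether the triangulation is fine enough.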

    Boundary/Sparse Handling

    • Mesh clipped to raster boundary polygon.
    • offset, cutoff, and max.edge tuned for stable triangulation.
    • NA raster cells excluded from prediction domain.

    7) Covariates: Spatial vs Non-Spatial

    Spatial component: latent field driven by longitude/latitude through SPDE mesh.

    Non-spatial component: cluster-aggregated socioeconomic and service-use predictors (wealth, education, access barriers, maternal-care proxies), with optional iid hierarchical effects for State/District/Region.

    8) Uncertainty Mapping

    Prediction exports include:

    • posterior mean
    • standard deviation
    • 2.5th and 97.5th posterior quantiles
    • relative uncertainty ratio
    $$U_g = \frac{\mathrm{sd}_g}{\mu_g}$$

    High U_g indicates unstable estimates requiring cautious interpretation.
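A minimal sketch of the ratio and a flagging step follows. The guard for near-zero means and the cut-off `u_max` are assumptions added for the example; the source does not specify either.

```python
import math

def relative_uncertainty(sd, mean, eps=1e-6):
    """U_g = sd_g / mu_g; returns NaN when the mean is too close to zero."""
    if mean < eps:
        return math.nan
    return sd / mean

def is_unstable(sd, mean, u_max=0.3):
    """Flag a cell whose relative uncertainty exceeds a hypothetical cut-off."""
    u = relative_uncertainty(sd, mean)
    return math.isnan(u) or u > u_max

u_stable = relative_uncertainty(0.05, 0.50)
u_noisy = relative_uncertainty(0.20, 0.40)
```

Because the ratio divides by the posterior mean, cells with very low predicted coverage can show a large U_g even when the absolute standard deviation is modest, so it is worth reading alongside the standard deviation layer.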

    9) Diagnostics and Fit Comparison

    Current diagnostics include coverage/error/correlation summaries and CI checks; explicit residual Moran's I/variogram diagnostics are not yet fully automated in the committed pipeline.

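    $$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$
    $$\mathrm{Corr} = \operatorname{cor}(y, \hat{y})$$
    $$\mathrm{Coverage}_{95} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ y_i \in [\, q_{0.025,i},\; q_{0.975,i} \,] \right]$$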
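The three diagnostics can be computed directly from observed values, predictions, and interval bounds. The functions below are a plain-Python sketch with illustrative inputs, not the pipeline's own implementation.

```python
import math

def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def pearson_corr(y, y_hat):
    """Pearson correlation between observed and predicted values."""
    n = len(y)
    my, mh = sum(y) / n, sum(y_hat) / n
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    vy = sum((a - my) ** 2 for a in y)
    vh = sum((b - mh) ** 2 for b in y_hat)
    if vy == 0 or vh == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vy * vh)

def coverage95(y, lower, upper):
    """Share of observations inside their 2.5%-97.5% posterior interval."""
    hits = sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi)
    return hits / len(y)

# Illustrative cluster-level values.
y = [0.2, 0.5, 0.9]
y_hat = [0.3, 0.5, 0.8]
lower = [0.1, 0.6, 0.7]
upper = [0.4, 0.8, 1.0]
fit = {"rmse": rmse(y, y_hat), "corr": pearson_corr(y, y_hat),
       "coverage95": coverage95(y, lower, upper)}
```

A well-calibrated model should give a coverage value near 0.95; values far below that suggest intervals that are too narrow.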

    10) Maternal Recall Sensitivity (MR=0 vs MR=1)

    The pipeline explicitly tests data-source sensitivity by toggling maternal recall inclusion. For MCV1, MR=1 improved fit and reduced uncertainty in both NFHS rounds:

    | Survey | Metric | MR=0 | MR=1 |
    | --- | --- | --- | --- |
    | NFHS-4 | RMSE | 0.388 | 0.295 |
    | NFHS-4 | Uncertainty | 0.443 | 0.241 |
    | NFHS-5 | RMSE | 0.272 | 0.253 |
    | NFHS-5 | Uncertainty | 0.254 | 0.173 |
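The size of the improvement is easier to compare across rounds as a relative reduction. The sketch below derives those percentages from the reported MCV1 values; only the arithmetic is added here.

```python
# (MR=0, MR=1) values as reported for MCV1.
reported = {
    ("NFHS-4", "RMSE"): (0.388, 0.295),
    ("NFHS-4", "Uncertainty"): (0.443, 0.241),
    ("NFHS-5", "RMSE"): (0.272, 0.253),
    ("NFHS-5", "Uncertainty"): (0.254, 0.173),
}

def relative_reduction(mr0, mr1):
    """Fractional drop in a metric when moving from MR=0 to MR=1."""
    return (mr0 - mr1) / mr0

reductions = {key: round(100 * relative_reduction(*vals), 1)
              for key, vals in reported.items()}
for (survey, metric), pct in reductions.items():
    print(f"{survey} {metric}: {pct}% lower with MR=1")
```

On these figures the gain from including maternal recall is larger in NFHS-4 (about 24% lower RMSE and 46% lower uncertainty) than in NFHS-5 (about 7% and 32%).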

    11) Persistent Coldspots Across Survey Rounds

    MCV1 Persistent Low-Coverage States (MR=1)

    Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.

    DPT3 Persistent Low-Coverage States (MR=1)

    Arunachal Pradesh, Assam, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.

    These persistent patterns are useful for continuity-focused policy planning across survey waves.
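Comparing the two lists with set operations shows which states are persistently low for both antigens and which appear for only one. The variable names are illustrative; the state lists are those given above.

```python
mcv1_low = {"Arunachal Pradesh", "Assam", "Manipur", "Meghalaya",
            "Mizoram", "Nagaland", "Tripura", "Uttar Pradesh"}
dpt3_low = {"Arunachal Pradesh", "Assam", "Meghalaya",
            "Mizoram", "Nagaland", "Tripura", "Uttar Pradesh"}

low_for_both = sorted(mcv1_low & dpt3_low)   # persistent for both antigens
mcv1_only = sorted(mcv1_low - dpt3_low)      # persistent for MCV1 only
dpt3_only = sorted(dpt3_low - mcv1_low)      # persistent for DPT3 only
```

Seven states appear in both lists, and Manipur is the only state flagged for MCV1 but not DPT3.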

    12) Current Gaps and Risk Sources

    • No fully automated prior-sensitivity sweep in the current committed workflow.
    • No full posterior variance-partition report for spatial-effect share yet.
    • Largest data-quality risk: cluster geolocation uncertainty and mismatches when harmonizing join keys.

    13) Planning Integration and First-Use Map

    The pipeline already exports planning-ready raster/TIFF products: coverage, uncertainty, missed-cluster, and error maps.

    First map to use: DPT1 to DPT3 dropout probability map under MR=1, with uncertainty overlay. This is operationally high-value because it isolates service-continuity failure after first contact.
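One conventional way to quantify DPT1-to-DPT3 dropout is the share of children who received the first dose but not the third, (DPT1 − DPT3) / DPT1. The sketch below applies that definition to per-cell coverage values. Whether the pipeline computes dropout this way, or from joint posterior draws, is not stated in the source, so treat this as an assumption.

```python
import math

def dropout_rate(p_dpt1, p_dpt3):
    """Conventional dropout: share of DPT1 recipients who did not get DPT3."""
    if p_dpt1 <= 0:
        return math.nan  # undefined where no child received the first dose
    # Clamp at zero in case independent estimates put DPT3 above DPT1.
    return max(0.0, (p_dpt1 - p_dpt3) / p_dpt1)

# Illustrative cell: 90% first-dose coverage, 72% third-dose coverage.
cell_dropout = dropout_rate(0.90, 0.72)
```

In this example one in five children who started the series did not complete it, which is the service-continuity signal the dropout map is meant to surface.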

    14) One-Sentence Policy Contribution

    The project provides uncertainty-aware, cluster-to-grid geospatial estimates that distinguish stable versus unstable low-coverage pockets, enabling state programs to prioritize continuity gaps and maternal-recall-sensitive hotspots with greater confidence.