Project 04 - Geo-Spatial Modelling of Vaccine Coverage

Project Recap

Basic Method Interpretation

Pipeline Outputs

What the modeling pipeline produces for policy and program action

Spatial Bayesian Pipeline

DHS/NFHS clusters -> raster predictions

Coverage Surfaces

Continuous vaccination coverage maps

Uncertainty Layers

Posterior standard deviation layers

Dropout Mapping

Multi-dose dropout maps

State Diagnostics

State-level evaluation summaries

Scenario Contrast

Card-only vs card + maternal recall comparisons

This framework moves beyond aggregate coverage metrics to deliver localized, uncertainty-calibrated insights for targeted immunization action.

2) Interactive Mesh and Sensitivity Lab

Use this lab to test how modelling choices change spatial fit, uncertainty, and interpretation in real time.

The point layer is built from child-level vaccination responses linked with mapped survey clusters for children aged 12-23 months who were alive at interview. Vaccination status is computed from documented card evidence and, when selected, maternal recall.

State boundaries come from official geographic polygons. Child records and cluster coordinates are joined through the shared cluster identifier, then summarized into cluster-level vaccinated versus unvaccinated counts for each survey round.
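The join-and-summarize step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the record layout, the sample values, and the function name `aggregate_clusters` are hypothetical, and keying the join on survey round as well as cluster identifier is an assumption.

```python
from collections import defaultdict

# Hypothetical child-level records: (cluster_id, survey_round, vaccinated 0/1).
children = [
    ("C001", "NFHS-4", 1),
    ("C001", "NFHS-4", 0),
    ("C001", "NFHS-4", 1),
    ("C002", "NFHS-4", 0),
    ("C002", "NFHS-5", 1),
    ("C999", "NFHS-5", 1),  # no matching coordinates, so the join drops it
]

# Hypothetical cluster coordinates keyed by (cluster_id, survey_round): (lon, lat).
coords = {
    ("C001", "NFHS-4"): (93.6, 27.1),
    ("C002", "NFHS-4"): (91.7, 26.1),
    ("C002", "NFHS-5"): (91.7, 26.2),
}


def aggregate_clusters(children, coords):
    """Inner-join children to coordinates, then count vaccinated/unvaccinated."""
    counts = defaultdict(lambda: {"vaccinated": 0, "unvaccinated": 0})
    for cluster_id, survey, status in children:
        key = (cluster_id, survey)
        if key not in coords:
            continue  # child record without a geolocated cluster
        field = "vaccinated" if status == 1 else "unvaccinated"
        counts[key][field] += 1
    return {
        key: {"lon": coords[key][0], "lat": coords[key][1], **tally}
        for key, tally in counts.items()
    }


clusters = aggregate_clusters(children, coords)
for key, row in sorted(clusters.items()):
    print(key, row)
```

Records that fail the join are worth counting separately in practice, since unmatched identifiers are one of the data-quality risks a pipeline like this has to track.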

Control groups: Data, Model, Mesh, Action.

Controls: State; Survey; Model Family; Data Source Regime; Vaccine (Actual 0/1 Layer); Mesh Resolution (Medium); Mesh Offset (0.25); Mesh Cutoff (0.08).

Point legend: green = majority of children in the cluster vaccinated (vaccinated=1); red = majority unvaccinated (vaccinated=0). Point radius scales with cluster sample size.

Lab readouts: Mesh Nodes (Est.); Mean Edge (km); Expected RMSE; Uncertainty Index U; Coldspot Stability; Expected PPC95; Clusters (actual data, selected survey geolocated clusters); Children (n) (selected state + vaccine); Vaccinated (vaccinated=1); Unvaccinated (vaccinated=0); Cluster Mean Coverage (mean of cluster-level rates); Uncertainty by Mesh Resolution (current state/model/regime): Sparse, Medium, Fine.

3) Spatial Scale and Why It Works

This pipeline is interpretable because each scale answers a different operational question, from observed evidence to planning-level allocation.

Cluster Evidence Scale

At survey-cluster points, we observe children with binary vaccine status. This is the direct field signal.

Observed cluster proportion

\hat{p}_i = \frac{y_i}{n_i}

Here y_i is the number of vaccinated children and n_i the number of children observed in cluster i.

Interpretation: where n_i is small, raw proportions are noisy and should not be used alone.
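To see why small n_i makes the raw proportion unreliable, the sketch below compares interval widths for the same observed proportion at two sample sizes. It uses the Wilson score interval, which is a standard binomial interval chosen here for illustration and not something the pipeline is stated to use; the counts are hypothetical.

```python
import math


def wilson_interval(y, n, z=1.96):
    """Wilson score interval for a binomial proportion y/n (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = y / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Same raw proportion (0.6) observed in a small and a larger cluster.
small = wilson_interval(3, 5)
large = wilson_interval(30, 50)
small_width = small[1] - small[0]
large_width = large[1] - large[0]
print(f"n=5:  ({small[0]:.2f}, {small[1]:.2f})")
print(f"n=50: ({large[0]:.2f}, {large[1]:.2f})")
```

The interval for the five-child cluster is several times wider, which is the instability that spatial pooling is meant to dampen.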

Grid Prediction Scale

Predictions are generated on dense grid cells inside state boundaries, combining covariates and spatial structure.

Predicted grid probability

\hat{p}_g = \operatorname{logit}^{-1}\left(x_g^{\top}\beta + w(s_g) + u_{\text{state}(g)} + u_{\text{district}(g)}\right)

Interpretation: this surface reveals hyper-local continuity gaps that state averages hide.
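As a minimal sketch of the grid predictor above, the following computes one cell's probability from a linear predictor. The coefficient values, the spatial effect, and the state and district effects are all made-up numbers for illustration, and `predict_cell` is a hypothetical helper, not a pipeline function.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict_cell(x, beta, w, u_state, u_district):
    """p_g = logit^-1(x_g' beta + w(s_g) + u_state(g) + u_district(g))."""
    eta = sum(xi * bi for xi, bi in zip(x, beta)) + w + u_state + u_district
    return inv_logit(eta)


# Hypothetical values: intercept plus one covariate.
x_g = [1.0, 0.4]
beta = [0.8, 1.5]
p_no_field = predict_cell(x_g, beta, w=0.0, u_state=0.0, u_district=0.0)
p_coldspot = predict_cell(x_g, beta, w=-1.2, u_state=-0.1, u_district=-0.2)
print(f"covariates only: {p_no_field:.3f}")
print(f"with negative spatial and hierarchical effects: {p_coldspot:.3f}")
```

Two cells with identical covariates can therefore receive different predictions when the latent field or the hierarchical effects differ, which is how the surface picks up local gaps.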

State Summary Scale

Grid outputs are aggregated into state-level summaries for planning and inter-state comparison.

Population-weighted state estimate

\hat{P}_{\text{state}} = \frac{\sum_{g \in \text{state}} N_g\,\hat{p}_g}{\sum_{g \in \text{state}} N_g}

Interpretation: summary values are policy-facing, while maps preserve local targeting fidelity.
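The population-weighted aggregation can be illustrated with a few grid cells. This sketch assumes N_g is a per-cell population count (the source does not say which population layer is used) and the cell values are invented.

```python
def population_weighted_estimate(cells):
    """cells: iterable of (N_g, p_hat_g) pairs for one state."""
    cells = list(cells)
    total_pop = sum(n for n, _ in cells)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(n * p for n, p in cells) / total_pop


# Hypothetical cells: one dense high-coverage cell, two sparse low-coverage cells.
cells = [(1000, 0.90), (200, 0.50), (50, 0.20)]
weighted = population_weighted_estimate(cells)
unweighted = sum(p for _, p in cells) / len(cells)
print(f"population-weighted: {weighted:.3f}")
print(f"simple cell mean:    {unweighted:.3f}")
```

The gap between the two numbers shows why the state summary alone can mask sparse low-coverage cells: they barely move the weighted average but remain visible on the map.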

Why This Scale Design Works

Signal retention: cluster-level observations preserve real survey variation.
Noise control: spatial pooling reduces instability in sparse areas.
Actionability: grid maps guide micro-targeting; state summaries guide budgeting and monitoring.
Interpretability: each output can be traced back to observed child-level evidence.

4) Core Statistical Model: spGLM vs INLA-SPDE

spGLM (Gaussian Process Logistic Model)

Basic explanation: spGLM models vaccination probability with a logistic link and a spatial Gaussian process that captures distance-driven similarity between clusters.

Core predictor form

\operatorname{logit}(p_i) = x_i^{\top}\beta + w(s_i)

INLA-SPDE (Mesh-Based Bayesian Approximation)

Basic explanation: INLA-SPDE represents the spatial field on a triangulated mesh and performs fast approximate Bayesian inference using sparse precision matrices.

Core predictor form

\eta_i = x_i^{\top}\beta + A_{i\cdot}\,\omega, \quad p_i = \operatorname{logit}^{-1}(\eta_i)

Compare Both Methods

Comparing both is essential for robustness. Agreement indicates stable signal, while disagreement signals model-sensitive or data-sparse areas that need cautious interpretation.

Dimension	spGLM	INLA-SPDE
Spatial representation	Dense Gaussian process covariance	Matérn field via sparse mesh representation
Inference style	Posterior sampling / optimization over GP structure	Integrated Nested Laplace Approximation
Scaling profile	Heavier for large cluster counts	Typically faster for large geographies
Primary risk	Compute burden	Approximation and mesh-choice sensitivity

Decision rule: prioritize hotspots that remain high-risk under both models; route high-disagreement locations to uncertainty-aware field validation.
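One way to operationalize the decision rule is a simple per-location triage. The thresholds `low_cov` and `max_gap`, the label names, and the choice to check joint low coverage before disagreement are all assumptions made for this sketch; the source does not specify them.

```python
def triage(p_spglm, p_inla, low_cov=0.6, max_gap=0.1):
    """Classify one location from the two models' predicted coverage.

    low_cov and max_gap are hypothetical thresholds, not pipeline settings.
    """
    if p_spglm < low_cov and p_inla < low_cov:
        return "priority"  # low coverage under both models
    if abs(p_spglm - p_inla) > max_gap:
        return "field validation"  # models disagree materially
    return "monitor"


# Hypothetical (spGLM, INLA-SPDE) predictions for three locations.
examples = {"A": (0.40, 0.45), "B": (0.50, 0.80), "C": (0.85, 0.90)}
labels = {name: triage(*preds) for name, preds in examples.items()}
print(labels)
```

A location that is low under both models but with a large gap is still labeled "priority" here; a program that wants validation first in such cases would swap the order of the two checks.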

5) Spatial Dependence and Mesh Construction

Adjacency is encoded through coordinate-driven mesh projection, not polygon-neighbor lists.

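M = (V, T), \quad V = \text{mesh vertices}, \quad T = \text{triangles}

A = A(\text{loc}, M)

w(s) \approx \sum_{k=1}^{K} \psi_k(s)\,\omega_k

Here A is the projector matrix that maps the mesh-vertex weights ω to the observation locations loc, ψ_k are piecewise-linear basis functions on the triangulation, and K is the number of mesh vertices.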
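For a piecewise-linear basis, the nonzero entries in one row of A are the barycentric coordinates of the location within its enclosing triangle. The sketch below computes those three weights and the resulting field value for a single point. The triangle, point, and vertex weights are hypothetical, and locating the enclosing triangle (which mesh software handles) is skipped.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


# Hypothetical triangle and observation location.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
loc = (0.25, 0.25)
psi = barycentric_weights(loc, *tri)  # nonzero entries of one row of A

# Hypothetical latent weights at the three vertices.
omega = (1.0, 2.0, 4.0)
w_s = sum(p_k * o_k for p_k, o_k in zip(psi, omega))
print("basis weights:", psi)
print("w(s) approx:", w_s)
```

Because each location touches only three vertices, A is sparse, which is part of what keeps the mesh representation cheap for large geographies.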

State and district labels for the multilevel effects are propagated to raster cells by nearest-neighbor assignment.

6) Matérn-SPDE Formulation

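(\kappa^2 - \Delta)^{\alpha/2}\,(\tau\, w(s)) = W(s)

\omega \sim N\left(0, Q^{-1}(\kappa, \tau)\right)

Here W(s) is Gaussian white noise, κ governs the spatial range, and τ scales the variance of the field.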

This produces a sparse Gaussian Markov random field representation compatible with INLA.
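For interpretation, κ and τ can be translated into the more familiar Matérn smoothness, range, and marginal variance. The relations below are the standard ones for the SPDE construction in d dimensions, assuming the parameterization in which τ multiplies the field; the special case uses d = 2 and α = 2, which is a common default but is not confirmed by the source.

```latex
\nu = \alpha - \frac{d}{2}, \qquad
\rho \approx \frac{\sqrt{8\nu}}{\kappa}, \qquad
\sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\alpha)\,(4\pi)^{d/2}\,\kappa^{2\nu}\,\tau^{2}}

% Special case d = 2, \alpha = 2:
\nu = 1, \qquad
\rho \approx \frac{\sqrt{8}}{\kappa}, \qquad
\sigma^2 = \frac{1}{4\pi\,\kappa^{2}\,\tau^{2}}
```

Here ρ is the distance at which the spatial correlation falls to roughly 0.1, so a smaller κ means smoother, longer-range coverage patterns.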

Boundary/Sparse Handling

Mesh clipped to raster boundary polygon.
offset, cutoff, and max.edge tuned for stable triangulation.
NA raster cells excluded from prediction domain.

7) Covariates: Spatial vs Non-Spatial

Spatial component: latent field driven by longitude/latitude through SPDE mesh.

Non-spatial component: cluster-aggregated socioeconomic and service-use predictors (wealth, education, access barriers, maternal-care proxies), with optional iid hierarchical effects for State/District/Region.

8) Uncertainty Mapping

Prediction exports include:

posterior mean
standard deviation
2.5th and 97.5th posterior quantiles
relative uncertainty ratio

U_g = \frac{\mathrm{sd}_g}{\mu_g}

High U_g indicates unstable estimates requiring cautious interpretation.
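A short sketch of the ratio on a handful of cells shows one practical wrinkle: U_g grows quickly as the posterior mean approaches zero, even when the standard deviation is unchanged. The floor `eps` and the sample values are assumptions for illustration, not pipeline settings.

```python
def relative_uncertainty(mean, sd, eps=1e-6):
    """U_g = sd_g / mu_g, with a small floor on the mean to avoid division by zero.

    Returns None for cells without a prediction (for example NA raster cells).
    """
    if mean is None or sd is None:
        return None
    return sd / max(mean, eps)


# Hypothetical (posterior mean, posterior sd) pairs for four grid cells.
cells = [(0.80, 0.04), (0.20, 0.04), (0.05, 0.04), (None, None)]
ratios = [relative_uncertainty(m, s) for m, s in cells]
for (m, s), u in zip(cells, ratios):
    print(m, s, u)
```

Reading U_g alongside the absolute standard deviation helps separate cells that are genuinely poorly estimated from cells where coverage is simply very low.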

9) Diagnostics and Fit Comparison

Current diagnostics include coverage/error/correlation summaries and CI checks; explicit residual Moran's I/variogram diagnostics are not yet fully automated in the committed pipeline.

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}

\mathrm{Corr} = \operatorname{cor}(y, \hat{y})

\mathrm{Coverage}_{95} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[y_i \in [q_{0.025,i},\, q_{0.975,i}]\right]
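The three diagnostics above are straightforward to compute once observed values, predictions, and interval bounds are aligned. The sketch below assumes y holds observed cluster-level proportions (the source does not say whether y is cluster-level or child-level) and uses invented numbers.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def pearson(y, y_hat):
    """Pearson correlation between observations and predictions."""
    my, mh = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_h = sum((b - mh) ** 2 for b in y_hat)
    return cov / math.sqrt(var_y * var_h)


def coverage95(y, q_low, q_high):
    """Share of observations falling inside their 95% posterior interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(y, q_low, q_high))
    return hits / len(y)


# Hypothetical observed proportions, posterior means, and interval bounds.
y = [0.2, 0.5, 0.8, 0.9]
y_hat = [0.3, 0.5, 0.7, 0.9]
q_low = [0.10, 0.40, 0.55, 0.70]
q_high = [0.40, 0.60, 0.75, 1.00]

fit_rmse = rmse(y, y_hat)
fit_corr = pearson(y, y_hat)
fit_cov = coverage95(y, q_low, q_high)
print(f"RMSE={fit_rmse:.3f} Corr={fit_corr:.3f} Coverage95={fit_cov:.2f}")
```

Coverage well below 0.95 suggests intervals that are too narrow, while coverage near 1.0 with wide intervals suggests the model is underconfident.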

10) Maternal Recall Sensitivity (MR=0 vs MR=1)

The pipeline explicitly tests data-source sensitivity by toggling maternal recall inclusion (MR=0: card evidence only; MR=1: card plus maternal recall). For MCV1, MR=1 lowered RMSE and reduced uncertainty in both NFHS rounds:

Survey	Metric	MR=0	MR=1
NFHS-4	RMSE	0.388	0.295
NFHS-4	Uncertainty	0.443	0.241
NFHS-5	RMSE	0.272	0.253
NFHS-5	Uncertainty	0.254	0.173
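The size of the MR=1 effect is easier to compare across rounds as a relative reduction. The sketch below only rearranges the table values above; the dictionary layout and function name are illustrative.

```python
# (survey, metric) -> (MR=0 value, MR=1 value), copied from the table above.
results = {
    ("NFHS-4", "RMSE"): (0.388, 0.295),
    ("NFHS-4", "Uncertainty"): (0.443, 0.241),
    ("NFHS-5", "RMSE"): (0.272, 0.253),
    ("NFHS-5", "Uncertainty"): (0.254, 0.173),
}


def relative_reduction(mr0, mr1):
    """Fractional drop when moving from MR=0 to MR=1."""
    return (mr0 - mr1) / mr0


reductions = {
    key: relative_reduction(mr0, mr1) for key, (mr0, mr1) in results.items()
}
for (survey, metric), rel in reductions.items():
    print(f"{survey} {metric}: {100 * rel:.1f}% lower under MR=1")
```

On these figures the RMSE gain from adding maternal recall is much larger in NFHS-4 (about 24%) than in NFHS-5 (about 7%), while uncertainty falls substantially in both rounds.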

11) Persistent Coldspots Across Survey Rounds

MCV1 Persistent Low-Coverage States (MR=1)

Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.

DPT3 Persistent Low-Coverage States (MR=1)

Arunachal Pradesh, Assam, Meghalaya, Mizoram, Nagaland, Tripura, Uttar Pradesh.

These persistent patterns are useful for continuity-focused policy planning across survey waves.
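Comparing the two lists with set operations makes the overlap explicit. The state names are taken from the lists above; nothing else is assumed.

```python
mcv1 = {
    "Arunachal Pradesh", "Assam", "Manipur", "Meghalaya",
    "Mizoram", "Nagaland", "Tripura", "Uttar Pradesh",
}
dpt3 = {
    "Arunachal Pradesh", "Assam", "Meghalaya",
    "Mizoram", "Nagaland", "Tripura", "Uttar Pradesh",
}

both = mcv1 & dpt3        # persistent for both vaccines
mcv1_only = mcv1 - dpt3   # persistent for MCV1 but not DPT3
dpt3_only = dpt3 - mcv1   # persistent for DPT3 but not MCV1

print("both:", sorted(both))
print("MCV1 only:", sorted(mcv1_only))
print("DPT3 only:", sorted(dpt3_only))
```

Seven states appear on both lists, and Manipur is the only state that is persistent for MCV1 alone.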

12) Current Gaps and Risk Sources

No fully automated prior-sensitivity sweep in the current committed workflow.
No full posterior variance-partition report for spatial-effect share yet.
Largest data-quality risk: cluster geolocation uncertainty and key harmonization mismatch.

13) Planning Integration and First-Use Map

The pipeline already exports planning-ready raster/TIFF products: coverage, uncertainty, missed-cluster, and error maps.

First map to use: DPT1 to DPT3 dropout probability map under MR=1, with uncertainty overlay. This is operationally high-value because it isolates service-continuity failure after first contact.
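A per-cell dropout value can be derived from the two coverage surfaces. The sketch below assumes the conventional dropout-rate definition, (DPT1 - DPT3) / DPT1, applied to posterior mean coverage; the source does not state the exact formula the pipeline uses, and the cell values are invented.

```python
def dropout_rate(p_dpt1, p_dpt3, eps=1e-6):
    """Share of first-dose recipients who did not complete the third dose.

    Assumes the conventional definition (DPT1 - DPT3) / DPT1. Returns None
    where first-dose coverage is effectively zero, since dropout is undefined.
    Negative differences are clamped to zero.
    """
    if p_dpt1 < eps:
        return None
    return max(p_dpt1 - p_dpt3, 0.0) / p_dpt1


# Hypothetical (DPT1, DPT3) predicted coverage for three grid cells.
cells = [(0.90, 0.72), (0.60, 0.30), (0.0, 0.0)]
dropouts = [dropout_rate(a, b) for a, b in cells]
print(dropouts)
```

A cell with moderate DPT1 coverage but high dropout points to a continuity problem after first contact, which is a different operational issue from a cell where first contact never happens.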

14) One-Sentence Policy Contribution

The project provides uncertainty-aware, cluster-to-grid geospatial estimates that distinguish stable from unstable low-coverage pockets, enabling state programs to prioritize continuity gaps and maternal-recall-sensitive hotspots with greater confidence.