Data & datasets

Everything behind this site is open. This page also offers a practical catalog of the large datasets a serious depression-exposome program can mine.

Download this project's data

Large datasets to mine

Mega-cohorts / biobanks

UK Biobank (~500k; genetics, 1,500-field exposome, imaging, metabolomics) · All of Us (~800k, diverse, EHR+WGS) · MoBa, ALSPAC, Generation Scotland (developmental/family) · NESDA (deep biomarkers).

Epidemiology / surveillance

NHANES (open; ~250 chemical analytes + PHQ-9 — the chemical-exposome engine) · IHME GBD / GHDx & WHO GHO (country burden).
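A minimal sketch of how the NHANES depression outcome is commonly derived: the PHQ-9 total is the sum of nine items scored 0 to 3, and a total of 10 or more is a widely used screening cutoff. The item names (DPQ010 to DPQ090) and the 7/9 refusal codes reflect the NHANES depression screener file as we understand it, so check them against the codebook for your survey cycle. The function names and the complete-case rule are illustrative assumptions, not this project's method.

```python
from typing import Mapping, Optional

# Assumed NHANES item names for the nine PHQ-9 questions: DPQ010 ... DPQ090.
PHQ9_ITEMS = [f"DPQ{n:03d}" for n in range(10, 100, 10)]


def phq9_total(record: Mapping[str, Optional[float]]) -> Optional[int]:
    """Return the PHQ-9 total (0-27), or None if any item is missing or invalid."""
    total = 0
    for item in PHQ9_ITEMS:
        value = record.get(item)
        # Valid answers are 0-3. Codes such as 7 (refused) or 9 (don't know)
        # are treated as missing here: a complete-case assumption.
        if value is None or value not in (0, 1, 2, 3):
            return None
        total += int(value)
    return total


def screens_positive(total: Optional[int], cutoff: int = 10) -> Optional[bool]:
    """Apply a screening cutoff; None propagates when the total is unavailable."""
    return None if total is None else total >= cutoff
```

Dropping incomplete records is the simplest choice; a real analysis might instead prorate or impute missing items and would need to apply the NHANES survey weights.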

Genetics

PGC MDD summary statistics (open; a source of Mendelian randomization [MR] instruments) · FinnGen (register-linked) · Million Veteran Program (MVP; diverse).
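To show what summary statistics are used for in MR, here is a minimal sketch of the fixed-effect inverse-variance-weighted (IVW) estimator. It assumes the per-variant effects have already been harmonised to the same effect allele and that the instruments are independent. The function and argument names are hypothetical, and the numbers in the usage note are toy values, not real GWAS results.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each position is one genetic instrument: its effect on the exposure,
    its effect on the outcome, and the standard error of the outcome effect.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) == 0:
        raise ValueError("at least one instrument is required")

    # Weighted regression of outcome effects on exposure effects through the
    # origin, with weights 1 / se_outcome**2.
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

For example, `ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])` gives an estimate of 0.5, because every outcome effect is exactly half its exposure effect. IVW assumes no directional pleiotropy, so it is usually reported alongside sensitivity analyses.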

Microbiome

American Gut / Microsetta · Dutch Microbiome Project / LifeLines; both support microbiome-wide association studies (MbWAS) and microbiome–metabolome–depression triangulation.

Environmental layers

US EPA air quality · ACAG global satellite PM2.5 · VIIRS night-lights (light-at-night) — link to any geocoded cohort.
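Linking a gridded layer to a geocoded cohort can be sketched as a nearest-cell lookup. The example below assumes participants are given as latitude/longitude pairs and the environmental layer as a list of grid-cell centres with one value each (for example, annual mean PM2.5). All names, coordinates, and values are hypothetical, and the brute-force search is only suitable for small inputs.

```python
import math
from typing import Dict, List, Optional, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_exposure(
    participants: Dict[str, Tuple[float, float]],
    grid_cells: List[Tuple[float, float, float]],
    max_km: float = 50.0,
) -> Dict[str, Optional[float]]:
    """Assign each participant the value of the nearest grid cell.

    participants: {participant_id: (lat, lon)}
    grid_cells:   [(cell_lat, cell_lon, value), ...]
    Participants farther than max_km from every cell get None.
    """
    linked: Dict[str, Optional[float]] = {}
    for pid, (lat, lon) in participants.items():
        distance, value = min(
            (haversine_km(lat, lon, clat, clon), cell_value)
            for clat, clon, cell_value in grid_cells
        )
        linked[pid] = value if distance <= max_km else None
    return linked
```

At cohort scale a spatial index or a raster sampling library would replace the inner loop, and exposure windows (for example, the year before assessment) would need to be matched in time as well as space.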

Exposome initiatives

EXPANSE (EU, tens of millions of people) · HHEAR (NIEHS untargeted chemical profiling); both enable exposome-wide association studies (ExWAS) at scale.

Best triangulation: ExWAS discovery in NHANES → replication in UK Biobank → diverse-ancestry replication in All of Us; MR using instruments from PGC + FinnGen + Million Veteran Program (MVP), with a check for cross-ancestry concordance; developmental chain MoBa → ALSPAC → ABCD (Adolescent Brain Cognitive Development Study).
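The discovery-then-replication step can be sketched as follows, assuming each cohort analysis has already produced one effect estimate and p-value per exposure. Discovery results are corrected with the Benjamini-Hochberg false discovery rate, and a hit counts as replicated if it has the same direction of effect and a nominal p-value below the threshold in the second cohort. That replication rule, the function names, and the input layout are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) per exposure."""
    m = len(pvalues)
    # Walk from the largest p-value down so a running minimum keeps the
    # adjusted values monotone.
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1], reverse=True)
    qvalues: Dict[str, float] = {}
    running_min = 1.0
    for i, (name, p) in enumerate(ordered):
        rank = m - i  # rank of this p-value in ascending order
        running_min = min(running_min, p * m / rank)
        qvalues[name] = running_min
    return qvalues


def replicated_hits(
    discovery: Dict[str, Tuple[float, float]],
    replication: Dict[str, Tuple[float, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Exposures that pass FDR in discovery and replicate in the second cohort.

    Both inputs map exposure name -> (beta, p_value).
    """
    q = benjamini_hochberg({name: p for name, (_, p) in discovery.items()})
    hits = []
    for name, (beta_d, _) in discovery.items():
        if q[name] > alpha or name not in replication:
            continue
        beta_r, p_r = replication[name]
        same_direction = beta_d * beta_r > 0
        if same_direction and p_r < alpha:
            hits.append(name)
    return sorted(hits)
```

The same function can be chained for the third cohort by passing the replicated subset forward. Stricter designs apply a Bonferroni threshold across the carried-forward hits instead of a nominal one.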

Data dictionary — factor matrix

Licensing: factor summaries, grades and map notes are released for reuse with attribution (Objektiv AI · Claude). WHO map data © WHO (GHO, public). Underlying studies remain © their publishers — follow each source link.