NeuroSTORM supports a wide range of publicly available fMRI datasets for both pre-training and downstream analysis. The table below summarizes key characteristics of each dataset, including the number of subjects, male/female ratio, spatial resolution, TR, and official homepage.
UK Biobank (UKB):
A large-scale prospective study from the UK containing health, genetic, and neuroimaging data of over 40,000 middle-aged participants. fMRI is acquired at 2.4mm isotropic resolution (TR=735ms).
The UK Biobank (UKB) is one of the world's largest population-based health resources, comprising extensive genetic, clinical, lifestyle, and imaging data from over 500,000 participants, of whom more than 40,000 have multimodal brain MRI, including both resting-state and task-based fMRI scans. Participants were recruited between 2006 and 2010 at ages 40–69, and a subset undergoes repeat imaging, which enables longitudinal analyses. fMRI data are acquired on Siemens Skyra 3T scanners at a resolution of 2.4×2.4×2.4 mm³ with a fast TR of 735ms. The dataset includes both rsfMRI (6 min scan) and tfMRI (motor, emotion, social, gambling, and relational tasks), plus comprehensive demographic, cognitive, and health phenotypes. Standardized preprocessing pipelines (including motion correction, ICA-FIX denoising, and registration to MNI152) are publicly available. UKB is widely used for large-scale brain-behavior association studies, disease risk modeling, and neuroimaging foundation model pre-training.
Dataset Name | UK Biobank (UKB) |
fMRI Types | rsfMRI and tfMRI |
tfMRI Tasks | Emotion, Gambling, Motor, Relational, Social |
Age Range | 40-69 years |
Gender Ratio | ~46% male, ~54% female |
Patient/Control | Population-based (includes healthy and various disease cases) |
Disease Types | Various (not a patient cohort, but disease info available) |
Used in Tasks | Pre-training, Task 1 (age/gender prediction) |
Sample Size | ~40,842 with fMRI |
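As a rough sanity check on the acquisition parameters above, the number of volumes in a run follows from the scan length divided by the TR. The sketch below is illustrative only: the helper name `expected_volumes` is hypothetical, and the 6-minute duration is the nominal value quoted in the description, so the result is approximate.

```python
def expected_volumes(scan_seconds: float, tr_seconds: float) -> int:
    """Approximate number of fMRI volumes in a run of the given length."""
    if scan_seconds <= 0 or tr_seconds <= 0:
        raise ValueError("scan length and TR must be positive")
    # Round to the nearest whole volume because the quoted duration is nominal.
    return round(scan_seconds / tr_seconds)


# UKB rsfMRI: about 6 min at TR = 735 ms (values from the description above)
ukb_volumes = expected_volumes(6 * 60, 0.735)
print(ukb_volumes)  # 490
```

A value like this is useful when validating downloaded runs, since a series with far fewer volumes is probably truncated.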
Adolescent Brain Cognitive Development (ABCD):
Longitudinal neuroimaging of children in the US (~9,500 with fMRI), with multimodal data and fMRI at 2.4mm resolution, TR=800ms.
The ABCD Study is the largest long-term study of brain development and child/adolescent health in the US. It follows over 11,800 children (aged 9–10 at baseline) through adolescence, with repeated multimodal MRI, cognitive, behavioral, genetic, and environmental data collection. Neuroimaging includes high-resolution rsfMRI (2.4mm isotropic, TR=800ms) and tfMRI (emotional, reward, cognitive tasks), harmonized across 21 sites and 3 major scanner vendors. Imaging pipelines are derived from HCP preprocessing (motion correction, normalization, artifact removal), with ROI time-series available (e.g., Schaefer atlas). ABCD supports studies of typical development, neuropsychiatric risk, and gene–brain–behavior relationships.
Dataset Name | ABCD |
fMRI Types | rsfMRI and tfMRI |
tfMRI Tasks | Emotional n-back, Reward, Stop-signal, Monetary Incentive Delay |
Age Range | 9–13 years (at latest release) |
Gender Ratio | ~52% male, ~48% female |
Patient/Control | Community sample (includes healthy and at-risk youth) |
Disease Types | Not specific, but behavioral/clinical phenotypes available |
Used in Tasks | Pre-training, Task 1 (age/gender) |
Sample Size | ~9,448 with fMRI |
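UKB and ABCD use slightly different TRs, so their time series are sampled at different rates. The sketch below converts a TR into a sampling rate and the corresponding Nyquist frequency. The function names are hypothetical, and whether or how NeuroSTORM resamples data in time is not stated here, so this only illustrates the arithmetic.

```python
def sampling_rate_hz(tr_ms: float) -> float:
    """Temporal sampling rate of an fMRI series, in Hz, from its TR in ms."""
    if tr_ms <= 0:
        raise ValueError("TR must be positive")
    return 1000.0 / tr_ms


def nyquist_hz(tr_ms: float) -> float:
    """Highest frequency representable without aliasing (half the sampling rate)."""
    return sampling_rate_hz(tr_ms) / 2.0


# TR values quoted in the UKB and ABCD descriptions above
for name, tr_ms in {"UKB": 735, "ABCD": 800}.items():
    print(f"{name}: fs = {sampling_rate_hz(tr_ms):.3f} Hz, "
          f"Nyquist = {nyquist_hz(tr_ms):.3f} Hz")
```

With these TRs the sampling rates come out near 1.36 Hz and 1.25 Hz, so a fixed number of frames covers a slightly different time span in each dataset.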
Human Connectome Project – Young Adult (HCP-YA), Aging (HCP-A), Development (HCP-D):
Three high-resolution (2mm) public datasets for mapping brain structure and function across the lifespan (children, young adults, elderly).
The HCP is an NIH initiative to map human brain connectivity with unprecedented detail. Three major lifespan datasets are:
- HCP-YA: 1,206 healthy young adults (aged 22–37), scanned at 3T and 7T. Imaging includes rsfMRI (2mm isotropic, TR=720ms; 1 hour per subject), tfMRI (7 tasks: working memory, emotion, language, motor, gambling, relational, social), and dMRI. The data are extensively preprocessed: motion correction, ICA-FIX, MNI registration, surface/volumetric data.
- HCP-A: 725 adults aged 36–100, scanned with similar MRI protocols. Focused on typical aging and age-related brain changes.
- HCP-D: 652 children and adolescents (aged 5–21), scanned with harmonized imaging protocols to enable developmental connectomics.
All datasets include rich behavioral, cognitive, and demographic data. They are widely used for benchmarking machine learning, connectomics, and lifespan brain research.
Dataset Name | HCP-YA, HCP-A, HCP-D |
fMRI Types | rsfMRI and tfMRI |
tfMRI Tasks | Emotion, Gambling, Language, Motor, Relational, Social, Working Memory |
Age Range | HCP-YA: 22–37; HCP-A: 36–100; HCP-D: 5–21 |
Gender Ratio | ~47% male, ~53% female (YA); similar balance in others |
Patient/Control | Healthy volunteers |
Disease Types | None (controls only) |
Used in Tasks | Pre-training, Task 1 (age/gender), Task 2 (phenotype), Task 5 (tfMRI state classification) |
Sample Size | HCP-YA: 1,206; HCP-A: 725; HCP-D: 652 |
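The three HCP cohorts tile the lifespan, but their stated age ranges overlap slightly, which matters when pooling them for age prediction. The sketch below encodes the ranges from the table and looks up which cohorts cover a given age. The names `HCP_AGE_RANGES` and `cohorts_for_age` are hypothetical and not part of any HCP or NeuroSTORM API.

```python
from typing import Dict, List, Tuple

# Age ranges (years, inclusive) copied from the table above
HCP_AGE_RANGES: Dict[str, Tuple[int, int]] = {
    "HCP-D": (5, 21),
    "HCP-YA": (22, 37),
    "HCP-A": (36, 100),
}


def cohorts_for_age(age: int) -> List[str]:
    """Return every HCP cohort whose stated age range contains `age`."""
    return [name for name, (lo, hi) in HCP_AGE_RANGES.items() if lo <= age <= hi]


print(cohorts_for_age(10))  # ['HCP-D']
print(cohorts_for_age(36))  # ['HCP-YA', 'HCP-A']
```

Ages 36 and 37 fall in both HCP-YA and HCP-A, so a pooled analysis may want to record the source cohort alongside age.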
Human Connectome Project – Early Psychosis (HCP-EP):
fMRI/clinical data for early psychosis research (252 subjects), 2mm, TR=800ms.
The HCP-EP dataset focuses on individuals in the early phases (within 5 years) of psychotic disorders, including both affective and non-affective psychoses, and matched healthy controls. Participants (ages 16–35) are clinically characterized, with rsfMRI (2mm isotropic, TR=800ms) and full neurocognitive/clinical assessments. Imaging is harmonized with HCP-Lifespan protocols (motion correction, ICA-FIX, MNI registration). The dataset supports studies of biomarkers and network changes in schizophrenia spectrum disorders and is a benchmark for disease diagnosis tasks.
Dataset Name | HCP-EP |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | 16–35 years |
Gender Ratio | ~42% male, ~58% female |
Patient/Control | Patients and controls |
Disease Types | Early psychosis (schizophrenia, schizoaffective, bipolar with psychosis) |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 252 (57 affective psychosis, 127 non-affective psychosis, 68 controls) |
ADHD-200 Sample (ADHD200):
Multi-site data of 973 children/adolescents, 3×3×4mm, TR=2000ms, focused on ADHD diagnosis.
The ADHD-200 Sample is a multi-center open dataset for ADHD biomarker discovery. It consists of 973 children and adolescents (ages 7–21) from 8 US and 4 Chinese sites, including both ADHD (combined, inattentive, hyperactive-impulsive) and typically developing controls. Imaging includes resting-state fMRI (3×3×4 mm³, TR=2s) and T1-weighted MRI. Phenotypic data covers diagnosis, ADHD subtype, IQ, age, sex, and clinical symptoms. Preprocessing pipelines (Athena, NIAK, others) are public, supporting motion correction, normalization, and ROI extraction. ADHD200 is widely used for benchmarking machine learning models for disease classification.
Dataset Name | ADHD-200 |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | 7–21 years |
Gender Ratio | ~73% male, ~27% female |
Patient/Control | ADHD patients and controls |
Disease Types | Attention-Deficit/Hyperactivity Disorder (ADHD) |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 973 (362 ADHD, 611 controls) |
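Spatial resolution differs considerably between datasets, and anisotropic voxels such as the 3×3×4mm ones here cover much more tissue than the 2mm isotropic voxels of the HCP datasets. The sketch below computes voxel volume from the edge lengths to make that comparison concrete. The helper name `voxel_volume_mm3` is hypothetical, and the sizes are the nominal values quoted in this document.

```python
from math import prod
from typing import Sequence


def voxel_volume_mm3(voxel_size_mm: Sequence[float]) -> float:
    """Volume of one voxel in mm^3 from its three edge lengths in mm."""
    if len(voxel_size_mm) != 3 or any(s <= 0 for s in voxel_size_mm):
        raise ValueError("expected three positive edge lengths")
    return prod(voxel_size_mm)


# Nominal voxel sizes quoted in this document
adhd200 = (3.0, 3.0, 4.0)  # ADHD-200, anisotropic
hcp = (2.0, 2.0, 2.0)      # HCP datasets, isotropic

print(voxel_volume_mm3(adhd200))  # 36.0
print(voxel_volume_mm3(hcp))      # 8.0
print(voxel_volume_mm3(adhd200) / voxel_volume_mm3(hcp))  # 4.5
```

Each ADHD-200 voxel therefore averages signal over 4.5 times the volume of an HCP voxel, which is worth keeping in mind when comparing results across datasets.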
Autism Brain Imaging Data Exchange (ABIDE):
Aggregated from 17 sites, 1,112 subjects (948 males, 164 females) for ASD studies, 3mm, TR=2000ms.
ABIDE collates resting-state fMRI and anatomical MRI from 1,112 subjects (539 with Autism Spectrum Disorder, 573 controls), ages 7–64, across 17 international sites. Imaging protocols are heterogeneous (typical: 3mm isotropic, TR=2s). Extensive phenotypic/clinical data are included, covering ASD diagnosis, IQ, and behavioral scales. Preprocessing (multiple pipelines) includes normalization, head motion correction, nuisance regression, registration, and ROI-based time series extraction. ABIDE is a benchmark for autism connectomics and machine learning-based disorder classification.
Dataset Name | ABIDE |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | 7–64 years |
Gender Ratio | ~85% male, ~15% female |
Patient/Control | ASD patients and controls |
Disease Types | Autism Spectrum Disorder (ASD) |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 1,112 (539 ASD, 573 controls) |
UCLA Consortium for Neuropsychiatric Phenomics (UCLA):
272 subjects (multi-diagnostic), 3×3×4mm, TR=2000ms.
The UCLA dataset comprises multimodal MRI and neuropsychological data for 272 adults (aged 21–50), including healthy controls and patients with schizophrenia, bipolar disorder, and ADHD. Resting-state and task-based fMRI (3×3×4 mm³, TR=2s) are included, with rich cognitive, behavioral, and clinical phenotype data. Imaging was acquired on Siemens Trio 3T scanners. Preprocessing includes motion correction, normalization, and ROI time series extraction. This dataset enables studies of transdiagnostic neural signatures and supports disease classification benchmarks.
Dataset Name | UCLA Phenomics |
fMRI Types | rsfMRI and tfMRI |
tfMRI Tasks | Sternberg, Stroop, Stop-signal, Task-switching |
Age Range | 21–50 years |
Gender Ratio | ~46% male, ~54% female |
Patient/Control | Patients and controls |
Disease Types | Schizophrenia, Bipolar Disorder, ADHD |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 272 (130 healthy, 72 schizophrenia, 35 bipolar, 35 ADHD) |
Center for Biomedical Research Excellence (COBRE):
173 subjects (schizophrenia and controls), 3.75×3.75×4.55mm, TR=2000ms.
COBRE provides MRI data for 89 schizophrenia patients and 84 healthy controls (aged 18–65), recruited at a single US site. Imaging includes rsfMRI (3.75×3.75×4.55 mm³, TR=2s), T1 MRI, and clinical/behavioral measures. Preprocessing follows standard steps: motion correction, normalization, ROI time series extraction. This dataset is widely used for machine learning classification of schizophrenia and connectome analysis.
Dataset Name | COBRE |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | 18–65 years |
Gender Ratio | ~70% male, ~30% female |
Patient/Control | Schizophrenia patients and controls |
Disease Types | Schizophrenia |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 173 (89 patients, 84 controls) |
Motor Neuron Disease fMRI Dataset (MND):
59 participants (ALS and controls), 2.4mm, TR=2000ms, collected in Australia.
The MND dataset features anatomical MRI and resting-state fMRI (2.395×2.395×2.4mm³, TR=2s) from 59 subjects (36 with Amyotrophic Lateral Sclerosis, or ALS, and 23 controls), acquired at Herston Imaging Research Facility in Australia using Siemens Prisma 3T scanners. Detailed motor, cognitive, and clinical characterization is included. Imaging data are preprocessed (motion correction, normalization). This dataset is suitable for studying motor system degeneration and machine learning-based diagnosis.
Dataset Name | MND (Motor Neuron Disease) |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | Mean ~57 years (range 30–80) |
Gender Ratio | 44 male, 15 female |
Patient/Control | ALS patients and controls |
Disease Types | Amyotrophic Lateral Sclerosis (ALS) |
Used in Tasks | Task 3 (disease diagnosis) |
Sample Size | 59 (36 ALS, 23 controls) |
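The two-group diagnosis datasets listed so far differ a lot in class balance, which sets the accuracy a trivial classifier reaches by always predicting the larger group. The sketch below computes that majority-class baseline from the Sample Size rows above. Treating HCP-EP as a binary patients-versus-controls problem is an assumption made for illustration, UCLA is omitted because it has more than two groups, and the names `DIAGNOSIS_COUNTS` and `majority_baseline` are hypothetical.

```python
from typing import Dict, Tuple

# (patients, controls) copied from the Sample Size rows above
DIAGNOSIS_COUNTS: Dict[str, Tuple[int, int]] = {
    "HCP-EP": (57 + 127, 68),  # affective + non-affective psychosis vs. controls
    "ADHD-200": (362, 611),
    "ABIDE": (539, 573),
    "COBRE": (89, 84),
    "MND": (36, 23),
}


def majority_baseline(patients: int, controls: int) -> float:
    """Accuracy obtained by always predicting the larger class."""
    total = patients + controls
    if total <= 0:
        raise ValueError("need at least one subject")
    return max(patients, controls) / total


for name, (n_pat, n_ctl) in DIAGNOSIS_COUNTS.items():
    print(f"{name}: {n_pat + n_ctl} subjects, "
          f"majority baseline = {majority_baseline(n_pat, n_ctl):.1%}")
```

The baseline is about 73% for HCP-EP but only about 51.5% for ABIDE, so raw accuracy is not comparable across these datasets without a balance-aware metric alongside it.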
Transdiagnostic Connectome Project (TCP):
245 subjects with multiple psychiatric diagnoses (2mm, TR=800ms), harmonized imaging.
The TCP dataset consists of 245 adults (aged 18–65) with a diverse range of psychiatric conditions (including mood, anxiety, and psychotic disorders), along with healthy controls, recruited at Yale and McLean (US). Resting-state fMRI (2mm isotropic, TR=800ms) is harmonized across sites using Siemens Prisma scanners. All participants undergo the same comprehensive psychiatric diagnostic interviews (DSM-5), cognitive battery, and clinical assessments. Preprocessing mirrors HCP pipelines (motion correction, ICA-FIX, MNI registration, global signal regression), providing analysis-ready ROI-based functional connectivity and supporting transdiagnostic biomarker research.
Dataset Name | TCP (Transdiagnostic Connectome Project) |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | 18–65 years |
Gender Ratio | ~46% male, ~54% female |
Patient/Control | Mixed: patients (multiple psychiatric diagnoses) and controls |
Disease Types | Major depressive disorder, generalized anxiety, bipolar, psychotic disorders, etc. |
Used in Tasks | Task 2 (phenotype prediction), Task 3 (disease diagnosis) |
Sample Size | 245 |
Healthy Brain Network (HBN):
A large-scale, transdiagnostic developmental dataset aiming for 10,000 children/adolescents (ages 5–21), with multimodal MRI and extensive clinical/behavioral measures.
The Healthy Brain Network (HBN) is an ongoing open-data initiative from the Child Mind Institute, designed to capture a broad spectrum of childhood psychopathologies and typical development. Enrollment focuses on families where there are concerns about a child’s mental health or learning challenges, leading to high representation of clinical populations (e.g., ADHD, mood/anxiety disorders), though any child meeting basic inclusion criteria may participate. HBN spans multiple scanning sites in the New York City area (Staten Island mobile 1.5T and several 3T facilities), offering resting-state fMRI, anatomical MRI, diffusion MRI, EEG, voice/video recordings, and extensive phenotypic data. The resting-state protocol and “naturalistic” movie-watching runs (e.g., “Despicable Me,” “The Present”) facilitate pediatric compliance and reduce head motion. Data are released with thorough quality control metrics (framewise displacement, QAP) and partially harmonized preprocessing.
Dataset Name | Healthy Brain Network (HBN) |
fMRI Types | rsfMRI (plus naturalistic viewing fMRI) |
tfMRI Tasks | Movie-watching scans (e.g., “Despicable Me,” “The Present”) |
Age Range | 5–21 years |
Gender Ratio | Mixed (ongoing enrollment) |
Patient/Control | Transdiagnostic sample (clinical concerns + typically developing) |
Disease Types | Various childhood conditions (ADHD, mood, anxiety, etc.) |
Used in Tasks | Task 2 (phenotype prediction), Task 3 (disease diagnosis) |
Sample Size | ~3,900 to date (aiming for 10,000) |
Philadelphia Neurodevelopmental Cohort (PNC):
A population-based youth cohort (age 8–21) with deep phenotyping, neurocognitive assessment, and multimodal MRI (1,445 scanned).
The PNC is a large-scale, community-based study of neurodevelopment and psychiatric risk in roughly 9,500 youths (ages 8–21) from the greater Philadelphia area. A deeply phenotyped imaging subsample (n=1,445) provides multimodal MRI, including resting-state and task-based fMRI, structural MRI, DTI, and perfusion scans (ASL). All imaging was performed on a single Siemens TIM Trio 3T scanner using harmonized protocols (resting-state fMRI: 3mm isotropic, TR=3s, 124 volumes; n-back and emotion identification tasks: identical parameters).
All participants received comprehensive computerized neurocognitive testing (CNB) and structured psychiatric assessment (GOASSESS, adapted K-SADS), with parent and self-report for children/adolescents, and rich demographic, medical, and clinical data. The imaging sample is balanced by age, sex, and race, providing a unique resource for developmental, cognitive, and transdiagnostic psychopathology research. Data are publicly available via dbGaP and the PNC data portal.
Dataset Name | Philadelphia Neurodevelopmental Cohort (PNC) |
fMRI Types | rsfMRI, tfMRI (n-back, emotion ID) |
tfMRI Tasks | Fractal n-back (working memory), Emotion Identification |
Age Range | 8–21 years |
Gender Ratio | ~48.5% male, ~51.5% female |
Patient/Control | Population-based (includes healthy and various clinical groups) |
Disease Types | Transdiagnostic: ADHD, mood, anxiety, psychosis-risk, conduct, etc. |
Used in Tasks | Developmental/phenotype prediction, cognitive modeling, disease diagnosis |
Sample Size | 1,445 with MRI (of 9,428 total assessed) |
Homepage | Link |
REST-meta-MDD:
Multisite resting-state fMRI project from the DIRECT consortium investigating major depressive disorder across China.
The REST-meta-MDD Project is the first initiative of the Depression Imaging Research Consortium (DIRECT), involving 25 R-fMRI cohorts from 17 hospitals in China. It comprises 2,428 participants—1,300 patients with major depressive disorder (MDD) and 1,128 normal controls (NCs)—making it among the largest MDD resting-state fMRI datasets collected to date. Each site applied a standardized DPARSF-based preprocessing pipeline locally before sharing final resting-state metrics (e.g., region-wise functional connectivity) and necessary phenotypic data. The project aims to address concerns about low statistical power in smaller MDD studies and to reduce analytic flexibility across different centers by promoting uniform preprocessing. Clinical and demographic data (e.g., first-episode drug-naïve MDD, recurrent MDD, medication status, illness duration, Hamilton Depression Rating Scale scores) are included, allowing the investigation of pathophysiological mechanisms and potential biomarkers for diagnosis or treatment response. Data and code are openly shared to encourage replication, secondary analyses, and new discoveries in MDD research.
Dataset Name | REST-meta-MDD |
fMRI Types | rsfMRI |
tfMRI Tasks | None |
Age Range | Primarily 18–65 years |
Gender Ratio | Varies by cohort; overall 826 female, 474 male in the MDD group |
Patient/Control | MDD patients and normal controls |
Disease Types | Major Depressive Disorder |
Used in Tasks | Potentially Task 3 (disease diagnosis), research on biomarkers |
Sample Size | 2,428 (1,300 MDD, 1,128 NC) |
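When copying counts like these into a data loader or report, a quick consistency check can catch transcription errors. The sketch below verifies that the group sizes from the table add up and derives the MDD and female shares. The variable names are hypothetical, and the numbers are taken directly from the rows above.

```python
# Counts copied from the REST-meta-MDD table above
n_total = 2428
n_mdd, n_nc = 1300, 1128
mdd_female, mdd_male = 826, 474

# The group sizes should add up to the reported totals
assert n_mdd + n_nc == n_total
assert mdd_female + mdd_male == n_mdd

mdd_fraction = n_mdd / n_total
female_fraction_mdd = mdd_female / n_mdd
print(f"MDD share of sample: {mdd_fraction:.1%}")
print(f"Female share within MDD: {female_fraction_mdd:.1%}")
```

The MDD group makes up a little over half of the sample, and close to two thirds of the patients are female, so sex is a plausible confound to control for in diagnosis experiments.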