Public clinical data science data sets

Looking for a clinical data science project? Here are some free data sets that might be of use (in no particular order):

Human Physiological and Neuroimaging Data:

  1. Cam-CAN (Cambridge Centre for Ageing Neuroscience) dataset: fMRI, MRI, MEG, and behavioral human neurodata on people of various ages (i.e., cross-sectional data)
  2. Physionet: A wide array of clinically relevant physiological recordings (e.g., sleep staging EEG/EMG data, ECG data) as well as some software. For example, the CHB-MIT Scalp EEG Database consists of inpatient ictal and interictal scalp EEG recordings from pediatric patients.
  3. Predict: Database of EEG from neurological and psychiatric disorders and software tools for their analysis. They also have a list of online EEG databases.
  4. A large database of EEG and intracranial EEG (iEEG) data from patients and animals suffering from epilepsy
  5. Kaggle iEEG epilepsy competitions:
    1. Seizure detection 2014
    2. Seizure prediction 2014
  6. Wrist sensor data for heart rate estimation
  7. Miller & Ojemann’s library of human electrocorticographic data and analyses: Free database of ECoG data acquired during various tasks (e.g., motor, memory, vision)
  8. University of Pennsylvania’s Electrophysiology Portal: Free database of iEEG data acquired during various cognitive tasks or brain states. The data are particularly good for studying memory and they have some intracranial electrical stimulation data as well.
  9. Arnaud Delorme’s list of public EEG datasets
  10. ABIDE autism neuroimaging databases: MRI and fMRI data from individuals diagnosed with autism and control participants
  11. PAMAP2 Physical Activity Monitoring Data set: Contains data of 18 different physical activities (such as walking, cycling, playing soccer, etc.), performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. The dataset can be used for activity recognition and intensity estimation, while developing and applying algorithms of data processing, segmentation, feature extraction and classification.
  12. The Human Connectome Project: fMRI, MRI, DTI, and MEG data from 900 individuals (some, I believe, followed across several years)
  13. Free & open platform for sharing (and analyzing) neuroimaging data
  14. OMEGA (The Open MEG Archive): Magnetoencephalogram database hosted by the Montreal Neurological Institute
  15. Heart Disease Data Set: 76 attributes of individuals that should help detect heart disease
  16. NIMH Adolescent Brain Development Database: High-quality baseline data on 4,500 children aged 9 and 10, including basic participant demographics, assessments of physical and mental health, substance use, culture and environment, neurocognition, tabulated structural and functional neuroimaging data, and minimally processed brain images, as well as biological data such as pubertal hormone analyses.
  17. Anatomical Tracings of Lesions after Stroke (ATLAS): Over 200 MRIs in which the stroke location has been manually marked. A paper describing the data is also available.
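
The PAMAP2 entry above mentions segmentation, feature extraction and classification. As a rough illustration of what those steps can look like, here is a minimal Python sketch. It runs on a synthetic signal, not on real PAMAP2 data; the 100 Hz sampling rate, the window length, the 0.3 threshold, and the helper names `sliding_windows` and `window_features` are all assumptions made up for this example.

```python
import math
import random
from statistics import mean, pstdev


def sliding_windows(signal, window_size, step):
    """Split a 1-D sequence into fixed-length, possibly overlapping windows."""
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


def window_features(window):
    """Summarize one window with two simple features: mean and standard deviation."""
    return (mean(window), pstdev(window))


# Synthetic stand-in for one accelerometer axis (NOT real PAMAP2 data):
# 5 s of near-stillness followed by 5 s of a 2 Hz oscillation,
# at an assumed sampling rate of 100 Hz.
random.seed(0)
sampling_rate_hz = 100
still = [random.gauss(0.0, 0.05) for _ in range(5 * sampling_rate_hz)]
moving = [
    math.sin(2 * math.pi * 2.0 * t / sampling_rate_hz) + random.gauss(0.0, 0.05)
    for t in range(5 * sampling_rate_hz)
]
signal = still + moving

# Segmentation: 1-second windows with 50% overlap.
windows = sliding_windows(signal, window_size=100, step=50)

# Feature extraction: one (mean, std) pair per window.
features = [window_features(w) for w in windows]

# Toy threshold "classifier": high variability is labeled as movement.
labels = ["moving" if std > 0.3 else "still" for _, std in features]
print(labels)
```

A real analysis would replace the threshold with a trained classifier and use features from all sensors, but the window-then-summarize structure is a common starting point.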

Health Care Data:

  1. Medicare Health Outcomes Survey: Measures the ‘physical and mental health and well-being’ of beneficiaries over a two-year period. The data set covers recipients from 1998 to 2014.
  2. Medical Expenditure Panel Survey: From the Agency for Healthcare Research and Quality.
  3. Behavioral Risk Factor Surveillance System: American adult health behaviors collected by the Centers for Disease Control and Prevention
  4. OpenFDA: The FDA’s open data platform. Includes things such as databases on medical devices.
  5. A Dutch Hospital’s Event Log and a Dutch sepsis patient event log: The former is from the 2011 Business Process Intelligence Challenge. Winning analyses are posted for that competition as well.
  6. Anonymous medical records

Regulatory Data:

  1. U.S. Food and Drug Administration records: Includes things like lists of FDA-approved therapeutic products, their marketing applications, and active ingredients

Genomics Data:

  1. 1000 Genomes Project


Note: most of these latter links are courtesy of the Data Incubator.