Public clinical data science data sets

Looking for a clinical data science project? Here are some free data sets that might be of use (in no particular order):

Human Physiological and Neuroimaging Data:

  1. Cam-CAN (Cambridge Centre for Ageing Neuroscience) dataset: fMRI, MRI, MEG, and behavioral human neurodata on people of various ages (i.e., cross-sectional data)
  2. Physionet: A wide array of clinically relevant physiological recordings (e.g., sleep staging EEG/EMG data, ECG data) as well as some software
  3. A large database of EEG and intracranial EEG (iEEG) data from patients and animals suffering from epilepsy
  4. Kaggle iEEG epilepsy competitions:
    1. Seizure detection 2014
    2. Seizure prediction 2014
    3. Seizure prediction 2016
  5. Wrist sensor data for heart rate estimation
  6. Miller & Ojemann’s library of human electrocorticographic data and analyses: Free database of ECoG data acquired during various tasks (e.g., motor, memory, vision)
  7. Arnaud Delorme’s list of public EEG datasets
  8. ABIDE autism neuroimaging databases: MRI and fMRI data from individuals diagnosed with autism and control participants
  9. PAMAP2 Physical Activity Monitoring Data set: Contains data of 18 different physical activities (such as walking, cycling, playing soccer, etc.), performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. The dataset can be used for activity recognition and intensity estimation, while developing and applying algorithms of data processing, segmentation, feature extraction and classification.
  10. The Human Connectome Project: fMRI, MRI, DTI, & MEG data from 900 individuals (some across several years I believe)
  11. OMEGA (The Open MEG Archive): Magnetoencephalogram database hosted by the Montreal Neurological Institute
  12. Heart Disease Data Set: 76 attributes of individuals that should help detect heart disease
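
As a rough illustration of the segmentation and feature extraction steps mentioned for the PAMAP2 data, here is a minimal Python sketch. It is not taken from the dataset's documentation: the function names, window size, step, and the synthetic signal are all hypothetical stand-ins, and a real analysis would load the actual sensor columns and likely use richer features.

```python
from statistics import mean, pstdev


def sliding_windows(samples, window_size, step):
    """Yield consecutive windows of `window_size` samples, advancing by `step`."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]


def window_features(window):
    """Return simple summary features for one window of a single sensor channel."""
    return {
        "mean": mean(window),
        "std": pstdev(window),  # population standard deviation
        "range": max(window) - min(window),
    }


def extract_features(samples, window_size, step):
    """Segment a 1-D signal into overlapping windows and summarize each one."""
    return [window_features(w) for w in sliding_windows(samples, window_size, step)]


if __name__ == "__main__":
    # Synthetic stand-in for one accelerometer channel (assumption, not real data).
    signal = [0.0, 1.0, 0.0, -1.0] * 5
    features = extract_features(signal, window_size=8, step=4)
    print(len(features), "windows")
    print(features[0])
```

The resulting list of per-window feature dictionaries is the kind of table one would then pair with activity labels and hand to a classifier.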

Health Care Data:

  1. Medicare health outcomes survey: Measures the ‘physical and mental health and well-being’ of beneficiaries over a 2-year period. The data set covers recipients from 1998 to 2014.
  2. Medical expenditure panel survey: From the Agency for Healthcare Research and Quality.
  3. Behavioral risk factor surveillance system: American adult health behaviors collected by the Centers for Disease Control and Prevention
  4. OpenFDA: The FDA’s open data platform. Includes things such as databases on medical devices.
  5. A Dutch hospital’s event log and a Dutch sepsis patient event log: The former is from the 2011 Business Process Intelligence Challenge. Winning analyses are posted for that competition as well.
  6. Anonymous medical records

Regulatory Data:

  1. U.S. Food and Drug Administration records: These include lists of FDA-approved therapeutic products, their marketing applications, and active ingredients

Genomics Data:

  1. 1000 Genomes Project


Note: most of the latter links are courtesy of The Data Incubator.



