Public clinical data science data sets

Looking for a clinical data science project? Here are some free data sets that might be of use (in no particular order):

Human Physiological and Neuroimaging Data

  1. Cam-CAN (Cambridge Centre for Ageing Neuroscience) dataset: fMRI, MRI, MEG, and behavioral human neurodata on people of various ages (i.e., cross-sectional data)
  2. PhysioNet: A wide array of clinically relevant physiological recordings (e.g., sleep staging EEG/EMG data, ECG data) as well as some software. One example is the CHB-MIT Scalp EEG Database, which consists of inpatient ictal and interictal scalp EEG recordings from pediatric patients
  3. ieeg.org: A large database of EEG and intracranial EEG (iEEG) data from patients and animals suffering from epilepsy
  4. Kaggle iEEG epilepsy competitions:
    1. Seizure detection 2014
    2. Seizure prediction 2014
    3. Seizure prediction 2016
  5. Wrist sensor data for heart rate estimation
  6. Miller & Ojemann’s library of human electrocorticographic data and analyses: Free database of ECoG data acquired during various tasks (e.g., motor, memory, vision)
  7. University of Pennsylvania’s Electrophysiology Portal: Free database of iEEG data acquired during various cognitive tasks or brain states. The data are particularly good for studying memory and they have some intracranial electrical stimulation data as well.
  8. Arnaud Delorme’s list of public EEG datasets
  9. ABIDE autism neuroimaging databases: MRI and fMRI data from individuals diagnosed with autism and control participants
  10. PAMAP2 Physical Activity Monitoring Data set: Contains data of 18 different physical activities (such as walking, cycling, playing soccer, etc.), performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. The dataset can be used for activity recognition and intensity estimation, while developing and applying algorithms of data processing, segmentation, feature extraction and classification.
  11. The Human Connectome Project: fMRI, MRI, DTI, & MEG data from 900 individuals (some across several years I believe)
  12. OMEGA (The Open MEG Archive): Magnetoencephalogram database hosted by the Montreal Neurological Institute
  13. Heart Disease Data Set: 76 attributes of individuals that should help detect heart disease
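
If you want a feel for the segmentation and feature extraction steps mentioned in the PAMAP2 entry above, here is a minimal sketch. It runs on a synthetic signal rather than the real files, and the sampling rate, window length, overlap, and function names (`sliding_windows`, `window_features`) are all illustrative assumptions, not anything prescribed by the dataset.

```python
import math
import random
import statistics


def sliding_windows(samples, window_size, step):
    """Yield consecutive fixed-length windows; a trailing partial window is dropped."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]


def window_features(window):
    """Summarise one window with a few simple time-domain features."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "range": max(window) - min(window),
    }


# Synthetic stand-in for one accelerometer axis (NOT real PAMAP2 data):
# 10 s of a 2 Hz oscillation plus noise, sampled at an assumed 100 Hz.
random.seed(0)
SAMPLE_RATE_HZ = 100
signal = [
    math.sin(2 * math.pi * 2 * t / SAMPLE_RATE_HZ) + random.gauss(0, 0.1)
    for t in range(10 * SAMPLE_RATE_HZ)
]

# 2 s windows with 50% overlap; both values are illustrative choices.
features = [
    window_features(w)
    for w in sliding_windows(signal, window_size=200, step=100)
]
print(len(features), "windows")
```

Each feature dictionary could then be paired with an activity label and passed to whatever classifier you prefer.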

Health Care Data

  1. Medicare health outcomes survey: Measures the ‘physical and mental health and well-being’ of beneficiaries for a 2-year period. The data set covers recipients from 1998-2014.
  2. Medical expenditure panel survey: From the Agency for Healthcare Research and Quality.
  3. Behavioral risk factor surveillance system: American adult health behaviors collected by the Centers for Disease Control and Prevention
  4. OpenFDA: The FDA’s open data platform. Includes things such as databases on medical devices.
  5. A Dutch Hospital’s Event Log and a Dutch sepsis patient event log: The former is from the 2011 Business Process Intelligence Challenge. Winning analyses are posted for that competition as well.
  6. Anonymous medical records

Regulatory Data

  1. U.S. Food and Drug Administration records on www.data.gov: Includes things like lists of FDA-approved therapeutic products, their marketing applications, and active ingredients

Genomics Data

  1. 1000 Genomes Project

 

Note that most of these latter links are courtesy of the Data Incubator.

 

 
