Last weekend we hosted a Critical Data Marathon in London, alongside parallel events in France (Paris) and the US (Cambridge, MA). The idea was to bring healthcare providers together with data scientists to answer clinically relevant questions over the course of a weekend. Many thanks to The Institution of Engineering and Technology, IDEALondon, UCL, KCL, Imperial College, and MIT for helping to make it happen.
Our focus was the MIMIC-II database, an incredible resource of critical care data. Aside from some minor hiccups along the way (like getting locked out of the venue on Sunday morning!) I think it's fair to say that the event was a success. We kicked off with some great talks, learnt about MIMIC-II, got to know each other, and even made a start on some clinical questions.
As anticipated, one of the biggest challenges of the weekend was getting everyone set up with access to MIMIC-II. Not easy with around forty people, each with different skill sets, operating systems, and motivations. Data in MIMIC-II is anonymised, but it relates to patients and requires respect, so there are a few hurdles to clear before access is granted.
Below I have documented what ended up being the most straightforward route to accessing the MIMIC-II data, in case it is helpful for others. If you're a clinical researcher fed up with battling your own hospital's data management systems, you could do worse than spending a few hours on the following steps:
- Register with PhysioNet, the platform that hosts MIMIC-II (and a great source of data in itself). Simply add your details to the PhysioNet registration page.
- Complete an approved online ethics training programme in human research, such as the NIH one on Protecting Human Research Participants. This will take a couple of hours.
- Read the MIMIC-II access instructions and, assuming you agree, click 'I agree' at the bottom of the page. You'll receive an email containing a 'Data Use Agreement'. Complete the agreement and send it to the email address provided, along with a copy of your ethics certificate.
- After carrying out some checks, the MIMIC-II team will send you an email to let you know that permission to access the data has been granted.
Once you've gained access there are several different methods for exploring and analysing the data. The most popular options amongst attendees of the London Data Marathon were:
- MIMIC Explorer. This is a web-based tool that allows users to view tables and run queries. Exports are limited to 1000 rows, so while it's great for exploring the data, it's less suited to in-depth analysis.
- Local Postgres instance. Setting up a local copy of MIMIC-II is the simplest option for people who plan to use the data long-term. Install Postgres, import a dump of the data (you should be able to request this from the MIMIC-II team), and you're ready to go. We found that recreating the database takes around an hour on a solid-state drive and requires about 60 GB of space.
- Flat text files. The entire clinical database is available as a collection of CSV files. Great for experienced programmers like @MattDowle, who demonstrated the power of R at the data marathon!
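If you go down the flat-file route and would rather use Python than R, a minimal sketch like the one below might be a reasonable starting point. It streams a CSV file row by row, so nothing needs to fit in memory, and tallies how often each value of a chosen column appears. The function name `count_by_column` is my own, and it assumes the file has a header row; check the actual MIMIC-II file layout and column names before relying on it.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Stream a CSV file and count how often each value of `column` occurs.

    Assumes the first row of the file is a header naming the columns.
    """
    counts = Counter()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a clear message if the column is not in the header.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        for row in reader:
            counts[row[column]] += 1
    return counts
```

For example, `count_by_column("patients.csv", "sex")` would return a `Counter` of the values in a `sex` column; both the file name and the column name here are hypothetical placeholders rather than real MIMIC-II names.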
That's all for now. Hopefully see you at the next MIMIC-II event, which is likely to be in mid-2015.
- Mohammed Saeed, MD, PhD, Mauricio Villarroel, MBA, Andrew T. Reisner, MD, Gari Clifford, PhD, Li-Wei Lehman, PhD, George Moody, Thomas Heldt, PhD, Tin H. Kyaw, Benjamin Moody, and Roger G. Mark, MD, PhD. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database. Critical Care Medicine 2011; 39:952-960.