Artificial intelligence (AI) and big data have hit the mainstream, whether it's the amazing advances in image recognition and sentiment analysis or the shockingly accurate and timely friend suggestions and targeted ads. CMS has even launched the “Artificial Intelligence Health Outcomes Challenge”. More information is available here.
The purpose of the project is to predict unplanned hospital visits, “skilled nursing facility admissions” and adverse events. Execution aside, I think the goal of the project is a good one. Participants in the CMS challenge will gain access to a large dataset that can be used to train neural networks.
An important thing to understand about AI is that it requires A LOT of training data, and the more the better. You almost can’t have too much. Data is the new currency, or so they say.
This is where I think CMS has an opportunity to really accelerate the use of AI (and other tools) to improve our understanding of the complex relationships between the patient, the caregiver, the one paying the bills and the final result. CMS’s data policies are in need of an update.
For years now we’ve used data.medicare.gov and RESDAC as our primary sources for information about Medicare Part A. The former is aggregated and has no historical data. The latter is encumbered with severe restrictions on how the data is used, and it is not free. Gaining access also involves significant bureaucracy. And there really isn’t any provision for people to obtain RESDAC data for exploratory research.
The situation is worse for those of us who would like to study Medicaid and the efficacy of its programs. Many states claim not to have the data (even though they are paying the claims). Other states ignore requests completely. Some states won’t work with non-residents. At least one state we work with actually told me: “You’ll have to sue us to get that information.”
If data is the new currency, we’ve got to find ways to get it in the hands of people who know how to use it and stop hoarding it at the bank. While I applaud the spirit of the competition and really hope there are useful results, I think we can do more. (The winner gets a million dollars. Did I mention that?)
Here are some ideas:
Rather than making access to these large datasets a big event or a competition, make the data available to everyone who is willing to sign an NDA. ALL THE TIME.
Release the data more often. We rarely get access to federal-level information sooner than A YEAR after the fact. We’re driving by looking in the rear-view mirror. We deserve better.
Clean up data.medicare.gov. Have you tried to research quality measures using that data? Good luck. Again, the intentions are good, but it’s really a jumbled mess.
Keep historical data on data.medicare.gov. Currently, if I want to look at trends over time, it’s entirely up to me to archive the data. Why?
Make the data FREE for general exploration. I’m talking about RESDAC-type data here. Yes, there are some HIPAA concerns, but I think we can meet in the middle. I’m willing to sign an NDA, and I bet others would be as well. We can also take steps like removing the lowest-volume providers and explicitly forbidding attempted patient identification in the data use agreement.
Remove the requirement to produce a study or report with RESDAC data. Much of what gets published is of dubious quality already. Negative results are still results and are important.
Require states to provide some level of Medicaid program reporting. The federal government pays the majority of the bill, so it should have access to the data and the leverage to force a little transparency. It truly is mind-blowing that some states won’t even tell you what percentage of Medicaid patients received rehab treatment. This problem is getting worse with Medicaid replacement plans.
CMS has proven it is willing to make significant, sweeping changes to our healthcare system. (See 2019 for details.) Let’s update the IT infrastructure to support those of us who are willing and able to put that data to good use.