CumulusQ Bulk FHIR Data Quality Metrics

With the 21st Century Cures Act Final Rule in effect requiring support for the SMART and HL7 FHIR APIs in certified health IT, it is becoming increasingly simple for patients and other stakeholders to access healthcare data in standardized FHIR formats. However, there is very little tooling available to confirm or improve the quality and utility of the FHIR data being exported.

To help address this gap, the SMART team is building and testing a suite of bulk FHIR data quality and characterization tools called CumulusQ, with funding from the Office of the National Coordinator for Health Information Technology/Assistant Secretary for Technology Policy (ONC/ASTP) and the Advanced Research Projects Agency for Health (ARPA-H).

CumulusQ is a collection of open source data metrics and visualizations for counting and reviewing detailed trends in available FHIR data. As a reference implementation of metrics defined by the Qualifier project, CumulusQ aims to identify and guide remediation of errors that may have been introduced during data entry, mapping, serialization, or exporting. The metrics are inspired by the OHDSI Data Quality Dashboard, Kahn framework, and OHDSI OMOP Achilles. They are focused on evaluating the data quality and characteristics of data sets that comply with the USCDI v1 data subset as described in the FHIR US Core STU4 Implementation Guide (IG) and implemented by current certified EHR systems. They can also be adapted to work with FHIR data that complies with other IGs.

Built for Cumulus and Other Bulk FHIR Projects

CumulusQ was initially developed to be run at scale in the cloud. Sites in the SMART Cumulus project run the metrics on their de-identified cloud nodes whenever new bulk FHIR data becomes available. In this setting, the metrics are federated to generate both site-specific and network-wide reports and visualizations for the Cumulus network dashboard web app. Based on the results, analysts and researchers can tailor project-specific data cleaning and transformation steps and make informed adjustments to analytic approaches.

For reviewing smaller bulk FHIR exports, a local CumulusQ “Lite” version is also being developed. This approach uses locally available NDJSON files and a lightweight open source method to process and visualize the metrics without connecting to a cloud service.

Example metric visualization characterizing Patient birth year: 

A USCDI data quality metric chart counting patients by birth year with a spike at 1940
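
To illustrate the kind of count behind a chart like this, here is a minimal sketch in Python. It is not the CumulusQ implementation: the function name `count_birth_years`, the "missing" bucket, and the sample records are all hypothetical. It assumes only that each NDJSON line holds one FHIR resource and that Patient.birthDate, when present, is a FHIR date string beginning with a four-digit year.

```python
import json
from collections import Counter
from typing import Iterable


def count_birth_years(ndjson_lines: Iterable[str]) -> Counter:
    """Count Patient resources by birth year (hypothetical helper, not CumulusQ code)."""
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        resource = json.loads(line)
        if resource.get("resourceType") != "Patient":
            continue  # ignore other resource types in a mixed export
        birth_date = resource.get("birthDate")
        # FHIR dates are YYYY, YYYY-MM, or YYYY-MM-DD, so the year is the first four characters.
        if isinstance(birth_date, str) and len(birth_date) >= 4:
            counts[birth_date[:4]] += 1
        else:
            counts["missing"] += 1  # assumption: track absent values in their own bucket
    return counts


# Made-up sample records, standing in for lines read from a local NDJSON file.
sample_lines = [
    '{"resourceType": "Patient", "id": "a", "birthDate": "1940-01-01"}',
    '{"resourceType": "Patient", "id": "b", "birthDate": "1940"}',
    '{"resourceType": "Patient", "id": "c", "birthDate": "1985-07"}',
    '{"resourceType": "Patient", "id": "d"}',
    '{"resourceType": "Observation", "id": "e", "status": "final"}',
]

birth_year_counts = count_birth_years(sample_lines)
for year, count in sorted(birth_year_counts.items()):
    print(year, count)
```

With a real export, the same function could be passed an open file handle instead of a list, since it only iterates over lines. Counting missing values separately is a design choice made here so that gaps in the data remain visible alongside the year distribution.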

Resources: