Cumulus: A Universal Sidecar for a SMART Learning Healthcare System

Developed under funding from the Office of the National Coordinator for Health Information Technology, the cloud-hosted and containerized Cumulus sits behind a provider’s firewall and ‘listens’ to the SMART/HL7 Bulk FHIR Access API for new patient data. Cumulus acquires population data and uses artificial intelligence to process it for care, research, public health and learning.

Under the 21st Century Cures Act Rule, all certified health IT is required to support the API. It affords standardized access to the US Core Data for Interoperability (USCDI), which comprises more than 100 data elements in FHIR, including clinical notes. A central function of Cumulus is to extract meaning from the free-text notes using natural language processing, including large language models.

Another central function is de-identification.

Cumulus is a sidecar application designed to support a Learning Healthcare System with population health data processed with artificial intelligence. Its development was funded by the Office of the National Coordinator for Health Information Technology. It operates within a cloud environment and is containerized for ease of integration. Situated securely behind a healthcare provider’s firewall, Cumulus actively monitors the SMART/HL7 Bulk FHIR Access API, awaiting new patient data. Its primary role is to efficiently capture and process large volumes of population data, so that a single deployment can serve use cases across sectors such as care management, medical research, public health, and the learning healthcare system itself.

Under the mandate of the 21st Century Cures Act Rule, this API is required for all certified health IT. The API provides uniform access to the US Core Data for Interoperability (USCDI), encompassing over 100 FHIR data elements, including clinical notes. Cumulus employs natural language processing, including large language models, to interpret the unstructured text within these notes.
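As a rough illustration of what asking the Bulk FHIR Access API for new data involves, the sketch below builds (but does not send) a group-level export kick-off request. The `$export` operation, the `_since`, `_type` and `_outputFormat` parameters, and the `Prefer: respond-async` header come from the HL7 Bulk Data Access specification; the function name, server URL and group ID are hypothetical, and this is not taken from the Cumulus code base. Authentication via SMART Backend Services is omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_bulk_export_request(base_url, group_id, since=None, resource_types=None):
    """Build (without sending) a Bulk Data kick-off request for one patient group.

    All argument values are supplied by the caller; nothing here is Cumulus-specific.
    """
    params = {"_outputFormat": "application/fhir+ndjson"}
    if since:
        # Only resources updated after this instant are exported.
        params["_since"] = since
    if resource_types:
        # Comma-separated list of FHIR resource types to include.
        params["_type"] = ",".join(resource_types)

    url = f"{base_url.rstrip('/')}/Group/{group_id}/$export?{urlencode(params)}"
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # the export runs asynchronously on the server
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical server and group, for illustration only.
request = build_bulk_export_request(
    "https://ehr.example.org/fhir/",
    "cohort-1",
    since="2024-01-01T00:00:00Z",
    resource_types=["Patient", "Condition", "DocumentReference"],
)
```

A server that accepts such a request replies with `202 Accepted` and a `Content-Location` header that the client polls until the NDJSON output files are ready to download.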

Additionally, Cumulus prioritizes patient privacy by incorporating robust de-identification features, ensuring data analysis and sharing do not compromise individual confidentiality. 

To foster innovation and collaboration, the source code for Cumulus is available under an open-source license, and developers and institutions are invited to explore it and contribute to its evolution.

First Cumulus Use Case

The inaugural use case of Cumulus, which also stands as one of the pioneering applications of the Bulk FHIR Access API, is dedicated to enhancing public health efforts. This initiative is part of the CDC’s Data Modernization Initiative and showcases Cumulus’s successful integration within a robust network comprising five partnerships between health systems and public health departments:

  • Boston Children’s Hospital & Massachusetts Department of Public Health
  • Regenstrief Institute & Marion County Public Health Department
  • Rush University Medical Center & Chicago Department of Public Health
  • Washington University in St. Louis & St. Louis Department of Public Health
  • UC Davis & Yolo County Health and Human Services & Sacramento County Public Health

Adopting a privacy-centric methodology, these partnerships facilitate the aggregation of data, which public health officials can then analyze and manipulate through a user-friendly dashboard or an advanced analytic workbench. The synergy of structured FHIR data and Natural Language Processing (NLP) technologies empowers us to craft reproducible and adaptable computable case definitions. Our initial roll-out targets critical areas such as COVID-19 management, chronic disease monitoring, opioid overdose tracking, and mental health surveillance.
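To make the idea of a computable case definition concrete, here is a minimal sketch in which a patient matches if either structured codes or NLP-derived note labels satisfy the definition. The class names, fields, and the example codes and symptom labels are all hypothetical and chosen for illustration; they are not the case definitions used by the Cumulus network.

```python
from dataclasses import dataclass, field


@dataclass
class PatientEvidence:
    """Evidence gathered for one patient (hypothetical structure)."""
    patient_id: str
    coded_conditions: set = field(default_factory=set)  # codes from structured FHIR data
    note_symptoms: set = field(default_factory=set)     # labels extracted from notes by NLP


@dataclass(frozen=True)
class CaseDefinition:
    """A case is met by ANY listed code or ANY listed note-derived symptom."""
    name: str
    any_codes: frozenset
    any_symptoms: frozenset

    def matches(self, patient):
        has_code = bool(patient.coded_conditions & self.any_codes)
        has_symptom = bool(patient.note_symptoms & self.any_symptoms)
        return has_code or has_symptom


# Illustrative definition only; real definitions are authored by public health staff.
covid_like = CaseDefinition(
    name="COVID-like illness (illustrative)",
    any_codes=frozenset({"U07.1"}),
    any_symptoms=frozenset({"fever", "cough"}),
)

patients = [
    PatientEvidence("a", coded_conditions={"U07.1"}),
    PatientEvidence("b", note_symptoms={"cough"}),
    PatientEvidence("c", coded_conditions={"J45.909"}),
]
matched_ids = [p.patient_id for p in patients if covid_like.matches(p)]
```

Because the definition is plain data, it can be versioned, shared between sites, and swapped for a new one as a situation evolves, which is the sense in which such definitions are reproducible and adaptable.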

Cumulus is part of an effort to reimagine collection and use of care delivery system data for public health by providing a swift gateway to crucial information housed within EHRs, including the nuanced insights buried in clinical notes.

  • Unified Tool for Diverse Health Challenges. Cumulus is adept at handling infectious diseases, chronic illnesses, and mental health conditions with minimal technological setup required, thereby extending its reach throughout the public health sphere.
  • Tailored Case Definitions at Public Health’s Command. The platform honors the unique perspectives of public health professionals by allowing them to create their own computable case definitions while also offering the agility to implement new ones as situations evolve.
  • Robust Analytics for In-Depth Research. With user-friendly analytics, public health experts can conduct thorough investigations into longitudinal health data, exploring comorbidities and other critical factors for dynamic surveillance.

Cumulus doesn’t just empower public health; it simplifies the technological journey for healthcare sites:

  • Turnkey Integration with EHRs. The technology is designed for easy deployment, harnessing the Bulk FHIR API capabilities already present in EHRs, thereby minimizing the technical load.
  • Barrier-Free Data Sharing for All. By significantly reducing the complexity of data sharing, Cumulus facilitates participation even in resource-constrained environments.
  • Secure and Private Data Handling. While enabling the secure transfer of de-identified data counts, Cumulus ensures that all identifiable patient information remains protected within the healthcare provider’s firewall.
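The notion of sharing only de-identified counts can be sketched as follows: patient-level rows stay on site, and only per-stratum totals leave. The function name, the row fields, and the small-cell suppression threshold of 10 are assumptions made for this example; the text above does not state how Cumulus itself computes or suppresses counts.

```python
from collections import Counter


def aggregate_counts(rows, group_by, min_cell_size=10):
    """Count rows per stratum and omit strata smaller than min_cell_size.

    `rows` are dictionaries holding only the stratification fields;
    `min_cell_size` is an assumed suppression threshold, not a Cumulus setting.
    """
    counts = Counter(tuple(row[key] for key in group_by) for row in rows)
    return {stratum: n for stratum, n in counts.items() if n >= min_cell_size}


# Synthetic rows for illustration: 12 patients in one stratum, 3 in another.
rows = (
    [{"age_group": "0-4", "condition": "covid"}] * 12
    + [{"age_group": "5-9", "condition": "covid"}] * 3
)
shared = aggregate_counts(rows, ["age_group", "condition"])
```

With these synthetic rows, only the stratum with 12 patients is kept in `shared`; the stratum with 3 patients is dropped because it falls below the assumed threshold.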

Check out the documentation and feel free to reach out on our discussions page.


  1. A computable phenotype for patients with SARS-CoV2 testing that occurred outside the hospital. Lijing Wang, Amy Zipursky, Alon Geva, Andrew J. McMurry, Kenneth D. Mandl, Timothy A. Miller.
  2. The SMART Text2FHIR Pipeline. Timothy A. Miller, Andrew J. McMurry, James Jones, Daniel Gottlieb, Kenneth D. Mandl. medRxiv 2023.03.21.23287499; doi:
  3. Moving Biosurveillance Beyond Coded Data: AI for Symptom Detection from Physician Notes. Andrew McMurry, Amy R Zipursky, Alon Geva, Karen L Olson, James Jones, Vlad Ignatov, Timothy Miller, Kenneth D Mandl. medRxiv 2023.09.24.23295960; doi:
  4. Real World Performance of the 21st Century Cures Act Population Level Application Programming Interface. James R. Jones, Daniel Gottlieb, Andrew J. McMurry, Ashish Atreja, Pankaja M. Desai, Brian E. Dixon, Philip R.O. Payne, Anil J. Saldanha, Prabhu Shankar, Yauheni Solad, Adam B. Wilcox, Momeena S. Ali, Eugene Kang, Andrew M. Martin, Elizabeth Sprouse, David Taylor, Michael Terry, Vladimir Ignatov, the SMART Cumulus Network, Kenneth D. Mandl. medRxiv 2023.10.05.23296560; doi: