Clinical scientists used machine learning (ML) models to explore de-identified electronic health record (EHR) data in the National COVID Cohort Collaborative (N3C), a national clinical database funded by the National Institutes of Health, to help discern the characteristics of people with long COVID and the factors that may help identify these patients using data from medical records.
The results, published in The Lancet Digital Health, have the potential to improve clinical research on long COVID and to inform a more standardized system of care.
First author Emily R. Pfaff, PhD, assistant professor in the Division of Endocrinology and Metabolism at the University of North Carolina School of Medicine, said, “We needed to gain a better understanding of the intricacies of long COVID, which is why it made sense to take advantage of modern data analytics tools and a unique big data resource like N3C, where many features of long COVID are represented.”
Sponsored by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health, the N3C data enclave currently includes information representing more than 13 million people from 72 sites nationwide, including nearly 5 million positive cases of COVID-19. The resource enables rapid research on emerging questions about COVID-19 vaccines, treatments, risk factors, and health outcomes.
This new research is part of the National Institutes of Health’s Researching COVID to Enhance Recovery (RECOVER) Initiative, which has recruited thousands of participants across the country to answer critical research questions about the syndrome: who has long COVID, what their risk factors are, and which interventions and treatments might help.
Using N3C, researchers developed XGBoost machine learning models to understand patient characteristics and better identify potential long COVID patients.
Researchers examined the demographics, healthcare use, diagnoses, and medications of 97,995 adult COVID-19 patients. They used these features, together with data on nearly 600 long COVID patients from three specialty long COVID clinics, to train and test three ML models. Each model focused on identifying potential long COVID patients in one of three groups: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized.
The models proved accurate in identifying potential long COVID patients, achieving areas under the receiver operating characteristic curve (AUROC), a measure of accuracy used by machine learning researchers, of 0.91 (all patients), 0.90 (hospitalized), and 0.85 (non-hospitalized). Patients flagged by the models could be interpreted as “patients who warrant care at a specialty clinic for long COVID.” Applying the models to the larger N3C cohort could also serve the urgent goal of identifying long COVID patients for clinical trials.
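For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUROC: as the share of (positive, negative) patient pairs in which the positive patient receives the higher model score, with ties counted as half. It is illustrative only and is not the study's code. The function name `auroc` and the toy labels and scores are invented for this example and do not come from N3C data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration (1 = long COVID clinic patient, 0 = not).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

toy_auroc = auroc(labels, scores)
print(f"Toy AUROC: {toy_auroc:.3f}")
```

A value of 0.5 corresponds to ranking patients no better than chance, and 1.0 to ranking every positive patient above every negative one.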
The models also revealed several important features that distinguish potential long COVID patients from patients without long COVID. The researchers focused on patients with a positive COVID diagnosis who were at least 90 days past acute infection. Features most commonly identified among potential long COVID patients include post-COVID respiratory symptoms and associated treatments; non-respiratory symptoms widely reported as part of long COVID (e.g., sleep disturbances, anxiety, malaise, chest pain, constipation); pre-existing risk factors for greater acute COVID severity (e.g., chronic lung disease, diabetes, chronic kidney disease); and proxies for hospitalization, indicating greater acute COVID severity. The study also suggests that long COVID may ultimately not have a single definition, and could be better described as a group of conditions related by their symptoms, pathways, and treatments.
“These findings speak to the powerful impact of real-world clinical data and the potential capabilities of N3C to help better understand and find solutions to significant public health problems such as long COVID,” said Joni Rutter, NCATS Acting Director.
Josh Fessel, MD, PhD, senior clinical advisor at NCATS and a science program lead for RECOVER, added, “Once you can identify who has long COVID in a large database of people, you can start asking questions about those people. Was there something different about these people before they developed long COVID? Do they have certain risk factors? Was there something about how they were treated during acute COVID-19 that might increase or decrease the risk of long COVID?”
The study notes that EHR data skew toward patients who use health care systems the most. Pfaff says it is important to recognize who is less likely to be represented in the data: patients who are uninsured, patients who have limited access to care or limited ability to pay for it, and patients who seek care at small practices or community hospitals with limited data exchange capabilities.
“Electronic health records (EHRs) contain information only for people who go to the doctor,” said Pfaff, who is also co-director of the NC TraCS Informatics and Data Science (IDSci) program. “They also have more information about people who go to the doctor a lot. So, people who don’t have good access to care or people who don’t go to the doctor, we won’t get information about them. So that’s a warning I give with every study that I do based on health records. We need to identify who is not in the data set.”
The N3C team continues to improve its models as more real-world data emerges. Its longitudinal data on COVID-19 patients could provide a comprehensive basis for developing ML models to identify potential long COVID patients. As larger groups of long COVID patients are established, future work will include research to identify subtypes of long COVID, making the condition easier to study and treat.
“Depending on what the research leads to, we may find that patients with different presentations of long COVID are different enough to warrant completely different treatments,” Pfaff said. “Therefore, it is important for us to determine whether long COVID is a single disease, or a group of related conditions also associated with severe COVID-19.”
With this big data approach, more efficient study recruitment becomes possible, which can deepen the understanding of long COVID and its complexities. Beyond identifying groups for research studies, understanding and validating the relationship between long COVID and social determinants of health and demographics will improve the models' algorithms as more evidence emerges.
“Research studies, particularly clinical trials, are one of our best tools for gaining an understanding of long COVID: its presentation, risk factors, and potential treatments,” Pfaff said. “For the best chance of success, studies need large and diverse groups of eligible participants, which are hard to find. Using algorithms like the ones we’ve created in large clinical data sets can narrow a large number of patients to those who might be eligible for a long COVID trial, which could give researchers a head start on recruitment, making the trials more efficient and hopefully getting results faster.”
This study was funded by NCATS and the National Institutes of Health through the RECOVER Initiative.
For more information: https://ncats.nih.gov