Scientists identify characteristics to better define long COVID

Monday 16 May 2022

Using machine learning, researchers are finding patterns in electronic health record data to better identify people likely to have the condition.

A research team supported by the National Institutes of Health has identified characteristics of people who have long COVID and of those likely to have it. Using machine learning techniques, scientists analyzed the unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better determine who has long COVID. Exploring de-identified EHR data in the National COVID Cohort Collaborative (N3C), a centralized national public database led by the National Institutes of Health’s National Center for Advancing Translational Sciences (NCATS), the team found more than 100,000 likely long COVID cases as of October 2021 (as of May 2022, the number is over 200,000). The results appear in The Lancet Digital Health.

Long COVID is characterized by wide-ranging symptoms, including shortness of breath, fatigue, fever, headache, ‘brain fog’ and other neurological problems. These symptoms can persist for several months or longer after the initial COVID-19 diagnosis. One reason long COVID has been so difficult to identify is that many of its symptoms resemble those of other diseases and conditions. Better characterization of long COVID may lead to improved diagnosis and new therapeutic approaches.

“It makes sense to take advantage of modern data analytics tools and a unique big data resource such as N3C, where many of the features of long COVID can be represented,” said co-author Emily Pfaff, PhD, a clinical informaticist at the University of North Carolina at Chapel Hill.

The N3C data enclave currently includes information representing more than 13 million people nationwide, including nearly 5 million COVID-19-positive cases. The resource enables rapid research on emerging questions about COVID-19 vaccines, treatments, risk factors, and health outcomes.

The new research is part of a larger, related NIH-wide initiative, Researching COVID to Enhance Recovery (RECOVER), which aims to improve understanding of the long-term effects of COVID-19, called post-acute sequelae of SARS-CoV-2 infection (PASC). RECOVER will accurately identify people with PASC and develop approaches for its prevention and treatment. The program will also answer important research questions about the long-term effects of COVID through clinical trials, longitudinal observational studies, and more.

In the Lancet study, Pfaff, Melissa Haendel, PhD, at the University of Colorado Anschutz Medical Campus, and colleagues examined patient demographics, health care use, diagnoses and medications in the health records of 97,995 adult COVID-19 patients in N3C. They used this information, along with data on nearly 600 long COVID patients from three long COVID clinics, to create three machine learning models to identify long COVID patients.

In machine learning, scientists “train” computational methods to rapidly sift through large amounts of data to reveal new insights, in this case about long COVID. The models looked for patterns in the data that could help researchers both understand patient characteristics and better identify individuals with the condition.

The models focused on identifying potential long COVID patients among three groups in the N3C database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models proved accurate: the people they identified as at risk for long COVID were similar to patients seen in long COVID clinics. The machine learning systems classified approximately 100,000 patients in the N3C database whose profiles closely matched those of people with long COVID.

Josh Fessel, MD, PhD, a senior clinical advisor at NCATS and a scientific program lead in RECOVER, said: “Was there something different about these people before they developed long COVID? Did they have certain risk factors? Was there something about how they were treated during acute COVID-19 that might have increased or decreased their risk of long COVID?”

The models looked for common features, including new medications, doctor visits, and new symptoms, in patients with a positive COVID diagnosis who were at least 90 days out from their acute infection. The models identified patients as having long COVID if they had gone to a long COVID clinic, or if they showed long COVID symptoms and likely had the condition but had not been diagnosed.

“We want to incorporate the new patterns we’re seeing with the COVID diagnosis code and include it in our models to try to improve their performance,” said Haendel of the University of Colorado. “The models can learn from a larger group of patients and become more accurate. We hope we can use our long COVID patient classifier to recruit clinical trial participants.”

This study was funded by NCATS, which contributed to the design, maintenance, and security of the N3C Enclave, and by the NIH RECOVER Initiative, which is supported by NIH OT2HL161847. Among other things, RECOVER coordinates the participant recruitment protocols to which this work contributes. The analyses were performed using data and tools accessed through the NCATS N3C Data Enclave, with support from NCATS U24TR002306.

About the National Center for Advancing Translational Sciences (NCATS): NCATS conducts and supports research on the science and operation of translation (the process by which interventions to improve health are developed and implemented) to allow more treatments to reach more patients more quickly. For more information on how NCATS can help shorten the journey from scientific observation to clinical intervention, visit

About the National Institutes of Health (NIH):
NIH, the nation’s medical research agency, includes 27 institutes and centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and it investigates the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit

National Institutes of Health … Turning Discovery into Health®
