Marc Berger
Marc L. Berger, MD, is a semi-retired, part-time consultant and scientific advisor. Until July 2017, he was Vice President, Real World Data and Analytics at Pfizer, Inc. He has held senior-level positions in industry, including Executive Vice President and Senior Scientist at OptumInsight; Vice President, Global Health Outcomes at Eli Lilly and Company; and Vice President, Outcomes Research and Management at Merck & Co., Inc. He was a temporary employee of CMS from July to December 2022.
Thirty years ago, I made the shift from conducting randomized clinical trials to undertaking observational outcomes research studies. At the time, health care costs were rising, and as a result more and more attention was being focused on assessing the value obtained from investments in health care. The commercial availability of administrative claims, a source of real-world data, permitted a variety of new questions to be posed: Who was being treated? When were they treated? What was the spectrum of treatment regimens in use for specific conditions? What was the burden of illness? The conduct of these descriptive analyses was guided by good epidemiologic practices. And when it came to assessing the effectiveness or comparative effectiveness of various treatments, there was a renewed level of healthy skepticism about the results: observed correlations were not considered evidence of causation.
As we approached the millennium, managed care formulary committees in the U.S. and health technology assessment authorities around the world sought increasingly quantitative assessments of treatment value; cost-effectiveness analysis (CEA) modeling became de rigueur for submissions of research results. These models combined real-world data with findings from clinical trials to assess the relative value of treatments. While CEA became standard practice, skepticism about its findings persisted. Concern about the robustness of these findings stemmed in part from uncertainty about the accuracy of lifetime projections that extrapolated beyond existing data, as well as from controversy over measuring the QALY (quality-adjusted life year).

After Y2K, the growing wealth of real-world data came to be seen as a gold mine that was largely ignored by health care policy makers. The Institute of Medicine in the U.S. envisioned an overarching goal of a “Learning Health Care System,” in which rigorous examination of outcomes would be performed through analysis of the variety of real-world data sources created by the health system itself. This analysis would continually inform improvements in the effectiveness and efficiency of the health care being delivered. With the increasing adoption of electronic medical records came the promise of a yet richer and deeper source of real-world data. Nevertheless, skepticism persisted regarding the ability of real-world data sources to support causal inferences about treatment effectiveness, driven in part by the variable quality and incompleteness of real-world data.
During this period (the early 2010s), I gave a talk at a conference in Washington, DC, attended by a broad group of stakeholders, including the FDA. I opined that skepticism of the findings of observational outcomes research studies that examined real-world data was sometimes warranted. But echoing the statistician George Box’s observation that “all models are wrong, but some are useful,” I argued that “all real-world data sources are dirty and much of the data is sparse; nonetheless, they are often useful.”
I was also influenced by work on ISPOR task forces. In one of these, the statistician Sharon-Lise Normand commented that observational studies of treatment effectiveness should be designed as if they were clinical trials (Prospective observational studies to assess comparative effectiveness: The ISPOR Good Practices Task Force Report. Value in Health 2012; 15:217-30). This was an “aha” moment for me. I realized that skepticism of observational outcomes research was warranted not only by limitations of data quality and study design, but also by a failure to execute the studies themselves with the rigor of good clinical practice. Observational studies needed to be as transparent and rigorous in their execution as randomized clinical trials.
Subsequently, joint ISPOR-ISPE task forces recommended initiatives in line with this goal, e.g., proposing that Hypothesis Evaluating Treatment Effectiveness (HETE) studies should pre-register hypotheses and protocols, employ fit-for-purpose data sources, and utilize a causal inference framework (Good practices for real-world data studies of treatment and/or comparative effectiveness: Recommendations from the joint ISPOR-ISPE Special Task Force on real-world evidence in health care decision making. Pharmacoepidemiology and Drug Safety 2017; 26:1033-1039 and Value in Health 2017; 20(8):1003-1008; Improving Transparency to Build Trust in Real-World Secondary Data Studies for Hypothesis Testing—Why, What, and How: Recommendations and a Road Map from the Real-World Evidence Transparency Initiative. Value in Health 2020; 23(9):1128-1136 and Pharmacoepidemiology and Drug Safety 2020; 29(11):1504-1513).
As to study design, Miguel Hernán and others demonstrated that a causal inference framework is crucial, proposing that studies employ a Target Trial Emulation (TTE) framework. Indeed, the fact that some previous observational outcomes research studies had not utilized a causal inference framework could explain why some of them generated results at significant variance from randomized clinical trials. Subsequently, the FDA-sponsored RCT-DUPLICATE initiative demonstrated that TTE-designed studies using fit-for-purpose data sources in fact obtained comparable results for a large number of RCTs, and that most of the variation, when it was observed, could be attributed to differences between the RCT design parameters and key emulation parameters.
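The core TTE steps can be illustrated with a small sketch. This is my own toy illustration on synthetic data, not code from any actual study: all field names, effect sizes, and thresholds are invented. It shows why applying trial-like eligibility at time zero and adjusting for a baseline confounder moves an observational estimate toward the true treatment effect.

```python
# Hypothetical sketch of Target Trial Emulation (TTE) steps on synthetic records.
# All fields and parameters are invented for illustration.
import random

random.seed(0)

# Synthetic "claims" records: one row per patient at a baseline date (time zero).
patients = []
for _ in range(2000):
    age = random.randint(30, 90)
    severe = random.random() < 0.3            # confounder: disease severity
    # Sicker patients are more likely to start drug A (confounding by indication).
    on_drug_a = random.random() < (0.3 + 0.3 * severe)
    # True model: drug A lowers event risk by 0.05; severity raises it by 0.25.
    p_event = 0.10 + 0.25 * severe - 0.05 * on_drug_a
    patients.append({"age": age, "severe": severe,
                     "drug_a": on_drug_a,
                     "event": random.random() < p_event})

# Step 1: eligibility criteria applied at time zero, as a trial protocol would.
eligible = [p for p in patients if 40 <= p["age"] <= 80]

# Step 2: treatment strategies defined from baseline exposure only
# (no post-baseline information, avoiding immortal-time bias).
treated = [p for p in eligible if p["drug_a"]]
control = [p for p in eligible if not p["drug_a"]]

def risk(group):
    return sum(p["event"] for p in group) / len(group)

# The crude contrast is confounded by severity...
crude_rd = risk(treated) - risk(control)

# Step 3: ...so adjust for the measured confounder by stratification,
# standardizing to the eligible population's severity distribution.
def stratified_rd(key):
    rd = 0.0
    for level in (True, False):
        t = [p for p in treated if p[key] == level]
        c = [p for p in control if p[key] == level]
        weight = sum(p[key] == level for p in eligible) / len(eligible)
        rd += weight * (risk(t) - risk(c))
    return rd

adjusted_rd = stratified_rd("severe")
print(f"crude risk difference:    {crude_rd:+.3f}")
print(f"adjusted risk difference: {adjusted_rd:+.3f}")
```

In this toy example the crude estimate is pulled toward harm because sicker patients are both more likely to be treated and more likely to have the event; the stratified estimate lands near the built-in protective effect. Real emulations, of course, also specify follow-up windows, censoring rules, and richer adjustment (e.g., propensity scores).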

At present, it is clear that documenting a hypothesis and protocol ex ante, as well as employing a causal framework (such as TTE) for HETE studies, will become standard practice over time, even though full stakeholder consensus has yet to be reached. However, even with improvements in the quality of real-world data sources, controversy persists over whether real-world data sources are fit for purpose. Clearly, if a data source lacks an adequate quantity of the populations of interest, or lacks consistent and accurate recording of the key parameters required to assess exposures, outcomes, and confounders, it will fulfill the old adage, “Garbage in, garbage out.”
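The two failure modes just named, inadequate numbers and inconsistent recording of key fields, lend themselves to a mechanical pre-study screen. The sketch below is purely illustrative: the field names, minimum cohort size, and missingness threshold are my own invented placeholders, not criteria from any formal framework or screening instrument.

```python
# Illustrative (hypothetical) fitness-for-purpose screen run before any analysis.
# Thresholds and schema are invented for this sketch.
REQUIRED_FIELDS = ["exposure_code", "outcome_code", "age", "sex"]  # assumed schema
MIN_COHORT = 1000          # assumed minimum population of interest
MAX_MISSING = 0.20         # assumed tolerable missingness per key field

def screen_data_source(records):
    """Return (fit_for_purpose, reasons) for a list of patient dicts."""
    reasons = []
    if len(records) < MIN_COHORT:
        reasons.append(f"cohort too small: {len(records)} < {MIN_COHORT}")
    for field in REQUIRED_FIELDS:
        missing = sum(1 for r in records if r.get(field) is None) / max(len(records), 1)
        if missing > MAX_MISSING:
            reasons.append(f"{field}: {missing:.0%} missing exceeds {MAX_MISSING:.0%}")
    return (not reasons), reasons

# A toy source that fails both checks: "garbage in" is flagged up front.
toy = [{"exposure_code": None, "outcome_code": "I21", "age": 70, "sex": "F"}] * 500
ok, why = screen_data_source(toy)
print(ok, why)
```

A real assessment would go well beyond counting: provenance, coding consistency over time, and whether the recorded fields actually capture the exposures and outcomes of interest all matter, which is why formal frameworks exist.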
In both the U.S. and Europe, regulatory authorities have proposed frameworks for assessing the quality of data sources and whether they are fit for purpose (Real-World Data and Real-World Evidence in Healthcare in the United States and European Union. Bioengineering 2024; 11:784. https://doi.org/10.3390/bioengineering11080784). I have collaborated on the development of a screening instrument to assess whether a particular data source would be fit for purpose to support a HETE study (ATRAcTR [Authentic Transparent Relevant Accurate Track-Record]: A Screening Tool to Assess the Potential for Real-World Data Sources to Support Creation of Credible Real-World Evidence for Regulatory Decision-Making. Health Services and Outcomes Research Methodology, November 2023. https://doi.org/10.1007/s10742-023-00319-w).
Looking forward, there is little doubt that a consensus among stakeholders will emerge in the coming years defining the process for evaluating the quality of real-world data sources and their fitness for use. Once this is accomplished, challenges will remain: how to execute RWD studies rigorously and rapidly, so that we can truly achieve a Learning Health Care System. Despite the current hype, it remains to be seen whether advances in AI/ML (artificial intelligence/machine learning) will make meeting this goal more feasible. Newer and more powerful hardware, software, and additional analytic approaches may also be necessary.