Robert M. Califf, MD

Untangling Safety and Efficacy


I am amazed by the number of pundits espousing the idea that one can tell that a drug or device “works” in a small number of human research participants and then assure “safety” using post-marketing surveillance. The most recent flurry on this topic came after Andrew von Eschenbach’s editorial in the Wall Street Journal. The idea seems to be that the benefit of a drug or device can be established in a few patients by evaluating biomarkers or putative surrogate endpoints, while the risk can be evaluated separately, without the benefit of randomization.


I can accept this paradigm for short-term treatment of symptoms in people without significant chronic diseases and for devices that are not intended to be directly therapeutic. The study sample sizes still need to be large enough to distinguish signal from noise and to estimate the size of the treatment effect. Larger sample sizes are needed to detect rare toxicities or device malfunctions, but in a healthy population such events are expected to be so rare without treatment that a control group is not needed to recognize them.
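To give a rough sense of why rare toxicities demand larger samples, here is a minimal sketch in Python. It assumes each participant independently has the same fixed chance of experiencing the event, which is a simplification, and it only asks how many participants are needed to observe the event at least once, not how many are needed to attribute it to the treatment. The function names are hypothetical and chosen for illustration.

```python
import math


def prob_at_least_one(event_rate: float, n: int) -> float:
    """Chance of seeing one or more events among n participants,
    assuming independent participants with the same per-person event rate."""
    return 1.0 - (1.0 - event_rate) ** n


def participants_needed(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(event_rate, n) >= confidence."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n, then round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))


if __name__ == "__main__":
    # Illustrative rates only; these are not taken from any trial.
    for rate in (0.01, 0.001, 0.0001):
        n = participants_needed(rate)
        print(f"rate {rate}: about {n} participants for a 95% chance of one event")
```

Under these assumptions, a toxicity that strikes 1 in 1,000 people requires roughly 3,000 participants for a 95% chance of seeing it even once, far more than the handful of patients in which a biomarker response might be shown.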


Extrapolating this way of thinking to the treatment of chronic or serious systemic diseases ignores the lessons of the past 30 years. The fundamental problem is that the balance of risk and benefit hinges on the total effects of the treatment on both its intended and unintended targets, and even that balance is affected by the prevailing standards of care for concomitant treatment.


Much has been written about the lessons from Class I antiarrhythmic drugs and hormone replacement therapy in women. Both were thought to save lives based on epidemiological and uncontrolled studies. When proper randomized controlled trials (RCTs) were done, both were found to be detrimental on balance for the broad purpose of preventing cardiovascular death and disability.


The issue in chronic disease is that the signals for safety and efficacy are often mixed, and the meaning of a given rate of events must be interpreted in light of the fact that events occur as part of the natural history of the disease. The case of high-dose erythropoiesis-stimulating agents (ESAs) is very instructive. People with renal failure become anemic, and there is a direct relationship between the severity of anemia and the risk of cardiovascular events. ESAs are based on a molecule that is a normal part of biology, and engineered versions have been produced successfully. When treated with ESAs, profoundly anemic patients with renal failure feel better as their hemoglobin improves. In cohort studies, patients with higher hemoglobin values on treatment have better outcomes. This led to the concept of “normalizing hematocrit”, which was reinforced by clinical practice guidelines created in the absence of high-quality randomized evidence. Since the dialysis population is treated in a single-payer (government) system, the quality standard had rapid uptake, with significant use of high-dose ESAs in an effort to increase hemoglobin. Doctors, dialysis units, and pharmaceutical companies all made more money by adhering to the guidelines. Finally, the trials were done and the answer was counterintuitive: high-dose ESAs were worse than lower doses, and, surprisingly, there were no significant differences in quality of life.


Do we really want to proliferate treatments like high-dose ESAs without proper controlled trials that have a chance of giving reliable evidence about the balance of benefit and risk? Is there a way to achieve the right balance?


