(Circulation. 2002;106:746.)
© 2002 American Heart Association, Inc.
Clinical Cardiology: New Frontiers
From the Duke Clinical Research Institute (R.M.C.), Duke University Medical Center, Durham, NC, and the Department of Biostatistics and Medical Informatics (D.L.D.), University of Wisconsin, Madison, Wis.
Correspondence to Robert M. Califf, MD, Duke Clinical Research Institute, PO Box 17969, Durham, NC 27715.
Key Words: trials, cardiovascular diseases, therapy, outcome assessment, statistics
| Introduction |
|---|
We are entering an era in which the imperative to understand the rational basis for diagnostic and therapeutic options has become a major force in medical care. Medical products (drugs, devices, and biologics) are proliferating simultaneously with a substantial restructuring of the delivery of health care, with a focus on evidence to support medical interventions. As the texture of evidence to support clinical practice becomes clearer, the imperative to investigate the effectiveness of behavioral interventions becomes more important. When coupled with concern about the rising cost of medical care, this proliferation of technological and behavioral interventions reinforces the view that we cannot afford to offer all possible diagnostic and therapeutic options to all patients. Recent publicity about therapies that were in widespread use and were later found to be detrimental1–3 has extended this concern well beyond the cost implications. The advent of genomics, proteomics, combinatorial chemistry, and advanced biomedical engineering will increase the importance of developing rational evidence about therapeutic risks and benefits. This effort to develop a rational basis for diagnostic and therapeutic decisions on the basis of quantitative assessments is a major component of evidence-based medicine.
The randomized clinical trial (RCT) has emerged as the principal research tool for developing evidence to inform and influence clinical practice, particularly in the cardiovascular field. The practice of clinical trials is built on an understanding of the biology of disease, the structure of medical care delivery, statistical design and analysis, data management, clinical research quality, and scientific and medical integrity. As clinical trials continue to evolve, investigators will be challenged to find the optimum balance among these factors that best evaluates new interventions and places older approaches in the proper context as new technology and practice patterns emerge.
Because the RCT is relatively new as a scientific technique, little empirical experience could be used to test this balance in the past. However, the recent explosive growth of RCTs with their widely discussed methodological approaches has been valuable in guiding current RCT design and deployment.4,5 In the first 2 parts of this series of articles, we will review some of the lessons we have learned from recent cardiovascular trials about the conduct of trials. In the final 2 parts, we will examine proposed implications of these lessons for medical practice and focus on applying the concepts in clinical trials to patient care decisions. These concepts represent accumulated wisdom from many collaborators and colleagues over the past several decades.
| Surrogate Outcome Measures |
|---|
Surrogates are required for early-phase research because deductive, pathophysiological reasoning must be applied when developing new therapies. Surrogates also provide a potential means of monitoring the effect of therapy in an individual patient. Lowering blood pressure in a population of hypertensive patients is an example of a potential surrogate for the clinical outcome of death, myocardial infarction (MI), or stroke.
Composite outcomes are also used to evaluate treatments. A composite outcome may be composed of clinical events or measures such as survival or disease-free survival, and nonclinical outcomes such as change in blood pressure or CD4 cell counts by a specified amount in the treatment of HIV-positive patients. The validity of a composite depends on the validity of its individual components.
Unfortunately, potential surrogates have been inadequately validated in most cases. A valid surrogate outcome must fulfill at least 2 statistical and logical criteria.6 First, changes in the surrogate must be predictive of the relevant clinical outcome. The second, and more critical, requirement is that a potential surrogate must fully (or nearly so) capture the effect of the intervention on the clinical outcome (Figure 1a).
This second requirement can be difficult to fully appreciate. In considering surrogate outcomes, investigators might think of a simple cause-and-effect pathway for the treatment. If this were the true biological process, then measuring the surrogate would measure all of the treatment effect and would thus be adequate. Many, if not most, treatments have several effect pathways, however, as depicted in Figure 1b. The treatment may alter the clinical outcome through 2 pathways, although the surrogate measures only 1 of them; drugs, for example, may have multiple binding sites. Alternatively, a treatment may affect the clinical outcome through a pathway that is unrelated to the pathway the surrogate is measuring. In these cases, measuring only the proposed surrogate does not reflect the ultimate clinical impact of the therapy. In addition, the concept behind the proposed surrogate may be correct while its measurement is flawed; because the proposed surrogate is a specific measure of the concept, a flawed measure can invalidate the surrogate. In the past, these criteria, especially the second, have not been easily met,7,8 and proposed surrogate outcome measures in cardiovascular trials have been extremely disappointing.
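The two-pathway argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the pathway captured by the surrogate and the pathway it misses act independently and multiply on the hazard-ratio scale; the function name and the hazard ratios below are hypothetical and are not taken from any trial discussed here.

```python
import math

def hazard_ratios(hr_via_surrogate: float, hr_off_pathway: float) -> tuple[float, float]:
    """Return (HR predicted from the surrogate alone, overall HR).

    Assumption: the two pathways combine additively on the log-hazard
    scale. This is an illustrative simplification, not a validated model.
    """
    predicted = hr_via_surrogate
    overall = math.exp(math.log(hr_via_surrogate) + math.log(hr_off_pathway))
    return predicted, overall

# Hypothetical drug: the measured pathway lowers risk (HR 0.8),
# but an unmeasured pathway raises it (HR 1.5).
predicted, overall = hazard_ratios(0.8, 1.5)
print(f"surrogate-based prediction: HR = {predicted:.2f}")
print(f"overall clinical effect:    HR = {overall:.2f}")
```

Under these assumed values the surrogate predicts benefit while the net clinical effect is harm, which is the pattern the second validity criterion is meant to rule out.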
Perhaps the most dramatic failure of a proposed surrogate outcome was demonstrated in the Cardiac Arrhythmia Suppression Trial (CAST).9 This trial tested whether 3 drugs in a class of antiarrhythmic drugs prevented sudden deaths in patients at risk because of premature ventricular contractions. Ventricular arrhythmias have been repeatedly found to be a risk factor for sudden death and total cardiovascular death. As a result, a class of drugs was developed that suppressed arrhythmias, and these drugs were approved for this indication by regulatory agencies. The cardiovascular community believed that because ventricular arrhythmias were predictive of sudden death and because these drugs could suppress ventricular arrhythmias, the drugs would reduce sudden death and total cardiovascular death.
CAST tested encainide, flecainide, and moricizine in patients with documented arrhythmias. Patients first had to go through a prerandomization "run-in" period that determined whether they had suppressible arrhythmias. If so, they were then randomly assigned to either 1 of the 3 drugs or a matching placebo. The primary outcome was sudden death, and the secondary outcome was death from any cause. The results were surprising.9,10 Very early in CAST, 2 arms (encainide and flecainide) were terminated because of a highly significant increase in both sudden death and all-cause mortality for patients on active treatment. The third arm was later stopped because of an increase in mortality.10
In another example, improved cardiac function (as estimated by cardiac output) was named as a surrogate in patients with chronic heart failure. Several inotropic drugs were developed that improved various heart function measures, such as cardiac output, and these were later tested in a series of trials.11–14 Despite the demonstration that these drugs would improve cardiac output in the short term, many of them increased mortality. Three trials were terminated early because of the unethical nature of continuing an experimental treatment that increased the risk of mortality. Thus, despite being predictive of patient survival, improved cardiac output was not a valid surrogate for evaluating this new class of drugs.
Even the previously accepted surrogate of coronary perfusion status for coronary artery reperfusion strategies has recently been debated.7 Many agents with superior efficacy for achieving TIMI grade 3 flow have not proven to be superior in reducing mortality.15 Although more complex measures such as echocardiographic contrast16 or myocardial blush score,17 or simple measures such as the ST-segment resolution on the ECG, may correlate even more closely with outcome, they have not been validated as surrogates, and in the recently completed Global Use of Strategies To Open occluded coronary arteries V in Acute Myocardial Infarction (GUSTO V AMI) trial, they predicted an effect on mortality that did not occur in the trial.18 These examples offer sound evidence that clinical trials intended to inform clinical practice must not rely on unproven surrogates and that strict criteria should be met before a nonclinical outcome can be accepted as a valid surrogate.
Until recently, blood pressure and low-density lipoprotein (LDL) cholesterol levels were considered valid surrogates for predicting the effect of treatment on outcome, but the results of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) point out the problems with this assumption.19 Although the trial is not complete, the Data and Safety Monitoring Board recommended discontinuing the use of doxazosin, an α-adrenergic blocking agent, because of a sharp increase in heart failure in the treated patients relative to the reference arm of the study (chlorthalidone). According to the investigators, this happened despite the similar efficacy of doxazosin and chlorthalidone on blood pressure and well-documented improvements in lipids and glucose control with doxazosin relative to chlorthalidone. Similar data have recently been published about stroke in a comparison of the angiotensin II receptor blocker, losartan, with the β-blocker, atenolol.20 Recent indirect comparisons21 have called into question whether different agents with the same effect on LDL cholesterol would have an equal effect on reducing clinical events. Furthermore, the recent product recall of cerivastatin resulted from a known toxicity (rhabdomyolysis) in the absence of quantification of benefit. The drug had been marketed on the basis of the demonstration of a reduction in LDL cholesterol superior to other "statins" without any evidence of superior prevention of vascular events. Most recently, the Heart Protection Study22 found no relationship between baseline cholesterol values and benefit of simvastatin in preventing cardiac events, and the Air Force/Texas Coronary Atherosclerosis Prevention Study23 found a benefit of lovastatin in patients with LDL cholesterol <100 mg/dL and elevated C-reactive protein level, providing further evidence that the benefits of statins may not be mediated solely by lipid levels. Thus, it seems highly unlikely that biomarkers will ever be able to predict all possible toxicities of a systemically administered drug.
Because of these experiences, it is preferable to use the term physiological measure to describe biological measures that give insight into disease progression.24 Biomarkers proposed as substitutes for tangible human outcomes should be evaluated rigorously, and this evaluation should employ modern statistical assessment in the context of current pathophysiological thinking. Trialists must distinguish between the correlation of a biomarker with outcome and the causation of a relevant clinical outcome difference. Furthermore, even if a proposed surrogate was determined to be valid for one class of treatments, it may be invalid for another class of treatments, and a surrogate may not accurately predict outcome for another drug or biologic in the same general class. Relying on nonvalidated surrogates only encourages the use of ineffective therapies and may even promote the use of harmful treatments.
| Composite Clinical End Points |
|---|
| Subgroups and Treatment Interactions |
|---|
However, the evaluation of multiple subgroups can lead to nominally significant results by chance alone. To avoid spurious results from multiple testing, clinical trialists prefer that subgroup analyses be specified in advance rather than generated by dividing the cohort in a large number of ways. Safety analyses, however, do not readily allow all potential risk groups to be identified in advance. Thus, subgroup analyses require careful interpretation. Despite the attractiveness of these subgroup analyses to many clinicians,26 the best estimate of the treatment effect to be expected in a patient treated outside the trial is still the overall estimate.27 This empirical observation about subgroups has profound implications for genetic polymorphism analysis done to identify differential responses to treatment (pharmacogenomics). Such observations should be replicated in independent samples before they are accepted because of the high likelihood of spurious associations when multiple polymorphisms are tested.
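How quickly chance findings accumulate can be seen with a short calculation. The sketch below assumes that the subgroup tests are independent and that the treatment truly has no differential effect in any of them; real subgroups overlap and are correlated, so the figures are only a rough guide. The function name is an illustrative choice.

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one nominally significant result among
    n_tests independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests in (1, 5, 10, 20):
    risk = prob_any_false_positive(n_tests)
    print(f"{n_tests:>2} subgroup tests: {risk:.0%} chance of a spurious finding")
```

Under these assumptions, 10 tests at the 0.05 level carry roughly a 40% chance of at least one spurious "significant" subgroup, and 20 tests roughly a 64% chance.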
When a trial demonstrates either a significant beneficial or a harmful treatment effect, the direction of this effect is usually consistent within subgroups. The size of the effect within any particular subgroup can vary from the overall effect. For example, in recent trials focusing on heart failure,28–31 coronary intervention,32–35 coronary reperfusion,36–39 and secondary prevention with statins (Figure 2),40 the treatment effect across known risk factors was highly consistent in terms of direction and relative size. Any variations were well within the expected range, as compared with other trials with the same number of events. This same trend has been reported in trials with a harmful or negative effect, such as those evaluating inotropic drugs for heart failure11,13,14 and orally administered glycoprotein IIb/IIIa receptor antagonists.41–45
Trialists have known for some time that when a trial demonstrates an overall effect, failure to have a statistically significant finding in an individual subgroup does not mean that the treatment is ineffective in patients with that particular characteristic. In studies of primary46 or secondary47 prevention with aspirin, few women were randomized, and the results in women were not statistically significant. Unfortunately, literal interpretation of this subgroup analysis led to the undertreatment of women for years,48 although we now know that aspirin is effective in women.
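The reason a real effect can fail to reach significance in a small subgroup is loss of power. The sketch below uses Schoenfeld's approximation for a 2-arm log-rank comparison with equal allocation; the event counts and the hazard ratio are hypothetical and are not drawn from the aspirin studies cited above.

```python
import math
from statistics import NormalDist

def approx_power(n_events: int, hazard_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided log-rank test (1:1 allocation)
    to detect the given hazard ratio with n_events observed events."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    signal = math.sqrt(n_events) * abs(math.log(hazard_ratio)) / 2.0
    return NormalDist().cdf(signal - z_alpha)

# Hypothetical trial: same true effect (HR 0.75) everywhere,
# 400 events overall but only 40 in a small subgroup.
print(f"overall trial:  power = {approx_power(400, 0.75):.2f}")
print(f"small subgroup: power = {approx_power(40, 0.75):.2f}")
```

With these assumed numbers the overall trial has power of about 0.8, whereas the subgroup has power well under 0.2, so a nonsignificant subgroup result is the expected outcome even when the treatment works equally well in that subgroup.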
Not all trials provide perfect consistency. When a particular subgroup suggests a different treatment effect from the overall effect, caution is needed in the interpretation. The Prospective Randomized Amlodipine Survival Evaluation (PRAISE I) trial49 evaluated amlodipine, a calcium channel blocker, in the treatment of congestive heart failure. The randomization was stratified by ischemic and nonischemic pathogenesis because of concern that the treatment might not be effective in patients with nonischemic heart disease. Mortality plus hospitalization was the primary outcome, and mortality from all causes was the secondary outcome. More than 1000 patients were randomly assigned to best available care plus either drug or placebo. Overall results of the mortality outcome provided a nearly significant log rank test (P=0.06) favoring amlodipine. The treatment by pathogenesis subgroup interaction test was highly significant (P=0.004), indicating a hazard ratio of 1.0 in the ischemic subgroup and 0.6 in the nonischemic subgroup. Thus, the observed results were contrary to the prior belief that treatment would be more effective in patients with ischemic heart disease. With such a highly significant interaction test, statistical theory and practice suggested that the trial be interpreted separately for each subgroup. Accordingly, the standard interpretation for these results would judge amlodipine beneficial in the nonischemic subgroup.
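A treatment-by-subgroup interaction test of the kind described here can be sketched as a z-test on the difference between 2 log hazard ratios. The hazard ratios of 1.0 and 0.6 come from the text; the standard errors are placeholders chosen only for illustration, so the resulting P value is not expected to match the published one, and the function name is hypothetical.

```python
import math

def interaction_test(hr_a: float, se_log_hr_a: float,
                     hr_b: float, se_log_hr_b: float) -> tuple[float, float]:
    """Return (z, two-sided P) for the difference between two independent
    subgroup log hazard ratios, using a normal approximation."""
    diff = math.log(hr_a) - math.log(hr_b)
    se_diff = math.sqrt(se_log_hr_a ** 2 + se_log_hr_b ** 2)
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Ischemic HR 1.0 vs nonischemic HR 0.6; standard errors of 0.15 are assumed.
z, p = interaction_test(1.0, 0.15, 0.6, 0.15)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

The test asks only whether the 2 subgroup effects differ from each other; as the PRAISE II experience shows, a small interaction P value does not establish that the more favorable subgroup estimate will replicate.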
The trial investigators and the sponsor were encouraged by the positive results in the nonischemic subgroup but were reluctant to endorse the treatment effect without further confirmation. Thus, the Prospective Randomized Amlodipine Survival Evaluation (PRAISE II) trial50 was conducted in a nonischemic heart failure population. Although the design of PRAISE II was similar to PRAISE I (except that enrollment was limited to patients without coronary artery disease), the results proved to be different. The overall hazard ratio for PRAISE II was 1.0, showing no apparent treatment benefit for nonischemic patients. No differences in patient baseline characteristics or concomitant therapy between PRAISE I and PRAISE II could explain the discrepant results.
Whether the discrepancies were due to a chance finding or to a change in background therapy in the niche for amlodipine identified in PRAISE I is unknown. If the overall results of PRAISE I had produced a probability value <0.05, a nonconservative interpretation might have recommended a positive benefit of amlodipine overall, or at least a positive benefit in the nonischemic subgroup. The results of PRAISE II remind us of the need to confirm results, especially when such results are not expected. Even if results are as expected, the most reliable estimate of a subgroup's result is still the overall result, not the estimate from the particular subgroup.
Trials should not be expected to provide absolute consistency, although further analyses can often provide insight into the plausibility of observed variations. For example, 3 trials recently investigated β-blocker therapy in patients with heart failure: the Metoprolol CR/XL Randomized Intervention Trial in congestive heart failure (MERIT),28 the Cardiac Insufficiency Bisoprolol Study II (CIBIS-II),29 and the Carvedilol Prospective Randomized Cumulative Survival (COPERNICUS) trial.30 All 3 trials terminated early because of positive overall results on rates of mortality and mortality plus hospitalization. In keeping with the usual practice, prespecified subgroup analyses were conducted, and the results were extremely consistent.
However, one post hoc subgroup analysis in the MERIT trial suggested that the mortality results in the United States were not as impressive as the overall mortality results.51 The US results for the combined end point of mortality and hospitalization were consistent with the other countries and with the overall results,52 and the US subgroup results were consistent with the overall study results in the other 2 trials. Thus, it is highly unlikely that best available care in the United States diminishes the benefit of β-blockers in heart failure patients. A similar variation across clinical sites was observed in the β-Blocker Heart Attack Trial (BHAT),53 a trial that used β-blockers in patients surviving a recent MI. Overall results showed a highly significant reduction in total mortality, yet an increase in risk existed in a few sites.
The complexity of this lesson is exemplified by the difficulty of guaranteeing adequate participation of underrepresented minorities in clinical trials. Although adequate participation is a laudable goal, a clinical trial with a sample size designed to detect a clinically meaningful overall effect usually does not include enough minority participants to ensure detection of a heterogeneous treatment effect as a function of race or ethnic background. When differences are found, as in the recent trials of β-blocker and ACE inhibitor treatment for heart failure,54,55 they can serve only as hypothesis-generating experiences to stimulate further research.
The lesson for clinical trialists here is that a trial is typically designed to detect an effect in the whole population. Subgroup findings should be regarded with suspicion unless they are independently confirmed or expected on the basis of prior findings. In retrospect, almost any subgroup finding can be justified on the basis of a theory, but failing to recognize the capriciousness of random variation often leads to premature acceptance of the results, risking the adoption of inferior or unnecessarily costly treatments for patients.
The next article in this series will conclude the lessons we have learned about the conduct of clinical trials on the basis of recent cardiovascular clinical research. The final 2 articles will translate those lessons into principles to assist the clinician in applying the results of well-done clinical trials to the care of individual patients.