(Circulation. 1999;99:2378-2382.)
© 1999 American Heart Association, Inc.
Clinical Investigation and Reports
From the Department of Critical Care (J.M.-A., E.P., M.S., M.L.M.) and the Department of Cardiovascular Surgery (I.D.T.), Hospital Universitario de Canarias, Spain.
Correspondence to Dr Ignacio Díaz de Tuesta, Servicio de Cirugía Cardiaca, Hospital Universitario de Canarias, 38320 La Laguna, SC Tenerife, Spain. E-mail tuesta{at}usa.net
| Abstract |
|---|
Methods and Results: This was a prospective observational study of
465 cardiac surgery patients in a tertiary referral center.
Probabilities of hospital death for patients were estimated by applying
the 4 models and were compared with actual mortality rates.
Performance of the 4 systems was assessed by evaluating
calibration with the Hosmer-Lemeshow goodness-of-fit test and
discrimination with receiver operating characteristic (ROC) curves.
The χ² values were 3.71 for Parsonnet, 4.52 for MPM
II0, 4.30 for MPM II24, 5.16 for SAPS II, and
10.57 for APACHE II. The area under the ROC curve was 0.857 for
Parsonnet, 0.783 for MPM II0, 0.796 for MPM
II24, 0.771 for SAPS II, and 0.803 for APACHE II.
Conclusions: In our experience, the Parsonnet score performs very well, with high calibration and discrimination, better than the general severity systems, and it is an appropriate tool to assess mortality in cardiac surgery patients. The general severity systems also predict mortality after cardiac surgery well, with high calibration for MPM II24, MPM II0, and SAPS II; poorer calibration for APACHE II; and high discrimination for all 3 general systems, although not as high as that of the Parsonnet score.
Key Words: surgery, mortality, risk factors
| Introduction |
|---|
| Methods |
|---|
Statistical Analysis
The performances of the severity-of-illness scoring
systems (APACHE II, SAPS II, MPM II0, and MPM
II24) and the Parsonnet score in cardiac surgery
patients were assessed by evaluation of calibration and
discrimination.
Calibration evaluates the degree of correspondence between the
probabilities of mortality estimated by the severity system and
the actual mortality experience. Calibration was assessed by the
Hosmer-Lemeshow goodness-of-fit test (C statistic), which
compares the number of observed and predicted deaths in deciles of risk
covering the entire range of probabilities of
death.9 10 The expected or predicted number of
nonsurvivors was obtained by summation of predicted mortality risks of
all individuals in the decile; the expected number of survivors was the
total number of individuals in the decile minus the expected
nonsurvivors.
χ² equals the sum of the squared
differences between observed and expected numbers divided by the
expected number [$\chi^2 = \sum (E - O)^2 / E$]. The smaller
this value, the better the calibration.
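The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `hosmer_lemeshow` and its arguments are hypothetical, and the sketch assumes that the risk groups are formed by sorting patients by predicted risk and cutting them into groups of roughly equal size.

```python
def hosmer_lemeshow(probs, died, groups=10):
    """Hosmer-Lemeshow chi-square from predicted risks and 0/1 outcomes.

    probs  -- predicted probabilities of hospital death, one per patient
    died   -- observed outcomes (1 = nonsurvivor, 0 = survivor)
    groups -- number of risk groups (10 gives deciles of risk)
    Assumption: groups are equal-sized slices of the risk-sorted patients.
    """
    pairs = sorted(zip(probs, died))  # order patients by increasing risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        # Expected nonsurvivors: sum of predicted risks in the group.
        exp_dead = sum(p for p, _ in chunk)
        obs_dead = sum(d for _, d in chunk)
        # Expected survivors: group size minus expected nonsurvivors.
        exp_alive = len(chunk) - exp_dead
        obs_alive = len(chunk) - obs_dead
        for obs, exp in ((obs_dead, exp_dead), (obs_alive, exp_alive)):
            if exp > 0:
                chi2 += (exp - obs) ** 2 / exp
    return chi2


if __name__ == "__main__":
    # Toy data: 4 patients in 2 risk groups, for illustration only.
    print(hosmer_lemeshow([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], groups=2))
```

The statistic sums over both columns (nonsurvivors and survivors) of every risk group, so each group contributes 2 terms. The degrees of freedom reported in this article (8 for 10 groups) correspond to the number of groups minus 2.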
For assessing discrimination, or the ability of the model to distinguish between patients who live and patients who die, we used 2×2 classification tables with decision criteria of 10%, 50%, and 90% and the area under the receiver operating characteristic (ROC) curve, computed by a modification of the Wilcoxon statistic, as proposed by Hanley and McNeil.11 The areas under the ROC curves were compared by use of the z statistic, with correction for the correlation introduced by studying the same sample.12 As a general rule, the larger the area under the ROC curve, the better the discriminatory capability of the model. This method can be applied to both scores and probabilities but is meaningful only after the model has been shown to calibrate well.
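As a rough illustration of the Wilcoxon-based area, the sketch below counts, over all nonsurvivor-survivor pairs, how often the nonsurvivor received the higher predicted risk (ties count one half), and then applies the standard error approximation published by Hanley and McNeil. The function names and the toy data are hypothetical; this is one reading of the cited method, not the authors' code.

```python
import math


def roc_area(scores, died):
    """Area under the ROC curve as the Wilcoxon (Mann-Whitney) statistic.

    Assumes at least one nonsurvivor and one survivor.
    """
    dead = [s for s, d in zip(scores, died) if d == 1]
    alive = [s for s, d in zip(scores, died) if d == 0]
    wins = 0.0
    for x in dead:
        for y in alive:
            if x > y:
                wins += 1.0   # nonsurvivor ranked above survivor
            elif x == y:
                wins += 0.5   # ties count one half
    return wins / (len(dead) * len(alive))


def hanley_mcneil_se(area, n_dead, n_alive):
    """Standard error of the area, Hanley-McNeil (1982) approximation."""
    q1 = area / (2.0 - area)
    q2 = 2.0 * area ** 2 / (1.0 + area)
    var = (area * (1.0 - area)
           + (n_dead - 1) * (q1 - area ** 2)
           + (n_alive - 1) * (q2 - area ** 2)) / (n_dead * n_alive)
    return math.sqrt(max(var, 0.0))


if __name__ == "__main__":
    # Toy data for illustration only.
    scores = [0.9, 0.2, 0.3, 0.1]
    died = [1, 1, 0, 0]
    a = roc_area(scores, died)
    print(a, hanley_mcneil_se(a, 2, 2))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formulation would be preferable for much larger samples.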
| Results |
|---|
Tables 1 through 5 show the Hosmer-Lemeshow goodness-of-fit test for Parsonnet, MPM
II0, MPM II24, SAPS II, and
APACHE II, respectively. The goodness-of-fit tables show 10 groups
(deciles of risk), with increasing risk of mortality, which are
distributed as expected survivors and expected nonsurvivors, as well as
observed survivors and observed nonsurvivors. Low values of the
Hosmer-Lemeshow C statistic and the corresponding high
P values indicate good agreement between observed and
expected number of deaths.
The calibration of the systems was χ²=3.71, df=8, P=0.8821 for Parsonnet;
χ²=4.52, df=8, P=0.8074 for MPM II0;
χ²=4.30, df=7, P=0.7442 for MPM II24;
χ²=5.16, df=8, P=0.7397 for SAPS II; and
χ²=10.57, df=8, P=0.2269 for APACHE II.
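The P values quoted here are upper-tail probabilities of the χ² distribution for the stated degrees of freedom. The following sketch recomputes them from the rounded χ² values using only the Python standard library; `chi2_sf` is a hypothetical helper based on the closed-form series for integer degrees of freedom, so small differences from the published figures in the third or fourth decimal place are to be expected.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j-1/2) / (2j-1)!!
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


if __name__ == "__main__":
    # (system, chi-square, df) as reported in the text.
    reported = [
        ("Parsonnet", 3.71, 8),
        ("MPM II0", 4.52, 8),
        ("MPM II24", 4.30, 7),
        ("SAPS II", 5.16, 8),
        ("APACHE II", 10.57, 8),
    ]
    for name, chi2, df in reported:
        print(f"{name}: P = {chi2_sf(chi2, df):.4f}")
```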
Table 6
presents the classification
tables for the severity systems using decision criteria of 10%, 50%,
and 90%. When a decision criterion of, for example, 10% was applied,
predicted mortality risks >10% were considered as predicting hospital
mortality, whereas predicted mortality risks ≤10% were considered as
predicting survival. For each decision criterion, the true-positive
rate or sensitivity (the proportion of the observed deaths correctly
predicted to die), false-positive rate (the proportion of observed
survivors incorrectly predicted to die), and the overall correct
classification rate (proportion of patients correctly classified as
survivors or nonsurvivors) are presented.
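The three rates defined above can be illustrated with a short sketch. The function name `classification_rates` and the toy data are hypothetical; the only rule taken from the text is that a predicted risk above the decision criterion counts as a prediction of hospital death.

```python
def classification_rates(probs, died, criterion):
    """Sensitivity, false-positive rate, and overall correct classification.

    A patient is predicted to die when the predicted risk exceeds the
    decision criterion (for example 0.10, 0.50, or 0.90).
    Assumes at least one observed death and one observed survivor.
    """
    tp = fp = tn = fn = 0
    for p, d in zip(probs, died):
        predicted_death = p > criterion
        if d == 1 and predicted_death:
            tp += 1      # death correctly predicted
        elif d == 1:
            fn += 1      # death missed
        elif predicted_death:
            fp += 1      # survivor incorrectly predicted to die
        else:
            tn += 1      # survivor correctly predicted
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    overall_correct = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, false_positive_rate, overall_correct


if __name__ == "__main__":
    # Toy data for illustration only: 1 death among 4 patients.
    probs = [0.20, 0.05, 0.60, 0.08]
    died = [1, 0, 0, 0]
    for criterion in (0.10, 0.50, 0.90):
        print(criterion, classification_rates(probs, died, criterion))
```

Raising the criterion trades sensitivity for a lower false-positive rate, which is the pattern visible across the 10%, 50%, and 90% rows of the classification table.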
With a decision criterion of 10%, sensitivity was 96% for Parsonnet, 65% for MPM II0, 81% for MPM II24, 65% for SAPS II, and 100% for APACHE II; the false-positive rate was 51%, 26%, 43%, 26%, and 94%, respectively; and the overall correct classification was 74%, 85%, 77%, 85%, and 19%, respectively. With a decision criterion of 50%, sensitivity was 11% for Parsonnet, 31% for MPM II0, 46% for MPM II24, 38% for SAPS II, and 54% for APACHE II; the false-positive rate was 0%, 1%, 1%, 1%, and 4%, respectively; and the overall correct classification was 95%, 95%, 96%, 95%, and 96%. With a decision criterion of 90%, sensitivity was 0% for Parsonnet, 8% for MPM II0, 0% for MPM II24, 15% for SAPS II, and 8% for APACHE II; the false-positive rate was 0% for all systems; and the overall correct classification was 94%, 94%, 94%, 94%, and 95%, respectively.
The Figure
shows the area under the ROC
curve. For Parsonnet, this was 0.857; for MPM
II0, 0.783; for MPM II24,
0.796; for SAPS II, 0.771; and for APACHE II, 0.803.
| Discussion |
|---|
The large databases and computer analyses used in developing general prognostic systems have permitted testing of multiple predictor variables and of their empirical importance by use of regression analysis. This methodology allows these systems to include a minimum number of variables, to simplify data collection, and yet to maintain prognostic accuracy.
The outcome to be measured must be relevant to clinicians, easily recognized, and well defined so as to be free of ascertainment bias. Hospital mortality is the outcome most commonly measured by currently available prognostic systems and meets all of these criteria. Hospital mortality is, and will remain, a highly relevant outcome for most physicians and patients.
Conversely, the prognostic factors used to calculate the outcome are usually clinical and epidemiological data that the researcher identifies, empirically or on the basis of previous studies, as outcome-related factors. They should be obtained by objective methods that avoid any distortion by the observer. In the case of subjective information, such as symptoms described by the patient, efforts must be made to classify the information on the basis of standard criteria. Usually, the prognostic factors are easily obtained from the available clinical information of the patient and do not require special diagnostic procedures.
Patients undergoing CABG were excluded from the APACHE data collection, and all cardiac surgery patients were excluded from the MPM and SAPS data collection. However, to predict hospital outcome, complications, and length of stay, recent articles analyze these systems in isolation or by combination of preoperative, intraoperative, and postoperative variables in cardiac surgery patients.13 14 15 A variety of models to predict mortality after cardiac surgery have been developed from analysis of outcome, most of them in coronary artery surgery. Each model has been validated at the originating institution.16 17 18 19 20 21 22 23 In general, predictive models perform better in the original setting than when transposed to other patient populations.24
In our work, we assess the performance of general severity of illness scoring systems (APACHE II, SAPS II, and MPM II) for cardiac surgery patients and compare these systems with the Parsonnet score to obtain a good estimate of severity of illness and probability of hospital mortality. APACHE III was not included in the study because the equations of the model are not in the public domain and remain subject to copyright.
For the Parsonnet score, the agreement between observed and expected in
the 2 columns of survivors and nonsurvivors corresponds to a small
value of the C statistic (good calibration) and therefore
indicates that the Parsonnet score provided an adequate estimation of
the probability of mortality in cardiac surgery patients. The Parsonnet
score calibrates very well (χ²=3.71,
df=8, P=0.8821). The discrimination of the
Parsonnet score was very high; the area under the ROC curve was
0.857.
The evaluation of the performance of general severity of
illness measures showed an adequate estimation of the mortality
experience in cardiac surgery patients, but not as well as the
Parsonnet score. For MPM II24, MPM
II0, and SAPS II systems, low values of the
Hosmer-Lemeshow C statistic and the corresponding high
P value indicate good agreement between observed and
expected number of deaths, and these systems also calibrate very well
(χ²=4.30 and P=0.7442,
χ²=4.52 and P=0.8074, and
χ²=5.16 and P=0.7397, in descending
order of calibration). However, the APACHE II system did not calibrate
as well, with a higher χ² value
(χ²=10.57 and P=0.2269). The
discrimination of the 3 general severity systems was good: the area
under the ROC curve was 0.783 for MPM II0, 0.796
for MPM II24, 0.771 for SAPS II, and 0.803 for
APACHE II. These differences may be clinically insignificant. However,
the number of deaths was small, possibly weakening the predictive
ability.
Prognostic scoring is important because the physician and the patient need an idea of the likely risk of hospital mortality. This information prepares the physician for complications and helps to stratify the patient. It also helps the patient and family to weigh the risks and benefits of surgery and clarifies their expectations. Accurate outcome data will result in better communication with patients and relatives, and the treatment is more likely to be consistent with the patient's value system. Conversely, as healthcare costs have increased, outcome assessment has become a major priority. Accurate and objective outcome prediction will allow financial and human resources to be allocated appropriately. Resources will need to be dedicated to patients who are likely to benefit, and cost containment will be a constant pressure. Resolution of these conflicts will require specialists to be not only excellent physicians but also excellent managers.
A prognostic system that establishes a predicted mortality rate for each unit based on a representative database and a patient-by-patient measurement of risk allows comparison of observed versus predicted outcomes. The difference between actual and predicted death rates provides an outcome-based measure of quality of care and provides insight into means for improving performance.
For many clinicians, the most important question regarding prognostic scoring systems is, how can they help with individual patient care decisions? Many physicians believe that group statistics do not apply to individuals. Although individual patients do have unique characteristics, they also share many common features with previous patients, and consideration of these similarities permits us to anticipate the patients' responses and predict their outcomes. We do use past experiences every day when we choose one therapy over another, and we frequently base our decisions on the relative probability that a particular treatment will be successful in an individual patient. Statistical predictions of outcome produced by prognostic scoring systems are apparently at least as accurate as clinical predictions and in most cases are more reliable. These findings suggest that the predictions available from prognostic scoring systems could eventually be useful in aiding or supporting clinical judgment in decision making for individual patients.
Our results have an interesting implication: whereas the Parsonnet model does not take the effects of the surgical procedure into account, the general-purpose systems do. Nevertheless, the Parsonnet score, a specific-purpose system that considers only the patient's preoperative condition, performs better than general-purpose systems (MPM, SAPS, and APACHE) that include not only risk factors but also information about the patient's situation after the procedure. Although surgery generates the most significant and probably most aggressive change in the patient's course, in a standard institution the results may be better estimated through an accurate evaluation of the preoperative condition of the patient, focused on the particular disease to be treated, than through a general evaluation of the patient and his or her situation after the procedure. In summary, in an institution with acceptable standards, the most important consideration in predicting the outcome is the previous condition of the patient, not the immediate consequences of the procedure.
In our experience, the Parsonnet score performs very well, with high calibration and discrimination, better than the general severity systems, and it is an appropriate tool to assess severity of illness in cardiac surgery patients, with applications to clinical practice and clinical research. The general severity systems also predict mortality after cardiac surgery well, with high calibration for MPM II24, MPM II0, and SAPS II, in descending order; poorer calibration for APACHE II; and high discrimination for all 3 general systems, although not as high as that of the Parsonnet score.
Received September 24, 1998; revision received February 16, 1999; accepted February 16, 1999.
| References |
|---|
2. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13:818–829.
3. Knaus WA, Wagner DP, Draper EA, Zimmerman JE, Bergner M, Bastos PG, Sirio CA, Murphy DJ, Lotring T. APACHE III prognostic system. Chest. 1991;100:1619–1636.
4. Le Gall JR, Loirat P, Alperovitch A, Glaser P, Granthil C, Mathiev D, Mercier P, Thomas R, Villers D. A simplified acute physiology score for ICU patients. Crit Care Med. 1984;12:975–977.
5. Le Gall JR, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA. 1993;270:2957–2963.
6. Lemeshow S, Teres D, Pastides H, Avrunin JS, Steingrub JA. A method for predicting survival and mortality of ICU patients using objectively derived weights. Crit Care Med. 1985;13:519–525.
7. Lemeshow S, Teres D, Klar J, Avrunin JS, Gehlbach SH, Rapoport J. Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients. JAMA. 1993;270:2478–2486.
8. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation. 1989;79(suppl I):I-3–I-12.
9. Lemeshow S, Hosmer DW. A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol. 1982;115:92–106.
10. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley & Sons; 1989.
11. Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
12. Hanley J, McNeil B. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843.
13. Turner JS, Mudaliar YM, Chang RW, Morgan CJ. Acute Physiology and Chronic Health Evaluation (APACHE II) scoring in a cardiothoracic intensive care unit. Crit Care Med. 1991;19:1266–1269.
14. Becker RB, Zimmerman JE, Knaus WA, Wagner DP, Seneff MG, Draper EA, Higgins TL, Estafanous FG, Loop FD. The use of APACHE III to evaluate ICU length of stay, resource use, and mortality after coronary artery bypass surgery. J Cardiovasc Surg Torino. 1995;36:1–11.
15. Turner JS, Morgan CJ, Thakrar B, Pepper JR. Difficulties in predicting outcome in cardiac surgery patients. Crit Care Med. 1995;11:1843–1850.
16. Hannan EL, Kilburn H Jr, O'Donnell JF, Lukacik G, Shields EP. Adult open heart surgery in New York State: an analysis of risk factors and hospital mortality rates. JAMA. 1990;264:2768–2774.
17. Higgins TL, Estafanous FG, Loop FD, Beck GJ, Blum JM, Paranandi L. Stratification of morbidity and mortality outcome by preoperative risk factors in coronary artery bypass patients: a clinical severity score. JAMA. 1992;267:2344–2348.
18. O'Connor GT, Plume SK, Olmstead EM, Coffin LH, Morton JR, Maloney CT, Nowicki ER, Levy DG, Tryzelaar JF, Hernandez F, Adrian L, Casey KJ, Bundy D, Soule DN, Marrin CA, Nugent WC, Charlesworth DC, Clough R, Katz S, Leavitt BJ, Wennberg J. Multivariate prediction of in-hospital mortality associated with coronary artery bypass graft surgery. Circulation. 1992;85:2110–2118.
19. Tuman KJ, McCarthy RJ, March RJ, Najafi H, Ivankovich AD. Morbidity and duration of ICU day after cardiac surgery: a model for preoperative risk assessment. Chest. 1992;102:36–44.
20. Edwards FH, Clark RE, Schwartz M. Coronary artery bypass grafting: the Society of Thoracic Surgeons National Database experience. Ann Thorac Surg. 1994;57:12–19.
21. Hannan EL, Kilburn H Jr, Racz M, Shields EP, Chassin MR. Improving the outcomes of coronary artery bypass surgery in New York State. JAMA. 1994;271:761–766.
22. Orr RK, Maini BS, Sottile FD, Dumas EM, O'Mara P. A comparison of four severity-adjusted models to predict mortality after coronary artery bypass graft surgery. Arch Surg. 1995;130:301–306.
23. Tu JV, Jaglal SB, Naylor CD, and the Steering Committee of the Provincial Adult Cardiac Care Network of Ontario. Multicenter validation of a risk index for mortality, intensive care unit stay, and overall hospital length of stay after cardiac surgery. Circulation. 1995;91:677–684.
24. Díaz de Tuesta I, Rufilanchas JJ, Cortina J, Renes E, Rodríguez E, Molina L, Pérez de la Sota E, Carrascal Y, Marolo L, Guillén F. A method for the predictive estimation of the surgical risk in adult cardiac pathology. Rev Esp Cardiol. 1995;48:732–740.