Within Normal Limits of Reason

"Chance is the very guide of life"

"In practical medicine the facts are far too few for them to enter into the calculus of probabilities... in applied medicine we are always concerned with the individual" -- S. D. Poisson

November 03, 2005

BMC Medical Informatics - Pretest probability assessment derived from attribute matching (Kline et al 2005)

A patient comes into the ED with the complaint of acute chest pain. How do we assess the chance that the patient has something potentially lethal like acute coronary syndrome (ACS)?

Currently we rely on the experience and acumen of doctors with decades of training under their belts. No doubt these clinical judgments will outperform any system of automated patient assessment. But such gut feelings are often difficult to quantify, and when clinicians disagree, it boils down to a matter of opinion. The Principle of Defensive Medicine kicks in, and our patient stays in the ED for a full night of tests. So what exactly is the probability of ACS given the patient's presentation? An objective, numerical estimate based on the collective medical records can assist clinicians in making difficult decisions: clinical acumen supplemented with the hard evidence of the electronic medical records database.

Typically, pretest probabilities are generated using logistic regression, a parametric model for the probability of a binary outcome, such as the probability of having a myocardial infarction or the risk of an adverse outcome given UA/NSTEMI (the TIMI score). As a member of the generalized linear model family, logistic regression has a number of convenient statistical properties. However, as Kline et al noted,
From the perspective of probability estimation for acute disease, one of the main drawbacks to logistic regression equations is that they seldom output a pretest probability in the very low (0–5%) range ... this is the range where the clinician must decide whether or not to use the resources required for formal testing by a chest pain protocol.

In addition,

  1. Additivity of risk on the logit scale, an assumption of logistic regression, is easily violated and difficult to validate (see the sketch after this list)

  2. Complex interactions between clinical variables are difficult to model and to assess with logistic regression.
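
To make the additivity assumption in point 1 concrete, here is a minimal sketch of a logistic-regression pretest probability model in Python with scikit-learn. The predictors and toy data are hypothetical, not the variables of the TIMI score or of Kline et al; the point is only that the fitted model combines predictors additively on the logit scale.

    # Minimal sketch of a logistic-regression pretest probability model
    # (illustrative only; predictors and data are hypothetical).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy design matrix: columns = [age, male_sex, prior_CAD], one row per patient.
    X = np.array([[62, 1, 1],
                  [34, 0, 0],
                  [55, 1, 0],
                  [71, 0, 1],
                  [29, 1, 0],
                  [48, 0, 1]])
    y = np.array([1, 0, 1, 1, 0, 0])   # 1 = ACS at follow-up, 0 = no ACS

    model = LogisticRegression().fit(X, y)

    # Pretest probability: P(ACS) = 1 / (1 + exp(-(b0 + b1*age + b2*male + b3*priorCAD)))
    new_patient = np.array([[45, 1, 0]])
    print(model.predict_proba(new_patient)[:, 1])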


Given the gigabytes of data in our collective electronic medical records databases, we have the freedom to adopt less parametric methods. Kline et al used a database of 14,796 emergency department (ED) patients (the i*trACS study) evaluated for possible ACS at 8 hospitals (7 in US, 1 in India). Each patient record was associated with 70 clinical variables. To find the most informative clinical variables, Kline et al used classification and regression tree analysis (CART). This is a relatively intuitive statistical method that makes no distributional assumptions. It constructs a binary tree recursively: at each stage, it divides a group into 2 subgroups using the variable that best separates the subjects with respect to the classification variable.

This is analogous to sequentially cutting a cake into 2 halves, 4 halves-of-halves, 8 halves-of-halves-of-halves, and so on. Quite intuitive, isn't it?
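
As a rough illustration of this recursive splitting, the sketch below grows a small CART-style tree with scikit-learn's DecisionTreeClassifier (which implements a CART-like algorithm). The features and data are invented for illustration and are not drawn from the i*trACS registry.

    # Sketch of CART-style recursive binary splitting (hypothetical data).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy features: [age, sweating_with_symptoms, prior_CAD]
    X = np.array([[62, 1, 1],
                  [34, 0, 0],
                  [55, 1, 0],
                  [71, 0, 1],
                  [29, 0, 0],
                  [48, 1, 1],
                  [66, 1, 0],
                  [31, 0, 1]])
    y = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # 1 = ACS, 0 = no ACS

    # Each internal node picks the single variable and cutpoint that best
    # separate ACS from non-ACS cases, then recurses on the two subgroups.
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=["age", "sweating", "prior_CAD"]))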

Kline et al built a CART tree using this database, and yielded 8 clinical variables of predictive value:
  1. Age (<35, 35-38, 39-50, >50 years)

  2. Gender

  3. Race (white or Asian vs. nonwhite and non-Asian)

  4. Patient report or physician observation of sweating with symptoms

  5. Patient report of a prior history of coronary artery disease or myocardial infarction

  6. Chest pain worsened with manual palpation on physical examination

  7. ST-segment depression >0.5 mm in any two leads

  8. T-wave inversion >0.5 mm in any two leads


This CART tree, which they call attribute matching, was validated in a separate population of 8,120 patients presenting with suspected ACS at the UCSD and UPenn EDs. (See the paper for the table of validation results versus the logistic regression model.)
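
Roughly speaking, attribute matching turns the pretest probability into a database lookup: find the prior patients whose profile on the selected variables matches the new patient's exactly, and report the observed rate of ACS among those matches. The sketch below uses a hypothetical miniature database and only three of the eight attributes.

    # Rough sketch of an attribute-matching lookup (hypothetical data;
    # the real system matches on the eight variables listed above).
    import pandas as pd

    # Reference database: one row per prior ED patient, plus 30-day ACS outcome.
    db = pd.DataFrame({
        "age_band":  ["<35", "35-38", ">50", ">50", "<35", "39-50"],
        "male":      [0, 1, 1, 0, 0, 1],
        "sweating":  [0, 1, 1, 0, 0, 0],
        "acs_30day": [0, 1, 1, 0, 0, 0],
    })

    # New patient's attribute profile.
    profile = {"age_band": ">50", "male": 1, "sweating": 1}

    # Pretest probability = observed ACS rate among exact-profile matches.
    matches = db.loc[(db[list(profile)] == pd.Series(profile)).all(axis=1)]
    print(len(matches), matches["acs_30day"].mean())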

This result is not unexpected: given the large database of patient records, the more flexible, nonparametric CART approach unsurprisingly fit the data better and predicted the risk of ACS more accurately. More importantly, it demonstrates the clinically valuable strength of CART in identifying low-risk patients:
The sensitivity and specificity of the very low-risk designation for the detection of acute coronary syndrome at 30 days was 95.3% and 25.4% for attribute matching, versus 99.3% and 3.7% for the logistic regression method.
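
For readers who want the arithmetic spelled out: sensitivity here is the fraction of 30-day ACS cases that were not designated very low risk, and specificity is the fraction of ACS-free patients who were. The counts below are invented to roughly mirror the quoted attribute-matching figures and are not taken from the paper.

    # Sensitivity/specificity of a "very low risk" designation (toy counts).
    tp, fn = 953, 47      # ACS cases flagged (not very-low-risk) vs. missed
    tn, fp = 254, 746     # non-ACS patients labelled very-low-risk vs. not

    sensitivity = tp / (tp + fn)   # fraction of ACS cases correctly flagged
    specificity = tn / (tn + fp)   # fraction of non-ACS cases labelled very low risk
    print(sensitivity, specificity)   # 0.953, 0.254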

This subset of patients is the most important to identify objectively and quantitatively in order to bolster clinical decision making:
In a 1997 multicenter study, Graff and colleagues found that only 2.5% of patients evaluated by a chest pain protocol were diagnosed with acute myocardial infarction. In our experience, this "rule-in rate" is declining.

A potential application of the present system would be use of the combination of a pretest probability <2.0%, and one negative biomarker of cardiac ischemia or necrosis, in conjunction with the patient's risk tolerance to prevent unnecessary chest pain protocol evaluation.


A user-friendly interface was also developed by the authors for clinical application.

Some comments. What Kline et al did not address is the most undesirable aspect of CART: the instability of its results. As you can imagine, since each step depends on all of the cuts made previously, a small variation in the data that changes the choice of cut at an early stage can produce progressively amplified differences in all subsequent stages. A statistical remedy is the technique of bagging.
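
Bagging (bootstrap aggregating) grows the same kind of tree on many bootstrap resamples of the data and averages their predicted probabilities, which damps the sensitivity to any single early cut. A minimal sketch with scikit-learn, using simulated data in place of real clinical variables:

    # Minimal sketch of bagging CART trees to stabilize predictions
    # (simulated data stand in for clinical variables).
    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                # 8 toy clinical variables
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

    # Fit 100 trees, each on a bootstrap resample, and average their
    # predicted probabilities; the ensemble is far less sensitive to small
    # perturbations of the training set than any single tree.
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100).fit(X, y)
    print(bag.predict_proba(X[:3])[:, 1])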

We note that a very useful characteristic of CART--endowed by its inherent flexibility--is the ease with which it can accommodate missing data, which we know is all too common in clinical databases.
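
Classic CART implementations (rpart in R, for example) route cases with a missing splitting variable down the tree using surrogate splits. scikit-learn's trees do not implement surrogate splits, but a simple workaround in the same spirit is to treat "missing" as its own category, as in the hypothetical sketch below.

    # Treating "missing" as its own category so the tree keeps every patient
    # (a workaround, not CART's surrogate splits; data are hypothetical).
    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    df = pd.DataFrame({
        "prior_CAD": [1, 0, np.nan, 1, np.nan, 0],   # history often undocumented
        "sweating":  [1, 0, 1, np.nan, 0, 0],
        "acs":       [1, 0, 1, 1, 0, 0],
    })

    # Encode NaN with the sentinel -1 so the tree can split "missing"
    # into its own branch instead of dropping those patients.
    X = df[["prior_CAD", "sweating"]].fillna(-1)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, df["acs"])
    print(tree.predict_proba(X)[:, 1])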

We note that nonparametric does not mean the absence of modeling assumptions. Specifically, independence of observations is still assumed in building the CART tree.

For a review of statistical classification methods, see The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (2001).
