Normal Limits

"Chance is the very guide of life"

"In practical medicine the facts are far too few for them to enter into the calculus of probabilities... in applied medicine we are always concerned with the individual" -- S. D. Poisson

June 07, 2006

Crowdsourcing topics in medical research

Not much activity here at the Within Normal Limits of Reason blog lately. Looking over the traffic to this blog, I found that my short post on "post-hoc analysis" was getting a lot of referrals from the major search engines. I decided to port that into Wikipedia. Then, seeing the lack of entries on specific topics in medical research, I started to contribute to other articles... and now I'm hooked! I'm adding bits and pieces on entries related to medical research like randomized clinical trials and EBM, and initiating a couple of entries like post-randomized consent and interim analysis--mostly works in progress.

The quality of the Wikipedia entries is generally pretty good, and given the editorial oversight and the informal peer review by the entire English-reading internet, there really isn't as much misinformation as one might fear. There is some link-spamming, but that is generally obvious and unobtrusive. As Wired magazine put it, Wikipedia has really harnessed the power of crowdsourcing and proved it can work.

March 30, 2006

Evidence-based rock, paper, and scissors

Non-transitive dice, invented by Stanford statistician Professor Bradley Efron, defy intuition. In one version, the set consists of a sequence of 4 dice A, B, C, D, each with non-standard numbers on its 6 faces. The numbers are chosen such that no matter which die of the 4 you pick, all I need to do is pick the one before it in the cycle (e.g., if you picked C then I'll pick B, or if you picked A then I'll pick D), and then, on our next throws of the dice, I can expect to beat you 2/3 of the time. There is no trickery. The dice are not mechanically loaded; the numbers simply work out that way. But this non-transitivity is disconcerting and boggles the mind.
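The 2/3 claim is easy to check by brute force. The sketch below assumes the commonly cited face values for Efron's dice (they are not listed in this post, so treat them as my assumption) and enumerates all 36 face pairings for each adjacent pair in the cycle. The names DICE, CYCLE, and win_probability are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed face values: the commonly cited set for Efron's dice.
DICE = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}


def win_probability(first, second):
    """Exact probability that one roll of `first` beats one roll of `second`."""
    wins = sum(1 for x, y in product(first, second) if x > y)
    return Fraction(wins, len(first) * len(second))


# Each die is paired with the one that follows it in the cycle A, B, C, D, A.
CYCLE = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
results = {pair: win_probability(DICE[pair[0]], DICE[pair[1]]) for pair in CYCLE}

for (winner, loser), p in results.items():
    print(f"{winner} beats {loser} with probability {p}")
```

With these faces, every die beats the next one in the cycle with the same probability, so whoever picks second can always choose the die that comes just before the opponent's.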

We see something similar in the medical research literature. In the Feb issue of the American Journal of Psychiatry, Dr. Heres et al reviewed head-to-head comparisons of 2nd-generation antipsychotics. The report, titled "Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine", studied 42 trials that were each designed to compare two 2nd-generation antipsychotics. Published results of these trials were assessed by a psychiatrist and an internist who were blinded to the identity of the medications. They evaluated whether the report favored one drug over the other, along with a host of other potential sources of bias in these studies.

Some interesting findings:

  1. 33 trials (out of 42) were sponsored by pharmaceutical companies: No surprise here. In fact they thought these industry-sponsored trials were generally of better quality as far as design, implementation, and sample size are concerned.

  2. 90 percent of these trials favored the medication sold by the sponsoring pharmaceutical company.

The wording of the abstracts is also systematically different. In those abstracts in which the medication sold by the sponsoring company was shown to be superior, the findings would be elaborated in detail towards the end of the abstract. On the other hand, those trials with results unfavorable to the sponsor only briefly mention the result at the beginning of the abstract.

All in all, not as surprising as the non-transitive dice.

March 29, 2006

STAR*D: landmark study in psychiatry

The Orion Nebula [M42] by NASA via Flickr

The results of Sequenced Treatment Alternatives to Relieve Depression (STAR*D), a landmark study in psychiatry, are beginning to be published. Beyond the important questions it answers, STAR*D is remarkable for:

  1. Large sample size: about 4,000 patients were recruited, a mammoth study by the standards of the psychiatry literature

  2. Broad inclusion criteria: any patient presenting for care with nonpsychotic major depressive disorder (defined using the HAM-D scale) for whom his/her clinician deems outpatient anti-depressant therapy would be appropriate. Most anti-depressant trials advertise for patients, which arguably selects for a qualitatively different patient population.

  3. Participation of primary care physicians: in addition to psychiatrists in outpatient psych clinics, PCPs were managing these patients in their daily practice. Titration of anti-depressant dosages was based on validated psychometric instruments (for example see the HRS-D and QIDS-C forms used by the STAR*D researchers)

  4. Use of depression remission as endpoint: this is closer to the goals of treatment in clinical practice. Quantitative abatement of depressive symptoms--the outcome measure of most previous anti-depressant trials--may be meaningful and significant but its clinical value is difficult to assess.

Simply put, by virtue of its design, STAR*D has ecological validity, and clinicians can incorporate its findings with confidence.

What are these findings? So far, they have found:

  1. About 30 percent of patients showed remission when placed on Celexa. Predictors of remission include high levels of education, employment status, Caucasian race, and few psychiatric and medical co-morbidities. (Trivedi et al 2006, American Journal of Psychiatry 163, 28)

  2. Of the patients who showed no sufficient response to Celexa and who immediately switched to another anti-depressant (either Wellbutrin, Effexor, or Zoloft), about 1 in 3 showed remission within 14 weeks. The magnitude of the effect was about the same regardless of the class of medication switched to. That is, switching to Zoloft--another SSRI like Celexa--was as effective as switching to either Wellbutrin or Effexor (Rush et al 2006, NEJM 354, 1231).

  3. Of patients who showed no sufficient response to Celexa and whose medication regimens were immediately augmented with Wellbutrin or BuSpar, about 30 percent showed remission. (Trivedi et al 2006, NEJM 354, 1243)

STAR*D is a result of the NIMH's push for "practical clinical trials", trials that by design answer practical questions of great clinical value.

February 19, 2006

Overlapping confidence intervals

From AP's report on patch contraceptives and the risk of venous thromboembolism:

However, because the confidence intervals of the results for the two forms of contraceptive overlap, there actually may be no increased risk from the patch or it may be more than double.

It's a common mistake to use the overlap of the confidence intervals of 2 measures to assess whether they are significantly different. Two measures can have overlapping CIs yet remain statistically significantly different. As a counterexample, let's take this extreme situation:

* Experiment 1 arrives at the conclusion that if the experiment were repeated thousands of times, the outcome measure X will turn out to be 0 with probability 2.5%, 1 with probability 94.5%, 2 with probability 0.5%, and 3 with probability 2.5% (see top graph). Obviously from the graph, the 95% CI for outcome X is from 1 to 2.

* Another (independent) experiment 2 arrives at the conclusion that if the experiment were repeated thousands of times, the outcome measure Y will turn out to be 0 with probability 2.5%, 1 with probability 0.5%, 2 with probability 94.5%, and 3 with probability 2.5% (see bottom graph). Again the 95% CI for outcome Y is simple to see and spans 1 to 2.

Now we have 2 CIs that not only overlap but are in fact identical!

Given that X and Y have identical CIs (ignoring the tail probabilities for now), can we conclude that they are not statistically significantly different? Outcome X is strongly concentrated at value 1, while outcome Y puts nearly all its weight at value 2. Inspecting the graph and relying on our intuition, we are forced to conclude that despite their identical CIs, outcomes X and Y are statistically significantly different.

How does our intuition compare with statistical reality? A simple simulation test found X and Y to be significantly different with p basically equal to 0 (by Mann-Whitney test).
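For anyone who wants to try this at home, here is a minimal sketch of such a simulation using only the Python standard library. It is not the original test script: the sample size of 1000 per experiment, the fixed seed, and the use of the normal approximation with tie correction for the Mann-Whitney statistic are all assumptions made for illustration.

```python
import math
import random

VALUES = (0, 1, 2, 3)
X_WEIGHTS = (2.5, 94.5, 0.5, 2.5)  # experiment 1, probabilities in percent
Y_WEIGHTS = (2.5, 0.5, 94.5, 2.5)  # experiment 2, probabilities in percent


def mann_whitney_z(xs, ys):
    """z-score of the Mann-Whitney U statistic (normal approximation, tie-corrected)."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    pooled = sorted(xs + ys)
    midrank = {}   # value -> average rank shared by all tied observations
    tie_term = 0   # sum of t^3 - t over groups of t tied observations
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    u1 = sum(midrank[x] for x in xs) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u1 - mean_u) / math.sqrt(var_u)


# Probability mass on the interval from 1 to 2 is the same for both outcomes.
mass_x = X_WEIGHTS[1] + X_WEIGHTS[2]
mass_y = Y_WEIGHTS[1] + Y_WEIGHTS[2]

rng = random.Random(0)  # assumed seed, only for reproducibility
xs = rng.choices(VALUES, weights=X_WEIGHTS, k=1000)  # assumed sample size
ys = rng.choices(VALUES, weights=Y_WEIGHTS, k=1000)

z = mann_whitney_z(xs, ys)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"mass on [1, 2]: X = {mass_x}%, Y = {mass_y}%")
print(f"z = {z:.1f}, two-sided p = {p:.3g}")
```

The z-score comes out strongly negative because X tends to rank below Y, even though both outcomes put the same 95% of their mass on the interval from 1 to 2.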

That overlapping CIs do not imply lack of significant difference is true in real-world situations too. More on this later.
