In a systematic review of riluzole for amyotrophic lateral sclerosis done for NIHCE, the authors found a marginally statistically significant result suggesting an improvement in functional status associated with riluzole, with p-values ranging from 0.03 to 0.07 for the pooled results from the three different measurement scales reported in the trials. This was the first published evidence of a statistically significant improvement in functional status because, in its representations to the regulatory authorities, the company had not pooled the results of all the trials for these outcomes, and the drug was licensed to extend tracheostomy-free survival only.
Despite this new finding, there is still a considerable question mark over the clinical significance of these results. The improvements observed were very small, just one or two points on scales which might have a maximum score of 80 or 100, and which measure "fuzzy" things like the ability to grip a pencil. It is certainly encouraging to find that there might be an improvement in functional status to go with the likely improvement in survival, but it is important that someone who is considering taking riluzole is given clear information about the likely size of the benefits, and the likelihood of harm due to side effects, before making a decision about treatment.
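To put the size of these improvements in perspective, the short sketch below expresses a gain of one or two points as a share of the full scale range. It is only an illustration: the function name `share_of_scale` is made up for this example, and it assumes the scales run from zero up to the maximum scores of 80 or 100 mentioned above, which the trials' actual scales may not do.

```python
def share_of_scale(points_gained, scale_max, scale_min=0.0):
    """Return an improvement as a percentage of the full scale range.

    scale_min defaults to 0, which is an assumption made for this sketch.
    """
    scale_range = scale_max - scale_min
    if scale_range <= 0:
        raise ValueError("scale_max must be greater than scale_min")
    return 100.0 * points_gained / scale_range


# The gains (1 or 2 points) and maximum scores (80 or 100) quoted in the text.
for scale_max in (80, 100):
    for points in (1, 2):
        pct = share_of_scale(points, scale_max)
        print(f"{points} point(s) on a 0-{scale_max} scale = {pct:.2f}% of the range")
```

Under these assumptions the observed improvements correspond to between 1% and 2.5% of the range of the scale. The arithmetic does not say whether a change of that size matters to a patient; it only makes the size of the effect easier to see.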
Statements like "the difference in means was 2.7 (p=0.023)" or "RR=1.6 (p=0.044)" don't mean anything to real people, and they might be misleading to the clinician. They need to be translated into clinically meaningful terms before we can assess whether the benefit offered is worth the costs. We will look at a simple means to do this in the next two sections, but first we will discuss another important consideration: what was the comparator?
Control arms and understanding the question asked
If a treatment has been compared favourably to placebo but there are other active treatments available, then we know only that it is (probably) active against the disease. We do not know whether it is better than the alternative active treatments. Placebo-controlled trials are useful for licensing authorities, who must determine whether a new treatment is worth making available and will, quite rightly, wish to license all potentially effective agents. The clinician and patient, however, gain much more useful information from trials which compare the new treatment to an active control, preferably one which is widely used in standard practice and ideally is the proven gold standard.
It is very easy to manipulate the results of trials by using a substandard control treatment. Newer drugs in epilepsy tend to claim their place in the armoury on the basis of more acceptable side effects rather than better seizure control. In scrutinising the results of RCTs of the newer drugs it is therefore important to consider whether the control treatment was, in fact, the best available alternative, with a sufficient titration period allowed to find the optimal dose, or whether it was a particularly toxic older drug given with too short a titration period and too high a target dose. In one trial of anti-TNF vs methotrexate (the genuine gold standard treatment in rheumatoid arthritis), participants were first given methotrexate for a few months and then, if and only if they did not respond to methotrexate, were randomised either to switch to anti-TNF or to continue on methotrexate. This trial therefore looks like a comparison with the gold standard treatment but, because the investigators selected only participants who had not responded to methotrexate, it was actually a comparison with a toxic placebo.
Trials can only answer the questions they were designed to answer, so the first step in understanding the results of a trial is to understand the question it was asking. This includes consideration of the control treatment as above, but also issues like the population recruited, the treatment procedures used (especially their similarity to practices in the real world), and the sorts of outcomes recorded (and those omitted).