What is a Randomised Controlled Trial (RCT) and why do we do them?
Imagine a local celebrity launches a new supplement called ABSORBPRO that helps with the absorption of vitamins from food, so that for the same food intake you get more vitamins. You want to know if it works. You measure the levels of a certain vitamin in 100 people who take ABSORBPRO, and in 100 people who don't.
The average level in people taking ABSORBPRO is 20, and in people not taking it the average is 10. Is this enough to conclude that it works? Not really. There are lots of reasons why we might see these effects even if the supplement does nothing at all.
People taking the supplement might be wealthier and have a better diet. They might be more health conscious and so pick better foods. They might be less likely to be shift workers. They might be younger: if the people taking the supplement are 30 years old and the people not taking it are 70, you'd expect different levels. There might be different numbers of males and females, smokers and non-smokers, and the list goes on.
To decide whether a drug works we have to see whether outcomes differ between people who use it and people who don't, where both sets of people are drawn from the same population. We can attempt to do this by deliberately balancing as many covariates as possible (covariates being things like the characteristics described above), but this is very imperfect and there's always the possibility of some that you didn't measure.
The best way to ensure that your patients in each group are sampled from the same population is to... well... sample them from the same population. RCTs define a single group of people that might benefit from a treatment, recruit them into the study, then decide at random which treatment each person gets. Unlike trying to select similar patient groups from people already taking the drug, this method works equally well for important covariates you haven't thought of in advance.
Randomisation doesn't mean that the groups will be exactly equal on all covariates, but it does mean there won't be a systematic underlying bias towards one group or the other, because they are by definition sampled from the same population.
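A minimal Python sketch of this idea follows. The cohort is entirely made up (180 hypothetical recruits with ages between 30 and 70) and the helper name `randomise` is just for illustration. It shuffles the cohort and deals it into six arms of 30, then prints each arm's mean age: the means differ a little from arm to arm, but nothing pushes any particular arm older or younger.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical cohort: 180 recruits, ages spread between 30 and 70.
ages = [rng.randint(30, 70) for _ in range(180)]

def randomise(values, n_arms, rng):
    """Shuffle the cohort, then deal it into equal-sized arms."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_arms
    return [shuffled[i * size:(i + 1) * size] for i in range(n_arms)]

arms = randomise(ages, 6, rng)
arm_means = [statistics.fmean(arm) for arm in arms]

for number, (arm, mean_age) in enumerate(zip(arms, arm_means), start=1):
    print(f"arm {number}: n={len(arm)}, mean age={mean_age:.1f}")
```

Rerunning with a different seed gives different arm means each time, which is the point: the groups are not identical, but the differences are only the ones chance produces.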
If a trial is genuine and not fraudulent this allows us to make certain testable assumptions about the groups. For any given continuous variable:
- the groups were sampled from populations with identical means
- the groups were sampled from populations with identical variance.
Note this isn't quite the same as saying that the groups have identical means or variances.
The Niaee et al Trial
Dr Niaee and colleagues claim to have randomised 180 patients into six different treatment groups. I do not believe this claim is true.
The six groups have 30 patients each. Two groups did not receive ivermectin (one received a placebo and one did not). Four groups received ivermectin at different doses and frequencies.
Traditionally, in Table 1 of an RCT, authors describe certain characteristics of the patients in each group. Something immediately caught my eye. The proportion of participants in each arm who had not actually tested positive for the virus was wildly different:
Control Group: 40%
Placebo Group: 53%
Ivermectin Group 1: 23%
Ivermectin Group 2: 23%
Ivermectin Group 3: 3%
Ivermectin Group 4: 30%
The authors claim this had a p value of 0.421 from a chi-square test. Just eyeballing the numbers this seemed wildly off to me, and when I calculated it myself I got a p value of about 8*10^-4. (The authors now accept that the actual p value should be <0.001 and state this was "a typographical error.")
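This check can be approximated from the percentages alone. The sketch below assumes each percentage is a rounded count out of 30 patients (12, 16, 7, 7, 1 and 9), which is back-calculated rather than taken from the raw data. The helper names `chi2_sf` and `chi_square_test` are hypothetical, and the tail probability uses the closed-form expression for integer degrees of freedom so that only the standard library is needed. Under those assumptions the result comes out close to the p value of about 8*10^-4 quoted above, and nowhere near 0.421.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df, closed form)."""
    half = x / 2
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def chi_square_test(events, arm_size):
    """Pearson chi-square for a 2 x k table: event / no event in k equal-sized arms."""
    k = len(events)
    expected_yes = sum(events) / k
    expected_no = arm_size - expected_yes
    stat = sum(
        (e - expected_yes) ** 2 / expected_yes
        + ((arm_size - e) - expected_no) ** 2 / expected_no
        for e in events
    )
    return stat, chi2_sf(stat, df=k - 1)

# ASSUMPTION: counts back-calculated from the published percentages of 30 per arm
# (40%, 53%, 23%, 23%, 3%, 30%), not read from the raw data.
not_positive = [12, 16, 7, 7, 1, 9]
stat, p = chi_square_test(not_positive, arm_size=30)
print(f"chi-square = {stat:.2f}, df = {len(not_positive) - 1}, p = {p:.5f}")
```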
I contacted the corresponding author at the address given in the journal article and requested the raw data but received no response. I then tried an email address I found online on an earlier preprint, and then his institutional contact details at his university, and again received no response. At that point I gave up and posted a comment to PubPeer pointing out the very unexpected imbalance in baseline data and suggesting the trial should not be included in meta-analyses unless individual participant data (IPD) could be provided and reviewed.
Some time later I heard that Dr Niaee (who was not the corresponding author) had been discussing the trial with some other researchers, and I obtained his contact details. He provided me with the raw data set. Unfortunately that was much more concerning than the summary data.
Individual participant data
The individual participant data contained a number of unexpected features. I'm not going to go through all of them here, mainly because I know that other researchers will be making blog posts over the coming days and that seems like pointless duplication of effort. Edit: Gid M-K's blog post is up here.
A few of the first things to jump out at me were:
- All 6 patients with missing baseline data were in a single arm. This is extremely unlikely, with less than a 1 in 10,000 chance of occurring if only random chance is at play.
- While the averages from the summary data were similar between groups, the ranges of values in each arm were wildly different.
- Far fewer patients with low oxygen levels (<90) occurred in the ivermectin arms.
- Hypotensive patients appear far more frequently in some arms than others.
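The missing-data figure in the first point can be sanity-checked with a short calculation. The sketch below assumes the 6 patients with missing baseline data are a random subset of the 180 and that every arm holds exactly 30; it is one reasonable way to get a number of this size, not necessarily the calculation behind the figure above. Under those assumptions the probability lands below 1 in 10,000, consistent with the claim.

```python
from math import comb

n_total = 180    # patients in the trial
n_arms = 6       # treatment arms
arm_size = 30    # patients per arm
n_missing = 6    # patients with missing baseline data

# Chance that all 6 fall in one specific arm (hypergeometric),
# assuming they are a random subset of the 180 patients.
p_one_arm = comb(arm_size, n_missing) / comb(n_total, n_missing)

# Any of the six arms could have been the one, and the events are mutually exclusive.
p_any_arm = n_arms * p_one_arm

print(f"P(all {n_missing} in the same arm) = {p_any_arm:.2e}")
print(f"roughly 1 in {round(1 / p_any_arm):,}")
```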
So I ran some statistical tests on how unlikely some of these mismatches were. The numbers below are slightly less extreme than those I presented in my first round of criticism to Dr Niaee. Dr Niaee objected to grouping arms into single-dose and multi-dose groups, and demanded I rerun the analysis with 6 independent groups of 30. I agreed.
We already knew that the chi-square for whether patients had actually tested positive for coronavirus was 21, with a p value of 0.0008. This means the chance of a mismatch this extreme happening in genuinely randomised groups is less than 1 in 1,000.
The chi-square for oxygen saturation less than 90 (a common criterion for going to hospital) between groups was 22.7, with a p value of 0.00038. This means the chance of a mismatch this extreme happening in genuinely randomised groups is less than 1 in 1,000.
The chi-square for diastolic BP less than 75 between groups was 36, with a p value of 0.00000088. This means the chance of a mismatch this extreme happening in genuinely randomised groups is less than 1 in 1,000,000.
But the biggest differences I'd noted, as I said before, were the differences in range and spread rather than average.
To test some of the differences in ranges I ran pairwise Levene's tests of equal variances between each pair of arms. One advantage of Levene's test is that it has minimal assumptions, and does not require the results in either arm to be normally distributed. The p values are below. Some of these are extreme and suggest that certain arms (especially the high dose arms 5 and 6) simply couldn't have been randomly selected from the same population as the other arms.
For example, when arm 5 (high dose ivermectin as a single dose) is tested for equal variance against the other arms, the p values are less than 0.0000000001 against the groups that did not receive ivermectin and less than 0.0000000000000001 against either low dose ivermectin group.
The latter represent chances of roughly 100,000,000,000,000,000 to 1 of such differences arising by chance.
This is an extreme example of heteroskedasticity, where the variance of one measure (in this case diastolic blood pressure) depends on another (in this case treatment arm). This should NOT arise in a genuine randomised controlled trial, as the arms are by definition sampled randomly from the same population.
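For readers who want to see what a Levene's test does, here is a minimal sketch. It is illustrative only: the blood pressure values are synthetic (the trial's raw data are not reproduced here), the statistic is centred on the group mean (the post does not say whether mean or median centring was used), and the p value comes from a permutation test rather than the usual F distribution, purely to avoid third-party libraries. The function names `levene_w` and `permutation_p` are hypothetical. A permutation p value can never go below 1/(n_perm + 1), so this sketch cannot reproduce the extreme values quoted above; it only shows the mechanics.

```python
import random
import statistics

def levene_w(groups):
    """Levene's W: one-way ANOVA F statistic on |value - group mean|."""
    devs = []
    for g in groups:
        centre = statistics.fmean(g)
        devs.append([abs(v - centre) for v in g])
    n_total = sum(len(d) for d in devs)
    k = len(devs)
    dev_means = [statistics.fmean(d) for d in devs]
    grand = sum(sum(d) for d in devs) / n_total
    between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, dev_means))
    within = sum((v - m) ** 2 for d, m in zip(devs, dev_means) for v in d)
    return (n_total - k) / (k - 1) * between / within

def permutation_p(a, b, n_perm=1000, seed=0):
    """Permutation p value for W: centre each arm, pool, and reshuffle the labels."""
    rng = random.Random(seed)
    observed = levene_w([a, b])
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    pooled = [v - mean_a for v in a] + [v - mean_b for v in b]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if levene_w([pooled[:len(a)], pooled[len(a):]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# SYNTHETIC data: two arms of 30 with the same mean but very different spread.
rng = random.Random(1)
narrow = [rng.gauss(80, 3) for _ in range(30)]
wide = [rng.gauss(80, 15) for _ in range(30)]

w_obs = levene_w([narrow, wide])
p_perm = permutation_p(narrow, wide)
print(f"Levene W = {w_obs:.1f}, permutation p = {p_perm:.4f}")
```

Two arms with the same average but a fivefold difference in spread give a large W, even though a comparison of means alone would show nothing unusual.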
Finer points and Final points
Firstly, it's tempting to dissect some of these differences and build narratives about what impacts they had on the final results. For example it is tempting to think "almost all deaths occurred in patients with oxygen saturation less than 90 at baseline, more of these patients were in the groups that didn't get ivermectin, so this favoured ivermectin". I would caution against this.
The key point here is that this trial clearly was not randomised. There are mismatches between groups so extreme that they would not happen by chance if you had repeated the trial every day for the age of the universe. This means all of the assumptions based on randomisation are lost.
While you may be able to unpick the impacts of certain imbalances, you can no longer assume there is not an underlying systematic bias on unmeasured covariates. A trial which claims to be randomised and isn't is not a randomised controlled trial. This is at best now an observational study.
Secondly, it's important to remember that in trials with many authors, finding that data are not genuine or that methods are not accurately described is not a sufficient basis to conclude wrongdoing on the part of any individual author.
Thirdly, I've focussed here on a couple of problems that I noticed. There are far more, some of which my colleagues will put up in the next few days.
This paper claims to describe a trial in which patients were randomly allocated to treatments. This is not true. Extreme differences are seen between groups across multiple variables, such as oxygen level, blood pressure, and SARS-CoV-2 test results, before patients even got their first dose of medication. These differences are so extreme that in some cases the odds against them arising randomly are on the order of a hundred quadrillion (100,000,000,000,000,000) to 1.
We can reject, beyond any reasonable doubt, the claim that the study actually occurred as described.
The paper should be retracted.