Recently "Trial Site News" published a claim that a published trial of molnupiravir in NEJM "contains fabricated data".

This is obviously a massive accusation. I believe that exposing fraud in research is a valid and worthwhile undertaking, but I do not believe the authors have adequately described their evidence, and their claims collapse under scrutiny.

The trial is here.

The article claiming the data is fabricated is here.

## The accusation.

The accusations by the author are short and not well described: "The rates of virus clearance from day 10 to 29 are too similar between the Molnupiravir and Placebo arms. The differences are too small compared with the large standard deviations. The standard deviations are too similar, too."

The authors don't state their assumptions, the statistical test applied, the results of that test, or define any threshold of significance or acceptable FDR.

The impugned supplementary table looks like this.

Just eye-balling this I am unimpressed. The data don't look more similar to me than I'd expect given the large arm sizes. I wouldn't glance at this table and think "let's test that", but that shouldn't lead to us dismissing this claim out of hand.

I choose to interpret these claims as "the point observed change in viral loads (for each time point) for the two arms are more similar than would arise by chance, for the given n, assuming no underlying true effect in change in viral loads".

and "the standard deviations of point change in viral loads (for each time point) between the groups are more similar than expected by chance, for the given n, assuming the two cohorts are randomly sampled from a single population"

## Is the accusation logical?

Before getting into the nitty gritty of whether these claims are borne out, we should ask ourselves whether they are well conceived. That is: if proven, would they constitute evidence that the data were not experimentally derived?

In my opinion, even that is very dicey.

Assume we tested the two cohorts and found that their standard deviations were so close we obtained a p value of 0.9999999999999999, meaning that fewer than 1 in 10^16 null trials would be expected to generate standard deviations this similar (assume we have adequate precision, and that the result would survive correction for multiple testing). Would this be consistent with the claimed experimental method in the original NEJM paper, and/or is there a readily available innocent explanation? Quite possibly.

The original study did not use simple randomisation, instead participants were stratified based on time since symptom onset to balance the numbers in each arm. Because of this we expect the arms to be more similar in spread of participants by delay to treatment than would occur by chance alone. If the rate of fall in viral load changes is in any way even partially dependent on time from symptom onset (reasonably possible) then it's quite possible that the groups will be more similar than chance alone would dictate, both in mean difference and equality of variance/standard deviation.
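To see why, here is a toy simulation (all numbers invented for illustration) comparing stratified and simple randomisation when the change in viral load depends partly on the time-since-onset stratum. The stratified arms end up with more similar standard deviations on average:

```python
import random
import statistics

random.seed(0)

STRATA = (-4.0, -2.5, -1.0)  # hypothetical mean change per onset stratum

def draw_change(stratum_mean):
    # Change in viral load partly determined by time since symptom onset
    # (an assumption for illustration, with a small within-stratum spread).
    return random.gauss(stratum_mean, 0.3)

def mean_sd_gap(stratified, n_trials=400, per_stratum=100):
    """Average |SD(arm1) - SD(arm2)| across simulated two-arm trials."""
    gaps = []
    n = per_stratum * len(STRATA)
    for _ in range(n_trials):
        arms = []
        for _arm in range(2):
            if stratified:
                # Stratified randomisation: fixed composition in each arm.
                vals = [draw_change(m) for m in STRATA
                        for _ in range(per_stratum)]
            else:
                # Simple randomisation: composition varies by chance.
                vals = [draw_change(random.choice(STRATA)) for _ in range(n)]
            arms.append(vals)
        gaps.append(abs(statistics.stdev(arms[0]) - statistics.stdev(arms[1])))
    return statistics.mean(gaps)

gap_stratified = mean_sd_gap(stratified=True)
gap_simple = mean_sd_gap(stratified=False)
print(round(gap_stratified, 4), round(gap_simple, 4))
```

With simple randomisation the chance composition of each arm varies from trial to trial, which feeds extra wobble into each arm's SD; stratification removes that component, so the arm SDs track each other more closely than "chance alone" naively predicts.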

So this accusation is a bit of a non-starter.

Accusations of fraud require a deep understanding of study design, including the randomisation process used, in order to work out which assumptions about experimentally derived data are valid. That does not seem to have occurred in this case.

## Is the accusation accurate?

Identifying excessive similarity of rate of change in a set of serial measurements of two cohorts **with progressive dropout between measures** from summary data alone (even if it were a valid sign of fraud, which it is not in this case) is non-trivial. My first thought is that, as a quick sanity check, you could assume a normal distribution of change, treat that as a point measure, and do a Student's t-test between groups.
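That sanity check can be run from the published summary statistics alone. A minimal sketch with hypothetical means, SDs, and arm sizes (not the trial's actual values), using `scipy.stats.ttest_ind_from_stats`:

```python
from scipy import stats

# Hypothetical summary values (NOT the trial's actual numbers):
# mean change in log10 viral load from baseline at one time point.
mean_mol, sd_mol, n_mol = -2.50, 1.30, 700   # molnupiravir arm
mean_plc, sd_plc, n_plc = -2.45, 1.28, 700   # placebo arm

# Student's t-test computed directly from summary statistics.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean_mol, sd_mol, n_mol,
    mean_plc, sd_plc, n_plc,
    equal_var=True,
)
print(round(t_stat, 3), round(p_value, 3))
```

With these made-up numbers the difference is unremarkable; a |t| sitting implausibly close to 0 across many consecutive time points would be the kind of pattern worth probing further.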

Let's focus on the second accusation: that the standard deviations of those measures are excessively similar. We can, of course, check the equality of variance using Levene's test, again treating the change from baseline at time (x) as a single point measure. Let's do that and see what we get:

There is no measure in the time frame given (for the full study or any subgroup) which is even close to suspicious. Even the most similar standard deviations of the 9 measures would be expected to turn up in about 1 in 5 comparisons where there is no underlying difference in variance/SD. I am unsure why the claim of over-similarity was made without any apparent testing of any kind.
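For the record, a check of this kind can also be reproduced from summary statistics alone: under normality, the ratio of two sample variances follows an F distribution, so we can ask how often two arms with identical true variance would produce SDs at least this close. A sketch with hypothetical numbers (not the trial's actual values):

```python
from scipy import stats

# Hypothetical change-from-baseline SDs and arm sizes at one time point
# (illustrative only; not the trial's actual numbers).
sd_a, n_a = 1.30, 700
sd_b, n_b = 1.28, 700

# Ratio of larger to smaller sample variance, so F >= 1.
# (Equal arm sizes keep the degrees of freedom bookkeeping simple.)
F = max(sd_a, sd_b) ** 2 / min(sd_a, sd_b) ** 2
df_a, df_b = n_a - 1, n_b - 1

# Probability, under equal true variances, of a variance ratio at least
# this close to 1.  A tiny value would mean "suspiciously similar";
# values in the 0.2-0.4 range mean such similarity is routine under the null.
p_similar = stats.f.cdf(F, df_a, df_b) - stats.f.cdf(1 / F, df_a, df_b)
print(round(p_similar, 3))
```

With these illustrative figures, roughly a third of null comparisons would show SDs at least this similar: the same order as the 1-in-5 figure above, and nothing remotely suspicious.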

Perhaps the authors performed some other test whose method and results they simply did not report. That seems unlikely, though, and I can't think what test it could possibly be.

## Summary and Implications

This was a claim of fraud that was not well defined, that used absolute language ("contains fabricated data") indefensibly, that would be illogical and non-probative even if it were mathematically supported, and that, when formally tested, gives not even the barest signal of suspicion. It collapses under five minutes of scrutiny.

Not all accusations of fraud are valid, and it was supremely unwise for this publication to publish such a thinly evidenced and badly described allegation.

Remember, when examining data for signs of fraud, ask yourself what assumptions can be validly inferred from the claimed experimental design, how you can test those assumptions, what statistical thresholds are required to reject them, and what explanations other than fraud might account for any such variations. For our group this usually takes many months and passes through at least three or four pairs of hands before we go public.

This is not a game and throwing around false accusations of such low quality as this is unprofessional.

There's not even a case to answer here. This accusation should never have been published.