Written Evidence Submitted by HealthWatch




HealthWatch is a UK charity that has been promoting science and integrity in healthcare since 1991. (It is not connected with the NHS organisation Healthwatch England.)

We promote:

  1. The rigorous assessment and testing of all medical and nutritional treatments, devices and procedures.
  2. Consumer protection in regard to all forms of health care.
  3. The highest standards of evidence-based education for clinical practitioners.
  4. Improving media and public understanding of the importance of applying evidence from robust clinical trials.

Our activities include public debates, awards, research, a student prize competition and a newsletter. Our honorary president is Nick Ross and our patrons are Robin Ince, Professor Steve Jones FRS, Dr Margaret McCartney, Sir Michael Rawlins, Lord Dick Taverne QC, and Dr Sarah Wollaston.

We make this submission because of our concerns for integrity in healthcare and science.


Executive summary

There are some innocent and understandable reasons why failure to replicate findings is a particular problem in medical research. However, evidence shows that in many instances failure to replicate findings is the result of poor methodology from the outset or outright dishonesty in the earlier publication. The problem is compounded by the need for researchers to publish papers in order to further their careers; the desire of journals to publish ‘exciting’ new findings; the conflict of interest, for such journals, between getting the science right and serving their commercial needs, including in respect of advertising; the consequential difficulty in getting replication studies published, especially if they are negative; and the almost complete lack of regulation of misleading or fraudulent publication, including lack of action by universities, research funders and the GMC.

Our suggested solutions are:


Main submission

The publication of data that are unreproducible and therefore have questionable validity is frequent in medical research. Medical treatments predicated on unreproducible data put patients at risk and waste resources.  We believe that current procedures for investigating research misconduct are ineffective. Academic institutions and journals have consistently failed to investigate research misconduct properly. The GMC has been particularly ineffective: for example, a doctor who at separate GMC hearings in 2006 and 2015 was found repeatedly to have committed misconduct in research over many years was allowed to return to medical practice without the imposition of any conditions on his registration to prevent him undertaking future research. He is again publishing “research”. The difficulty is that academic institutions, journals and the GMC lack both the will and competence to undertake proper investigations of research misconduct.



There are a number of innocent and understandable reasons why failure to replicate findings is a particular problem in medical research.  However, evidence shows that in many instances failure to replicate findings is the result of poor methodology or outright dishonesty in the earlier publication.

In experiments in basic sciences such as physics and chemistry, the experimental conditions can usually be controlled precisely. By contrast, in research on people it is very difficult to achieve exactly the same physiological and physical conditions in different groups of subjects, or even in the same individual on different occasions. That makes it more difficult to replicate experiments. The difficulty is often compounded by inadequate description of the methods in an original paper, so that precise repetition of an experiment by a different research team is frequently impossible. As a result, earlier experimental findings may not be replicated in subsequent experiments and, more importantly, may not translate to outcomes in “real-life” treatment of patients. We believe that all trial protocols should be published in advance and that all trials should be reported. As well as aiding reproducibility, this would greatly assist searching for unpublished trials (often negative) that might otherwise be ‘buried’ and so distort the literature. The methods and outcomes specified in the protocol should be adhered to, or any modifications justified.

It is well known that patients who are the subjects in medical research trials invariably have better outcomes than apparently similar patients who receive the same treatment after the trial. The disparity has been attributed to differences in patient selection which are not clear in the trial reports, or to better supervision of treatments during the trial than in real life. It may also reflect more favourable interpretation of data by clinical trial investigators, particularly when investigators or sponsors have financial conflicts of interest. Even without a financial conflict of interest, there may be other advantages to reporting good outcomes, such as career advantages from publication (see below).

To help overcome difficulties resulting from variability of responses in different patients an appropriately large group of patients/subjects must be studied and statistical tests must be employed to test the null hypothesis (the hypothesis that there is no real difference between treatment A and treatment B). The use of a particular level of statistical significance means that there is the possibility that a difference that is not real will be found by chance to be ‘significant’ on some occasions but not on other occasions when attempts are made to replicate the research.
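The point about chance findings can be illustrated with a short simulation. The sketch below is ours and is purely illustrative: it assumes that treatments A and B are truly identical, that outcomes are normally distributed with a known standard deviation of 1, and that each trial has 50 patients per arm and is analysed with a two-sided z-test at the 5% level. The function names and all parameter values are hypothetical choices, not taken from any real study. It counts how often a trial of two identical treatments is nonetheless 'significant', and how often an exact repeat of such a trial is 'significant' again.

```python
import random
from statistics import NormalDist, mean


def trial_is_significant(rng, n_per_arm, true_effect=0.0, alpha=0.05):
    """Simulate one two-arm trial and apply a two-sided z-test.

    Assumption for illustration: outcomes are normal with known SD = 1.
    """
    arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    arm_b = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    difference = mean(arm_b) - mean(arm_a)
    standard_error = (2.0 / n_per_arm) ** 0.5  # SD = 1 in both arms
    z = difference / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def simulate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=1):
    """Return (share of null trials that are 'significant',
    share of those 'significant' trials whose repeat is also 'significant')."""
    rng = random.Random(seed)
    significant_originals = 0
    successful_replications = 0
    for _ in range(n_trials):
        # There is no real difference between A and B (true_effect = 0).
        if trial_is_significant(rng, n_per_arm, alpha=alpha):
            significant_originals += 1
            # Repeat the same trial on a fresh group of patients.
            if trial_is_significant(rng, n_per_arm, alpha=alpha):
                successful_replications += 1
    false_positive_rate = significant_originals / n_trials
    replication_rate = (
        successful_replications / significant_originals
        if significant_originals
        else 0.0
    )
    return false_positive_rate, replication_rate


false_positive_rate, replication_rate = simulate()
print(f"'Significant' trials with no real effect: {false_positive_rate:.1%}")
print(f"Of those, repeats that were also 'significant': {replication_rate:.1%}")
```

Under these assumptions roughly one trial in twenty should appear 'significant' by chance alone, and only a small minority of those chance findings should be 'significant' again on repetition, which is the pattern described above.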

The fact that researchers can innocently get the “wrong result” from a study as a result of chance can also make it more difficult to detect deliberate fraud. This is compounded by the fact that if the initial “wrong result” is surprising and exciting, it is more likely to be accepted for publication, particularly by major journals which want to publish novel findings. It follows that major innovations reported in major journals are often the ones that are most difficult to replicate. A problem now widely prevalent is that all journals like to publish exciting new claims but few journals, and certainly none of the major journals, are keen to publish evidence that an exciting new claim was flawed. Journals are particularly unwilling to publish findings which suggest articles that they published previously were flawed or false.

Another driver of the crisis in reproducibility of published research is that authors are under pressure to publish large numbers of research articles which, for purposes of career advancement, make the most sensational claims that the authors can get away with (the so-called ‘publish or perish’ phenomenon). Journals contribute to the problem because they accept articles entirely on trust, and in recent years newer online journals have been launched which will publish almost anything submitted to them for a fee.

A major problem is that research and publication rely on a system of trust that has long been outdated and no longer applies in any other field. It is a relic of a bygone age of ‘gentlemen scientists’, whose reputations with their peers rested on the validity of their findings. Before publication such researchers would repeat experiments to make sure their findings were reproducible and valid. Universities could trust them to police themselves and, as a rule, editors could trust them to submit honest data. Times have changed: as noted above, career advancement and livelihood nowadays depend on publications. Simple metrics are used, such as how many papers an individual has published, with thresholds which many great scientists and most Nobel laureates did not reach. Rather like corrupt electoral systems which flourish when supporters “vote early and vote often”, the key to advancement in academia has become to publish early (because being first to publish a finding is crucial) and to publish often (because quantity is valued more than quality). Hasty publication increases the possibility that findings cannot be replicated.

Some universities require employees to publish a certain number of articles each year in journals with a particular Impact Factor or greater. That can result in authors making their claims more sensational in order to get them into the journals with high Impact Factors. The journals with the highest Impact Factors need articles that make sensational claims so that researchers read and cite the articles; otherwise their Impact Factor will decrease. Journals also want trials of drugs and medical devices with positive outcomes because these induce the manufacturers of the drugs and devices to purchase reprints and advertising linked to the articles.

Although editors and publishers know that pressure to publish creates the temptation to make false and exaggerated claims, editors do not insist that authors provide any evidence that the research was actually performed or, if it was done, that the data are reported accurately. Peer review has been shown to be poor at detecting research fraud – a fact obvious from the known extent of research fraud and the exponential increase in numbers of retractions. Retractions underestimate the extent of the problem because it is certain that some fraud goes undetected and much detected research fraud is not retracted by journals.

The willingness of editors to publish medical articles without any evidence that the data reported are true contrasts with their insistence on seeing evidence before publishing articles that might result in their journal being sued for libel. It is well documented that many medical journals, including the Lancet and BMJ, apply a double standard: they publish medical claims without any verification of veracity, yet they and their lawyers minutely check supporting documents to confirm every sentence in articles that might expose the journal to a risk of litigation. We are not aware of a single instance when a medical journal insisted on seeing the full, original data behind a research paper, even when false or inaccurate data would put the lives of patients at risk.

A worrying aspect is that nobody effectively polices the problems of poor quality and misconduct in research. There is considerable evidence that academic institutions and journals help cover up all types of research misconduct in order to protect their reputations. In some areas of medical care this may be seriously harmful or even fatal. It might be thought that the General Medical Council (GMC) would play a key role in combatting fraud in medical research, but experience shows that it consistently tries to avoid dealing with cases of research misconduct, and it shows no inclination to investigate allegations of conspiracy to commit misconduct. On one occasion when the GMC did investigate, a UK institution refused its request to see the data; the GMC took the view that “they could find no evidence of research misconduct” and this was seen by the institution as exoneration. Most sensible people would take the contrary view, i.e. that inability or unwillingness to produce the data suggests there is no evidence that the research was honest.

The Committee on Publication Ethics (COPE) gave the following advice to an editor considering publications by a doctor who had been found to have published many dishonest papers: “(the journal) must not assume that evidence of past misconduct always indicates misconduct in other cases.” However, it is surely prudent, in such a case, not to assume honesty and thus to investigate harder! To compare the problem with one in a different area of safeguarding, it would be considered irresponsible to leave a child with a person with convictions for paedophilia because one believed that “evidence of past misconduct does not always indicate misconduct in other cases”, so why take the risk?

Grant-awarding bodies do not want to draw attention to the fact that they have funded flawed research that defied replication, just as politicians are unwilling to admit that they wasted money taken from taxpayers. All the interested parties play down the extent of failure to replicate.

Another concern is that the large number of known failures to replicate research considerably underestimates the true amount of research that cannot be replicated. The reasons include:

  1. No attempts have been made to replicate many studies, particularly large studies. The reason is that if one treatment has been found to confer survival or symptomatic benefits to patients when compared to a second treatment or placebo, it is ethically difficult to justify randomisation of patients in replication experiments when “the benefit of one treatment is already known”. In addition, there is often difficulty funding such replication experiments. For example, if a healthcare product is found to have benefits for patients in a clinical trial, the corporation that manufactures and markets the product is unlikely to want to fund a replication study that might show that the earlier findings were flawed.
  2. It is very difficult to publish reports of failure to replicate experiments or trials. Journals are not interested in publishing them because no product manufacturer will buy reprints showing that their product is not effective. Faced with difficulty in getting replication studies published, researchers may decide that, under the circumstances, writing them up and trying to publish them is a waste of time when they could do other things.
  3. There are examples of industrial sponsors trying to prevent publication of studies that failed to replicate earlier reports of efficacy of their drug or device. For example, one pharmaceutical corporation persuaded three groups not to publish their three separate failures to replicate an earlier favourable study showing the corporation’s drug was effective. The corporation offered a bribe and then made legal threats in an unsuccessful attempt to get a fourth group to withhold a report of their failure to replicate the favourable findings of the original study. When the fourth group of investigators published their failure to replicate the findings, the three earlier, suppressed studies showing failures to replicate came to light. The drug was banned world-wide but not before more than 1400 life-threatening adverse events, some fatal, had been reported. It should be easier to report replication studies. Indeed, it is poor research practice not to report them, and it is dangerous for trials to remain unreported; this is the basis of the campaigning of AllTrials (https://www.alltrials.net/).
  4. It is also important to consider the adverse effect of fraudulent studies that claim to have “replicated” earlier findings but which are entirely false. This happens because, from the fraudster’s perspective, the major problem with research fraud is that it can be detected if the fraudster fabricates data that can be proved to be wrong. One way around that is to fabricate data that the fraudster believes to be true because they accord with previously published research. The risk is that the earlier data were also fabricated. There are examples in medical research of fraudsters publishing data that they claimed “replicated” earlier research which was also fraudulent. As a result, the greater the body of false data that is built up, the more difficult it becomes for anyone to challenge it or even to consider examining it.


Suggested remedies


(September 2021)