The Decay Function of Professional Science

An excerpt from HARDCODED: AI and The End of the Scientific Consensus:

How long does it take for a scientific field to fill with garbage?

The question sounds polemical, but it has a precise mathematical answer. Given a field’s publication rate, its replication rate, its correction mechanisms, and—critically—its citation dynamics, we can model the accumulation of unreliable findings over time. The result is not encouraging.

The key insight comes from a 2021 study by Marta Serra-Garcia and Uri Gneezy published in Science Advances. They examined papers from three major replication projects—in psychology, economics, and general science journals including Nature and Science—and correlated replicability with citation counts. Their finding was striking: papers that failed to replicate were cited significantly more than papers that replicated successfully.

Not slightly more. Sixteen times more per year, on average.

In Nature and Science, the gap was even larger: non-replicable papers were cited 300 times more than replicable ones. And the citation advantage persisted even after the replication failure was published. Only 12% of post-replication citations acknowledged that the original finding had failed to replicate. The other 88% cited the discredited paper as if it were still valid.

This is not a bug in the scientific literature. It is a feature of the incentive structure. “Interesting” findings—surprising results, counterintuitive claims, dramatic effects—attract attention, generate citations, and advance careers. They are also, precisely because they are surprising, more likely to be false positives or artifacts of methodological error. The system selects for interestingness, and interestingness is inversely correlated with reliability.

The Serra-Garcia and Gneezy finding transforms the replication crisis from a problem of individual bad actors into a problem of system dynamics. It’s not just that bad papers get published. It’s that bad papers get amplified. They accumulate citations. They enter textbooks. They shape the training of the next generation of researchers. They become, in effect, the curriculum.

Let’s build the model.

Define the following variables for a scientific field:

S(t) = the stock of “active” papers at time t (papers published in the last N years that are still being cited)

p(t) = the proportion of active papers that are unreliable (would fail replication if tested)

B(t) = the rate at which new unreliable papers enter the literature

G(t) = the rate at which new reliable papers enter the literature

C = the correction rate (the fraction of unreliable papers that are retracted, corrected, or otherwise removed from active circulation per year)

α = the citation amplification factor for unreliable papers relative to reliable ones

From the Serra-Garcia and Gneezy data, α ≈ 16 for typical fields and can reach 300 for high-profile journals. The correction rate C is extremely low: retraction rates are approximately 11 per 10,000 papers as of 2022, and retractions capture only a tiny fraction of unreliable papers. Elisabeth Bik’s analysis of 20,000 papers found that approximately 2% contained deliberately manipulated images, a rate roughly eighteen times higher than the retraction rate.
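To make the bookkeeping concrete, here is a minimal Python sketch of these variables. It is an illustration, not the book’s own model: the inflow rates B and G are hypothetical placeholders, the one-step update rule is an assumption, and aging out of the active stock is ignored; only C comes from the figures above.

```python
# Minimal discrete-time sketch of the model's state variables.
# Assumptions: B and G are hypothetical yearly inflows; a fraction C of
# the unreliable stock is corrected out each year; papers never age out.

C = 11 / 10_000  # correction rate per year (~11 retractions per 10,000 papers)

def step(reliable, unreliable, G=1_000, B=1_000):
    """Advance the active literature by one year."""
    unreliable = unreliable * (1 - C) + B  # corrections remove a sliver; B flows in
    reliable = reliable + G                # reliable papers simply accumulate
    return reliable, unreliable

reliable, unreliable = 10_000, 10_000      # start at p = 0.5
for year in range(10):
    reliable, unreliable = step(reliable, unreliable)

p = unreliable / (reliable + unreliable)
print(f"p after 10 years: {p:.3f}")        # ~0.498: corrections barely dent p
```

Even after a decade of corrections at the observed rate, the unreliable share of the literature is essentially unchanged.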

Now consider how new researchers are trained.

A graduate student entering a field reads the literature. They learn what questions are interesting, what methods are appropriate, what findings are established. They calibrate their sense of “what is true in this field” against the papers they encounter. Crucially, they encounter papers in proportion to how often those papers are cited. A paper with 1,000 citations is more likely to appear in syllabi, review articles, and search results than a paper with 100 citations.

This means the effective training signal is not the proportion of unreliable papers in the literature. It is the citation-weighted proportion. If unreliable papers receive α times more citations than reliable papers, then:

Effective training signal = (p × α) / (p × α + (1 – p))

Consider a field where 50 percent of papers are unreliable (p = 0.5). If unreliable papers are cited 16 times more often (α = 16), then:

Effective training signal = (0.5 × 16) / (0.5 × 16 + 0.5 × 1) = 8 / 8.5 ≈ 0.94

When half the literature is unreliable, 94 percent of the citation-weighted training signal comes from unreliable papers.
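The formula is a one-liner in code. This sketch reproduces the worked example and plugs in the α values quoted earlier:

```python
# Citation-weighted share of the training signal coming from unreliable
# papers: (p * alpha) / (p * alpha + (1 - p)), exactly as derived above.

def effective_signal(p: float, alpha: float) -> float:
    """Fraction of citation-weighted exposure that comes from unreliable papers."""
    return (p * alpha) / (p * alpha + (1 - p))

print(effective_signal(0.5, 16))   # 0.9411...  the worked example above
print(effective_signal(0.2, 16))   # 0.8: a field only 20% unreliable
print(effective_signal(0.5, 300))  # 0.9966...  the Nature/Science regime
```

Note how steep the curve is: even a field that is only 20 percent unreliable delivers a training signal that is 80 percent garbage.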

This is the amplification mechanism. The literature can be 50 percent garbage, but the effective literature (what researchers actually encounter, learn from, and calibrate against) is 94 percent garbage. The citation dynamics concentrate the garbage.

Now what happens when researchers trained on this signal produce new work?
