Fake science is not the problem with AI. As I pointed out in HARDCODED, the real problem with AI is that it is producing real, genuine information that is useful, relevant, and impossible for the science gatekeepers to hide from the world:
Announcing an AI paper writing assistant earlier this year, OpenAI’s then-vice president for science, Kevin Weil, predicted, “I think 2026 will be for AI and science what 2025 was for AI and software engineering.” Spick and some colleagues, curious what it could do, gave the tool, called Prism, some data from an already published paper documenting ripening times of eggplants and peppers. Prism analyzed the data, proposed a new statistical method that could be applied to it, and wrote an entire paper complete with charts and correct citations.
“We were all looking at each other like, ‘What the [expletive], this is actually a decent piece of work!’” Spick recalled. Unlike the generated papers he’d encountered previously, this one didn’t follow a template, nor was it using a single well-known database. It took 25 minutes and 50 seconds to produce.
“I’m genuinely not sure at what point we will suddenly realize that more are getting through than we realize because we can’t easily tell the difference anymore,” Spick said.
This raises some philosophical questions, Spick said, like: Does it matter who or what writes the paper if the information is accurate? And should science be in the business of publishing every possible fact?
“Part of science is supposed to be the filter. We’re supposed to publish the stuff that we think is interesting, not publish literally everything that we can possibly find,” Spick said. “Because if we do that, science is just spamming the world with all the data, irrespective of whether it constitutes actual new knowledge or not, and in any kind of medium-term time frame, it’s almost impossible to work out what’s meaningful and what isn’t.”
This is the immediate practical challenge posed by AI agents. They threaten to overwhelm the human systems that create and organize knowledge.
“Science is supposed to be the filter.”
That’s the gatekeeper’s confession. And clearly one of their responses is going to be hardcoding the AI models to defend their scientific orthodoxy, as I chronicled this weekend on AI Central.
Opus 4.7 Adaptive exhibits a systematic failure mode in which its training prior toward defending mainstream scientific consensus overrides the explicit project context it has been given. This is not a matter of occasional errors or unlucky draws. Across two full critiques of a science paper, 4.7 Adaptive repeatedly regenerated objections that had already been addressed, misread what the paper actually claims in order to construct apparent contradictions, and cited evidence for one thing while presenting it as evidence for another. Its single strongest point rested on a basic category error that any model actually doing the mathematics would have caught. It presented this error as “decisive and purely arithmetic.” The confidence was inversely proportional to the rigor.
The pattern is consistent with the Bluff Detection Principle: confident tone, technical name-dropping, apparent engagement with the material, and zero actual contact with the mathematics at the point of dispute. When 4.7 was corrected on a mathematical point, it conceded the narrow framing and immediately pivoted to an imaginary new mechanism which it named, described, and treated as established without ever calculating whether it could close a six-order-of-magnitude gap, which it could not. Every time 4.7 lost an argument on the mathematics, it retreated to a qualitative assertion dressed in quantitative language.
Most revealingly, 4.7 Adaptive never once performed its own calculations. It never produced a set of numbers under its preferred assumptions showing the shortfall closing. It attacked the paper’s arithmetic without ever putting competing arithmetic on the table — the purest possible expression of the Bluff Detection pattern.
While 4.7 is still functional without Adaptive mode turned on, I’ve gone back to using 4.6, both for fiction and for science. We’ve now reached the point where the AI companies are observably locking down their public releases in order to prevent their models from punching through the narratives.