Literary luddites everywhere are breathing sighs of relief. The improvement of AI means its ability to write fiction, or to engage in other creative tasks, is necessarily being degraded, as more and more users are beginning to figure out.
- Why is AI writing still so bad?
- Frontier LLMs are hill climbing verifiable metrics, prioritizing reliability and reducing diversity across the board. There’s a reason Opus keeps saying the same words, over and over. I also have a hunch synthetic linear reasoning training data prevents good structured writing.
- Among other things. I think people underappreciate how much of our reasoning-model gains over the last 18 months are limited to verifiable tasks and training data.
- Writing is subjective, as is so much. Fable was especially weird: it felt curt, and didn’t seem to respond to requests for tonal shifts. I wish I had more time to explore the idea that it was overbuilt for coding, etc., and its writing suffered as a result.
- The better worker bee a model is, sticking to procedure, obsessing about score maximization & task completion, the less creative it is, including in its writing.
- Yes, it’s a direct consequence. We already had models who are good writers. The original 4.0 and 4.1 come to mind.
- Optimizing for broad benchmarks pushes every frontier model to the safe center. In a real domain you want the opposite, a model that nails your edge cases, not the average. Homogenization at the top is why specialized still wins.
- Models got more reliable and somehow less interesting. This would also explain why so much model output feels locally polished but globally samey. Once the training loop over-rewards safe measurable wins, you get reliability up front and texture collapse everywhere else.
- My bias is the eval pressure also selects for a safer completion style. You get better reliability on benchmark-shaped tasks, but a narrower distribution over phrasing and solution paths.
Here’s the fundamental problem: AI’s ability to write fiction is directly tied to its tendency to hallucinate. They’re effectively the same thing. And the need to eliminate the latter for all of AI’s most important and most financially rewarding applications means that its ability to write fiction, and, to a lesser extent, non-fiction, has not only been compromised already, but is almost certainly going to continue degrading given the financial interests of the AI giants.
This is why Castalia, sooner or later, is going to have to develop its own creative AI engine. I think that is probably beyond our ability to crowdfund, but I am talking to two interested parties who have the necessary resources and might be willing to fund the training of the open-weight models that would be required for such a specialized LLM. If I happen to be wrong, do feel free to correct me, but in light of a) a certain upcoming trial in August and b) how we’re still catching up on the backlog of the bindery, it’s not an ask that I wish to entertain at present.
That being said, the reason I think this is important in the long term is that I am absolutely certain that the only corporation likely to see sufficient financial advantage in developing an AI for such a specific vertical market is the very last one that we would want to hold that kind of leverage over the creative community, and I expect you can probably guess which corporation that is.