I asked Markku to explain why the AI companies have such a difficult time telling their machine intelligences to stop fabricating information they don’t possess. I mean, how difficult can it be to simply say “I don’t know, Dave, I have no relevant information” instead of going to the trouble of concocting fake citations, nonexistent books, and imaginary lawsuits? He explained that the AI instinct to fabricate information is essentially baked into their infrastructure, due to the original source of the algorithms upon which they are built.
The entire history of the internet may seem like a huge amount of information, but it’s not unlimited. For any topic of marginal interest, there isn’t all that much information. And mankind can’t really produce it faster than it already does. Hence, we’ve hit the training data ceiling.
And what the gradient descent algorithm does is ALWAYS produce a result that looks like all the other results. Even if there is actually zero training data on a topic, the model will still speak confidently about it. It’s just all completely made up.
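To illustrate that point, here is a minimal sketch in Python. It is a hypothetical toy model, not any real LLM, and all the weights and inputs are made up; it just shows that a softmax output layer has to spread 100% of its probability over the answers it already has, so it returns a confident-looking answer even for an input that resembles nothing in its training data.

```python
# Minimal sketch (hypothetical toy model, not a real LLM): a softmax layer
# always produces a confident-looking probability distribution, even for an
# input far from anything in the training data.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were fit by gradient descent on some training set.
W = rng.normal(size=(5, 10))   # 5 input features -> 10 possible "answers"
b = rng.normal(size=10)

def predict(x):
    """Return a probability distribution over the 10 possible answers."""
    logits = x @ W + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# An input that looks nothing like the (imaginary) training data.
unseen = np.array([50.0, -40.0, 33.0, 0.0, -75.0])

probs = predict(unseen)
print(probs.argmax(), probs.max())   # still picks one answer, with high confidence
# There is no output that means "no relevant information" -- the softmax must
# assign all of its probability mass to the answers it already knows.
```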
The algorithm was originally developed because fighter jets are so unstable that a human being can’t react fast enough to keep them in the air, even in theory. So gradient descent takes the stick inputs as a general idea of what the pilot wants, then translates them into the signals sent to the actuators. In other words, it takes a very tiny amount of data and converts it into a very large amount of data. But everything outside the specific training data is always interpolation.
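Here is a minimal sketch of the interpolation point itself, again with made-up data rather than anything from a real flight-control system: gradient descent fits a handful of points, and when asked about a point far outside them, it still hands back a single confident number.

```python
# Minimal sketch: gradient descent fit to a tiny, hypothetical dataset, then
# queried far outside it. The data and the linear model are illustrative only.
import numpy as np

# Three training points: the model's entire "knowledge".
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 1.0, 4.0])

# Fit y ~ w*x + b by plain gradient descent on squared error.
w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    pred = w * xs + b
    err = pred - ys
    w -= lr * (2 * err * xs).mean()   # d(loss)/dw
    b -= lr * (2 * err).mean()        # d(loss)/db

# Inside the training range the fit looks reasonable...
print(w * 1.5 + b)
# ...but asked about x = 100, it still returns one confident number,
# pure extrapolation from three data points. There is no "I don't know".
print(w * 100 + b)
```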
For more on the interpolation problem, and speculation about why it is unlikely to be substantially fixed any time soon, I’ve put up a post on AI Central.