Markku has helpfully shown the good people of AI Central how to set up their own local version of DeepSeek, which lets you control the data it uses.
Until recently, training an AI on a set of research data was so resource-intensive that it was entirely out of reach for home users, even for the smaller ("distilled") models meant for ordinary gaming computers. These days, however, there is a technique called Retrieval-Augmented Generation (RAG for short) that achieves something very close to the effect of training in a tiny fraction of the time. The trade-off is that the AI's understanding of the data is not as deep, and the data has to be processed again every time the AI is launched.
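To make that concrete, here is a minimal sketch of the RAG idea in Python: you embed your documents once, then at question time you retrieve the passages closest to the question and paste them into the prompt before the model ever sees it. The sentence-transformers package, the embedding model name, and the sample passages are my illustrative choices here, not part of Markku's setup.

```python
# Minimal RAG sketch: embed passages once, retrieve the best matches per
# question, and stuff them into the prompt. Illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# Your research data, split into passages (real setups chunk whole books).
passages = [
    "Chapter 1: The study covers sediment samples from 1995 to 2005.",
    "Chapter 2: Measurement methodology and calibration procedures.",
    "Chapter 3: Results show a strong seasonal pattern in the data.",
]
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the top_k most relevant passages and prepend them."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec          # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(passages[i] for i in best)
    return f"Use only this context to answer:\n{context}\n\nQuestion: {question}"

print(build_prompt("What seasonal patterns were found?"))
```

The embedding step at the top is the per-launch processing mentioned above; the retrieval itself is nearly instant once the vectors exist.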
On an average gaming PC with an NVIDIA GPU from the RTX 3000 series or newer, you can expect that processing to take about 10 minutes, assuming you use a modest 7-billion-parameter model. Parameters can be thought of as virtual brain cells. With a better computer, 14 billion is also realistic, especially if you are asking only a few but important questions. Since you are having the AI focus on a set of data that is extremely limited compared to the cloud-based AIs, the half-trillion parameter counts you normally hear about aren't important; you need those only when the AI essentially has to know the entire contents of the internet. For one set of books, 7 to 14 billion is plenty. And if you choose to get serious about using a locally installed AI, you'd install it on a dedicated Linux server and keep it running constantly, which shrinks that per-launch cost to essentially nothing.
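For the question-asking side, a hedged sketch using the ollama Python client against a locally pulled 7B DeepSeek distill might look like the following. The model tag and the prompt are my assumptions; swap in whatever model you actually installed.

```python
# Sketch: ask a locally running 7B DeepSeek distill a question via Ollama.
# Assumes you have pulled a model such as "deepseek-r1:7b" beforehand.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Summarize chapter 3 of my data."}],
)
print(response["message"]["content"])  # attribute access (response.message.content) also works
```

In practice you would pass the output of a retrieval step like build_prompt() above as the message content, so the 7B model answers from your books rather than from its general training.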
It’s not for everyone, to be sure. Not yet, anyhow. But I’m sure there are a few hardcore programmers here who are more than up to the challenge, if they’re interested.