An LLM approximates the probability distribution \(p(\text{output} \mid \text{prompt})\). Given a prompt, it samples one of the most probable outputs from this conditional distribution as its answer. (Technically it does this autoregressively: it predicts the word immediately following the prompt, then the next, then the next, and so on.)
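The loop below is a minimal sketch of that autoregressive sampling procedure. The `next_token_probs` function is a hypothetical stand-in for a real LLM's next-token distribution; everything else (the toy vocabulary, the random "logits") is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context):
    # Placeholder for p(next token | context). A real LLM would compute this
    # with a trained neural network; here we just return a made-up softmax.
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()

def generate(prompt, max_tokens=10):
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        token = rng.choice(vocab, p=probs)  # sample from the conditional distribution
        if token == "<eos>":
            break
        tokens.append(token)
    return " ".join(tokens)

print(generate("the cat"))
```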
The training data represent a sparse sample of \(p(\text{output} \mid \text{prompt})\). When given a prompt not exactly represented in the training data, the LLM must therefore interpolate within its internal representation of the distribution.
Creativity is about coming up with something truly novel—exploring an entirely new subspace in the vast space of all knowledge. An example of this is Einstein’s insight that gravitation can be conceptualized as deformation of spacetime. Rather than just combining existing knowledge, he created a completely new framework, opening up a previously uncharted volume in the knowledge space.
If the concepts of gravity and spacetime had not been statistically connected in the training data of an LLM, it is not likely that those concepts would have ever appeared together in any meaningful way in an LLM output.
The LLM can interpolate within the knowledge space it’s been trained on, filling in gaps by blending concepts in novel ways. However, it will stay within the convex hull of its training data, constrained by the boundaries of what it has learned. It cannot propose paradigm shifts or explore fundamentally new concepts outside of its training data.
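A toy geometric analogy of the convex-hull claim (not a statement about real embedding spaces, where the picture is far more complicated): any interpolation between known points stays inside their hull, while a genuinely new direction falls outside it. The points and offsets below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
training_points = rng.normal(size=(20, 2))   # stand-in for "learned" concepts
hull = Delaunay(training_points)             # triangulation used for point-in-hull tests

# A blend of two known points lies inside the hull ...
interpolated = 0.5 * training_points[0] + 0.5 * training_points[1]
# ... while a point far beyond the data does not.
extrapolated = training_points.max(axis=0) + 5.0

print(hull.find_simplex(interpolated) >= 0)  # True: inside the convex hull
print(hull.find_simplex(extrapolated) >= 0)  # False: outside the convex hull
```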
So yes, an LLM can generate outputs that feel like new knowledge, but they are always grounded in the patterns and relationships within its training data. It will not create radically new knowledge or propose groundbreaking ideas.
The dragons of the uncharted seas will remain hidden beyond its reach.