Emergence

Unlimited priors or unlimited training data allow experimenters to “buy” arbitrary levels of skills for a system, in a way that masks the system’s own generalization power.

Adding ever more data to a local-generalization learning system is certainly a fair strategy if one’s end goal is skill on the task considered, but it will not lead to generalization beyond the data the system has seen.
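
To make the point about local generalization concrete, here is a minimal sketch. It assumes a 1-nearest-neighbor regressor as a stand-in for a local-generalization learner and a hypothetical target function f(x) = x^2; neither choice comes from the quoted text, and the names `f`, `nearest_neighbor_predict`, and `run` are illustrative only. Training points are drawn from [0, 1]; the error is measured both inside that range and on [2, 3], which the learner never sees.

```python
import random


def f(x):
    # Hypothetical target function, chosen only for illustration.
    return x * x


def nearest_neighbor_predict(train, x):
    # Return the label of the training point closest to x (1-NN).
    _, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y


def mean_abs_error(train, xs):
    # Average absolute error of the 1-NN predictor over the query points.
    return sum(abs(nearest_neighbor_predict(train, x) - f(x)) for x in xs) / len(xs)


def run(n_train, seed=0):
    rng = random.Random(seed)
    # Training inputs are sampled uniformly from [0, 1] only.
    train = [(x, f(x)) for x in (rng.uniform(0.0, 1.0) for _ in range(n_train))]
    inside = [i / 100 for i in range(101)]         # queries within the training range
    outside = [2.0 + i / 100 for i in range(101)]  # queries far outside it
    return mean_abs_error(train, inside), mean_abs_error(train, outside)


for n in (10, 100, 1000):
    in_err, out_err = run(n)
    print(f"n={n:5d}  inside error={in_err:.4f}  outside error={out_err:.4f}")
```

Under these assumptions, the inside error shrinks as more data is added, while the outside error stays large: every prediction is at most 1, whereas the true values on [2, 3] lie between 4 and 9. More data buys skill on the sampled range, not generalization beyond it.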

Let's consider whether this holds for Large Language Models (LLMs) and World Models.

In the case of LLMs, a simple task such as predicting the next word can force the model to learn facts about the world and to perform non-trivial tasks. For example, an LLM can do language translation, question answering, mathematics, and more.
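
As a rough illustration of what the next-word prediction task looks like, here is a toy sketch. It assumes a bigram counter, which is far simpler than an LLM and shows only the training objective, not emergent behavior; the corpus and the names `train_bigram` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    # Count how often each word follows each other word in the corpus.
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    # Predict the most frequent follower of `word`, or None if it was never seen.
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = (
    "paris is the capital of france . "
    "rome is the capital of italy . "
    "paris is in france ."
)
model = train_bigram(corpus)
print(predict_next(model, "capital"))  # prints "of"
```

Even in this toy setting, predicting the next word well requires absorbing regularities of the text. The claim in the notes is that, at LLM scale, the same objective pushes the model to capture facts about the world.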

Emergent Abilities of Large Language Models