- “To Understand Language is to Understand Generalization,” Eric Jang, Dec. 17, 2021. https://evjang.com/2021/12/17/lang-generalization.html (accessed Jan. 12, 2022).
I suggest that the structure of language is the structure of generalization. If language models also capture the underlying structure of generalization vis-à-vis language, then perhaps we can use language models to “bolt generalization” onto non-verbal domains, such as robotics.
NLP goes far beyond being just the “UX layer for robots”. Natural language lets us do everything we can communicate to another person: embed logical predicates, express fuzzy definitions, write precise source code, and even supply knowledge the model did not have ahead of time.
Here’s a prediction (epistemic confidence 0.7): within the next 5 years, we’ll see a state-of-the-art model on a computer vision benchmark that does not involve natural language (e.g. ImageNet classification), yet that model will use knowledge from internet-scale natural language datasets (either by training directly on NLP datasets or indirectly by reusing an existing language model).
“The trouble is that GPT-2’s solution is just an approximation to knowledge, and not a substitute for knowledge itself. In particular what it acquires is an approximation to the statistics of how words co-occur with one another in large corpora—rather than a clean representation of concepts per se. To put it in a slogan, it is a model of word usage, not a model of ideas, with the former being used as an approximation to the latter. Such approximations are something like shadows to a complex three-dimensional world” – Gary Marcus
Compositionality in Language
D. Hupkes, V. Dankers, M. Mul, and E. Bruni, “Compositionality decomposed: how do neural networks generalise?,” arXiv:1908.08351 [cs, stat], Feb. 2020, Accessed: Jan. 12, 2022. [Online]. Available: http://arxiv.org/abs/1908.08351
Language is nothing more than the composition of a discrete set of tokens. How do the smallest units (words) fit together to create new meaning?
Intuition in Robotics:
- Systematicity - Stack blocks in new configurations not seen in training
- Productivity - Stacking more blocks than were ever stacked during training
- Substitutivity - Stacking blocks it hasn’t seen before (e.g. understanding that block color does not affect physical properties)
- Localism - The position of far-away objects does not affect behavior when stacking two blocks that are close together.
- Overgeneralization - Trained on stacking cubes, the robot knows not to stack a cylindrical block identically to the way it would stack a cube.
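The first two tests above can be made concrete as held-out evaluation splits over a toy stacking domain. A minimal sketch (the block colors, shapes, stack heights, and the specific held-out combination are all illustrative assumptions, not from the source):

```python
import itertools

# Illustrative block vocabulary (assumed, not from the source).
COLORS = ["red", "green", "blue"]
SHAPES = ["cube", "cylinder"]

def all_stacks(max_height):
    """Enumerate every stacking task of 2..max_height blocks.

    A task is a tuple of (color, shape) blocks, bottom to top.
    """
    blocks = list(itertools.product(COLORS, SHAPES))
    tasks = []
    for height in range(2, max_height + 1):
        tasks.extend(itertools.product(blocks, repeat=height))
    return tasks

def systematicity_split(tasks, held_out_block=("blue", "cylinder")):
    """Hold out every task containing one block type.

    Tests whether the policy recombines familiar parts ("blue",
    "cylinder") it has only seen in other contexts.
    """
    test = [t for t in tasks if held_out_block in t]
    train = [t for t in tasks if held_out_block not in t]
    return train, test

def productivity_split(tasks, max_train_height=3):
    """Train on short stacks, test on strictly taller ones."""
    train = [t for t in tasks if len(t) <= max_train_height]
    test = [t for t in tasks if len(t) > max_train_height]
    return train, test

tasks = all_stacks(max_height=4)
sys_train, sys_test = systematicity_split(tasks)
prod_train, prod_test = productivity_split(tasks)
```

Substitutivity could be probed similarly, by swapping block colors between train and test and checking that behavior is unchanged.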