SCAN, gSCAN, ReaSCAN: Benchmarking Models' Compositional Generalization Skills
- B. M. Lake and M. Baroni, “Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks,” arXiv:1711.00350 [cs], Jun. 2018, Accessed: Jan. 06, 2022. [Online]. Available: http://arxiv.org/abs/1711.00350
- L. Ruis, J. Andreas, M. Baroni, D. Bouchacourt, and B. M. Lake, “A Benchmark for Systematic Generalization in Grounded Language Understanding,” arXiv:2003.05161 [cs], Oct. 2020, Accessed: Jan. 06, 2022. [Online]. Available: http://arxiv.org/abs/2003.05161
- Z. Wu, E. Kreiss, D. C. Ong, and C. Potts, “ReaSCAN: Compositional Reasoning in Language Grounding,” arXiv:2109.08994 [cs], Sep. 2021, Accessed: Dec. 28, 2021. [Online]. Available: http://arxiv.org/abs/2109.08994
Humans can easily understand novel compositions in natural language. However, modern language models, even at much larger scale, still struggle to interpret such compositions.
SCAN, gSCAN, and ReaSCAN are a series of papers that benchmark models' ability at compositional generalization.
- SCAN: compositional generalization framed as a sequence-to-sequence task
- gSCAN: the same task grounded in a grid world
- ReaSCAN: an improved gSCAN
Papers
SCAN
SCAN: a Simplified version of the CommAI Navigation tasks.
Setup

Figure 1: SCAN Task Setup: Input command | Output actions
For a learner:
- Input: commands in simplified natural language
- Output: a sequence of actions
Note
Each command is unambiguously associated with a single action sequence. Thus, the task can be treated as a supervised sequence-to-sequence semantic parsing problem.
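To make the setup concrete, here is a toy interpreter for a subset of the SCAN grammar. This is an illustrative sketch based on the interpretation rules described in the paper, not the official data-generation code.

```python
# Toy interpreter for part of the SCAN grammar (illustrative sketch only).

PRIMITIVES = {"walk": "WALK", "look": "LOOK", "run": "RUN", "jump": "JUMP"}
TURNS = {"left": "LTURN", "right": "RTURN"}

def interpret(command: str) -> list[str]:
    """Map a SCAN-style command to its action sequence."""
    words = command.split()

    # Conjunctions bind loosest: "x and y" / "x after y" (after reverses order).
    if "and" in words:
        i = words.index("and")
        return interpret(" ".join(words[:i])) + interpret(" ".join(words[i + 1:]))
    if "after" in words:
        i = words.index("after")
        return interpret(" ".join(words[i + 1:])) + interpret(" ".join(words[:i]))

    # Repetition: "x twice" / "x thrice".
    if words[-1] == "twice":
        return interpret(" ".join(words[:-1])) * 2
    if words[-1] == "thrice":
        return interpret(" ".join(words[:-1])) * 3

    # Directions: "x left", "x opposite left", "x around left", ...
    if words[-1] in TURNS:
        turn = TURNS[words[-1]]
        rest = words[:-1]
        if rest and rest[-1] == "opposite":
            return [turn, turn] + interpret_verb(rest[:-1])
        if rest and rest[-1] == "around":
            return ([turn] + interpret_verb(rest[:-1])) * 4
        return [turn] + interpret_verb(rest)

    return interpret_verb(words)

def interpret_verb(words: list[str]) -> list[str]:
    """Interpret a bare verb; 'turn' alone produces no extra action."""
    if words == ["turn"]:
        return []
    return [PRIMITIVES[words[0]]]

# Examples:
# interpret("jump twice")        -> ['JUMP', 'JUMP']
# interpret("walk left")         -> ['LTURN', 'WALK']
# interpret("jump around right") -> ['RTURN', 'JUMP'] * 4
# interpret("walk after run")    -> ['RUN', 'WALK']
```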
Selected Results
Generalizing Composition Across Primitive Commands
Scheme
Train on a held-out primitive command in isolation (e.g., "jump") together with all primitive and composed commands for the other actions; test on all composed commands that contain the held-out primitive. A minimal sketch of such a split follows.
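The sketch below shows one way such an "add primitive" split can be constructed; `all_pairs` and the function name are made up for illustration and are not the released dataset code.

```python
# Minimal sketch of an "add jump"-style split (illustrative only).
# `all_pairs` is assumed to be a list of (command, action_sequence) tuples.

def add_primitive_split(all_pairs, held_out="jump"):
    train, test = [], []
    for command, actions in all_pairs:
        words = command.split()
        if held_out not in words:
            train.append((command, actions))   # all commands for the other actions
        elif words == [held_out]:
            train.append((command, actions))   # the held-out primitive in isolation
        else:
            test.append((command, actions))    # composed commands using the primitive
    return train, test
```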
Experiment & Discussion
Many interesting results; see the paper for details :)
- Mainly, while current models perform well on naive generalization tasks (where train and test have similar distributions), they do not seem to learn systematic composition (e.g., twice(x) = x x => twice(y) = y y).
gSCAN
gSCAN: SCAN grounded in a grid world.
Setup

Figure 2: gSCAN Task Setup
For a learner:
- Input: Grid world + Commands
- Output: A sequence of actions
Note
Clever dataset splits allow this setup to evaluate 8 types of generalization:
- Compositional generalization
  - novel object property combinations ('red square')
  - novel directions ('a target to the south-west')
  - novel contextual references ('small yellow circle')
  - novel adverbs ('pull cautiously')
  - novel compositions of actions and arguments ('pull' + 'light'/'heavy' object)
- Length generalization: train on shorter action sequences, test how the model generalizes to longer ones
How modifiers are modeled:
- light/heavy: if an object is heavy, it needs to be pushed twice to move to the next cell
- cautiously: check left and right before every move
- while spinning: spin around before every move
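The sketch below illustrates how such modifiers can be applied to a plan of single-cell moves. The exact target action tokens (e.g., how "check left and right" is spelled out) are assumptions here; see the gSCAN paper for the precise target sequences.

```python
# Illustrative sketch of adverb-style modifiers rewriting a plan of moves.
# Exact action tokens are assumptions, not the official gSCAN targets.

def apply_modifier(moves, modifier=None):
    """Expand a list of single-cell moves according to an adverb modifier."""
    if modifier is None:
        return list(moves)
    if modifier == "cautiously":
        # Check left and right before every move (token spelling assumed).
        look_both_ways = ["turn left", "turn right", "turn right", "turn left"]
        return [a for move in moves for a in look_both_ways + [move]]
    if modifier == "while spinning":
        spin = ["turn left"] * 4  # a full spin before every move (assumed)
        return [a for move in moves for a in spin + [move]]
    raise ValueError(f"unknown modifier: {modifier!r}")

def push_plan(cells, weight="light"):
    """A heavy object takes two 'push' actions to move one cell."""
    per_cell = 2 if weight == "heavy" else 1
    return ["push"] * (per_cell * cells)

# Examples:
# apply_modifier(["walk"] * 3, "cautiously")
# push_plan(2, weight="heavy")  -> ['push', 'push', 'push', 'push']
```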
ReaSCAN
ReaSCAN: an improved gSCAN.
Motivation
Known weaknesses in gSCAN:
- The language is constrained enough that preserving the linguistic structure of the command is not required
- Most distractor objects in the grid world are not relevant for accurate understanding
- In many cases, not all modifiers are required for successful navigation
ReaSCAN is motivated to address these issues.
Setup

Figure 3: ReaSCAN Task Setup
ReaSCAN adds complexity to the problem via more sophisticated distractor sampling strategies and more elaborate input commands.
For details on how the benchmark is generated, refer to the paper.
Note
The input text is generated by rules, so it is still not comparable to natural language.