- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein, “Neural Module Networks,” Nov. 2015, Accessed: Jan. 16, 2022. [Online]. Available: https://arxiv.org/abs/1511.02799v4
Visual question answering is compositional in nature. This paper seeks to exploit the representational capacity of deep networks and the compositional linguistic structure of questions.
This approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate modular networks.
The networks are built from different composable modules, one for each sub-task. Each module is described by a TYPE and an INSTANCE:
- TYPE is the high-level module type, such as attention or classification.
- INSTANCE is the particular instance that this module considers.
This paper introduces several types of modules for visual question answering (VQA). Adding new ones should also be easy.
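To make the TYPE/INSTANCE idea concrete, here is a minimal sketch of how a module description could be represented. The names `ModuleSpec`, `KNOWN_TYPES`, and `register_type`, as well as the type names `attend`, `classify`, and `measure`, are assumptions chosen for illustration and are not taken from the paper's code. The sketch only models the bookkeeping, not the neural computation.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical registry of module types; the names are illustrative only.
KNOWN_TYPES = {"attend", "classify"}


def register_type(name: str) -> None:
    """Make a new module type available (a one-line change in this sketch)."""
    KNOWN_TYPES.add(name)


@dataclass(frozen=True)
class ModuleSpec:
    type: str                            # high-level module type, e.g. "attend"
    instance: str                        # what this module considers, e.g. "dog"
    args: Tuple["ModuleSpec", ...] = ()  # modules whose outputs feed this one

    def __post_init__(self) -> None:
        # Reject types that were never registered.
        if self.type not in KNOWN_TYPES:
            raise ValueError(f"unknown module type: {self.type}")

    def __str__(self) -> str:
        head = f"{self.type}[{self.instance}]"
        if not self.args:
            return head
        return head + "(" + ", ".join(str(a) for a in self.args) + ")"


# A classification module over "color" that consumes an attention over "dog".
spec = ModuleSpec("classify", "color", (ModuleSpec("attend", "dog"),))
print(spec)
```

In this sketch, supporting a new kind of module only requires calling `register_type` with its name, which mirrors the note that adding modules should be easy.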
From Questions to Assembled Networks
Question String => via Stanford Parser => Symbolic Dependency Representation of Question => via assembling => Neural Module Network built on the fly
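The assembling step of this pipeline can be sketched as a small recursive function. This is a toy illustration under stated assumptions: the symbolic representation is written by hand as nested `(head, children)` tuples rather than produced by the Stanford Parser, and the mapping rule (leaves become `attend`, the root becomes `classify`, other internal nodes become `combine`) is a simplification assumed here, not the paper's exact procedure. The function name `assemble` is hypothetical, and the output is a layout string rather than an actual network.

```python
from typing import List, Tuple

# A symbolic node is (head word, list of child nodes). Hand-written here;
# in the real pipeline it would be derived from a dependency parse.
Node = Tuple[str, List["Node"]]


def assemble(node: Node, is_root: bool = True) -> str:
    """Turn a symbolic representation into a module layout string."""
    head, children = node
    if not children:
        # Leaf: look at the named thing in the image.
        return f"attend[{head}]"
    # Assemble sub-layouts first, then wrap them in this node's module.
    args = ", ".join(assemble(child, is_root=False) for child in children)
    kind = "classify" if is_root else "combine"
    return f"{kind}[{head}]({args})"


# "What color is the truck?" written by hand as color(truck).
layout = assemble(("color", [("truck", [])]))
print(layout)  # classify[color](attend[truck])
```

Because the layout is computed per question, two questions with different structure yield different layouts, which is what "built on the fly" refers to.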