Neural Module Network for Text: New Module Design & More Supervision

  • N. Gupta, K. Lin, D. Roth, S. Singh, and M. Gardner, “Neural Module Networks for Reasoning over Text,” Dec. 2019, Accessed: Jan. 16, 2022. [Online]. Available:


Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.

Neural Module Networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, and they perform well on synthetic visual QA domains. However, these models have proven challenging to learn for non-synthetic questions on open-domain text.

This paper extends NMNs by:

  1. introducing modules that reason over a paragraph of text and perform symbolic reasoning
  2. proposing an unsupervised auxiliary loss to help extract arguments associated with the events in text
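To make "a question parsed as an executable program composed of modules" concrete, here is a minimal sketch. It is not the paper's implementation: the module names `find` and `count`, the toy paragraph, and the nested-tuple program format are all assumptions for illustration, and the modules are hand-written functions rather than learned networks.

```python
from typing import Callable, Dict, List

Attention = List[float]  # one weight per paragraph token (a "soft subset")

# Hypothetical toy paragraph, already tokenized.
PARAGRAPH = ["Smith", "kicked", "a", "goal", "then", "Jones", "kicked", "a", "goal"]


def find(query: str) -> Attention:
    # Toy stand-in for a learned module: hard 0/1 attention on exact matches.
    return [1.0 if token == query else 0.0 for token in PARAGRAPH]


def count(attention: Attention) -> float:
    # Toy stand-in for a learned module: sum the attention mass.
    return sum(attention)


MODULES: Dict[str, Callable] = {"find": find, "count": count}


def execute(program):
    """Recursively run a program written as nested tuples: (module, arg, ...)."""
    if isinstance(program, str):
        return program  # a leaf argument, e.g. a question word
    name, *args = program
    return MODULES[name](*[execute(arg) for arg in args])


# "How many times did someone kick?" as a hypothetical program.
answer = execute(("count", ("find", "kicked")))
```

In the actual model, each module would be differentiable and the program would be predicted by a parser rather than written by hand.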


Figure 1: Model Overview



Figure 2: Description of Modules


Meaning of Terms:

  • Question (Q) and Paragraph (P) attentions: soft subsets of relevant tokens in the text.
  • Number (N) and Date (D): soft subsets of the unique numbers and dates in the passage.
  • Count Number (C): a count value, represented as a distribution over the supported count values (0-9).
  • Time Delta (TD): a value amongst all possible unique differences between dates in the paragraph. In this work, we consider differences in terms of years.
  • Span (S): span-type answers as two probability values (start/end) for each paragraph token.
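Two of these types can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are hypothetical, and taking absolute differences over unordered pairs of dates is an assumption, since the notes above only say "unique differences between dates" in years.

```python
from itertools import combinations
from typing import List


def expected_count(count_dist: List[float]) -> float:
    """Expected value of a Count Number (C) distribution over the values 0-9."""
    assert len(count_dist) == 10
    return sum(value * prob for value, prob in enumerate(count_dist))


def time_delta_support(years: List[int]) -> List[int]:
    """Possible Time Delta (TD) values for the dates in a paragraph.

    Assumption: absolute year differences over unordered pairs of distinct dates.
    """
    return sorted({abs(a - b) for a, b in combinations(set(years), 2)})


# A count distribution that puts most of its mass on 3.
count_dist = [0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
most_likely_count = max(range(10), key=lambda value: count_dist[value])

# Years mentioned in a hypothetical paragraph (1995 appears twice).
support = time_delta_support([1990, 1995, 2001, 1995])
```

A Time Delta output would then be a distribution over the entries of `support`, in the same way that a Count Number is a distribution over 0-9.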

Auxiliary Supervision

Training all the modules and the question parser jointly in an end-to-end fashion is extremely hard. Auxiliary supervision on intermediate steps is added to mitigate the issue.


  • Unsupervised Auxiliary Loss


  • Question Parse Supervision

    “In order to bootstrap the parser, we analyze some questions manually and come up with a few heuristic patterns to get program and corresponding question attention supervision (for modules that require it) for a subset of the training data.”

  • Intermediate Module Output Supervision

    We provide heuristically-obtained noisy supervision for the output of the find-num and find-date modules for a subset of the questions (5%) for which we also provide question program supervision.
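One plausible way to turn such noisy labels into a training signal is sketched below. This is an assumption, not the loss stated in these notes: it treats the find-num output as a distribution over the unique numbers in the paragraph and penalizes the negative log of the probability mass that falls on the heuristically chosen numbers. The function name and the example values are hypothetical.

```python
import math
from typing import List, Set


def module_output_loss(num_dist: List[float], gold_indices: Set[int],
                       eps: float = 1e-12) -> float:
    """Negative log of the probability mass on the (noisy) gold numbers.

    Assumed form of the loss, for illustration only.
    """
    mass = sum(num_dist[i] for i in gold_indices)
    return -math.log(mass + eps)  # eps guards against log(0)


# Hypothetical find-num output over three unique numbers in a paragraph.
num_dist = [0.1, 0.7, 0.2]

loss_single = module_output_loss(num_dist, {1})    # one heuristic label
loss_noisy = module_output_loss(num_dist, {1, 2})  # two candidate labels
```

Summing the mass over all candidate indices lets a noisy label set with several candidates still give a useful gradient, since the module only needs to place its probability somewhere inside the set.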