Data Augumentation that Enhances Compositionality

  • E. Akyürek and J. Andreas, “Compositionality as Lexical Symmetry,” arXiv:2201.12926 [cs], Jan. 2022, Accessed: Feb. 02, 2022. [Online]. Available: http://arxiv.org/abs/2201.12926

Abstract

Weakness of previous works:

  • Standard deep network models lack the inductive biases needed to generalize compositionally in tasks like semantic parsing, translation, and question answering.

Novelty:

  • A long line of language processing research has operationalized the principle of compositionality as a constraint on model architecture. How can we train unstructured neural sequence models that generalize compositionally? In this paper, we describe how to inject compositional inductive bias into general function approximators by operationalizing compositionality as a constraint on data distributions rather than model architectures

Contributions:

  1. a domain-general framework for compositional modeling that instead formulates compositionality as a constraint on data distributions.
  2. concrete instantiation of this framework as a data augmentation scheme that improves model performance on multiple tasks

Approach

Prelimenaries

Formal Definition of Symmetry

Definition 1. A symmetry of a set \(X\) is a function \(f\) satisfying:

\[ \{f(x): x \in X\}=X \]

That is, applying \(f\) to each element of \(X\) leaves \(X\) unchanged.

Method

Setup

We consider problems defined by a space of possible examples \(\mathcal{X}\), of which a subset of examples \(X\) are well-formed.

We assume each example \(\mathbf{x} \in \mathcal{X}\) is a discrete sequence \(\left[x_{1}, \ldots, x_{n}\right]\), with \(x_{i}\) drawn from a vocabulary \(\Sigma\).

Finally, we assume that well-formedness can be computed by a interpretation function \(\mathcal{I}: \mathcal{X} \rightarrow\{0,1\}\) with \(\mathcal{I}(\mathbf{x})=1\) iff \(\mathbf{x} \in X\).

  • Examples

    • Semantic Parsing

      Examples \(\mathbf{x}\) are pairs \(\left(\mathrm{x}_{\mathrm{NL}}, \mathrm{x}_{\mathrm{LF}}\right)\), where \(\mathrm{x}_{\mathrm{NL}}\) is an sentence, \(\mathrm{x}_{\mathrm{LF}}\) is a logical form, and \(\mathcal{I}\left(\mathrm{x}_{\mathrm{NL}}, \mathrm{x}_{\mathrm{LF}}\right)=1\) iff \(\mathrm{x}_{\mathrm{LF}}\) represents a possible meaning of \(\mathrm{x}_{\mathrm{NL}}\).

    • Paraphrase Detection

      Examples \(\mathbf{x}\) are sentence pairs \(\left(\mathbf{x}^{\prime}, \mathbf{x}^{\prime \prime}\right)\), with \(\mathcal{I}\left(\mathbf{x}^{\prime}, \mathbf{x}^{\prime \prime}\right)=1\) iff \(\mathrm{x}^{\prime}\) and \(\mathrm{x}^{\prime \prime}\) have the same meaning.

Compositionality as Lexical Symmetry

  • Definition of Lexicon

    A lexicon \(\mathcal{L}\) is a pair \((\tau, \epsilon)\) where \(\tau\) : \(\Sigma \rightarrow \mathcal{T}\) assigns each symbol a type from some set \(\mathcal{T}\), and \(\epsilon: \Sigma \times \Sigma \rightarrow\{0,1\}\) is a relation indicating which symbols are semantically equivalent.

  • Definition of Lexicon Abstraction

    Denote the lexical abstraction \(\mathcal{L}(\mathbf{x})=(\tau(\mathbf{x}), \epsilon(\mathbf{x}))\) where \(\tau(\mathbf{x})\) is a vector with \(\tau(\mathbf{x})_{i}=\tau\left(x_{i}\right)\) and \(\epsilon(\mathbf{x})\) is a matrix with \(\epsilon(\mathbf{x})_{i j}=\) \(\epsilon\left(x_{i}, x_{j}\right)\) (see Fig. 2). Then we say that the interpretation function \(\mathcal{I}\) (or equivalently the set \(X\) ) is $\mathcal{L}$-compositional iff we can write \(\mathcal{I}(\mathbf{x})=\mathcal{C}(\mathcal{L}(\mathbf{x}))\) for some composition procedure \(\mathcal{C}\). In other words, \(\mathcal{I}\) (and \(X\) ) are compositional if they can be defined purely in terms of the types of and relations between the symbols in \(\mathrm{x}\), without any information about the identity of the symbols themselves.

    • Examples

      • Semantic Parser

        Figure 1: Semanic Parser Example

        Figure 1: Semanic Parser Example

        Given an input, if we can build up the abtract by type, and still fully determine the well-formness, compositionality exists.

  • Definition of Homomorphism

    A function \(f\) is a homomorphism of \(\Sigma\) with respect to \(\mathcal{L}\) (an " \(\mathcal{L}\) -homomorphism") if:

    \begin{gathered} \forall x \in \Sigma: \tau(x)=\tau(f(x)) \\
    \forall x, x^{\prime} \in \Sigma: \epsilon\left(x, x^{\prime}\right)=\epsilon\left(f(x), f\left(x^{\prime}\right)\right) \end{gathered}

    \(f\) “preserves the structure” of \(\mathcal{L}\) : it replaces symbols with other symbols of the same type, and ensures that pairs of symbols maintain equivalence relationships. An example is depicted in Fig. 1; note that both the words yellow and green and the corresponding meanings must be swapped in order to satisfy Eq. 3 .

  • Proposition

    Proposition 1. If \(X\) is $\mathcal{L}$-compositional, \(f\) is an $\mathcal{L}$-homomorphism, and \(\mathrm{x} \in X\), then \(f(\mathrm{x})=\left[f\left(x_{1}\right), \ldots, f\left(x_{n}\right)\right] \in X\)

    Thus, every homomorphism of \(\mathcal{L}\) corresponds to a symmetry of \(X\).

    Proof. From Definition 3 and \(4, \tau(\mathbf{x})=\) \(\tau(f(\mathbf{x}))\) and \(\epsilon(\mathbf{x})=\epsilon(f(\mathbf{x}))\). Then, $$

    \begin{aligned} \mathbb{1}_{[f(\mathbf{x}) \in X]} &=\mathcal{I}(f(\mathbf{x})) \\
    &=\mathcal{C}(\mathcal{L}(f(\mathbf{x}))) \\
    &=\mathcal{C}(\mathcal{L}(\tau(f(\mathbf{x})), \epsilon(f(\mathbf{x})))) \\
    &=\mathcal{C}(\mathcal{L}(\tau(f(\mathbf{x}), \epsilon(f(\mathbf{x}))))\\
    &=\mathcal{I}(\mathbf{x})=\mathbb{1}_{[\mathbf{x} \in X]} \end{aligned}

    $$ Despite its simplicity, Proposition 1 has an important consequence: if we can identify candidate entries in \(\mathcal{L}\), even if \(\mathcal{C}\) is unknown, we can construct new examples \(\mathbf{x} \in X\) that respect (and provide evidence for) the compositional structure of \(X\).

  • Discovering Symmetries Automatically

    Figure 2: An application of the Approach

    Figure 2: An application of the Approach

  • Limiations

    The limitations are mainly from the power of data transormation builder. It is not capable to produce all the data transformations commonly associated with compositional generalization

    • Eg: it will not exchange substructures larger than a single token