Data Augumentation that Enhances Compositionality
- E. Akyürek and J. Andreas, “Compositionality as Lexical Symmetry,” arXiv:2201.12926 [cs], Jan. 2022, Accessed: Feb. 02, 2022. [Online]. Available: http://arxiv.org/abs/2201.12926
Abstract
Weakness of previous works:
- Standard deep network models lack the inductive biases needed to generalize compositionally in tasks like semantic parsing, translation, and question answering.
Novelty:
- A long line of language processing research has operationalized the principle of compositionality as a constraint on model architecture. How can we train unstructured neural sequence models that generalize compositionally? In this paper, we describe how to inject compositional inductive bias into general function approximators by operationalizing compositionality as a constraint on data distributions rather than model architectures
Contributions:
- a domain-general framework for compositional modeling that instead formulates compositionality as a constraint on data distributions.
- concrete instantiation of this framework as a data augmentation scheme that improves model performance on multiple tasks
Approach
Prelimenaries
Formal Definition of Symmetry
Definition 1. A symmetry of a set \(X\) is a function \(f\) satisfying:
\[ \{f(x): x \in X\}=X \]
That is, applying \(f\) to each element of \(X\) leaves \(X\) unchanged.
Method
Setup
We consider problems defined by a space of possible examples \(\mathcal{X}\), of which a subset of examples \(X\) are well-formed.
We assume each example \(\mathbf{x} \in \mathcal{X}\) is a discrete sequence \(\left[x_{1}, \ldots, x_{n}\right]\), with \(x_{i}\) drawn from a vocabulary \(\Sigma\).
Finally, we assume that well-formedness can be computed by a interpretation function \(\mathcal{I}: \mathcal{X} \rightarrow\{0,1\}\) with \(\mathcal{I}(\mathbf{x})=1\) iff \(\mathbf{x} \in X\).
-
Examples
-
Semantic Parsing
Examples \(\mathbf{x}\) are pairs \(\left(\mathrm{x}_{\mathrm{NL}}, \mathrm{x}_{\mathrm{LF}}\right)\), where \(\mathrm{x}_{\mathrm{NL}}\) is an sentence, \(\mathrm{x}_{\mathrm{LF}}\) is a logical form, and \(\mathcal{I}\left(\mathrm{x}_{\mathrm{NL}}, \mathrm{x}_{\mathrm{LF}}\right)=1\) iff \(\mathrm{x}_{\mathrm{LF}}\) represents a possible meaning of \(\mathrm{x}_{\mathrm{NL}}\).
-
Paraphrase Detection
Examples \(\mathbf{x}\) are sentence pairs \(\left(\mathbf{x}^{\prime}, \mathbf{x}^{\prime \prime}\right)\), with \(\mathcal{I}\left(\mathbf{x}^{\prime}, \mathbf{x}^{\prime \prime}\right)=1\) iff \(\mathrm{x}^{\prime}\) and \(\mathrm{x}^{\prime \prime}\) have the same meaning.
-
Compositionality as Lexical Symmetry
-
Definition of Lexicon
A lexicon \(\mathcal{L}\) is a pair \((\tau, \epsilon)\) where \(\tau\) : \(\Sigma \rightarrow \mathcal{T}\) assigns each symbol a type from some set \(\mathcal{T}\), and \(\epsilon: \Sigma \times \Sigma \rightarrow\{0,1\}\) is a relation indicating which symbols are semantically equivalent.
-
Definition of Lexicon Abstraction
Denote the lexical abstraction \(\mathcal{L}(\mathbf{x})=(\tau(\mathbf{x}), \epsilon(\mathbf{x}))\) where \(\tau(\mathbf{x})\) is a vector with \(\tau(\mathbf{x})_{i}=\tau\left(x_{i}\right)\) and \(\epsilon(\mathbf{x})\) is a matrix with \(\epsilon(\mathbf{x})_{i j}=\) \(\epsilon\left(x_{i}, x_{j}\right)\) (see Fig. 2). Then we say that the interpretation function \(\mathcal{I}\) (or equivalently the set \(X\) ) is $\mathcal{L}$-compositional iff we can write \(\mathcal{I}(\mathbf{x})=\mathcal{C}(\mathcal{L}(\mathbf{x}))\) for some composition procedure \(\mathcal{C}\). In other words, \(\mathcal{I}\) (and \(X\) ) are compositional if they can be defined purely in terms of the types of and relations between the symbols in \(\mathrm{x}\), without any information about the identity of the symbols themselves.
-
Examples
-
Semantic Parser
Figure 1: Semanic Parser Example
Given an input, if we can build up the abtract by type, and still fully determine the well-formness, compositionality exists.
-
-
-
Definition of Homomorphism
A function \(f\) is a homomorphism of \(\Sigma\) with respect to \(\mathcal{L}\) (an " \(\mathcal{L}\) -homomorphism") if:
\begin{gathered} \forall x \in \Sigma: \tau(x)=\tau(f(x)) \\
\forall x, x^{\prime} \in \Sigma: \epsilon\left(x, x^{\prime}\right)=\epsilon\left(f(x), f\left(x^{\prime}\right)\right) \end{gathered}\(f\) “preserves the structure” of \(\mathcal{L}\) : it replaces symbols with other symbols of the same type, and ensures that pairs of symbols maintain equivalence relationships. An example is depicted in Fig. 1; note that both the words yellow and green and the corresponding meanings must be swapped in order to satisfy Eq. 3 .
-
Proposition
Proposition 1. If \(X\) is $\mathcal{L}$-compositional, \(f\) is an $\mathcal{L}$-homomorphism, and \(\mathrm{x} \in X\), then \(f(\mathrm{x})=\left[f\left(x_{1}\right), \ldots, f\left(x_{n}\right)\right] \in X\)
Thus, every homomorphism of \(\mathcal{L}\) corresponds to a symmetry of \(X\).
Proof. From Definition 3 and \(4, \tau(\mathbf{x})=\) \(\tau(f(\mathbf{x}))\) and \(\epsilon(\mathbf{x})=\epsilon(f(\mathbf{x}))\). Then, $$
\begin{aligned} \mathbb{1}_{[f(\mathbf{x}) \in X]} &=\mathcal{I}(f(\mathbf{x})) \\
&=\mathcal{C}(\mathcal{L}(f(\mathbf{x}))) \\
&=\mathcal{C}(\mathcal{L}(\tau(f(\mathbf{x})), \epsilon(f(\mathbf{x})))) \\
&=\mathcal{C}(\mathcal{L}(\tau(f(\mathbf{x}), \epsilon(f(\mathbf{x}))))\\
&=\mathcal{I}(\mathbf{x})=\mathbb{1}_{[\mathbf{x} \in X]} \end{aligned}$$ Despite its simplicity, Proposition 1 has an important consequence: if we can identify candidate entries in \(\mathcal{L}\), even if \(\mathcal{C}\) is unknown, we can construct new examples \(\mathbf{x} \in X\) that respect (and provide evidence for) the compositional structure of \(X\).
-
Discovering Symmetries Automatically
Figure 2: An application of the Approach
-
Limiations
The limitations are mainly from the power of data transormation builder. It is not capable to produce all the data transformations commonly associated with compositional generalization
- Eg: it will not exchange substructures larger than a single token