Conventions#
WebAssembly is a programming language that has multiple concrete representations (its binary format and the text format). Both map to a common structure. For conciseness, this structure is described in the form of an abstract syntax. All parts of this specification are defined in terms of this abstract syntax.
Grammar Notation#
The following conventions are adopted in defining grammar rules for abstract syntax.
Terminal symbols (atoms) are written in sans-serif font: \(\mathsf{i32}, \mathsf{end}\).
Nonterminal symbols are written in italic font: \(\mathit{valtype}, \mathit{instr}\).
\(A^n\) is a sequence of \(n\geq 0\) iterations of \(A\).
\(A^\ast\) is a possibly empty sequence of iterations of \(A\). (This is a shorthand for \(A^n\) used where \(n\) is not relevant.)
\(A^+\) is a non-empty sequence of iterations of \(A\). (This is a shorthand for \(A^n\) where \(n \geq 1\).)
\(A^?\) is an optional occurrence of \(A\). (This is a shorthand for \(A^n\) where \(n \leq 1\).)
Productions are written \(\mathit{sym} ::= A_1 ~|~ \dots ~|~ A_n\).
Large productions may be split into multiple definitions, indicated by ending the first one with explicit ellipses, \(\mathit{sym} ::= A_1 ~|~ \dots\), and starting continuations with ellipses, \(\mathit{sym} ::= \dots ~|~ A_2\).
Some productions are augmented with side conditions in parentheses, “\((\mathrel{\mbox{if}} \mathit{condition})\)”, that provide a shorthand for a combinatorial expansion of the production into many separate cases.
If the same meta variable or non-terminal symbol appears multiple times in a production, then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)
Auxiliary Notation#
When dealing with syntactic constructs the following notation is also used:
\(\epsilon\) denotes the empty sequence.
\(|s|\) denotes the length of a sequence \(s\).
\(s[i]\) denotes the \(i\)-th element of a sequence \(s\), starting from \(0\).
\(s[i \href{../syntax/conventions.html#notation-slice}{\mathrel{\mathbf{:}}} n]\) denotes the sub-sequence \(s[i]~\dots~s[i+n-1]\) of a sequence \(s\).
\(s \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} [i] = A\) denotes the same sequence as \(s\), except that the \(i\)-th element is replaced with \(A\).
\(s \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} [i \href{../syntax/conventions.html#notation-slice}{\mathrel{\mathbf{:}}} n] = A^n\) denotes the same sequence as \(s\), except that the sub-sequence \(s[i \href{../syntax/conventions.html#notation-slice}{\mathrel{\mathbf{:}}} n]\) is replaced with \(A^n\).
\(\href{../syntax/conventions.html#notation-concat}{\mathrm{concat}}(s^\ast)\) denotes the flat sequence formed by concatenating all sequences \(s_i\) in \(s^\ast\).
Moreover, the following conventions are employed:
The notation \(x^n\), where \(x\) is a non-terminal symbol, is treated as a meta variable ranging over respective sequences of \(x\) (similarly for \(x^\ast\), \(x^+\), \(x^?\)).
When given a sequence \(x^n\), then the occurrences of \(x\) in a sequence written \((A_1~x~A_2)^n\) are assumed to be in point-wise correspondence with \(x^n\) (similarly for \(x^\ast\), \(x^+\), \(x^?\)). This implicitly expresses a form of mapping syntactic constructions over a sequence.
Productions of the following form are interpreted as records that map a fixed set of fields \(\mathsf{field}_i\) to “values” \(A_i\), respectively:
The following notation is adopted for manipulating such records:
\(r.\mathsf{field}\) denotes the contents of the \(\mathsf{field}\) component of \(r\).
\(r \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathsf{field} = A\) denotes the same record as \(r\), except that the contents of the \(\mathsf{field}\) component is replaced with \(A\).
\(r_1 \href{../syntax/conventions.html#notation-compose}{\oplus} r_2\) denotes the composition of two records with the same fields of sequences by appending each sequence point-wise:
\[\{ \mathsf{field}_1\,A_1^\ast, \mathsf{field}_2\,A_2^\ast, \dots \} \href{../syntax/conventions.html#notation-compose}{\oplus} \{ \mathsf{field}_1\,B_1^\ast, \mathsf{field}_2\,B_2^\ast, \dots \} = \{ \mathsf{field}_1\,A_1^\ast~B_1^\ast, \mathsf{field}_2\,A_2^\ast~B_2^\ast, \dots \}\]\(\href{../syntax/conventions.html#notation-compose}{\bigoplus} r^\ast\) denotes the composition of a sequence of records, respectively; if the sequence is empty, then all fields of the resulting record are empty.
The update notation for sequences and records generalizes recursively to nested components accessed by “paths” \(\mathit{pth} ::= ([\dots] \;| \;.\mathsf{field})^+\):
\(s \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} [i]\,\mathit{pth} = A\) is short for \(s \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} [i] = (s[i] \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathit{pth} = A)\),
\(r \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathsf{field}\,\mathit{pth} = A\) is short for \(r \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathsf{field} = (r.\mathsf{field} \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathit{pth} = A)\),
where \(r \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}}~.\mathsf{field} = A\) is shortened to \(r \href{../syntax/conventions.html#notation-replace}{\mathrel{\mbox{with}}} \mathsf{field} = A\).
Vectors#
Vectors are bounded sequences of the form \(A^n\) (or \(A^\ast\)), where the \(A\) can either be values or complex constructions. A vector can have at most \(2^{32}-1\) elements.