Object Language Embedding

Standard ML is often used to write programs that manipulate another language, called the "object language", or OL. For example, the syntax of the HOL logic in hol90 or the syntax of CCS for the Concurrency Workbench. Typically, one defines the abstract syntax of OL by a datatype declaration. Then useful functions over the datatype can be defined (such as finding the free variables of a formula when OL is a logic). Soon afterwards, one concludes that concrete syntax is easier for humans to read than abstract syntax, and so writes a parser and prettyprinter for the OL.

In the situation just outlined, ML is called the metalanguage. (Edinburgh/INRIA/Cambridge ML, the precursor to Standard ML, was originally a programming metalanguage for a particular object language, the LCF logic, in the LCF proof assistant.) The purpose of a quotation/antiquotation mechanism is to allow one to embed expressions in the object language’s concrete syntax inside of ML programs, and to mix the object language expressions with ML expressions.

Quotation and Antiquotation

The quote/antiquote mechanism is enabled by setting

Compiler.Control.quotation : bool ref

to true. Then the backquote character ceases to be legal in symbolic identifiers, and takes on a special meaning.

A quotation is a special form of literal expression that represents the concrete syntax of an OL phrase. The backquote character is used to delimit quotations.

For a running example, suppose our OL is a simple propositional logic with propositions represented as values of abstract type prop. We might wish to write propositional expressions such as A/\B/\C.

The most common approximation to quotation is strings. This is not pleasant at times, especially when dealing with backslashes and newlines. Still, strings are bearable. Strings are not adequate, however, for the following idea.

The ML-OL relationship invites a notion of antiquotation: the temporary abandonment of parsing so that an ML value can be spliced into the middle of a quotation. Operations like this have cropped up under various names in various places: antiquote is due to Milner; Quine had a version called quasi-quotation in his 1940 book; Carnap used a notation much like it. It also closely resembles the Lisp backquote facility.

Using backquote, we write

- val f = `A /\ B \/ C`;
val f = [QUOTE "A /\ B \/ C"] : 'a SMLofNJ.frag list

The compiler interprets this as a list of fragments, using the frag data type from the SMLofNJ structure:

structure SMLofNJ = struct
  ...
  datatype 'a frag = QUOTE of string | ANTIQUOTE of 'a
end

More commonly, we invoke an OL parser to parse, enforce precedence, etc. By naming the parser something concise, such as “%”, we can use the syntax

- val % = my_proposition_parser;
val % : prop frag -> prop
- val p = % `A /\ B \/ C`;
val p = -: prop

An antiquote is written as a caret ^ character followed by either an SML identifier or a parenthesized SML expression. Antiquotation can be used to conveniently express contexts, which are often used as a descriptive tool for syntax. A context could be defined as a function taking a prop and directly placing it at a location in a quotation.

- fun foo a = % `^a ==> A`;
val foo : prop -> prop

In this case, foo p would denote the same proposition as

% `(A /\ B \/ C) ==> A`

Antiquotations can have nested quotations (which may contain antiquotes of their own, etc.):

- let val K x y = x
      val I x = x
   in % `A /\ ^(K (% `B`) (I (% `C`))) \/ C`
  end;

gives the same prop as that denoted by p. We note in passing that the power of the OL parser is completely up to its author: for example, in the framework offered here, one could write an OL “parser” for Scheme that parses function plus arguments, evaluates the function on the arguments, and finally prints the returned value.

Implementation of OL parsers

A concrete syntax quotation is mapped by the SML compiler into a frag list. Intuitively, a frag is a contiguous part of a quotation: A /\ B maps to [ QUOTE "A /\ B" ] while ^x /\ ^y maps to

[QUOTE "", ANTIQUOTE x, QUOTE "/\", ANTIQUOTE y, QUOTE ""]

(Note that the names frag, QUOTE, and ANTIQUOTE need to be qualified by the SMLofNJ structure-id, in practice.)

In this approach, the value of a quotation has type ol frag list where ol is the type of object language expressions; the type of the OL parser is ol frag list → ol.

The OL parser (in our example, %) must handle these lists and insert the antiquoted ML values in the right places.

CAVEATS

Often one wants to parse stratified languages, such as first order logic, or typed lambda calculus, which requires a trick. Also, there is a bit of trickery when one wants to deal with ML-Yacc and ML-Lex, especially when functorizing the parser.