Jones N.D.Partial evaluation and automatic program generation.1999
.pdfTypes in programming languages 31
shows that (m + n; m , n) has type Z Z. The nodes in the inference tree are obtained from the inference rules by instantiating the types t, t1, and t2 to Z.
For another example, when = [f 7! N !N ; x 7! N], the following inference
` f : N !N ` x : N
` f : N !N |
|
|
|
` f |
(x) : N |
||
|
|||
|
|
||
` f(f(x)) : N |
|
shows that f(f(x)) has type N. Thus if the arguments of function twice above have type N !N and N, then its result has type N .
2.2.3Sum types
The sum of two or more sets is a set of tagged values. A tagged value is written as a tag applied to a value. For instance, if Inttag and Booltag are tags, then the Inttag(17) and Booltag(false) are values belonging to the sum type
Inttag Z j Booltag B
The tags are also called constructors.
A general sum type has the following form, where the Ci are the tags (or constructors):
C1 t11 . . . t1k1 j j Cn tn1 . . . tnkn
where n 1, all ki 0, and all constructors Ci must be distinct. A value belonging to this type is a construction of form Ci(vi1; . . . ; viki ), where vi1 : ti1, . . . , viki : tiki . The above sum type denotes the following set of values:
n |
|
|
|
[ |
f Ci(v1 |
; . . . ; vki) j vj : tij |
for j = 1; . . . ; ki g |
i=1 |
|
|
|
Note that while all values belonging to a product type such as N (B B) must have the same structure, namely (n; (b1; b2)), the values of a sum type may have di erent structures. However, all values with the same constructor must have the same structure.
Programming notations for sum types
In Pascal, sum types can be expressed by variant records, using an enumerated type to hold the tags (such as Inttag and Booltag above). The rst example could be written:
32 Functions, Types, and Expressions
type tag = (Inttag, Booltag); sum = record
case t : tag of
Inttag : ( N : integer); Booltag : ( B : boolean)
end
In ML, the notation is closer to that shown above:
datatype sum = Inttag of int | Booltag of bool
fun f1 (Inttag (n)) = Inttag(1 - n)
|f1 (Booltag (b)) = Booltag(not b)
Since the programming language Scheme is dynamically typed, it has no notation for de ning sum types. However, in Section 2.3.4 we show an encoding of sum types in Scheme.
2.3Recursive data types
Our last way to de ne types is by recursive type equations. The collection of values de ned this way is called a data type. An equation de nes a type T as a sum type
T = C1 t11 . . . t1k1 j j Cn tn1 . . . tnkn
where the type expressions tij may contain T and names of other types. As before, all constructors must be distinct.
Since a data type is essentially a sum type, its values need not all have the same form. Since it may be recursive, its values may be arbitrarily large.
Familiar examples of data types include search trees, game trees, and expressions in a programming language. We begin with a simple and widely used special case: the type of lists, i.e. nite sequences of elements from a given base type.
2.3.1List types
When describing programming languages, one often needs sequences of values whose length is not statically determined. Typical examples are the input les and output les of a program, and program texts. For such applications the list type is useful.
Let A be a type (set). Then the set of nite lists of elements of A is the data type As de ned by:
As = nil j cons A As |
(2.1) |
Recursive data types 33
where nil and cons are constructors. The de nition says that an element of As is either nil, or has form cons(a; as), where a has type A and as has type As. That is, a nite list is either the empty list nil, or is non-empty and has a rst element (head) a of type A, and a tail as, which must in turn be a list of A elements.
The type of nite lists of elements of A is often called A . For example, the list containing 9 and 13 is written cons(7; cons(9; cons(13; nil))), and has type Z . Usually the more convenient notation [7; 9; 13] is used:
1. The list containing a1; a2; . . .; an in that order is written
[a1; a2; . . . ; an]
instead of cons(a1; cons(a2; . . . ; cons(an; nil) . . .)) for n 0, when all ai : A.
2.The empty list is written [] instead of nil.
3.The constructor `cons' can be written in x as `::', such that
a:: [a1; a2; . . . ; an]) = [a; a1; a2; . . . ; an]
4.The in x constructor `::' associates to the right, so that
a1 :: (a2 :: ( :: (an :: nil) )) = a1 :: a2 :: :: an :: nil
5. The operators hd : A !A and tl : A !A are de ned by:
hd([a1; a2; . . . ; an]) |
= |
a1; |
provided n 6= 0 |
tl([a1; a2; . . . ; an]) |
= |
[a2; . . . ; an]; |
provided n 6= 0 |
Programming notations for list types
Since Pascal has no built-in declarations or operations for manipulating lists, they must be encoded using pointers, which requires great care.
ML uses essentially the in x :: notation and the bracket list notation [7,9,13] above. The type A is written as A list, where ML type A denotes set A. The use of pattern matching allows simple and compact programs. For example, the length of a list is computed by:
fun |
length |
nil |
= |
0 |
|
| |
length |
(_ :: xs) = |
1 |
+ length xs |
Scheme uses S-expressions to represent lists, writing the empty list nil as (), and the example list above above as '(7 9 13). The functions hd and tl are called car and cdr in Scheme.
In fact, Scheme S-expressions roughly correspond to the recursive datatype:
Se = () j cons Se Se j atom Basetype |
(2.2) |
except that the constructor atom on Basetype values is omitted. Application of a constructor C to a value a is written as function application (C a), so the list shown above is actually an abbreviation for (cons 7 (cons 9 (cons 13 nil))).
34Functions, Types, and Expressions
2.3.2The meaning of recursive data type de nitions
It is important to agree on the meaning (or semantics) of type equation such as equation (2.1), de ning the type of lists. If we regard a type as a set of values, it is natural to interpret j as set union [, and nil as the one-element set fnilg. The equation then asserts the equality of two sets:
As = fnilg [ f cons(a; as) j a 2 A and as 2 As g |
(2.3) |
We de ne that the set As speci ed by type equation (2.1) is the least set which is a solution to equation (2.3).
The least set solving equation (2.3) can be constructed iteratively, starting with the empty set As0 = fg. For example, if A denotes the set f1; 2g, then
As0 |
= fg |
|
|
|
As1 |
= |
f[ ]g [ fa :: as |
j |
a 2 A and as 2 As0g |
|
= |
f[ ]g |
|
|
As2 |
= |
f[ ]g [ fa :: as |
j |
a 2 A and as 2 As1g |
=f[ ]; [1]; [2]g
As3 = f[ ]g [ fa :: as j a 2 A and as 2 As2g
=f[ ]; [1]; [2]; [1; 1]; [1; 2]; [2; 1]; [2; 2]g
As4 = . . .
Note that As1 contains all lists of length less than 1, As2 contains all lists of length less than 2, etc. The set As of all nite lists is clearly the union of all these:
As = As0 [ As1 [ As2 [ . . ..
2.3.3Syntax trees
Recursive data types are often used for representing program fragments, such as arithmetic expressions. For instance, the set of arithmetic expressions built from integer constants and the operators Add and Mul (for addition and multiplication) can be represented by the data type Nexp:
Nexp = Int N j Add Nexp Nexp j Mul Nexp Nexp
For example, the arithmetic expression 5+(6*7) can be represented by the Nexp- value Add(Int(5); Mul (Int(6); Int (7))). Drawing this value as a tree makes its structure more obvious:
Recursive data types 35
|
|
Add |
|
|
|
,, |
,,@@ |
@@ |
|
||
|
|
|
|||
Int |
|
|
Mul |
||
|
|
,,,@@@ |
|
||
5 |
|||||
|
|
Int |
Int |
||
|
|
|
|
|
|
|
|
6 |
7 |
This is called an abstract syntax tree (or just a syntax tree) for the expression 5+(6*7). Also, a recursive data type used for representing program fragments is often called an abstract syntax.
Recursive data type de nitions resemble the production rules of a context-free grammar. However, a grammar describes a set of linear strings of symbols, whereas a recursive data type de nition describes a set of trees, where each subtree is labelled by a constructor.
Programming notations for syntax trees
Pascal has no recursive declarations in the sense just given, so all structure manipulation must be done by means of pointers.
ML has a notion of data type very similar to the one above. The type of arithmetic expressions can be de ned as follows, together with a function to evaluate the expressions:
datatype nexp = Int of int
| Add of nexp * nexp | Mul of nexp * nexp
(* |
Constant |
*) |
(* |
Addition |
*) |
(* Multiplication *)
fun |
neval (Int n) |
= |
n |
|
|
| |
neval (Add(e1, e2)) = |
(neval e1) |
+ (neval |
e2) |
|
| |
neval (Mul(e1, e2)) = |
(neval e1) |
* (neval |
e2) |
The Nexp element above, written in ML as an element of the nexp datatype, would be Add(Int 5, Mul(Int 6, Int 7)).
2.3.4 Encoding recursive data types in Scheme
In Scheme, one cannot declare recursive data types. However, there is a standard way to encode them using Scheme's (dynamically typed) lists.
36 Functions, Types, and Expressions
A tagged value C(v1; . . . ; vn) is represented as a Scheme list (C v10 . . . vn0 ). Therst element of the list is a Scheme atom C representing the constructor, and the remaining elements vi0 are representations of the constructor's arguments vi.
For example, Inttag(17) is represented as (Inttag 17). Also, the Nexp value shown above is encoded in Scheme as
(Add (Int 5) (Mul (Int 6) (Int 7)))
Pattern matching on a tagged value v is encoded as a sequence of tests on its tag. Since the tag is the rst element in the encoding, it can be extracted by (car v). The rst and second constructor arguments can be extracted by (car (cdr v)) and (car (cdr (cdr v))), which can be abbreviated (cadr v) and (caddr v).
Using this encoding, the neval function shown in ML above can be written in Scheme as follows:
(define (neval v)
(if (equal? (car v) 'Int) (cadr v)
(if (equal? (car v) 'Add)
(+ (neval (cadr v)) (neval (caddr v))) (if (equal? (car v) 'Mul)
(* (neval (cadr v)) (neval (caddr v))) (error)
))))
In practice the tags on numbers are usually left out, writing 5 instead of (Int 5). Then pattern matching exploits the fact that there is a predicate number? in Scheme which distinguishes numbers from all other values. Similarly, a source program variable x is usually represented just as the Scheme atom x instead of (Var x), say. Then pattern matching exploits the Scheme predicate atom?.
We use this encoding of recursive data types and pattern matching frequently in later chapters. However, since pattern matching improves the readability and reduces the length of programs, we shall allow ourselves to use it freely in pseudoScheme programs, although this introduces some ambiguity. The intention is that a knowledgeable programmer can convert pseudo-Scheme programs into real Scheme programs by replacing pattern matching with a sequence of tests as above.
2.4Summary
In this chapter we have introduced the following concepts and terminology for future use:
algorithms and programs, and their di erences from functions;
the Cartesian product A B of two sets;
Exercises 37
t
the set A!B of total functions from set A to set B;
the set A!B of partial functions from set A to set B;
notations for nite functions and updating (f[x 7!v]);
types as sets of possible data values;
sum data types with constructors;
recursively de ned data types with constructors;
abstract syntax as a special case of recursively de ned data types;
an encoding of recursive data types in Scheme.
2.5Exercises
Exercise 2.1 Discuss the relations between a mathematical function f : A ! B and the following concepts from programming:
1.an array in Pascal;
2.a relation in a database;
3.a function in Pascal.
2
Exercise 2.2 Give an example illustrating the di erence between the type A !
(B ! C) and the type (A ! B) ! C. |
2 |
Exercise 2.3 The types A ! (B ! C) and A B ! C are similar but not identical. Give an intuitive explanation of the similarity. Can you de ne an ML function uncurry, such that whenever f is a function of type (A ! (B ! C)),
then uncurry f is the corresponding function of type A B ! C? |
2 |
|
Exercise 2.4 |
Use the type inference rules to show that when arguments m and n |
|
k both have type Z, then expression k(m; n) has type Z Z. |
2 |
|
Exercise 2.5 |
De ne a recursive data type Bintree of binary trees whose leaves |
are integers. De ne a function add of type Bintree ! Z, such that add(tree)
calculates the sum of all the leaves in tree. |
2 |
||
Exercise 2.6 Which set has most members: |
|
||
Fun |
= |
ff j f is a partial function: N ! Ng |
|
Prog |
= |
fp j p is a Pascal program, de ning a function: N ! Ng |
|
2
Chapter 3
Programming Languages and
Interpreters
3.1Interpreters, compilers, and running times
To give a semantics for a programming language is to assign systematically a meaning to every program in the language.
3.1.1Operational semantics
Here we de ne the meaning of programs in a language S by giving an interpreter for S-programs, that is, a program for executing S-programs. This provides a very concrete computer-oriented semantics. Interpreters have been used for language de nition for many years, e.g. for Lisp [181], the lambda calculus [162], and as a general formalism for language de nitions [25].
More abstract versions of operational semantics exist, notably Plotkin's structural operational semantics [220], and Kahn's natural semantics [142]. In these systems a language de nition is a set of axioms and inference rules su cient to execute programs. They have some advantages over interpreters as language de nitions: they are more compact; they are less dependent on current computer architectures (for example, arbitrary decisions due to the sequential nature of most computers can be avoided); their mathematical properties are easier to analyse; and they seem better suited to describing communication among parallel processes.
On the other hand, it is still not clear how to implement structural operational semantics or natural semantics e ciently on current computers, although progress has recently been made [142,110]. Axiomatic and algebraic semantics seem less well suited to automation, although both have been used as guidelines for writing correct compilers.
38
Interpreters, compilers, and running times 39
3.1.2Programming languages
Let L-programs denote the set of syntactically correct programs in language L. As in Chapter 1, we de ne the meaning of program p 2 L-programs to be
[[p]]L: input ! output
That is, the meaning of program p is the input{output function it computes.
3.1.3Interpreters
Let L be an (implementation) language and S be a (source) language. An interpreter int 2 L-programs for a language S has two inputs: a source program p 2 S-programs to be executed, and the source program's input data d 2 D. Running the interpreter with inputs p and d on an L-machine must produce the same result as running p with input d on an S-machine.
More precisely, the L-program int is an interpreter for S if for every programs and every d 2 D,
[[p]]S d = [[int]]L [p,d]
We use the symbol
S |
|
L |
= f int j 8p; d: [[p]]S d = [[int]]L [p,d] g |
to denote the set of all interpreters for S written in L.
3.1.4Compilers
Now let T be a (target) language. A compiler comp 2 L-programs from S to T has one input: a source program p 2 S-programs to be compiled. Running the compiler with input p (on an L-machine) must produce a target, such that running target on a T-machine has the same e ect as running p on an S-machine.
More precisely, the L-program comp is a compiler from S to T if for every p 2 S-programs and every d 2 D,
[[p]]S d = [[[[comp]]L p]]T d We use the symbol
40 Programming Languages and Interpreters
S |
- |
T |
|
|
|
= f comp j 8p; d: [[p]S d |
= [[[[comp]]L p ]]T d g |
||
|
L |
|
||
|
|
|||
|
|
|
|
|
|
|
|
|
|
to denote the set of compilers from S to T written in L.
3.1.5Compilation
Suppose we are given a collection of S-programs, nature unspeci ed. This set can be denoted by
** S
Suppose we also have a compiler comp from source language S to target language T, written in L. We can then perform translations (assuming L-programs can be executed). This situation can be described by the diagram
source program 2 |
** |
|
|
|
|
|
** 3 target program |
|
|
|
- |
|
|||
|
S |
S |
T |
|
T |
||
|
|
|
|||||
|
|
|
|
L |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
" |
|
|
|
compiler
3.1.6Layers of interpretation
Diagrams such as these can be thought of as describing `computer runs'. For example, suppose a Lisp system (called L) is processed interpretively by an interpreter written in Sun RISC machine code (call this M). The machine code itself is processed by the central processor (call this C) so two levels of interpretation are involved, as described by the following gure: