Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Jones N.D.Partial evaluation and automatic program generation.1999

.pdf
Скачиваний:
9
Добавлен:
23.08.2013
Размер:
1.75 Mб
Скачать

Types in programming languages 31

shows that (m + n; m , n) has type Z Z. The nodes in the inference tree are obtained from the inference rules by instantiating the types t, t1, and t2 to Z.

For another example, when = [f 7! N !N ; x 7! N], the following inference

` f : N !N ` x : N

` f : N !N

 

 

` f

(x) : N

 

 

 

` f(f(x)) : N

 

shows that f(f(x)) has type N. Thus if the arguments of function twice above have type N !N and N, then its result has type N .

2.2.3Sum types

The sum of two or more sets is a set of tagged values. A tagged value is written as a tag applied to a value. For instance, if Inttag and Booltag are tags, then the Inttag(17) and Booltag(false) are values belonging to the sum type

Inttag Z j Booltag B

The tags are also called constructors.

A general sum type has the following form, where the Ci are the tags (or constructors):

C1 t11 . . . t1k1 j j Cn tn1 . . . tnkn

where n 1, all ki 0, and all constructors Ci must be distinct. A value belonging to this type is a construction of form Ci(vi1; . . . ; viki ), where vi1 : ti1, . . . , viki : tiki . The above sum type denotes the following set of values:

n

 

 

 

[

f Ci(v1

; . . . ; vki) j vj : tij

for j = 1; . . . ; ki g

i=1

 

 

 

Note that while all values belonging to a product type such as N (B B) must have the same structure, namely (n; (b1; b2)), the values of a sum type may have di erent structures. However, all values with the same constructor must have the same structure.

Programming notations for sum types

In Pascal, sum types can be expressed by variant records, using an enumerated type to hold the tags (such as Inttag and Booltag above). The rst example could be written:

32 Functions, Types, and Expressions

type tag = (Inttag, Booltag); sum = record

case t : tag of

Inttag : ( N : integer); Booltag : ( B : boolean)

end

In ML, the notation is closer to that shown above:

datatype sum = Inttag of int | Booltag of bool

fun f1 (Inttag (n)) = Inttag(1 - n)

|f1 (Booltag (b)) = Booltag(not b)

Since the programming language Scheme is dynamically typed, it has no notation for de ning sum types. However, in Section 2.3.4 we show an encoding of sum types in Scheme.

2.3Recursive data types

Our last way to de ne types is by recursive type equations. The collection of values de ned this way is called a data type. An equation de nes a type T as a sum type

T = C1 t11 . . . t1k1 j j Cn tn1 . . . tnkn

where the type expressions tij may contain T and names of other types. As before, all constructors must be distinct.

Since a data type is essentially a sum type, its values need not all have the same form. Since it may be recursive, its values may be arbitrarily large.

Familiar examples of data types include search trees, game trees, and expressions in a programming language. We begin with a simple and widely used special case: the type of lists, i.e. nite sequences of elements from a given base type.

2.3.1List types

When describing programming languages, one often needs sequences of values whose length is not statically determined. Typical examples are the input les and output les of a program, and program texts. For such applications the list type is useful.

Let A be a type (set). Then the set of nite lists of elements of A is the data type As de ned by:

As = nil j cons A As

(2.1)

Recursive data types 33

where nil and cons are constructors. The de nition says that an element of As is either nil, or has form cons(a; as), where a has type A and as has type As. That is, a nite list is either the empty list nil, or is non-empty and has a rst element (head) a of type A, and a tail as, which must in turn be a list of A elements.

The type of nite lists of elements of A is often called A . For example, the list containing 9 and 13 is written cons(7; cons(9; cons(13; nil))), and has type Z . Usually the more convenient notation [7; 9; 13] is used:

1. The list containing a1; a2; . . .; an in that order is written

[a1; a2; . . . ; an]

instead of cons(a1; cons(a2; . . . ; cons(an; nil) . . .)) for n 0, when all ai : A.

2.The empty list is written [] instead of nil.

3.The constructor `cons' can be written in x as `::', such that

a:: [a1; a2; . . . ; an]) = [a; a1; a2; . . . ; an]

4.The in x constructor `::' associates to the right, so that

a1 :: (a2 :: ( :: (an :: nil) )) = a1 :: a2 :: :: an :: nil

5. The operators hd : A !A and tl : A !A are de ned by:

hd([a1; a2; . . . ; an])

=

a1;

provided n 6= 0

tl([a1; a2; . . . ; an])

=

[a2; . . . ; an];

provided n 6= 0

Programming notations for list types

Since Pascal has no built-in declarations or operations for manipulating lists, they must be encoded using pointers, which requires great care.

ML uses essentially the in x :: notation and the bracket list notation [7,9,13] above. The type A is written as A list, where ML type A denotes set A. The use of pattern matching allows simple and compact programs. For example, the length of a list is computed by:

fun

length

nil

=

0

 

|

length

(_ :: xs) =

1

+ length xs

Scheme uses S-expressions to represent lists, writing the empty list nil as (), and the example list above above as '(7 9 13). The functions hd and tl are called car and cdr in Scheme.

In fact, Scheme S-expressions roughly correspond to the recursive datatype:

Se = () j cons Se Se j atom Basetype

(2.2)

except that the constructor atom on Basetype values is omitted. Application of a constructor C to a value a is written as function application (C a), so the list shown above is actually an abbreviation for (cons 7 (cons 9 (cons 13 nil))).

34Functions, Types, and Expressions

2.3.2The meaning of recursive data type de nitions

It is important to agree on the meaning (or semantics) of type equation such as equation (2.1), de ning the type of lists. If we regard a type as a set of values, it is natural to interpret j as set union [, and nil as the one-element set fnilg. The equation then asserts the equality of two sets:

As = fnilg [ f cons(a; as) j a 2 A and as 2 As g

(2.3)

We de ne that the set As speci ed by type equation (2.1) is the least set which is a solution to equation (2.3).

The least set solving equation (2.3) can be constructed iteratively, starting with the empty set As0 = fg. For example, if A denotes the set f1; 2g, then

As0

= fg

 

 

As1

=

f[ ]g [ fa :: as

j

a 2 A and as 2 As0g

 

=

f[ ]g

 

 

As2

=

f[ ]g [ fa :: as

j

a 2 A and as 2 As1g

=f[ ]; [1]; [2]g

As3 = f[ ]g [ fa :: as j a 2 A and as 2 As2g

=f[ ]; [1]; [2]; [1; 1]; [1; 2]; [2; 1]; [2; 2]g

As4 = . . .

Note that As1 contains all lists of length less than 1, As2 contains all lists of length less than 2, etc. The set As of all nite lists is clearly the union of all these:

As = As0 [ As1 [ As2 [ . . ..

2.3.3Syntax trees

Recursive data types are often used for representing program fragments, such as arithmetic expressions. For instance, the set of arithmetic expressions built from integer constants and the operators Add and Mul (for addition and multiplication) can be represented by the data type Nexp:

Nexp = Int N j Add Nexp Nexp j Mul Nexp Nexp

For example, the arithmetic expression 5+(6*7) can be represented by the Nexp- value Add(Int(5); Mul (Int(6); Int (7))). Drawing this value as a tree makes its structure more obvious:

Recursive data types 35

 

 

Add

 

 

,,

,,@@

@@

 

 

 

 

Int

 

 

Mul

 

 

,,,@@@

 

5

 

 

Int

Int

 

 

 

 

 

 

 

 

6

7

This is called an abstract syntax tree (or just a syntax tree) for the expression 5+(6*7). Also, a recursive data type used for representing program fragments is often called an abstract syntax.

Recursive data type de nitions resemble the production rules of a context-free grammar. However, a grammar describes a set of linear strings of symbols, whereas a recursive data type de nition describes a set of trees, where each subtree is labelled by a constructor.

Programming notations for syntax trees

Pascal has no recursive declarations in the sense just given, so all structure manipulation must be done by means of pointers.

ML has a notion of data type very similar to the one above. The type of arithmetic expressions can be de ned as follows, together with a function to evaluate the expressions:

datatype nexp = Int of int

| Add of nexp * nexp | Mul of nexp * nexp

(*

Constant

*)

(*

Addition

*)

(* Multiplication *)

fun

neval (Int n)

=

n

 

 

|

neval (Add(e1, e2)) =

(neval e1)

+ (neval

e2)

|

neval (Mul(e1, e2)) =

(neval e1)

* (neval

e2)

The Nexp element above, written in ML as an element of the nexp datatype, would be Add(Int 5, Mul(Int 6, Int 7)).

2.3.4 Encoding recursive data types in Scheme

In Scheme, one cannot declare recursive data types. However, there is a standard way to encode them using Scheme's (dynamically typed) lists.

36 Functions, Types, and Expressions

A tagged value C(v1; . . . ; vn) is represented as a Scheme list (C v10 . . . vn0 ). Therst element of the list is a Scheme atom C representing the constructor, and the remaining elements vi0 are representations of the constructor's arguments vi.

For example, Inttag(17) is represented as (Inttag 17). Also, the Nexp value shown above is encoded in Scheme as

(Add (Int 5) (Mul (Int 6) (Int 7)))

Pattern matching on a tagged value v is encoded as a sequence of tests on its tag. Since the tag is the rst element in the encoding, it can be extracted by (car v). The rst and second constructor arguments can be extracted by (car (cdr v)) and (car (cdr (cdr v))), which can be abbreviated (cadr v) and (caddr v).

Using this encoding, the neval function shown in ML above can be written in Scheme as follows:

(define (neval v)

(if (equal? (car v) 'Int) (cadr v)

(if (equal? (car v) 'Add)

(+ (neval (cadr v)) (neval (caddr v))) (if (equal? (car v) 'Mul)

(* (neval (cadr v)) (neval (caddr v))) (error)

))))

In practice the tags on numbers are usually left out, writing 5 instead of (Int 5). Then pattern matching exploits the fact that there is a predicate number? in Scheme which distinguishes numbers from all other values. Similarly, a source program variable x is usually represented just as the Scheme atom x instead of (Var x), say. Then pattern matching exploits the Scheme predicate atom?.

We use this encoding of recursive data types and pattern matching frequently in later chapters. However, since pattern matching improves the readability and reduces the length of programs, we shall allow ourselves to use it freely in pseudoScheme programs, although this introduces some ambiguity. The intention is that a knowledgeable programmer can convert pseudo-Scheme programs into real Scheme programs by replacing pattern matching with a sequence of tests as above.

2.4Summary

In this chapter we have introduced the following concepts and terminology for future use:

algorithms and programs, and their di erences from functions;

the Cartesian product A B of two sets;

Exercises 37

t

the set A!B of total functions from set A to set B;

the set A!B of partial functions from set A to set B;

notations for nite functions and updating (f[x 7!v]);

types as sets of possible data values;

sum data types with constructors;

recursively de ned data types with constructors;

abstract syntax as a special case of recursively de ned data types;

an encoding of recursive data types in Scheme.

2.5Exercises

Exercise 2.1 Discuss the relations between a mathematical function f : A ! B and the following concepts from programming:

1.an array in Pascal;

2.a relation in a database;

3.a function in Pascal.

2

Exercise 2.2 Give an example illustrating the di erence between the type A !

(B ! C) and the type (A ! B) ! C.

2

Exercise 2.3 The types A ! (B ! C) and A B ! C are similar but not identical. Give an intuitive explanation of the similarity. Can you de ne an ML function uncurry, such that whenever f is a function of type (A ! (B ! C)),

then uncurry f is the corresponding function of type A B ! C?

2

Exercise 2.4

Use the type inference rules to show that when arguments m and n

k both have type Z, then expression k(m; n) has type Z Z.

2

Exercise 2.5

De ne a recursive data type Bintree of binary trees whose leaves

are integers. De ne a function add of type Bintree ! Z, such that add(tree)

calculates the sum of all the leaves in tree.

2

Exercise 2.6 Which set has most members:

 

Fun

=

ff j f is a partial function: N ! Ng

 

Prog

=

fp j p is a Pascal program, de ning a function: N ! Ng

 

2

Chapter 3

Programming Languages and

Interpreters

3.1Interpreters, compilers, and running times

To give a semantics for a programming language is to assign systematically a meaning to every program in the language.

3.1.1Operational semantics

Here we de ne the meaning of programs in a language S by giving an interpreter for S-programs, that is, a program for executing S-programs. This provides a very concrete computer-oriented semantics. Interpreters have been used for language de nition for many years, e.g. for Lisp [181], the lambda calculus [162], and as a general formalism for language de nitions [25].

More abstract versions of operational semantics exist, notably Plotkin's structural operational semantics [220], and Kahn's natural semantics [142]. In these systems a language de nition is a set of axioms and inference rules su cient to execute programs. They have some advantages over interpreters as language de nitions: they are more compact; they are less dependent on current computer architectures (for example, arbitrary decisions due to the sequential nature of most computers can be avoided); their mathematical properties are easier to analyse; and they seem better suited to describing communication among parallel processes.

On the other hand, it is still not clear how to implement structural operational semantics or natural semantics e ciently on current computers, although progress has recently been made [142,110]. Axiomatic and algebraic semantics seem less well suited to automation, although both have been used as guidelines for writing correct compilers.

38

p 2 S-

Interpreters, compilers, and running times 39

3.1.2Programming languages

Let L-programs denote the set of syntactically correct programs in language L. As in Chapter 1, we de ne the meaning of program p 2 L-programs to be

[[p]]L: input ! output

That is, the meaning of program p is the input{output function it computes.

3.1.3Interpreters

Let L be an (implementation) language and S be a (source) language. An interpreter int 2 L-programs for a language S has two inputs: a source program p 2 S-programs to be executed, and the source program's input data d 2 D. Running the interpreter with inputs p and d on an L-machine must produce the same result as running p with input d on an S-machine.

More precisely, the L-program int is an interpreter for S if for every programs and every d 2 D,

[[p]]S d = [[int]]L [p,d]

We use the symbol

S

 

L

= f int j 8p; d: [[p]]S d = [[int]]L [p,d] g

to denote the set of all interpreters for S written in L.

3.1.4Compilers

Now let T be a (target) language. A compiler comp 2 L-programs from S to T has one input: a source program p 2 S-programs to be compiled. Running the compiler with input p (on an L-machine) must produce a target, such that running target on a T-machine has the same e ect as running p on an S-machine.

More precisely, the L-program comp is a compiler from S to T if for every p 2 S-programs and every d 2 D,

[[p]]S d = [[[[comp]]L p]]T d We use the symbol

40 Programming Languages and Interpreters

S

-

T

 

 

 

= f comp j 8p; d: [[p]S d

= [[[[comp]]L p ]]T d g

 

L

 

 

 

 

 

 

 

 

 

 

 

 

to denote the set of compilers from S to T written in L.

3.1.5Compilation

Suppose we are given a collection of S-programs, nature unspeci ed. This set can be denoted by

** S

Suppose we also have a compiler comp from source language S to target language T, written in L. We can then perform translations (assuming L-programs can be executed). This situation can be described by the diagram

source program 2

**

 

 

 

 

 

** 3 target program

 

 

 

-

 

 

S

S

T

 

T

 

 

 

 

 

 

 

L

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

"

 

 

 

compiler

3.1.6Layers of interpretation

Diagrams such as these can be thought of as describing `computer runs'. For example, suppose a Lisp system (called L) is processed interpretively by an interpreter written in Sun RISC machine code (call this M). The machine code itself is processed by the central processor (call this C) so two levels of interpretation are involved, as described by the following gure: