FunCPU - 7 bit Homebrew CPU dedicated to Functional Programming: June 2014

2014-06-27

FunCPU - Evaluation Strategy

How expressions should be reduced (i.e. evaluated) is not a trivial question. As I have already mentioned, some strategies may succeed in reducing an expression under a given interpretation, while others may not.

Let us have a look at a simple example: the evaluation of the factorial function, defined under the alias fac.
fac(n):= if n=0 then 1 else n*fac(n-1)

The reduction of fac(1), when every call is unfolded eagerly, may lead to

if 1=0 then 1 else 1*fac(0)
if 1=0 then 1 else 1*(if 0=0 then 1 else 0*fac(-1))
if 1=0 then 1 else 1*(if 0=0 then 1 else 0*(if -1=0 then 1 else -1*fac(-2))) ...

We can easily see that this reduction goes on forever: it never terminates.
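To make the blow-up concrete, here is a small Python sketch (not FunCPU code) that unfolds every fac call in a textual expression, whether or not the surrounding condition could already be decided. The string representation and the name `unfold_once` are assumptions made only for this illustration.

```python
import re

# Body of fac with placeholders for n and n-1 (illustrative representation).
FAC_BODY = "(if {n}=0 then 1 else {n}*fac({m}))"

def unfold_once(expr: str) -> str:
    """Replace every fac(k) call in expr by the body of fac, eagerly."""
    def expand(match: re.Match) -> str:
        n = int(match.group(1))
        return FAC_BODY.format(n=n, m=n - 1)
    return re.sub(r"fac\((-?\d+)\)", expand, expr)

expr = "fac(1)"
calls = []  # the argument of the remaining fac call after each step
for _ in range(5):
    expr = unfold_once(expr)
    calls.append(re.findall(r"fac\((-?\d+)\)", expr))
```

Every step leaves exactly one new fac call behind, with an argument one lower than before, so unfolding alone never reaches an expression without calls.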

The goals of the evaluation strategy are as follows:

Must terminate (simple applicative rules may result in an infinite loop).
Must be as efficient as possible.
Must be simple (suitable for direct representation in hardware).

The following strategy will be applied:

A user-defined function is reduced if and only if all of its arguments are constants. Similarly, inc and dec are reduced only if their arguments are constants. As far as if-then-else (written "if cond exp1 exp2") is concerned: if its condition part is the constant 0 (i.e. true), then it is reduced to exp1. If the condition part is a constant other than 0, then it is reduced to exp2. Otherwise the reduction is postponed.

Corollary: all parameters passed to functions are constants.

The reduction algorithm is as follows:

Scan input expression symbol by symbol.
Copy the symbol from source to output if it is not a function, or if it is a function that cannot be reduced in its current form (i.e. it has at least one argument which is not a constant).
Evaluate functions (if, inc, dec, and user-defined) where possible. That is, rewrite the function call by replacing it with the function definition, with its arguments bound.
If the EOX symbol is found, the whole cycle restarts. Note that in the next cycle the current output becomes the input and vice versa.
Stop if the first symbol is a constant. This literal is the result of the reduction.
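The steps above can be sketched in Python. This is only a model under stated assumptions, not the hardware algorithm: symbols are represented as Python ints (constants), strings (functions and EOX) and `("arg", i)` tuples (parameters inside a definition), expressions are in prefix form, a rewritten call is copied to the output and reduced further only in the next cycle, and the function `add` is a hypothetical example definition. Literal width and wrap-around are not modelled.

```python
EOX = "EOX"
BUILTIN_ARITY = {"if": 3, "inc": 1, "dec": 1}

# Hypothetical user-defined function in prefix form:
# add(a, b) := if a b (inc (add (dec a) b)), where condition 0 means true.
FUNCTIONS = {
    "add": (2, ["if", ("arg", 0), ("arg", 1),
                "inc", "add", "dec", ("arg", 0), ("arg", 1)]),
}

def is_const(sym):
    return isinstance(sym, int)

def arity(sym):
    if is_const(sym):
        return 0
    if sym in BUILTIN_ARITY:
        return BUILTIN_ARITY[sym]
    return FUNCTIONS[sym][0]

def skip(expr, i):
    """Return the index just past the subexpression starting at expr[i]."""
    pending = 1
    while pending:
        pending += arity(expr[i]) - 1
        i += 1
    return i

def reduce_cycle(src):
    """One pass over src (terminated by EOX), producing the next expression."""
    out, i = [], 0
    while src[i] != EOX:
        sym = src[i]
        if sym in ("inc", "dec") and is_const(src[i + 1]):
            out.append(src[i + 1] + (1 if sym == "inc" else -1))
            i += 2
        elif sym == "if" and is_const(src[i + 1]):
            then_start = i + 2
            else_start = skip(src, then_start)
            end = skip(src, else_start)
            # Condition 0 selects exp1, any other constant selects exp2.
            out.extend(src[then_start:else_start] if src[i + 1] == 0
                       else src[else_start:end])
            i = end
        elif sym in FUNCTIONS and all(
                is_const(s) for s in src[i + 1:i + 1 + FUNCTIONS[sym][0]]):
            n, body = FUNCTIONS[sym]
            args = src[i + 1:i + 1 + n]
            # Substitute the constant arguments into the definition.
            out.extend(args[s[1]] if isinstance(s, tuple) else s for s in body)
            i += 1 + n
        else:
            out.append(sym)  # not reducible in its current form: copy it
            i += 1
    out.append(EOX)
    return out

def evaluate(expr, max_cycles=1000):
    """Repeat reduction cycles until the first symbol is a constant."""
    cycles = 0
    while not is_const(expr[0]):
        if cycles == max_cycles:
            raise RuntimeError("cycle limit reached")
        expr = reduce_cycle(expr)
        cycles += 1
    return expr[0], cycles

result, cycles = evaluate(["add", 2, 3, EOX])
```

With this definition, `add 2 3` reduces to the constant 5. Because the call is only unfolded once both arguments are constants, the condition of the unfolded `if` is always decidable in the following cycle, and the unused branch is dropped before it can be expanded.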

It is clear that inc and dec terminate immediately and thus reduce the length of the expression. The expression also shrinks when cond in the expression "if cond exp1 exp2" is a constant. Ultimately everything is built on, and can be reduced to, the three built-in functions. Therefore, sooner or later all "if", "inc" and "dec" calls are evaluated, and the evaluation terminates (if termination is possible at all).

2014-06-22

FunCPU - Memory Models

Needless to say, not only the initial and final expressions but also the intermediate expressions must be stored in memory somehow. The way expressions are represented has an impact on the hardware design.

During the reduction cycles, expression symbols are copied one by one. When a function symbol is found, the symbol itself, possibly along with some of the symbols that follow it, is rewritten in accordance with the function definition (this is meant in a broad sense and includes the definitions of built-in functions).

Please note that the following discussion relates only to expression evaluation, as user-defined functions are stored in a separate memory, possibly in ROM.

Separate source and destination memory

Separate fragments of memory are used for the source and the destination expression. After each reduction cycle the roles of the two memories are swapped: the source memory becomes the destination and vice versa. This is illustrated in the figure below.

If l is the maximum size (in memory cells, including EOX) that the expression reaches during the evaluation, then the minimum memory requirement is 2×l cells.
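A minimal Python sketch of this model, assuming two banks of l cells each. The `rewrite` callback is a hypothetical stand-in for the actual reduction step: it maps one source symbol to the list of symbols written to the destination.

```python
EOX = "EOX"

def run_cycle(banks, src, rewrite):
    """Copy/rewrite the expression from banks[src] into the other bank.

    Returns the index of the bank that now holds the result, i.e. the
    source bank of the next cycle.
    """
    source, dest = banks[src], banks[1 - src]
    write = 0
    for sym in source:
        for s in ([EOX] if sym == EOX else rewrite(sym)):
            if write == len(dest):
                raise RuntimeError("expression exceeds l cells")
            dest[write] = s
            write += 1
        if sym == EOX:
            break
    return 1 - src

L = 4                                   # l: maximum expression size incl. EOX
banks = [[None] * L, [None] * L]        # 2 x l cells in total
banks[0][:2] = ["x", EOX]
src = run_cycle(banks, 0, lambda s: ["a", "b"] if s == "x" else [s])
```

Each bank must be able to hold the largest expression on its own, which is where the factor of two comes from.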

Shared memory, with index increasing and decreasing

A shared memory can also be used. In this case the source index increases while the destination index decreases as symbols are copied. After each evaluation cycle the roles of the indexes are swapped. This memory model may be more economical, since in some cases it makes it possible to reduce expressions longer than half of the memory size, provided that the reduced expression is smaller. The following inequality must hold in every cycle: e+r≤m, where e, r and m denote the expression size (excluding EOX), the reduced expression size and the memory size, respectively.
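The following Python sketch simulates one cycle of this model and detects the moment the destination would overwrite unread input. It assumes that the final EOX may reuse the cell of the source EOX, which matches e+r≤m if r is taken to include the EOX cell; that reading of r, like the `rewrite` callback standing in for the reduction step, is an assumption of the sketch.

```python
EOX = "EOX"

def run_cycle(mem, start, direction, rewrite):
    """Read the expression from `start` moving in `direction` (+1 or -1) and
    write the result from the opposite end of mem, moving the other way.

    Returns (start, direction) of the source for the next cycle.
    """
    m = len(mem)
    read = start
    write = m - 1 if direction == 1 else 0
    new_start = write
    while True:
        sym = mem[read]
        read += direction
        for s in ([EOX] if sym == EOX else rewrite(sym)):
            # While input remains, mem[read] is unread and must not be hit.
            if sym != EOX and (write - read) * direction <= 0:
                raise RuntimeError("destination ran into unread source")
            mem[write] = s
            write -= direction
        if sym == EOX:
            return new_start, -direction

mem = ["x", "x", EOX, None, None]       # m = 5, e = 2
start, direction = run_cycle(mem, 0, +1, lambda s: ["a"])
```

Here e=2 and the result "a a EOX" occupies 3 cells, so the cycle just fits into 5 cells. The same input in a 4-cell memory raises the error.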

Shared memory with index following

The last model is comparable to the previous one in economical terms, as the same inequality between expression lengths and memory size must hold. However, it has an additional benefit: the index roles never need to be swapped. This simplifies the hardware design, as the multiplexers (for swapping the source and destination registers) and their control signals and logic can be avoided. When the source index reaches the end of the input expression, marked by the EOX symbol, it simply copies and steps over that symbol and then continues with the next reduction cycle. If the end of memory is reached, the indexes wrap around and continue from a lower location.
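One way to read this model is as a circular buffer in which the destination is written directly behind the source EOX, so that stepping over EOX lands on the first symbol of the new expression. The Python sketch below follows that reading, which is an interpretation rather than a description of the actual hardware. The `rewrite` callback is again a hypothetical stand-in for the reduction step.

```python
EOX = "EOX"

def run_cycle(mem, read, write, rewrite):
    """One reduction cycle in a circular memory; both indexes only move
    forward (modulo len(mem)).

    `read` points at the first symbol of the input and `write` at the cell
    right after its EOX. Returns (read, write) for the next cycle.
    """
    m = len(mem)
    occupied = (write - read) % m or m   # cells holding unread input
    free = m - occupied
    while True:
        sym = mem[read]
        read = (read + 1) % m
        free += 1                        # the cell just read may be reused
        for s in ([EOX] if sym == EOX else rewrite(sym)):
            if free == 0:
                raise RuntimeError("destination caught up with unread source")
            mem[write] = s
            write = (write + 1) % m
            free -= 1
        if sym == EOX:
            return read, write

mem = ["x", "x", EOX, None, None, None]
first = run_cycle(mem, 0, 3, lambda s: ["a"])
after_first = list(mem)
second = run_cycle(mem, *first, lambda s: ["b"])
```

No roles are exchanged between the two cycles: the pair of indexes returned by one call is passed unchanged to the next, and the second cycle wraps around to the start of the memory.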

2014-06-01

FunCPU - Functional Encoding Scheme

In the previous post we saw how 8-bit symbols represent literals, arguments, functions, etc. It is equally vital to have a good and efficient function encoding. In the context of function encoding we need to be able to answer the following questions:

Where does the function definition begin?
Where does it end?
How many arguments does a function have?

At first, I planned to use function identifiers as references to physical memory locations, so that, for instance, functions would be indexed starting from 0. But this indirection could cost a lot in terms of hardware.

Instead, I am using the actual symbol to determine the function address directly. The most significant bit signals whether the symbol in question is a function (it is a function if the bit is set, except for EOX). The next 5 bits multiplied by 8 give the physical address of the function. For example, for the symbol %1aaa.aaxx the function definition starts at the physical binary address %aaaa.a000.

The observant reader may be wondering what the two least significant bits are used for. They represent the argument count encoded in two's complement form as follows:

%00 - 4 arguments
%01 - 3 arguments
%10 - 2 arguments
%11 - 1 argument.

Again, this rather strange encoding is selected to facilitate the processing in hardware.
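The table above is simply the negated argument count truncated to two bits. A short Python sketch of this mapping (the function names are illustrative only):

```python
def encode_arity(count):
    """Return the two low bits for an argument count of 1..4."""
    if not 1 <= count <= 4:
        raise ValueError("argument count must be between 1 and 4")
    return -count & 0b11       # two's complement of count, truncated to 2 bits

def decode_arity(bits):
    """Recover the argument count from the two low bits of a symbol."""
    return 4 - (bits & 0b11)

table = {bits: decode_arity(bits) for bits in range(4)}
```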

In the light of the aforementioned, a full function encoding works as follows:

%10000000 - is a user-defined function with 4 arguments, whose definition starts at address $00.
%10000111 - is a user-defined function with 1 argument, whose definition starts at address $08.
%11111010 - is a user-defined function with 2 arguments, whose definition starts at address $F0.

Please also note that the built-in function "if" is encoded in accordance with the above arity encoding, i.e. $FD (the symbol encoding "if") represents a 3-argument function. I should mention that the encoding of inc/dec does not conform to this, but they are treated differently, as we will see.
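The whole scheme can be checked with a few lines of Python. This is a sketch of the decoding rule as described in this post, not the hardware logic. The name `decode_function` is illustrative, and the EOX exception is not handled because its encoding is not given here.

```python
def decode_function(symbol):
    """Split an 8-bit function symbol %1aaaaaxx into (address, arg_count)."""
    if not symbol & 0x80:
        raise ValueError("most significant bit clear: not a function symbol")
    address = (symbol & 0b0111_1100) << 1   # %aaaaa multiplied by 8
    arg_count = 4 - (symbol & 0b11)         # %00 -> 4, %01 -> 3, %10 -> 2, %11 -> 1
    return address, arg_count

examples = {
    symbol: decode_function(symbol)
    for symbol in (0b10000000, 0b10000111, 0b11111010, 0xFD)
}
```

Masking the five address bits and shifting them left by one position is the same as extracting them and multiplying by 8, which is presumably why no separate lookup table is needed.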