In this blog post I'm going to analyse how loader.c program works, based on full.c Loader provided. You can check out that last program here.

Note that I don't fully understand structure of the program itself, so if you could help me understand some parts of it, I'll be really thankful.

## Analysis

typedef int Tree; typedef int INT; typedef int TREE; typedef int BitStream; #define DESCEND xx Tree lastRight, accumulate;

This is the header. Multiple types are defined, I suppose just for simplicity, or because this is required by compression Loader used to get 512-character program. DESCEND is defined to be the same as xx, if I understand correctly. It's used once in program, and the first thing I don't understand is what it's for. Two global variables are also defined.

TREE Pair (TREE yy, TREE xx) { return yy - ~yy << xx; } TREE Right (TREE xx) { return lastRight = xx % 2 ? 0 : 1 + Right (xx / 2); } TREE Left (TREE xx) { return xx / 2 >> Right (xx); }

As Loader's program works on integers (of defined type TREE) way of encoding expressions is needed. Simple pairing is enough for this (function used is \(\langle yy,xx\rangle\rightarrow 2^{xx}(2\cdot yy+1)\). Functions Left and Right just reverse the process. Note that when using function Right, result is stored in lastRight variable. Left function calls Right function, so lastRight is again overwriten with the result.

// Encoding // PI(A,B) = Pair(0,Pair(A,B)) // LAMBDA(A,B) = Pair(1,Pair(A,B)) // APPLY(A,B) = Pair(2,Pair(A,B)) // STAR = Pair(3,0) = 7 // BOX = Pair(3,1) = 14 // VAR(n) = Pair(4+2n,0) = 9 + 4n [n >= 0] // The empty context is 0, and the context Gamma,A is Pair (A,Gamma). // STAR and BOX are the only terms x with (x&2)!=0

At this point Loader explains in comment the encoding used. As of yet I'm not exactly sure what each of these mean, but I guess I'll learn as I'm getting into the program.

Details of the above encodings are only needed to understand the program itself, because the Turing machine I'm going to make (YES, this will happen!) will use more obvious encoding.

#define Lift(xx) Subst (4, 13, -4, xx)

This is shorthand for special case of not-yet-defined function Subst. Basically, if xx encodes some CoC expression, Lift will replace every variable with different variable. I guess this is used when some new variable is introduced.

TREE Subst (INT vv, TREE yy, INT context, TREE term) { Tree aux = Left (term), xx = lastRight;

This is beginning of definition of Subst. It's used to do operations on variable term, which stores some expression. Recall that each number encodes a pair - left element of a pair is an operation applied to the right element. They are stored in variables aux and xx, respectively.

{ return aux - 2 ? aux > 2 ? aux - vv ? term - (aux > vv) * context : yy : Pair (aux, Pair (Subst (vv, yy, context, Left (xx)), Subst (vv+2, term = Lift (yy), context, Right (xx)))) : Apply (Subst (vv, yy, context, Left (xx)), Subst (vv, yy, context, Right (xx))); } }

Clever use of if-then-else - first we ask if aux is 2. If it is, then operation of term is APPLY (c.f. encoding), and we skip to line with Apply function, which is yet to be defined.

If aux isn't 2, then we ask if it's greater than 2. If it is, then we perform substitution. If aux is greater than vv, then we subtract value of context from it. If it's smaller, we do nothing, if aux = vv, then we return yy, so we replace vv with yy.

If aux=0 or =1, then we will go deeper into the tree, leaving operation (aux) as it was. To the left element of the body same Subst is applied, while to right element we apply slightly changed Subst. My guess is that it's here to leave variable vv like it is in the right element.

Now we can make more sense of Lift(xx):

#define Lift(xx) Subst (4, 13, -4, xx)

We go up the tree, and from every variable >13 we subtract -4, i.e. we add 4, so we change VAR(n) to VAR(n+1), and we replace 4 with 13. Three things I don't really get - because 14>13, we are not sure if BOX won't be changed along with all variables. Possibly this does no harm, we might see as we go into the program. Other thing is that why are we changing 4 to 13? I'm not skilled CoC, so maybe this somehow introduces new variable, I;m not sure what is the role of PI in CoC. Last but not least, why not 9 instead of 13? No variable, box or star has term between 9 and 13, so we could just change 4 to 9 and save one precious character?

TREE Apply (TREE yy, TREE xx) { return Left (yy) - 1 ? 5 << Pair (yy, xx) : Subst (4, xx, 4, Right (lastRight)); }

We are asking if operation in the first term is LAMBDA - if it isn't, we will not do anything - we return Pair(2,Pair(yy,xx))=APPLY(yy,xx). If it is LAMBDA, we perform substitution - we again replace 4 with xx, reduce index of each variable and return the result.

#define MAYBE (xx /= 2) % 2 &&

This macro first halves xx (ignoring the last bit) and then takes last bit of this half and performs logical AND of this bit and whatever follows. In short - we take last but one bit and AND it with what follows. I'm not really sure what the last thing is for, but there will come time for this.

TREE Derive (BitStream xx) { Tree aux, auxTerm, context = 0, term = 7, type = 14;

Here starts the definition of Derive. First we define few variables.

while (DESCEND && Derive (xx - 1), MAYBE (1))

As Loader writes, in this while loop there is main recursion - we compute Derive(xx-1), and as we will see, the result will be stored in global variable accumulate, which will do nothing but increase. There is also a tricky way thanks to which we can be sure we won't, at any point, run into the problem of computing D(-1) or anything - if DESCEND, defined as xx, is equal to 0, then, because of how C works with logical connectives, it won't even look at the right side of &&.

Now, MAYBE is a macro, which first halves xx (this is important) and looks at its last bit, which is second-to-last bit of old xx. It also uses && on this bit and (1). What for? Not sure. I guess compressing reasons.

auxTerm = Left (Left (Derive (xx))), aux = Left (lastRight), xx = Left (lastRight),

These three lines give us parts of result of Derive(xx), note that this is not circular, as we halved xx in MAYBE macro. By looking at the result of Derive, we see that auxTerm is (the value of) term, aux is type, lastRight becomes context and xx becomes what it became in Derive(xx/2). Let's see what it is exactly - in Derive(xx/2) it takes the value taken in Derive(xx/4), but divided by 32 (because of 5 MAYBEs later). That value which we divide is the value xx takes in Derive(xx/8), but divided by 32, and so on. So for me, it's clear that this value turns out to be 0. I think I'm missing something important here, because MAYBE is still taking bits out of it.

- analysis temporarily stopped until I figure out what happens to xx*

## Questions

- Why is DESCEND defined?

By looking at reduced.c DESCEND is replaced with x, so I suppose FBZ was right that it's only for debugging purposes.

- Shouldn't 13 in definition of Lift be larger?

As of now, I don't know if BOX, encoded by 14, isn't in risk of getting changed by Lift. I'm quite sure that, with properly coded term, we won't hit any point with BOX being changed, but I decided to leave it as a question for now.

- Couldn't 13 in definition of Lift be smaller?

13 is code for VAR(1), but Loader's encoding allows us to have VAR(0), coded by 9. If we instead used 9, we could save one character and use it to make output larger. Loader surely isn't dumb, so there must be a reason on why he kept it as 13.

- Why are we changing 4 to 13 in Lift?

I'm not asking "why 13?", because it's question above, but "why 4?". 4=PAIR(0,PAIR(0,1))=PI(0,1). I'm not sure what PI exactly does. According to one of Loader's comments in program it's "pi formation", ~~and googling that didn't provide me any explanation, just inference rules involving it.~~ I found this.

I think I might have solved the mystery - by looking at what Apply function does, I think that 4 might be the code for variable in the scope of nearest lambda.

- Why do we have (1) in the definition of while loop?

My guess is that for compressing purposes Loader put ( inside MAYBE macro (you can see this in loader.c program), and to make syntax correct he had to do (stuff) && (1).

- Doesn't xx become 0 inside Derive?

By looking at what happens to xx, I think it does, but it shouldn't.