- Background
- Case-study: Naive Fibonacci
- The silly-k language
- silly-k by examples
- Conclusion and further ideas
This post is all about my silly-k project: a small experimental K-like, APL inspired, nanopass compiled, language targeting Malfunction. The code lives on GitHub and the tests are run by Travis.
The a posteriori motivation for the language is summarized into these two points:
- performance of recursive algorithms should be similar to the ML-family
- have syntax similar to K but be compiled to a native runtime making no use of type-checks
Disclaimer The silly prefix is used to indicate that the language is a fun experiment, and to indicate, by way of contrast, the amazing nature of APL and K.
Background
I wanted to implement a language with:
- a compiler in the nanopass style
- simply typed lambda calculus
- Malfunction
- type-inference (in particular I wanted to use the Hindley-Milner style type reconstruction I read about in TaPL §22.4)
- APL/J/K inspired syntax
The harmony between these chosen goals was not taken into account a priori, that was part of the adventure!
Malfunction is what you get when you cut the OCaml's compiler in half and put a simple lambda calculus-like syntax in front. You thus get the optimization, code-generation and all of OCaml's ecosystem with an easy to target language for your compiler.
Malfunction's input language is based on s-expressions, so I thought: why not write the compiler in Scheme using the nanopass-framework.
Before writing the compiler I had just read the book The Reasoned Schemer about miniKanren so Scheme was the language I was learning at the time, I had barely used it before reading that book.
During spring this year I got quite fed up with coding in the industry's JSON in, JSON out where all code seems intellectually useless.
The only remedy that worked was APL. Introduced to me by my friend and colleague inrick (thank you for suffering through all the silly-ness!). He also showed me K and J, preferring K himself.
For reference, here are some implementations of the mentioned languages:
- Dyalog APL, which is very well documented
- J, a related language created by Iverson, here's its vocabulary
- The official K, closed source (but it's the real thing!)
- Kona, open source with a well stocked wiki
- oK, open source, can be run in the browser
The code density is absolutely amazing, finally I can read the code without hurting my wrists, straining my memory or being forced to use an IDE. On this note I highly recommend checking out this talk by the author of co-dfns, and in particular the points he makes about tools and complexity and the benefits of the high semantic density (of APL).
I was hooked immediately, and its legacy can't be disputed nor its power.
I started writing my solutions to Project Euler using Dyalog APL, and with just about 6 lines of code per solution it was an enormous contrast to my everyday job writing Scala, which, though terser than Java, you're strained to write anything other than Hello World using 6 lines of code. The fun part was that it wasn't even difficult to solve the problems, often the solution could be read verbatim without jumping around in the source code, and the speed with which one can translate an idea into code is amazing.
Clearly Iverson was on to something.
Enter the K language, which is a spiritual successor to APL, featuring (among many many things) heavily overloaded operators. My idea was to use the type-inference to resolve the overloading, so that the runtime code does not have to do any type-checks.
However, my thinking processes are still recursive and haven't yet fully adapted to a vector based mind-set.
Case-study: Naive Fibonacci
I was surprised the naive Fibonacci algorithm had such atrocious performance in K:
fib:{$[x<2;1;_f[x-1]+_f[x-2]]}
Of course it's not a good algorithm to start with, nor very suitable to evaluate
in an array focused language.
But when I went back to Haskell
or OCaml
or C,
they all allowed me to be naive for a while.
On my laptop it took about 8 minutes for fib 44
to terminate in K,
compared to around 4-5 seconds in the other languages I tried.
At this point I thought, what might be going on here? I then tried the same code in Python, which terminated after about 4 minutes. One hypothesis is that it's the interpretative nature of K and Python that make them so slow. That seems to be part of the explanation for Python at least, with pypy the time was 22 seconds. I also tried Chez scheme, which had just slightly worse performance (clocking in at 6 seconds). I'm thinking runtime type-checks does not impact performance in that magnitude.
Question then: what would be the performance of K if it were compiled? As far as I know, there's no compiling variant of K, and so that serves as part of the motivation for implementing silly-k.
The silly-k language
These are some of the intended features of the language:
- compiled (via Malfunction)
- interpreted (via Scheme backend)
- anonymous functions in a dfns style
with
a
andw
as the ASCII counterparts forα
andω
- overloaded operators that are resolved using type-inference during compilation
- coercion of booleans to integers, meaning that you can for instance sum a vector of booleans
- let-bindings with scoping rules similar to K's
The basic idea of translating the APL or K-like syntax is to
transform the expression l{a+w}r
into an intermediate form:
(apply (apply (lambda (w) (lambda (a) a+w)) r) l)
Note here that the right-hand argument is evaluated before the left-hand element, which indicates that expressions should both be read and evaluated from right to left.
The source code is parsed using a parser generated by the lalr-scm parser generator. This parser generator was used primarily because at the time I was not interested to write a parser myself, the focus was on the compiler internals.
This intermediate language is transformed by the compiler
(through 17 intermediate languages and 22 passes) into an underlying language with,
for instance, only monomorphic, scalar and dyadic
arithmetic operators, in contrast to the source language where +
can be applied
to both scalars and vectors on both sides.
The underlying language of the compiler is then translated (quite trivially at this point) to either Scheme, for the REPL, or into Malfunction for compilation.
Overloading
Overloaded types of a term is handled by providing an ordered list of alternative sets of constraints for the unification algorithms to try.
Consider the +
operator, it can be given 4 possible types:
int -> int -> int
(from the expression:1+2
, i.e. scalar addition)(vector int) -> int -> (vector int)
(from the expression:1+2 3 4
, i.e. distributed partially applied addition, map)int -> (vector int) -> (vector int)
(from the expression:1 2 3+4
, i.e. distributed partially applied addition, map)(vector int) -> (vector int) -> (vector int)
(from the expression:1 2+3 4
, i.e. point-wise addition, zip)
Here I'm using the conventional arrow-style of encoding functions types.
Internally the type a -> b -> c
is represented simply as
(lambda a (lambda b c))
Meaning that when the unification algorithm has given +
the type (vector int) -> int -> (vector int)
, we can replace
that +
with the following expression based on the primitive scalar operator +
instead:
(lambda (ys) (lambda (x)
(apply
(apply
(primfun map)
(lambda (y)
(apply (apply (primfun +) y) x)))
ys)))
Here I've omitted the type annotations included in the actual intermediate language,
the primitive function map
has the expected type polymorphic type (a -> b) -> (vector a) -> (vector b)
.
Coercion
Using the overloading mechanism described above, it was quite easy implement coercion of booleans into integers. In particular the following expression should type and evaluate correctly:
+/0=1 0 2 0 3
to 2
.
Here a coercion is necessary since 0=1 0 2 0 3
has type (vector bool)
and we want to count the true
values of that vector.
With knowledge of the OCaml runtime code one might interject:
"Why not treat them the same, i.e. just a word?"
Well, at least we expect only boolean values in conditionals, and
in the OCaml code we can throw away the actual coercion when generating
code.
Note that extending conditionals to allow integers and treat any non-zero
value as true
would after code generation be exactly the same as having
explicit coercion.
Hence I preferred to have more type information during compilation, not over-complicate
the conditional expression, and chiefly confirm the flexibility of the
nanopass compiler by adding explicit coercion by reusing the overloading mechanism,
which was already introduced.
Adding coercion was just adding another pass that introduced coercion-points
surrounding each expression in an apply
node.
Assume that the wrapped expression has type a
and that the coercion-point is given
the fresh type b
, then these constraints are tried in order:
b = a
a = bool
andb = int
a = (vector bool)
andb = (vector int)
After unification using these types, the coercion-points that were given the same type (first case above) are thrown away, the others are translated into corresponding primitive functions, taking the appropriate action when generating Malfuction-code (just identity function) or Scheme.
At-operator
Having coercion in the language already simplified implementation of
the At operator. As an example, assume l
is a list with values 1 2 3
,
then l@1
should evaluate to 2
.
That is, I'm following K and have ⎕IO←0
(i.e. the index origin from APL for the uninitiated).
The @
symbol is only used in the parser to generate
a monadic apply of the left-hand side to the right-hand side.
This means that when we try to type l
we would want to give it the type
of int -> a
assuming that l
has type (vector a)
.
That can be expressed using the coercion mechanism. Assume that the wrapped expression
has type a
and the coercion point has type b
, then these constraints are tried:
a = (vector c)
and b = int -> c
, where c
is a fresh type-variable.
After unification these coercions are easily detectable by their given types and translating them into appropriate primitive functions is easy.
Let-bindings using a "let-spine"
Consider the expression x+x:7
, which is interpreted as:
- evaluate 7
- bind the result of the evaluation the name
x
- apply
+
tox
(I'll denote the partial application as+x
) - apply
+x
tox
This expression is translated by silly-k into this AST:
(apply (apply plus (let (x) 7)) x)
and represented graphically:
However, the scoping rules of the lambda calculus do not accept this. In lambda calculus with let-bindings this would read:
(apply (apply plus (let (x 7) x)) x)
The lexically scoped binding of x
does nothing to help
us interpret the outer x
.
What we want is closer to dynamic scoping: once the x
is bound the binding
is in scope for any subsequent evaluation (unless it's rebound of course).
To stay within the realms of lambda calculus and its lexical scope, silly-k
binds a name to every evaluated value, called its spine-value, so when the x
is encountered it's substituted for the corresponding spine-value.
The AST gets rebalanced into a right-biased tree, from which I took its name, the let-spine:
Note that the apply nodes have only spine-values as leafs, indicating that both have been evaluated before and have been given bindings for eventual referrals late in the expression.
This is not a particularly beautiful nor performant (compile time and optimization time is severely impacted), but it's a simple solution that allows a more direct translation into simply typed lambda calculus.
Interestingly, the OCaml compiler manages to optimize away most of the generated let:s (i.e. the lambdas and apply:s).
If reference cells were added to Malfunction this approach would not be necessary. However, for me it's soothing to know that it's possible to translate at least some of the scoping of K to simple lambda calculus.
Of course one can easily achieve a reference cell or like we wanted here a mutable dictionary
by referring to linked in OCaml code using the global
syntax in Malfunction.
I felt this is not in line with the goal of silly-k: translate K into a somewhat
pure lambda calculus.
silly-k by examples
These examples are taken directly from the tests.
Numbers and comparisons
Code | Result
-----|-------
]7
| 7
]1 2 3
| 1 2 3
]1=2
| 0
]2=2
| 1
]1 2 3=3
| 0 0 1
]1=1 2 3
| 1 0 0
]~1=2
| 1
]~2=2
| 0
]2<3
| 1
]2<2
| 0
]3<2
| 0
]1<1 2 3
| 0 1 1
]1 2 3>2
| 0 0 1
Arithmetic operators
Code | Result
-----|-------
]1+2 3
| 3 4
]1 2+3 4
| 4 6
]1 2+3
| 4 5
]2-3
| -1
]1-(-2)
| 3
]1 2-3 4
| -2 -2
]-7
| -7
]-(-2)
| 2
]1-2 3
| -1 -2
]1 2-3
| -2 -1
]2*3
| 6
]1 2*3
| 3 6
]4*2 3
| 8 12
]1 2*3 4
| 3 8
Array operators
Code | Result
-----|-------
]!4
| 0 1 2 3
]*1 2 3
| 1
]#1 2 3 4
| 4
]4#1 2
| 1 2 1 2
]3#1
| 1 1 1
]1 2 3@1
| 2
]1 2 3@0 2
| 1 3
]&1 2 3
| 0 1 1 2 2 2
]7 8@&2 3
| 7 7 8 8 8
Each and reduce adverbs
Code | Result
-----|-------
]{w+1}'1 2 3
| 2 3 4
]{1-w}'3 4 5
| -2 -3 -4
]2+'3 4 5
| 5 6 7
]2-'3 4 5
| -1 -2 -3
]2{a+w}'3 4 5
| 5 6 7
]2{w-a}'3 4 5
| 1 2 3
]{w@1}'{+w}'1 2 3
| 2 3 4
]{w@1}'{-w}'1 2 3
| 0 -1 -2
]+/1 2 3
| 6
]-/1 2 3
| 2
]{w-a}/1 2 3
| 0
]+/2<!5
| 2
]&/0<1 2 3
| 1
]&/0<1 0 3
| 0
Conditionals
Code | Result
-----|-------
](1=1;2;3)
| 2
](1=2;1 2;3 4)
| 3 4
]7{w=1;w;a}1
| 1
]7{w=1;w;a}8
| 7
Recursion
Code | Result
-----|-------
]{w=0;0;w+_f(w-1)}6
| 21
]{w=1;1;w=2;1;(_f(w-2))+_f(w-1)}1:
| 13
Bindings
Code | Result
-----|-------
]x+x:7
| 14
](x:1)+x:2
| 3
]x+(x:1)+x:2
| 4
]x+x{x:1+a-w}x:2
| 3
Note about ]
The ]
operator is the output operator (taken from J).
It was necessary to add it in order to give more information to the type-inference,
so that the result is closer to what the user expects.
For example consider the *
operator, it has two interpretations:
- the monadic Head operator:
(vector a) -> a
- the dyadic multiplication operator:
int -> int -> int
(for brevity, the distributed variants are omitted)
Since ]
only has types: int -> int
and (vector int) -> (vector int)
,
then the *
operator in the expression ]*1 2 3
will be given the monadic type,
since the other type(s) all have higher arity, and so can't be displayed by ]
.
Conclusion and further ideas
At the end of the day, did I achieve the things I set out to do? I think yes:
- the compiler both fun and not particularly hard to implement (fun times achieved over the vacation and some weekends)
- the syntax became quite K-like, of course there's a lot missing, but the struts are in place
- the performance of the naive Fibonacci is clocking it at 5 seconds, without any particular thought or extra optimization passes
One idea I had even from the start of the project was to write a Church encoding-backend or target something like unlambda. But I'll leave that one as an exercise for the reader :D