My private PL Zoo

After reading The Wizard Book and going through The SICP lectures, I found myself deeply interested in PL design and implementation. That naturally led me to further reading on the subject, as well as to the development of compilers and interpreters for toy programming languages.

With those projects, I got to learn about and implement:

  • Compilers for stack-based and register-based target architectures.
  • Interpreters for dynamically-typed languages.
  • Garbage collection algorithms.
  • Virtual Tables for OOP inheritance and dynamic dispatch.
  • Static typechecking.
  • Hand-written and tool-generated parsers.
  • First-class functions, closures and their low-level implementation.
  • Optimizations such as tail-call elimination, string interning and constant folding.
  • Different evaluation strategies (nondeterminism, lazyness).
  • Pattern-matching and unification.

Yet another Scheme Metacircular Evaluator

This is where the magic began: a Scheme interpreter written in Scheme following Chapter 4 of The Wizard Book. Differently than what most people do, however, I kept extending the first evaluator with the variations that followed in the next sections (instead of building them upon a minimal, scratch version).

The result turned up as a macro-less R5RS(-ish) implementation that separates syntactic analysis from the actual interpreter execution and additionally provides built-in amb and try-catch special forms for nondeterministic computing, as well as a retry command in the REPL that backtracks to the most recent nondeterministic fork and tries a different path.

Using the nondeterministic Scheme evaluator to solve the SEND+MORE=MONEY puzzle (source not shown).

Bytecode compiler and VM for Lox

This was my implementation of Robert Nystrom’s Crafting Interpreters single-pass compiler and bytecode virtual machine for the Lox programming language, coming from what is probably the best (also free) beginner-friendly resource (definitely much better than The Dragon Book) for learning about how programming languages are implemented. Most of it is written in standard C99, making use of features such as Flexible Array Members (FAMs), designated struct initializers and Variable-Length Arrays (VLAs, just once).

Notably, Lox is an efficient, dynamically-typed, object-oriented, garbage-collected scripting language with some runtime introspection and support for closures and first-class functions. The implemented clox version uses a hand-written recursive descent parser (a.k.a. Pratt parser) and compiles down to bytecode targetting a stack-based VM that encompasses a Mark & Sweep GC and includes many useful optimizations such as string interning, NaN boxing and computed gotos for efficient instruction dispatch (requires GCC’s labels-as-values extension).

A minimal BASIC simulator

Wanting to improve my french before going abroad on an exchange program, I decided to acquire some programming-related vocabulary by skimming through the original, untranslated version of Developing Applications With Objective Caml (DAWOC) while learning F# instead of OCaml. One example application in Chapter 6 caught my attention: it was a very minimal BASIC interpreter which I extended by implementing system directives (using as reference whatever I could find online about the old Microsoft BASIC) and by adding support for subroutines. The entire simulator is quite short given that BASIC is a very basic simple non-structured imperative language with dynamic types and no notion of scopes or even functions. In fact, a considerable part of its development time was dedicated to the hand-written shift-reduce parser, which was much harder to grasp than the top-down parsers I had coded until then.

Fizz buzz in BASIC. Hopefully the last thing I’ll ever program in this language.

Deca - A compiled subset of Java

I’ve been told all Ensimag students have at least one project in common: the Projet Génie Logiciel (french for “Software Engineering Project”). In this project, students have to work in groups of five to develop, in less than a month and using Agile practices, a full-blown compiler for Deca: a statically-typed, object-oriented language that resembles a subset of Java and is further specified in a 231-page long document that is given to students in the beginning of the project. It compiles down to a register-based architecture – inspired by the 68000 – that is used in Ensimag’s closed-source abstract machine.

The compiler uses ANTLR to generate a lexer and a parser from a description of the accepted tokens and grammar. Those are employed to build the AST that is then processed during the multiple verification and code generation passes that follow. Unfortunately, the project needed to be coded in Java and the initial skeleton used an Interpreter pattern for everything, meaning we had to deal with a ton of trivial files and the usual OOP inheritance abuse (worsened by the fact that the original codebase programmers had apparently never heard of interfaces, only abstract classes). In the end though, I was happy with the test suite we set up – allowing expected output to be written with regular expressions inside comments in the Deca test sources – and the fact that we got to implement virtual method tables and some basic constant folding optimization.

We also had the opportunity to extend the base language with anonymous functions, closures and functional object interfaces. In doing so, I finally scratched the itch to design something better than what Java has for functional types (how it forces users/libraries to define a different interface for every possible function arity, plus specific interfaces for primitive types), using something that looks like Lambda<ArgTypes*,ReturnType>.

Lazy (even) Streams library in Deca-λ (Deca + our functional extension), side by side with the generated assembly code.

Honorable Mentions

Logic programming in lisp

Another one from the venerable Wizard Book: a (lispy) logic programming language interpreter with built-in pattern-matching and unification.

Automatum-compiling macros

Here I simply implement the interpreter and the macro-time compiler for the FSM DSL as presented in Shriram Krishnamurthi’s excelent macro tutorial.

Assembling Forth

In order to test my custom CPU, I wrote a simple assembler script in Python to translate a Forth-like assembly language into executable machine code.

Java-embedded Scheme

On another project, I added a tree-walking interpreter to a distributed Java application in order to use Scheme as an embedded scripting language.