Home / News / Implementing Forth in Go and C

Implementing Forth in Go and C

I first ran into Forth about 20 years ago when reading a book about
designing embedded hardware.
The reason I got the book back then was to actually learn more about the HW
aspects, so having skimmed the Forth chapter I just registered an “oh, this is neat”
mental note and moved on with my life. Over the last two decades I
heard about Forth a few more times here and there, such as that time when
Factor was talked about for a brief period, maybe
10-12 years ago or so.

It always occupied a slot in the “weird language” category inside my brain, and
I never paid it much attention. Until June this year, when a couple of factors
combined fortuitously:

And something clicked. I’m going to implement a Forth, because… why not?

So I spent much of my free hacking time over the past two months learning
about Forth and implementing two of them.

Forth: the user level and the hacker level

It’s useful to think of Forth (at least standard Forth,
not offshoots like Factor) as having two different “levels”:

  1. User level: you just want to use the language to write programs. Maybe
    you’re indeed bringing up new hardware, and find Forth a useful
    calculator + REPL + script language. You don’t care about Forth’s
    implementation or its soul, you just want to complete your task.
  2. Hacker level: you’re interested in the deeper soul of Forth. Isn’t it
    amazing that even control flow constructs like IF...THEN or loops like
    BEGIN...UNTIL are just Forth words, and if you wanted, you could implement
    your own control flow constructs and have them be first-class citizens, as
    seamless and efficient as the standard ones?

Another way to look at it (useful if you belong to a certain crowd) is that
user-level Forth is like Lisp without macros, and hacker-level Forth has macros
enabled. Lisp can still be great and useful without macros, but macros take
it to an entire new level and also unlock the deeper soul of the language.

This distinction will be important when discussing my Forth implementations
below.

goforth and ctil

Logo of goforth

There’s a certain way Forth is supposed to be implemented; this is how it was
originally designed, and if you get closer to the hacker level, it
becomes apparent that you’re pretty much required to implement it this way –
otherwise supporting all of the language’s standard words will be very
difficult. I’m talking about the classical approach of a linked dictionary,
where a word is represented as a “threaded” list [1], and this dictionary is
available for user code to augment and modify. Thus, much of the Forth
implementation can be written in Forth itself.

The first implementation I tried is stubbornly different. Can we just make a
pure interpreter? This is what goforth
is trying to explore (the Go implementation located in the root directory of
that repository). Many built-in words are supported – definitely enough to
write useful programs – and compilation
(the definition of new Forth words using : word ... ;) is implemented by
storing the actual string following the word name in the dictionary, so it can
be interpreted when the word is invoked.

This was an interesting approach and in some sense, it “works”. For the user
level of Forth, this is perfectly usable (albeit slow). However, it’s
insufficient for the hacker level, because the host language interpreter (the
one in Go) has all the control, so it’s impossible to implement IF...THEN in
Forth, for example (it has to be implemented in the host language).

That was a fun way to get a deeper sense of what Forth is about, but I did want
to implement the hacker level as well, so the second implementation –
ctil – does just that.
It’s inspired by the jonesforth
assembly implementation, but done in C instead [2].

ctil actually lets us implement major parts of Forth in Forth itself. For
example, variable:

: variable create 1 cells allot ;

Conditionals:

\ IF, ELSE, THEN work together to compile to lower-level branches.
\
\ IF ... THEN compiles to:
\   0BRANCH OFFSET true-part rest
\ where OFFSET is the offset of rest
\
\ IF ... ELSE ... THEN compiles to :
\   0BRANCH OFFSET true-part BRANCH OFFSET2 false-part rest
\ where OFFSET is the offset of false-part and OFFSET2 is the offset of rest
: if immediate
  ' 0branch ,
  here
  0 ,
  ;

: then immediate
  dup
  here swap -
  swap
  ! ;

: else immediate
  ' branch ,
  here
  0 ,
  swap
  dup
  here swap -
  swap
  ! ;

These are actual examples of ctil’s “prelude” – a Forth file loaded before any
user code. If you understand Forth, this code is actually rather mind-blowing.
We compile IF and the other words by directly laying our their low-level
representation in memory, and different words communicate with each other
using the data stack during compilation.

Thoughts on Forth itself

Forth made perfect sense in the historic context in which it was created in
the early 1970s. Imagine having some HW connected to your computer (a telescope
in the case of Forth’s creator), and you have to interact with it. In terms
of languages at your disposal – you don’t have much, even BASIC wasn’t invented
yet. Perhaps your machine still didn’t have a C compiler ported to it; C
compilers aren’t simple, and C isn’t very great for exploratory scripting
anyway. So you mostly just have your assembly language and whatever you build
on top.

Forth is easy to implement in assembly and it gives you a much higher-level
language; you can use it as a calculator, as a REPL, and as a DSL for pretty
much anything due to its composable nature.

Forth certainly has interesting aspects; it’s a concatenative language,
and thus inherently point-free.
A classical example is that instead of writing the following in a more
traditional syntax:

eat(bake(prove(mix(ingredients))))

You just write this:

ingredients mix prove bake eat

There is no need to explicitly pass parameters, or to explicitly return results.
Everything happens implicitly on the stack.

This is useful for REPL-style programming where you use your language not
necessarily for writing large programs, but more for interactive instructions to
various HW devices. This dearth of syntax is also what makes Forth simple
to implement.

All that said, in my mind Forth is firmly in the “weird language” category;
it’s instructive to learn and to implement, but I wouldn’t actually use it
for anything real these days. The stack-based programming model is cool for
very terse point-free programs, but it’s not particularly readable and hard
to reason about without extensive comments, in my experience.

Consider the implementation of a pretty standard Forth word: +!. It expects
and address at the top of stack, and an addend below it. It adds the addend to
the value stored at that address. Here’s a Forth implementation from
ctil’s prelude:

: +!        ( addend addr -- )
  tuck      ( addr addend addr )
  @         ( addr addend value-at-addr )
  +         ( addr updated-value )
  swap      ( updated-value addr )
  ! ;

Look at that stack wrangling! It’s really hard to follow what goes where without
the detailed comments showing the stack layout on the right of each instruction
(a common practice for Forth programs). Sure, we can create additional words
that would make this simpler, but that just increases the lexicon of words to
know.

My point is, there’s fundamental difficulty here. When you see this C code:

int func(int a, int b) {
  return foo(a, bar(b));
}

Even without any documentation, you can immediately know several important
things:

  • bar has one parameter and one return value
  • foo has two parameters and one return value
  • func also has two parameters and one return value
  • It’s immediately obvious how the various values flow from one function call
    to the next.

Written in Forth [3]:

: func bar foo ;

How can you know the arity of the functions without adding explicit comments?
Sure, if you have a handful of words like bar and foo you know like the
back of your hand, this is easy. But imagine reading a large, unfamiliar code
base full of code like this and trying to comprehend it.

Summary and links

The source code of my goforth project is on GitHub; both
implementations are there, with a comprehensive test harness that tests both.

The learn Forth itself, I found these resources very useful:

To learn how to implement Forth:

Implementing Forth is a great self-improvement project for a coder; there’s a
pleasantly challenging hump of understanding to overcome, and you gain valuable
insights into stack machines, interpretation vs. compilation and mixing these
levels of abstraction in cool ways.

Also, implementing programming languages
from scratch is fun! It’s hard to beat the feeling of getting to interact with
your implementation for the first time, and then iterating on improving it
and making it more featureful. Just one more word!


[1] This has nothing to do with threads in the sense of concurrency.
Rather, it’s thread like in sewing, where the elements of the list
are all connected to each other as if with a thread. See
this page for
more details.
[2]

Which is another deviation from the norm. Forth is really supposed to
be implemented in assembly – this is what it was designed for, and it’s
very clear from its structure that it must be so in order to achieve
peak performance.

But where’s the fun in doing things the way they were supposed to be
done? Besides, jonesforth is already a perfectly fine Forth implementation
in assembly, so I wouldn’t have learned much by just copying it.

I had a lot of fun coding in C for this one; it’s been a while since I
last wrote non-trivial amounts of C, and I found it very enjoyable.

[3] Assuming the convention that multi-parameter functions have their
parameters pushed to the stack from left to right.
Tagged:

Leave a Reply

Your email address will not be published. Required fields are marked *