wee: A Tiny Instruction Set

wee is a tiny instruction set. There’s a C compiler for it. Try it out online!

wee is designed to make it as easy as possible to write C code for esoteric architectures and languages, especially if you don’t care as much about the size of the program. Check out the code here. 🐪

Specification

wee programs use a memory array and two registers, A and B. An instruction takes either no arguments or exactly one immediate signed integer argument.

Instruction	Description
`mov n`	`A = n`
`swap`	`swap values of A and B`
`add`	`A += B`
`sub`	`A -= B`
`load`	`A = memory[A]`
`store`	`memory[A] = B`
`setlt`	`if A < B, then set A = 1, else A = 0`
`jmpz n`	`if A == 0, go to nth instruction`
`getc`	`A = getchar()`
`putc`	`putchar(A)`
`exit`	`stop the program`

Here’s a 30-line example Python interpreter.

Note that wee doesn’t define the register size. Instead, the runtime you write for your target defines it, so you need to make sure your C program doesn’t overflow the register size you’ve chosen.
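To make the table concrete, here is a minimal interpreter sketch. It is not the interpreter linked above, just one reading of the instruction table under several assumptions that the spec leaves open: source text has one instruction per line, `jmpz n` counts instructions from 0, registers wrap as unsigned `bits`-wide values, unwritten memory reads as 0, and `getc` returns 0 at end of input. The names `run`, `bits`, and `max_steps` are made up for this example.

```python
import io

def run(source, stdin=b"", bits=32, max_steps=1_000_000):
    """Run wee source text and return the bytes written by putc."""
    # Assumed text format: one instruction per line, "op" or "op n".
    code = []
    for line in source.splitlines():
        fields = line.split()
        if fields:
            code.append((fields[0], int(fields[1]) if len(fields) > 1 else None))

    mask = (1 << bits) - 1      # register width is chosen by the runtime
    a = b = pc = 0
    memory = {}                 # sparse memory; unwritten cells read as 0 (assumption)
    inp = io.BytesIO(stdin)
    out = bytearray()

    for _ in range(max_steps):
        if pc >= len(code):     # falling off the end stops the program (assumption)
            return bytes(out)
        op, arg = code[pc]
        pc += 1
        if op == "mov":
            a = arg & mask
        elif op == "swap":
            a, b = b, a
        elif op == "add":
            a = (a + b) & mask
        elif op == "sub":
            a = (a - b) & mask
        elif op == "load":
            a = memory.get(a, 0)
        elif op == "store":
            memory[a] = b
        elif op == "setlt":
            a = 1 if a < b else 0   # unsigned comparison (assumption)
        elif op == "jmpz":
            if a == 0:
                pc = arg            # 0-based instruction index (assumption)
        elif op == "getc":
            ch = inp.read(1)
            a = ch[0] if ch else 0  # 0 at end of input (assumption)
        elif op == "putc":
            out.append(a & 0xFF)
        elif op == "exit":
            return bytes(out)
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("step limit reached")

print(run("mov 72\nputc\nmov 105\nputc\nexit"))  # b'Hi'
```

Changing `bits` is how this sketch models the register-size choice: with `bits=8`, adding 200 and 100 wraps around to 44.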

Some highlights:

There are only two registers.
There are no redundant comparison instructions (jlt, jge, etc.).
There is no jump register instruction.
There are no labels or data directives, and the instructions don’t take register arguments. In other words, it’s easy to parse.
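As a rough illustration of how little parsing is needed, here is a sketch of a validating parser. It assumes a text format of one instruction per line, with the mnemonic and its optional integer separated by whitespace. `Instr`, `TAKES_ARG`, `parse`, and `jump_targets` are hypothetical names for this example, not part of wee.

```python
from typing import List, NamedTuple, Optional, Set

class Instr(NamedTuple):
    op: str
    arg: Optional[int]

# Mnemonics from the instruction table; True means "takes one immediate".
TAKES_ARG = {
    "mov": True, "jmpz": True,
    "swap": False, "add": False, "sub": False, "load": False,
    "store": False, "setlt": False, "getc": False, "putc": False,
    "exit": False,
}

def parse(source: str) -> List[Instr]:
    """Parse wee text into a list of instructions, rejecting malformed lines."""
    program = []
    for lineno, line in enumerate(source.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        op, rest = fields[0], fields[1:]
        if op not in TAKES_ARG:
            raise ValueError(f"line {lineno}: unknown instruction {op!r}")
        if len(rest) != int(TAKES_ARG[op]):
            raise ValueError(f"line {lineno}: wrong argument count for {op}")
        program.append(Instr(op, int(rest[0]) if rest else None))
    return program

def jump_targets(program: List[Instr]) -> Set[int]:
    """Every jump destination is an immediate, so all of them are known statically."""
    return {instr.arg for instr in program if instr.op == "jmpz"}
```

Because `jmpz` is the only control-flow instruction and its destination is an immediate, a backend can collect every jump target with a single pass like `jump_targets` before generating any code.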

Why?

For very strange and contrived architectures (e.g., brainf***), it’s convenient to know all jump targets ahead of time. In the past, I used elvm to create a C-to-ROP compiler. The most annoying instruction to implement was the jump register instruction. It’s better to eliminate the jump register instruction once, at a higher level of abstraction, than to have everyone work around it individually in their elvm backends.

Because wee is designed to be easy to parse, it’s very easy to get a proof-of-concept runtime or transpiler working on a new architecture.

How?

wee is a simple rewriting of elvm IR. The main tricks are:

To reduce the registers to just A and B, store all other elvm registers directly in memory as “pseudo registers”.
To get rid of redundant comparison instructions, rewrite all comparison instructions in terms of setlt and jmpz.
To eliminate the elvm jump register instruction, at compile time, deduce all possible elvm jump targets. Then, at runtime, to emulate a jump register instruction, read the register, binary search over the possible elvm destinations, and jump to the corresponding wee address.
To get rid of data directives, transform all data directives into a set of instructions that will write the data directly to the memory.
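Three of these tricks can be sketched in a few lines of Python. This is only an illustration derived from the instruction table, not the sequences the actual compiler emits. All names here (`emit_data`, `COMPARE`, `build_dispatch`, `dispatch`, `run_straight_line`) are hypothetical, the addresses in the jump table are invented, and the dispatch tree models the control flow of the binary search rather than generating wee code for it.

```python
def emit_data(values, base=0):
    """Data directive: wee lines that write values to memory[base], memory[base+1], ..."""
    lines = []
    for offset, value in enumerate(values):
        # "mov value; swap" puts the value in B, "mov addr; store" writes memory[addr] = B.
        lines += [f"mov {value}", "swap", f"mov {base + offset}", "store"]
    return lines

# Comparisons built from setlt alone. Each sequence expects its operands
# in A and B, leaves the 0/1 result in A, and clobbers B.
COMPARE = {
    "lt": ["setlt"],                                   # A < B
    "gt": ["swap", "setlt"],                           # B < A
    "ge": ["setlt", "swap", "mov 1", "sub"],           # 1 - (A < B)
    "le": ["swap", "setlt", "swap", "mov 1", "sub"],   # 1 - (B < A)
}

def build_dispatch(table):
    """Build a decision tree from a non-empty list of
    (elvm_target, wee_address) pairs sorted by elvm_target."""
    if len(table) == 1:
        return ("leaf", table[0][1])
    mid = len(table) // 2
    return ("node", table[mid][0],
            build_dispatch(table[:mid]), build_dispatch(table[mid:]))

def dispatch(tree, value):
    """Walk the tree using only `<` tests, one setlt + jmpz per level.
    Assumes value is one of the known elvm targets."""
    while tree[0] == "node":
        _, pivot, below, at_or_above = tree
        tree = below if value < pivot else at_or_above
    return tree[1]

def run_straight_line(lines, a=0, b=0):
    """Tiny evaluator for jump-free sequences, used only to check the rewrites."""
    memory = {}
    for line in lines:
        op, *arg = line.split()
        if op == "mov":
            a = int(arg[0])
        elif op == "swap":
            a, b = b, a
        elif op == "sub":
            a -= b
        elif op == "setlt":
            a = int(a < b)
        elif op == "store":
            memory[a] = b
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return a, b, memory
```

For example, `emit_data([72, 105], base=10)` yields eight instructions that leave 72 in `memory[10]` and 105 in `memory[11]`. With n possible jump targets, `dispatch` needs about log2(n) comparisons per emulated jump register instruction.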

Limitations

Naturally, wee is pretty inefficient, and elvm itself is already inefficient given that it doesn’t have an optimizer. Transpiling to wee incurs an additional 6.5x instruction overhead. In comparison, elvm to brainf*** incurs about a 1335x overhead. I expect the biggest improvement would come from writing an optimizer for elvm IR. I took a stab at that some time ago but haven’t made much progress. As of this writing, a hello world program is ~500 wee instructions.

Conclusion

Thanks for reading! I hope that it’s easier than ever to compile C to weird things like redstone computers, spreadsheets, and anything else that was never meant to be Turing complete. If you find an interesting use case for wee, I’d love to hear about it!