The Symbolic Programming Language
=================================
a guide for humans, from zero
Welcome! This is a complete, from-scratch introduction to Symbolic, a small systems language with no keywords. If you have written code in any language before, you can finish this guide and be productive. If you have never programmed, you can still follow along - every symbol is explained the first time it appears.
It is modeled on the structure of The Rust Programming Language: we start by getting a program running, then build up concepts one at a time, each with runnable examples.
Prefer learning by doing? Work through Learn Symbolic by Building first - seven hands-on projects from a tip calculator to Conway's Game of Life - then come back here for the systematic reference.
How to read the examples. Every block of code in this guide is a real program or fragment. ::: starts a comment that runs to the end of the line. The compiler ignores comments, and we use them to show expected output.
Table of contents
- Getting Started
- The Symbol Grammar
- Registers & Mutability
- Numbers & Operators
- Data Flow & Width
- Control Flow
- Functions
- Memory & Segments
- Hashes
- Structs, Enums & Matching
- Generics, Traits & Closures
- Ternary Computing
- Building & Targets
1. Getting Started
1.1 Install the compiler
The recommended path is install.sh, which builds the Rust seed (symc0),
bootstraps the self-hosting compiler (symc), compiles the package manager
(sigil), and puts everything on your PATH:
git clone <this-repo> symbolic && cd symbolic
bash install.sh
source ~/.symbolic/env
There is no assembler, linker, or C compiler in Symbolic's pipeline.
symc writes finished executables by itself.
To build only the Rust seed for development:
cargo build --release -p symc0
# produces ./target/release/symc0
1.2 Your first program
Create a file hello.sym:
::: hello.sym - my first Symbolic program
((hello, world\n)) > @screen
!!
Compile and run it (the self-hosting symc reads source from stdin and writes
a binary to stdout):
symc < hello.sym > hello && chmod +x hello && ./hello
::: hello, world
Using the Rust seed directly (symc0 takes a filename and -o):
./target/release/symc0 --no-run --target x64-linux -o hello hello.sym
./hello
::: hello, world
Let's read those two lines:
((hello, world\n))  >  @screen
'--------.-------'  |  '--.--'
     a string       |  the screen
     literal        |
                    '-- "flow this value into that destination"
!!
'-- halt the program (like `exit(0)`)
- (( ... )) is a string literal. Escapes: \n newline, \t tab, \r carriage return, \0 NUL, \\ backslash, \" quote. To print parentheses, escape them: \( yields ( and \) yields ). This keeps them from being mistaken for the closing )). A lone ) that is not followed by another ) also works unescaped. Escapes are resolved at compile time into the literal's bytes, so they have no runtime cost.
- > means flow: send the value on the left to the destination on the right.
- @screen is the screen segment, i.e. standard output.
- !! halts the program.
That's a whole program. No main, no imports, no boilerplate. Top-level
statements are the program.
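To make the escape rules concrete, here is a small Python model of how the text between (( and )) could be turned into the literal's bytes. It is a sketch, not symc's lexer. It takes the body with the delimiters already stripped. Rejecting unknown escapes is an assumption, because the guide does not say what happens to them.

```python
# Python model of Symbolic string-literal escape resolution (not symc's code).
# Assumption: escapes other than the ones listed in the guide are rejected.
ESCAPES = {
    "n": b"\n",   # newline
    "t": b"\t",   # tab
    "r": b"\r",   # carriage return
    "0": b"\0",   # NUL
    "\\": b"\\",  # backslash
    '"': b'"',    # quote
    "(": b"(",    # literal open paren
    ")": b")",    # literal close paren
}


def resolve_escapes(body: str) -> bytes:
    """Turn the text between (( and )) into the bytes the literal stores."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            code = body[i + 1]
            if code not in ESCAPES:
                raise ValueError(f"unknown escape \\{code}")
            out += ESCAPES[code]
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)


print(resolve_escapes(r"hello, world\n"))  # b'hello, world\n'
print(resolve_escapes(r"f\(x\) = 1\t!"))   # b'f(x) = 1\t!'
```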
1.3 Printing a number
@screen prints raw bytes. To print a number as decimal, use the built-in
:wrint ("write integer"):
:wrint { [42] } !
::: 42
We'll explain the :name { [args] } ! call shape in Chapter 7;
for now, treat :wrint { [x] } ! as "print the number x followed by a newline."
2. The Symbol Grammar
Symbolic has no reserved words. Instead, a handful of rules generate the entire language. Learn these four and the rest is recognition, not memorization.
+-----------------------------------------------------------------
| RULE 1 .  -  negates         flips an operator's meaning
| RULE 2 .  +  extends         loosens / broadens an operator
| RULE 3 .  ,  reduces         shrinks width, adds delay
| RULE 4 .  spacing matters    touching = modifier
|                              spaced   = standalone token
+-----------------------------------------------------------------
plus: doubling / tripling intensifies  ( + -> ++ -> +++ )
Watch one symbol family grow from these rules:
+ add ::: base
++ multiply ::: RULE: doubling intensifies "increase"
+++ power ::: tripling intensifies further
- subtract ::: RULE 1: "-" is the decreasing counterpart
-- divide
--- modulo
And the ? family:
?[c]{ ... } run once if c is true ::: the conditional
?[c]{ ... }? loop while c is true ::: RULE 4: trailing `?` = back-edge
-?{ ... } else ::: RULE 1: `-` negates the `?`
?? pattern match
Two more conventions you'll see everywhere:
- ::: starts a comment that runs to the end of the line.
- Sigils introduce names: $x is a register, :f is a label/function, ::T is a type, #h is a hash cell, @s is a memory segment. The sigil tells you what kind of thing a name is, instantly.
Naming rule. Register, label, and type names use only the characters a–z, A–Z, 0–9, and _, and are at most 6 characters long. Short names are a deliberate constraint that keeps code dense and scannable.
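The naming rule is simple enough to express as a regular expression. This hypothetical Python checker is not part of the toolchain. It accepts 1 to 6 characters from the allowed set. The rule above does not say whether a name may begin with a digit, so the sketch permits it.

```python
import re

# Hypothetical checker for the naming rule: 1-6 chars from a-z, A-Z, 0-9, _.
# Assumption: a leading digit is allowed (the guide does not say otherwise).
NAME_RE = re.compile(r"[A-Za-z0-9_]{1,6}")


def is_valid_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def check_sigiled(token: str) -> bool:
    """Strip a register/label/type sigil and check the bare name."""
    # Longer sigils first, so "::" is not mistaken for ":" and "~$" for "$".
    for sigil in ("~$", "::", "$", ":"):
        if token.startswith(sigil):
            return is_valid_name(token[len(sigil):])
    return False


print(check_sigiled("$count"), check_sigiled("$counter"))  # True False
```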
3. Registers & Mutability
A register is Symbolic's variable. It holds a 64-bit integer by default.
3.1 Creating a register
You create a register by flowing a value into it with > ~$name:
42 > ~$x ::: create $x, give it 42
:wrint { [$x] } ! ::: 42
- $x reads the register.
- ~$x declares / writes the register. The ~ marks ownership and mutability: it says "I am (re)binding this name."
3.2 Immutable by default
Reading is $x; it does not change anything. To change a register, you write
to it again with ~:
0 > ~$n ::: $n = 0
$n + 1 > ~$n ::: $n = 1 (read $n, add 1, write back)
$n + 1 > ~$n ::: $n = 2
:wrint { [$n] } ! ::: 2
This read-compute-write pattern is the bread and butter of Symbolic. Because the
write target is explicit (~$n), data flow is always visible: values move
left-to-right into named slots.
3.3 Ownership and moves
The ~ sigil is also Symbolic's ownership marker (the same idea as Rust's
ownership). A plain $x is a use; ~$x is a binding. For the scalar
registers in this guide the distinction is simply "read vs. write," but for
larger values it governs moves - a value flowed into a new owner is moved, not
copied, and the compiler tracks initialization so you cannot read a register
before it has been written.
5 > ~$a
$a > ~$b ::: $b now owns the value; for scalars this is a copy
:wrint { [$b] } ! ::: 5
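The "cannot read before it has been written" guarantee is a definite-initialization check. The Python sketch below shows the idea on straight-line code only. Real programs with branches and loops need a full data-flow analysis, and the statement encoding used here is invented for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

# Each statement is modeled as (registers it reads, register it writes or None).
# Assumption: straight-line code only; symc's real analysis is not shown here.
Stmt = Tuple[Set[str], Optional[str]]


def first_uninit_read(program: Iterable[Stmt]) -> Optional[Tuple[int, str]]:
    """Return (statement index, register) of the first read-before-write, or None."""
    written: Set[str] = set()
    for i, (reads, write) in enumerate(program):
        for reg in sorted(reads):          # reads happen before the write
            if reg not in written:
                return i, reg
        if write is not None:
            written.add(write)
    return None


# 5 > ~$a    $a > ~$b    :wrint { [$b] } !
ok = [(set(), "a"), ({"a"}, "b"), ({"b"}, None)]
# $a > ~$b   with no earlier write to $a
bad = [({"a"}, "b")]
print(first_uninit_read(ok))   # None
print(first_uninit_read(bad))  # (0, 'a')
```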
4. Numbers & Operators
Everything in this chapter returns a value you can flow somewhere.
4.1 Integer literals
42 ::: decimal
0xFF ::: hexadecimal -> 255
0b1010 ::: binary -> 10
(A) ::: a character literal -> its byte value, 65
4.2 Arithmetic
3 + 4 ::: 7 add
3 ++ 4 ::: 12 multiply
3 +++ 4 ::: 81 power (3 to the 4th)
10 - 3 ::: 7 subtract
10 -- 3 ::: 3 divide (integer)
10 --- 3 ::: 1 modulo (remainder)
A worked example:
3 ++ 4 > ~$area ::: 12
$area + 1 > ~$area ::: 13
:wrint { [$area] } ! ::: 13
4.3 Comparisons
Comparisons evaluate to 1 (true) or 0 (false), so you can store or print
them directly. Read them through the grammar: = is the comparison base, +
extends it upward (greater), - negates it downward (less).
5 == 5 ::: 1 equal
5 != 3 ::: 1 not equal
5 =+ 3 ::: 1 greater-or-equal (= extended by +)
5 =++ 3 ::: 1 greater-than (extended twice)
5 --= 3 ::: 0 less-than (- negates toward "less")
5 -= 3 ::: 0 less-or-equal
7 =++ 4 > ~$big
:wrint { [$big] } ! ::: 1
4.4 Bitwise & shifts
0xFF & 0x0F ::: 15 bitwise AND
12 &+ 3 ::: 15 bitwise OR (& extended by +)
12 -&+ 10 ::: 6 bitwise XOR (the OR form &+, negated by -)
-& 0 ::: -1 bitwise NOT (prefix; flips all bits)
1 -< 4 ::: 16 shift left (-< )
256 <+ 2 ::: 64 shift right (<+ )
1 --< 1 ::: 2 rotate left
1 <++ 1 ::: rotate right
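Because registers are 64-bit, you can reproduce the bitwise results above by masking Python's unbounded integers to 64 bits. This model makes four assumptions: two's complement, shift counts taken modulo 64, a logical (zero-filling) right shift, and rotates over all 64 bits. The guide does not pin down the last three.

```python
# Python model of Symbolic's bitwise operators on 64-bit registers.
# Assumptions: two's complement, counts mod 64, logical right shift,
# rotates over the full 64 bits.
MASK = (1 << 64) - 1


def to_signed(v: int) -> int:
    """Reinterpret a 64-bit pattern as signed (this is how -& 0 shows as -1)."""
    v &= MASK
    return v - (1 << 64) if v >> 63 else v


def band(a, b): return (a & b) & MASK           # &
def bor(a, b): return (a | b) & MASK            # &+
def bxor(a, b): return (a ^ b) & MASK           # -&+
def bnot(a): return ~a & MASK                   # -& (prefix)
def shl(a, n): return (a << (n % 64)) & MASK    # -<
def shr(a, n): return (a & MASK) >> (n % 64)    # <+ (logical shift assumed)


def rotl(a, n):                                 # --<
    n %= 64
    a &= MASK
    return ((a << n) | (a >> (64 - n))) & MASK


def rotr(a, n):                                 # <++
    n %= 64
    a &= MASK
    return ((a >> n) | (a << (64 - n))) & MASK


print(bxor(12, 10), to_signed(bnot(0)), rotl(1, 1), hex(rotr(1, 1)))
```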
4.5 Precedence
From lowest to highest binding power:
comparisons == != --= -= =+ =++
bitwise & &+ -&+
shifts / rotates -< <+ --< <++
add / subtract + -
multiply/div/mod ++ -- ---
power +++ (right-associative)
So 2 + 3 ++ 4 is 2 + (3 ++ 4) = 2 + 12 = 14. When in doubt, compute a
sub-result into a register first - it reads clearly and never surprises you.
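The precedence table maps directly onto a precedence-climbing parser. The Python sketch below is an illustration, not symc's parser. It assumes space-separated tokens and non-negative integer literals, wraps results at 64 bits, and leaves out rotates and prefix operators.

```python
# Precedence-climbing evaluator for the table above (illustrative only).
MASK = (1 << 64) - 1

# operator -> (binding power, right-associative?, implementation)
OPS = {
    "==": (1, False, lambda a, b: int(a == b)),
    "!=": (1, False, lambda a, b: int(a != b)),
    "--=": (1, False, lambda a, b: int(a < b)),
    "-=": (1, False, lambda a, b: int(a <= b)),
    "=+": (1, False, lambda a, b: int(a >= b)),
    "=++": (1, False, lambda a, b: int(a > b)),
    "&": (2, False, lambda a, b: a & b),
    "&+": (2, False, lambda a, b: a | b),
    "-&+": (2, False, lambda a, b: a ^ b),
    "-<": (3, False, lambda a, b: a << b),
    "<+": (3, False, lambda a, b: a >> b),
    "+": (4, False, lambda a, b: a + b),
    "-": (4, False, lambda a, b: a - b),
    "++": (5, False, lambda a, b: a * b),
    "--": (5, False, lambda a, b: a // b),
    "---": (5, False, lambda a, b: a % b),
    "+++": (6, True, lambda a, b: pow(a, b, 1 << 64)),  # right-associative
}


def evaluate(src: str) -> int:
    tokens = src.split()
    pos = 0

    def parse(min_bp: int) -> int:
        nonlocal pos
        lhs = int(tokens[pos], 0)   # base 0 accepts 42, 0xFF, 0b1010
        pos += 1
        while pos < len(tokens):
            bp, right_assoc, fn = OPS[tokens[pos]]
            if bp < min_bp:
                break               # a looser operator: let the caller take it
            pos += 1
            rhs = parse(bp if right_assoc else bp + 1)
            lhs = fn(lhs, rhs) & MASK
        return lhs

    return parse(1)


print(evaluate("2 + 3 ++ 4"))  # 14
```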
5. Data Flow & Width
5.1 The flow operator >
> is how values move. Its left side is a value; its right side is a
destination - a register (~$x), a hash cell (~#h), a memory segment
(@screen), or a control target.
99 > ~$x ::: into a register
$x > @screen ::: into the screen (prints the raw byte 99 = 'c')
5.2 Returning and breaking with the > family
The > symbol composes with ! and ? into the control verbs:
>!? return the value to the caller of the current function
>? conditional flow / jump
!!> break out of the innermost loop
These are introduced properly in Chapters 6 and 7.
5.3 Bit widths with the comma
By default a register and a flow are 64-bit. Touching commas reduce the
width (RULE 3 - ", reduces"):
$a, ::: $a as a 32-bit register
$a,, ::: 16-bit
$a,,, ::: 8-bit
>, ::: a 32-bit flow
A spaced comma is a different token entirely - a sequence separator - because spacing changes meaning (RULE 4). You will rarely need widths until you do systems-level work; they're here when you need them.
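Here is a hedged model of the width suffixes, in which narrowing keeps the low-order bits. The guide does not say whether a narrowed value is later sign-extended or zero-extended, so this Python sketch reads narrowed values back as unsigned.

```python
# Modelling $a, / $a,, / $a,,, as truncation to 32 / 16 / 8 low bits.
# Assumption: narrowed values are read back unsigned (zero-extended).
WIDTHS = {"": 64, ",": 32, ",,": 16, ",,,": 8}


def narrow(value: int, commas: str) -> int:
    bits = WIDTHS[commas]
    return value & ((1 << bits) - 1)


v = 0x1122334455667788
for commas in ("", ",", ",,", ",,,"):
    print(f"$a{commas}", hex(narrow(v, commas)))
```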
6. Control Flow
6.1 The conditional ?[ ... ]{ ... }
? opens a condition in [ ]; the body goes in { }. The body runs once if
the condition is non-zero (true):
5 > ~$x
?[$x =++ 3]{ ::: if x > 3
((big\n)) > @screen
}
::: big
6.2 Else with -?
-? is "else" - literally the ? negated by -:
?[$x =++ 10]{ ((big\n)) > @screen }
-?{ ((small\n)) > @screen }
::: small
You can chain -?{ ?[...]{...} -?{...} } to get else-if ladders.
6.3 Loops ?[ ... ]{ ... }?
Add a trailing ? to the closing brace and the condition becomes a loop: it
re-checks on every pass and repeats while true.
0 > ~$i
?[$i --= 5]{ ::: while i < 5
:wrint { [$i] } !
$i + 1 > ~$i
}?
::: 0 1 2 3 4
Counters work exactly as written: a register updated across the loop's back-edge keeps one stable location.
6.4 Breaking out with !!>
!!> jumps to the exit of the innermost loop:
0 > ~$i
?[1 == 1]{ ::: loop "forever"...
?[$i =+ 5]{ !!> } ::: ...until i >= 5, then break
:wrint { [$i] } !
$i + 1 > ~$i
}?
:wrint { [999] } ! ::: 0 1 2 3 4 999
6.5 Halting and panicking
!! halt the program normally (exit 0)
!!! panic (abort)
6.6 Pattern matching ??
?? matches a value against patterns; each arm is pattern > { body } and _
is the catch-all (see Chapter 10):
?? $n {
0 > { ((zero\n)) > @screen }
1 > { ((one\n)) > @screen }
_ > { ((other\n)) > @screen } ::: wildcard arm
}
7. Functions
7.1 Declaring a function
A function is a label :name, a parameter block { ... }, a body, and a return:
:add { [i64:$a] & [i64:$b] }
$a + $b >!?
- :add is the function's name.
- { [i64:$a] & [i64:$b] } is the parameter list: each parameter is [type:$name], separated by &.
- $a + $b >!? computes the sum and returns it with >!?.
Why the type annotation? Writing [i64:$a] (rather than just [$a]) is what tells the compiler "this is a declaration of a parameter," as opposed to [expr], which is an argument at a call site. The type also documents the parameter. i64 is the 64-bit integer used throughout this guide.
7.2 Calling a function
A call is :name { [arg] & [arg] ... } !. The trailing ! means "invoke."
Capture the result with > ~$dest:
:add { [20] & [22] } ! > ~$sum
:wrint { [$sum] } ! ::: 42
A function with no parameters is called :name { } !.
7.3 Many arguments
Functions take any number of arguments - there is no fixed limit. The first six travel in registers and the rest on the stack, automatically:
:sum8 { [i64:$a]&[i64:$b]&[i64:$c]&[i64:$d]&[i64:$e]&[i64:$f]&[i64:$g]&[i64:$h] }
$a + $b + $c + $d + $e + $f + $g + $h >!?
:sum8 { [1]&[2]&[3]&[4]&[5]&[6]&[7]&[8] } ! > ~$r
:wrint { [$r] } ! ::: 36
7.4 Early return
>!? can appear anywhere, including inside a conditional, to return early:
:max { [i64:$a] & [i64:$b] }
?[$a =+ $b]{ $a >!? } ::: if a >= b, return a
$b >!?
:max { [3] & [9] } ! > ~$m
:wrint { [$m] } ! ::: 9
7.5 Recursion
Functions may call themselves. Here is factorial:
:fac { [i64:$n] }
?[$n --= 2]{ 1 >!? } ::: base case: n < 2 -> 1
$n - 1 > ~$m
:fac { [$m] } ! > ~$r
$n ++ $r >!? ::: n * fac(n-1)
:fac { [5] } ! > ~$f
:wrint { [$f] } ! ::: 120
Tip - program shape. Define all your functions first, then write the top-level statements that drive them. Each function body ends at its top-level >!? return.
7.6 The built-in functions
These behave like functions and are always available; they are how a program talks to the outside world:
| Built-in | Meaning |
|---|---|
| :wrint { [n] } ! | print integer n (decimal + newline) |
| :wrch { [b] } ! | write one byte b to the screen |
| :wrbuf { [ptr] & [len] } ! | write len bytes starting at ptr |
| :rdall { } ! -> ptr | read all of stdin; returns a pointer to [len:i64][bytes...] |
| :alloc { [n] } ! -> ptr | allocate n bytes on the heap |
| :ld8 / :ld64 { [addr] } ! -> v | load a byte / 8 bytes from memory |
| :st8 / :st64 { [addr] & [v] } ! | store a byte / 8 bytes to memory |
We use these next.
8. Memory & Segments
8.1 Segments
Memory is addressed through named segments, all written @name:
| Segment | Purpose |
|---|---|
| @screen | display output (stdout) |
| @inp | keyboard / standard input |
| @mem | general RAM (@mem[addr]) |
| @stck | the stack |
| @sec | secure memory |
| @net @rng @time @sys | network, randomness, clock, system vectors |
((hi\n)) > @screen ::: write to the display
8.2 The heap: allocate, store, load
Use :alloc to get memory and the :ld*/:st* built-ins to use it:
:alloc { [64] } ! > ~$p ::: 64 bytes, $p points at them
123456789 > ~$v
:st64 { [$p] & [$v] } ! ::: store 8 bytes at $p
:ld64 { [$p] } ! > ~$r ::: read them back
:wrint { [$r] } ! ::: 123456789
65 > ~$c
:st8 { [$p + 8] & [$c] } ! ::: store the byte 65 ('A') at $p+8
:ld8 { [$p + 8] } ! > ~$b
:wrch { [$b] } ! ::: A
:wrch { [10] } ! ::: newline
Addresses are ordinary integers, so [$p + 8 + $i] indexes naturally.
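The heap built-ins behave like loads and stores on a flat byte array. The Python model below assumes a single bump-allocated region, addresses that are plain offsets into it, and little-endian 8-byte values as on x86-64. The class and method names are hypothetical stand-ins for :alloc, :st64, :ld64, :st8 and :ld8.

```python
import struct

# Python model of the heap built-ins (hypothetical names, not Symbolic APIs).
# Assumptions: one bump-allocated bytearray, little-endian 64-bit values.
class Heap:
    def __init__(self, size: int = 1 << 16):
        self.mem = bytearray(size)
        self.top = 8                      # start above 0 so no pointer is 0

    def alloc(self, n: int) -> int:       # :alloc { [n] } !
        if self.top + n > len(self.mem):
            raise MemoryError("bump heap exhausted")
        p = self.top
        self.top += n
        return p

    def st8(self, addr: int, v: int) -> None:     # :st8
        self.mem[addr] = v & 0xFF

    def ld8(self, addr: int) -> int:              # :ld8
        return self.mem[addr]

    def st64(self, addr: int, v: int) -> None:    # :st64
        struct.pack_into("<Q", self.mem, addr, v & (2**64 - 1))

    def ld64(self, addr: int) -> int:             # :ld64
        return struct.unpack_from("<Q", self.mem, addr)[0]


h = Heap()
p = h.alloc(64)
h.st64(p, 123456789)
print(h.ld64(p))          # 123456789
h.st8(p + 8, 65)
print(chr(h.ld8(p + 8)))  # A
```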
8.3 Reading standard input
:rdall slurps all of stdin into a heap buffer laid out as
[length: 8 bytes][raw bytes...]. This echo reads input and writes it back:
:rdall { } ! > ~$buf
:ld64 { [$buf] } ! > ~$len ::: first 8 bytes = the length
0 > ~$i
?[$i --= $len]{
:ld8 { [$buf + 8 + $i] } ! > ~$ch
:wrch { [$ch] } !
$i + 1 > ~$i
}?
!!
echo -n "round trip" | ./echo
::: round trip
This is exactly the input path the self-hosting compiler uses to read your source code.
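You can mirror the [length][bytes] layout that :rdall returns in Python to check the echo loop's logic. The little-endian length encoding is an assumption. The loop follows the Symbolic code statement by statement.

```python
import struct

# Model of the :rdall buffer: 8-byte length, then the raw bytes.
# Assumption: the length is stored little-endian.
def rdall_buffer(data: bytes) -> bytearray:
    return bytearray(struct.pack("<Q", len(data)) + data)


def echo(buf: bytearray) -> bytes:
    """Mirror the Symbolic echo loop: read the length, copy byte by byte."""
    (length,) = struct.unpack_from("<Q", buf, 0)   # :ld64 { [$buf] } !
    out = bytearray()
    i = 0
    while i < length:                # ?[$i --= $len]{ ... }?
        out.append(buf[8 + i])       # :ld8 { [$buf + 8 + $i] } !
        i += 1
    return bytes(out)


print(echo(rdall_buffer(b"round trip")).decode())  # round trip
```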
9. Hashes
Symbolic has first-class hash cells: named, program-global storage organized
into three tiers by lifetime, written with one, two, or three #.
| Form | Tier | Meaning |
|---|---|---|
| #name | ephemeral | a mutable global cell |
| ##name | persistent | survives across runs (where the platform supports it) |
| ###name | ROM | a constant baked into the binary |
9.1 Declaring and using cells
A declaration is #name <constant>. After that, read it as #name and write to
it with > ~#name:
###MAX 1000 ::: a ROM constant
#count 0 ::: a mutable counter
###MAX > ~$m
:wrint { [$m] } ! ::: 1000
5 > ~#count ::: write the cell
#count + 1 > ~#count
:wrint { [#count] } ! ::: 6
Cells are how a long-running program (like the compiler) keeps global state - buffers, counters, tables.
9.2 Content-addressed hashes
The spec's most distinctive feature: #[:fn & key] computes a slot by applying
a hash function :fn to a key, then reads or writes that slot. It is a
built-in hash map with collision handling, expressed in the grammar:
:h { [i64:$k] } $k --- 64 >!? ::: our hash function: key mod 64
111 > ~#[:h & 7] ::: store 111 at slot hash(7)
222 > ~#[:h & 71] ::: 71 mod 64 == 7 -> collides; resolved by probing
#[:h & 7] > ~$a
:wrint { [$a] } ! ::: 111
#[:h & 71] > ~$b
:wrint { [$b] } ! ::: 222
The runtime keeps an open-addressed table; colliding keys probe to the next free slot, so distinct keys never clobber each other.
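Here is a Python model of the collision handling described above, using open addressing with linear probing over 64 slots. The guide only says that colliding keys "probe to the next free slot". The probe order, the table size, and returning 0 for a key that was never written are therefore assumptions.

```python
# Model of #[:fn & key] as open addressing with linear probing.
# Assumptions: 64 slots, +1 probing with wrap-around, unread keys read as 0.
SLOTS = 64


class CellTable:
    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.keys = [None] * SLOTS   # each slot remembers its key
        self.vals = [0] * SLOTS

    def _find(self, key):
        """Slot holding key, or the first empty slot on key's probe path."""
        start = self.hash_fn(key) % SLOTS
        for step in range(SLOTS):
            slot = (start + step) % SLOTS
            if self.keys[slot] is None or self.keys[slot] == key:
                return slot
        raise RuntimeError("table full")

    def write(self, key, value):       # value > ~#[:h & key]
        slot = self._find(key)
        self.keys[slot] = key
        self.vals[slot] = value

    def read(self, key):               # #[:h & key]
        slot = self._find(key)
        return self.vals[slot] if self.keys[slot] == key else 0


def h(k):                              # :h { [i64:$k] } $k --- 64 >!?
    return k % 64


t = CellTable(h)
t.write(7, 111)
t.write(71, 222)                 # 71 % 64 == 7: collides, probes to slot 8
print(t.read(7), t.read(71))     # 111 222
```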
10. Structs, Enums & Matching
10.1 Types
A type is named with ::Name. A struct groups fields; each field is written
type:[name], separated by &:
::Plyr { i32:[hp] & i32:[mp] & f64:[spd] }
Create one with a value list, read fields with ., and write a field with
> ~$p.field:
::Plyr { [100] & [50] & [1.5] } > ~$p
$p.hp > @screen ::: 100 (writes the raw byte; use :wrint to see decimal)
75 > ~$p.hp ::: fields are mutable
$p.hp > @screen ::: 75
$p.mp > @screen ::: 50
10.2 Enums and pattern matching
The ?? operator matches a value against patterns. Each arm is
pattern > { body }, and _ is the wildcard that matches anything else, so a
match is always exhaustive:
3 > ~$n
?? $n {
0 > { :wrint { [100] } ! }
1 > { :wrint { [101] } ! }
_ > { :wrint { [999] } ! } ::: $n is 3 -> 999
}
Enums are types with named variants (::Shape { .Circle { i64:[r] } & ... }),
selected with .Variant; you match their variants the same way.
11. Generics, Traits & Closures
11.1 Generics with $T$
A generic type parameter is written $T$ (a name between two dollar signs).
It lets one function work for any type:
:id $T$ { [$T$ $x] }
$x >!?
:id { [42] } ! > ~$r
:wrint { [$r] } ! ::: 42
Here $T$ after the function name introduces the type parameter, and [$T$ $x]
declares a parameter $x of that generic type. A single compiled body serves
every instantiation.
11.2 Traits & impls with ^.^
^.^ introduces an implementation block - methods attached to a type,
optionally satisfying a trait. Each method is a label with a braced body, and
several methods are scoped together inside one { }:
^.^ ::Plyr {
:heal { [i32:$h] } { $self.hp + $h >!? }
:dbl { } { $self.hp ++ 2 >!? }
}
Both inherent impls (^.^ ::Type { ... }) and trait impls
(^.^ ::Type ::Trait { ... }, naming the trait after the type) use this same
shape - the ^.^ marker makes an implementation block unambiguous.
Method dispatch is static - each call resolves to a concrete function at compile time and lowers to a direct call (no vtable, no runtime lookup). This is what makes the abstractions below zero-cost: they compile to exactly the code you'd write by hand.
11.3 Iterators (zero-cost)
An iterator is just a struct holding its state, advanced by ^.^ methods. Since
the state lives in the struct (no per-step heap allocation) and the methods are
statically dispatched (direct calls), the loop compiles to a tight
increment-compare-call with no overhead - see examples/iterator.sym:
::Rng { i64:[cur] & i64:[end] }
^.^ ::Rng {
:more { $self.cur --= $self.end >!? } ::: more items?
:take { $self.cur > ~$v $self.cur + 1 > ~$nc $nc > ~$self.cur $v >!? } ::: yield + advance
}
::Rng { [0] & [5] } > ~$it
?[1 == 1]{ ::: loop until the iterator is exhausted
$it :more ! > ~$m
?[$m == 0]{ !!> }
$it :take ! > ~$x
:wrint { [$x] } !
}? ::: 0 1 2 3 4
(The split more/take keeps it allocation-free. A single next returning the
Option enum is also valid, but Option is heap-allocated, so it is not
zero-cost in a hot loop.)
11.4 Closures with >{ }
A closure is an anonymous function written >{ ... } that captures registers
from its surrounding scope by value. Capture-only closures take no arguments:
3 > ~$a
4 > ~$b
>{ $a + $b } > ~$add ::: captures $a and $b
$add ! > ~$r ::: call with `!`
:wrint { [$r] } ! ::: 7
To take an argument, write the parameter as type:$name > before the body:
10 > ~$base
>{ i32:$x > $x + $base } > ~$f ::: parameter $x, capturing $base
$f ! { [5] } > ~$s ::: call with an argument list
:wrint { [$s] } ! ::: 15
The closure's body is the value of its last expression; captured registers are frozen at the point the closure is created.
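Python functions capture variables by reference, so a Python closure would see later reassignments. Copying values when the closure is created (here via default arguments) reproduces the by-value capture described above. This is a semantic illustration only, not how symc compiles closures.

```python
# By-reference (Python's default) vs by-value capture (Symbolic's rule).
a, b = 3, 4


def by_ref():
    return a + b          # looks up a and b when called


def by_val(a=a, b=b):     # defaults are evaluated now: frozen copies
    return a + b          # like >{ $a + $b }


base = 10


def f(x, base=base):      # like >{ i32:$x > $x + $base }
    return x + base


a = 100                   # later writes do not reach by_val or f
base = 0
print(by_ref())   # 104
print(by_val())   # 7
print(f(5))       # 15
```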
12. Ternary Computing
Symbolic is unusual in offering base-3 types alongside binary ones. This is useful for logic with a third state and for balanced-ternary arithmetic.
12.1 trit - three-valued logic
A trit is a Kleene three-state truth value: False, True, Unknown. The
built-ins :tand, :tor, :tnot, :teq implement Kleene logic, where
Unknown propagates sensibly (e.g. True OR Unknown = True, but
False OR Unknown = Unknown).
12.2 trool - four-valued logic
A trool adds a distinct Undefined state that always propagates: anything
combined with Undefined is Undefined. This models hardware "don't care / X"
signals.
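If you order the trit values False < Unknown < True, Kleene AND and OR become min and max. The Python sketch below adds an absorbing Undefined value to model a trool. The function names echo :tand, :tor and :tnot, but they are ordinary Python functions, not the built-ins.

```python
from enum import Enum

# Kleene logic (trit) plus an absorbing Undefined state (trool). Illustrative.
class Tri(Enum):
    FALSE = 0
    UNKNOWN = 1
    TRUE = 2
    UNDEF = 3     # trool only: propagates through every operation


def tnot(a):
    if a is Tri.UNDEF:
        return Tri.UNDEF
    return {Tri.FALSE: Tri.TRUE, Tri.UNKNOWN: Tri.UNKNOWN, Tri.TRUE: Tri.FALSE}[a]


def tand(a, b):
    if Tri.UNDEF in (a, b):
        return Tri.UNDEF
    return Tri(min(a.value, b.value))   # Kleene AND = min


def tor(a, b):
    if Tri.UNDEF in (a, b):
        return Tri.UNDEF
    return Tri(max(a.value, b.value))   # Kleene OR = max


print(tor(Tri.TRUE, Tri.UNKNOWN), tor(Tri.FALSE, Tri.UNKNOWN))
```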
12.3 Balanced ternary integers
The tadd/tsub built-ins do single-digit balanced ternary arithmetic
(digits 0, +1, -1), and the wider ti*/tu* types extend this to
multi-digit integers, with carries handled by the runtime. Balanced ternary
represents negative numbers without a separate sign bit - a genuinely different
way to count.
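Balanced ternary is easy to experiment with. This Python sketch converts integers to digit lists (least-significant digit first) and back, and adds two digits with carry. It also shows that negation just flips every digit, which is why no sign bit is needed. The helper names are hypothetical, since the actual signature of tadd is not given above.

```python
from typing import List, Tuple

# Balanced-ternary helpers; digits are -1, 0, +1, least-significant first.
def to_bt(n: int) -> List[int]:
    digits = []
    while n != 0:
        r = n % 3            # Python gives 0, 1 or 2 even for negative n
        if r == 2:
            r = -1           # 2 = 3 - 1: emit -1 and carry one into the next digit
        digits.append(r)
        n = (n - r) // 3
    return digits or [0]


def from_bt(digits: List[int]) -> int:
    return sum(d * 3**i for i, d in enumerate(digits))


def negate(digits: List[int]) -> List[int]:
    return [-d for d in digits]   # no sign bit: just flip each digit


def tadd_digit(a: int, b: int, carry: int = 0) -> Tuple[int, int]:
    """Add two balanced-ternary digits plus a carry; return (digit, carry)."""
    s = a + b + carry            # in -3..3
    if s > 1:
        return s - 3, 1
    if s < -1:
        return s + 3, -1
    return s, 0


print(to_bt(5), to_bt(-5))   # [-1, -1, 1] [1, 1, -1]
```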
13. Building & Targets
13.1 Compiler flags
The self-hosting symc reads source from stdin and writes a binary to
stdout. The --target flag selects the output format (default x64-linux):
symc [--target T] [-O0..3] [--no-std] [--heap MiB] < prog.sym > out
symc --target wasm32 < prog.sym > prog.wasm
| Flag | Default | Meaning |
|---|---|---|
| --target T | x64-linux | output format (see §13.2) |
| -O0 … -O3 | -O2 | optimization level (fold/DCE/copy-prop at -O1, algebraic identities at -O2, strength reduction at -O3) |
| --no-std | off | don't prepend the std prelude; used when the source already bundles std (the toolchain itself does) |
| --heap <max> or <min>:<max> | 0:4096 | the produced binary's bump heap, in MiB, Java -Xms:-Xmx style (see below) |
--heap <max> (or <min>:<max>) sets the bump-allocator heap baked into the output binary, in MiB, like Java's -Xms/-Xmx:
- max (default 4 GiB) is the heap's hard ceiling: a lazily-mapped BSS reservation on Unix, large enough that the compiler self-compiles to every target (including wasm32) without exhausting it. --heap 64 sets just the max.
- min (default 0) is pre-faulted resident at startup (like -Xms): the first min MiB are touched so they're committed immediately rather than lazily on first use. --heap 256:512 means 256 MiB resident up front with a 512 MiB ceiling. If min > max, it is clamped to max.
It is honored on every hosted platform, each using that platform's natural allocation:
| Platform | How the heap is provided | --heap |
|---|---|---|
| Linux / macOS / FreeBSD | BSS region, lazily mapped by the kernel (a large ceiling costs nothing until touched) | yes |
| arm64 / riscv64 / loongarch64 | same lazily-mapped BSS | yes |
| Windows | VirtualAlloc(NULL, min(heap, 1 GiB), MEM_RESERVE\|MEM_COMMIT) at startup; committed up front, so capped at 1 GiB to avoid over-committing | yes (≤ 1 GiB) |
| UEFI / bare-metal | fixed 16 MiB (no OS to grow it) | no (fixed) |
| wasm32 | WebAssembly linear memory (64 KiB pages, growable), sized by its own page model | n/a |
Because the Unix heap is a lazily-mapped BSS, a 4 GiB default is free until
touched, so the allocator never runs off the end on any realistic program. Lower
it for embedded / valgrind & callgrind builds (--heap 64) — a multi-GiB
bump heap can't be mapped under Valgrind, which is why tests/iai.sh compiles
its workloads with symc --heap 64. Raise it for programs that allocate even
more.
A subtlety worth knowing: a binary's own heap size is fixed by the compiler that builds it (the emitter reads the running compiler's #heapb when it writes the output's BSS), not by --heap passed to that same binary later. So to grow the compiler's own heap, you rebuild it through one self-host stage with the new size; --heap on a normal symc invocation sizes the program symc is currently producing.
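The --heap rules above reduce to a few lines. The following Python sketch is hypothetical and is not symc's code. It applies the documented 0:4096 defaults, the clamp of min to max, and the 1 GiB Windows commit cap from the platform table.

```python
# Hypothetical model of the documented --heap rules (values in MiB).
DEFAULT_MIN, DEFAULT_MAX = 0, 4096
WINDOWS_CAP = 1024   # Windows commits up front, capped at 1 GiB


def parse_heap(spec=None):
    """Parse "<max>" or "<min>:<max>" into (min_mib, max_mib)."""
    if spec is None:
        return DEFAULT_MIN, DEFAULT_MAX
    if ":" in spec:
        lo_s, hi_s = spec.split(":", 1)
        lo, hi = int(lo_s), int(hi_s)
    else:
        lo, hi = DEFAULT_MIN, int(spec)   # --heap 64 sets just the max
    if lo > hi:
        lo = hi                           # min > max is clamped to max
    return lo, hi


def windows_commit_mib(max_mib):
    return min(max_mib, WINDOWS_CAP)


print(parse_heap("256:512"), windows_commit_mib(parse_heap()[1]))
```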
The Rust seed symc0 takes a filename and uses -o:
symc0 [--no-run] [--target T] [-o OUT] FILE.sym
symc0 --dump-tokens FILE.sym # print the token stream
symc0 --dump-ast FILE.sym # print the parsed item count
symc0 --dump-ir FILE.sym # print IR statistics
symc0 --emit-asm FILE.sym # print assembly text instead of a binary
13.2 Targets
All thirteen targets are selected with --target. Every output is a raw,
self-contained binary — no assembler or linker is involved.
| --target | Output format | Verified by |
|---|---|---|
| x64-linux | static x86-64 Linux ELF | native execution + fixpoint |
| x64-macos | x86-64 Mach-O | format check |
| x64-windows | x86-64 PE/COFF (.exe) | format check |
| x64-uefi | x86-64 PE UEFI application | format check + qemu OVMF |
| x64-freebsd | x86-64 FreeBSD ELF (OSABI 9) | format check |
| arm64-linux | AArch64 Linux ELF | qemu-aarch64 |
| arm64-macos | AArch64 Mach-O | format check |
| ios-arm64 | AArch64 Mach-O (LC_BUILD_VERSION iOS) | format check |
| android-arm64 | AArch64 Linux ELF + PT_NOTE | format check |
| riscv64-linux | RISC-V 64 Linux ELF | qemu-riscv64 |
| loongarch64 | LoongArch64 Linux ELF | qemu-loongarch64 |
| wasm32 | WebAssembly binary (WASI) | browser WASI sandbox |
| spirv | SPIR-V binary | validator |
13.3 The self-hosting compiler
The compiler is the five runes std/ lex/ parse/ ir/ back/, assembled by
build-symc.sh and compiled to symc. To reproduce the bootstrap fixpoint:
bash install.sh # builds symc0 -> symc.lcc -> symc.s1 -> s2 -> s3, asserts s2==s3
The assembled single-file source lives in sigil/runes/symc/src/main.sym.
See docs/SELFHOSTING.md for a guided tour.
13.4 Testing & benchmarking
Differential fuzzer (tests/fuzz.sh). The strongest correctness check: it
generates random integer programs (tests/fuzzgen.py) in both Symbolic and C,
compiles the Symbolic at every optimization level and the C with cc, runs all
five, and requires byte-exact agreement. Any divergence is a real bug:
tests/fuzz.sh [symc] [count] [start-seed]
tests/fuzz.sh dist/x64-linux/symc 300 1 # 300 random programs, seeds 1..300
How to read a divergence:
- the four -O0..-O3 outputs disagree with each other → an optimizer / regalloc / codegen bug;
- all four agree but are != the C oracle → a front-end (parse/lower) or codegen bug.
The C oracle uses uint64_t (defined wraparound), so it's sound. Failing seeds
are printed and reproducible: python3 tests/fuzzgen.py --seed <n> --sym p.sym --c p.c regenerates the exact program.
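The two failure patterns above amount to a small decision procedure. This Python function is an illustrative reading of that rule, not code from tests/fuzz.sh. Its input format and return labels are invented for the example.

```python
# Illustrative verdict for one fuzz seed (not part of tests/fuzz.sh).
def classify(sym_outputs, c_output):
    """sym_outputs maps "-O0".."-O3" to each binary's stdout bytes."""
    if not sym_outputs:
        raise ValueError("need at least one Symbolic output")
    distinct = set(sym_outputs.values())
    if len(distinct) > 1:
        return "optimizer"   # -O levels disagree: optimizer/regalloc/codegen
    (agreed,) = distinct
    if agreed != c_output:
        return "frontend"    # consistent, but wrong vs the C oracle
    return "pass"


same = {f"-O{i}": b"42\n" for i in range(4)}
print(classify(same, b"42\n"))  # pass
```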
A related check, tests/validate-rcx.sh, guards changes to the backend: it
re-runs the 3-stage self-host fixpoint (s2 == s3) and the kernel scoreboard,
failing on any miscompile.
Benchmark suite (the bench rune + sigil bench). Benchmarking is native —
std provides a monotonic clock (:nowns) and decimal print (:bint); the
bench rune (sigil/runes/bench/) is a Criterion-style harness on top:
sigil bench # times the compile, then samples each benchmark
Each benchmark takes 10 samples at varying iteration counts and reports
[min median max] per-op plus a least-squares slope (ns/iter, cancels fixed
overhead), classifies outliers with Tukey's fences, auto-scales units (ns/us/ms),
and compares the median to a saved criterion-baseline to flag
Improved/Regressed/No change. It also prints a perf stat-style cache/memory
summary (via perf_event_open; shows n/a when a CI container restricts it).
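Two of the statistics mentioned above can be sketched in Python. The first is an ordinary least-squares slope of total time against iteration count, where the fitted intercept absorbs fixed overhead. The second is Tukey's fences for classifying outliers. The conventional multipliers (1.5×IQR for mild, 3×IQR for severe) are an assumption about what the bench rune uses.

```python
import statistics

# Sketch of two bench statistics (not the bench rune's code).
def ls_slope(iters, times_ns):
    """OLS slope of total time vs iterations: ns per iteration."""
    mx = statistics.fmean(iters)
    my = statistics.fmean(times_ns)
    num = sum((x - mx) * (y - my) for x, y in zip(iters, times_ns))
    den = sum((x - mx) ** 2 for x in iters)
    return num / den   # the intercept (fixed overhead) drops out


def tukey(samples):
    """Label each sample normal / mild / severe using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    labels = []
    for s in samples:
        if s < q1 - 3 * iqr or s > q3 + 3 * iqr:
            labels.append("severe")
        elif s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("normal")
    return labels


iters = list(range(1, 11))
times = [200 + 5 * i for i in iters]   # 200 ns overhead + 5 ns/iter
print(ls_slope(iters, times))
```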
Cross-language comparisons live alongside it:
- sigil/runes/bench/vs.sh: the same 64-bit LCG+xorshift loop in every available language (C, Rust, Java, Node, Python, …), each self-timing its loop (startup/JIT excluded), normalized to ns/iter vs C. Symbolic's from-scratch backend lands at ~1.18× gcc -O2. The printed 64-bit result must match across languages. This is a built-in correctness check, so a "win" is never a miscompile.
- sigil/runes/bench/kernels.sh: a multi-kernel scoreboard vs cc -O3 -march=native, with each kernel's integer result checked exactly.
- sigil/runes/bench/ilp.sh: an instruction-level-parallelism companion to vs.sh (independent chains rather than one latency-bound recurrence).
Appendix A: Complete Symbol Index
NAMES & SIGILS
$name     register (variable)         ~         ownership / mutability / write
:name     label / function            ::Name    type
:::       comment                     #name     hash cell (ephemeral)
##name    hash cell (persistent)      ###name   hash cell (ROM constant)
@name     memory segment              $T$       generic type parameter
'         lifetime annotation
LITERALS
42          decimal int   0xFF        hex           0b1010      binary
3.14        float         (A)         char          ((text\n))  string
ARITHMETIC        COMPARISON        BITWISE / SHIFT
+    add          ==   equal        &     and
++   multiply     !=   not equal    &+    or
+++  power        =+   >=           -&+   xor
-    subtract     =++  >            -&    not (prefix)
--   divide       --=  <            -<    shift left
---  modulo       -=   <=           <+    shift right
                                    --<   rotate left
                                    <++   rotate right
FLOW & CONTROL
>         flow value -> destination   >!?       return from function
>?        conditional flow / jump     >,        32-bit flow (touching comma)
?[c]{}    run once if c               ?[c]{}?   loop while c
-?{}      else                        ??        pattern match
!         call                        !!        halt
!!>       break loop                  !!!       panic
-?! ?!    conditional calls           -!        dam (guard)
ENCAPSULATION
[ ]       grouping / argument & parameter lists
{ }       blocks / scopes
&         argument separator / bitwise-and
.         field access
,         width & temporal reduction
CONTENT-ADDRESSED
#[:fn & key]   hash `key` with `:fn`, address the resulting slot
IMPL / CLOSURE
^.^       implementation block        >{ ... }  closure (captures by value)
Appendix B: Grammar
A simplified EBNF of the core language (the subset the self-hosting compiler accepts; the reference compiler accepts a superset including types, traits, generics, closures, and ternary forms):
program := item* ;
item := fn_decl | cell_decl | stmt ;
fn_decl := LABEL generic? param_block? stmt* '>!?' ;
generic := '$' NAME '$' ;
param_block := '{' ( '[' type? REG ']' ('&' '[' type? REG ']')* )? '}' ;
type := NAME ':' | '$' NAME '$' ;
cell_decl := ('#'|'##'|'###') NAME INT ;
stmt := if | loop | break | halt | expr_stmt ;
if := '?' '[' expr ']' '{' stmt* '}' ( '-?' '{' stmt* '}' )? ;
loop := '?' '[' expr ']' '{' stmt* '}?' ;
break := '!!>' ;
halt := '!!' ;
expr_stmt := expr ( '>!?' | '>' ('~')? (REG | CELL) )? ;
expr := primary ( binop expr )* ;
primary := INT | REG | CELL | call | '[' expr ']'
| '-' primary | '-&' primary ;
call := LABEL '{' ( '[' expr ']' ('&' '[' expr ']')* )? '}' '!' ;
binop := '+'|'++'|'+++'|'-'|'--'|'---'
| '=='|'!='|'--='|'-='|'=+'|'=++'
| '&'|'&+'|'-&+'|'-<'|'<+'|'--<'|'<++' ;
You now know Symbolic. Go read examples/features/ -
there is one runnable program per feature - then open
sigil/runes/symc/src/main.sym and watch the language describe itself.
Sources & inspiration: structure modeled on The Rust Programming Language and its table of contents; authoritative symbol semantics are defined in spec.md.