The Guarantees Your Type System Makes

2747 words

Recently, while talking to a friend, I realized I couldn’t explain why Typescript’s types don’t offer the same guarantees as those in C# or Rust. Looking for an answer led me somewhere more interesting, where I had to learn a little bit about programming language theory to understand what type safety actually means, why it’s a property of a language’s design and why it matters more today than it ever has.

Sound languages

I found two popular definitions for what a safe language is. The first is from The Practical Foundations of Programming Languages book by Robert Harper and it establishes a couple of core concepts:

“Most programming languages exhibit a phase distinction between the static and dynamic phases of processing. The static phase consists of parsing and type checking to ensure that the program is well-formed; the dynamic phase consists of execution of well-formed programs. A language is said to be safe exactly when well-formed programs are well-behaved when executed.

The static phase is specified by a statics comprising a collection of rules for deriving typing judgments stating that an expression is well-formed of a certain type. Types mediate the interaction between the constituent parts of a program by “predicting” some aspects of the execution behavior of the parts so that we may ensure they fit together properly at run-time. Type safety tells us that these predictions are accurate; if not, the statics is considered to be improperly defined, and the language is deemed unsafe for execution.“

From this extract, we can learn that:

  • Processing can be divided into two stages, the statics: it derives typing judgments to check if what you said is of a certain type actually is that type; and the dynamics which is concerned with the evaluation of those programs.
  • Types predict aspects of the execution to ensure they make sense at runtime.
  • It gives us our first mention of what type safety means: it guarantees us that the predictions made by the types are right.

It mentions the notion of a well-formed program: one that has passed parsing and type checking. We’ll refer to them as well-typed programs, otherwise they are ill-typed which means that we know they are ill-behaved or can’t be proven to be well-behaved.

Lastly, it defines a language as safe if all well-typed programs are well-behaved, meaning they exhibit only behaviors predicted by the statics. The fulfillment of this property is what marks a language as sound or unsound.We’ll use “safe” and “sound” interchangeably.

If we keep looking, the Type Systems paper by Luca Cardelli has a similar take on what type safety is:

“A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages.”

He introduces the concepts of trapped and untrapped errors: trapped errors stop computation immediately while untrapped errors don’t but come back to bite you later.

If we keep reading we see that he defines type safety and type soundness as two separate things:

Type safety: The property stating that programs do not cause untrapped errors.”

Type soundness: The property stating that programs do not cause forbidden errors.”

Forbidden errors are nothing more than a set of errors that include all the untrapped errors and a previously defined subset of trapped errors. I find that this fits with the mental model of Harper.

Given this, we should have a mental framework of what a sound language looks like and why soundness is a property designed into a language and not a property of an implementation.


Let’s play with some examples to illustrate what we have introduced so far. Imagine this function defined in a fictitious language which has the property of being sound, meaning that well-typed programs are well-behaved at runtime.

fn add(a: Int, b: Int) Int {
    return a + b;
}

// This would result in a well-typed program.
let res = add(10, 314);

// This would result in an ill-typed program.
let res = add(10, "Some");

The expression add(10, 314) is sound because add requires its arguments to be of type Int and both 10 and 314 satisfy this requirement; therefore the program is well-typed. The statics for Int would guarantee that the + operator will not get stuck and because of this, the program is thought of as falling within the guarantees of the language.

By comparison, add(10, "Some") is ill-typed because "Some" does not have type Int nor can it be reasonably coerced to it, so it doesn’t satisfy the context. If we allowed this program to be executed it would fail to uphold the guarantees of the language and we wouldn’t know what it could happen when it gets stuck. Another example would be:

fn add(a: Int, b: Str) Int {
    return a + b;
}

let res = add(10, "Some");

The program may or may not be well-typed depending on the statics of the language. There are two possibilities:

  • There is a rule defined for + with Int and Str as operands, the program is well-typed and guaranteed not to get stuck.
  • No rule exists: the program is considered ill-typed and executing it would break the language guarantees baked into the types.

Let’s look at this example, thinking of our language as no longer proven to be sound:

fn div(a: Int, b: Int) : Int {
    return a / b
}

let res = div(10, 0);

This program is clearly well-typed, but stating that the language is sound now depends on the dynamics. More precisely, we can determine if it is sound by how it handles errors, in this case a division by zero:

  • The dynamics do not define what happens in this failure case, undefined behavior takes place: the language is unsound.
  • Division by zero is accounted for in the dynamics of the language: the language is considered sound. The dynamics can verify this via static and dynamic checks. Being able to detect this class of errors at compile time is harder than at runtime and involves more expressive type systems (such as implementing dependent types like in Idris or F*), so dynamic checks are more commonly found: throwing errors, exceptions, panicking, returning a special value, etc.

This is all well and good, but as I hinted earlier, a language has to prove it is sound; naturally, it must follow rules.

How type safety is proven

We have a sense of what type safety is; we now know it expresses the coherence between the statics and the dynamics; where the statics predict that the value of an expression will have a certain form so that the dynamics of that expression are well-defined.

Consequently, evaluation of a well-typed expression cannot get stuck! It can never reach a non-value state for which no evaluation rule applies. And this is exactly what is proven via the Preservation and Progress theorems.

Preservation and Progress

Let’s first introduce some notation to express these theorems precisely:

'Γ'     the static environment or typing context. 
'⊢'     the turnstile; reads as "under the assumption of".
'τ'     the type in question.
':'     the typing relation.
'->'    single step evaluation.
'∧'     is logical "and".
'∅'     the empty typing context.
'∃'     existential quantifier.
'∨'     is logical "or".
'val'   denotes that e is a value.
'.'     the binder operator and reads as “such that”.

Where Γ represents all the variables that are in the scope alongside their types. It’s used in the relation Γ ⊢ e : τ that can be read as “the typing context shows that e has type τ”.

The theorems can be expressed as:

1. If Γ ⊢ e : τ ∧ e -> e', then Γ ⊢ e': τ       # Preservation
2. If ∅ ⊢ e : τ then (e val) ∨ (∃ e'. e -> e')  # Progress

The first theorem is called Preservation and it can be read as “if under the typing context Γ, e has type τ, and e steps to e’ in one step, then e’ also has type τ under Γ”. It ensures that each step of evaluation preserves the type.

The second is called Progress and it can be read as “if under the empty typing context ∅, e has type, τ then e is either a value or there exists an expression e’ reachable by a single step from e”. It ensures that well-typed closed expressions are either values or can be further evaluated. As you could have guessed by now, a term is “stuck” when it fails to satisfy Progress.

Safety is the conjunction of Preservation and Progress.

Where do dynamic languages fit?

Up until now, our discussion was heavily centered around types, so it is reasonable to wonder how any of this apply to dynamic languages, since they are often referred to as “untyped” languages. The answer is that it is types all the way down and the notion of not having them is a lie.

Dynamic languages embrace a model of computation where multiple classes of values exist for a single recursive type. In the words of Harper:

“Every dynamic language is inherently a static language in which we confine ourselves to a (needlessly) restricted type discipline to ensure safety.”

The gist of it is what he calls “type discipline”. In this type of language, every value is tagged to indicate which class of value they belong through a process called Classification. Since the static phase limits itself to checking only that the expression is well-formedHere “well-formed” stands for parsing not type-checking. and that there are no free variables in the expression; the dynamics must check for errors at runtime that would have never arisen in a statically-typed language; this is called Class Checking and uses the tags each value has.

It’s useful to look at an example in the wild. An example of this single recursive type can be found in the CPython Language Reference. It defines a type PyObject, where all object types are extensions of this type and one of its fields, called ob_type, is a pointer to PyTypeObject and encodes the object type which is exactly the idea of tagging a value to know which class of value they belong to.

Knowing this, what happens to the theorems of Preservation and Progress? How do we talk about type safety in this kind of languages?

Harper presents the theorem of Progress as:

If d ok, then (d val) ∨ (d err) ∨ (∃ d' . d->d')

This reads as “If d is a closed expression, then it’s either a value, results in a runtime error or it can be further evaluated”. He also defines Exclusivity, which states that, for any d in the language, exactly one of these outcomes must be true at any given time.

The key difference is that, in the previous model, these kinds of errors were guaranteed not to occur for well-typed programs. But, since catching those errors before evaluation requires static typing, now they have become part of the semantics of the language! This way, evaluation never becomes stuck. I think it’s relevant to notice how now catching these errors falls upon the dynamics of the language, adding non-negligible overhead.

And since there exists only one singular recursive type, Preservation now plays a lesser role here; there is no static type information to preserve between evaluations! The runtime, through Classification and Class Checking, enforces the discipline that guarantees safe evaluation.

There have been great efforts made towards introducing static typing to dynamic languages: Python has external tools like mypy and ty for static analysis based on annotations. Little old Typescript, the most popular language in the world, plays a similar role by adding a static type system on top of Javascript. More recently, Elixir has been working on adding a gradual type system.

Case study: Division by zero

We’ll look at simple enough snippet in Rust and C to compare what their specifications have to say and what different implementations end up doing, in order to illustrate why soundness is a property of a language semantics and not of it implementations.

fn div(a: u32, b: u32) -> u32 {
    a / b
}

fn main() {
    let res = div(10, 0);
    println!("{res}");
}

Here, we are using rustc and executing the output binary results in a panic:

$ rustc -o main main.rs; ./main

thread 'main' (432991) panicked at main.rs:2:5:
attempt to divide by zero
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
with a full backtrace in:

If we add the RUST_BACKTRACE=1 environment variable we can see the language’s built-in error handling machinery at work:

$ rustc -o main main.rs; RUST_BACKTRACE=1 ./main

thread 'main' (433803) panicked at main.rs:2:5:
attempt to divide by zero
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/2286e5d224b3413484cf4f398a9f078487e7b49d/library/std/src/panicking.rs:690:5
   1: core::panicking::panic_fmt
             at /rustc/2286e5d224b3413484cf4f398a9f078487e7b49d/library/core/src/panicking.rs:80:14
   2: core::panicking::panic_const::panic_const_div_by_zero
             at /rustc/2286e5d224b3413484cf4f398a9f078487e7b49d/library/core/src/panicking.rs:175:17
   3: main::div
   4: main::main
   5: core::ops::function::FnOnce::call_once

Rust does not yet have a formal specification; the closest thing is the Rust Reference, and in there, we can see that this behavior is defined by the language: division by zero does not result in undefined behavior but instead transitions the program to a specified error state where computation is stopped immediately. Although the program ends abruptly, it doesn’t reach an undefined state, panicking is a defined outcomeRecovering from a panic like this can be done by using catch_unwind or avoided by using more idiomatic approaches, such as checked_div or encoding the invariants in the function’s type signature. . This is why Rust is considered sound in the presence of such an error.

Now on the C program, we first use GCC:

#include <stdio.h>

int div(int a, int b) {
    return a / b;
}

int main() {
    int res = div(10,0);
    printf("%d", res);
}

We do gcc -Wall -Wextra -Wpedantic -O0 -o main main.c and it compiles successfully, but then:

$ ./main
Floating point exception   (core dumped) ./main

We can use gdb to learn more about what happened:

$ gdb -q ./main -ex run -ex "set disassembly-flavor intel" \
     -ex disassemble -ex quit
...
Program received signal SIGFPE, Arithmetic exception.
0x0000555555555147 in div (a=10, b=0) at main.c:4
4           return a / b;
Dump of assembler code for function div:
   0x0000555555555139 <+0>:     push   rbp
   0x000055555555513a <+1>:     mov    rbp,rsp
   0x000055555555513d <+4>:     mov    DWORD PTR [rbp-0x4],edi
   0x0000555555555140 <+7>:     mov    DWORD PTR [rbp-0x8],esi
   0x0000555555555143 <+10>:    mov    eax,DWORD PTR [rbp-0x4]
   0x0000555555555146 <+13>:    cdq
=> 0x0000555555555147 <+14>:    idiv   DWORD PTR [rbp-0x8]
   0x000055555555514a <+17>:    pop    rbp
   0x000055555555514b <+18>:    ret
End of assembler dump.

We see that the fault came from our div function. Notice how at address 0x0000555555555147 the division is performed with the idiv instruction on x86_64. Since the divisor is zero, the CPU raises a #DE (Division Error) exception. The OS handles it by sending a SIGFPE signal to the program.

The latest publicly available C language specification classifies this as undefined behavior (UB), which is to say anything goes. To be fair, Rust is not free of UB, although it is a lot less pervasive than in C since it’s largely confined to unsafe blocks. Whether the program crashes, hangs, or produces unpredictable results depends on the implementation, the target architecture and OS.

This illustrates one way the C language is unsound: even well-typed programs can reach runtime states that the language does not account for.

This behavior is a design decision. It allows compilers more freedom to do optimizations, which is great and makes your program faster, but it can end up changing the behavior of your program in ways you don’t expect.

To showcase what can happen with these optimizations, let’s see three different implementations of C: GCC, Clang and Fil-C:

$ gcc -g -O3 -o main main.c && gdb -q ./main -ex run \
-ex "set disassembly-flavor intel" -ex disassemble -ex quit
...
Program received signal SIGILL, Illegal instruction.
main () at main.c:8
8           int res = div(10,0);
Dump of assembler code for function main:
=> 0x0000555555555020 <+0>:     ud2
End of assembler dump.

Now the fault comes from our main function. Inspecting the assembly we see that the compiler optimized the whole program to a ud2 instruction which generates a #UD (Invalid Opcode) exception. Now the program receives a SIGILL from the OS.

$ clang -O3 -o main main.c && ./main 
151705912 

$ clang -O3 -o main main.c && gdb -q ./main -ex "set disassembly-flavor intel" \
-ex "disassemble main" -ex quit
...
Dump of assembler code for function main:
   0x0000000000001150 <+0>:     push   rax
   0x0000000000001151 <+1>:     lea    rdi,[rip+0xeac]        # 0x2004
   0x0000000000001158 <+8>:     xor    eax,eax
   0x000000000000115a <+10>:    call   0x1030 <printf@plt>
   0x000000000000115f <+15>:    xor    eax,eax
   0x0000000000001161 <+17>:    pop    rcx
   0x0000000000001162 <+18>:    ret
End of assembler dump.

This is interesting: without optimizations it behaves similarly to the GCC-compiled one. With optimizations enabled, Clang also deletes div; however, it doesn’t replace main’s body with a ud2 and instead continues execution normally, printing an arbitrary value!

Fil-CAt the time of writing, it is at version 0.678. is a fork of Clang 20.1.8 that adds static and runtime memory safety features to C and to properly explain what it does deserves an article of its own:

$ ./build/bin/clang -g -O3 -o main main.c && ./main
0

This time, the Fil-C compiler ends up rewriting our program to always return 0.

Why It Matters

Up to this point, it’s clear what a sound language offers you: it makes programs easier to reason about and helps avoid whole classes of bugs. And with the case study, we saw how it is actually a property of the design of the language and not of the implementation (compared to, for example, memory safety).

Most languages are not perfectly sound, and it’s often the case that it is due to tradeoffs. Typescript sacrifices soundness for expressiveness and to be able to type one big mess of a languageCuriously, in this community, the term “type safe” is often misused to refer to static checking, which is not the same thing , C does it for performance, Go for simplicity and the list goes on. Even languages famously known for an emphasis on type safety have their own escape hatches, like Obj.magic in Ocaml or std::transmute in Rust. Not to mention the fact that you will have to interface with code written in C or C++ (for example, when working with Python, Ruby or Node.js, somewhat popular choices).

And yet, the degree to which a language commits to soundness genuinely matters, since its benefits compound over time:You could argue that this applies more to static languages with strong type systems. For that I can only say that I don’t hide my preferences.

  • Local reasoning: You can reason more about parts of a system without knowing all of it. This impacts code reviewing, makes refactoring easier and allows you to make sense of a codebase faster.
  • Mental overhead: Somewhat related to the last point, one usually wants to work less, not more. In unsound languages you have to work harder and rely more on tools, your own experience or the community’s to learn the patterns to avoid the unsound parts. It must have been fun debugging this.
  • Language choice: It is often the case that languages that care about type safety also care about some things you would very much like to have, like strong type systems, a concern about correctness and the ability to express powerful ideas in the code itself.

Why do I think that type safety is more important today than it was ever before?

The crux of my argument is really simple:

Type Safety makes agent-generated code easier to verify and make sense of, since the type system does a lot of heavy lifting for you.

With the advent of agentic coding, you spend more time reviewing code, making sure it is what you wanted, with the level of care you want it to have and thinking about how it fits with the rest of the codebase.I like what Antirez wrote on Automatic programming, even if I don’t like the naming. The idea is that we know what we are doing, we are not amateurs, we do not vibe. To invest in this way of working means to choose better tools, and a language is one of those choices that heavily influences your results.

Even if you go further and find yourself a good enough oracle that you feel needs no supervision, the work doesn’t disappear, it moves upstream. You still have to work designing the guard-rails so it doesn’t go off-track. The language you choose determines how well you can do that: what invariants you can encode, what the type system can verify on your behalf, how much you can say with a signature alone.

Further reading

If any of the themes touched upon this article piqued your interest, I have the following reading recommendations: