There it is.

Torn between Programming Language Design and Scientific Computation, I chose over the last few months to bite off some very big problems, with extremely uncertain pay-off.

Whether we succeed or we fail, we're certainly guaranteed to learn a lot!

1. The Great malloc Foot-Gun: Manual Memory Management

Zig has a small feature-set, comparable in spirit to C. What's more, it prevents many of the simple mistakes you might make in C, which would otherwise cause you to "shoot yourself in the foot" (a hazard colloquially referred to as a "foot-gun").

These extra protections include:

  • The active field of a union is automatically checked
  • Out-of-bounds array access is automatically checked
  • Pointers cannot be null, unless specifically marked as "possibly empty"

The features that enable these are Tagged Unions, Slices and First-class Arrays, and Optional Types. Even better, these are implemented at zero-cost by performing their checks at compile-time or, when that's not possible, by instrumenting debug builds, leaving release builds as fast as normal C. You can find the rest of the sales pitch on their home page.
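To make the three protections concrete, here is a rough sketch of their closest analogs in Rust. This is not Zig code, and every name in it (Value, describe, third, name_len) is made up for illustration. One difference worth keeping in mind: Rust keeps its bounds checks in release builds, whereas the text above describes Zig instrumenting debug builds.

```rust
// Rust analogs of the three protections listed above (not Zig code).
// All names here are hypothetical and exist only for illustration.

/// A tagged union: the compiler tracks which variant is active.
pub enum Value {
    Int(i64),
    Text(String),
}

/// `match` must cover every variant, so an inactive field can never be read.
pub fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("int {}", n),
        Value::Text(s) => format!("text {}", s),
    }
}

/// Slices carry their length; `get` returns None instead of reading past the end.
pub fn third(xs: &[i32]) -> Option<i32> {
    xs.get(2).copied()
}

/// A reference can never be null; "possibly empty" has to be spelled out with Option.
pub fn name_len(name: Option<&str>) -> usize {
    match name {
        Some(s) => s.len(),
        None => 0,
    }
}
```

Calling third on a two-element slice yields None rather than garbage memory, and name_len cannot be written without deciding what to do in the empty case.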

There's one big problem it hasn't solved yet, though, which Rust famously has: manual memory management, with its dangling pointers, data races, and (for the most part) memory leaks.
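As a small illustration of what "solved" looks like on the Rust side, here is a minimal sketch of ownership deciding when memory is released. The Buffer type and the freed flag are hypothetical, added only so the moment of release can be observed.

```rust
// Sketch: ownership decides when memory is freed, so there is no manual free().
use std::cell::Cell;

/// Hypothetical resource that records the moment it is dropped.
pub struct Buffer<'a> {
    pub data: Vec<u8>,
    freed: &'a Cell<bool>,
}

impl<'a> Drop for Buffer<'a> {
    fn drop(&mut self) {
        // Runs automatically when the owner goes out of scope.
        self.freed.set(true);
    }
}

/// Returns true if the buffer was released at the end of the inner scope.
pub fn freed_at_scope_end() -> bool {
    let freed = Cell::new(false);
    {
        let buf = Buffer { data: vec![1, 2, 3], freed: &freed };
        let view: &[u8] = &buf.data; // a borrow that cannot outlive `buf`
        assert_eq!(view.len(), 3);
        // Letting `view` escape this block would be rejected at compile time,
        // which is how dangling pointers are ruled out.
    } // `buf` is dropped here, and its Vec is freed with it
    freed.get()
}
```

The interesting part is what the compiler refuses to accept: any attempt to keep view alive past the closing brace fails to build, rather than failing at runtime.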

With this background, our goal is obvious: Slay the Great malloc Foot-Gun.

Rust gives us a lot of important hints here, since it has already solved this problem. However, its complexity as a language and its strictness toward safety mean that it is not always preferred for prototyping and experimentation, which Zig considers to be a first-class use case.

As such, we're going to have to get creative about this one. As far as usage semantics go, I don't think it gets any simpler than what Rust has put together, but we can hope to:

  1. Expose the implementation of the semantics in a simpler language, opening the door to enforcing other invariants with similar machinery
  2. Be less strict about application, so that the user can gradually introduce statically-verified pointers into their code as they finish prototyping

There is some fascinating literature to explore here. Separation Logic is the deepest and most fundamental, and clearly still has much further to go in terms of its theoretical development. There's a brilliantly clear and practical introduction from the ACM here.

More on this beast later :)

2. Expressivity: Foundations to a Collaborative Community

This one is inspired by Julia, of course!

Especially since eagerly streaming half of JuliaCon 2021 a month ago, I'm very much in awe of the speed and creativity with which the community cooks up new ideas and re-purposes old ones.

Certainly a big part of this is due to the folks involved: scientists of diverse backgrounds are naturally going to be exposed to a lot of different ideas and eager to pick up new ones, happily sharing their advances with others. This is true, and it's a very beautiful thing. Nonetheless, I think it's really the confluence of technology and community that is generating this kind of ingenuity.

I believe the core technological piece that makes the difference is something called The Expression Problem (coined in 1998!). It's a beautifully simple statement of something that seems like it should be possible, but isn't in almost any well-established language I'm aware of:

  1. Extensibility in both dimensions: A solution must allow the addition of new data variants and new operations and support extending existing operations.
  2. Strong static type safety: A solution must prevent applying an operation to a data variant which it cannot handle using static checks.
  3. No modification or duplication: Existing code must not be modified nor duplicated.
  4. Separate compilation and type-checking: Safety checks or compilation steps must not be deferred until link or runtime.
  5. Independent extensibility: It should be possible to combine independently developed extensions so that they can be used jointly.
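The requirements above can be sketched in a few lines of Rust, where traits let both axes grow without touching earlier code. This is only an illustration under my own assumptions: the shapes, the two traits, and the compactness function are all hypothetical names, and everything sits in one file.

```rust
// Sketch only: traits let both axes grow without editing earlier code.
// Every name below is hypothetical.
use std::f64::consts::PI;

// --- "Original library": one operation, two data variants ---
pub trait Area {
    fn area(&self) -> f64;
}
pub struct Circle { pub r: f64 }
pub struct Rect { pub w: f64, pub h: f64 }
impl Area for Circle { fn area(&self) -> f64 { PI * self.r * self.r } }
impl Area for Rect { fn area(&self) -> f64 { self.w * self.h } }

// --- Extension A: a new data variant that reuses the existing operation ---
pub struct Square { pub side: f64 }
impl Area for Square { fn area(&self) -> f64 { self.side * self.side } }

// --- Extension B: a new operation over the existing variants ---
pub trait Perimeter {
    fn perimeter(&self) -> f64;
}
impl Perimeter for Circle { fn perimeter(&self) -> f64 { 2.0 * PI * self.r } }
impl Perimeter for Rect { fn perimeter(&self) -> f64 { 2.0 * (self.w + self.h) } }

// --- Joining A and B: one more impl, nothing above is modified ---
impl Perimeter for Square { fn perimeter(&self) -> f64 { 4.0 * self.side } }

/// Static safety: this only compiles for types that support both operations.
/// Computes 4*pi*area / perimeter^2, which is 1 for a circle.
pub fn compactness<S: Area + Perimeter>(s: &S) -> f64 {
    let p = s.perimeter();
    4.0 * PI * s.area() / (p * p)
}
```

Passing a type that lacks one of the two traits to compactness is a compile error, which covers rule (2). Rule (5) is only partly met in practice: Rust's orphan rule forbids a third crate from implementing someone else's trait for someone else's type, so the "joining" impl has to live in the crate that owns either the trait or the type.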

Some good posts on this topic can be found here and here.

Notice that a direct consequence of a solution is composability. Independent extensions can be used in concert with one another. This is important when we think about the total space of "functionality" that is implemented as a function of developer time.

Composability means that as development happens across the community, the total program "functionality space" that is covered can potentially scale combinatorially. Without a solution to The Expression Problem, this coverage is generally limited to linear scaling, due to restricting to fixed combinations of components.
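Here is a minimal sketch of that kind of composition, written in Rust with generics and trait bounds rather than Julia's multiple dispatch. The two "packages" are hypothetical: one defines a dual-number type, the other a polynomial evaluator, and neither mentions the other.

```rust
// Sketch: two independently written pieces compose into something neither planned for.
use std::ops::{Add, Mul};

/// "Package A" (hypothetical): a dual number val + der*eps, with eps^2 = 0.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dual {
    pub val: f64,
    pub der: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { val: self.val + o.val, der: self.der + o.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, o: Dual) -> Dual {
        // Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        Dual { val: self.val * o.val, der: self.val * o.der + self.der * o.val }
    }
}

/// "Package B" (hypothetical): Horner evaluation for any type with + and *.
/// Coefficients run from highest degree to lowest; the slice must be non-empty.
pub fn horner<T>(coeffs: &[T], x: T) -> T
where
    T: Copy + Add<Output = T> + Mul<Output = T>,
{
    let mut acc = coeffs[0];
    for &c in &coeffs[1..] {
        acc = acc * x + c;
    }
    acc
}
```

Feeding horner plain f64 values evaluates the polynomial. Feeding it Dual values, with der set to 1 on the input and 0 on the coefficients, also yields the derivative in the der field, even though horner knows nothing about derivatives. Each new numeric type and each new generic algorithm adds one definition but many usable pairings, which is the scaling argument made above.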

My hunch is that this difference in scaling is Julia's real "secret sauce", even more so than its "Python-like" features, including an easily readable syntax and dynamically-managed objects. Although I must admit that having terse, comprehensible implementations of algorithms is also a huge boon for ingenuity.

Aside: The reason that this Problem has not been solved more satisfactorily is that, practically speaking, most teams choose to break rule (3). That is, you can modify anything you want if you control all of the code. This works to get your project done, but it doesn't work to create a collaborative community.