Proposal: Support inference of type constructor parameters in function parameters
## Motivation
Currently, writing a function which is `comptime`-polymorphic requires using `anytype`, which removes a lot of potential for self-documentation in the function prototype. Compare the monomorphic version of a basic `append` function with its generic counterparts:

```zig
// Monomorphic
fn u8_append(l: ArrayList(u8), x: u8) // <- Fully monomorphized, clear types

// Compile-time polymorphic
fn generic_append1(l: anytype, x: anytype) // <- Option 1: have to guess or read the implementation
fn generic_append2(comptime T: type, l: ArrayList(T), x: T) // <- Option 2: better, but _less_ convenient
                                                            //    at the call site, discouraging use!
```
The reader of `generic_append1` has to guess somewhat blindly that this function is actually trying to be generic by allowing `u8` to vary. Without the monomorphic version nearby, or without inspecting the implementation, it would be especially hard to understand what is permitted to be passed into the function. `generic_append2` is a bit better, but it is verbose to read and, more importantly, it changes the shape of the function to be less convenient to use.

The user is forced to choose between an uninformative function signature that supports convenient type inference, and an informative one that requires explicitly passing in the correct type separately.
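
To make the trade-off concrete, here is roughly how the two call sites compare (an illustrative sketch, assuming a local `list: ArrayList(u8)` and `item: u8`):

```zig
generic_append1(list, item);     // Convenient, but the signature says nothing about what is accepted
generic_append2(u8, list, item); // Informative signature, but the type must be spelled out again
```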
Wouldn't it be nice if we could just express this variance directly? We'd ideally like to be able to write functions like:

```zig
fn append(l: ArrayList(any), x: any)     // More readable
fn append(l: ArrayList(any T), x: any T) // Even better, fully constrained!
```
## Proposal
At first glance, this inference appears to be a basic unification problem. However, because functions are allowed to perform arbitrary computation on their parameters before constructing a `type` instance, it is impossible to back-calculate, from a given `type`, the parameter values that were passed to the function that generated it. Even if you try to store the input function parameters somehow, one `fn(...) type` function can delegate to another, making parameter inference undecidable for the compiler and ambiguous for the user.
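
As a concrete illustration in today's syntax (a hypothetical type function, not taken from any real codebase):

```zig
const std = @import("std");

// This fn(...) type function computes on T before constructing a type.
// Many different T's map to the same result type, so there is no way to
// recover T from the returned type alone.
fn Buffer(comptime T: type) type {
    return if (@sizeOf(T) <= 4) std.ArrayList(u32) else std.ArrayList(u64);
}
```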
The problem here is that `type` objects have no canonical constructor function whose parameters we could associate with the `type` and use for unification. To enable this, we take inspiration from #1717 and borrow the `fn` parameter syntax for `struct`. A `struct` block can now take its own set of parameters:

```zig
const FixedArray = struct(T: type, N: usize) {
    contents: [N]T,
};

var x: FixedArray(u8, 5) = ...
```
`FixedArray` is called a type constructor function, which we will informally call a `type(...)` function, distinguishing these from fully-constructed types (e.g. `FixedArray(u8, 5)` or `u8`).
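
For reference, a minimal sketch of how roughly the same type is written today, as a `fn(...) type` function rather than a `type(...)` function:

```zig
fn FixedArray(comptime T: type, comptime N: usize) type {
    return struct {
        contents: [N]T,
    };
}
```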
The key restriction that makes this proposal possible is that we only allow inferring parameters to `type(...)` functions, not to `fn(...) type` functions. In the compiler, a constructed `type` saves the parameters provided to the `type(...)` function, so that these can be used for inference at the call site of a function like the ones above.
The value inference problem itself then becomes a first-order term unification problem, with no metavariables on one side (the object being passed in). This is decidable and straightforward to compute.
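
To make the unification step concrete, here is how a call to the proposed `append` above might resolve (an illustrative walkthrough, not a specification):

```zig
// Proposed signature:
//     fn append(l: ArrayList(any T), x: any T)
//
// Hypothetical call site, where `list: ArrayList(u8)` and `item: u8`:
//     append(list, item);
//
// Unification proceeds structurally:
//     ArrayList(any T)  ~  ArrayList(u8)  =>  T = u8
//         (ArrayList would be a type(...) function here, so its parameter u8
//          was recorded when the argument's type was constructed)
//     any T             ~  u8             =>  consistent with T = u8
//
// The argument side contains no metavariables, so matching always terminates.
```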
## Extending to `union`
For completeness, `union` will need the same treatment as `struct`, but we already have a `union(tag)` syntax for tagged unions. Thankfully, the existing syntax is not incompatible with a naive solution, although it is a bit awkward:

```zig
// Tagged
const Result = union(enum)(T: type) {
    ok: T,
    not_ok: void,
};

// Untagged
const SpecialEnum = union(T: type) {
    default: T,
    int: u32,
    float: f32,
};
```
Any suggestions to improve this syntax are more than welcome.
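
For comparison, a sketch of how the tagged case is written today as a `fn(...) type` function:

```zig
fn Result(comptime T: type) type {
    return union(enum) {
        ok: T,
        not_ok: void,
    };
}
```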
## Backwards compatibility
Existing usage of `fn(...) type` would be backwards-compatible with this change, but the `type` objects it produces are not `type(...)` functions, so they cannot be used for inference. To support inference, the module author has to expose a `type(...)` function, either by defining it at the top level or, less commonly, by returning one from a function unevaluated.
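
For example, a module that today exposes only a `fn(...) type` constructor could additionally expose an inferable one (a sketch in the proposed syntax; `InferableList` is a hypothetical name):

```zig
// Proposed syntax, not valid in today's Zig: a top-level type(...) function.
// Its parameter T is recorded on every constructed type, enabling inference.
pub const InferableList = struct(T: type) {
    items: []T,
    len: usize,
};
```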
## Extension: Erased type parameters
As an extension to this proposal, we can support `erased T` to indicate an inferred parameter that should be constrained but not used to specialize the function. In this case, any constraints on `T` are still enforced statically at the call site, but `T` is not allowed to be used inside the function.

Note that C's infamously polymorphic `void *` is spiritually akin to `* erased T`. Similarly, Zig's existing unsized array type `[*]T` could be interpreted as `*[erased]T`.
This feature is useful, for example, for ensuring at compile time that multiple parameters are type-compatible, without specializing the function at `comptime`:

```zig
fn accumulate(a: *LinkedList(erased T), b: *fn(*erased T) u64) u64
```
This can also be used to make sure that a method dispatch table (i.e. a vtable) is compatible with the object it is used with. It also makes the use of run-time vs. `comptime` polymorphism more symmetric:

```zig
// Run-time polymorphic
fn accumulate(a: *LinkedList(erased T), b: *fn(*erased T) u64) u64
// Comptime polymorphic
fn accumulate(a: *LinkedList(any T), comptime b: fn(*any T) u64) u64
```
As usual with run-time erasure, `erased` types live behind pointers. Such a pointer is opaque and must be cast to a non-erased type before being dereferenced.
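
For comparison, this is roughly the manual pattern used with today's `*anyopaque` (a sketch assuming a recent Zig version; nothing checks the cast at run time):

```zig
fn readAsU32(ptr: *anyopaque) u32 {
    // The caller must guarantee that `ptr` really points at a u32.
    const typed: *u32 = @ptrCast(@alignCast(ptr));
    return typed.*;
}
```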
In theory, it may be possible to dereference and use an `erased` type for specific accesses/operations, provided that the operation is solely a function of its non-erased parameters. However, such behavior would be an extension of this proposal.
I'm not certain, but I believe this also opens the door to additional debug safety: the compiler should be able to replace an `erased` pointer with a fat pointer describing its `erased` parameters. Then, when a specialized function `fn(x: *A)` is cast to an erased function pointer `*fn(x: * erased T)`, it can be wrapped in a check that inspects the fat pointer information to make sure that the underlying type is `A`.
If this works, it may help to avoid a foot-gun with run-time polymorphism today, as described in "(Minor) Proposal: `parent` Pointer Attribute for safer polymorphism in Zig".
## Details: Type constructor functions as variables ("Sandwich" Inference)
"Sandwich" inference is when one of the type constructor functions in the signature is a metavariable to be inferred.
The sandwich part refers to the fact that the variables can appear in-between constant terms, rather than just at the top-most level of parameters. This can lead to pretty strange and complex patterns:
fn(x: any L(any T, any N), x: T[N]) u32
fn(x: any T(u32, u8)) u32
fn(x: ArrayList(any T(u32, u8))) u32
This inference is perfectly decidable as a unification problem, but these patterns often feel more confusing than helpful. For that reason, it may be preferable to forbid sandwich inference, at least until a clear use case arises.
## Details: Syntax and keywords
Not to bikeshed, but syntax is fairly important for this proposal, so I wanted to flesh out the options a bit. Especially if erased parameters are accepted as an extension, we'll want to make sure we maintain some consistency between the runtime-erased and comptime-specialized type inference keywords. I've used `any`/`erased` in this document, but there are other variations as well:

|                      | Option 1   | Option 2         | Option 3      | Option 4   |
|----------------------|------------|------------------|---------------|------------|
| Comptime specialized | `any T`    | `comptime any T` | `any T`       | `var T`    |
| Run-time erased      | `erased T` | `any T`          | `undefined T` | `erased T` |
I've been preferring Option 1, on the grounds that Option 2 is too verbose and Option 4 is misleading by labelling comptime specialization with the `var` keyword. Option 3 is a good candidate as well, since it doesn't require introducing a new keyword, but it seems unintuitive to have a potentially-constrained metavariable labelled with `undefined`, even if it's true that you're not allowed to use it within the function.
This is also related to https://github.com/ziglang/zig/issues/5893. Resolving the question for this proposal would also imply a solution for #5893. Furthermore, some of the conceptual unification that issue gets at seems to be provided by this proposal: `anyframe` could become `Frame(any)` and `anyerror` could become `Error(any)`, although the special-case keywords are easily spotted, so they might be worth keeping around.
If this proposal is accepted, we might also want to consider an alternative slice syntax such as `*[var]T`, since many newcomers to the language seem to be surprised that `[]T` is actually a reference type.
Finally, in order to disambiguate the inferred parts of a term from constants, we should require `erased`/`any` for every appearance of a metavariable `T`. Of course, a metavariable `T` should never shadow a constant.
## Details: @typeInfo/@Type

`@typeInfo`/`@Type` would be supported exactly as they are today. In particular, they would not support creating a `type(...)` function.
There are places in the standard library (e.g. `enums.zig`, `mem.zig`, `meta.zig`) that use `@Type` today. These use cases would benefit from a built-in `@TypeFunction()` which converts a `fn(...) TypeInfo` function into a `type(...)` function, if inference is really desired.
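
For reference, a sketch of the kind of `@Type`-based construction in question, in today's syntax (a hypothetical helper; the `Type` union's tag and field names vary somewhat between Zig versions):

```zig
// Builds an unsigned integer type from a bit width. The bit width is used to
// construct the type via @Type, but it is not recorded on the result, so a
// type built this way could not participate in inference without something
// like the proposed @TypeFunction().
fn UInt(comptime bits: u16) type {
    return @Type(.{ .Int = .{ .signedness = .unsigned, .bits = bits } });
}
```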