Topic: Validity Invariants
Audience
Compiler, Lang— Unsafe Code Guidelines
Key People
RalfJ
When
Monday, 14:00 - 15:00
Where
Gainsboro
Meeting Style
Discussion
Deliverables

Homework to do before meeting
Agenda
  • if we have time: what about unsized types, in particular considering custom DSTs?
  • Maybe record?

NOTES


What can we pick and what can’t we pick as invariants.
Ralf proposes having two kinds of invariants: Safety, and Validity
“Safety” is what you can assume about a safe function when it interacts with an instance of a data type. It can be user-defined; e.g. a Vec<T> has an invariant relating its len field to the contents of the heap-allocated array of data.
Defining safety invariant is hard, and proposed to be out of scope for this meeting. Its something that user code knows about and makes use of, but the compiler is not aware of (and does not make use of in its analyses and transformations of the code).
A “Validity” invariant is something the compiler knows about. It is an invariant that holds at any point, including during all execution of unsafe code.
Centril: Validity corresponds to invariant of the abstract machine, while Safety is the invariant of the type system.
Validity invariants in Rust: surveying all the types and type constuctors.
Two most interesting cases: integers, and references

For integers:

the abstract machine is different from concrete hardware. Every bit has three potential states: either 0, 1, or uninitialized.
The question: Do we want to allow the existence of an integer that is uninitialized.
cramertj: One could imagine having a fourth “poison” state in addition to “uninitialized”
ralfj: “poison” is essentially “uninitialized”
cramertj: well it may be useful to have it correspond to “undefined”
ralfj: LLVM’s use of Undef is fraught with trouble. We don’t want to run into the issues they did
cramertj: One might imagine abstract machines that support passing a &Undef into a safe function … (subdiscussion with ralf about whether this is well-defined and/or useful, eventually coming around to ... uninitialized integers … but with a sidetrack differentiating uninitialized memory from non-deterministically initialized memory)
ralfj: What are the interesting use cases for uninitialized integer data
ralfj: e.g. byteslice of mem::uninitialized that you pass into a different function for initialization
ralfj: so if we were to allow uninitialized values in integers, you could read the values and write them out. It would be UB to do arithmetic (e.g. + or / ) with them.
ralfj is not sure whether we would want to support arithmetic comparisons with uninitialized values. Crossbeam may want that to be supported.
someone gives an example of an fn foo(a: u32, b: u32) where b is only guaranteed to have a value if a == 1, and then asks if it is okay to pass foo(0, uninitilialized). ralfj agrees that this is what we are talking about: That’s allowed if and only if the validity invariants allow it.

For references:

ralfj: We know references should be (1.) non-null, (2.) aligned, and (3.) referenceable.
ralfj: Does the validity invariant for references say anything more? e.g. is it insta-UB to have a &! (where I don’t dereference it). In other words, does validity recurse into a type.
ralfj: another less-extreme example: can &bool point to 3?
several examples given of code in wild that has uninitialized memory and then takes a reference to it.
cramertj specifically says that they would expect the compiler to optimize based on knowledge that  &!  cannot be valid. This led ralfj and eddyb to dive a little more into their examples, e.g. a vtable for some trait implemented for &! (where ralfj and eddyb said such code should be eliminated before you even try to make a vtable).
centril: we should gather data on how much one sees references to uninhabited types in the wild