IO Buffer Initialization

Background

The standard library’s Read trait is defined as

pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

The read method is passed a buffer, copies bytes into it, and returns the number of bytes it copied.
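As a minimal sketch of that contract, the example below reads from a byte slice, which the standard library implements Read for. The helper name read_once is hypothetical and exists only for illustration; it is not part of any proposal here.

```rust
use std::io::{self, Read};

/// Hypothetical helper: performs a single `read` into a zeroed buffer of
/// `cap` bytes and returns only the bytes the reader reported as filled.
fn read_once<R: Read>(reader: &mut R, cap: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; cap];
    let nread = reader.read(&mut buf)?;
    // Keep only the prefix that the reader says it wrote.
    buf.truncate(nread);
    Ok(buf)
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `Read`; each call consumes bytes from the front.
    let mut reader: &[u8] = b"hello world";

    // The buffer is smaller than the input, so the read is partial.
    let first = read_once(&mut reader, 5)?;
    assert_eq!(first, b"hello");

    // The next read picks up where the previous one stopped.
    let rest = read_once(&mut reader, 64)?;
    assert_eq!(rest, b" world");

    // Once the input is exhausted, `read` returns `Ok(0)`.
    let end = read_once(&mut reader, 64)?;
    assert!(end.is_empty());
    Ok(())
}
```

Note that a single call may fill only part of the buffer, which is why callers slice by the returned count rather than by the buffer length.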

The simple way to initialize a buffer for use with a reader is to zero-initialize it:

let mut buf = vec![0; 4096];
let nread = reader.read(&mut buf)?;
process_data(&buf[..nread]);

That approach isn’t ideal, though: the work spent zeroing the buffer is wasted, because the reader should overwrite the part of the buffer that we care about. Ideally, we wouldn’t have to perform any explicit initialization at all:

let mut buf = Vec::with_capacity(4096);
unsafe { buf.set_len(4096); }
let nread = reader.read(&mut buf)?;
process_data(&buf[..nread]);

However, this is unsound when working with an arbitrary reader. Read is not an unsafe trait, so unsafe code can’t rely on an implementation behaving as we’d expect. The implementation could read from the buffer before writing to it, or return a count larger than the number of bytes it actually wrote. In either case, uninitialized memory would be read.

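use std::io::{self, Read};

// Writes nothing, yet claims to have read far more bytes than any buffer holds.
struct BrokenReader;

impl Read for BrokenReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        // The returned count has no relation to what was written.
        Ok(99999999999)
    }
}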

struct BrokenReader2;

impl Read for BrokenReader2 {
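    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reads from the buffer before writing to it. If the caller passed
        // uninitialized memory, this branch observes an undefined value.
        if buf[0] == 0 {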
            buf[0] = 1;
        } else {
            buf[0] = 2;
        }
        Ok(1)
    }
}
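For contrast, here is a sketch of what a reader like BrokenReader can do to a caller that zero-initializes its buffer. Because every byte of the buffer is initialized, a bogus count leads at worst to a panic or a detectable error rather than a read of uninitialized memory. The helper read_checked is hypothetical, and the example uses usize::MAX as the bogus count so that it compiles on every target.

```rust
use std::io::{self, Read};

// Same shape as the reader above: reports more bytes than any buffer holds.
struct BrokenReader;

impl Read for BrokenReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Ok(usize::MAX)
    }
}

/// Hypothetical helper: reads into a zeroed buffer and rejects an
/// out-of-range count instead of slicing with it blindly.
fn read_checked<R: Read>(reader: &mut R, cap: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; cap];
    let nread = reader.read(&mut buf)?;
    // `get` returns `None` when `nread` exceeds the buffer length,
    // where `&buf[..nread]` would panic.
    match buf.get(..nread) {
        Some(filled) => Ok(filled.to_vec()),
        None => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "reader reported more bytes than the buffer holds",
        )),
    }
}

fn main() {
    // The broken reader is caught by the bounds check.
    let err = read_checked(&mut BrokenReader, 4096).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);

    // A well-behaved reader is unaffected.
    let mut ok: &[u8] = b"abc";
    assert_eq!(read_checked(&mut ok, 4096).unwrap(), b"abc");
}
```

The bounds check only protects against an oversized count. It is the zeroing, not the check, that keeps a too-large but in-range count from exposing uninitialized bytes.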

Uninitialized memory is a dangerous beast. Reading from uninitialized memory does not just produce an arbitrary value, but an undefined value. Undefined values can very quickly turn into undefined behavior. For example, code could observe an undefined boolean value to be both true and false (or neither true nor false) without the value ever being explicitly changed.
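One way to hold possibly-uninitialized bytes without ever reading them is the standard library’s MaybeUninit type. The sketch below is only an illustration of that type, not a proposed design; the function name fill_with is hypothetical. It writes every slot before asserting that the buffer is initialized.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: starts from an uninitialized 8-byte buffer, writes
/// `value` into every slot, and only then treats the bytes as initialized.
fn fill_with(value: u8) -> [u8; 8] {
    // No byte of `buf` may be read until it has been written.
    let mut buf: [MaybeUninit<u8>; 8] = [MaybeUninit::uninit(); 8];

    for slot in buf.iter_mut() {
        // `write` stores a value without reading the previous contents.
        slot.write(value);
    }

    // SAFETY: the loop above wrote every element of `buf`.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    assert_eq!(fill_with(7), [7u8; 8]);
}
```

The unsafe block is sound only because this function controls every write itself. With an arbitrary Read implementation doing the writing, the caller has no such guarantee.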

We want to be able to take advantage of the improved performance of avoiding buffer initialization without triggering undefined behavior in safe code.

Constraints