Bloaty Puffs and the Go Compiler
by Jeff Baker, June 2020


Protocol buffers is a really large library. The Go compiler isn’t helping. Some source code changes can make programs more compact.

The Story

Brad complained about the size of a Go program with Protocol Buffers, and as I can’t let a protobuf performance issue slide by, I decided to have a look.
It doesn’t seem like they could have been complaining about the heap usage of such a program, because a Go program with a trivial protocol message only uses about 40kB more than a program with the equivalent trivial struct. But the size of the program itself was a lot bigger: 6.3MB compared to 3.3MB on x86_64 with Go 1.14.3.

For reference, Doom II was 700KB.

I made a diff of the symbol table between the two programs. Transitive dependencies of protobuf — like time.Parse and the whole regexp package — add up to 25% of the margin, but the rest is in protobuf itself, about half of that in protobuf/internal/impl and a lot going to support things like the text format, reflection, and the registry. You can explore this unreadable radial treemap for more information.
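
A symbol-table diff of this kind can be approximated in a few lines of Go. The sketch below is not the tool used for the numbers above. It assumes each input line has the shape "address size type name", which is the layout `go tool nm -size` prints for defined symbols. The helper names (parseSizes, diffSizes, delta) and the sample listings are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSizes reads lines shaped like "address size type name" (the layout
// assumed here for `go tool nm -size`) and returns bytes per symbol name.
func parseSizes(listing string) map[string]int64 {
	sizes := make(map[string]int64)
	for _, line := range strings.Split(listing, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 {
			continue // blank lines and undefined symbols carry no size
		}
		n, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			continue // second column is not a decimal size; skip the line
		}
		// Go symbol names may contain spaces, so rejoin the remaining fields.
		sizes[strings.Join(f[3:], " ")] += n
	}
	return sizes
}

// delta is the size change of one symbol between two binaries.
type delta struct {
	Name  string
	Bytes int64
}

// diffSizes reports the size change of every symbol present in other,
// relative to base, with the largest growth first. Symbols that exist
// only in base are not reported.
func diffSizes(base, other map[string]int64) []delta {
	var out []delta
	for name, n := range other {
		if d := n - base[name]; d != 0 {
			out = append(out, delta{name, d})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Bytes != out[j].Bytes {
			return out[i].Bytes > out[j].Bytes
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	// Made-up listings standing in for the struct and protobuf binaries.
	plain := parseSizes("401000 100 T main.main\n")
	proto := parseSizes("401000 120 T main.main\n402000 300 T regexp.compile\n")
	for _, d := range diffSizes(plain, proto) {
		fmt.Printf("%+8d %s\n", d.Bytes, d.Name)
	}
}
```

Grouping the resulting deltas by package prefix is one way to get per-package totals like the ones quoted above.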

How does all this code add up to three million bytes? There are a lot of functions, and Go’s generated code has a tendency to be surprisingly long. Sometimes a function gets inlined a million times when it might have been better to leave it out-of-line (or as a goto, which I’ll discuss in a moment). Sometimes an out-of-line call is generated, but the Go calling convention is so noisy that the call site ends up being longer than the inline would have been. Go’s calling convention is a known problem for peak performance, but changing it is going to break every assembly function in the world, so fixing it is still in the future.
Let’s look at protobuf/proto.UnmarshalOptions.unmarshalList, the largest symbol in the program. It is 18895 bytes long (slightly larger than /sbin/mkfs). The parseError function has been inlined everywhere, resulting in blocks of code like these.
0x5585c0 CMPQ $-0x4, CX 
0x5585c4 JG 0x55862c 
0x5585c6 CMPQ $-0x5, CX 
0x5585ca JNE 0x558606 
0x5585cc MOVQ, AX 
0x5585d3 MOVQ, CX 
0x5585da MOVQ $0x0, 0x428(SP) 
0x5585e6 MOVQ CX, 0x430(SP) 
0x5585ee MOVQ AX, 0x438(SP) 
0x5585f6 MOVQ 0x3c0(SP), BP 
0x5585fe ADDQ $0x3c8, SP 
0x558605 RET 
0x558606 CMPQ $-0x4, CX 
0x55860a JNE 0x55861c 
0x55860c MOVQ, AX 
0x558613 MOVQ, CX 
0x55861a JMP 0x5585da 
0x55861c MOVQ, AX 
0x558623 MOVQ, CX 
0x55862a JMP 0x5585da 
0x55862c CMPQ $-0x3, CX 
0x558630 JNE 0x558642 
0x558632 MOVQ, AX 
0x558639 MOVQ, CX 
0x558640 JMP 0x5585da 
0x558642 CMPQ $-0x2, CX 
0x558646 JNE 0x558658 
0x558648 MOVQ, CX 
0x55864f MOVQ, AX 
0x558656 JMP 0x5585da 
0x558658 MOVQ io.ErrUnexpectedEOF(SB), CX 
0x55865f MOVQ io.ErrUnexpectedEOF+8(SB), AX 
0x558666 JMP 0x5585da
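
The listing above has the shape of a small switch that maps a negative error code to an error value, falling back to io.ErrUnexpectedEOF. As a rough illustration of the out-of-line alternative, here is a minimal sketch that keeps such a helper from being inlined by using the Go compiler's `//go:noinline` directive. This is one possible approach and not necessarily the fix the library adopted. The code numbers are read off the listing, and the error variables and messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Hypothetical error values, one per code seen in the listing.
var (
	errTruncated   = io.ErrUnexpectedEOF
	errFieldNumber = errors.New("invalid field number")
	errOverflow    = errors.New("variable length integer overflow")
	errReserved    = errors.New("cannot parse reserved wire type")
	errEndGroup    = errors.New("mismatching end group marker")
)

// parseError converts a negative parse result into an error.
// The directive below asks the compiler to keep a single out-of-line copy
// instead of expanding the whole switch at every call site.
//
//go:noinline
func parseError(code int) error {
	switch code {
	case -2:
		return errFieldNumber
	case -3:
		return errOverflow
	case -4:
		return errReserved
	case -5:
		return errEndGroup
	default:
		return errTruncated
	}
}

func main() {
	for code := -5; code <= -1; code++ {
		fmt.Println(code, parseError(code))
	}
}
```

Whether this is a net win depends on the calling convention overhead discussed earlier: each call site still pays for passing the argument and unpacking the two-word interface result.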