Some very obvious and easily avoidable problems with the binary format:
* Messages only record the sizes of their constituents, never their own: a top-level message doesn't carry its length. Anyone sending messages over a stream therefore has to invent an extra layer of framing to delimit them, and different Protobuf implementations invented different framings. So two clients that independently and correctly implement the same spec may still be unable to talk to the same service (see the framing sketch after this list). This doesn't bite a lot in practice, because most developers generate their clients with tools made by the same team, so they all coincidentally get the same solution to the same problem, but alternative tools exist, and they genuinely differ in this respect.
* Messages were designed so that concatenating two encoded messages is equivalent to merging them (a "+" operator in C++). A completely worthless property, never used in practice... but to get it, the authors had to allow a field number to repeat within a single message, with the last occurrence winning. That rule precludes SAX-like streaming parsers: no field value is final, and therefore no processing can start, until the entire payload has been received (see the merge sketch below).
* Protobuf is rife with other useless properties, added exclusively to support Google's use cases: wrapper containers for primitive types to make them nullable (e.g. google.protobuf.Int32Value), and JSON conversion support that doesn't always work because it relies on an undocumented naming convention.
* A Protobuf payload has no concept of version or identity. It's possible, and in fact happens quite a bit, that the wrong schema is applied to a payload: the parse "succeeds", but the resulting interpretation of the message differs from what was intended (see the two-schema sketch below).
* Default values, which are supposed to let senders omit some fields, are another design flaw: they make the payload easy to misinterpret. A field holding its default is not serialized at all, so the reader cannot tell "explicitly zero" from "never sent", and depending on how the reader's language handles absent values, the results of the parse will vary, sometimes with unintended consequences (see the defaults sketch below).
* It's not possible to write a memory-efficient encoder, because every nested message must be prefixed with its byte length, which is often hard or impractical to know up front. The typical implementation encodes each constituent into a "scratch" buffer, measures the outcome, then copies from "scratch" into the "actual" buffer, which on top of this may require resizing and wasted "padding" memory. If, on the other hand, the implementation precomputes all the lengths needed to derive the top-level message's length, it gives up single-pass encoding: every component of the message has to be examined at least twice (see the encoder sketch below).
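For the first bullet, a minimal framing sketch in Python (the function names are illustrative, not part of any Protobuf API): the format itself says nothing about how to delimit top-level messages on a stream, so each implementation picks its own convention, and the two common choices below are mutually incompatible.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def write_delimited(stream, payload: bytes) -> None:
    # Convention A: varint length prefix (the convention Java's
    # writeDelimitedTo happens to use).
    stream.write(encode_varint(len(payload)) + payload)

def write_fixed_delimited(stream, payload: bytes) -> None:
    # Convention B: fixed 4-byte big-endian length prefix (roughly what
    # gRPC's framing does, minus its flag byte). A reader expecting
    # convention A cannot consume a stream written with B.
    stream.write(len(payload).to_bytes(4, "big") + payload)
```

Neither convention is more "correct" than the other, which is exactly the interoperability gap described above.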
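The "last key wins" rule from the second bullet, reduced to a sketch (wire-level tag parsing is elided; decode_message is a stand-in, not a real API): because a field number may legally appear multiple times in one message, a value read early can still be overwritten by bytes that haven't arrived yet, which is what rules out SAX-style processing.

```python
def decode_message(fields):
    """fields: (field_number, value) pairs in the order they appear on the wire."""
    result = {}
    for number, value in fields:
        result[number] = value  # a later occurrence silently replaces an earlier one
    return result               # only here is any value final

assert decode_message([(1, 42)]) == {1: 42}
assert decode_message([(1, 42), (1, 7)]) == {1: 7}  # last key wins
```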
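To make the fourth bullet concrete, here is a sketch decoding the same three wire bytes under two hypothetical schemas. The wire format records only field numbers and wire types, never which .proto file it belongs to, so both reads "succeed" and disagree.

```python
payload = bytes([0x08, 0x96, 0x01])  # tag: field 1, wire type 0 (varint); raw value 150

def read_varint(buf: bytes, pos: int) -> int:
    """Decode a base-128 varint starting at buf[pos]."""
    shift = n = 0
    while True:
        b = buf[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n
        shift += 7

raw = read_varint(payload, 1)                # skip the 1-byte tag
as_int32 = raw                               # schema A: field 1 is int32  -> 150
as_sint32 = (raw >> 1) ^ -(raw & 1)          # schema B: field 1 is sint32 -> 75 (zigzag)
assert (as_int32, as_sint32) == (150, 75)    # both parses succeed, meanings differ
```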
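The fifth bullet's default-value ambiguity, in wire terms (a sketch assuming proto3 semantics, not the protobuf library): a scalar field holding its default is omitted from the encoding entirely, so the receiver cannot distinguish an explicit zero from an absent field.

```python
def encode_int32_field(number: int, value: int) -> bytes:
    """proto3-style int32 field: the default value is simply not serialized.
    (Field numbers < 16 and values < 128 only, to keep the sketch short.)"""
    if value == 0:
        return b""                        # default: nothing goes on the wire
    tag = (number << 3) | 0               # wire type 0 = varint
    return bytes([tag, value])

assert encode_int32_field(1, 7) == b"\x08\x07"
assert encode_int32_field(1, 0) == b""    # "zero" and "never sent" look identical
```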
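And the encoder problem from the last bullet: every length-delimited field is prefixed with its size, so an encoder either serializes children into scratch buffers and copies them (as sketched below), or precomputes every size up front (as the C++ library's ByteSizeLong pass does, if I recall its API correctly) and walks the whole tree twice. A sketch with single-byte lengths for brevity:

```python
def encode_nested_field(number: int, child_payload: bytes) -> bytes:
    """Wrap an already-encoded child message as a length-delimited field.
    (Single-byte lengths only, to keep the sketch short.)"""
    assert len(child_payload) < 128
    tag = (number << 3) | 2              # wire type 2 = length-delimited
    # The length must come *before* the bytes it describes, so the child
    # has to be fully encoded (pass 1) before it can be copied into the
    # parent (pass 2); with N levels of nesting, the innermost bytes get
    # copied N times.
    return bytes([tag, len(child_payload)]) + child_payload

inner = encode_nested_field(1, b"\x08\x07")   # scratch-encode the inner message
outer = encode_nested_field(2, inner)         # ...then copy it into the outer one
assert outer == b"\x12\x04\x0a\x02\x08\x07"
```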
----
Had the author of this creation tried to use it for a while, he'd have noticed these problems and tried to fix them, I'm sure. What I think happened is that this was the author's first ever attempt at such a design, and he never looked back, moving on to other tasks, while whoever picked up the project after him was too scared to fix the problems (I hear the author was a huge deal at Google, so nobody would tell him how awful his creation was).