YAML is a poorly engineered format. For example, it doesn't think about storage at all: in its definition, a document is just a single text blob. What if you need to split it?
The format is awful for random access: you need to read the whole thing to extract a single data item. If you need to query such a configuration multiple times you'll do a lot of unnecessary work.
YAML poorly defines numerical types (how many digits can a float have? What about integers? Are floats and integers the same thing?). YAML lists... allow mixed types. Not sure that's such a good thing from a performance and correctness perspective.
YAML has an unnecessary "hash-table" type. Hash-tables aren't a data format; they require a function in the reader that makes the asymptotic properties of hash-tables work. In fact, YAML just has a weird sub-type of list, where elements come in pairs. Since it inherited all this from JSON, there's also the ambiguity related to repeated keys in such hash-tables: nobody knows what an application should do with those.
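To make the repeated-key point concrete, here is a minimal sketch using Python's standard `json` module (JSON rather than YAML, since that is where the problem was inherited from). The document string and the `reject_duplicates` helper are made up for illustration; other parsers may pick a different policy.

```python
import json

# Hypothetical config with a repeated key.
DOC = '{"port": 80, "port": 8080}'


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Default behaviour of this parser: the last occurrence silently wins.
last_wins = json.loads(DOC)

# Keeping the raw pairs shows the "hash-table" is really a list of pairs.
as_pairs = json.loads(DOC, object_pairs_hook=list)

# A stricter policy has to be bolted on by the reader.
try:
    json.loads(DOC, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Three readers, three different answers for the same bytes, and none of them is dictated by the format itself.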
YAML sucks for transmission due to the ambiguity caused by repeated keys in "hash-tables". You cannot stream such a format if you have the policy that the last key wins or if you have the policy that repetition is not allowed. Also, since YAML allows references but doesn't require that the reference lexically point to a previously defined element, you may have unresolved references which, again, will prevent you from streaming.
YAML defines mappings to "native" elements of various other languages... but what should you do if you are parsing it in Ruby, but get a mapping to Python?
YAML schema sucks. It would require a separate exposé to describe why.
YAML comes with no concept of users or namespaces which would be necessary for ownership / secure sharing of information. It also comes without triggers which would allow an application to respond to changes in data.
YAML is very hard to parse for no reason. It's very easy to make a typo in YAML that will not make it invalid, but will be interpreted contrary to what was intended.
YAML doesn't have a canonical form, which makes comparing two elements a non-trivial and in some cases undefined task.
YAML doesn't have variables, nor does it have forall / exists functionality. This results in a lot of tools that work with YAML overlaying it with yet another layer of configuration, which includes variables (e.g. Helm charts, Ansible playbooks).
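A rough sketch of what such an overlay amounts to, using Python's standard `string.Template`. This is not how Helm or Ansible actually work (they use their own template engines); the fragment, the `render` helper and the value names are all invented for illustration.

```python
from string import Template

# Hypothetical chart fragment; the $-placeholders are not YAML syntax.
TEMPLATE = Template(
    "name: $app\n"
    "replicas: $replicas\n"
    "image: registry.example/$app:$tag\n"
)


def render(values):
    """Substitute variables into the text before any YAML parser sees it."""
    # substitute() raises KeyError if a placeholder has no value.
    return TEMPLATE.substitute(values)


rendered = render({"app": "web", "replicas": 3, "tag": "1.2"})
```

The variable layer operates on raw text and knows nothing about YAML structure, so a substituted value can change the meaning of the document in ways the template author never intended.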
----
I mean, honestly, people who created YAML probably saw it as an inconsequential, sketchy project that should take maybe a weekend or two. They didn't plan for this format to be the best... they just made something... that sort of did something... but not really. I have no idea why something like this received the acclaim that it did.
I use ini files for my Windows application [1]. They are enough if you don't need multiple levels of nesting. The simpler the format is, the better.
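For a flat settings file like that, something along these lines is usually all the code you need. This is only a sketch with Python's standard `configparser`; the section and key names are made up and not taken from my application.

```python
import configparser

# Hypothetical settings file contents.
INI_TEXT = """
[window]
width = 800
height = 600
maximized = no

[editor]
font = Consolas
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Every value is stored as a string; the caller asks for a type explicitly.
width = parser.getint("window", "width")
maximized = parser.getboolean("window", "maximized")
font = parser.get("editor", "font")
sections = parser.sections()
```

Because the reader states the type it expects, a value like `no` only becomes a boolean when the program asks for one.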
When storing settings in config files, you can just copy the program folder to another computer and your settings will be transferred. This is called "a portable application" in the Windows world. When using the registry, you need to export the registry keys and import them, which makes it hard to backup all your apps with their settings or to set up multiple computers to use the same apps with the same settings.
Indeed, even the meaning of the name YAML admits failure: "Yet another markup language".
The year is 2023. Can we really not come up with better things for configuration?
I mean, we did, 25 years ago, with XML. The problem is that it was too much of a dumpster fire when it came to tooling, and folks just abused it until everyone was sick of it in the early aughts. But the fact remains, XML meets all the needs for configuration files, is flexible enough to handle large and small use-cases, and with the right tooling, can be a breeze to deal with.
But no. Instead we have half-solutions like JSON, and worse, YAML.
I have never seen any programming project backpedaling to its previous design in recognition that it was better than the current one. Even though, unlike in many professions, we have the means (via source control).
The hope is that as time passes, the generation that had first-hand experience with the old thing will die off and the new generation will re-discover things unjustly forgotten.
> I mean, honestly, people who created YAML probably saw it as an inconsequential, sketchy project that should take maybe a weekend or two. They didn't plan for this format to be the best... they just made something... that sort of did something... but not really. I have no idea why something like this received the acclaim that it did.
One of the co-creators responded [1] to an article [2] about how YAML is “not that great” on this site.
It comes off to me as, "well, we did it for free and people use it, so I’m happy about it."
Well, if something is free, it does make it good, just not in an engineering sense...
It's valid to make things without thorough design or deep knowledge of the domain you work in. It's a good way to learn how to do things. The blame isn't with the authors. The blame is with the community which mistook YAML for a good configuration format.
Try configuring systemd (that's including all the services it has to run) in INI... I mean, it's sort of what it does with unit files... but this means you now have hundreds of INI files. They are all related, but you don't really know how. You could try versioning them, but you don't even know where all of them are, so every configuration change is a minefield.
Or another, perhaps simpler, thought experiment: try writing a CloudFormation template in an INI format. After all, CloudFormation templates are configuration for the service that creates virtual appliances. If you look at the current JSON / YAML way to define this kind of template, you'll see that YAML isn't expressive enough to do it, and they had to invent a lot of meta-language conventions to be able to express what they need. With INI, your struggle to accomplish that will be monumental.