> is that for a long time it was the only viable configuration format
Actually, this is not the case. Many years ago we had the INI format for simple stuff and XML (backed by full schemas) for complex things, and it worked. Yet we wanted something readable (like INI) but able to express complex types (like XML).
I don't think TOML is a viable replacement - for me, it has an INI level of simplicity with even worse hacks for nested structures. But give it time and you'll have another YAML.
But yes, YAML is confusing for edge cases (true/false, quoting), and I have yet to find a powerful replacement that is not XML. Maybe EDN can jump in, but for anything more complex, I'd rather have a Lisp interpreter and structures around it than any of the mentioned or upcoming formats.
Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
Writing in YAML doesn't feel much better. YMMV but I've been on teams using Pulumi for k8s and the developer experience has been significantly better. I can automate, type check, lint, click through to definitions the same way I do with other typescript.
Pulumi is a young product with many rough edges but it's already been a game changer for me.
"Comments":
{
    "Name": "This is the name of the customer",
    "Age": "This is the age (0 to 1000) of the customer"
}

or

"Values":
[
    {"Name": "Age", "Value": "20", "Comment": "This is the age of the customer"},
    {"Name": "Name", "Value": "Mr. Foo Bar", "Comment": "This is the name of the customer"}
]
... just fucking don't; generate the config in your configuration management tool of choice and then serialize it to YAML. You get all of the advantages (nice to read) and none of the disadvantages (hand-editing it without a good editor is a PITA).
That is actually only the case for yaml 1.1; the 1.2 spec (circa 2009!) eliminated this forest of boolean encodings, leaving only `true` and `false`: https://yaml.org/spec/1.2.0/#id2602744
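For anyone who wants to see the 1.1 behaviour live: PyYAML still implements YAML 1.1, so a quick sketch like this (the document is made up) shows the extended boolean forms in action.

```python
import yaml  # PyYAML implements YAML 1.1, including the extended boolean forms

doc = """
unquoted: [yes, no, on, off]
quoted: ["yes", "no"]
"""

data = yaml.safe_load(doc)
# Under 1.1 rules the unquoted words all resolve to booleans,
# while the quoted ones stay plain strings.
print(data["unquoted"])  # [True, False, True, False]
print(data["quoted"])    # ['yes', 'no']
```

A 1.2-conforming parser would leave all six values as the strings they look like.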
If you aim to be "human-friendly" (and that is, as I understand, the raison d'etre for YAML), there is a subtle semantic difference between "true" and "on" (and "false" and "off") and as a human it may be nice to express that semantic difference.
As for that semantic difference, if we expect the light source to have one of exactly two states (that is, "not a dimmable light"), we probably want to express that as "lightsource: on" rather than "lightsource: true".
And that is where the friction between "human-friendly" and "computer-friendly" starts being problematic. Computer-to-computer protocols should be painfully strict and unambiguous; human-to-computer ones should be as adapted to humans as they can be, erring on "expressive" rather than "strict".
I am also not sure if I am happy or sad that the set of configuration languages in the original article didn't include Flabbergast[1], which was heavily inspired by what may be simultaneously the best and worst configuration language I have seen, BCL (a language that I once was very relieved to never have to see again, and nine months later missed so INCREDIBLY much, because all the other ones are sadly more horrible).
It's something of a tradeoff space that is always there: you either have things that look the same be implicit (e.g. are 1 and 10000000 the same type? How about 1 and 1.? How about 1 and -1? How about 1 and moo? 1e2 and 1f2?).
You can have things be explicit and verbose (put explicit, unambiguous type identifiers on every value), or you can have things be magic and then surprising, as in something like this.
Rails also made a similar decision to basically auto convert what would be the string true to the Boolean value true in many cases. It's not obviously right or wrong, it's just a choice that has tradeoffs.
I think my own sensibility is that it's actually reasonable for something like yaml to allow you to choose to omit quotes, and if you do then it's automatic typing: 1e2 is a number and 1z2 is a string. If you want a value to be a string you can put quotes on it, exactly like you would in json; choosing unquoted for things that are strings is choosing the automatic-typing case.
But these other values like y and n being autotyped as bool are suspect autotype behavior to me: even once you've accepted unquoted is autotyped you are more likely to be caught off guard by this than by 0 vs true vs 0z.
I'm nose deep in an Oracle to Postgres conversion at the moment and from that experience, among others, I can absolutely assure you that although booleans are definitely simple they are also most definitely a very sharp edge case.
It does. If I define a struct to contain a boolean, the Go yaml parser will always parse the value as a boolean.
If I define it as any other type, it will parse it as that type. So conflicts like those in the article just don't happen, or at worst produce an error at parse time.
Static typing in this case essentially acts as a schema.
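The same idea can be sketched outside Go too. Here's a minimal, hypothetical Python version where a table of declared field types plays the role of the struct (`load_server` and the field names are made up for illustration):

```python
import yaml

# Hypothetical "struct": declared field types act as the schema, so a
# 1.1-style mis-typed value fails loudly instead of sneaking through.
FIELDS = {"host": str, "port": int}

def load_server(text):
    raw = yaml.safe_load(text)
    for name, typ in FIELDS.items():
        if not isinstance(raw.get(name), typ):
            raise TypeError(f"{name}: expected {typ.__name__}, got {raw.get(name)!r}")
    return raw

ok = load_server("host: example.com\nport: 8080")  # passes
# load_server("host: no\nport: 8080") raises: `no` parses to False, not a str
```

Go's typed unmarshalling does this implicitly; in a dynamic language you have to bolt the check on yourself.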
XML did everything and it's perfectly readable if it's formatted and structured well. This zoo of different markup/object/whatever languages we have to deal with now is largely a mistake.
This is confusing because XML is the archetype of what I would consider unreadable. If I got a prompt in a programming language design workshop to intentionally design something unreadable for humans, I would start thinking about XML and see where it leads me. Can you name any language any more unreadable than XML to help me?
I look at this and I see a visually-well-organized hierarchy of objects with properties and children. It's a sideways tree diagram. I honestly don't get what's supposed to be unappealing about it.
> Can you name any language any more unreadable than XML to help me?
Most of the other markup languages depending on the usecase. The "redundant" tags everywhere like </Address> in XML are very helpful if you're in the weeds of a big document that's taller than your viewport - you still know exactly what you're looking at and where in the hierarchy you are, as opposed to JSON or YAML which I find much easier to get lost in.
Anyone can pick up a (properly-named) XML document and tell what it means at a glance, but JSON is easily a mystery unless you know the implicit schema beforehand.
Maybe there's some minor friction to get past looking at XML because it's a little visually busy before you learn how to parse it out? Is that all there is to it?
I understand where you’re coming from because it’s true XML is going to perform the functions required without all the problems implicit with YAML’s “readable” syntax…
But I have worked with some very complex XML, and I feel you've chosen a simpler-to-read example.
I have seen very complicated XML from Microsoft's own configurations that is so large, long, and complicated that it is very hard to follow.
XML is a mistake. 80% of every XML document is just redundant noise because the IBM lawyer who invented the SGML syntax had never heard of S-expression.
The "redundant" noise makes it more human-readable in my opinion. If you're in the weeds of a big XML document that's larger than your viewport, being able to see that you're at the end of a </Address> helps you keep your place, as opposed to something like JSON which I find much easier to get lost in. Even if it's not that big I appreciate the extra rails on the XML making it easier to keep my place.
I consider this a fallacy. The condition you describe is not a common occurrence. In the event it does occur, the solution can be found in your editor or viewer. It does not make sense to inject spurious junk into what is designed as a data interchange format, which includes network information interchange.
There's probably a name for this kind of fallacy that exaggerate the effect of a problem and the resulting contortions that were engineered around it.
Writing documents with S-exprs will get very tedious once you have to deal with stuff like whitespace and such that will inevitably show up when you're dealing with text documents. SGML is specifically optimized for those use cases, which is why it is so much more awkward to use when all you want is serializing some random structured data.
We are talking about a configuration language here, which is never about markup. Also, most of the so-called whitespace problem in Ansible or k8s configs comes from embedding config that's not YAML. In these cases you need a path or URI for the processor to resolve before applying, not inverting the entire configuration language to cater to one uncommon use case.
The point of my original comment was precisely that SGML was optimized for text documents. I agree that adopting it for configs was a mistake, but the complaint that "IBM lawyer who invented the SGML syntax had never heard of S-expression" doesn't make any sense in that context.
What XML replaced was a combination of custom designed (often binary) formats and HTTP query-string like syntaxes. It is quite verbose compared to either of them, and explodes both your bandwidth and serialize/deserialize times, but it is arguably easier for your new co-worker to guess that the address line 2 goes in <address2>...</address2> rather than prefix tagged with 13, or worse yet at offset 42-61.
I got used to XML, tho I never could quite understand XSLT and the desire to program in it. I got used to json, but yaml I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?
> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)
k8s via helm is often templated via Go template strings, which works by creating an unreadable and unhighlightable mess, introducing lots of its own bugs.
INI has no spec, and there are many variations all slightly incompatible with each other, kinda like Markdown. YAML is really the only sane configuration language that lets you denote nested structures while keeping them looking nested.
Basically, all of the problems identified in the article can be dealt with by one rule - always quote your strings. I agree with the author that we should have a reduced, safe and minimalist subset of YAML, which is basically YAML 1.2, released in 2009.
There's HOCON, which is pretty good if you can run on a JVM. It's a superset of JSON designed for readability and human-friendliness when writing config files. It doesn't change the type system and doesn't have YAML's weird edge cases, but is still a lot easier to write than JSON. There's also a relatively tight spec.
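For a taste (an illustrative fragment, not from any real project), HOCON layers a few human-friendly conveniences on top of JSON without touching its type system:

```hocon
# Every valid JSON document is also valid HOCON; the rest is optional sugar.
app {
  name = my-service          # unquoted strings, `=` as an alternative to `:`
  db.url = "jdbc:postgresql://localhost/app"  // dotted keys expand to nesting
  workers = 4
  workers = ${?APP_WORKERS}  # last value wins; `${?...}` is an optional override
}
```

The `${?APP_WORKERS}` line is how the reference implementation lets an environment variable override a default without any templating.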
I second that. And if people need to deal with YAML in Python, they should be using ruamel.yaml, which is a far superior library on just about every level: https://pypi.org/project/ruamel.yaml/
Why would I care that the maintainer prefers Mercurial? The document you linked to doesn't say anything about them not taking community submissions: I think you need to scroll down further in the page. And it certainly doesn't say that you have to use SO to submit changes.
Although yaml 1.2 is more than 10 years old by now, you would be mistaken to think that it is widely supported: the latest version of libyaml at the time of writing (which is used, among others, by PyYAML) implements yaml 1.1 and parses 22:22 as 1342.
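This one is easy to reproduce with PyYAML, which sits on top of that 1.1 behaviour (the key name here is made up):

```python
import yaml  # PyYAML follows YAML 1.1, including base-60 "sexagesimal" integers

# An unquoted colon-separated number is read as sexagesimal: 22*60 + 22 = 1342
print(yaml.safe_load("start: 22:22"))    # {'start': 1342}

# Quoting keeps it a string, as with all the other 1.1 gotchas
print(yaml.safe_load("start: '22:22'"))  # {'start': '22:22'}
```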
Cue is interesting, but why is it only available as a command-line tool rather than a library? I'd want to integrate such a configuration language in my programs, so I could use its evaluation and validation capabilities rather than writing a custom parser/validator.
Also their website is terrible. One of those projects that assumes you've already decided (or been forced) to use it and have cleared out a week of your schedule to learn how to use it.
I learned cue from it during one weekend with plenty of time to play with kids, using it in production since 2020, it's been absolutely great, zero problems, very terse configs, intuitive formalism.
I took another look and eventually found the bit of the website that they should put front and centre in the Tutorial page. Still difficult to navigate (why doesn't the page tree show up on the left?) but it is at least well written and to the point.
The "learn more" button on the front page should link to that, perhaps with a single paragraph giving motivation.
And the main page breaks the fundamental rule of programming languages/formats. Put examples on the front page!
I assumed they hadn't done that because the examples would be too complex or maybe the concepts were too difficult to demonstrate with small examples, but having gone through the tutorial that isn't the case at all.
I find specs nearly unreadable when first trying to digest a language; while invaluable for advanced usage and implementation, I can't read a BNF-style spec and make heads or tails of what's going on unless I also have an annotated example next to it.
No. XML is broken for structured data that isn't HTML because it's not really clear (there aren't even "best practices" afaict) what should be a text node, what should be an attribute node, and what should be a subtag node.
It's not like other formats don't face similar questions. E.g. if you have a list of key-value pairs to serialize to JSON, do you translate those keys to JSON properties in an object, or do you translate each pair as an object in an array?
The motivation for doing that in JSON comes with best practices: you use the key-value objects thing if and only if you have an ad-hoc list of items that you'd really really like to shove into a well-typed schema.
My distaste for yaml led me to attempt just that. The first piece that was missing for me was a python parser that could produce reasonable error messages and could be transformed into the desired internal representation in python. So I wrote one [0]. It was supposed to be a single file that was less than 100 lines and could be copied and pasted into any project that I needed. Turns out that the issue was a bit more complex.
The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand it means that as soon as you decide what format you are going to support you can quickly implement it. There is more or less an intersectional grammar that works across most if not all lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.
After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across lisp dialects, which is a key complaint in other threads here. Limited support for let-binding constants also seems like a feature that would allow for just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in yaml (cool and useful as it may be).
In summary s-expressions are:
1. missing good parsers in a number of language ecosystems
2. not standard across lisp dialects
3. need additional semantics for binding, multiple expressions, etc.
4. still better than yaml and json
I know what s-expressions are, vaguely. Vaguely in terms of "I couldn't write a grammar for them off the top of my head.", that is, not "what are they".
Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html
This is an honest question, because there may well be and I don't know it.
However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.
The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.
Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.
I don't see a grammar I could use for an external standard there. I could be missing it. And I assume there must be one implied, since it appears to be a read-write format. However such things have a nasty habit of turning out upon further inspection to not be as generic as supposed and have local idioms buried in them, often in surprising places. Even JSON, simple as it is, had that problem, and it is after all a subset in an attempt to squeeze out those localisms.
Is it compatible with Common Lisp, such that all such s-expressions will be compatible with it in some natural way?
Along with the many other issues that would have for just straight-up not being a usable grammar, the word "CUSTOM" appearing in a nominal cross language standard is a non-starter.
Oh come on. Did you think anybody would write an RFC in the comments? Add an arbitrary list of whitespace tokens everywhere and pick whatever favorite grammar you have for literals. Json strings and numbers, since you seem to like them.
Of course not. But you are at least semi-seriously trying to propose something. I promise you it won't be that easy to get any sort of agreement.
This is a classic "oh come on how hard can it be" that, once you get into it, it turns out the answer is very. Arbitrarily so. Agreement is hard. Details are hard. A standard with no details is not terribly different from "just give us some text".
I've never met anyone who says "I like YAML, it is great"... most people who have worked with it say something like "YAML is annoying, I don't like it"...
While introducing Kubernetes at our company over the last two years, we are currently in the process of moving more and more away from YAML with internal Helm charts to a much simpler process, just using HCL and Terraform and defining Kubernetes resources as Terraform resources.
As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.
YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).
> I've never met anyone who say's "I like YAML, it is great"
Maybe I'm older than you, but I have definitely heard that line.
Mostly because the alternatives were XML, INI or the myriad of bespoke formats: relayd/apachehttpd .conf, iptables, etc., etc.
INI has parsers that operate in different ways and doesn't support hierarchies... so that's not ideal.
JSON and YAML came to the fore around the same time, and JSON's lack of comments and its picky semantics meant that people did prefer YAML over JSON for human-readable configs.
YAML itself is fine; it has some really awkward warts, and the parsers are usually programmatically unsafe in their implementation (leading to less-compatible "safe_load" or other types of loaders)[0]. The issues we actually have with YAML are that we:
A) Template it (jinja, mustache whatever)
B) Put entirely too much stuff into it (kubernetes manifests can grow to hundreds of lines really easily).
These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.
What I've taken to doing is programmatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire turing-complete language to generate objects that are mostly static, but honestly... the ability to breakpoint, test and print the subsections of output is astonishingly nice.
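A minimal sketch of that workflow in Python (assuming PyYAML; the config contents are made up): build ordinary data structures, then let the dumper worry about YAML's quoting rules.

```python
import yaml

# Build the config as plain data structures in code...
config = {
    "geoblock_regions": ["dk", "fi", "is", "no", "se"],
    "flush_cache": True,
}

# ...and let the serializer handle quoting. The dumper knows that a bare
# `no` would be re-read as a boolean, so it quotes the string for us.
text = yaml.safe_dump(config)
print(text)

# The round-trip is exact, which is the whole point of generating config.
assert yaml.safe_load(text) == config
```

You get breakpoints, tests and printing for free, and none of the hand-written-YAML gotchas.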
The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate; it can be just a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.
I like that you can use anchors and merges. It greatly simplifies complex, repetitive structures. And most of the complaints about yaml can be worked around by string-quoting.
The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.
In fact, most of the complaints could be resolved by running it through a linter.
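A small sketch of anchors plus the merge key (the document is made up; PyYAML's safe loader understands `<<:`):

```python
import yaml

doc = """
defaults: &defaults     # anchor the shared settings once
  retries: 3
  timeout: 30

job_a:
  <<: *defaults         # merge them in...
  timeout: 60           # ...and override locally

job_b:
  <<: *defaults
"""

data = yaml.safe_load(doc)
print(data["job_a"])  # {'retries': 3, 'timeout': 60}
print(data["job_b"])  # {'retries': 3, 'timeout': 30}
```

For genuinely repetitive structures this saves a lot of copy-paste, at the cost of the `&`/`*` syntax other commenters here call "the terror".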
I like YAML. More specifically, a subset of YAML, like the author suggests. Clear, intuitive, and allows expressing complex data structures like JSON does. Much better than TOML, which easily becomes a mess with more complex data.
Yup, exactly my experience as well. Again, it's a stupid idea to try to make a "configuration" language out of nested key-value pairs that end up needing fancy interpreters, allowing more and more semantics into the keys and values, to start doing what a simple program could have done in half the time...
I've worked in 4 companies over a period of 10 years, and each had exactly this problem, with yml, json, xml, and properties files (you don't want to see business-logic conditionals in a properties text file, where the shapes of the keys command an interpreter to behave dynamically...)
The only time I saw a team do it well was a php backend, of all things, where the lead said they'd program all their variations in php rather than source them from flat configuration descriptors, and it was amazing: clear, simple and powerful. They had to release the backend at each config change instead of releasing only the config change, but I'm still unsure why exactly that's a problem: the configs are software too, if we're honest with ourselves, and shoe-horning them into a descriptor language isn't gonna make them flat.
I don't think YAML is great, but I still think it's the best format out there.
The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?
I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config because the type validation will fail. If `flush_cache` has an enum `on` but no key `True` then the type validator will instantly complain about both the missing key and the extra key in the dictionary.
Same with accidental numbers, any type check will show that the parsing failed.
I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.
Exactly this. I hear so many people recommend TOML over YAML.
I see the logic in it. For simple key-value configurations, TOML is superior to and more straightforward than YAML. You can add sub-level values and it isn't too bad (if there aren't too many), but beyond two levels, TOML becomes difficult to use.
If you really work in YAML in any sort of more advanced capacity (kubernetes, Ansible, CI/CD pipelines) then you really need the complexity that YAML provides. You also get used to the "gotchas" mentioned here. Navigating them is fairly straightforward.
I think the article was vastly overblown. Is YAML perfect? Certainly not. But try to find a better way to display such complex data structures in a more human-readable and human-writable way. The complexity is YAML's strength, but it comes with caveats, as all complexity generally does. I really think it's the best we have.
I think the problem in this particular case is using YAML as a DSL. Every other data format would be equally bad here. Replace YAML with TOML and you're still in the same templating hell.
YAML is the least worst for me, and I don't think I ever hit the problems the article is showing, because:
* I use an editor that will highlight stuff like anchors
* I often generate config from CM, so it can't have those errors
* Loading into a defined struct in a statically typed language also makes them impossible.
YAML is nuts, and JSON is annoying (trailing-comma limitations and the lack of comment syntax, however annoying it is that the spec is correct about why there are no comments).
Both have their place though. YAML came out of perl, and both are some confluence between awesome and horrific (although yaml wins the horrific crown for sure).
I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.
Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.
Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.
There was (I think probably still is) a qemu bug with JSON. It accepted requests to read guest memory in JSON format, with the memory addresses encoded as JSON numbers.
When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.
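The failure mode is easy to reproduce in any language whose JSON numbers become IEEE-754 doubles (a minimal sketch; the address value is made up):

```python
# Top-of-range 64-bit kernel addresses are not exactly representable as
# doubles (the ulp near 2**64 is 2048), so a parser that maps JSON numbers
# to floats rounds them silently.
addr = 0xFFFF_FFFF_FFFF_F123          # hypothetical guest kernel address
as_double = float(addr)               # what a double-based JSON parser stores
assert int(as_double) != addr         # silently off by the rounding error
print(addr - int(as_double))          # the silent corruption, in bytes
```

Python's own `json.loads` keeps integers exact, which is why the bug only bites when one side of the protocol uses doubles.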
Actually JSON doesn't specify what numbers are - it would be perfectly licit for a JSON parser to transparently use a real numerical tower, allowing perfect representations of any non-repeating decimal fractional number (since it has to be represented as a dotted-fraction and there's no support for a vinculum (aka U+0305 / COMBINING OVERLINE / ◌̅ / 3.21̅) there's no way to represent non-repeating fractions if there is not a non-repeating representation in base 10). A few JSON parsers even do this. That said, if you don't control the both sides sending something that won't be handled by the lowest-common-denominator (browser JSON parsers / JS numbers) is asking for trouble.
This is a great post but my understanding is this has nothing to do with JSON, which is unopinionated about numbers. Rather, with JS's JSON parser.
Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.
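For example, with the standard library's `json` module (the `parse_float` hook is real; the payload is made up):

```python
import json
from decimal import Decimal

doc = '{"amount": 0.1, "qty": 3}'

# Default float parsing picks up binary rounding error...
assert json.loads(doc)["amount"] * 3 != 0.3

# ...while swapping in Decimal keeps the decimal value exact.
exact = json.loads(doc, parse_float=Decimal)
assert exact["amount"] * 3 == Decimal("0.3")
```

That exactness is what makes the Decimal hook worth the overhead when money is involved.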
JSON doesn't specify what numbers are. Integers that take 2MB to represent are valid JSON numbers.
Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.
YAML used from within a statically typed language gets rid of most of the problems, but the main one seems to be "well, we figured out which stuff was just a bad idea and put it in 1.2, except nobody uses it".
> Both have their place though. YAML came out of perl, and both are some confluence between awesome and horriffic (although yaml wins the horrific crown for sure).
Weirdly enough, I'm not getting most of those issues in Perl YAML - the "norway problem", for example:
use Data::Dumper;
use YAML;
my $a = "---
geoblock_regions:
- dk
- fi
- is
- no
- se
";
print Dumper(Load($a));
$VAR1 = {
    'geoblock_regions' => [
        'dk',
        'fi',
        'is',
        'no',
        'se'
    ]
};
This reminds me of a certain architect at my last shop who invented a DSL on top of his Python superapp. He expected all projects to go through his superapp. The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.
Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.
It escalated into the architect demanding to know a sprint in advance any task devs were trying to do, in a review session, so he could explain if it was possible or not and try to triage in his DSL..
>The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.
Did he then go on to design Ansible? It falls into the same trap.
The only way you should be generating a data format using your language's templating system is:
<%= YAML.dump(@config) %>
Also, 9 times out of 10 I wished the app designer had just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's just a data file pretending to be code or a micro programming language).
Yes.
I think this is like the uncanny valley of development.
It's not no-code UI driven stuff you can put in front of a business user.
It's not real coding, which an engineer wants to do.
It's config jockeying, which devs find boring, and is generally far more limited. So you end up building out more and more complex layers of config to work around the limits, including scripts to generate config.. etc.
Seems like what you really want is modular apps that are easy to extend in the native programming language(s).
Also, maybe I'm just stupid, but 9 times out of 10 a text file or csv/delimited file accomplishes most of what you need for pure config that really belongs in config.
This is basically how I feel about working with K8S and dredging through a repo full of templated YAML spaghetti. What am I looking at now? Helm, Keda, Flux, Argo, OperatorHub, GitHub Actions? oh actually this bit is in Terraform in another folder, whoops.
You can’t actually deploy something unless you can mentally untangle it all; it just sits in front of your infra as a sort of DevOps coming-of-age ritual, where you look wistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.
At work someone is trying to introduce a system where a bunch of Jinja templates in a repository are used to generate XML which can then be used to generate another XML document which can then be "executed", resulting in an annotated XML document :)
I've read about places that do this kind of stuff. Although it sounds like pure hell, I'm sure there's always a reasonable explanation, an intent. What kinds of problems was the org facing that led to the development of this?
It wasn't really necessary.
It also made the core superapp a blocker of essentially every user delivery.
If it has to do everything, then you have to add a lot of features to it.
So you have N devs doing jinja/yaml/dsl and N/4 doing core superapp underlying dev.
For the first Z features/projects, you inevitably see new things that your superapp doesn't support yet, which become emergent blockers midway through implementation.
Given the ratio of devs, new blockers were being generated faster than they could be cleared.
Eventually business side pulled the fire alarm and grabbed most of the devs over to use a more common AWS-centric service directly and exit the superapp dev use completely.
IE - if you are going to depend on something, would you rather it be an AWS service with 10s-100s of dev-years behind it, or some internal superapp spun up 3 months ago with 2 guys on it? Which is more likely to already support what you need?
I guess YAML has a place in that it would prevent that kind of thing happening in the first place.
YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly, mind you, which is never a good idea.
On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.
That sounds like a layer of insanity that would make me consider jobs elsewhere. It sounds entirely unnecessary and burdensome, but was it?
It was unnecessary because he was being too clever & having a good time, versus ever having delivered real production systems in our industry.
It also put devs on a dead end path which they realized pretty quickly.
Do you want to work for years on this team becoming experts in jinja to yaml to in-house DSL you'll never use anywhere else? Or do you want to write some python? If you can't get "promoted" into the team writing the core python engine, then you are obviously second-rate.. why stay?
Less technical management hires a hero who tells them everything they want to hear!
"I'm going to deliver the superapp, everything will be super centralized & tidy.. small dev team, then all the specific implementations will be grunt work by cheap devs!"
Throw in some buzzwords and they are sold.
Same audience that always signs the checks for no code/low code stuff no one actually wants.
> no matter how annoying the lack of comment syntax is, the spec is correct about why there are no comments
This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.
Is this the same Ingy that made Test::Base? It's the best data-driven testing framework I've ever used, and I've missed it often while working with other languages. The follow-up polyglot framework just didn't cut it for me.
Do people dislike TOML only because it looks like a Windows INI file? I think it’s nice. Rust chose it in keeping with their penchant for sanity most of the time.
I would prefer if logically nested blocks could also be physically nested (and indented), so you can have a full tree structure. If you're describing something that can have variable levels of nesting (think folders) then it can sometimes make the format easier to understand.
I like YAML for reading, and TOML is entirely worse for reading (still a million times better than JSON tho), and as the use cases are mostly read, rarely written, and if written they are code-generated (using configuration management), YAML fits better.
I never could understand the hate XML got, but I'm having a bit of schadenfreude seeing what people suffer through with its replacements.
An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?
All that horrible mess, it seems, because people didn't like to have to close tags.
Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.
The thing JSON had going for it over XML is that it maps cleanly to most languages' object models, so you can read the results directly. No writing XQueries or DOMs to read values.
Its only major downfall is lack of comments, which has led people to YAML. (There's plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don't know they want.)
YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.
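The mapping point is easy to illustrate with a stdlib-only sketch (the field names here are made up for the example): JSON lands directly in native structures, while XML hands you a tree you still have to walk, where every value is a string.

```python
import json
import xml.etree.ElementTree as ET

# JSON deserialises straight into native dicts/lists/numbers...
user = json.loads('{"user": {"name": "Ada", "age": 36}}')
assert user["user"]["age"] == 36

# ...while XML gives you a tree to query, and all leaf values are strings.
root = ET.fromstring("<user><name>Ada</name><age>36</age></user>")
assert root.find("name").text == "Ada"
assert int(root.find("age").text) == 36  # manual conversion needed
```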
Of course some of the hate comes from the applications where XML was used, more than from XML itself, but it also is a deeply flawed language.
Nowadays the main issue would be that it requires complex generator and parser libraries to be of any use (you'll never want to deal with XML parsing/escaping yourself), yet it's not as efficient as binary formats like protobuf, for instance.
That means that anything you'll want to edit by hand or keep purely textual and readable will be better done in yaml or json, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span the whole spectrum isn't big.
XML is ok when both the reader and writer are machines but you still want a text-based format (otherwise I'd just use protobuf), or when maybe you occasionally want to edit by hand but not often.
I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.
YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.
> YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML
Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.
Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.
But in a configuration file, ambiguity is really the opposite of what you want.
I worked with hand editing XML files in the past, and I didn't really have an issue with simple XML files. I actually prefer XML in some cases.
I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.
Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.
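A minimal sketch of that double-encoding pattern (with a hypothetical field name), showing why hand-editing it is so painful: the inner document is an escaped string, so you must decode twice to read it and re-escape everything to change it.

```python
import json

# Outer JSON carrying inner JSON as an escaped string value
outer = json.loads('{"payload": "{\\"name\\": \\"Ada\\"}"}')
assert outer["payload"] == '{"name": "Ada"}'

# A second decode is needed to reach the actual data
inner = json.loads(outer["payload"])
assert inner == {"name": "Ada"}
```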
XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.
Namespacing is the opposite of noise: it lets different domains coexist with no possible confusion about which is which.
But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.
Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.
It makes sense that explaining why xml is great doesn’t work. Maybe you can convince someone who’s never worked with xml. But if someone has used xml and thinks it’s a bad format based on those experiences, you’ll need to bring _really_ good arguments to let them think it’s a good format anyway.
Just ask yourself: could a comment here convince you that xml is a bad format? If not, why would the opposite work?
I disliked it intensely at first, but over time .. I now see the merit in it. I like XML, especially compared to everything else (that came after). It was possible to clearly define things.
On the same system? I can see that. One thing I've lamented about XML for years is that it's a pain when moving to different systems that have their own namespaces/DTDs.
20-something years of web/app.config is the primary reason why I hate XML. Very, very closely followed by Java ("a DSL for turning XML into stacktraces") and its over complicated ecosystem of xml configurations.
I have worked with Java for over two decades, and your experience does not match mine.
At $work there are only two XML files in each project:
- logback.xml: configuring logging which is usually only done once or twice and is the same for each project.
- pom.xml: Maven config which is mostly autogenerated by IntelliJ when creating a new project. I only add a few lines for new dependencies.
In rare cases I may use some plugins to do specific build operations, but the actual XML is simple. Most of the time is spent on understanding the plugin itself, and not "fighting" XML. Moving this to YAML has no effect on my productivity
All other configuration is either done using YAML, properties or configuration in code (Spring Boot).
I see Spring still has options to use XML configurations, but I don't see any reasons why anyone would do that in 2023. The "new" standard in Spring is configuration by code, which I have done for the last 8 years.
The ecosystem for Java today can not be compared to what it was 10 years ago. If editing XML configurations is still a problem at your workplace, find a new job :-) Either you work with a legacy system that nobody wants to update to the new standards, or you have architects who are stuck.
I remember being excited when XML hit 1.0 (as a new programmer this seemed like a huge advance over things like classic Unix configuration files), and progressively disappointed over the next decade as the promise wasn't delivered on.
The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.
As a thought experience, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees’ travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.
The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.
I miss the old days of working with XSLT and XPath. A nice way of giving some design touches and filtering/sorting to XML documents. It wasn't perfect but I would take any time over the yaml/json/toml/ini.
I don't know if I'd say I "miss" XSLT and XPath, but yeah, it was fun and powerful at the time. I made some crazy stuff with Cocoon and AxKit that, if you didn't mind the syntax, were actually pretty elegant.
XQuery gives you 90% of XSLT with a syntax that is a superset of XPath. Verbose queries (https://en.wikipedia.org/wiki/FLWOR) are especially nice and readable.
Eh, definitely there's some generational round robin going on. And the "Norway problem" in YAML is just insanity. But you've got to admit, XML has some very low-level intrinsic problems
* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.
* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff
* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.
* XML breaking 1NF, meaning XML can have XML inside of XML, recursing infinitely, when playing data format, which breaks so very many things. It actually goes against the whole concept of a data hierarchy, which is built into XML at a low level.
* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or if it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.
* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.
The really great thing about xml with XSD is the ability to validate the document really well.
I.e. having content matching regexps, so you can be sure version numbers are always \d+(\.\d+)* and does not have unexpected letters or spaces in it. That date values are in ISO format.
Very useful for APIs to 3rd parties as it now is very easy to catch most errors and provide an useful error message for them without having to code every check.
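For comparison, this is the kind of check an XSD pattern facet declares once in the schema; here it's emulated imperatively in Python (a sketch of the effect, not the XSD syntax itself):

```python
import re

# What <xs:pattern value="\d+(\.\d+)*"/> on a version field would enforce
version = re.compile(r"\d+(\.\d+)*")

assert version.fullmatch("1.2.10")       # well-formed version accepted
assert not version.fullmatch("1.2-beta") # unexpected letters rejected
assert not version.fullmatch("1. 2")     # stray spaces rejected
```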
>> All that horrible mess, it seems, because people didn't like to have to close tags.
If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the brakes and take advantage of the failsafes; it's not the responsibility of the system to have the brakes and failsafes where any idiot can find them!
I've heard that kind of excuse 0 times, outside of software circles. But of course I'm exaggerating, because nobody really suffers because of XML other than the people who use it every day. Like this once junior dev, for example, who was made to create XML by hand by eyeballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.
Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.
> All that horrible mess, it seems, because people didn't like to have to close tags.
The issue isn't that it's bad per-se. The issue is that it's cumbersome to write. When you want to setup and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.
* The `<?xml ...?>` declaration line alone lost a percentage of people already, etc.
* Essentially XML was a markup language (SGML stripped down), while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.
* To the contrary, it also did away with the nice SGML shortcuts: instead of `<tag>value</tag>` one could write `<tag/value/`, and end tags could be omitted (like `<ol><li> ... <li></ol>` in HTML). For the typically deep config files, those shortened forms would have been perfect.
* XML attributes vs substructure was just an awkward choice to give out. I.e. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.
More so I get the feeling when looking at the XSL, XSLT, etc. mania that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their interests in mind and not the interest of users or developers.
Overall XML could have been something like this `<myConfig / <userId/asdf/ <host/asdfasdf/` if it was SGML.
YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.
I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...
It’s more typing but it’s simple typing most people don’t even think about because it’s predictable (not to mention automatic in many editors). I’d take that over the frictional cost of thinking your YAML is done and then having to debug magic data conversion or realize that you left out one character causing something to be parsed completely differently.
Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support is a constant source of friction.
Look, I’m not saying XML is perfect but the worthy criticism is more substantial than closing tags. It’s a very easy convention to learn requiring little thought and I haven’t used an editor which didn’t automate that process since the turn of the century. That’s less cognitive load than having to remember a bunch of context-sensitive rules about YAML’s magic behavior - I’m thinking in particular of people I know who’ve burned hours only realize that they’d missed a character somewhere, forgot to escape one value, or had an indentation issue causing something to be ignored.
The criticisms I would make are more fundamental: the XML data model is different than the most popular data structures and the APIs in most languages are quite cumbersome and can lead to silent data loss (name spacing and selectors). Someone probably could have done an XML5 effort 15 years ago but by now I don’t see that happening.
If I had to rank the options for a configuration language, I’d probably go HCL, TOML, JSONC, JSON, YAML + Prettier + a YAML lint schema, YAML, XML. XML could rise up with better tools but so could a low-magic YAML variant.
> An XML document with a well-thought-out domain-specific DTD would solve all these problems;
But then a YAML document written by disciplined and well-thinking people will also not have these problems.
Any of the complex formats work well when used in the most fitting way and go horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hand, just as yaml is, and the next hyped and turing complete document syntax will be.
Ease to use and ease to misuse usually come hand in hand.
If we had very clear and static requirements I’d see languages/tools with more safeguards, fool proofing, optimized design etc. But we’ll never have that for configuration languages, and when aiming for flexibility you have to trust people to not shoot their own feet.
The problem with XML is mainly in the 'M': it's a _markup_ language. Using it for configuration and arbitrary data serialisation isn't where its strengths lie, but it got shoved into those niches because if you look at it _just right_, you can make it work in them.
I don't hate XML myself, but I do hate how it's been abused over the years.
It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.
Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.
Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.
XML is cool and useful, but for humans you need good tooling to handle it well. And it still is very noisy with all its boilerplate and what the advanced features can bring in. And overall it has a culture of making things complex, and complicated.
Just like bad/fragmented YAML libraries now, there were plenty of bad XML libraries and implementations. I didn't think XML was too bad until I had to write a SOAP request where the endpoint would throw an error if the arguments (in their own tags, mind you) weren't in a specific order. The endpoint gave me no hints.
Additionally we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML and via lengthy trial and error to get it to work.
I see nothing wrong about closing tags. They help to navigate and understand, in my opinion. I have way more problems with developers trying to be super concise and writing constructs that are very hard to parse mentally when looking at them.
I do think YAML is overly complex - but there is some hyperbole in this document.
- Many of these complaints are about YAML 1.1.
- YAML 1.2 was released _14 years ago_.
- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”
I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)
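For anyone who hasn't hit it, a quick sketch of the Norway problem as PyYAML (a YAML 1.1 parser) exhibits it, assuming PyYAML is installed:

```python
import yaml  # PyYAML implements YAML 1.1

# Unquoted "no" resolves to a boolean, not the country code:
doc = yaml.safe_load("countries: [de, no, se]")
assert doc == {"countries": ["de", False, "se"]}

# Quoting is the usual defensive workaround:
quoted = yaml.safe_load('countries: ["de", "no", "se"]')
assert quoted == {"countries": ["de", "no", "se"]}
```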
The article mentions that YAML 1.2 is really old, that it doesn't matter because YAML 1.1 is still the most commonly supported version, and that it's arguably even worse, because YAML 1.2 gives different parsing results than YAML 1.1!
I highly recommend reading the article - it's very good.
Agree with the final part of this article that "programmable" configuration languages like Nix and Dhall are the way forward.
I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.
I've also spent a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration make life much easier.
Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.
I agree. They make a rod for their own back by having implicit conversion rules which, while well defined, are not well understood. "Explicit is better than implicit" and all that.
Great idea. I would also suggest adding some syntax to make lists and maps more obvious (it can be a bit unclear in YAML). Maybe {} for maps and [] for lists.
My wish for a dream config language is this: Allow a choice between unambiguously-typed expressions (quoted strings, only true/false, decimal numbers) and explicit type annotations. So this:
  regions: $[string]
    - no
    - se
  options: ${bool}
    a: yes
    b: no
(with possibly a different syntax) would be equivalent to writing `regions: ["no", "se"]` and `options: {a: true, b: false}`, with every value unambiguously typed.
It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.
Since everyone seems to be throwing their favorite format into the ring, I will too: EDN [0]
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person"; UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.
> loading an untrusted yaml document is generally unsafe,
TIL and this is... Not great?
I often parse JSON as YAML. Since YAML is a superset of JSON, all valid JSON files parse as a valid YAML file too, but now you can add comments to your JSON since YAML has comments. Didn't know I was opening my process up to code execution doing this...
Knowing this, YAML is dead to me as a format. This really seems unacceptable for a markup parser.
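For what it's worth, the comments part of the trick does work; here's a sketch with PyYAML (assuming it's installed), using the safe loader:

```python
import yaml

# Valid YAML, invalid JSON: the comment would break json.loads()
text = """
{
  "retries": 3  # how many attempts before giving up
}
"""
parsed = yaml.safe_load(text)
assert parsed == {"retries": 3}
```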
Pretty much all parsers have some way to "safely" parse a YAML document which disables the custom-object unmarshal stuff; sometimes this is the default, sometimes not, some libraries/languages don't support this at all in the first place.
If it's enabled, then it's trivial to exploit; e.g. in Python:
    !!python/object/apply:os.system
    args: ['ls /']
If you parse this with "yaml.load()" in PyYAML (which, I believe, is the most commonly used YAML library for Python) then it will execute that code. You will need to use "yaml.safe_load()".
I believe the situation in Ruby is similar, and in PHP the only way to get any safe behaviour is to modify a php.ini setting (which at least defaults to "off" now, but there is no way to know if a line is safe just by looking at the code).
In Go this entire feature isn't supported at all AFAIK, so it's always safe.
Not sure about other languages from the top of my head.
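A sketch of the Python difference, assuming PyYAML: safe_load refuses to construct arbitrary Python objects, whereas the old unsafe loader would have executed the payload.

```python
import yaml

# A document that asks an unsafe loader to call os.system
payload = "!!python/object/apply:os.system\nargs: ['echo pwned']"

try:
    yaml.safe_load(payload)  # rejects the python/... tag
    raised = False
except yaml.YAMLError:
    raised = True
assert raised  # nothing was executed
```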
I'm not sure it's true that parsing an untrusted YAML document is generally unsafe. That has to be a property of the parser, not the format.
As TFA says:
> In Python, you can avoid this pitfall by using yaml.safe_load instead of yaml.load
In Ruby, there are now analogous options (including - of course lol - a gem which monkeypatches the YAML module to make it safe by default) [1]:
> Use the safe-yaml gem which overrides YAML.load
> Require a newer version of Psych that provides a safe-load option
In Java, SnakeYaml has a document about security which I think is a long way of admitting that it's not secure [2]. It looks like eo-yaml just doesn't support tags at all.
To be fair, YAML itself does not specify how tags should be interpreted, technically the idea of grabbing constructors from those is the library's issue. But then again, you wouldn't want to accidentally introduce code execution when you adopt a new library/language.
At least YAML has comments, which JSON doesn't. I know people will say "JSON is not meant as config format" but in reality it is used as such.
In general I don't understand why there is so much churn and focus on config file formats. They can be annoying but in the end the real problem is that we have complex configurations so a different format is not going to help much.
Personally I prefer YAML over XML or JSON for config because I don't have to worry about closing brackets or braces. Maybe we need a good editor to make editing them easier.
I was at a place 7-8 years ago with a hand-rolled JSON parser that had been written across a few afternoons by one guy without much ceremony. It was untouched after that aside from some renaming and other minor refactors. As usual with these things, the decision to hand-roll was made by one very opinionated dev who got his way because the business depended on him.
When that article was published it started an argument among the team. The guy ended up implementing the test suite to defend his decision. To my surprise the parser passed all tests apart from 3 or 4 unicodey-ones where it was more permissive than it should have been. There were plenty of production bugs at this place but none of them were due to the parser.
IMO parsing JSON really is simple. If you read through the article you'll see that most of the test cases are dead simple and hardly landmines.
This is not claimed by the article afaict. It's stated though that json spec/syntax is simple, and that it's simple as a language; my interpretation is that this means "simple/understandable/predictable for humans".
If language A is a subset of language B it doesn't follow that parsing B is strictly harder.
This is not guaranteed. Consider two languages:
1. Some number of "a" followed by the same number of "b". For example, aaabbb but not aabbb.
2. Any number of "a" followed by any number of "b". For example, aabbb but not aba.
In this case (1) is a subset of (2), but parsing (2) is easier than parsing (1) since in (1) you have to verify that the numbers match (which, in technical terms, means its not regular).
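The two toy languages are easy to sketch; note that the subset (1) needs counting, while the superset (2) is a plain regular expression:

```python
import re

def lang2(s):
    # any number of "a" followed by any number of "b": regular
    return re.fullmatch(r"a*b*", s) is not None

def lang1(s):
    # equal numbers of "a" and "b": needs counting, so not regular
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))

assert lang2("aabbb") and not lang2("aba")
assert lang1("aaabbb") and not lang1("aabbb")
```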
> Json is so obvious that Douglas Crockford claims to have discovered it — not invented.
I mean, it's called "JavaScript Object Notation" for a reason. From what I know, the beginnings of it really were in the early days of AJAX: People were looking for a good way to encode complex data structures in dynamically generated script tags.
Then someone stumbled over essentially a random syntax element in JavaScript that turned out to be extremely well-suited for that task.
So I think talking about "discovering" JSON as a JS syntax feature which later became its own thing is quite correct here.
YAML seems like a good format that gets harmed by feature-creep. All I want when using YAML is a simple human-friendly text format for slightly structured data. But all those special features and edge cases... as if I even have the time to learn about them. Values should be only text. Outside of the handful of clearly specified cases of numbers and maybe true/false/none, they should not get any other interpretation from the parser.
Isn't there something like a simple YAML definition and some YAML schema for the power users?
He mentions it at the end (the 'YAML subset' alternative), but the edge cases only cause issues if your keys and values begin with special characters but are intended to be simple text. I tend to defensively quote stuff (probably much more than I have to), and I rarely find these issues. I have other issues with YAML but not this.
> Values should be only text. Outside of the handful clear specified cases of numbers and maybe true/false/none, they should not have any other interpretation on the parser.
I think a simpler way would be to just force quoting on text. That fixes both the boolean problem and the different-base numeric problem.
There's a lot of alternatives out there. One of the most complete (in terms of specification) and very promising is concise encoding (https://concise-encoding.org/) which focuses on the duality of editable human compatible representation (as text) and efficient binary encoding/decoding. The biggest issue right now is lack of actual implementations. The project is open source (I'm not affiliated with it).
In the linked YouTube video talking about JSON: "People were putting instructions to the parser in comments... so I [removed] the comments." - Douglas Crockford.
Let the impact of that sink in - that decision has almost certainly literally cost millions of developer hours! In fact, the article actually claims that the lack of comments in part led to YAML itself so by implication...
I jest (sorta), but what a great example of an initially helpful decision having lasting consequences.
Wow, you must really be forgetful. Maybe your productivity would improve with some post-it notes stuck next to your monitor, reminding you of the basics about how these formats work.
When I first discovered YAML, I loved its apparent simplicity. Nowadays, I sigh every time I have to use it and am very happy that the Rust world has mostly adopted TOML.
I wonder why generated JSON is not used more widely, something like jsonnet[0], although I've never used it.
Most systems come with Python 3 installed, which supports working with JSON[1] without installing any dependencies. So, I'm wondering if we could use Python to generate JSON data that is then consumed by other tools. This would fix the lack of comments and trailing commas in JSON, you get functions and other abstractions from Python, and you don't need to install any third-party tools (vs. something like jsonnet).
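A minimal sketch of that idea, using only the standard library (the config keys here are made up for illustration): write the config as ordinary Python, where comments, trailing commas, and shared values come for free, and serialize strict JSON at the end for the consuming tool.

```python
import json

REPLICAS = 3  # shared/computed values are just Python variables

# Comments and trailing commas are fine here because this is Python,
# not JSON; only the serialized output has to be strict JSON.
config = {
    "service": "api",
    "replicas": REPLICAS,
    "ports": [8080, 8443],
}

print(json.dumps(config, indent=2))  # pipe this into the consuming tool
```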
I’m at a crossroads and evaluating whether to introduce jsonnet to my org. We are doing a massive conversion of legacy yaml manifests from a proprietary vendor format, and there are not a lot of chances like this where you can make such a change at scale. Any experience reports or advice on jsonnet would be appreciated.
At Grafana Labs we're using jsonnet at scale, while being a powerful functional language it is also excellent for rendering JSON/YAML config. We have developed Tanka[0] to work with Kubernetes, for other purposes I can recommend this course[1] (authored by me).
Wow I love it here. Where else can you post something and get a project leader chiming in within hours?
We are at a crossroads in moving towards GitOps and a CD vendor change, and are forced into some manifest conversion. From where we are coming from Helm is the obvious choice, but I do think we can do better, especially for debugging, readability, and inheritance.
ArgoCD’s support for jsonnet turned me onto it and Tanka, and now I’m going to take it for a more serious spin. I’m early days with it but I’m curious about how life is with a compiled language for infra development. I had the same reluctance when thinking about making the TypeScript leap, but that turned out just fine. I kinda want to jsonnet all the things if it works out well, but I worry the compilation steps would make it cumbersome for some use cases.
> Sometimes an application will start out with a need for just a configuration format, but over time you end up with many many similar stanzas, and you would like to share parts between them, and abstract some repetition away. This tends to happen in for example Kubernetes and GitHub Actions. When the configuration language does not support abstraction, people often reach for templating, which is a bad idea for the reasons explained earlier. Proper programming languages, possibly domain-specific ones, are a better fit. Some of my favorites are Nix and Python.
Templating is fine with YAML if you don't do it with strings but with a system that turns one YAML document into another YAML document, that is, do your substitutions in the YAML universe.
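A sketch of what "substituting in the YAML universe" could look like, using plain Python data structures as a stand-in for parsed YAML (the `${name}` placeholder convention here is hypothetical): the transformation walks the tree, so substituted values keep their real types and never need string escaping.

```python
# Substitute on the parsed tree, not on the serialized text.
def render(node, variables):
    if isinstance(node, dict):
        return {k: render(v, variables) for k, v in node.items()}
    if isinstance(node, list):
        return [render(v, variables) for v in node]
    if isinstance(node, str) and node.startswith("${") and node.endswith("}"):
        return variables[node[2:-1]]  # real value of any type, no quoting woes
    return node

template = {"replicas": "${count}", "labels": ["${env}"]}
rendered = render(template, {"count": 3, "env": "prod"})
# rendered == {"replicas": 3, "labels": ["prod"]}
```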
The "code up your configuration in your favorite programming language" approach can really work, but it frequently becomes a point of frustration itself. As much as I want to like YAML, it has a number of intrinsic problems, like the Norway problem. But I do think people are confusing accidental and essential complexity, and because of that they flinch and reach for non-solutions to the problems of configuration rather than really thinking it through.
I don't get all the YAML hate. The article even mentions solutions, which are all better than adopting some nonstandard json variant or toml.
- for whatever language you're using, be aware of which YAML version its YAML library supports and its defaults, and how to safe load yaml in that language.
- defensively quote strings, particularly if you're on a language with an antiquated yaml library that defaults to yaml 1.1.
- defensively use true/false only, and proactively convert any other booleans in your codebase to true/false.
- Depending on your language, you can avoid any custom data types for any externally-distributed applications to mitigate the risks, even when it might make things more convenient. Use safe loading (most languages with yaml libs support it) to avoid loading any. The other major YAML alternatives that YAML haters recommend won't have custom data types either.
Features like references and folding semantics are very convenient, and you don't even have to use them. Basic yaml enjoys better readability than json. toml is only fine if you don't have much nested data.
The author's note that there are json variants that help with some json failings makes no sense. If you're going to adopt some non-standard json variant, why not just adopt yaml 1.2, make sure your language has a yaml lib that supports it, and use that? At least yaml 1.2 is standardized. It's not their fault if python's libyaml only supports yaml 1.1. It looks like pyyaml is essentially in maintenance mode and ruamel.yaml is what everyone should be using? Unfortunately nobody's gotten around to implementing safe loading natively, but that's a python problem, not a yaml 1.2 problem, and ruamel.yaml supports pure-py safe loading that's compliant with 1.2 (no integer-interpretation gotchas), which is fine in most cases where yaml is only loaded occasionally, i.e. at start-up, and performance isn't critical.
Obviously YAML has historical problems, but what's better? Using another flawed or even more limited data format, inventing your own which will begin with zero adoption, or simply ensuring your environment/app uses yaml 1.2 and best practices?
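To make the "defensively quote" advice above concrete, here is a toy sketch of how a YAML 1.1 parser resolves plain (unquoted) scalars. This is deliberately simplified (real resolvers also handle numbers, dates, sexagesimals, and more), but it shows why unquoted values bite and why quoting sidesteps the whole mechanism.

```python
# Simplified YAML 1.1 plain-scalar boolean resolution table.
YAML11_BOOLS = {
    "yes": True, "no": False, "on": True, "off": False,
    "true": True, "false": False, "y": True, "n": False,
}

def resolve_plain_scalar(text: str):
    # Only plain (unquoted) scalars go through resolution;
    # quoted scalars are always strings and skip this entirely.
    lowered = text.lower()
    if lowered in YAML11_BOOLS:
        return YAML11_BOOLS[lowered]
    return text  # everything else stays a string in this sketch

# The "Norway problem": an unquoted country code becomes a boolean.
assert resolve_plain_scalar("NO") is False
assert resolve_plain_scalar("on") is True
assert resolve_plain_scalar("Norway") == "Norway"
```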
Having to watch out for footguns (by being defensive as you described) means you're just having to learn these rules. What benefit is YAML giving you then?
Why not use JSON? Conversion of types from YAML to the language's native types is a poor feature to get in return for footguns.
If you _need_ complex types - like in RPC serialization - then I would imagine using XML is better; forget about making it human-readable and create a client/testbed instead.
Or you could ensure your language supports and defaults to yaml 1.2, so you don't have worry so much.
Every language has warts and pitfalls and things that are recommended against. Suggesting that one should abandon such languages, whether full-featured or data-description DSLs, when they're otherwise more productive than the alternatives, is disingenuous.
Suggesting that yaml offers no genuine advantages? JSON is annoying. It requires good editor syntax checking support to avoid mismatched and trailing punctuation. It lacks comments. I could live without the nice syntactic sugar of yaml, and even its lack of references and line-folding, if it weren't for those other things.
Which leaves something like cfg or json5, both of which have less support than yaml 1.2. So wth are we talking about? Don't use perhaps the best available (human-editable, inline-nested) data DSL because it might not be supported in some language, and because it has a few potential footguns just like json does? toml may be clean, but it's not as easy to deal with because of how it represents nested structures (which aren't fully inlined like they are in json or yaml), as so many people on this thread have pointed out.
There's a reason so many modern tools use yaml. It's not that their authors weren't aware of json. Plenty of people are simply more sick of json's problems, and consider yaml a superior, not perfect, alternative.
The amount of "defensively [do something]" and "best practices" is another way of saying "this format is so bad you need to tread carefully or else we'll blame it on you".
A good format should not require defensive writing, and "best practice" is a bit of a misunderstanding here: the best practice is to avoid such a bad format, not to push people into it and then blame them for not following some arbitrary set of rules.
I agree with most of your suggestions, but I wouldn't use "safe_load" unless the file has untrusted input (in which case, you could argue, you shouldn't be using YAML at all).
If you use safe_load then you'll get an exception if you use dates or timestamps, even if it's an unquoted string that happens to look like a date (but you can get around that last one by quoting).
One of the major use-cases for YAML is as a host language for DSLs, essentially basic programming languages for specific use-cases. TOML isn't as good there, as these languages often need deep nesting of data structures. GitHub Actions is the obvious example.
Yeah, toml isn't great for that. It's greatly preferable to yaml for human-readable config. Json is better as a serialisation format.
DSLs are bit vexed, not least because they're often horrible in themselves - cruft-accumulating hacks with no tooling, blurring the distinction between code and config often as a workaround for unwieldy dev processes. That's certainly not true of all DSLs, though (can't speak for Github Actions) and there is a need for a simple common base syntax/parser for them. I don't think yaml is a good solution.
Being a cantankerous old git, I tend to think the problem of a common base syntax/parser for DSLs was solved 70 years ago: s-expressions/lisp.
I'm very iffy on DSLs in general. One that I'm using at the moment has foreach loops but no if statements! And... all the inputs must be specified in a YAML file!
This is why my own configs are either ini where I know 100% that it would never become complex and take more than few lines or XML for everything else.
> The yaml package supports most of YAML 1.2, but preserves some behavior from 1.1 for backwards compatibility. YAML 1.1 bools (yes/no, on/off) are supported as long as they are being decoded into a typed bool value. Otherwise they behave as a string.
Good solution for using YAML with strongly typed languages. No reason to leave type decisions up to a parser. I assume serde does something similar in Rust?
To me, the yaml document from hell is a series of yaml documents I helped create that replaced a bunch of database config and hard-coded constants. It helped 'unlock' the business to create newer forms of products much more quickly and correctly, but it never got moved to an actual database. So at this point, they have some yaml documents that, if expanded to include all references, end up becoming this massive fucking tangled mess.
IIRC we never moved it to a database partly because we did not have time to solve for how we actually wanted references to work.
The yaml document and code that reads it still work as a way of speeding up the business compared to how things used to be. But that code takes a long time to parse, has probably caused a decent level of confusion for developers who have to mess with it who aren't on that team, and has absolutely taken a lot of clock time being parsed for validation checks in unit tests.
Still not sure I would have solved that problem differently, aside from perhaps moving the thing to a database immediately after the first phase of working on it.
YAML with templating becomes even more troublesome when those templates are Django templates. That's why I do not prefer Ansible for automation. It becomes so cumbersome at a certain point that it is more valuable to jump into a real programming language instead of wrestling with YAML, Django templating, and Python all at once.
That's why I am an enthusiastic proponent of pyinfra.
Ansible and YAML were my primary (de)motivators to create Judo (https://github.com/rollcat/judo). This combo is extremely frustrating: for every line in a (hypothetical) shell script that would do one thing, I needed 3-5 (sometimes many more) lines of YAML. Most people on the team who were just getting started with Ansible, would often do half of their work just shelling out. I would usually push to do things "the Ansible way", but even I had to acknowledge the mental overhead of translating back & forth. I think what finally pushed me over the edge was when we started venturing into compose & k8s, and had to mix & juggle YAML+Jinja in two entirely different contexts, each with its own quirks, bugs, gotchas and brain damage.
I figured I just need a layer of glue to run shell scripts across a bunch of remote hosts (hence Judo), and otherwise resort to other tooling (like Terraform, AWS CLI, k8s CLI, etc) for problems that don't map to SSH.
I didn't know these existed, until I read your reply. Thanks - looks interesting, I will have to investigate further.
First, I've done some research into alternatives, but my primary motivation was to prove you can replace Ansible with a weekend hack that closely follows the UNIX philosophy. Here I am 6 years later still using it everyday in production, so I guess I was right.
Second, dsh looks a bit antiquated - defaulting to rsh? Granted I have very little experience with commercial Unices (which seem to be the primary use case there), but... rsh?
To me, the #1 reason to use YAML rather than anything else is if you need to embed complex multi-line strings in your document. This is a practical necessity in CI systems: you always end up with at least a little bit of shell scripting to glue things together, and embedding shell scripts into anything other than YAML is an absolute nightmare.
Yes, there are too many options for expressing multi-line strings (In the vast majority of cases, I really don't care whether it has a trailing newline or not). Yes, there are weird rules that don't make sense, like how in "folding" mode a newline followed by two spaces more than the current indentation level doesn't get folded. But it's still the best option for CI, which is why Github actually ditched HCL in favor of YAML after a while: https://news.ycombinator.com/item?id=20647691
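A minimal illustration of the point (the CI step shown is hypothetical): YAML's literal block scalar `|` keeps a shell script verbatim, with no escaping of quotes, newlines, or `$` variables.

```yaml
# Hypothetical CI step: the "|" literal block scalar preserves the
# script exactly as written, newlines and all.
steps:
  - name: build
    run: |
      set -euo pipefail
      echo "building $APP_NAME"
      make build && make test
```

Compare that with embedding the same script in a JSON string, where every quote and newline must be escaped by hand.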
The CFG configuration format is a text format for configuration files which is similar to, and a superset of, the JSON format. It's not new - it dates from well before its first announcement in 2008 - and has the following aims:
* Allow a hierarchical configuration scheme with support for key-value mappings, lists and basic types such as strings, ints, floats, Booleans and date/times.
* Support cross-references between one part of the configuration and another.
* Provide a string interpolation facility to easily build up configuration values from other configuration values.
* Provide the ability to compose configurations (using include and merge facilities).
* Provide the ability to access real application objects safely, where supported by the platform.
* Be completely declarative.
It's similar to newer formats such as JSON5, HJSON, HOCON and similar but offers a number of features [0] which they don't, as indicated by the above list. It's not intended to occupy the niche where you find things like Cue, Jsonnet, Dhall and similar.
It was just never especially publicised when first implemented for use in Python projects, but it now also has implementations for the JVM, .NET, Go, Rust, D, JavaScript [1], Ruby and Elixir (all BSD-3-Clause licensed) and it would be great to get feedback on the project from the HN community.
INI doesn't let me do hierarchies, so it's out right out of the gate.
I started my career during the "XML ALL THE THINGS" craze and feel physically ill when I need to edit XML. It's technically the best, but it can be misused in so many ways[0] that it makes it unviable. XML Schemas, DTD etc make it a breeze to handle programmatically though.
JSON might be good, but it doesn't have first-class comments and has no links or references. It does even kinda have a schema spec.
YAML has a ton of edge cases (like in the linked post), but it's hierarchical, human-editable and machine readable-ish as long as you take the edge cases into account. You can also link and refer to elements to reduce repetition.
[0] elements with no values, but 20 attributes, too many elements with values, but no attributes, etc etc...
Many comments are (correctly, IMO) pointing to JSON as superior to YAML given YAML's problems, but these ignore JSON's problems due to its lack of semantics. When moving JSON between systems, the JSON parser makes certain assumptions about the document, such as whether the numbers represent integers or single-precision or double-precision floats. This is certainly less obnoxious than "no" being treated as a bool, but it is a problem.
A friend of mine created Preserves[0] in part to resolve this problem. I'd encourage everyone to check it out. To be fair, he's also quite pro-Dhall and Nix.
The simple answer to that is to store anything that may not fit in double without loss as a string. (I don't know of JSON decoders which use floats)
After all, one of the biggest upsides of JSON is its wide language/tool compatibility. And that means intentionally limiting the format to the most basic subset of types that all languages have.
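A small sketch of why "store it as a string" matters in practice: an integer just above 2^53 cannot survive a round trip through an IEEE double, which is how JavaScript-style consumers decode JSON numbers.

```python
import json

big_id = 9007199254740993  # 2**53 + 1: beyond exact double precision

# Python's json module keeps ints exact, but a consumer that decodes
# numbers as doubles (e.g. JavaScript) would silently round this one:
assert float(big_id) == float(big_id - 1)  # the two become indistinguishable

# Encoding the value as a string sidesteps the problem entirely.
doc = json.dumps({"id": str(big_id)})
assert json.loads(doc)["id"] == "9007199254740993"
```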
Amazing that developers still spend so much time getting bogged down in encoding issues in 2023. Why can't we just edit some data structures directly and send them around seamlessly? It's a deeply dysfunctional state of affairs.
For that, you'd need a set of commonly accepted data structures.
Alternatively, you can describe and send arbitrary data structures. Although that begs the question of how to process arbitrary data structures. (JSON and XML aren't arbitrary, for that matter)
For a human-readable, schema-free data representation, I really like Amazon's Ion format (disclosure, I'm an Amazon employee). It is basically JSON + additional types (integers, timestamps, symbols, s-expressions), and support for humane features like trailing commas and comments. It also has a more efficient binary representation. I think it could have been a contender had it been released a decade earlier, but now I think the existing formats are way too entrenched.
For managing configs of ML experiments (where each experiment can override a base config, and "variant" configs can further override the experiment config, etc), Hydra + Yaml + OmegaConf is really nice.
This is why configurations as code should be written in actual programming languages. Dhall, Nix, and Pulumi are all solutions geared towards configuration but written with fully capable, user-extensible, programming languages.
Don't forget Lua! Can be easily sandboxed, meaning you have full control over which functionality is made available to the user. Also, the Lua interpreter itself fits in less than 300kB and can be easily embedded in any language that supports C extensions or FFI, so people don't even need to write their own parsers.
People often forget about the environment, and I feel like a lot of the simpler services and applications could be configured through it easily.
For anything that requires more complex structures, JSON5 (https://json5.org/) has been gaining traction. It's basically JSON that allows a couple of more things, including comments, trailing commas, and even unquoted keys, if you're into that.
For simple use cases like key-value style config you can use JSON or TOML without too many issues.
For IaC the alternative is general-purpose programming languages using declarative infrastructure APIs, e.g. TS, Go, Python with CDK, CDKTF, Pulumi. For CI we are stuck with YAML for now.
As another top-level comment mentions, one alternative is NestedText, where everything is a string unless the ingesting code says otherwise, and there is no escaping needed at all.
As long as you ask your configuration language to infer the data type from a piece of text, these are the kinds of edge cases you get to experience. (They vary by language. Fewer types means fewer mistakes, and so does less expressive power - but without some form of type hinting, that's where you are.)
Personally, I like YAML for expressive simplicity at the start of a project, and then if it becomes an actual load bearing piece of software (vs a haphazard one off, or an experiment), let's add some schema validation & typing.
I was looking for a human-friendly format to store data for a statically generated website. Stuff like copy, lists, addresses, contact information (so that the rendering logic is kept separate).
I'm a huge fan of NestedText, especially as there is no escaping needed ever.
If you ever want to use it as a pre-format to generate either TOML, JSON, or YAML, I used the official reference implementation to make a CLI converter between them and NestedText.
When generating one of these formats, you can use yamlpath queries to concisely but explicitly apply supported types to data elements.
I remember the first time I played with YAML ten-ish years ago. I experimented with converting our configuration file from INI to YAML. It came with an actual noticeable increase to our app startup time in the tens of milliseconds.
The YAML evangelist in my company, when asked, told me I should cache the results… of reading our config file… from the local filesystem. What do I store that cache as? JSON? Just cut out the middle man. It left a really bad taste in my mouth that I have yet to get out.
“YAML evangelist” sounds terrifying, and I have never met one in real life. Most people who used YAML basically resorted to the old argument of “Well JSON doesn't have comments, so YAML it is”.
In which languages is it implemented? I have seen a supposedly equivalent Rust library, but could not find anything else. Is there any C library or something like that? Unfortunately, these are the easiest to use from a lot of languages because of how easy they are to wrap.
(I am genuinely interested in any useful library better than libyaml).
JSON (as per spec) is a lightweight data-interchange format.
YAML is a "human-friendly data serialization language for all programming languages" and by mere coincidence (it does not look to be by design until version 1.2.2 [0]) JSON is its subset and not vice versa!
And YAML is human-friendly in two ways:
- it is notoriously "human" in interpreting input, i.e. ignoring the context or misinterpreting others' words;
- it helps read serialized data via indentation-based syntax.
The biggest caveat is that for whatever reason everyone ignores the key highlights of both JSON and YAML: "data-interchange format" and "data serialization language". If I were to interpret these two terms, I would say they both present great means for heterogeneous programs to exchange data, with YAML being a fancier brother better suited for human eyes. They were not meant to be used for configuration purposes or whatnot. But people started using them exactly for this purpose instead of creating a configuration format/language that would render into program-readable JSON (if needed).
CUE and Dhall are perhaps the first two attempts to create specifically a configuration language with the ability to serialize in yaml/json/etc.
I very much hope using YAML, JSON and even TOML for writing configurations will become an anti-pattern soon and will be replaced by something more suited for this role.
If you only need something for internal use, nothing prevents you from using s-expressions, and interpret atoms (number -> uint16_t, dates in "yyyy-mm-dd") whatever your application feels like... With this lil' trick all parsing problems are gone. Inferior Format Users will be jealous, and Inferior-Format-Parser-Writers will hate you
I'm surprised nobody mentions an alternative called NestedText. It only supports string type, which puts the complexity and responsibility of typing with the application. I think it's less prone to parsing errors.
> The key on is common in the wild because it is used in GitHub Actions. I would be really curious to know whether GitHub Actions’ parser looks at "on" or true under the hood.
I can say, at least, that using both on: and 'on' work in GHA. I usually write the latter, makes it simpler to do ad-hoc YAML.parse or whatever on the workflow files.
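A tiny illustration of the difference (hypothetical workflow snippet): under YAML 1.1 resolution rules, the bare key `on` parses as the boolean `true`, while the quoted form stays a string.

```yaml
# YAML 1.1 resolves the bare scalar "on" to the boolean true, so the
# trigger key may not be the string you think you wrote:
on: push        # key parses as boolean true under YAML 1.1 rules
"on": push      # key is unambiguously the string "on"
```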
I've always preferred HOCON to YAML, but it never took off :-/
It's a pity, because compared to YAML it's a lot simpler from the rules perspective while being flexible enough to do some useful stuff over plain JSON
Also, it isn't whitespace-dependent, something which annoys me no end with YAML.
Language aware is definitely important. But honestly I've usually found it easier to use a "real programming language" to generate the structures. You can read and write as normal. Then you can just serialize them to YAML at the end.
Helm is so bad. It has a `| quote` function that is almost always wrong. People use it to escape strings into YAML strings but it just adds quotes around the values. It doesn't actually escape to YAML. I had a surprising hard time convincing people that they should be using `| toJson` and ideally be doing it for every interpolation.
But really working with text is just an inconvenience here. You can imagine how much better everything would work if your base template was `console.log(JSON.stringify({ /* your config here */ }))`
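A sketch of why `toJson`-style escaping is the safe choice (the two functions below are Python stand-ins for Helm's template functions, not Helm itself): since every JSON string is also a valid YAML scalar, a real JSON encoder doubles as a safe YAML escaper, while naively wrapping quotes does not.

```python
import json

value = 'innocuous "until" it contains quotes\nor newlines'

def helm_quote(s: str) -> str:
    # Stand-in for Helm's `| quote`: just wraps quotes, no escaping.
    return '"' + s + '"'

def to_json(s: str) -> str:
    # Stand-in for `| toJson`: real escaping. Every JSON string is
    # also a valid YAML scalar, so this is safe to splice into YAML.
    return json.dumps(s)

# helm_quote(value) yields a broken scalar (unescaped inner quotes
# and a raw newline); to_json(value) round-trips exactly.
assert json.loads(to_json(value)) == value
```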
I've always hated YAML, and now I'm glad there's a very nicely written article to back me up.
I also notice that I have a preference to procedural as opposed to declarative specifications. E.g. I find conditionals and loops in declarative specifications quite bad.
Call me naive or quote xkcd, but whenever I read about the shortcomings of YAML, I daydream about a new standard that's mostly a subset of YAML, that looks like this:
rules:
- "All strings must be quoted and are parsed identically to JSON strings"
- "true and false are the only boolean values"
- """multiline strings
are allowed"""
- """|
there can even be special multiline string types
to indicate things like "dedent every line and
trim the beginning and end lines if empty
"""
# Comments are allowed!
- Number syntax just like JSON
better_than_yaml: true
"a number with multiword key": 1e9
nesting:
present: true
difficult: false
It has the ease of coding of yaml, maps 1:1 with json (so can easily be re-serialized to json or yaml if needed), is familiar to folks who know yaml, and supports comments. And it eliminates ambiguous parses and gotchas, references, tags/directives, and other complexities.
Yaml is the complete opposite of what you want: a simple specification language. JSON is the gold standard to measure against. However one step better IMHO is S-expressions.
These one-liners lead to unreadable configuration.
With tools, the format is not important.
But we were not talking about tools; we are talking about humans being able to easily read and write a configuration.
A text editor is a tool. Text files are only "human readable"/"human editable" in the sense that such tools are ubiquitous whereas structured data editors are not.
Which is a real shame. Entering, manipulating and validating small amounts of structured data is such a fundamental task when using computers that it deserves dedicated tools, and the sheer amount of issues that only exist because such tools are not ubiquitous is absolutely mindboggling.
I want *any* configuration format that can rewrite itself (loaded and written to disk again) with two properties we seem to have lost when we switched away from XML:
1. keep ordering of document
2. keep comments of document
I honestly don't understand why people still write YAML considering there are alternatives like TOML available.
I think attributing semantic value to indentation is a terrible idea. It gets even worse when there is so much visual ambiguity when it comes to defining values and escaping certain character sequences.
Whitespace nesting also doesn't seem to be a problem for Python programmers. I don't prefer it myself (especially because of the tabs vs spaces problem) but it's not good or bad.
Extra whitespace before 'something' in your example would cause an issue.
> Whitespace nesting also doesn't seem to be a problem for Python programmers.
It does cause problems, IDE's mitigate the issue, the same as they do for YAML.
However, YAML is supposed to be a human-readable data-serialization language, and if you need an IDE to make it human-readable then that defeats the point.
I can't imagine ever being convinced that having to 'count the number of invisible things to determine what something is' is human readable even if you can use other lines as reference points.