Hacker News new | past | comments | ask | show | jobs | submit login
The Yaml document from hell (ruudvanasseldonk.com)
566 points by ruuda on Jan 12, 2023 | hide | past | favorite | 353 comments



> is that for a long time it was the only viable configuration format

Actually, this is not the case. We had INI format for simple stuff and XML (protected with entire schema) for complex things many years ago, which worked. Yet, we wanted something readable (like INI), but able to express complex types (XML).

I don't think Toml is a viable replacement - for me, it has an INI-level of simplicity with even worst hacks to have nested structures. But, give it time and you'll have another YAML.

But yes, YAML is confusing for edge cases (true/false, quoting), but I'm going to find a powerful replacement that is not XML. Maybe EDN can jump in, but for anything more complex, I'd rather have a Lisp interpreter and structures around than any of mentioned and upcoming formats.

Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)


> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)

Writing in YAML doesn't feel much better. YMMV but I've been on teams using Pulumi for k8s and the developer experience has been significantly better. I can automate, type check, lint, click through to definitions the same way I do with other typescript.

Pulumi is a young product with many rough edges but it's already been a game changer for me.


JSON has no comments. That disqualifies it immediately.


    {
        “/*These are the ports we use”: “*/”,
        “ports”: [22, 80, 443]
    }


Now, write a json-schema for that.


Can be done by either additionalProperties: true and not specifying the property, or "patternProperties": { "^/\*": { "type": "string" }}.


I'd rather have real comments than this epicycle.


"additionalProperties": true


“Comment”: ”This is a comment that is optional and ignored by code”


"Comment": "but you can only have one per object"


"Comment1": "This is the first comment"

"Comment2": "This is the next comment"

or

"Comments": ["Here is one", "Here is another!"]

or

"Comments": { "Name": "This is the name of the customer", "Age": "This is the age (0 to 1000) of the customer" }

or

"Values": [ {"Name":"Age", "Value":"20", "Comment":"This is the age of the customer"}, {"Name":"Name", "Value":"Mr. Foo Bar", "Comment":"This is the name of the customer"} ]

The only limit is your imagination.


Many JSON dialects do in practice.


I checked the sample Kubernetes project in Java [0] and it looks hideous. YAML looks much better in comparison.

[0] https://www.pulumi.com/docs/get-started/kubernetes/review-pr...


What, you mean you don't want to write code like

  Deployment("nginx",DeploymentArgs.builder().spec(DeploymentSpecArgs.builder().selector(LabelSelectorArgs.builder().matchLabels(labels).build()).replicas(1).template(PodTemplateSpecArgs.builder().metadata(ObjectMetaArgs.builder().labels(labels).build()).spec(PodSpecArgs.builder().containers(ContainerArgs.builder().name("nginx").image("nginx").ports(ContainerPortArgs.builder().containerPort(80).build()).build()).build()).build()).build()).build());


I actually laughed out loud when I looked at the example and just saw a bunch of carefully-indented “.build()”s


Don't disagree there. Java support is less than a year old and the Java API is.. not great. TypeScript is the more popular and mature choice.


It's very simple actually.

... just fucking don't, generate config in your configuration management tool of choice and then serialize it to YAML. You get all of the advantages (nice to read)and none of the disadvantages (need editor or it is PITA)


I would love to swap Pulumi for Ansible, but it's cloud-only for the most part.


> YAML is confusing for edge cases (true/false, quoting)

I'm not sure I would consider booleans an 'edge case'


Booleans aren't an edge case, on is an edge case way to spell false.


specifically according to https://yaml.org/type/bool.html:

     y|Y|yes|Yes|YES|n|N|no|No|NO
    |true|True|TRUE|false|False|FALSE
    |on|On|ON|off|Off|OFF
Not all parsers support all of these options, but true|false is always supported as YAML is a superset of JSON which makes these mandatory.


That is actually only the case for yaml 1.1; the 1.2 spec (circa 2009!) eliminated this forest of boolean encodings, leaving only `true` and `false`: https://yaml.org/spec/1.2.0/#id2602744


I can't imagine thinking this is a good idea if you've done any programming whatsoever. What happened here?


If you aim to be "human-friendly" (and that is, as I understand, the raison d'etre for YAML), there is a subtle semantic difference between "true" and "on" (and "false" and "off") and as a human it may be nice to express that semantic difference.

As for that semantic difference, if we expect the light source to have one of exactly two states (that is, "not a dimmable light"), we probably want to express that as "lightsource: on" rather than "lightsource: true".

And that is where the friction between "humanfriendly" and "computer-friendly" starts being problematic. Computer-to-computer protocols should be painfully strict and non-ambiguous, human-to-computer should be as adapted as they can to humans, erring on "expressive" rather than "strict".

I am also not sure if I am happy or sad that the set of configuration languages in the original article didn't include Flabbergast[1], which was heavily inspired by what may be simultaneously the best and worst configuration language I have seen, BCL (a language that I once was very relieved to never have to see again, and nine months later missed so INCREDIBLY much, because all the other ones are sadly more horrible).

[1] https://www.flabbergast.org/


Even leaving aside the word <on>, it's problematic that, syntactically, <true> and <foobar> look the same but are treated differently.


Its something of a tradeoff space that is always there you either have things that look the same be implicit (eg is 1 and 10000000 the same type? How about 1 and 1. ? How about 1 and -1?, how about 1 and moo? 1e2 and 1f2?

You can have things be explicit and verbose (have explicit unambiguous identifiers on every value), or you can have things magic and then surprising in something like this.

Rails also made a similar decision to basically auto convert what would be the string true to the Boolean value true in many cases. It's not obviously right or wrong, it's just a choice that has tradeoffs.

I think my own sensibility is that it's actually reasonable for something like yaml to allow you to choose to omit quotes and if you do then it's automatic typing: 1e2 is a number and 1z2 is a string. If you want it to be a string you can put quotes on every value if you want, exactly like you would in json, choosing unquoted for things that are strings is choosing the automatic typing case.

But these other values like y and n being autotyped as bool are suspect autotype behavior to me: even once you've accepted unquoted is autotyped you are more likely to be caught off guard by this than by 0 vs true vs 0z.


I'm nose deep in an Oracle to Postgres conversion at the moment and from that experience, among others, I can absolutely assure you that although booleans are definitely simple they are also most definitely a very sharp edge case.


But who doesn't love the OracleDB tri-state boolean?

/s


Booleans, or NULL specifically?


yes (in oracle - at least the early 21st century version I'm using you have to roll your own boolean-ish - traditionally char(1).


The "edge case" here is "I'm using dynamically typed language and some parsers implemented alternative wording for true/false"


It was a part of the YAML spec for a long time, so it has nothing to do with dynamic typing or parser extensions.


It does. If I define struct to contain boolean, Go yaml parser will always parse it to boolean

If I define it as any other type, it will parse it as that type. So any conflicts like in the article just don't happen, or at worst, produce error in parsing.

Static typing in this case acts essentially as schema


Go isn’t the only language using YAML


XML did everything and it's perfectly readable if it's formatted and structured well. This zoo of different markup/object/whatever languages we have to deal with now is largely a mistake.


This is confusing because XML is the archetype of what I would consider unreadable. If I got a prompt in a programming language design workshop to intentionally design something unreadable for humans, I would start thinking about XML and see where it leads me. Can you name any language any more unreadable than XML to help me?


https://www.mssqltips.com/tipimages2/2899_img5.jpg

I look at this and I see a visually-well-organized hierarchy of objects with properties and children. It's a sideways tree diagram. I honestly don't get what's supposed to be unappealing about it.

> Can you name any language any more unreadable than XML to help me?

Most of the other markup languages depending on the usecase. The "redundant" tags everywhere like </Address> in XML are very helpful if you're in the weeds of a big document that's taller than your viewport - you still know exactly what you're looking at and where in the hierarchy you are, as opposed to JSON or YAML which I find much easier to get lost in.

Anyone can pick up a (properly-named) XML document and tell what it means at a glance, but JSON is easily a mystery unless you know the implicit schema beforehand.

Maybe there's some minor friction to get past looking at XML because it's a little visually busy before you learn how to parse it out? Is that all there is to it?


I understand where you’re coming from because it’s true XML is going to perform the functions required without all the problems implicit with YAML’s “readable” syntax…

But I have worked with some very complex XML and I feel you’ve chosen a simpler to read example.

I have seen very complicated XML from Microsoft’s own configurations that is so large, long, and complicated that it is very hard to follow


XML is a mistake. 80% of every XML document is just redundant noise because the IBM lawyer who invented the SGML syntax had never heard of S-expression.


The "redundant" noise makes it more human-readable in my opinion. If you're in the weeds of a big XML document that's larger than your viewport, being able to see that you're at the end of a </Address> helps you keep your place, as opposed to something like JSON which I find much easier to get lost in. Even if it's not that big I appreciate the extra rails on the XML making it easier to keep my place.


I consider this a fallacy. The condition you are given is not a common occurence. In the event they do occur, the solution can be found in your editor or viewer. It does not make sense to inject spurious junk into what is designed as a data interchange format, in which includes network information interchange.

There's probably a name for this kind of fallacy that exaggerate the effect of a problem and the resulting contortions that were engineered around it.


Writing documents with S-exprs will get very tedious once you have to deal with stuff like whitespace and such that will inevitably show up when you're dealing with text documents. SGML is specifically optimized for those use cases, which is why it is so much more awkward to use when all you want is serializing some random structured data.


All you need is some sort of Python’s raw string or triple quote or HEREDOC. What’s so hard about that?


Try writing a document that's mostly text, but with plenty of markup interspersed in the middle, in this manner.


We are talking about a configuration language here, which is never about markups. Also, most of the so-called whitespace problem in Ansible or k8 configs comes from embedding config that's not YAML. In these cases you need a path or URI for the processor to resolve before applying, not inverting the entire configuration language to cater to 1 uncommon use case.


The point of my original comment was precisely that SGML was optimized for text documents. I agree that adopting it for configs was a mistake, but the complaint that "IBM lawyer who invented the SGML syntax had never heard of S-expression" doesn't make any sense in that context.


SGML is way shorter than XML.

But as complex as YAML to understand


XML is too much of a pain to write by hand.


XML is like violence. If it’s not solving your problem, you need to use more


XML is pretty tolerable with a slightly more sophisticated editor that supports completion. The things that make YAML awful never get any better.


Throw in copilot to anticipate all the boilerplate and repetition for you and it's downright pleasant.


It’s not that bad really. About the same as HTML, yet more difficult to mess up.

Has comments, unlike JSON.

More difficult to mess up than YAML.


If JSON and YAML were first on the scene; XML would be invented as an answer.

Hard to mess up. Robust schema language. Very flexible. Easy to process.

Yeah, it’s verbose, but that’s the trade-off between usability and robustness.

Now get off my lawn!


What XML replaced was a combination of custom designed (often binary) formats and HTTP query-string like syntaxes. It is quite verbose compared to either of them, and explodes both your bandwidth and serialize/deserialize times, but it is arguably easier for your new co-worker to guess that the address line 2 goes in <address2>...</address2> rather than prefix tagged with 13, or worse yet at offset 42-61.

I got used to XML, tho I never could quite understand XSLT and the desire to program in it. I got used to json, but yaml I just can't bring myself to parse. YAML is 90% stuff you can't guess and just 10% data. And why so many?


> Thinking that JSON is a suitable replacement, imagine writing Ansible or k8s stuff in it; that would be fun ;)

k8s via helm is often templated via go template strings; which works by creating an unreadable and unhighightable mess, introducing lots of its own bugs.


INI has no spec, and there are many variations all slightly incompatible with each other, kinda like Markdown. YAML is really the only sane configuration language that lets you denote nested structures while keeping them look nested.

Basically, all of the problem identified in the article can be dealt with 1 rule - always quote your strings. I agree with the author we should have reduced, safe and minimalist subset of yaml, which is basically YAML 1.2, released in 2009.

P.S. Please just stop using PyYAML.


There's HOCON which is pretty good if you can run on a JVM. It's a superset of JSON designed for readability and human-friendliness when writing config files. It doesn't change the type system and doesn't have yamls weird edge cases, but is still a lot easier to write than JSON. There's also a relatively tight spec.

https://github.com/lightbend/config/blob/main/HOCON.md

We use an extended version of it for our app and the resulting config is pretty clean. You can see an example here:

https://hydraulic.software/blog/8-packaging-electron-apps.ht...

The only downside is that the reference implementation is hardly maintained anymore.


I second that. And if people need to deal with YAML in Python, they should be using ruamel.yaml, which is a far superior library on just about every level: https://pypi.org/project/ruamel.yaml/


Except that the author of ruamel.yaml:

refuses to use git [0]

refuses to take community submissions (except through Stack Overflow? Seems like a misuse of SO) [1]

and refuses to implement .dumps() [2].

He is difficult to work with, and any time I need to debug code that intimately deals with ruamel.yaml types, I wince.

[0] https://github.com/pycontribs/ruyaml/issues/1

[1] https://yaml.readthedocs.io/en/latest/contributing.html

[2] https://stackoverflow.com/a/63179923/15170511


Why would I care that the maintainer prefers Mercurial? The document you linked to doesn't say anything about them not taking community submissions: I think you need to scroll down further in the page. And it certainly doesn't say that you have to use SO to submit changes.


Or if you want to go really minimalist there's also https://github.com/crdoconnor/strictyaml


> Please just stop using PyYAML.

Why?


From the article:

  Although yaml 1.2 is more than 10 years old by now, you would be mistaken to think that it is widely supported: the latest version libyaml at the time of writing (which is used among others by PyYAML) implements yaml 1.1 and parses 22:22 as 1342.


Jsonnet and Cue are the places to be looking.


Cue is one of the most exciting developments that's impacted me professionally in recent years. I can't advocate for it enough.


Cue is interesting, but why is it only available as a command-line tool rather than a library? I'd want to integrate such a configuration language in my programs, so I could use its evaluation and validation capabilities rather than writing a custom parser/validator.


If you're writing in Go, you can use it as a library. This is poorly documented, unfortunately.


Also their website is terrible. One of those projects that assumes you've already decided (or been forced) to use it and have cleared out a week of your schedule to learn how to use it.


I learned cue from it during one weekend with plenty of time to play with kids, using it in production since 2020, it's been absolutely great, zero problems, very terse configs, intuitive formalism.


I took another look and eventually found the bit of the website that they should put front and centre in the Tutorial page. Still difficult to navigate (why doesn't the page tree show up on the left?) but it is at least well written and to the point.

The "learn more" button on the front page should link to that, perhaps with a single paragraph giving motivation.

And the main page breaks the fundamental rule of programming languages/formats. Put examples on the front page!

I assumed they hadn't done that because the examples would be too complex or maybe the concepts were too difficult to demonstrate with small examples but having gone through the tutorial that isn't there case at all.


Are you talking about https://cuelang.org/docs/tutorials/tour/intro/ ? Even that is a bit light on detail for real usage, while https://github.com/cue-lang/cue/blob/v0.4.3/doc/tutorial/kub... is kinda rambling. I don't think that there's "One true way" to introduce these concepts; How you teach cue to a config-generation novice is very different from someone who's used to using an IDE to generate kubernetes YAML.


Spec [0] is very good + practice with small files.

[0] https://cuelang.org/docs/references/spec


I find specs nearly unreadable when trying to first digest a language; While invaluable for advanced usage and implementation, I can't read a BNF-Style Spec and make heads or tails of what's going on unless I also have an annotated example next to it.


I agree. The website is heavy on theory and very light on practical usage.


No. XML is broken for structured data that isn't HTML because it's not really clear (there aren't even "best practices" afaict) what should be a text node, what should be an attribute node, and what should be a subtag node.


It's not like other formats don't face similar questions. E.g. if you have a list of key-value pairs to serialize to JSON, do you translate those keys to JSON properties in an object, or do you translate each pair as an object in an array?


The motivation for doing that in JSON comes with best practices: you use the key-value objects thing if and only if you have an ad-hoc list of items that you'd really really like to shove into a well-typed schema.


Use S-Expressions?


My distaste for yaml led me to attempt just that. The first piece that was missing for me was a python parser that could produce reasonable error messages and could be transformed into the desired internal representation in python. So I wrote one [0]. It was supposed to be a single file that was less than 100 lines and could be copied and pasted into any project that I needed. Turns out that the issue was a bit more complex.

The issue is that there is sufficient complexity in finding a portable representation for configuration formats that it just kicks the can down the road. On the other hand it means that as soon as you decide what format you are going to support you can quickly implement it. There is more or less a intersectional grammar that works across most if not all lisps, and that is the plist `(:k v :k2 (:k3 v2))`. So I settled on that for my own use.

After all that work I have not dealt with the fact that numbers and chars do not have a portable representation across lisp dialects, which is a key complaint in other threads here. Limited support for let binding constants also seems like a feature that would allow for just enough expressivity to make the format useful without opening up the terror that is `&` and `*` in yaml (cool and useful as it may be).

In summary s-expressions are: 1. missing good parsers in a number of language ecosystems 2. not standard across lisp dialects 3. need additional semantics for binding, multiple expressions, etc. 4. still better than yaml and json

0. https://github.com/tgbugs/sxpyr


I know what s-expressions are, vaguely. Vaguely in terms of "I couldn't write a grammar for them off the top of my head.", that is, not "what are they".

Is there a single agreed-upon defined grammar that everyone can use? Preferably one simple enough that like JSON's it is at least capable of being used as a graphic on the home page for the format? https://www.json.org/json-en.html

This is an honest question, because there may well be and I don't know it.

However, I will put this marker down in advance: If multiple people jump up to say "oh, yes, of course, it's right here", and their answers are not 100% compatible with each other, then the answer is no.

The other marker I'll put down is "just use common lisp", I want verification that it really is 100% standardized, no question what any construct means, ever, and I still bet we get people who would rather see Scheme or Clojure, and I bet there's some sort of difference.

Neither of these objections is fatal to the idea. JSON is technically not just "javascript objects", so if someone carved out a defined format from s-expressions, then held it up as a standard, that would be as valid as what Crockford did. But at least as of right now, I'm not aware of anyone having done that standardization work. Replies welcome.



I don't see a grammar I could use for an external standard there. I could be missing it. And I assume there must be one implied, since it appears to be a read-write format. However such things have a nasty habit of turning out upon further inspection to not be as generic as supposed and have local idioms buried in them, often in surprising places. Even JSON, simple as it is, had that problem, and it is after all a subset in an attempt to squeeze out those localisms.

Is it compatible with Common Lisp, such that all such s-expressions will be compatible with it in some natural way?



  SEXPR := YOUR-CUSTOM-LITERALS
         | "(" SEXPR* ")"


Ah, no types, no NIL. Every user has to specify how their usage expresses strings, integers, booleans, etc.

Maybe start from one of the already existing standards https://en.wikipedia.org/wiki/S-expression#Standardization ?


Along with the many other issues that would have for just straight-up not being a usable grammar, the word "CUSTOM" appearing in a nominal cross language standard is a non-starter.


Oh come on. Did you think anybody would write an RFC in the comments? Add arbitrary list of whitespace tokens everywhere and pick whatever favorite grammar you have for literals. Json strings and numbers since you seem to like them.


Of course not. But you are at least semi-seriously trying to propose something. I promise you it won't be that easy to get any sort of agreement.

This is a classic "oh come on how hard can it be" that, once you get into it, it turns out the answer is very. Arbitrarily so. Agreement is hard. Details are hard. A standard with no details is not terribly different from "just give us some text".


Well, I would have suggested Racket. Common Lisp is a bit ugly, both conceptually and syntax-wise.


With each tool using its own macro-based DSL for managing them. That would be fun


S-expressions do not necessarily imply Lisp nor an executable environment, it's a (family of) syntax(es) to build a document format. Like EDN.


So even worse as you can't use any prior knowledge of lisp


JSON is good because it doesn't let you do stupid things like writing programming languages in it.


This is exactly what i'm doing with this YAML to SQL converter: https://gist.github.com/revskill10/57ecd8efb72f361b93e6d9d9f...


Why???


- No parentheses is needed

- No forced order of select, order, having,... Want from before select ? Fine.

- Composability

- Extensibility.

- Security: Prevent SQL injection when sent from browser.


Is INI a format in the sense that it has a formal grammar and a bunch of compatible parsers and serializers, or is it more of a convention?


I've never met anyone who say's "I like YAML, it is great"... most people that worked with it say something like "YAML is annoying, I don't like it"...

While introducing Kubernetes at our company in the last two years, we are currently in a process going more and more away from YAML with internal Helm charts to a much simpler process by just using HCL and Terraform, and defining Kuberentes resources as Terraform resources.

As a software developer HCL just makes so much more sense than this YAML + Helm + Go templates hell, which feels like C preprocessor hell all over again. Other solutions like kustomize are neat, but I don't see how all of these YAML workarounds should be better than something like HCL with Terraform. HCL feels like a real declarative programming language (with real conditions, variables, a module system and useful built-in functions). YAML feels like another more complex JSON and other tools like Helm or Kustomize try to work around the weaknesses of YAML with some kind of templating system.

YAML looks nice to read in simple demos and in small files, but is just not adequate in the real world (in my personal opinion - I know that YAML is used by a lot of people in production as of today).


> I've never met anyone who say's "I like YAML, it is great"

Maybe I'm older than you, but I have definitely heard that line.

Mostly because the alternatives were XML, INI or the myriad of bespoke formats, relayd/apachehttpd .conf or iptables etc;etc;

INI has parsers that operate in different ways and doesn't support heirarchies... so that's not ideal.

JSON and YAML came to the fore around the same time, and JSONs limitations in comments and it's picky semantics meant that people did prefer YAML over JSON for human readable configs.

YAML itself is fine, it has some really awkward warts and the parsers are usually programatically unsafe in their implementation (leading to less compatible "safe_load" or other types of loaders)[0]; the issue we actually have with YAML is that we:

A) Template it (jinja, mustache whatever)

B) Put entirely too much stuff into it. (kubernetes manfiests can grow to the hundreds of lines really easily)

These problems will affect any configuration file format we choose to use, including TOML (which is comparatively new on the block), because reading templated/enormous files is really difficult.

What I've taken to doing is programatically generating objects and then serialising them as whatever my software depends on. It might feel ugly to use an entire turing complete language to generate objects that are mostly static: but honestly... the ability to breakpoint, test and print the subsections of output is astonishingly nice.

Then I don't care at all what the format is.

[0]: https://www.serendipidata.com/posts/safe-api-design-and-pyya...


I kinda miss XML. Yeah, I'm weird.

The tooling is super mature, it's easy to emit, it's easy to parse, it's easy to validate, it can just a little hard to read and write by hand (and I mostly blame SOAP for that). Still, basic XML isn't that hard to read or write, thanks to editor support.


I still wouldn't want to read this, and this is a simple example. :\

    <apiVersion>apps/v1</apiVersion>
    <kind>Deployment</kind>
    <metadata>
      <name>
        some-deployment
      </name>
      <namespace>
        deployment-namespace
      </namespace>
    </metadata>
    <spec>
      <replicas>1</replicas>
      <revisionHistoryLimit>3</revisionHistoryLimit>
      <selector>
        <matchLabels>
          <app>
            some
          </app>
        </matchLabels>
      </selector>
      <template>
        <metadata>
          <labels>
            <app>
              some
            </app>
          </labels>
        </metadata>
        <spec>
          <serviceAccountName>dbuser</serviceAccountName>
          <nodeSelector>
            <iam.gke.io/gke-metadata-server-enabled>true</iam.gke.io/gke-metadata-server-enabled>
          </nodeSelector>
          <imagePullSecrets>
            <name>dckr-auth</name>
          </imagePullSecrets>
          <containers>
            <name>
              some-service-container
            </name>
            <image>
              dckr.io/some/deploymentImage:latest
            </image>
            <imagePullPolicy>Always</imagePullPolicy>
            <ports>
              <containerPort>8000</containerPort>
            </ports>
            <env>
              <name>DB_URL</name>
              <value>postgresql://pgsql%40{{.Env.GCP_PROJECT_ID}}.iam@127.0.0.1:5432/{{.Env.NAMESPACE}}-{{.Env.SERVICE | strings.TrimSuffix "-service"}}</value>
            </env>
            <env>
              <name>LOGGING_LEVEL</name>
              <value>debug</value>
            </env>
            <env>
              <name>TRACING_ENABLED</name>
              <value>false</value>
            </env>
            <env>
              <name>APP_NAME</name>
              <value>
                some
              </value>
            </env>
            <env>
              <name>AUTH_URL</name>
              <value>http://auth/auth</value>
            </env>
            <env>
              <name>KEYCLOAK_KEYSET</name>
              <value>/protocol/openid-connect/certs</value>
            </env>
            <env>
              <name>MAILCHIMP_URL</name>
              <value>https://mandrillapp.com/api/1.0</value>
            </env>
            <env>
              <name>STREAMING_SASL_ENABLED</name>
              <value>true</value>
            </env>
            <env>
              <name>CORS_ENABLED</name>
              <value>true</value>
            </env>
            <env>
              <name>CORS_ALLOWED_ORIGINS</name>
              <value>*</value>
            </env>
            <env>
              <name>CORS_ALLOWED_METHODS</name>
              <value>*</value>
            </env>
            <envFrom>
              <secretRef>
                <name>kafka-access-secret</name>
              </secretRef>
            </envFrom>
          </containers>
          <containers>
            <name>cloud-sql-proxy</name>
            <image>gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.0.0-preview.3</image>
            <args>--auto-iam-authn</args>
            <args>--structured-logs</args>
            <args>{{.Env.SQL_CONNECTION_STRING}}</args>
            <securityContext>
              <runAsNonRoot>true</runAsNonRoot>
            </securityContext>
          </containers>
        </spec>
      </template>
    </spec>


Honestly, it doesn't feel too bad to me. But it's also a bit overly nested for many things; I wouldn't want to read that in JSON either.

XPath (and JQ) FTW here.


Reads fine to me, although would benefit from a few element->attribute switches.


I like YAML.

I like that you can use anchors and merges. It greatly simplifies complex, repetive structures. And most of the complaints about yaml can be worked around by string-quoting.

The whitespace can get in the way if you're templating, but then you can also use [1, 2, 3] as a list notation, for example.

In fact, most of the complaints could be resolved by running it through a linter.


I like YAML. More specifically, the subset of YAML, like the author suggests. Clear, intuitive, and allows expressing of complex data structures like JSON does. Much better than TOML, which easily becomes a mess with more complex data.


Yeah. My headcanon YAML is amazing.


Yup exactly my experience as well, again a stupid idea to try and make a "configuration" language out of nested key value pairs that end up needing fancy interpreters allowing more and more semantic into the keys and values to start doing what a simple program could have done in half the time...

I ve worked in 4 companies over a period of 10 years, each had exactly this problem, with yml, json, xml, properties file (you dont want to see business logic conditionals in a properties text file, where the keys shapes command an interpreter to behave dynamically...)

The only times I saw a team do it well was a php backend of all things where the lead said they d program all their variations in php rather than source it from configuration flat descriptors and it was amazing, clear, simple and powerful. They had to release the backend at each config change instead of releasing the config change only, but Im still unsure why exactly that's a problem: the configs are software too if we re honest with ourselves, shoe-horning them in a descriptor language isnt gonna make them flat.


The problem is mixing data with logic. I cannot imagine how maintainable a small program becomes when this concept is employed, much less a big one.


I don't think YAML is great, but I still think it's the best format out there.

The only confusing problem I've run into was the sexagesimal number notation and even that was fairly obvious. Perhaps it's because I tend to overquote strings?

I mean sure, the on/off to boolean mappings are annoying, but they also become very obvious when you're parsing config because the type validation will fail. If `flush_cache` has an enum `on` but no key `True` then the type validator will instantly complain about both the missing key and the extra key in the dictionary.

Same with accidental numbers, any type check will show that the parsing failed.

I find JSON for config files to become unreadable quickly because of the non-obvious nesting and the lack of comments. You can pick a JSON extension but then you need to pick one that your tooling will support.


> I still think it's the best format out there.

What do you think of https://toml.io ?


TOML solves a lot of issues but I find it hard to visualise when you get much deeper than two or three levels.


Exactly this. I hear so many people recommend TOML over YAML.

I see the logic in it. For simple Key-value configurations, TOML is superior and more straightforward to YAML. You can add sub-level values and it isn't too bad (if there aren't too many), but beyond two levels, TOML becomes difficult to use.

If you really work in YAML in any sort of more advanced capacity (kubernetes, Ansible, CI/CD Pipelines) then you really need the complexity that YAML provides. You also get used to the "gotchas" mentioned here. Navigating them is fairly straightfoward.

I think the article was vastly overblown. Is YAML perfect? Certainly not. But you find a better way to display such complex data structures in a more human-readable and human-writable way. The complexity is YAML's strength, but it comes with caveats as all complexity generally does. I really think its the best we have.


I think problem in this particular case is using YAML as DSL. Every other data format would be equally bad here. Replace YAML with TOML and you're still in same templating hell.

YAML is least worst for me, and I don't think I ever hit the problems article is showing because

* I use editor that will highlight stuff like anchors

* I often generate config from CM so it can't have those errors

* Loading into defined struct in statically typed language also makes them impossible.


YAML is nuts, and JSON is annoying (trailing comma limitations, lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments).

Both have their place though. YAML came out of perl, and both are some confluence between awesome and horriffic (although yaml wins the horrific crown for sure).

I've had a little bit to do with Ingy - the inventor of yaml, and I've worked closely with some of his collaborators. Ingy is nuts, mostly in a good way, but I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.


JSON also borrows (earlier versions of?) JavaScript's "all numbers are doubles", leading to this kind of thing: https://rachelbythebay.com/w/2021/02/03/bits/

Though, in fairness, I think old Perl did that too. It's super convenient until it isn't.

Rachel also doesn't approve of JSON in high-reliability systems for other reasons: https://rachelbythebay.com/w/2019/07/21/reliability/ and point taken, if you're sending data from your service A to your service B and neither is a web browser, nor are they written in JS, then there's far better formats and you almost need a reason not to use protobuf.


There was (I think probably still is) a qemu bug with JSON. It accepted requests to read guest memory in JSON format, with the memory addresses encoded as JSON numbers.

When reading out guest kernel memory (addresses are at the top of 64 bit space) these would silently be rounded to the nearest whole double. It took me a very long time to understand what was going on.


Actually JSON doesn't specify what numbers are - it would be perfectly licit for a JSON parser to transparently use a real numerical tower, allowing perfect representations of any non-repeating decimal fractional number (since it has to be represented as a dotted-fraction and there's no support for a vinculum (aka U+0305 / COMBINING OVERLINE / ◌̅ / 3.21̅) there's no way to represent non-repeating fractions if there is not a non-repeating representation in base 10). A few JSON parsers even do this. That said, if you don't control the both sides sending something that won't be handled by the lowest-common-denominator (browser JSON parsers / JS numbers) is asking for trouble.


The short story is that if you want real foolproof interoperability, always represent numbers as strings in JSON.


Interestingly, your example renders incorrectly for me (Firefox/W10). The overline is placed between 1 and ).


This is a great post but my understanding is this has nothing to do with JSON, which is unopinionated about numbers. Rather, with JS's JSON parser.

Python, for example, has several JSON libraries which let you swap out the numeric parser so it yields Decimal objects all the time. It's overkill for most use cases, but essential if you're working with REST APIs in Fintech.


JSON doesn't specify what numbers are. Integers that take 2MB to represent are valid JSON numbers.

Regarding protobuf, the following opinion is obviously insane, and if your org is already using protobuf you should ignore it: protobuf actually seems pretty bad? It has a bunch of vestigial features that people just say not to use. Its integer encoding bloats the encoded size and causes unnecessary dependency chains in the decoder. I would strongly prefer sending simdjson tape between processes and storing simdjson tape at rest, but if my coworkers insisted on doing something normal, maybe I would look into flatbuffers or capnproto.


YAML used from withing statically typed language gets rid of most of the problem, but the main one seems to be "well, we figured out which stuff was just a bad idea and put it in 1.2, except nobody uses it"

> Both have their place though. YAML came out of perl, and both are some confluence between awesome and horriffic (although yaml wins the horrific crown for sure).

Weirdly enough I'm not getting most of those issues in Perl YAML, "norway problem" for example

    use Data::Dumper;
    use YAML;
    
    my $a ="---
    geoblock_regions:
      - dk
      - fi
      - is
      - no
      - se
    ";
    print Dumper(Load($a));

    $VAR1 = {
              'geoblock_regions' => [
                                      'dk',
                                      'fi',
                                      'is',
                                      'no',
                                      'se'
                                    ]
            };


This reminds me of a certain architect at my last shop who invented a DSL on top of his Python superapp. He expected all projects to go through his superapp. The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.

This meant debug was hell, plus it wasn't always clear if what you were trying to do was even supported / if not why & what needed to be changed. This was because you were now 3 levels of abstraction away from the Python code that was actually executing.

Every time a dev took on a new project they had to jump on a call with architect or right hand man to figure out if what they were trying to do was going to be possible.

It escalated into the architect demanding to know a sprint in advance any task devs were trying to do, in a review session, so he could explain if it was possible or not and try to triage in his DSL..


>The DSL was configured in YAML. The YAML was often so dense he recommended devs use Jinja to generate the YAML.

Did he then went on to design Ansible ? It falls into same trap

Only way you should be generating data format using language's templating system is

    <%= YAML.dump(@config) %>
Also 9 times out of 10 I wished the app designer just used <app language> or <any common embeddable language> (like Lua) instead of making any kind of DSL (whether that's just data file pretending to be code or micro programming language)


Yes. I think this is like the uncanny valley of development.

It's not no-code UI driven stuff you can put in front of a business user.

It's not real coding, which an engineer wants to do.

It's config jockeying, which devs find boring, and is generally far more limited. So you end up building out more and more complex layers of config to work around the limits, including scripts to generate config.. etc.

Seems like what you really want is modular apps that are easy to extend in the native programming language(s).

Also maybe I'm just stupid, but 9 out of 10 times, a text file or csv/demitted file accomplishes most of what you need for pure config that really belongs in config.


Tell me! I'm currently attempting to resist an upcoming transition form .csv to yaml...

... which is used to generate .xml.


This is basically how I feel about working with K8S and dredging through a repo full of templated YAML spaghetti. What am I looking at now? Helm, Keda, Flux, Argo, OperatorHub, GitHub Actions? oh actually this bit is in Terraform in another folder, whoops.

You can’t actually deploy something unless you can mentally untangle it all, it just sits in front of your infra as a sort of DevOps Coming of Age ritual, where you look whistfully over your shoulder at the old Heroku or Vercel account you grew up with. Simpler times.


At work someone is trying to introduce a system where a bunch of Jinja templates in a repository are used to generate XML which can then be used to generate another XML document which can then be "executed", resulting in an annotated XML document :)


Wow that's like 2000s & 2020s mutant baby


And the context variables for the jinja is in part read from a YAML file iirc


I've read about places that do this kind of stuff. Although it sounds like pure hell, I'm sure there's always a reasonable explanation, an intent. What kinds of problems was the org facing that led to the development of this?


It wasn't really necessary. It also made the core superapp a blocker of essentially every user delivery. If it has to do everything, then you have to add a lot of features to it.

So you have N devs doing jinja/yaml/dsl and N/4 doing core superapp underlying dev.

For the first Z features/projects, you inevitably see new things that your superapp doesn't support yet, and becomes emergent blockers midway through implementation.

Given the ratio of devs, new blockers were being generated faster than they could be cleared.

Eventually business side pulled the fire alarm and grabbed most of the devs over to use a more common AWS-centric service directly and exit the superapp dev use completely.

IE - if you are going to depend on something, would you rather it be an AWS service with 10s-100s of dev-years behind it, or some internal superapp spun up 3 months ago with 2 guys on it? Which is more likely to already support what you need?


I guess YAML has a place in that it would prevent that kind of thing happening in the first place.

YAML is easy to debug (thanks to having comment syntax) because it just deserialises into code. Sometimes it deserialises into code that compiles on the fly mind you which is never a good idea.

On the other hand one time I debugged a really nasty memory leak by dumping many megabytes of YAML then running git diff against the dumps. That was fun. Of course the client used the quick and bad hack rather than the demonstrably correct fix (thanks to the dumps) because they were frightened of their own code.


I would quit within a month


Some did :-D


That sounds like a layer of insanity that would make me consider jobs elsewhere. It sounds entirely unnecessary and burdensome , but was it unnecessary?


It was unnecessary because he was being too clever & having a good time, versus ever having delivered real production systems in our industry.

It also put devs on a dead end path which they realized pretty quickly. Do you want to work for years on this team becoming experts in jinja to yaml to in-house DSL you'll never use anywhere else? Or do you want to write some python? If you can't get "promoted" into the team writing the core python engine, then you are obviously a second rate.. why stay?


Even beyond that I would worry about the decision making at the top that… they thought that was a good idea and kept at it.

At that point a lot of potentially absurd things could be in play.


Less technical management hires a hero who tells them everything they want to hear!

"I'm going to deliver the superapp, everything will be super centralized & tidy.. small dev team, then all the specific implementations will be grunt work by cheap devs!"

Throw in some buzzwords and they are sold.

Same audience that always signs the checks for no code/low code stuff no one actually wants.


> lack of comment syntax no matter how annoying it is that the spec is correct about why there are no comments

This completely arbitrary ideological purity has come at the expense of countless wasted hours, headaches, and suboptimal workarounds like using strings as comments, with zero tangible benefit - zero bad things would have ever happened if JSON allowed comments. There is nothing correct about it.


I dunno about that. I remember how microsoft were back in the IE days. They could have used comments for really bad stuff.


> I wouldn't put him in charge of the architecture, I'd put him in charge of the abyss.

:-) I enjoyed this bit a bit too much, now I want it on my tombstone #lifegoals


Is this the same Ingy that made Test::Base? It's the best data-driven testing framework I've ever used, and I've missed it often while working with other languages. The follow-up polyglot framework just didn't cut it for me.


I would be really surprised (and slightly worried) if there was more than one Ingy :)


Do people dislike TOML only because it looks like a Windows INI file? I think it’s nice. Rust chose it in keeping with their penchant for sanity most of the time.


I would prefer if logically nested blocks could also be phyisically nested (and indented), so you can have a full tree structure. If you're describing something that can have variable levels of nesting (think folders) then it can sometimes make the format easier to understand.


I like YAML for reading and TOML is entirely worse for reading (still million times better than JSON tho), and as the use cases are mostly read, rarely written, and if written they are code-generated (using configuration management), YAML fits better.


JSON5 is the answer. I have yet to find a better option.


I never could understand the hate XML got, but I'm having a bit of schadenfreude seeing what people suffer through with its replacements.

An XML document with a well-thought-out domain-specific DTD would solve all these problems; instead, we have something where no sometimes means false (but not always) and 22:22 sometimes means 1342 (but not always!)... because... why not?!?

All that horrible mess, it seems, because people didn't like to have to close tags.


It’s the learning curve really.

Our industry has the remarkable properties of being almost entirely newbies, a constant churn of green developers, combined with being very bad about passing down generational knowledge. This is why things that are easier to explain win out over things that are technically superior but take longer to get your head around almost every time.

The thing JSON had going for it over XML is that it maps cleanly to most languages object models so you can read the results directly. No writing XQueries or DOMs to read values.

It’s only major downfall is lack of comments, which has lead people to YAML. (There’s plenty of other things it lacks in comparison to XML, like native schemas, but most of that falls into things newbies don’t know they want)

YAML is in this weird middle place where it’s easy to explain but impossible for a human to master. It appears as simple as JSON to newcomers who adopt it, but the long time users of it find it full of foot guns. People wanted JSON with comments but instead they got the complexity of XML minus the clarity.


XML is/was hated by newbies and old farts alike.

Of course some of the hate come from the application where XML was used, more than XML itself, but it also is a deeply flawed language.

Nowadays the main issue would be that it requires a complex generator and parser libraries to be any useful (you'll never want to deal with XML parsing/escaping by yourself) yet it's not as efficient as binary formats like protobuff for instance.

That means that anything you'll want to edit by hand or be purely textual and readble will be better done in yaml or json, and anything beyond that can be done in other ways. The need for a single language trying to awkwardly span all the spectrum isn't big.


We all hate XML.

We equally hate all replacements (as a config language) after it. At least XML stayed around long enough to have decent tooling.


XML is ok when both the reader and writer are machines, but you still want a text-based format (otherwise I'd just use protobuf) or when maybe you occasinally want to edit by hand but not often.

I think it's the same reason why, although HTML is a great standard, lots of us like writing in Markdown and having something convert that to HTML, despite all the problems when you try and push Markdown further than it was intended.

YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML, with the added bonus of an attempt at a tag/reference system to store object graphs that are not trees. Back in the day of XML-based Spring config files, we had <bean> and <ref bean=...> but as far as I know that's implemented on the Spring layer, it's not a generic property of XML, whereas YAML tries to abstract that into the format itself.


> YAML, as I see it, is trying to be to XML configurations what Markdown is to HTML

Yes, that's probably a good analogy. The big difference though is that if some Markdown fails to parse, nothing really bad happens, while a YAML file that fails to parse can bring a whole system down.

Also, Markdown is a famously ambiguous format; it trades precision for ease of write, and that's fine, mostly.

But in a configuration file, ambiguity is really the opposite of what you want.


I worked with hand editing XML files in the past, and I didn't really have an issue with simple XML files. I actually prefer XML in some cases.

I see your point if you need to edit a complex XML with multiple namespaces mixed together, but a plain XML file can be just as readable as JSON.

Some JSON files can be really hard to edit by hand too. At my current workplace I often have to deal with nested JSON files, where a JSON contains values that are also JSON, but encoded and escaped so that it is difficult to edit.


Hard disagree.

XML is great for systems to read and write, but utterly abhorrent for humans. It's not just having to close tags. It's content vs attribute confusion, namespacing noise (which are also muddled with attributes), and there's a squint factor incurred by the density of information.


Namespacing is the opposite of noise: it lets different domains coexist with no possible confusion about which is which.

But experience has taught me that trying to explain why XML is great and simple to read and write for humans, to someone who thinks differently, is useless. And so, I won't.

Yet, please accept that some people, like me, really liked it and didn't mind the little quirks in light of all it offered.


It makes sense that explaining why xml is great doesn’t work. Maybe you can convince someone who’s never worked with xml. But if someone has used xml and thinks it’s a bad format based on those experiences, you’ll need to bring _really_ good arguments to let them think it’s a good format anyway.

Just ask yourself: could a comment here convince you that xml is a bad format? If not, why would the opposite work?


Well yes, if it described a format as pleasant for humans as YAML, while not being ambigious and error-prone.


I disliked it intensely at first, but over time .. I now see the merit in it. I like XML, especially compared to everything else (that came after). It was possible to clearly define things.


On the same system? I can see that. One thing I've lamented about XML for years is that it's a pain when moving to different systems that have their own namespaces/DTDs.


On the other hand XML document with good schema is pleasure to write in any decent IDE.

It just autocompletes itself. I know JSON schema exist, but it never managed to just work that well.


i think that propper schema and autocompleate would save the day. even i would say make things easier. talking about XML


I’ve been using this for (schema + autocomplete) for at least 18(!) years.

Moreover I even got comments on xml attributes and elements, warnings if required element is missing or required count is not met.

You even got visual hierarchical collapsable UI for viewing/editing that XSD schema.

I repeat: 18 years of autocomplete, comments on hover, and validation for typos/count/order for XML in Visual Studio.


20-something years of web/app.config is the primary reason why I hate XML. Very, very closely followed by Java ("a DSL for turning XML into stacktraces") and its over complicated ecosystem of xml configurations.


I have worked with Java for over two decades, and your experience does not match mine.

At $work there are only two XML files in each project:

- logback.xml: configuring logging which is usually only done once or twice and is the same for each project.

- pom.xml: Maven config which is mostly autogenerated by IntelliJ when creating a new project. I only add a few lines for new dependencies.

In rare cases I may use some plugins to do specific build operations, but the actual XML is simple. Most of the time is spent on understanding the plugin itself, and not "fighting" XML. Moving this to YAML has no effect on my productivity

All other configuration is either done using YAML, properties or configuration in code (Spring Boot).

I see Spring still has options to use XML configurations, but I don't see any reasons why anyone would do that in 2023. The "new" standard in Spring is configuration by code, which I have done for the last 8 years.

The ecosystem for Java today can not be compared to what it was 10 years ago. If editing XML configurations is still a problem at your workplace, find a new job :-) Either you work with a legacy system that nobody wants to update to the new standards, or you have architects who are stuck.


XML has the hate because back in the day everyone was like "oh XML is a human and computer readable data format, that will solve all of our problems".

Sure it solved all of the problems by making a data format that both humans and computers found difficult to read.

"Oh I've got a problem. I'll solve it with XML. Now you have two problems" ;)


I remember being excited when XML hit 1.0 (as a new programmer this seemed like a huge advance over things like classic Unix configuration files), and progressively disappointed over the next decade as the promise not only wasn’t delivered on.

The things which killed XML seem to me to be related to the old standards culture: the people involved assumed adoption was inevitable and distracted themselves with increasingly arcane thickets of new standards, with the assumption that someone else would spend time on the “boring” work of building professional-quality tools and documentation or cleaning up usability warts. That other 80% of the work never happened and most people who had a choice moved on.

As a thought experience, imagine if libxml2 had had even a single dedicated developer focused on tracking standards or making usability improvements, instead of training multiple generations of users that XML was slow and hostile to users. Various XML committees’ travel expenses building standards which were never used likely cost more than that. Not leaving XPath frozen around the turn of the century would have helped in so many places.

The other wart I think would have made a surprising difference is the usability disaster around namespaces. So many tool developers forced users to switch between the short namespace:attribute form they used everywhere in the document and the {namespace url}attribute form that resolves to, or forced you to respecify the namespaces on every operation rather than reusing the values the parser had already loaded. Users begrudged that verbosity but they hated it when it meant something silently returned incorrect results because a selector using the document’s own syntax didn’t find the element they could see using those exact values. Absolutely nothing anyone did in the XML world was a better use of time than fixing that would have been since it trained people to think of XML-based tools as a painful, error-prone experience to be avoided — and they did as soon as they could.


I miss the old days of working with XSLT and XPath. A nice way of giving some design touches and filtering/sorting to XML documents. It wasn't perfect but I would take any time over the yaml/json/toml/ini.


I don't know if I'd say I "miss" XSLT and XPath, but yeah, it was fun and powerful at the time. I made some crazy stuff with Cocoon and AxKit that, if you didn't mind the syntax, were actually pretty elegant.


I'm surprised XQuery never became more popular, it's essentially XSLT with a 'normal' syntax


XPath is cool (and I still use it for work regularly). jq JSON expressions seem to be inspired by it as well.

XSLT was horrible though. A programming language with XML syntax, no thanks!


XQuery gives you 90% of XSLT with a syntax that is a superset of XPath. Verbose queries (https://en.wikipedia.org/wiki/FLWOR) are especially nice and readable.


Me too!


Eh, definitely there's some generational round robin going on. And the "amsterdam problem" in YML is just insanity. But you got to admit, XML has some very low level intrinsic problems

* Schemas violate underlying XML rules all the time, and due to variance in XML parsers, it gets a pass but only on specific configurations. There's no one Holy XML Parser. Leading whitespace in attributes? Sure, why not?! Whitespace sensitive element order? Ooh yeah, lots of it! But try and feed it to libxml and the whole thing collapses - without error handling, mind you, but that's not XML's fault.

* Whitespace agnostic means diffs and merges are element aware, and it's nigh-impossible to estimate the upper compute limit of an element aware diff

* There is NO OFFICIAL WHITESPACE SPEC - so if you try and fix it with normalization and lines, it's going to be different everywhere you go. So you're forced to switch on element aware diff / merge in your VCS, which is a pretty big change, and it's one you have to sell other departments on.

* XML breaking 1NF, which, aka means that XML can have XML inside of XML infinite recursion when playing data format = which breaks so very very many things . . it's actually going against the whole concept of a data hierarchy, which is built in to XML at a low level.

* Sort of riding on that, in order to parse XML you have to eat it element by element, and there's no way to tell when an element is going to end or of it recurses N levels. This makes it computationally expensive. The CAD STEP format has some of this disease - it has to be loaded in its entirety to parse, which can be holy hell with a TB file.

* Yeah sure, namespace hell. No one really figured out a way to fix it. S1000D (XML spec) to this day just denies that XML namespaces ever existed, and I can't really fault their WG for doing that. I can fault them for so many other things, but not that.


The really great thing about xml with XSD is the ability to validate the document really well.

I.e. having content matching regexps, so you can be sure version numbers are always \d+(\.\d+)* and does not have unexpected letters or spaces in it. That date values are in ISO format.

Very useful for APIs to 3rd parties as it now is very easy to catch most errors and provide an useful error message for them without having to code every check.


>> All that horrible mess, it seems, because people didn't like to have to close tags.

If people keep burning their faces off it's not the fault of the flamethrower! If they keep hacking each other's limbs off it's not the fault of the chainsaw! If they keep blowing their houses up it's not the fault of the gas bottles! It's not like we need to accommodate every possible failure mode of the human brain, and make things safe just because the next idiot that comes along is going to cause a catastrophe. It's the idiot's fault that they forget to apply the breaks and take advantage of the failsafes, it's not the responsibility of the system to have the breaks and failsafes where any idiot can find them!

I've head that kind of excuse 0 times, outside of software circles. But of course I'm exaggerating because nobody really suffers because of XML other than the people who use it everyday. Like this once junior dev, for example, who was made to create XML by hand by eyballing an Excel document, just because paying a junior dev salary is easier than getting middle management (and clients) to learn to structure an Excel spreadsheet properly.

Sometimes, just sometimes, you have to design a tool with the way it will actually be used in the real world in mind. And all the rest of the time? Well, all the rest of the time you have to do it that way, too.


> All that horrible mess, it seems, because people didn't like to have to close tags.

The issue isn't that it's bad per-se. The issue is that it's cumbersome to write. When you want to setup and tweak a tool, it gets annoying very quickly having to deal with little errors because you misspelled a closing tag. Maybe you want to test enabling a setting? Typing out 4 characters is so much less friction than the 20 or more (including the < > : / characters) you'd need for an XML config.


I mean it wasn't just that.

* `<?xml ...` something line lost a percentage of people already, etc.

* Essentially XML was a markup language (SGML stripped down), while you can use markup to mark up key-value pairs, it just wasn't designed or optimized for that.

* To the contrary, it also did away with the nice SGML features for that instead of `<tag>value</tag>` one could write `<tag/value/` and the self-closing tags (like `<ol><li> ... <li></ol>` in HTML. For most particularly deep config files self closing tags would have been perfect.

* XML attributes vs substructure was just an awkward choice to give out. I.e. `<myConfig userId="asdf" host="asdfasdf" ...>` vs `<myConfig><userId>asdf</userId><host>asdfasdf</host></myConfig>`.

More so I get the feeling when looking at the XSL, XSLT, etc. mania that a similar pattern was at work as with UML: special interest groups and tooling providers driving the evolution of a standard with their interests in mind and not the interest of users or developers.

Overall XML could have been something like this `<myConfig / <userId/asdf/ <host/asdfasdf/` if it was SGML.

YAML and JSON succeeded because they had a clean and predictable, no-nonsense mapping between encoding and object-model after decoding. Probably we should all switch to an almost-yaml format that does away with the peculiarities, and the FANG companies would have the momentum to make that happen.

I personally would like for HJSON (https://hjson.github.io) to see more adoption, but that train has passed...


It’s more typing but it’s simple typing most people don’t even think about because it’s predictable (not to mention automatic in many editors). I’d take that over the frictional cost of thinking your YAML is done and then having to debug magic data conversion or realize that you left out one character causing something to be parsed completely differently.

Where XML falls down hard is tool usability. There’s still no standard formatter or good validation tool, and things like namespace support is a constant source of friction.


> I’d take that over the frictional cost of

You might but the rest of us didn't. At a visual level, YAML is really nice. The problem is when you actually want to use it for... well... anything.

On the flip side, XML is downright ugly to read and write, which makes it a no-go for configuration files that are written and modified by hand.


Look, I’m not saying XML is perfect but the worthy criticism is more substantial than closing tags. It’s a very easy convention to learn requiring little thought and I haven’t used an editor which didn’t automate that process since the turn of the century. That’s less cognitive load than having to remember a bunch of context-sensitive rules about YAML’s magic behavior - I’m thinking in particular of people I know who’ve burned hours only realize that they’d missed a character somewhere, forgot to escape one value, or had an indentation issue causing something to be ignored.

The criticisms I would make are more fundamental: the XML data model is different than the most popular data structures and the APIs in most languages are quite cumbersome and can lead to silent data loss (name spacing and selectors). Someone probably could have done an XML5 effort 15 years ago but by now I don’t see that happening.

If I had to rank the options for a configuration language, I’d probably go HCL, TOML, JSONC, JSON, YAML + Prettier + a YAML lint schema, YAML, XML. XML could rise up with better tools but so could a low-magic YAML variant.


> An XML document with a well-thought-out domain-specific DTD would solve all these problems;

But then a YAML document written by disciplined and well-thinking people will also not have these problems.

Any of the complex formats work well when used in the most fitting way and go horribly wrong when people try to benefit too much from the complexity. XML was also a nightmare when put in the wrong hand, just as yaml is, and the next hyped and turing complete document syntax will be.


If something's too easy to misuse, it's bad design.


Ease to use and ease to misuse usually come hand in hand.

If we had very clear and static requirements I’d see languages/tools with more safeguards, fool proofing, optimized design etc. But we’ll never have that for configuration languages, and when aiming for flexibility you have to trust people to not shoot their own feet.


The problem with XML is mainly in the 'M': it's a _markup_ language. Using it for configuration and arbitrary data serialisation isn't where its strengths lie, but it got shoved into those niches because if you look at it _just right_, you can make it work in them.

I don't hate XML myself, but I do hate how it's been abused over the years.


> I never could understand the hate XML got

It's because so many people would see examples of complex _XML-based formats (WSDL, SOAP, etc.)_ and infer that _XML itself_ is complicated. XML is really not that difficult to understand, and I find it quite amusing that the same people who don't bat an eye at writing HTML complain about how baffling XML is.

Is writing an XML parser difficult? Yes, very much so. But again, that doesn't make XML itself complicated. And before anyone tries to call out this particular comment, keep in mind that writing a fast, correct, and safe JSON parser is no walk in the park either.

Are there many examples of complicated XML-based formats? Yes, but that's just a reflection of the complexity of some particular configuration model, not XML itself.


XML is cool and useful, but for humans you need good tooling to handle it well. And it still is very noisy with the all it's boilerplate and what advanced features can bring in. And overall it has a culture of making things complex, and complicated.


Just like bad/fragmented YAML libraries now, there were plenty of bad XML libraries and implementations. I didn't think XML was too bad until I had to write a SOAP request where the endpoint would throw an error if the arguments (in their own tags, mind you) weren't in a specific order. The endpoint gave me no hints.

Additionally we had to update WSDLs for these services, but the service generating the WSDL used features our server's SOAP library didn't support, so someone had to manually transform the XML and via lengthy trial and error to get it to work.


These days, JSON also has JSON schema that allows you to enforce the structure of a JSON file.

If you need a enum that only allow specific value, you just list it.


True - but then again, writing an XML config file can be a bit of a pain as it's more verbose.


XML is preferable for long or heavily nested data structures.

For small flat trees the alternatives are preferable


It’s the verbosity for me. Why does everything need a fully spelled out closing tag?


I see nothing wrong about closing tag. Helps to navigate and understand in my opinion. I have way more problem with developers trying to be super concise and writing constructs that are very hard to parse mentally when looking at.


So that in long nested data you know which tag closes what. {{{{{}}}}}


I do think YAML is overly complex - but there is some hyperbole in this document.

- Many of these complaints are about YAML 1.1.

- YAML 1.2 was released _14 years ago_.

- The author makes some allusions to 1.2.2, and it requiring a team of people 14 years to make, but, from the yaml.com announcement they link to: “No normative changes from the 1.2.1 revision. YAML 1.2 has not been changed”

I guess my first two comments are undercut by PyYAML using YAML 1.1 (Really?! Python’s had 24 years of the Norway problem?!)


The article mentions the fact that YAML 1.2 is really old and the fact that it doesn't matter because YAML 1.1 is still the most commonly supported version, and the fact that it's arguably even worse because YAML 1.2 gives different parsing results to YAML 1.1!

I highly recommend reading the article - it's very good.


Yes, the problem is PyYAML.


Agree with the final part of this article that "programmable" configuration languages like Nix and Dhall are the way forward.

I've spent a lot of time writing YAML for Ansible, Cloudformation, k8s, Helm, etc. Some of the issues this article mentions are pitfalls but once you get a bit of experience with it, you know what to look out for.

I've also spent and a lot of time writing Nix expressions, which is much more "joyful" IMO. Seemingly simple features like being able to create a function to reuse the same parameterized configuration makes life much easier.

Add in a layer of type safety and some integration with the 'parent' app (think replacements for CloudFormation's !GetAtt or Ansible's handlers), the ability to perform basic unit tests, then configuration becomes more like writing code which I consider a good thing.


Almost all those problems are solved by marking up the strings as strings, using " and ".


Then they should make quotes necessary in the YAML-specification. Little more to write, but better to validate.


I agree. They make a rod for their own back by having implicit conversion rules which, while well defined, are not well understood. "Explicit is better than implicit" and all that.


Advantage: things like "literal_block: |" or ">" are not necessary with a simple quote.

And i really don't know, why in hell they add binary-Data to a configuration language for humans.


Great idea. I would also suggest adding some syntax to make lists and maps more obvious (it can be a bit unclear in YAML). Maybe {} for maps and [] for lists.


My wish for a dream config language is this: Allow a choice between unambiguously-typed expressions (quoted strings, only true/false, decimal numbers) and explicit type annotations. So this:

    regions: $[string]
      - no
      - se
    options: ${bool}
      a: yes
      b: no
(with possibly a different syntax) would be equivalent to

    regions:
      - "no"
      - "se"
    options:
      a: true
      b: false


No, all the problems are solved by this.

It’s really no wonder that it’s hard to create a language that’s supposed to know whether the author intended to write a string or not without it being indicated by the syntax. No other language in the world tries to do this, for exactly the reasons this article points out.


Since everyone seems to throwing their favorite format into the ring (), I will too: EDN [0]

* no enclosing element (i.e., can be streamed)

* maps, lists, even sets

* tags (like "Person". UUIDs and timestamps are built-in tags)

* floating point numbers

* integers

* comments

* UTF-8

* true booleans

* no need to worry about too many or too few commas in the right or wrong place

Implementations in almost every language under the sun [1].

The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and "Norway problem"), no ambiguity between FP and integers (see JSON), comments. And if your editor has rainbow parenthesis support, reading is actually a pleasant experience.

[0]: https://github.com/edn-format/edn

[1]: https://github.com/edn-format/edn/wiki/Implementations


> loading an untrusted yaml document is generally unsafe,

TIL and this is... Not great?

I often parse JSON as YAML. Since YAML is a superset of JSON, all valid JSON files parse as a valid YAML file too, but now you can add comments to your JSON since YAML has comments. Didn't know I was opening my process up to code execution doing this...

Knowing this, YAML is dead to me as a format. This really seems unacceptable for a markup parser.


Pretty much all parsers have some way to "safely" parse a YAML document which disables the custom-object unmarshal stuff; sometimes this is the default, sometimes not, some libraries/languages don't support this at all in the first place.

If it's enabled, then it's trivial to exploit; e.g. in Python:

  !!python/object/apply:os.system
  args: ['ls /']
If you parse this with "yaml.load()" in PyYAML (which, I believe, is the most commonly used YAML library for Python) then it will execute that code. You will need to use "yaml.safe_load()".

I believe the situation in Ruby is similar, and in PHP the only way to get any safe sane behaviour is to modify a php.ini setting (which at least defaults to "off" now, but there is no way to know if a line is safe just by looking at the code).

In Go this entire feature isn't supported at all AFAIK, so it's always safe.

Not sure about other languages from the top of my head.


I'm not sure it's true that parsing an untrusted YAML document is generally unsafe. That has to be a property of the parser, not the format.

As TFA says:

> In Python, you can avoid this pitfall by using yaml.safe_load instead of yaml.load

In Ruby, there are now analogous options (including - of course lol - a gem which monkeypatches the YAML module to make it safe by default) [1]:

> Use the safe-yaml gem which overrides YAML.load

> Require a newer version of Psych that provides a safe-load option

In Java, SnakeYaml has a document about security which i think is a long way of admitting that it's not secure [2]. It looks like eo-yaml just doesn't support tags at all.

[1] https://trailofbits.github.io/rubysec/yaml/index.html

[2] https://bitbucket.org/snakeyaml/snakeyaml/wiki/CVE%20&%20NIS...


To be fair, YAML itself does not specify how tags should be interpreted, technically the idea of grabbing constructors from those is the library's issue. But then again, you wouldn't want to accidentally introduce code execution when you adopt a new library/language.


At least YAML has comments, which JSON doesn't. I know people will say "JSON is not meant as config format" but in reality it is used as such.

In general I don't understand why there is so much churn and focus on config file formats. They can be annoying but in the end the real problem is that we have complex configurations so a different format is not going to help much.

Personally I prefer YAML over XML or JSON for config because I don't have to worry about closing brackets or braces. Maybe we need a good editor to make editing them easier.


EDN’s my favourite config format.

+ Simple (like JSON)

+ Has comments

+ Extensible

+ Built-in sets, UUIDs & timestamps

+ Well-suited to configuration (cf. Aero)

+ Lispy (well, Clojurey)

- Lispy (well, Clojurey)

- Handwavey specs

- Not widely supported


Would you please link an example, a specification, or something? I think I found something on GitHub, but it was not straightforward.



> In Python, you can avoid this pitfall by using yaml.safe_load instead of yaml.load.

It's harder to make this mistake in Python these days. yaml.load(input) has been depreacted since PyYAML 5.1 (https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-...) and finally stopped working in 6.0.


>The literals off, no, and n, in various capitalizations (but not any capitalization!), are all false in yaml 1.1, while on, yes, and y are true.

Fyi... I looked at PyYAML source code and the parser code omitted single-character 'y' in the regex. Previous comment: https://news.ycombinator.com/item?id=26681295


> omitted the single character options of 'y' and 'Y' but it still has 'n' and 'N'

'n'/'N' were removed already in 2006:

https://github.com/yaml/pyyaml/commit/8f9b8bed40a53c28



I was at a place 7-8 years ago with a hand-rolled JSON parser that had been written across a few afternoons by one guy without much ceremony. It was untouched after that aside from some renaming and other minor refactors. As usual with these things, the decision to hand-roll was made by one very opinionated dev who got his way because the business depended on him.

When that article was published it started an argument among the team. The guy ended up implementing the test suite to defend his decision. To my surprise the parser passed all tests apart from 3 or 4 unicodey-ones where it was more permissive than it should have been. There were plenty of production bugs at this place but none of them were due to the parser.

IMO parsing JSON really is simple. If you read through the article you'll see that most of the test cases are dead simple and hardly landmines.


This is not claimed by the article afaict. It's stated though that json spec/syntax is simple, and that it's simple as a language; my interpretation is that this means "simple/understandable/predictable for humans".


JSON is a subset of YAML. (you can just dump JSON in your YAML document and it’s a valid YAML.)

So parsing YAML will be strictly harder, as it needs to include a JSON parser.


If language A is a subset of language B it doesn't follow that parsing B is strictly harder.

This is not guaranteed. Consider two languages:

1. Some number of "a" followed by the same number of "b". For example, aaabbb but not aabbb.

2. Any number of "a" followed by any number of "b". For example, aabbb but not aba.

In this case (1) is a subset of (2), but parsing (2) is easier than parsing (1) since in (1) you have to verify that the numbers match (which, in technical terms, means its not regular).


> Json is so obvious that Douglas Crockford claims to have discovered it — not invented.

I mean, it's called "JavaScript Object Notation" for a reason. From what I know, the beginnings of it really were in the early days of AJAX: People were looking for a good way to encode complex data structures in dynamically generated script tags.

Then someone stumbled over essentially a random syntax element in JavaScript that turned out to be extremely well-suited for that task.

So I think talking about "discovering" JSON as a JS syntax feature which later became its own thing is quite correct here.


YAML seems like a good that gets harmed by feature-creep. All I want when using YAML is a simple human-friendly text-format for slightly structured data. But all those special features and edge cases...as if I even have the time to learn about them. Values should be only text. Outside of the handful clear specified cases of numbers and maybe true/false/none, they should not have any other interpretation on the parser.

Isn't there something like a simple YAML-definition and some YAMl-schema for the powerusers?


He mentions it at the end (the 'YAML subset' alternative), but the edge cases only cause issues if your keys and values begin with special characters but are intended to be simple text. I tend to defensively quote stuff (probably much more than I have to), and I rarely find these issues. I have other issues with YAML but not this.


> Values should be only text. Outside of the handful clear specified cases of numbers and maybe true/false/none, they should not have any other interpretation on the parser.

I think simpler way would be just forcing quoting on text. That fixes the boolean and the different base numerics problem


YAML has come from Perl world.

Of course it has feature creep. It’s a Perl thing.


Wouldn't it be easy to make those features optional and switched off by default in "YAML--"?


There's a lot of alternatives out there. One of the most complete (in terms of specification) and very promising is concise encoding (https://concise-encoding.org/) which focuses on the duality of editable human compatible representation (as text) and efficient binary encoding/decoding. The biggest issue right now is lack of actual implementations. The project is open source (I'm not affiliated with it).


In the linked YouTube video talking about JSON: "People were putting instructions to the parser in comments... so I [removed] the comments." - Douglas Crockford.

Let the impact of that sink in - that decision has almost certainly literally cost millions of developer hours! In fact, the article actually claims that the lack of comments in part led to YAML itself so by implication...

I jest (sorta) but what a great example of an initially helpful decision having last consequences.


Every time I’ve reached for yaml after being bitten by the Norway problem has been because I wrote comments in my json file and it failed be read.


Wow, you must really be forgetful. Maybe your productivity would improve with some post-it notes stuck next to your monitor, reminding you of the basics about how these formats work.


Crockford probably didn't imagine the idiocy of creating YAML to replace the lack of comments in JSON.


At CircleCI, we have had to handle many strange YAML documents. Two that come to mind:

A recursive document:

    key: &anchor
     - *anchor
The `~` character represents null:

    ~:
     - ~
This parses as a map of {null: [null]}


What were the authors trying to accomplish with these strange documents?


The recursive anchors seemed to be user-error. Either copy+paste bugs, or an anchors places in the wrong position.

The null character would appear when people try to write a string path, and use the the `~` character without quotes.


Break their parser, probably


When I first discovered YAML, I loved it's apparent simplicity. Nowadays, I sigh every time I have to use it and am very happy with the Rust world having mostly adopted TOML.


I wonder why generated JSON is not used more widely, something like jsonnet[0], although I've never used it.

Most systems come with Python 3 installed, which supports working with JSON[1] without installing any dependencies. So, I'm wondering if we could use Python to generate JSON data that is then consumed by other tools. This would fix the lack of comments and trailing commas in JSON, you get functions and other abstractions from Python, and you don't need to install any third-party tools (vs. something like jsonnet).

[0]: https://jsonnet.org/

[1]: https://docs.python.org/3/library/json.html


I’m at a crossroads and evaluating whether to introduce jsonnet to my org. We are doing a massive conversion of legacy yaml manifests from a proprietary vendor format, and there are not a lot of chances like this where you can make such a change at scale. Any experience reports or advice on jsonnet would be appreciated.


At Grafana Labs we're using jsonnet at scale, while being a powerful functional language it is also excellent for rendering JSON/YAML config. We have developed Tanka[0] to work with Kubernetes, for other purposes I can recommend this course[1] (authored by me).

[0] https://tanka.dev/

[1] https://jsonnet-libs.github.io/jsonnet-training-course/


Wow I love it here. Where else can you post something and get a project leader chiming in within hours?

We are at a crossroads in moving towards GitOps and a CD vendor change, and are forced into some manifest conversion. From where we are coming from Helm is the obvious choice, but I do think we can do better, especially for debugging, readability, and inheritance.

ArgoCD’s support for jsonnet turned me onto it and Tanka, and now I’m going to take it for a more serious spin. I’m early days with it but I’m curious about how life is with having a compiled language for an infra development. I had the same reluctance when thinking about making the Typescript leap, but that turned out just fine. I kinda want to jsonnet all the things if it works out well, but I worry the compilation steps would make it cumbersome for some use cases.


Hit me up on the #jsonnet channel on Kubernetes Slack or the #tanka channel on Grafana Slack if you want to go a bit more in depth.


Isn't TOML just the .INI files of yore? At least it looks very similar.


The feature set of toml is similar to json, but with ini like syntax.

Most significantly ini-files have a fixed two level structure (section -> key -> value) while toml supports arbitrary nested arrays and dictionaries.


And don't forget one of the more understated benefits of TOML over INI:

> A TOML file must be a valid UTF-8 encoded Unicode document.


> yaml is a superset of json

This is oft-repeated, but incorrect claim:

https://news.ycombinator.com/item?id=30052633


About

  Sometimes an application will start out with a need for just a configuration
  format, but over time you end up with many many similar stanzas, and you
  would like to share parts between them, and abstract some repetition away.
  This tends to happen in for example Kubernetes and GitHub Actions. When the
  configuration language does not support abstraction, people often reach for
  templating, which is a bad idea for the reasons explained earlier. 
  Proper programming languages, possibly domain-specific ones, are a better 
  fit. Some of my favorites are Nix and Python
Templating is fine with YAML if you don't do it with strings but with a system that turns one YAML document into another YAML document, that is, do your substitutions in the YAML universe.

The "code up your configuration in your favorite programming language" can really work but it frequently becomes a point of frustration itself. As much as I want to like YAML it has a number of intrinsic problems like the Norway problem... But I do think people are confusing the accidental and essential complexity and because of that they flinch and reach for non-solutions to the problems of configuration rather than really thinking it through.


I don't get all the YAML hate. The article even mentions solutions, which are all better than adopting some nonstandard json variant or toml.

- for whatever language you're using, be aware of which YAML version its YAML library supports and its defaults, and how to safe load yaml in that language.

- defensively quote strings, particularly if you're on a language with an antiquated yaml library that defaults to yaml 1.1.

- defensively use true/false only, and proactively convert any other booleans in your codebase to true/false.

- Depending on your language, you can avoid any custom data types for any externally-distributed applications to mitigate the risks, even when it might make things more convenient. Use safe loading (most languages with yaml libs support it) to avoid loading any. The other major YAML alternatives that YAML haters recommend won't have custom data types either.

Features like references and folding semantics are very convenient, and you don't even have to use them. Basic yaml enjoys better readability than json. toml is only fine if you don't have much nested data.

The author's note that there are json variants that help with some json failings makes no sense. If you're going to adopt some non-standard json variant, why not just adopt yaml 1.2, make sure your language has a yaml lib that supports it, and use that? At least yaml 1.2 is standardized. It's not their fault if python's libyaml only supports yaml 1.1. It looks like pyyaml is essentially in maintenance mode and ruamel.yaml is what everyone should be using? Unfortunately nobody's gotten around to implementing safe loading natively, but that's a python problem, not a yaml 1.2 problem, and ruamel.yaml supports pure-py safe loading that's compliant with 1.2 (no integer-interpretation gotchas), which is fine in most cases where yaml is only loaded occasionally, i.e. at start-up, and performance isn't critical.

Obviously YAML has historical problems, but what's better? Using another flawed or even more limited data format, inventing your own which will begin with zero adoption, or simply ensuring your environment/app uses yaml 1.2 and best practices?


Having to watch out for footguns (by being defensive as you described) means you're just having to learn these rules. What benefit is YAML giving you then?

Why not use JSON? Conversion of types from YAML to the language's native types is a poor feature to get in return for footguns.

If you _need_ complex types - like a in RPC serialization - then i would imagine using XML is better, and forget about making it human-readable. Create a client/testbed instead.

For simple things, JSON fits really well imho.


Or you could ensure your language supports and defaults to yaml 1.2, so you don't have worry so much.

Every language has warts and pitfalls and things that are recommended against. Suggesting that one should abandon such languages, whether full-featured or data-description DSLs, when they're otherwise more productive than the alternatives, is disingenuous.

Suggesting that yaml offers no genuine advantages? JSON is annoying. It requires good editor syntax checking support to avoid mismatched and trailing punctuation. It lacks comments. I could live without the nice syntactic sugar of yaml, and even its lack of references and line-folding, if it weren't for those other things.

Which leaves something like cfg or json5, both of which have less support than yaml 1.2. So wth are we talking about? Don't use perhaps the best available (human-editable, inline-nested) data DSL because it might not be supported in some language, and because it has a few potential footguns just like json does? toml may be clean, but it's not as easy to deal with because of how it represents nested structures (which aren't fully inlined like they are in json or yaml), as so many people on this thread have pointed out.

There's a reason so many modern tools use yaml. It's not that their authors weren't aware of json. Plenty of people are simply more sick of json's problems, and consider yaml a superior, not perfect, alternative.


The amount of "defensively [do something]" and "best practices" is another way of saying "this format is so bad you need to thread carefully or else we'll blame it on you".

Good format should not require defensive writing, and also "best practice" is a bit of misunderstanding: it's best practice to avoid such a bad format, not attempt to screw people into it and then blame them for not following some arbitrary set of rules.


I agree with most of your suggestions, but I wouldn't use "safe_load" unless the file has untrusted input (in which case, you could argue, you shouldn't be using YAML at all).

If you use safe_load then you'll get an exception if you use dates or timestamps, even if it's an unquoted string that happens to look like a date (but you can get around that last one by quoting).


Why are the solutions better than adopting toml? You don’t provide any justification for that claim.


I did. Extensively nested data structures.


> Obviously YAML has historical problems, but what's better?

toml


One of the major use-cases for YAML is as a host language for DSLs, essentially basic programming languages for specific use-cases. TOML isn't as good there, as these languages often need deep nesting of data structures. GitHub Actions is the obvious example.


Yeah, toml isn't great for that. It's greatly preferable to yaml for human-readable config. Json is better as a serialisation format.

DSLs are bit vexed, not least because they're often horrible in themselves - cruft-accumulating hacks with no tooling, blurring the distinction between code and config often as a workaround for unwieldy dev processes. That's certainly not true of all DSLs, though (can't speak for Github Actions) and there is a need for a simple common base syntax/parser for them. I don't think yaml is a good solution.

Being a cantankerous old git, I tend to think the problem of a common base syntax/parser for DSLs was solved 70 years ago: s-expressions/lisp.


Yes, agreed. I've found TOML for config to be quite nice.

This is my go-to article for discussions about generic languages vs DSLs: http://mikehadlow.blogspot.com/2012/05/configuration-complex...

I'm very iffy on DSLs in general. One that I'm using at the moment has foreach loops but no if statements! And... all the inputs must be specified in a YAML file!


A great alternative for stuff like that is https://kdl.dev/, which feels much more natural for DSLs.


Quite frankly, all YAML DSLs in the wild are terrible, so if that's a major use-case, I'd say it was a failure.


What we need is some kind of extensible markup language ...


Funny but not quite.

Nobody wants extensibility and "document markup" isn't the same as "configuration specification"


This is why my own configs are either ini where I know 100% that it would never become complex and take more than few lines or XML for everything else.


> The yaml package supports most of YAML 1.2, but preserves some behavior from 1.1 for backwards compatibility. YAML 1.1 bools (yes/no, on/off) are supported as long as they are being decoded into a typed bool value. Otherwise they behave as a string.

Good solution for using YAML with strongly typed languages. No reason to leave type decisions up to a parser. I assume serde does something similar in Rust?


To me, the yaml document from hell is a series of yaml documents I helped create that replaced a bunch of database config and hard-coded constants. It helped 'unlock' the business to create newer forms of products much more quickly and correctly, but it never got moved to an actual database. So at this point, they have some yaml documents that, if expanded to include all references, end up becoming this massive fucking tangled mess.

IIRC we never moved it to a database partly because we did not have time to solve for how we actually wanted references to work.

The yaml document and code that reads it still work as a way of speeding up the business compared to how things used to be. But that code takes a long time to parse, has probably caused a decent level of confusion for developers who have to mess with it who aren't on that team, and has absolutely taken a lot of clock time being parsed for validation checks in unit tests.

Still not sure I would have solved that problem differently, aside from perhaps moving the thing to a database immediately after the first phase of working on it.


YAML with templating becomes troublesome even more so when these are Django tmeplates. That's why I do not prefer ansible for automation. It becomes so much cumbersome at certain point that it is more valuable to jump into a real programming language instead of wresteling with both YAML, Django templating and python.

That's where I am enthusiastic proponent of pyinfra.


Ansible and YAML were my primary (de)motivators to create Judo (https://github.com/rollcat/judo). This combo is extremely frustrating: for every line in a (hypothetical) shell script that would do one thing, I needed 3-5 (sometimes many more) lines of YAML. Most people on the team who were just getting started with Ansible, would often do half of their work just shelling out. I would usually push to do things "the Ansible way", but even I had to acknowledge the mental overhead of translating back & forth. I think what finally pushed me over the edge was when we started venturing into compose & k8s, and had to mix & juggle YAML+Jinja in two entirely different contexts, each with its own quirks, bugs, gotchas and brain damage.

I figured I just need a layer of glue to run shell scripts across a bunch of remote hosts (hence Judo), and otherwise resort to other tooling (like Terraform, AWS CLI, k8s CLI, etc) for problems that don't map to SSH.


why would you use this over pssh or dsh (https://www.garbled.net/clusterit.html ) ?

It looks to do exactly the same thing, only slightly less efficiently.


I didn't know these existed, until I read your reply. Thanks - looks interesting, I will have to investigate further.

First, I've done some research into alternatives, but my primary motivation was to prove you can replace Ansible with a weekend hack that closely follows the UNIX philosophy. Here I am 6 years later still using it everyday in production, so I guess I was right.

Second, dsh looks a bit antiquated - defaulting to rsh? Granted I have very little experience with commercial Unices (which seem to be the primary use case there), but... rsh?


To me, the #1 reason to use YAML rather than anything else is if you need to embed complex multi-line strings in your document. This is a practical necessity in CI systems: you always end up with at least a little bit of shell scripting to glue things together, and embedding shell scripts into anything other than YAML is an absolute nightmare.

Yes, there are too many options for expressing multi-line strings (In the vast majority of cases, I really don't care whether it has a trailing newline or not). Yes, there are weird rules that don't make sense, like how in "folding" mode a newline followed by two spaces more than the current indentation level doesn't get folded. But it's still the best option for CI, which is why Github actually ditched HCL in favor of YAML after a while: https://news.ycombinator.com/item?id=20647691


The CFG configuration format is a text format for configuration files which is similar to, and a superset of, the JSON format. It's not new - it dates from well before its first announcement in 2008 - and has the following aims:

* Allow a hierarchical configuration scheme with support for key-value mappings, lists and basic types such as strings, ints, floats, Booleans and date/times.

* Support cross-references between one part of the configuration and another.

* Provide a string interpolation facility to easily build up configuration values from other configuration values.

* Provide the ability to compose configurations (using include and merge facilities).

* Provide the ability to access real application objects safely, where supported by the platform.

* Be completely declarative.

It's similar to newer formats such as JSON5, HJSON, HOCON and similar but offers a number of features [0] which they don't, as indicated by the above list. It's not intended to occupy the niche where you find things like Cue, Jsonnet, Dhall and similar.

It was just never especially publicised when first implemented for use in Python projects, but it now also has implementations for the JVM, .NET, Go, Rust, D, JavaScript [1], Ruby and Elixir (all BSD-3-Clause licensed) and it would be great to get feedback on the project from the HN community.

[0]: https://docs.red-dove.com/cfg/intro.html - description of features and comparison with other similar systems

[1]: https://docs.red-dove.com/cfg/playground.html - uses the JS implementation to create an interactive playground


I will never willingly use YAML on a project ever again. It's a bloody disaster. I would actually prefer XML over YAML.


I like to write Python to manage JSON.

Keeping the data and the logic orthogonal (these effin' Fn::'s make my eyes bleed) is a win.


YAML isn't the best, but it's the least worst.

INI doesn't let me do hierarchies and is out right out of the gate.

I started my career during the "XML ALL THE THINGS" craze and feel physically ill when I need to edit XML. It's technically the best, but it can be misused in so many ways[0] that it makes it unviable. XML Schemas, DTD etc make it a breeze to handle programmatically though.

JSON might be good, but it doesn't have first-class comments and no links or refers. It even kinda has a schema spec.

YAML has a ton of edge cases (like in the linked post), but it's hierarchical, human editable and machine readable-ish as long as you take the edge cases into use. You can also link and refer elements to reduce repetition.

[0] elements with no values, but 20 attributes, too many elements with values, but no attributes, etc etc...


Many commends are (correctly IMO) pointing to JSON as superior to YAML given YAML's problems, but these ignore JSON's problems due to its lack of semantics. When moving JSON between systems, the JSON parser makes certain assumptions about the JSON document, such as whether the numbers represent integers or single-precision or double-precision floats. This is certainly less obnoxious than "no" being treated as a bool, but it is a problem.

A friend of mine created Preserves[0] in part to resolve this problem. I'd encourage everyone to check it out. To be fair, he's also quite pro-Dhall and Nix.

[0] https://news.ycombinator.com/item?id=34354614


The simple answer to that is to store anything that may not fit in double without loss as a string. (I don't know of JSON decoders which use floats)

After all, one of the biggest upsides of JSON is its wide language/tool compatibility. And that means intentionally limiting the format to the most basic subset of types that all languages have.


Amazing that developers still spend so much time getting bogged down in encoding issues in 2023. Why can't we just edit some data structures directly and send send them around seamlessly? It's a deeply dysfunctional state of affairs.


For that, you'd need a set of commonly accepted data structures.

Alternatively, you can describe and send arbitrary data structures. Although that begs the question of how to process arbitrary data structures. (JSON and XML aren't arbitrary, for that matter)


For a human-readable, schema-free data representation, I really like Amazon's Ion format (disclosure, I'm an Amazon employee). It is basically JSON + additional types (integers, timestamps, symbols, sexpressions), and support for humane features like trailing commas and comments. It also has a more efficient binary representation. I think it could have been a contender had it been released a decade earlier, but now I think the existing formats are way too entrenched.

https://amazon-ion.github.io/ion-docs/index.html


For managing configs of ML experiments (where each experiment can override a base config, and "variant" configs can further override the experiment config, etc), Hydra + Yaml + OmegaConf is really nice.

https://hydra.cc/

I admit I don't fully understand all the advanced options in Hydra, but the basic usage is already very useful. A nice guide is here:

https://florianwilhelm.info/2022/01/configuration_via_yaml_a...


This is why configurations as code should be written in actual programming languages. Dhall, Nix, and Pulumi are all solutions geared towards configuration but written with fully capable, user-extensible, programming languages.


Don't forget Lua! Can be easily sandboxed, meaning you have full control over which functionality is made available to the user. Also, the Lua interpreter itself fits in less than 300kB and can be easily embedded in any language that supports C extensions or FFI, so people don't even need to write their own parsers.


Ansible collection writer here, pray for me...


What are the practical modern alternatives for YAML use cased? TOML, others?


People often forget about the environment, and I feel like a lot of the simpler services and applications could be configured through it easily.

For anything that requires more complex structures, JSON5 (https://json5.org/) has been gaining traction. It's basically JSON that allows a couple of more things, including comments, trailing commas, and even unquoted keys, if you're into that.


This for this it indeed looks very much better than json or yaml.


For simple use cases like key-value style config you can use JSON or TOML without too many issues.

For IaC the alternative is general purpose programming languages using declarative infrastructure APIs. E.g. TS, Go, Python with CDK, CDKTF, Pulumi. For CI are stuck with YAML for now.


As another top-level comment mentions, one alternative is NestedText, where everything is a string unless the ingesting code says otherwise, and there is no escaping needed at all.


TOML is good for simple plain structures. As soon as you have more nested structures, you can also use json.


edn

Extensible Data Notation, it's to Clojure what json is to Javascript.


Read the article?


Welcome to dynamic typing hell.

As long as you ask your configuration language to infer the data type from a piece of text, these are the kinds of edge cases you get to experience. (They vary by language. Less types means less mistakes, and so does less expressive power - but without some form of type hinting, that's where you are)

Personally, I like YAML for expressive simplicity at the start of a project, and then if it becomes an actual load bearing piece of software (vs a haphazard one off, or an experiment), let's add some schema validation & typing.


I was looking for a human-friendly format to store data for a staticly generated website. Stuff like copy, lists, addresses, contact information (so that the rendering logic is kept separate).

Landed on NestedText, and apart from it having few implementations, I loved it! https://nestedtext.org/en/stable/

I did feel like I missed a more popular alternative, because I can't be the only one to want a human-friendly data format.


I'm a huge fan of NestedText, especially as there is no escaping needed ever.

If you ever want to use it as a pre-format to generate either TOML, JSON, or YAML, I used the official reference implementation to make a CLI converter between them and NestedText.

When generating one of these formats, you can use yamlpath queries to concisely but explicitly apply supported types to data elements.

- My CLI converter: https://github.com/AndydeCleyre/nestedtextto

- yamlpath info: https://github.com/wwkimball/yamlpath/wiki/Search-Expression...


Wow, that's really versatile! Thanks for sharing! `json2nt` seems like a great alternative to `jq .` for pretty printing.


I remember the first time I played with YAML ten-ish years ago. I experimented with converting our configuration file from INI to YAML. It came with an actual noticeable increase to our app startup time in the tens of milliseconds.

The YAML evangelist in my company when asked told me I should cache the results… of reading our config file… from the local filesystem. What do I store that cache as? JSON? Just cut out the middle man. Left a really bad taste in my mouth at the time I have yet to get out.


“YAML evangelist” sounds terrifying, and I have never met one in real life. Most people who used YAML basically resorted to the old argument of “Well JSON doesn't have comments, so YAML it is”.


The way to solve YAML issues is to invent some widely used yaml pretty printer and shout at everyone not using it.

There's `yq -P` but it does not implement proper normalization for plenty of mentioned issues which is a shame, really.

As I see it, everyone can just start with JSON (as JSON is a subset of YAML) and pretty-print it and use the result whenever it's supposed to be used.

In other words use safe subset of YAML. May be it'll be formalized some day. Yamlite.


That safe subset exists and is implemented in a number of languages. It is called strict-yaml: https://hitchdev.com/strictyaml/


In which languages is it implemented? I have seen a supposedly equivalent Rust library, but could not find anything else. Is there any C library or something like that? Unfortunately, these are the easiest to use from a lot of languages because of how easy they are to wrap.

(I am genuinely interested in any useful library better than libyaml).


They don't use quotes, so not safe.


it is an old song.

JSON (as per spec) is a lightweight data-interchange format.

YAML is a "human-friendly data serialization language for all programming languages" and by mere coincidence (it does not look to be by design until version 1.2.2 [0]) JSON is its subset and not vice versa!

And YAML is human-friendly in two ways:

- it is notoriously "human" in interpreting input, i.e. ignoring the context or misinterpreting others' words;

- it helps read serialized data via indentation-based syntax.

The biggest caveat is for whatever reason everyone ignores the key highlights of both JSON and YAML that are "data-interchange format" and "data serialization language". If I were to interpret these two terms I would rather say they both present great means for heterogenous programs to exchange data with YAML being a fancier brother better suited for human eyes. They were not meant to be used for configuration purposes or what not. But people started using them exactly for this purpose instead of creating a configuration format/language that would render into program-readable JSON(if needed).

CUE and Dhall are perhaps the first two attempts to create specifically a configuration language with the ability to serialize in yaml/json/etc.

I very much hope using YAML, JSON and even TOML for writing configurations will become an anti-pattern soon and will be replaced by something more suited for this role.

[0] https://yaml.org/spec/1.2.2/#12-yaml-history


If you only need something for internal use, nothing prevents you from using s-expressions, and interpret atoms (number -> uint16_t, dates in "yyyy-mm-dd") whatever your application feels like... With this lil' trick all parsing problems are gone. Inferior Format Users will be jealous, and Inferior-Format-Parser-Writers will hate you


I'm surprised nobody mentions an alternative called NestedText. It only supports string type, which puts the complexity and responsibility of typing with the application. I think it's less prone to parsing errors.

https://nestedtext.org/en/stable/


> The key on is common in the wild because it is used in GitHub Actions. I would be really curious to know whether GitHub Actions’ parser looks at "on" or true under the hood.

I can say, at least, that using both on: and 'on' work in GHA. I usually write the latter, makes it simpler to do ad-hoc YAML.parse or whatever on the workflow files.


Someone needs to write

YAML: The Good Parts

If you're strict about how you use it, it's more than workable. And, yes, the Norway problem is insane.


I will stick to JSON or TOML if the choice was mine. Hell, I'd much rather deal with XML than deal with YAML.


I've ways preferred HOCON to YAML but it never took off :-/

It's a pity, because compared to YAML it's a lot simpler from the rules perspective while being flexible enough to do some useful stuff over plain JSON

Also, it isn't white space dependent, something which annoys me know end with YAML


> Templating yaml is a terrible, terrible idea

I've had a good time using ytt: https://carvel.dev/ytt/. It implements language-aware templating, which is IMO the only reasonable way to do it.


Language aware is definitely important. But honestly I've usually found it easier to use a "real programming language" to generate the structures. You can read and write as normal. Then you can just serialize them to YAML at the end.

Helm is so bad. It has a `| quote` function that is almost always wrong. People use it to escape strings into YAML strings but it just adds quotes around the values. It doesn't actually escape to YAML. I had a surprising hard time convincing people that they should be using `| toJson` and ideally be doing it for every interpolation.

But really working with text is just an inconvenience here. You can imagine how much better everything would work if your base template was `console.log(JSON.stringify({ /* your config here */ }))`


I've always hated YAML, and now I'm glad there's a very nicely written article to back me up.

I also notice that I have a preference to procedural as opposed to declarative specifications. E.g. I find conditionals and loops in declarative specifications quite bad.


We need to go back to xml. Admit it.


What would be good is “strict yaml”, that is, yaml with all those weird cases strictly disallowed?

Which would make it backwards compatible, as the cases would just make it invalid “strict yaml”?

And I’m thinking it might already exist? I’m surely not the first one to think of that


It exists, and it is called "JSON" :)


Call me naive or quote xkcd, but whenever I read about the shortcomings of YAML, I daydream about a new standard that's mostly a subset of YAML, that looks like this:

    rules:
     - "All strings must be quoted and are parsed identically to JSON strings"
     - "true and false are the only boolean values"
     - """multiline strings
    are allowed"""
     - """|
       there can even be special multiline string types
       to indicate things like "dedent every line and 
       trim the beginning and end lines if empty
       """
     # Comments are allowed!
     - Number syntax just like JSON
    
    better_than_yaml: true
    "a number with multiword key": 1e9
    nesting:
      present: true
      difficult: false
It has the ease of coding of yaml, maps 1:1 with json (so can easily be re-serialized to json or yaml if needed), is familiar to folks who know yaml, and supports comments. And it eliminates ambiguous parses and gotchas, references, tags/directives, and other complexities.


If Python is valid option then lisp syntax should also be considered as a viable alternative.

Heck the number of times I've seen attempts at creating a lisp in yaml, it is easier to just get it over with and make it a lisp in the first place.


Almost all of the issues mentioned would be solved by using quoted strings.


> There were two changes to it in 2005 (the removal of comments, and the addition of scientific notation for numbers)

They actually had comments in JSON but removed them in 2005. That is odd and a bit annoying :-)


> Leaving strings unquoted can easily lead to unintentional numbers.

It seems like this sums up all the issues pointed out by the article. Ie. you’re safe as long as you quote your strings.


Yaml is the complete opposite of what you want: a simple specification language. JSON is the gold standard to measure against. However one step better IMHO is S-expressions.


Just say Norway to YAML!


YAML is the worst format I've had the displeasure of working with. Given how ubiquitous it is, I don't see it going away soon, but I am still hopeful.


An unquoted version 3.10 in yaml becomes 3.1. Fun stuff!


I am not a fan of LISP but I use S-expressions all the time for configuration files, dumping data etc. It is simple yet infinitely flexible.


Strict yaml fixes this, I recommend it. Before that I disabled as many of yaml’s features in the library I was using. Look for “safe parsing.”


Is there a tool that parses yaml and outputs json, so you could check if the yaml you wrote parses as you intent?


I guess this can easily be scripted with https://pypi.org/project/yq/ Internally this seems to be a wrapper around jq :)


I use https://kislyuk.github.io/yq (it also includes and 'xq' command, which converts XML to JSON)


gojq (https://github.com/itchyny/gojq) and fq (https://github.com/wader/fq) can read YAML and output JSON.

But that won't nescessarily help, because other parsers can intepreret that YAML differently.


Let's not forget: JSON is also valid YAML

YAML, OpenSSL, and ASN.1 are what happen when every feature in the world is added.


This is why I invented KYLI - https://shkspr.mobi/blog/2017/03/kyli-because-it-is-superior...

People shouldn't be hand-editing these sorts of configuration documents. 21st century tooling would prevent so many of these "foot-guns".


This one liners lead to unreadable configuration. With tools, the format is not important. But we did not talk about tools, we talk about that humans can easy read and write a configuration.


A text editor is a tool. Text files are only "human readable"/"human editable" in the sense that such tools are ubiquitous whereas structured data editors are not.

Which is a real shame. Entering, manipulating and validating small amounts of structured data is such a fundamental task when using computers that it deserves dedicated tools, and the sheer amount of issues that only exist because such tools are not ubiquitous is absolutely mindboggling.


While it is indeed total insanity it could easily be avoided by putting quotes around all string values.


This explains a lot! I just know my docker compose files were always a pain and unintuitive to write.


i want *any* configuration format that can rewrite itself(loaded and written to disk again) with two properties we've seem to have lost when we switched away from xml.

1. keep ordering of document 2. keep comments of document

Does anything like that exist?


I honestly don't understand why people still write YAML considering there are alternatives like TOML available.

I think attributing semantic value to indentation is a terrible idea. It gets even worse when there is so much visual ambiguity when it comes to defining values and escaping certain character sequences.


Secret fact: Hackernews data is stored as a huge YAML file on disk.


But, of course, _all_ yaml documents are from hell.


so the suggestion is: use toml for configuration, otherwise just stay with json?

there is jsonc which is json+comments but I have not used.


"Use the EDN, Luke!"


I saw NixOS


"YAML is Human-readable"

"You must count whitespace for nesting"

Pick one or the other, but not both.


You don't need to count whitespace for nesting:

    test:
        something:
        complicated:
            {even: { more: { levels: {can: { be: {achieved: {like: {this: true } } } } } } } }
Whitespace nesting also doesn't seem to be a problem for Python programmers. I don't prefer it myself (especially because of the tabs vs spaces problem) but it's not good or bad.


Extra whitespace before 'something' in your example would cause an issue.

> Whitespace nesting also doesn't seem to be a problem for Python programmers.

It does cause problems, IDE's mitigate the issue, the same as they do for YAML. However YAML is supposed to be a human readable data-serialization language and If you need an IDE to make it human readable then that defeats the point.

I can't imagine ever being convinced that having to 'count the number of invisible things to determine what something is' is human readable even if you can use other lines as reference points.


Every day we stray farther from the light of XML.

At this point, it’s just plain ridiculous.


TLDR: unexpected conversions to other types in YAML 1.1.

A larger JSON document is hell to write and read, though.


JSON is arguably much worse for usage in Git. Have fun matching braces in the default diff tools out in the wild.

YAML is the much better alternative with similar semantics to JSON. The numeric issues still hold true though.


[flagged]


Did you create this account just to place an insulting comment? Be kind


you should be kind to idiots? then how will they learn that they are idiots?


Indeed, we're being too kind to you.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: