> And why do devs do this? TBH, I really don't know.
it's one of the pillars of the unix philosophy at work - text is the universal interface.
your nice rich type requires figuring out how that type works, maybe even reading documentation. if you make it a string, you can just treat it like a string. and we all know how to handle strings, or at least how to copy code from somewhere else that handles strings.
C#'s System.Object base type provides a ToString() method, allowing any richer object to fall back to a plain string representation for such systems if required.
I'd also caution that "everything is a string so no need to read the documentation" is unrealistic and even outright dangerous. *NIX's string philosophy has directly resulted in multiple security vulnerabilities.
I'm not so sure about the documentation part, but this seems valid at face value:
> it's one of the pillars of the unix philosophy at work - text is the universal interface.
In general, text is indeed the lowest common denominator that's available to you, and it fits nicely with the majority of GNU tools (and with how most *nix distros are structured). Of course, some structure helps - dealing with something like Nginx or Apache2 text logs might not be as nice as having your logs in JSON. With configuration files or program output there's lots of nuance: structured text (like JSON) is easier to work with programmatically but might not be as nice to look at. Working with binary data, in comparison, can be asking for problems, to the point where using SQLite for your application data might make more sense: https://sqlite.org/appfileformat.html
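To make the tradeoff concrete, here's a minimal sketch - the Nginx-style log line, the JSON field names and the regex are made up for illustration, not any tool's actual format. The text form is pleasant to eyeball, but getting a field out means hand-rolling a parser; the JSON form hands the fields to you directly:

```python
# Hypothetical examples: one access-log entry as classic text and as JSON.
import json
import re

text_line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
json_line = '{"remote_addr": "127.0.0.1", "time": "2023-10-10T13:55:36Z", "request": "GET /index.html HTTP/1.1", "status": 200, "bytes": 2326}'

# Text: readable at a glance, but extracting a field means writing a parser
# that silently breaks the moment the format changes.
pattern = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+)$')
match = pattern.match(text_line)
status_from_text = int(match.group(4)) if match else None

# JSON: less nice to look at, but every consumer gets the same fields
# without writing its own parser.
status_from_json = json.loads(json_line)["status"]

print(status_from_text, status_from_json)  # 200 200
```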
In this particular case, however, the conversions were made within the confines of a single program, so the interop is indeed not a valid concern here. A properly written type might indeed allow you to avoid various inefficiencies. If you're not serializing your objects and whatnot to make the life of the consumer of your data easier (be it another tool, or a REST/SOAP API client that needs JSON/XML, or a configuration file on the file system), then there's not that much point in doing those conversions.
Unfortunately "text is the universal interface" is pretty much a lie, for the exact same reason that "bytes in memory are the universal interface" is a lie.
An interface has some kind of structure for the data (and operations) that it represents. That structure may be explicitly defined or left implicit, but it is always there. And in the vast majority of cases, that structure is more complex than "a single independent string".
That's where the whole "pipe text between processes" approach breaks down - in many cases, you aren't just dealing with a single string, but with some sort of structured data that just happens to be encoded as human-readable text. That it's "text" is entirely immaterial to the structure of the data, or how to work with it.
And that's how you end up with every CLI tool implementing its own set of subtly incompatible data parsers and stringifiers. This is strictly worse than having a standardized structural format to communicate in.
> That structure may be explicitly defined or left implicit, but it is always there.
But this is something that every piece of software out there has to deal with. By this reasoning, claiming that RESTful APIs can evolve over time and don't need something like WSDL (though OpenAPI still exists) is also a similar lie, because there absolutely are assumptions about how things are structured by anyone and everyone who actually needs to integrate with the API. Even "schema-less" data storage solutions like MongoDB are then also built on similar lies, because while you can throw arbitrary data at them to store it, querying the data later will still depend on some assumptions about the data structure.
I'm inclined to agree, and yet, RESTful APIs are still very popular and seem to have much success (though some push for more "immutable" APIs that are versioned, which is nice), as do "schema-less" data stores in certain domains. Edit: you can also say that about everything from CLI tools to dynamic programming languages as a whole. I think there's more to it.
> And that's how you end up with every CLI tool implementing its own set of subtly incompatible data parsers and stringifiers. This is strictly worse than having a standardized structural format to communicate in.
I suspect that this is where opinions might differ. Half-assing things and counting on vaguely compatible (at a given point in time) implementations has been the approach on which many pieces of software are built to solve numerous problems. On one hand, that is ignorant, but on the other hand - how often is the output from "ls -la" going to change its format, really? The majority of CLI tool writers seem to understand how brittle and bad everything is and thus treat their data formats carefully, adding a whole bunch of different flags as needed, or allowing for custom formats, like: "lsblk --output UUID,LABEL,SIZE"
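As a rough sketch of that last point (assuming a Linux machine with a reasonably recent util-linux lsblk - the --json output only exists in newer versions), the same tool offers both the ad-hoc column format and structured output, and the difference shows up in how much parsing the consumer has to do:

```python
import json
import subprocess

# Ad-hoc text: pick the columns, then split on whitespace and hope the values
# never contain spaces themselves.
text_out = subprocess.run(
    ["lsblk", "--list", "--noheadings", "--output", "NAME,SIZE"],
    capture_output=True, text=True, check=True,
).stdout
rows_from_text = [line.split() for line in text_out.splitlines() if line.strip()]

# Structured output: the shape is defined by the tool (partitions end up nested
# under "children"), not by our splitting logic.
json_out = subprocess.run(
    ["lsblk", "--json", "--output", "NAME,SIZE"],
    capture_output=True, text=True, check=True,
).stdout
devices = json.loads(json_out)["blockdevices"]

print(rows_from_text)
print([(d["name"], d["size"]) for d in devices])
```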
For many out there, that's good enough. At least to the point where something like PowerShell isn't mainstream, despite some great ideas in it (or other shells that would let you work with objects, instead of text).
> Even "schema-less" data storage solutions like MongoDB are then also built on similar lies, because while you can throw arbitrary data at them to store it, querying the data later will still depend on some assumptions about the data structure.
They are. I have been complaining loudly about this for years :)
> I'm inclined to agree, and yet, RESTful APIs are still very popular and seem to have much success (though some push for more "immutable" APIs that are versioned, which is nice), as do "schema-less" data stores in certain domains. Edit: you can also say that about everything from CLI tools to dynamic programming languages as a whole. I think there's more to it.
The part that I think you're missing is the difference between structure and schema. It's entirely valid to leave the schema implicit in many circumstances (but not all - generally not in a database, for example), but you still need the general structure of how data is represented.
When you deserialize something from JSON, CBOR, whatever, you will get back a bunch of structured data. You may not know what it means semantically, but it's clear what the correct (memory) representation is for each bit of data, and that's exactly the most fragile part of dealing with data, which is now solved for you.
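A small illustration of that "structure without schema" point - we don't need to know what this made-up payload means, but after parsing, every value already has a sensible in-memory representation, so the fragile decoding step is already done:

```python
import json

# Arbitrary payload we've never seen before; no schema known up front.
payload = '{"id": 42, "tags": ["a", "b"], "active": true, "meta": {"note": "anything"}}'
data = json.loads(payload)

# The structure is already unambiguous, even though the meaning isn't.
print(type(data))            # <class 'dict'>
print(type(data["id"]))      # <class 'int'>
print(type(data["tags"]))    # <class 'list'>
print(type(data["active"]))  # <class 'bool'>
```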
You do not get the same benefit when passing around strings in ad-hoc formats that need custom parsers; it is very easy to mess up that fragile part of data handling, which is why parsers are so infamously difficult to write, whereas working with parsed data structures is generally considered much simpler.
Likewise, from a usability perspective, it's fairly trivial to let the user pass eg. a dotpath to some tool to select some nested data from its input; but letting the user pass an entire parsing specification as an argument is not viable, and that's why most tools just don't allow that, and instead come with one or more built-in formats. If those supported formats don't match between two tools you're using, well, sucks to be you. They just won't interoperate now.
Essentially, "standardized structure but flexible schema" is the 'happy compromise' where the most fragile part is taken care of for you, but the least predictable and most variable part is still entirely customizable. You can see this work successfully in eg. nushell or Powershell, or even to a more limited degree in tools like `jq`.
> I suspect that this is where opinions might differ. Half-assing things and counting on vaguely compatible (at a given point in time) implementations has been the approach on which many pieces of software are built to solve numerous problems. On one hand, that is ignorant, but on the other hand - how often is the output from "ls -la" going to change its format, really? The majority of CLI tool writers seem to understand how brittle and bad everything is and thus treat their data formats carefully, adding a whole bunch of different flags as needed, or allowing for custom formats, like: "lsblk --output UUID,LABEL,SIZE"
Speaking as someone who is currently working on a project that needs to parse a lot of CLI tools: the formats still change all the time, often in subtle ways that a human wouldn't notice but a parser would, and there's effectively zero consistency between tools, with a ton of edge cases, some of them with security impact (eg. item separators). It's an absolute nightmare to work with, and interop is just bad.
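One concrete example of the separator problem, as a sketch - the directory and the awkward filename are made up, and exact ls behaviour varies between implementations: a single filename containing a newline makes line-based parsing of ls output see two entries where there is one. This class of problem is also why null-separated modes like `find -print0` and `xargs -0` exist.

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as d:
    # One file, with a newline embedded in its name.
    open(os.path.join(d, "report\n.txt"), "w").close()

    # Line-based parsing of ls output: the single entry shows up as two lines.
    ls_lines = subprocess.run(
        ["ls", d], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    real_entries = os.listdir(d)

    print(len(ls_lines), len(real_entries))  # typically 2 vs 1
```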
> For many out there, that's good enough.
I think there's a strong selection bias here. It's not without reason that so many people have an aversion to terminals today - they just aren't very good. And it's not just their terminal-ness either, because many people who cannot deal with a standard (eg. bash) shell will happily use terminal-like input systems in specialized software.
It's certainly true that most people who use terminals on a daily basis consider this paradigm good enough. But that's probably because those who don't, just stop using terminals. We should be striving to make tech better and more accessible, not just "good enough for a bunch of Linux nerds", and certainly not upholding "good enough" paradigms as some sort of virtue of computing, which is what happens when people say "text is the universal interface".
Of course changing this is a long process, and there are very real practical barriers to adoption of other interop models. But that's not a reason to downgrade the 'ideal' to fit the current reality, only a reason to acknowledge that we're just not there yet.
> It's entirely valid to leave the schema implicit in many circumstances (but not all - generally not in a database, for example), but you still need the general structure of how data is represented.
> ...
> Essentially, "standardized structure but flexible schema" is the 'happy compromise' where the most fragile part is taken care of for you, but the least predictable and most variable part is still entirely customizable. You can see this work successfully in eg. nushell or Powershell, or even to a more limited degree in tools like `jq`.
That's a fair point, thanks for putting emphasis on this!
> I think there's a strong selection bias here. It's not without reason that so many people have an aversion to terminals today - they just aren't very good. And it's not just their terminal-ness either, because many people who cannot deal with a standard (eg. bash) shell will happily use terminal-like input systems in specialized software.
The question then becomes: what can reasonably be done about it? Or rather, can it even realistically be achieved (on any time scale that we care about), given how much of the culture and tooling is currently centered around passing text from one command to another? Change for something so foundational surely wouldn't be quick.
> But that's not a reason to downgrade the 'ideal' to fit the current reality, only a reason to acknowledge that we're just not there yet.
In the end, this probably sums it up nicely, even if that current reality, which will often be the lowest common denominator, is what most people will earn their paychecks with (possibly leaving edge cases for someone else to deal with down the line).
> The question then becomes: what can reasonably be done about it? Or rather, can it even realistically be achieved (on any time scale that we care about), given how much of the culture and tooling is currently centered around passing text from one command to another? Change for something so foundational surely wouldn't be quick.
It definitely won't be quick, no. I think the tooling is actually not that big of a problem there; the functionality provided by the 'common tools' generally isn't that complex in scope (at least, in the context of modern development tools), and the many RIIR projects have shown that total do-overs for these sorts of tools are viable.
The bigger problem is going to be cultural, and particularly that persistent (but often unspoken) belief that "computers peaked with UNIX in the 70s/80s and what we have now is the best it will ever be". It often results in people actively pushing back on improvements that really wouldn't have any downsides for them at all, thereby creating unnecessary friction.
That same 'ideology', for lack of a better word, also makes it very difficult to talk about eg. hybrid terminal/graphical systems - because "GUIs are evil, text is better" and similar sentiments. That really gets in the way of advancing technology here.
Ultimately, I think the actual interop and/or reimplementation problems are going to be a walk in the park, compared to dealing with the messy human and cultural/ideological factors involved that actively resist change.
yeah, i'm not trying to say it's a good idea. but the parent asked why, and the answer is essentially because we're lazy, it's easy, and it works often enough that we can get away with it.
i figured the article we're all commenting on would be evidence enough of why it's not necessarily a good idea.