> If you’re a Software Engineer/Developer, then consider that a service (at least, for me), is a piece of code running in a live production system, that YOU wrote, only YOU know how it works, thus YOU own.
This is the single biggest truth in the article, and I'm glad to see it stated so clearly. Shout it from the rooftops, please. It's a direct logical consequence, too, and yet so many people seem to make decisions that violate it.
I field so many questions about "why is service X doing Y?" Have you asked the service owners?
Unfortunately, I've found you more or less have to become proficient at rapidly understanding services you don't own, because getting other people to act logically is a fool's errand.
> Are you logging to stdout ?
Nooooo to stderr, that's literally what it is there for. (As the C standard puts it, stderr is "for writing diagnostic output", and logs are exactly that.) Also, stdout is sometimes buffered, and (IMO) you don't really want that for logs.
Any output-producing program needs stdout for its output, and you can't commingle logs with that and still have piping work. While it's unlikely that your production service produces output on stdout, there's no reason to handle its logs any differently. (I'd say part of being a good production service is "don't be needlessly special".)
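A minimal sketch of that split, in Python (chosen only for illustration; the program and the logger name `upcase` are hypothetical): the program's real output goes to stdout, while diagnostics go through the `logging` module to stderr, so anything downstream in a pipe only ever sees output.

```python
import logging
import sys
from typing import Iterable, TextIO

# Hypothetical logger name for this toy program.
log = logging.getLogger("upcase")


def configure_logging(stream: TextIO = sys.stderr) -> None:
    """Route log records to stderr (or another stream, e.g. in tests)."""
    handler = logging.StreamHandler(stream)  # StreamHandler flushes after every record
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)


def run(lines: Iterable[str], out: TextIO = sys.stdout) -> int:
    """Upper-case each input line: results go to `out`, diagnostics to the logger."""
    count = 0
    for line in lines:
        out.write(line.upper())
        count += 1
    log.info("processed %d lines", count)
    return count


def main() -> None:
    """Entry point: stdin -> stdout, logs -> stderr."""
    configure_logging()
    run(sys.stdin)
```

If a hypothetical `upcase.py` calls `main()`, then `printf 'a\nb\n' | python upcase.py | sort` hands only `A` and `B` to `sort`, while `INFO upcase: processed 2 lines` shows up on the terminal. `logging.StreamHandler` also flushes after each record, which keeps log lines from sitting in a buffer when the process is killed.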
(That said, our tooling just captures both streams and muxes them together anyway, so it doesn't matter much, unless buffering means the last error logs never make it out before your service is killed.)
Also: your infra team provides the metrics service, but you need to capture your own metrics. My metrics provider doesn't have a crystal ball; it cannot peer into your service's memory and pull out critical stats. You must push them yourself. Talk to your infra team; they can show you the API they use… (We collect common machine-level stats, like CPU usage, and things about your service that are easily visible from outside, like per-container memory usage. But not your reqs/sec.)
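To make "push them yourself" concrete, here is a rough Python sketch of an in-process request counter that only the service itself can see. The `push` callable is a hypothetical stand-in for whatever API your infra team actually provides, and the metric names `requests` and `reqs_per_sec` are made up for illustration.

```python
import threading
import time
from typing import Callable, Dict


class RequestCounter:
    """Counts requests in-process; no external collector can see this number."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._count = 0
        self._since = clock()

    def inc(self) -> None:
        """Call once per handled request."""
        with self._lock:
            self._count += 1

    def snapshot(self) -> Dict[str, float]:
        """Return the count and reqs/sec since the last snapshot, then reset the window."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._since
            rate = self._count / elapsed if elapsed > 0 else 0.0
            result = {"requests": float(self._count), "reqs_per_sec": rate}
            self._count = 0
            self._since = now
        return result


def flush_metrics(counter: RequestCounter,
                  push: Callable[[Dict[str, float]], None]) -> None:
    """Hand the current window to `push`, a hypothetical wrapper around your metrics API."""
    push(counter.snapshot())
```

In a real service you would call `flush_metrics` on a timer (say, every 10 seconds) from a background thread; the lock keeps `inc()` calls from request handlers consistent with the snapshot.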
> Nooooo to stderr, that's literally what it is there for.
Bah… use syslog() (or anything that speaks the same protocol) and you get a priority, the daemon's name… and if you step up to journald, you get to log key/value pairs.
Of course, most Go developers have never heard of syslog() and think logging means writing to stdout and then running a bunch of parsers to extract information that would have been there to begin with, had they used proper logging.
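For a sense of what "you get priority, name of the daemon" means on the wire, here is a Python sketch that builds a BSD-style (RFC 3164) syslog line: the priority is `facility * 8 + severity`, followed by a timestamp and a `tag[pid]:` prefix. This is roughly the layout glibc's `syslog()` writes to `/dev/log`, but treat it as an approximation; in real code you'd use Python's `syslog` module (`syslog.openlog`, `syslog.syslog`) or `logging.handlers.SysLogHandler` rather than hand-building lines. The tag `myapp` in the usage below is hypothetical.

```python
import os
from datetime import datetime
from typing import Optional

# RFC 3164 facility and severity codes (a subset).
FACILITY = {"kern": 0, "user": 1, "daemon": 3, "local0": 16}
SEVERITY = {"err": 3, "warning": 4, "notice": 5, "info": 6, "debug": 7}


def bsd_syslog_line(facility: str, severity: str, tag: str, message: str,
                    pid: Optional[int] = None,
                    now: Optional[datetime] = None) -> str:
    """Build '<PRI>Mmm dd hh:mm:ss tag[pid]: message' (hostname omitted, as local senders do)."""
    pri = FACILITY[facility] * 8 + SEVERITY[severity]
    now = now if now is not None else datetime.now()
    # Day of month is space-padded to two characters, e.g. 'Mar  5'.
    stamp = f"{now:%b} {now.day:>2} {now:%H:%M:%S}"
    pid = os.getpid() if pid is None else pid
    return f"<{pri}>{stamp} {tag}[{pid}]: {message}"
```

For example, `bsd_syslog_line("daemon", "info", "myapp", "started", pid=1234, now=datetime(2024, 3, 5, 14, 7, 9))` yields `<30>Mar  5 14:07:09 myapp[1234]: started`; the receiver can recover facility (30 // 8 = 3, daemon) and severity (30 % 8 = 6, info) without any parsing heuristics.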
We can argue about the exact implementation, but SRE requiring all apps (at least the non-interactive ones running on servers) to log to the same channel, with enough tagging info to identify the app and server, is a good thing. In my case we're automagically configured with a Splunk appender for Logback, and the platform also sends stdout/stderr to Splunk under a different sourceType.
Ah, yeah. My BE work is almost exclusively inside containers, where journald is not available.
(We could perhaps arrange that, but JSON-lines is typically good enough, and easier for devs to understand.)
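A minimal JSON-lines setup in Python might look like the sketch below: one JSON object per line on stderr, tagged with the app and host so a central collector can tell sources apart. The field names (`ts`, `app`, `host`, `msg`) and the `extra={"fields": {...}}` convention for key/value data are assumptions of this sketch, not any particular platform's schema.

```python
import json
import logging
import socket
import sys
from typing import Optional


class JsonLinesFormatter(logging.Formatter):
    """Render each record as a single-line JSON object tagged with app and host."""

    def __init__(self, app: str, host: Optional[str] = None) -> None:
        super().__init__()
        self.app = app
        self.host = host or socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": record.created,
            "level": record.levelname,
            "app": self.app,
            "host": self.host,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Key/value data passed as extra={"fields": {...}} (a convention of this sketch).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        # json.dumps escapes embedded newlines, so each record stays on one line.
        return json.dumps(entry)


def setup_json_logging(app: str, level: int = logging.INFO) -> None:
    """Attach a JSON-lines stderr handler to the root logger."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JsonLinesFormatter(app))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
```

Usage would be something like `setup_json_logging("checkout")` at startup, then `logging.getLogger(__name__).info("order placed", extra={"fields": {"order_id": 42}})`. The appeal over journald inside a container is that it needs nothing but a stream the platform already captures.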
(Note that the KV stuff requires speaking journald's native protocol. The syslog interface in systemd, like essentially everything I've ever seen speak syslog, uses the old BSD syslog protocol, which doesn't support KV data. Not that journald's protocol is particularly hard to speak.)
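As a rough illustration of how small journald's native protocol is, here is a Python sketch of the datagram encoding: each field is `NAME=value\n`, and a value containing a newline is sent as `NAME\n`, a 64-bit little-endian length, the raw bytes, then `\n`. The whole entry goes as one datagram to `/run/systemd/journal/socket`. This sketch skips the memfd path journald uses for very large entries, and the name check is deliberately conservative (uppercase letters, digits and underscores, starting with a letter). In C you'd normally just call `sd_journal_send()` from libsystemd. Fields like `ORDER_ID` in the usage below are hypothetical.

```python
import re
import socket
import struct
from typing import Mapping

JOURNAL_SOCKET = "/run/systemd/journal/socket"
# Conservative field-name check: uppercase letters, digits, underscores; starts with a letter.
_FIELD_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,63}")


def encode_journal_fields(fields: Mapping[str, str]) -> bytes:
    """Encode one journal entry in the native datagram format."""
    out = bytearray()
    for key, value in fields.items():
        if not _FIELD_NAME.fullmatch(key):
            raise ValueError(f"invalid journal field name: {key!r}")
        data = value.encode("utf-8")
        if b"\n" in data:
            # Binary-safe form: NAME\n, little-endian u64 length, raw value, \n
            out += key.encode("ascii") + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            out += key.encode("ascii") + b"=" + data + b"\n"
    return bytes(out)


def send_to_journal(fields: Mapping[str, str], path: str = JOURNAL_SOCKET) -> None:
    """Send one entry as a single datagram; only works where journald is running."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_journal_fields(fields), path)
```

A call like `send_to_journal({"MESSAGE": "order placed", "PRIORITY": "6", "SYSLOG_IDENTIFIER": "checkout", "ORDER_ID": "42"})` would make `ORDER_ID` queryable with `journalctl ORDER_ID=42`, assuming the host socket exists, which, as noted above, it typically doesn't inside a container.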