Some people have stated that GitHub Actions (or even GitLab CI) might now have a...

majkinetor · on June 30, 2022

> I know that people really enjoy Prometheus and Grafana, but personally it all seems to work decently

Those are for hardcore metrics - not for status metrics, which are trivial and which you only have a couple of. When you have thousands of metrics, Uptime Kuma and a bunch of friends won't help you.

Detailed app/infra metrics can also run on your own infrastructure, unlike status pages, which should use something independent. In your case, if your local Mattermost fails, you will get 0 notifications. It doesn't even have to fail; it's enough that your company Internet stops working (as happened to me today: 20+ services were not available, including the local statping [1] instance).

> even if Zabbix for example is a bit old fashioned,

It's not a problem that it is old-fashioned, but that it is harder than it needs to be - compared, for example, to telegraf/influxdb/grafana, its setup on both sides is far from trivial. With InfluxDB, I can send metrics even from PowerShell scripts with 0 effort [2].

[1]: https://github.com/statping/statping

[2]: https://github.com/majkinetor/psinflux
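To illustrate why pushing a metric to InfluxDB takes so little effort, here is a minimal sketch of building one point in InfluxDB's line protocol. It is written in Python rather than PowerShell and is not taken from the psinflux project; the function name `to_line_protocol` and the sample measurement, tag and field names are made up for the example. Sending the resulting line (for instance as the body of an HTTP POST to an InfluxDB write endpoint) is left out.

```python
def _escape(value, chars):
    """Backslash-escape each character in chars inside value."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes and quotes are escaped.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, fields, tags=None, timestamp_ns=None):
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags or {}):  # sorted tag keys keep the output stable
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"
    return line
```

For example, `to_line_protocol("disk", {"used_pct": 71.5}, {"host": "srv1"})` produces a single text line that any script able to make an HTTP request could submit.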

KronisLV · on June 30, 2022

> When you have thousands of metrics, Uptime Kuma and a bunch of friends won't help you.

This is fair! Actually Uptime Kuma still doesn't support a multi-user mode (e.g. one admin user, multiple users that can edit values/setup, maybe some for viewing data).

That said, I'm also at the scale where it makes perfect sense to use something this simplistic, and there are few things that give me more joy than running/building a container image and getting working software in less than an hour, which at my scale is also good for most if not all "day 2" concerns.

> Detailed App/Infra metrics can also run on your own infrastructure unlike status pages that should use something independent. In your case, if your local mattermost fails, you will get 0 notifications.

Another fair point! That said, there's very little preventing you from choosing the most boring and stable multi-cloud setup that you can find. A Docker container for the software, with a reverse proxy and connected to the aforementioned infrastructure monitoring.

Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification. Containers fail health checks? Might still need to work on this, but totally doable as well with minimal work.

Of course, there's also a lot of variability in how you can lay everything out - for example, I run some of my personal infrastructure from nodes that are in another room at my place, and most other parts off of rented VMs in a semi-local company. My homepage, for example, has both Uptime Kuma and an external monitoring service connected to it, just to compare how believable those values are.

At work, though? For development/test environments, Uptime Kuma on a separate server is enough (say, if you have one that controls the container cluster or aggregates other metrics, might as well spin up a simple container there), or any other software package that's necessary, like Apache Skywalking etc.

For production? Frankly, depending on what you're running, you might as well get a team of people together and come up with something that has proper redundancies in place, as well as a multi-cloud strategy.

majkinetor · on June 30, 2022

> Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification.

If you rely on cloud services, yes. If you run your own infra, then no, you will have to metric/alert that in a custom manner, as with everything else. So that thing you mention is NOT a boring technology (which should be promoted) but outsourcing (which should NOT get promoted in general).

> For development/test environments, Uptime Kuma on a separate server is enough

It doesn't matter, as your network will fail. There is nothing worse than a status page having false positives.

KronisLV · on June 30, 2022

> If you rely on cloud services yes. If you run your own infra, then no, you will have to metric/alert that in a custom manner as with everything else.

Consider this example:

  I have Zabbix on server A.
  I have an e-mail server on server B.
  I have Uptime Kuma on server C.
  I have an instance of Mattermost on server D.
  I have the application that I want to monitor on server E.

In a zero trust model (or even just running WireGuard) there is very little preventing you from having each of them on a different cloud provider. There's also very little preventing you from having a setup like A-D on a few boxes that sit under your desk/colocated somewhere but having E in the cloud.

Thus, one can reason about the potential failure states:

  If servers C-E run into issues (say, Docker issues), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers C-E are utterly unreachable (say, network interface problems), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers A-B or E run into issues, I'll get a notification thanks to C and D (Uptime Kuma sending a message).
  In the current configuration, I wouldn't be protected against a compound failure of A-D (both Zabbix and Uptime Kuma down), but those might as well run on different clouds, with different orchestrators.
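The reasoning above can be checked mechanically. The following is a rough sketch, not part of the original setup: it assumes each alerting path only works when both of its servers are up (Zabbix on A plus e-mail on B, Uptime Kuma on C plus Mattermost on D), and the names `MONITORS`, `notified` and `blind_spots` are invented for illustration.

```python
from itertools import combinations

# Each alerting path needs both of its servers and watches the other three.
MONITORS = {
    "zabbix_email": {"needs": {"A", "B"}, "watches": {"C", "D", "E"}},
    "kuma_mattermost": {"needs": {"C", "D"}, "watches": {"A", "B", "E"}},
}


def notified(failed):
    """Return True if an intact alerting path watches at least one failed server."""
    failed = set(failed)
    for monitor in MONITORS.values():
        path_intact = not (monitor["needs"] & failed)
        if path_intact and (monitor["watches"] & failed):
            return True
    return False


def blind_spots(servers="ABCDE"):
    """List every non-empty combination of failed servers that triggers no alert."""
    return [
        set(combo)
        for size in range(1, len(servers) + 1)
        for combo in combinations(servers, size)
        if not notified(combo)
    ]
```

Under these assumptions, `notified("CDE")` and `notified("ABE")` are both true, while `notified("ABCD")` is false. `blind_spots()` also shows that the gap is wider than a full A-D outage: any failure that breaks one server in each alerting path (for example A and C together) goes unreported, which is the compound-failure case mentioned above.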

Of course, you can set up failover and redundancy options, but by that point you're probably also looking into distributed file systems like GlusterFS or Ceph for any backing storage, and right now I don't need that complexity.

Furthermore, as you said, you can also rely on cloud services in addition to what you already have, so should A-D go down, then E will still be monitored by another solution as an alternative, though that's also hardly necessary for most things.

Hell, for all I care, I might as well have a Raspberry Pi on my desk that pings the servers, checks SSH connections, checks running Docker images, does a curl call and blinks and beeps aggressively when something isn't okay on servers that sit in a data center somewhere. It's not like there's not an endless amount of options. Of course, you can also go in the opposite direction and pick whatever is good enough, such as having A-B as a single server (or VM) and C-D as a single server (or VM), to not overcomplicate.
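A small watchdog like that needs very little code. The sketch below uses only the Python standard library and covers two of the checks mentioned: whether a TCP port such as SSH accepts connections, and whether an HTTP request (the curl-style call) succeeds. The function names are invented for the example, and the blinking/beeping and Docker checks are left out.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (e.g. port 22 for SSH)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ok(url, timeout=5.0):
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def run_checks(checks):
    """Run named zero-argument checks and return the names of those that failed."""
    return [name for name, check in checks.items() if not check()]
```

Usage could look like `run_checks({"ssh": lambda: tcp_open("server.example", 22), "web": lambda: http_ok("https://server.example/")})`, where `server.example` is a placeholder host; a non-empty result is the cue to start blinking and beeping.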

majkinetor · on June 30, 2022

I know you can have all that :) All I'm saying is that you must rely on externals if A-E are all on the same network, as it may go down. Then your e-mails or other notification channels won't work.

Be that as it may, I think people generally tend to overkill redundancy. One can usually tolerate most of the regular services going down for an hour or two once every couple of years...

KronisLV · on July 1, 2022

> All I say is that you must rely on externals if A-E are all on the same network as it may go down.

Thankfully, it's not too hard to take advantage of multiple networks in a hybrid/multi-cloud setup nowadays! Though, depending on the necessary access controls and auditing, such a setup might require slightly more work.

You do bring up an excellent point, though, about how it's a serious single point of failure in many systems out there, because personally I've also seen many setups like that (the majority of them, actually): I do suspect that in many cases that is indeed done for ease of use/convenience, even if it may lead to downtime.

Of course, in some cases downtime is acceptable, so I cannot argue against the idea that it can also make sense to choose such a simpler setup - for example, having your own company's applications/monitoring for development environments all on the same network.

Though if this topology is retained at scale, things can get a bit interesting. On a similar note, I recall Bryan Cantrill doing an interesting presentation "Debugging Under Fire: Keep your Head when Systems have Lost their Mind" that talked about restarting their whole data center and the implications of that: https://youtu.be/30jNsCVLpAE

GordonS · on June 30, 2022

I've been using Site24x7 for at least 6 years now, probably closer to 10. They have a "free forever" plan that's good enough for small projects: https://www.site24x7.com

It's been rock solid, haven't had a single issue.

I'm at the point where I now want a bit more, such as a status page, more frequent polling, monitoring from different continents and a record of historical outages. So I'm either going to pony up for a paid Site24x7 plan (£7/m), or self-host something myself - Kuma looks awesome BTW, I hadn't come across it before!

dmd · on June 30, 2022

OK, I'll bite: Why is Zabbix old-fashioned? What would you use instead? (I have ~60 bare metal servers of various vintages and OSes to monitor.)

KronisLV · on June 30, 2022

It's a sentiment that I've heard a lot - Zabbix focuses a lot on a core set of functionality, which is mostly just monitoring the state of some number of servers/VMs, with some optional integrations.

While the metrics that come out of the box are nice, everything does feel a bit cumbersome, personally - such as creating network maps (which aren't automatically generated), dashboards for getting input on CPU/RAM/storage at a glance, as well as viewing web monitoring statistics.

Actually, previously I used it for web monitoring, though for some reason by default it does not allow you to have triggers for those that would send an e-mail once a site goes down, even if the performance information and configuration were passable.

But overall it's just views after views, nested inside of tables with weird UI/UX choices, such as having Update/Add buttons, both of which are necessary to persist changes (for example, in the web monitoring section when adding new monitoring steps and also wanting to save those changes).

Like another commenter suggested, there are systems out there that make ingesting and visualizing data easier and sometimes also more pleasant... Though personally Zabbix is fine for what I need.

jpeeler · on June 30, 2022

What made you go with Mattermost for relaying messages to your phone?

KronisLV · on June 30, 2022

Mostly the fact that I already had an instance up and running for chatting with some people and because it was pretty trivial to integrate Uptime Kuma with it.

Sending e-mails would have also been a decent idea, but honestly configuring a Mattermost app/account/token is easier than messing around with a new mail account and all of the SMTP settings.

As for Mattermost itself, previously I also enjoyed running Rocket.Chat, though somehow Mattermost felt a bit more like Slack (what I'm used to), runs with PostgreSQL as a backing store and also has the Boards and Playbooks integrations, alongside lots of other goodies.

I do miss the ability to integrate Jitsi or something else in the actual interface like Rocket.Chat did, but it's not really a large problem overall.