Low-level development is often fun and somewhat easy on mature hardware where someone else has already discovered the issues.
But on new hardware, you're dealing with gremlins. Physical factors can affect hardware, and sometimes you just can't be sure if it's your code, Cthulhu, or the position of the moon. For example, higher temperature can either decrease or increase crystal frequency. Firmware/software can affect hardware through power consumption and other routes. It can be really hard to know whether it's hardware or software, except in hindsight.
Catching something that happens frequently is one thing; you have oscilloscopes and stuff. But good luck hunting down an issue that happens once a month across 20 test devices. Sometimes these things can take more than half a year to catch and fix.
When you mess up in a kernel driver, the result is often something pretty bad, like crashing or freezing the system, or worse, silent corruption. Freezes and crashes are great if you can use a kernel debugger or get the dumps. Not so great when it happens on the other side of the world without any low-level data to work with. You also have to really understand kernel interfaces, OS power management, driver life-cycle issues and performance ramifications, to mention a few.
I don't mean low level is necessarily hard, but I do mean it really, really depends!
This brings back memories... At a previous employer we had a product that would just randomly reboot. Standard Windows on an industrial PC, nothing you’d think would be that esoteric. It’d look like a sudden power failure. No crashes, nothing in the logs to indicate anything at all. Similar hardware platform to some other products we had where this issue never arose.
It literally took months to find the cause, and in the meantime the people who didn’t want us to succeed (think big-company politics) were scoring points against us. We did everything we could think of, fixed a lot of bugs, tried stress-testing the system, things like that. Nothing worked. We were met with skepticism when we started saying it was the hardware. They said it couldn’t be the hardware...
Except this time it was — and we found out almost purely by good luck. One of the hardware guys happened to have an oscilloscope hooked up to a system when the reboot happened. The reboot was preceded by ‘something strange’ on the scope (I’m likely missing stuff here since this is memory from a few years back).
It turned out the motherboard had a bug which would manifest when the system went into a certain lower power state. That’s why stress-testing never caused it. In fact, the testing we did was actively preventing us from catching it!!
Lesson well and truly learned.
On a related note - is it just me or did Intel have a lot of trouble with their power-management implementation a few years back? I’ve since worked on another product that would randomly hard-freeze, and sure enough, after checking the Intel errata doc for the CPU/SoC in question (an Atom something-or-other), there were issues with S-states that meant we needed to limit them (to S1 I believe) in the BIOS, which did indeed fix the freeze issue.
To anyone who wants to avoid the long road here, realise that modern Windows, Linux and macOS systems never freeze or kernel panic without a specific reason. Part of what I had to unlearn many years ago was the notion that on Windows, freezes and blue-screens were things that ‘just happened’ in the course of otherwise regular operation.
Low-level can get you, even if you aren’t doing low-level work.
Problems like that are fairly common when you're developing hardware products, even if you're using products/modules someone else made.
> On a related note - is it just me or did Intel have a lot of trouble with their power-management implementation a few years back?
Everyone seems to have trouble with their power management stuff! Power state transitions are tricky in hardware, firmware and software. There are a ton of corner cases and assumptions. For example, you have to be careful (in software!) that you don't cause voltage dips by switching on chips and their peripherals too quickly. See [0].
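As a rough sketch of what that sequencing looks like in firmware (the register address, bit assignments and delay values here are all hypothetical, purely for illustration):

    #include <stdint.h>

    /* Hypothetical memory-mapped power-enable register and bit layout. */
    #define PWR_CTRL        (*(volatile uint32_t *)0x40001000u)
    #define PWR_EN_SENSOR   (1u << 0)
    #define PWR_EN_RADIO    (1u << 1)
    #define PWR_EN_DISPLAY  (1u << 2)

    static void delay_us(uint32_t us)
    {
        (void)us;  /* platform-specific busy-wait, omitted */
    }

    void power_up_peripherals(void)
    {
        /* Naive version: PWR_CTRL |= PWR_EN_SENSOR | PWR_EN_RADIO | PWR_EN_DISPLAY;
           The combined inrush current can dip the supply rail and brown out
           the MCU or other chips already running. */

        PWR_CTRL |= PWR_EN_SENSOR;    /* smallest load first */
        delay_us(500);                /* let the rail and decoupling caps recover */

        PWR_CTRL |= PWR_EN_RADIO;
        delay_us(500);

        PWR_CTRL |= PWR_EN_DISPLAY;   /* biggest inrush last, onto a stable rail */
    }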
CPUs, chipsets and peripherals have pages and pages of errata that BIOS, microcode/firmware updates and operating systems work around.
All of this is true, and you haven't even started getting into prototype hardware where a minor coding error could cause your hardware to literally catch fire. When you're playing with mechatronics, you often get into fun games of "was it mechanical, electrical or software that let the magic smoke out". :)
If you consider the various meanings of "smoke test" at https://en.wikipedia.org/wiki/Smoke_testing , I suspect you'll agree that the mechanical meaning is almost certainly older than the electrical or the software ones.
That's where you have some vessel that should be air-tight, and you blow some smoke into it to see if/where the smoke leaks. You could do that for plumbing, you could do that for musical instruments, you could do that for all sorts of things.
Maybe the electrical engineers came up with the term "smoke testing" all on their own, but I think it's a lot likelier that they heard it from some plumbers or something, liked it, and then it gained _additional_ cachet from the association with smoke-means-failure.
I don't think it's too much of a stretch to imagine EEs coming up with "smoke test" independently. I've never heard it actually used in the mechanical sense (although I think I was aware of the term) but I've heard many people talk about magic smoke and electrical smoke tests. And the usage isn't quite congruent - an electrical smoke test (as I understand the term, anyway) is very specifically the act of applying full voltage to a freshly assembled and untested piece of equipment and being ready to shut off the power if anything starts to smoke (or less dramatically, seeing if any circuit breakers trip.)
Indeed, a smoke test will demonstrate whether or not one's plumbing stack is intact. An exterminator used one to determine that rats were getting into our house through a vent pipe that had been removed without the remainder being capped.
Additionally, when bringing up new hardware, the knowledge you need in order to understand the state of a malfunctioning device and debug a crash is often stored in the head of the logic designer who wrote the HDL, not exactly something you can look up on StackOverflow.
It's always great when you need to figure out what a particular register means or the architecture of a particular error handling process or something like that, and you open the hardware spec and find "TBD" because it's all still unreleased...
Yeah... you just need to have a ton of debug registers included. Especially if it's an ASIC.
And yes, Stack Overflow is really not going to help you with pretty much any of this. When doing low-level programming, you'd better get comfortable with the fact that there's often no one in the world who can answer your question. You have to find it out yourself.
There isn't as much work in that area, but there aren't as many people that do it, either. It balances out to pay pretty well, at least in my neck of the woods.
More like, back-end is easy, easier than front-end. He is comparing low-level backends with web frontends.
On my first serious project at my first job, I estimated the back-end part of an app at 75% of the effort and the front-end at 25%. It ended up being the exact reverse. And in the following years the pattern repeated over and over.
Everybody seems to think that UI is just about placing buttons while the backend does the heavy lifting. In reality, front-end requirements change more often and in more fundamental ways, the code is harder to test, more third-party dependencies are involved, and even the dumb KLOC metric ends up bigger.
By any measure, a low-level backend will be harder than a high-level backend, and a low-level frontend will be harder than a high-level frontend. Try implementing an HTTP server in plain C vs. plain Java (no third-party frameworks) and you will see. Being easier is practically the definition of high-level; he is just comparing things that are too different.
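For a feel of the gap, here's a minimal sketch of the C side using POSIX sockets, with error handling mostly omitted; in plain Java the JDK's built-in com.sun.net.httpserver gives you the same toy server in a handful of lines:

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 16);

        for (;;) {
            int cli = accept(srv, NULL, NULL);
            if (cli < 0)
                continue;

            char buf[4096];
            read(cli, buf, sizeof buf);   /* request parsing by hand (omitted) */

            const char *resp =
                "HTTP/1.1 200 OK\r\n"
                "Content-Type: text/plain\r\n"
                "Content-Length: 13\r\n"
                "Connection: close\r\n"
                "\r\n"
                "Hello, world!";
            write(cli, resp, strlen(resp));
            close(cli);
        }
    }

And that's before you handle partial reads, request parsing, timeouts or concurrency, all of which the high-level library does for you.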
> He is comparing low-level backends with web frontends.
No, I don't think so. I think he's doing exactly what he said - comparing low-level with high-level. The low level stuff he was working on doesn't sound like it was a backend; it sounds like it was an embedded device running bare-metal (no OS). You wouldn't do a backend (database server, say) that way (or so I suspect).
But that's not what "backend" means. "Backend" means "not directly facing end users, but doing part of the work for the parts that are directly facing end users". Embedded systems are not backend. They are their own category.
Back-end work being exciting and interesting might make it a tad easier, since you can focus on it. When doing front-end I have to fight the urge to browse the web instead of implementing yet another CRUD screen, while being asked to somehow 'make it pop' despite having no specs.
All the good stuff seems to be happening on personal blogs/programmers' own websites and tends to be hyper-specific/technical. More general philosophy seems to be lacking, and is mostly the domain of writers without much substance (the stuff on Medium, for instance). Unfortunately I don't really have anywhere to point you right now, but if you pay attention to forums (subreddits for programming languages etc.) and the Twitters of writers/programmers you stumble upon, you might start to find more interesting content. There is no central place, but there are a lot of interesting people doing interesting work out there.
High-level software development is basically building Jenga towers with thousands of (let's be real) mostly not-very-good parts. Hands up: how many developers think 100% code coverage is the absolute minimum? Then consider that you can reach 100% coverage without ever testing a single corner case. No, not that `if` statement in your code, but rather what happens when this value is zero, when this time zone name is unknown (at any of the frontend, backend or database levels), when the screen size is unavailable, or any of a bazillion things which can go wrong? Hardware devs have to test these things, and that's why you don't have to reboot every three milliseconds.
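A toy illustration (the function and test are made up): the test below executes every line of average(), so a coverage tool reports 100%, yet the one input that crashes it is never tried:

    #include <assert.h>

    /* 100% line coverage, zero corner cases tested. */
    int average(const int *vals, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += vals[i];
        return sum / n;               /* divides by zero when n == 0 */
    }

    int main(void)
    {
        int vals[] = {2, 4, 6};
        assert(average(vals, 3) == 4);  /* every line executed: coverage says 100% */
        /* average(vals, 0) is never tested, and that's the call that blows up */
        return 0;
    }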
> I keep what feels like most of ARM946E-S in my brain
The ARM64 architecture reference manual is 5242 pages, and it doesn't even cover things like the GPU, the specific CPU implementation (554 pages for the A57), the interrupt controller (240 pages for the main GIC version 2 doc), the DMA controller (100 pages), and tons more you need to do low-level stuff.