The counter-argument to this is that if you are building something in the critical path of an application (for example, parsing HTTP in a web server), you need to be performance-minded from the beginning, because design decisions compound into further design decisions. The best thing to do is build it from the ground up, measuring the performance of what you have as you go. That way, each time you add something you see its performance impact, and there is usually a more performant way of doing it that isn't any more obscure. Early choices still become constraints, but because you chose the most performant option at every stage, the whole process takes you in the direction of a highly performant implementation.
Why should you care about performance?
I can give you my personal experience: I’ve been working on a Java web/application server for the past 15 years and a typical request (only reading, not writing to the db) would take maybe 4-5 ms to execute. That includes HTTP request parsing, JSON parsing, session validation, method execution, JSON serialization, and HTTP response dispatch. Over the past 9 months I have refactored the entire application for performance and a typical request now takes about 0.25 ms or 250 microseconds. The computer is doing so much less work to accomplish the same tasks, it’s almost silly how much work it was doing before. And the result is the machine can handle 20x more requests in the same amount of time. If it could handle 200 requests per second per core before, now it can handle 4000. That means the need to scale is felt 20x less intensely, which means less complexity around scaling.
High performance means reduced scaling requirements.
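The 20x figure above is just the latency math: one core can serve roughly (1000 ms / average latency in ms) requests per second, assuming requests are handled serially on that core. A minimal sketch of that arithmetic (the class and method names here are illustrative, not from the original post):

```java
// Back-of-the-envelope per-core throughput from average request latency,
// assuming one request at a time per core (as described in the thread).
public class Throughput {
    // Requests per second one core can serve at a given average latency (ms).
    static double perCoreRps(double latencyMs) {
        return 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        System.out.println(perCoreRps(5.0));  // 5 ms/request   -> 200 req/s
        System.out.println(perCoreRps(0.25)); // 0.25 ms/request -> 4000 req/s
    }
}
```

Cutting per-request work from 5 ms to 0.25 ms is exactly the 20x capacity gain described above.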
Please accept a high five from a fellow "it does so little work it must have sub-millisecond request latency" aficionado (though I must admit I'm guilty of abusing memory caches to achieve this).
But even that sort of depends, right? Hardware is often pretty cheap in comparison to dev time. It really depends on the project, what kind of servers you're using, the nature of the application, etc., but I think a lot of the time it might be cheaper to just pay for 20x the servers than it would be to pay a human to go find a critical path.
I'm not saying you completely throw caution to the wind, I'm just saying that there's a finite amount of human resources and it can really vary how you want to allocate them. Sometimes the better path is to just throw money at the problem.
I think it depends on what you’re building and who’s building it. We’re all benefitting from the fact that the designers of NGINX made performance a priority. We like using things that were designed to be performant. We like high-FPS games. We like fast internet.
I personally don’t like the idea of throwing compute at a slow solution. I like when the extra effort has been put into something. The good feeling I get from interacting with something that is optimal or excellent is an end in itself and one of the things I live for.
Sure, though I've mentioned a few times in this thread now that what bothers me more than CPU optimizations is failing to take latency into account, particularly when hitting the network, and I think focusing on that will generally pay higher dividends than trying to optimize processing.
CPUs are ridiculously fast now, and compilers are really really good now too. I'm not going to say that processing speed is a "solved" problem, but I am going to say that in a lot of performance-related cases the CPU processing is probably not your problem. I will admit that this kind of pokes holes in my previous response, because introducing more machines into the mix will almost certainly increase latency, but I think it more or less holds depending on context.
But I think it really is a matter of nuance, which you hinted at. If I'm making an admin screen that's going to have like a dozen users max, then a slow, crappy solution is probably fine; the requests will be served fast enough that no one will notice anyway, and you can probably even get away with the cheapest machine/VM. If I'm making an FPS game that has 100,000 concurrent users, then it almost certainly will be beneficial to squeeze as much performance out of the machine as possible, both CPU- and latency-wise.
But as I keep repeating everywhere, you have to measure. You cannot assume that your intuition is going to be right, particularly at-scale.
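In that spirit, even a crude timing loop beats intuition. A minimal measurement sketch (the `doWork` method is a hypothetical stand-in for whatever you're measuring; for anything serious, use a real harness like JMH, since naive loops like this are vulnerable to JIT warm-up and dead-code elimination):

```java
// Crude measurement sketch: time many iterations of a workload and
// report the average, instead of guessing where time goes.
public class Measure {
    // Hypothetical stand-in for a request handler or hot function.
    static long doWork() {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        long sink = 0; // consume results so the JIT can't discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += doWork();
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns/op: " + (elapsed / iterations)
                + " (sink=" + sink + ")");
    }
}
```

The numbers you get will vary by machine and JVM; the point is to get numbers at all before optimizing.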
I absolutely agree that latency is the real thing to optimize for. In my case, I only leave the application to access the db, and my applications tend not to be write-heavy. So in my case latency-per-request == how much work the computer has to do, which is constrained to one core because the overhead of parallelizing any part of the pipeline is greater than the work required. See, in that sense, we’re already close to the performance ceiling for per-request processing because clock speeds aren’t going up. You can’t make the processing of a given request faster by throwing more hardware at it. You can only make it faster by creating less work for the hardware to do.
(Ironically, HN is buckling under load right now, or some other issue.)
It almost certainly would require more than 20x servers because setting up horizontal scaling will have some sort of overhead. Not only that, there is the significant engineering effort to develop and maintain the code to scale.
If your problem can fit on one server, it can massively reduce engineering and infrastructure costs.