
The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.

Surely they would be testing the configuration(s) that they use in production? They’re not running RDS without hugepages turned on, right?



> The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.

I'd guess they have dozens of people across, say, a Linux kernel team, a Graviton hardware integration team, an EC2 team, and an Amazon RDS for PostgreSQL team, any of whom might at one point or another run a benchmark like this. They probably coordinate to an extent, but not so much that only one person would ever run this test. So yes, it is duplicative. And they likely do intend to test the configurations they use in production, but people make mistakes.


True; to err is human. But it is weird that they didn't just fire up standard RDS instances of one or more sizes and test those. After all, it's already automated; two clicks on the website get you a standard configuration, and a couple more get you a 96-core Graviton CPU. I just wonder how the mistake happened.


You're assuming that they ran the workload with huge-pages disabled unintentionally.


No… I’m assuming that they didn’t use the same automation that creates RDS clusters for actual customers. No doubt that automation configures the EC2 nodes sanely, with hugepages turned on. Leaving them turned off in this benchmark could have been accidental, but some accident of that kind was bound to happen as soon as the tests use any kind of setup that is different from what customers actually get.


You're again assuming that having huge pages turned on always brings a net benefit, which it doesn't. I have at least one example where it brought no observable benefit while incurring extra code complexity and server-administration overhead, and necessitating extra documentation.


FYI: huge pages aren't just a system-wide toggle; there's a variety of things you can do:

* explicit huge pages

* transparent huge pages system-wide default

* app-specific or even mapping-specific toggles

* various memory allocator settings to raise its effectiveness

It would be really surprising to me to see a workload for which it's optimal to not use huge pages anywhere on the system.
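A quick way to see which of these modes is in play (a minimal sketch, assuming a Linux system; the sysfs path is an assumption and may be absent on kernels built without THP):

```python
from pathlib import Path

# System-wide THP policy lives in sysfs on Linux kernels built with THP support.
knob = Path("/sys/kernel/mm/transparent_hugepage/enabled")

if knob.exists():
    # Output looks like "always [madvise] never"; brackets mark the active mode.
    print(knob.read_text().strip())
else:
    print("THP knob not present on this system")
print("checked")
```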


It is a system-wide toggle in the sense that it requires you to first enable huge pages and then set them up, even if you only want to use explicit huge pages from within your own code (madvise, mmap). I wasn't talking about THP.
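For illustration, a minimal sketch of the mapping-specific route mentioned above, assuming Linux and Python 3.8+; MADV_HUGEPAGE is only a hint, and the kernel is free to ignore it (or reject it if THP is disabled):

```python
import mmap

LENGTH = 4 * 1024 * 1024  # 4 MiB anonymous private mapping

buf = mmap.mmap(-1, LENGTH)
try:
    # Hint the kernel to back this specific mapping with transparent huge pages.
    buf.madvise(mmap.MADV_HUGEPAGE)
    print("MADV_HUGEPAGE requested for this mapping")
except (AttributeError, OSError):
    # Constant missing (non-Linux) or THP disabled in this kernel.
    print("huge-page hint unavailable here")
finally:
    buf.close()
```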

When you deploy software all around the globe, and not only on servers that you fully control, this becomes problematic. Even in the latter case, admins and teams frown upon it if you can't prove the benefit.

Yes, there are workloads where huge pages do not bring any measurable benefit; I don't understand why that would be questionable. Even if they don't hurt runtime performance, which they can, the extra work and complexity they incur is suboptimal compared to the baseline of not using huge pages at all.


> Yes, there are workloads where huge-pages do not bring any measurable benefit

I really doubt it, except of course for workloads that use only a trivial amount of memory to begin with. In systems I've seen, anywhere from 5% to 15% of CPU time is spent waiting on TLB misses. It's obvious then that huge pages can be hugely beneficial if properly used; by definition they hugely relieve TLB pressure.
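Back-of-the-envelope arithmetic makes the pressure argument concrete (the 1536-entry TLB size is an illustrative assumption, roughly the L2 dTLB on recent x86 cores):

```python
TLB_ENTRIES = 1536  # illustrative second-level dTLB size

# TLB "reach" = entries * page size: the amount of memory addressable
# without a TLB miss. Larger pages multiply reach by the same factor.
for page_bytes, name in [(4096, "4 KiB"), (2 * 1024**2, "2 MiB"), (1024**3, "1 GiB")]:
    reach = TLB_ENTRIES * page_bytes
    print(f"{name} pages: TLB reach = {reach / 1024**2:.0f} MiB")
# With these numbers: 6 MiB of reach with 4 KiB pages vs 3 GiB with 2 MiB pages.
```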

You can of course end up in situations where transparent huge page scanning is worse than nothing, but that's exactly why I pointed out there's a variety of ways to use huge pages.


You don't seem to grasp that both can be true simultaneously: the CPU spends time on TLB misses, yet there is no measurable effect on end-to-end performance because a much larger bottleneck lies elsewhere. In database kernels with large, unpredictable workloads and a heavy I/O and memory footprint, this is easy to demonstrate.


I think you're moving the goalposts here. There's a measurable improvement in CPU usage; you're over-provisioned on CPU and don't care. Fine.




