The REP prefixes are the most common; they just let you repeat the same instruction a variable number of times, taking the count from the CX/ECX/RCX register. This makes many common loops really, really short, especially for moving data around in memory. The memcpy function is often inlined as a single REP MOVS instruction, possibly with an instruction to copy the count into the count register if it isn't already there.
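For a concrete picture, an inlined copy might look something like this (a minimal sketch in Intel syntax; dest, src, and count are placeholders, and in the SysV x86-64 ABI the pointers already sit in RDI/RSI, so often only the count needs moving):

    mov rdi, dest      ; destination pointer
    mov rsi, src       ; source pointer
    mov rcx, count     ; the count register: CX, ECX, or RCX depending on address size
    rep movsb          ; copy RCX bytes from [RSI] to [RDI] (with the direction flag clear)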
I suppose the REX prefix is pretty common too (REX.W selects the 64-bit operand size), since 64-bit programs will want to operate on 64-bit values and addresses pretty frequently.
None of the prefixes toggle things that can be set globally, by the BIOS or otherwise. They all just specify things that the next instruction needs to do.
The ModR/M and SIB prefixes are probably the most common prefixes in instructions. They are so common that assemblers elide their existence when you read code. REX is in the same boat: so common that it's usually elided. The VEX prefix is also really common (all of the V* AVX instructions, like VMOVDQ), and then the LOCK prefix (all atomics).
After all of those, REP is not that uncommon of a prefix to run into, although many people prefer SIMD memcpy/memset to REP MOVSB/REP STOSB. It is slightly unusual.
This isn't correct.
ModR/M and SIB are not prefixes. They are suffixes, and essentially part of the core instruction encoding for certain memory and register access instructions; they are the primary means of encoding the myriad addressing modes of the x86. And their existence is not elided in any meaningful way: their value is explicitly derived from the instruction operands (SIB is scale, index, base), so when you see an instruction like:
mov BYTE PTR [rdi+rbx*4],0x4
The SIB byte is determined by rdi, rbx, and the scale factor 4, all right there in the instruction. Likewise, ModR/M encodes the addressing mode, which is clear from the operands in the assembler listing. Though x86 is such a mess that there are cases where you can encode the same instruction in either a ModR/M form or a shorter form, e.g. PUSH/POP.
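To make the mapping concrete, here is how that mov encodes, to the best of my reading (easy to verify with an assembler):

    mov BYTE PTR [rdi+rbx*4], 0x4   ; c6 04 9f 04
    ; c6   opcode (MOV r/m8, imm8), with /0 in the ModR/M reg field
    ; 04   ModR/M: mod=00, reg=000, r/m=100, i.e. "a SIB byte follows, no displacement"
    ; 9f   SIB: scale=10 (*4), index=011 (rbx), base=111 (rdi)
    ; 04   the immediate 0x4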
REX is a prefix, but it is a bit special as it must be the last one, and repeats are undefined. It is not elided because of commonality but because its presence and value are usually implied by the operands; it is therefore redundant to list it.
For instance, PUSH R12 must use a REX prefix (REX.B with the one byte encoding).
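For comparison (bytes from memory; objdump will confirm):

    push rbx    ; 53       the one-byte form, opcode 50+reg
    push r12    ; 41 54    same form, but r12 needs REX.B (0x41) to extend the register field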
More specifically, they're affixed to certain opcodes that require them. There are a number of byte-sized opcodes that do not require a ModRM or SIB byte (although a number of those got gobbled up to make the REX prefix, but that's another story).
There's a good reason for using vector instructions over REP: Until relatively recently that was how you got maximum performance in small, tight loops. REP is making a comeback precisely because of ERMS and FSRM, so unfortunately this will become a bigger problem going forward.
REP prefixes are pretty rare. Depending on the compiler, they are either used sparingly for a few specific operations (like rep movsd for memcpy) or not at all.
The most common prefixes by far are REX prefixes in x86-64 (64-bit) assembly (don't believe me? Look at 64-bit code in vim and see all those `H` letters around. That's REX.W, and byte 0x48 is ASCII 'H'). Segment override prefixes are another class of prefixes that are used in handwritten assembly (in bootloaders or special runtime functions) but almost never emitted by compilers.
In older code, the most common prefixes are 0x66 (it doesn't even have a mnemonic, there's no way to emit it directly) and maybe 0x67.
They are all used to modify some aspect of the next instruction's execution; for example, the "default" operand size is 32-bit, but you can change it with a prefix. I think an example will help.
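For instance, here's a tiny sketch (bytes from memory; easy to check with an assembler or objdump):

    mov eax, ebx    ; 89 d8       default 32-bit operand size
    mov rax, rbx    ; 48 89 d8    same instruction with REX.W (0x48, ASCII 'H') widening it to 64 bits
    mov ax, bx      ; 66 89 d8    the 0x66 operand-size prefix shrinks it to 16 bits instead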
"Prefixes" in this case mostly expand the instruction encoding space.
So rarely-used addressing modes get a "segment prefix" that causes them to use a segment other than DS. Or x86_64 added a "REX" prefix that added more bits to the register fields allowing for 16 GPRs. Likewise the "LOCK" prefix (though poorly specified originally) causes (some!) memory operations to be atomic with respect to the rest of the system (c.f. "LOCK CMPXCHG" to effect a compare-and-set).
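A minimal compare-and-set sketch along those lines (Intel syntax; the register roles here are my own choice, not a fixed convention):

    ; rdi = pointer, rax = expected value, rdx = desired new value
    lock cmpxchg [rdi], rdx   ; atomically: if [rdi] == rax, store rdx; else load [rdi] into rax
    sete al                   ; ZF reports success, so al = 1 if the swap happened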
All these things are operations other CPU architectures represent too, though they tend to pack them into the existing instruction space, requiring more bits to represent every instruction.
Notably the "REP" prefix in question turns out to be the one exception. This is a microcoded repeat prefix left over from the ancient days. But it represents operations (c.f. memset/memmove) that are performance-sensitive even today, so it's worthwhile for CPU vendors to continue to optimize them. Which is how the bug in question seems to have happened.
x86 was designed in 78, basically for the purpose of running a primitive laser printer (or other similar workloads). The big problem with this is that the encoding space for instructions was "efficiently utilized". When new instructions, or worse, additional registers were later added, you had to fit the new instruction variants in somehow, and you did this by tacking on prefixes.
Nah, x86 goes even earlier in its heritage - it was, effectively, a bolt-on on Intel's way older designs, as a huge part of the 8086 was being ASM source-compatible with the older 8xxx chips, even as the instruction set itself changed [1]. What utterly amazes me is that the original 8086 was mostly designed by hand by a team of not even two dozen people - and today, we got hundreds if not thousands of people working on designing ASICs...
Acckkghtually, if you go back far enough you end up at the Datapoint 2200. If you want to understand where some of the crazier parts of the 8086 originate from, Ken Shirriff has a nice read: http://www.righto.com/2023/08/datapoint-to-8086.html
> x86 was designed in 78, basically for the purpose of running a primitive laser printer
It's interesting that ASCII is transparently just a bunch of control codes for a physical printer/typewriter, combining things like "advance the paper one line", "advance the paper one inch", "reset the carriage position", and "strike an F at the carriage position", all of which are different mechanical actions that you might want a typewriter to do.
But now we have Unicode, which is dedicated to the purpose of assigning ID numbers to visual glyphs, and ASCII has been interpreted as a bunch of glyph references instead of a bunch of machine instructions, and there are the control codes with no visual representation, sitting in Unicode, being inappropriate in every possible way.
It's kind of like if Unicode were to incorporate "start microwave" as part of a set with "1", "2", "3", etc.
ASCII was used by teletypes, not typewriters. They were "cylinder" heads, as compared to IBM's golfball typewriters.
The endless CR/LF/CRLF line-ending problem would have been solved if the RS (Record Separator) ASCII code had been used instead of the physical CR (carriage return, i.e. move the print head back to the start of the line) and LF (line feed, i.e. rotate the paper up one line).
But Unix decided on LF, Apple used CR, Windows used CRLF, and even today, I had to get a guy to stop setting his system to "Windows" because he was screwing up a git repo with extraneous CRs.
It's just because x86 as an ISA has accreted over the course of 40+ years, and has variable-length instructions. Every time they extend the ISA they carve out part of the opcode space to squeeze in a new prefix. This will only continue, considering that Intel has proposed another new scheme this year.
You got some great answers already, but to your first point check out Hennessy and Patterson's books, namely Computer Architecture: A Quantitative Approach and Computer Organization and Design.
The latter is probably more suited to you unless you wanna go on a dive into computer architecture itself. There are older editions available for free (authorized by the authors) on the web.
I first read the 3rd edition of Computer Architecture and, besides it being one of the clearest textbooks I've ever read, it vastly improved my understanding of what's going on in there in relation to OoO speculative execution, etc.
That's a very poor summary of what prefixes are. My advice, just skip the original article which isn't very good or interesting and read taviso's blog that is linked in the top comment (it gives a few concrete examples of these prefixes). They are modifiers that are part of the CPU instruction.
I disagree. We’ve seen what happens when titles have max context: people don’t click the link and they polish their semi adjacent hobby horses in the comments as they would a tweet.
HN goes for a middle ground that promotes intellectual curiosity and link clicking. If you refuse to click the link for obscure titles at least you’re stuck replying to those who did click the link and that’s still better than what we have on the rest of the internet.
Submissions that don’t have the payoff to justify more obscure, whimsical titles fall off the first page unlike this one.
This is very well written. I know little about assembly programming and Intel's ISA, let alone their microarchitectures, but I could follow the explanation and feel like I have a rough understanding of what is going on here.
If the problem really is that the processor is confused about instruction length, I'm impressed that this problem can be fixed in microcode without a huge performance hit: my intuition (which could be totally wrong) is that computing the length of an instruction would be something synthesized directly to logic gates.
Actually, come to think of it, my hunch is that the uOP decoder (presumably in hardware) is actually fine and that the microcoded optimized copy routine is trying to infer things about the uOP stream that just aren't true --- "Oh, this is a rep mov, so of course I need to go backward two uOPs to loop" or something.
I expect Intel's CPU team isn't going to divulge the details though. :-)
I don't understand "ERMS" and "FSRM" and there seems to be nothing good on google about it.
Are these just CPUID flags that tell you that you can use a rep movsb for maximum performance instead of optimized SSE memcpy implementations? Or is it a special encoding/prefix for rep movsb to make it faster? In case of the latter, why would that be necessary? How does one make use of FSRM?
Found this [1], which also links to the Intel Optimization Manual [2].
Seems like ERMS was a cheaper replacement for AVX and FSRM was a better version, for shorter blocks.
> Cheapest versions of later processors - Kaby Lake Celeron and Pentium, released in 2017, don't have AVX that could have been used for fast memory copy, but still have the Enhanced REP MOVSB. And some of Intel's mobile and low-power architectures released in 2018 and onwards, which were not based on SkyLake, copy about twice more bytes per CPU cycle with REP MOVSB than previous generations of microarchitectures.
> Enhanced REP MOVSB (ERMSB) before the Ice Lake microarchitecture with Fast Short REP MOV (FSRM) was only faster than AVX copy or general-use register copy if the block size is at least 256 bytes. For the blocks below 64 bytes, it was much slower, because there is a high internal startup in ERMSB - about 35 cycles. The FSRM feature intended blocks before 128 bytes also be quick.
FSRM is just the name of a cpu optimization that affects existing code.
Choosing optimal instructions and scheduling can be done statically at compile time or dynamically (by choosing one of several library functions at runtime, or by JITting).
In order to detect the optimal instruction selection at runtime you need to know the actual CPU. You could have a table of all CPU models, or you could just ask your OS whether the CPU you run on has that optimization implemented.
Linux had to be patched so that it can _report_ that a CPU does implement that optimization.
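If you want to check for yourself rather than asking the OS, both bits live in CPUID leaf 7 (a sketch; the bit positions, as I recall from the SDM, are ERMS = EBX bit 9 and FSRM = EDX bit 4):

    mov eax, 7          ; CPUID leaf 7: structured extended feature flags
    xor ecx, ecx        ; sub-leaf 0
    cpuid
    bt  ebx, 9          ; carry flag = ERMS (Enhanced REP MOVSB/STOSB)
    setc r8b            ; stash it somewhere
    bt  edx, 4          ; carry flag = FSRM (Fast Short REP MOV)
    setc r9b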
As described it's just a CPU crash exploit that requires local binary execution. Getting to a vulnerability would require understanding exactly how the corrupted microcode state works, and that seems extremely difficult outside of Intel.
It's not super-valuable yet, but it would let you mount a really nasty DoS on cloud providers by triggering hard resets of the physical machines. Some people would probably pay for that, though it's obviously more interesting to push on privilege escalation or exfiltration.
Particularly since the MCEs triggered could prevent an automatic reboot. Would depend what the hardware management system did - do machines presenting MCEs get pulled?
If I'm a cloud provider and somebody's workflow is hard resetting lots of my physical machines, I'm going to give them free access to single tenant machines at the very minimum. If they keep crashing the machines that only they run on, I guess that's ok.
You can exploit this from a single core shared instance.
So you go and find yourself a thousand cheap / free tier accounts, spin up an instance in a few regions each, and boom, you've taken out 10k physical hosts. And run it in a lambda at the same time, and see how well the security mechanisms identify and isolate you.
Causing a near simultaneous reboot of enough hosts is likely to take other parts of the infrastructure down.
I'm curious what part of this scheme involves "not ending up in jail"? Needless to say you can't do this without identifying yourself. To make this an exploitable DoS attack you need to be able to run arbitrary binaries on a few thousand cloud hosts that you didn't lease yourself.
> I'm curious what part of this scheme involves "not ending up in jail"? Needless to say you can't do this without identifying yourself.
Stolen credit cards are a dime a dozen, and nation state actors can just use their domestic banks or agents in the banks of other countries in a pinch to deflect blame or lay false trails.
If I were Russia or China, I'd invest a lot of money into researching all kinds of avenues on how to take out the large three public cloud providers if need be: take out AWS, Google, Microsoft and on the CDN side Cloudflare and Akamai and suddenly the entire Western economy grinds to a halt.
The only ones who will not be affected are the US government cloud services in AWS, as those run separately from other AWS regions - that is, unless the attacker gets access to credentials that allow them to execute code in the GovCloud regions...
> If I were Russia or China, I'd invest a lot of money into researching all kinds of avenues on how to take out the large three public cloud providers
This subthread started with "is this issue a valuable exploit". Needless to say, if you need to invoke superpower-scale cyber warfare to find an application, the answer is "no". Russia and China have plenty of options to "take out" western infrastructure if they're willing to blow things up[1] at that scale.
Countries have proven far more reticent to use kinetic options vs. cyberattacks. Or, put differently, we're all hacking each other left and right and the responses have thus far mostly remained in the digital realm.
> Or, put differently, we're all hacking each other left and right and the responses have thus far mostly remained in the digital realm.
Which is both good and bad at the same time. Cyber warfare has been significantly impacting our economies and our citizens - anything from scam call centers through ransomware to industrial espionage - to the tune of many dozens of billions of dollars a year. And yet, no Western government has ever held the bad actors publicly accountable, which means that they will continue to be a drain on our resources at best and a threat to national security at worst (e.g. the Chinese F-35 hack).
I mean, I'm not calling for nuking Beijing, that would be disproportionate - but even after all that's happened, Russia and China are still connected to the global Internet, no sanctions, nothing.
If clouds use shared servers to run their management workloads and if very important companies use shared servers to run their workloads, they would deserve it.
But I don't believe it. People are not that stupid.
> If clouds use shared servers to run their management workloads and if very important companies use shared servers to run their workloads, they would deserve it.
Why target the management plane? Fire off payloads to take down the physical VM hosts and suddenly any cloud provider has a serious issue because the entire compute capacity drops.
I mean, you kinda can. There's a depressingly thriving market for stolen cards and things like compromised accounts. A card is a couple of dollars. There are many jurisdictions that turn a blind eye to hacking US companies. Look at how hard it's been to rein in the ransomware gangs and even 'booter' (DDoS-for-rent) services.
DoS isn't as lucrative as other things; I assume that most state actors would far prefer to find a way to turn this into a privilege escalation. But being able to possibly take out a cloud provider for a while is still monetizable.
The blogpost describes that unrelated sibling SMT threads can become corrupted and branch erratically. If you can get a hypervisor thread executing as your SMT sibling and you can figure out how to control it (this is not an if so much as a when), that's a VM escape. The Intel advisory acknowledges this too when they say it can lead to privilege escalation. This is hardly a useless bug, in fact it's awfully powerful!
Intel themselves said it could lead to privilege escalation and a friend of mine (who coincidentally was responsible for this Intel-related talk: https://youtu.be/Zda7yMbbW7s) already managed to get privilege escalation with it, though I’m not sure if he’ll want to share any details, at least for now.
It’s anything but a minor bug, and anyone who claims otherwise clearly hasn’t worked with CPUs.
This assumes that either 1. partners and interested state-sponsored actors aren't kept abreast of Intel's microcode backend architecture, or 2. that there hasn't been at least one leak of this information from one of these partners into the hands of interested APT developers. I wouldn't put strong faith in either of these assumptions.
It does, but the same is true for virtually any such crash vulnerability. The question was whether this was a "valuable exploit", not whether it might theoretically be worse.
The space of theoretically-very-bad attacks is much larger than practical ones people will pay for, c.f. rowhammer.
>> Getting to a vulnerability would require understanding exactly how the corrupted microcode state works, and that seems extremely difficult outside of Intel.
Intel knows exactly how their ROB works.
Therefore Intel knows the possible consequences of this bug and how to trigger them.
If there is a privilege escalation path from this, Intel knows. And anyone Intel chose to share it with knew.
Thankfully, since it's public now, the value of that decreases and customers can begin to mitigate.
> If there is a privilege escalation path from this, Intel knows. And anyone Intel chose to share it with knew.
No, or at least not yet. I mean, I've written plenty of bugs. More than I can count. How many of them were genuine security vulnerabilities if properly exploited? Probably not zero. But... I don't know. And I wrote the code!
Did they confirm that it can definitely be used for escalation? The description I saw was "may allow an authenticated user to potentially enable escalation of privilege and/or information disclosure and/or denial of service via local access" which sounds like they're covering all their bases and may not actually know what is and isn't possible.
> Sequence of processor instructions leads to unexpected behavior for some Intel(R) Processors may allow an authenticated user to potentially enable escalation of privilege and/or information disclosure and/or denial of service via local access.
so basically you're saying that the cpu frontend missed the opportunity to ignore the 0x90 because it was an actual instruction which would be converted into an actual nop uop?
Is this still the case or modern intel CPUs optimize out the nop in the frontend decoder?
Some compiler writers thought that was the case, if [0] is related to OP. I don't have a "modern" (after 6th gen) Intel CPU to test it on, but note that most programs are compiled for a relatively generic CPU.
"Looking in the old AMD optimisation guide for the then-current K8 processor microarchitecture (the first implementation of 64bit x86!), there is effectively mention of a “Two-Byte Near-Return ret Instruction”.
The text goes on to explain in advice 6.2 that “A two-byte ret has a rep instruction inserted before the ret, which produces the functional equivalent of the single-byte near-return ret instruction”.
It says that this form is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...), or when it directly follows a conditional branch.
Basically, when the next instruction after a branch is a ret, whether the branch was taken or not, it should have a rep prefix.
Why? Because “The processor is unable to apply a branch prediction to the single-byte near-return form (opcode C3h) of the ret instruction.” Thus, “Use of a two-byte near-return can improve performance”, because it is not affected by this shortcoming."
...
" If a ret is at an odd offset and follows another branch, they will share a branch selector and will therefore be mispredicted (only when the branch was taken at least once, else it would not take up any branch indicator %2B selector). Otherwise, if it is the target of a branch, and if it is at an even offset but not 16-byte aligned, as all branch indicators are at odd offsets except at byte 0, it will have no branch indicator, thus no branch selector, and will be mispredicted.
Looking back at the gcc mailing list message introducing repz ret, we understand that previously, gcc generated: nop, ret
But decoding two instructions is more expensive than the equivalent repz ret.
The optimization guide for the following AMD CPU generation, the K10, has an interesting modification in the advice 6.2: instead of the two byte repz ret, the three-byte ret 0 is recommended
Continuing in the following generation of AMD CPUs, Bulldozer, we see that any advice regarding ret has disappeared from the optimization guide."
TLDR: Blame AMD K8! First x64 CPU. This GCC optimization is outdated and should only be used when specifically optimizing for K8.
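For reference, the encodings being discussed (bytes from memory, easy to confirm with an assembler):

    ret        ; c3         single-byte near return, the form K8 couldn't predict well
    rep ret    ; f3 c3      "repz ret": same semantics, two bytes, predictor-friendly on K8
    ret 0      ; c2 00 00   the K10-era suggestion: a near return that pops 0 extra bytes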
On modern Intel CPUs, I am led to believe, issuing nops is actually slower than adding prefixes. I think there is work in the backend updating retired-instruction counters and other state which still occurs for nops, but decoding prefixes happens entirely in the front end.
When a nop truly is necessary you will see compilers and performance engineers add prefixes to the nop to make it the desired size.
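For example, a sketch of the usual ladder (exact choices vary by toolchain; these match the multi-byte NOP forms recommended in the Intel manual, as far as I recall):

    nop                         ; 90              1 byte
    xchg ax, ax                 ; 66 90           2 bytes: 0x66 prefix on the same nop
    nop dword ptr [rax]         ; 0f 1f 00        3 bytes: the "long nop" opcode 0f 1f
    nop dword ptr [rax+rax+0]   ; 0f 1f 44 00 00  5 bytes: ModRM/SIB/disp8 stretch it; 66 prefixes stretch it further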
Is it even possible to design a CPU with out-of-order and speculative execution that would have no security issues? Does the future lead to a swarm of disconnected A55 cores, each running a single application?
This vulnerability was not caused by OoO or speculative execution. It was caused by the fact that x86 was designed 45 years ago, and has had feature after feature piled on the same base, which has never been adequately rebuilt.
The more proximate cause is that some instructions with multiple redundant prefixes (which is legal, but pointless) have their length miscalculated by some Intel CPUs, which results in wrong outcomes.
> It was caused by the fact that x86 was designed 45 years ago, and has had feature after feature piled on the same base, which has never been adequately rebuilt.
Itanic would like to object! Unfortunately it can’t get through the door.
A more sensible approach for that use-case would be IMO to have well-defined specialized prefixes for padding, instead of relying on the case-by-case behavior of redundant prefixes. (However I understand that there's almost certainly a good historical reason why this was not the way it was done)
The easiest way of doing padding is to add a bunch of `nop` instructions which are one byte each.
If you read the manual, Intel encourages minor variations of the `nop` instructions that can be lengthened into different number of bytes (like `nop dword ptr [eax]` or `nop dword ptr [eax + eax*1 + 00000000h]`).
It is never recommended anywhere in my knowledge to rely on redundant prefixes of random non-nop instructions.
Note that this technique is really only legitimate where the used prefix already has defined behavior with the given instruction ("Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior."), and of course the REX prefix has special limitations. The key is redundant, not spurious. It is not a good idea to be doing rep add for example. But otherwise, there is no issue.
The prefixes are redundant so it's not really case-by-case behavior. You're just repeating the prefix you would be using anyway in that location.
Using specialized prefixes wastes encoding space for no real gain.
You realize that on most common processors NOP itself is a pseudo-instruction? Even on the apparently meme-worthy (see sibling comment) RISC-V, it's ADDI x0, x0, 0.
Usually, the historical reason is that adding the logic to do something well-defined when unexpected prefixes are used is going to cost ten more transistors per chip, which is going to add to cost to handle a corner case that almost nobody will try to be in anyway. Far better to let whatever the implementation does happen as long as what happens doesn't break the system.
The issue here is their verification of possible internal CPU states didn't account for this one.
(There is, perhaps, an argument to be made that the x86 architecture has become so complex that the emulator between its embarrassingly stupid PDP-11-style single-thread codeflow and the embarrassingly parallel computation it does under the hood to give the user more performance than a really fast PDP-11 cannot be reliably tested to exhaustion, so perhaps something needs to give on the design or the cost of the chips).
Both approaches are viable, but RISC-V's approach is better, as it provides higher code density without imposing a significant increase in complexity in exchange.
Higher code density is valuable. E.g.:
- The decoders can see more by looking at a window of code of the same size, or we can have a narrower window.
- We can have less cache and save area and power. We can also clock the cache higher, enabled by it being smaller, lowering latency cycles.
- Smaller binaries or rom image.
Soon to be available (2024) large, high performance implementations will demonstrate RISC-V advantages well.
Well, the bug in this specific case (based on the article by Tavis O. linked elsewhere in comments) looks to be the regular kind -- probably an off-by-one in a microcode edge case. That is, here it's not the case that the CPU functions correctly but leaves behind traces of things that should be private in timing side channels, as was the case for Spectre.
I think formal methods could help in designing such a machine, if you can write a mathematical statement that amounts to "there is no side channel between A and B".
Or at least put a practical bound on how many bits per second, at most, you can extract from any such side channel (the reasoning being, if you can get at most a bit for each million years, you probably don't have an attack).
Then you verify if a given design meets this constraint
Formal methods are widely used in processor design. It is hard to formalize specs that assert that bugs we haven't thought about don't exist. At least hard while also preserving the property of being a Turing machine.
I know. I mean applying formal methods to this specific problem of proving side channels don't exist (which seems a very hard thing to do and might even require modifying the whole design to be amenable to this analysis)
As a tidbit, this was part of how one of the teams involved in the original Spectre paper found some of the vulnerabilities. Basically the idea was to design a small CPU that could be formally shown to be free of certain timing attacks. In the process they found a bunch of things that would have to change for the analysis to work... maybe in a small system those wouldn't actually lead to vulnerabilities, but they couldn't prove it (or it would require lots of careful analysis). And in big systems, those features do lead to vulnerabilities.
I'm not sure it ever got built! The Spectre stuff was found during the "how would we even begin to do this" phase. I've seen a fair amount of academic work about formally verifying RISC-V cores though.
What would be the typical size of such a constraint-based problem, and do we have the compute power to translate the rules into an implementation? And what if one forgot a rule somewhere… Deeply interesting subject.
I think you'd want it to be a theorem (in Lean, Coq, Isabelle/HOL or whatever) instead of a constraint problem. So it would be more limited by developer effort than by computational power.
Theoretically you can do this from software down to (idealized) gates, but in practice the effort is so great that it's only been done in extremely limited systems.
The REX prefix is redundant for 'movsb', but not 'movsd'/'movsq' (moving either 32- or 64-bit words, depending on the prefix). That may have something to do with the bug, if there is any shared microcode between those instructions?
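Concretely (my reading of the encodings):

    rep movsb    ; f3 a4       byte copy; a REX.W here would be redundant
    rep movsd    ; f3 a5       opcode a5 with the default 32-bit operand size
    rep movsq    ; f3 48 a5    same opcode a5, with REX.W selecting 64-bit elements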
Benchmarking is always problematic -- what is a good representative workload? All the same, I'd be curious if the ucode update that plugs this bug has affected CPU performance, eg, it diverts the "fast short rep move" path to just use the "bad for short moves but great for long moves" version.
In the article by Tavis O. linked elsewhere in comments, he suggests disabling the FSRM CPU feature only as an expensive workaround to be taken only if the microcode can't be updated for some reason. That suggests to me that he, at least, expects the update to do better.
That would be the conservative thing to do. If there's no limit on microcode updates, if I was Intel, I'd consider doing that first and then speeding it up again later. Based on the 5-second guess that people who update everything regularly will care that we did the right thing for security, and people who hate updates won't be happy anyway, so at least the first update will be secure if they never get the next one.
(I think there is a limit on microcode, they seem conservative to release new ones - I don't remember the details)
It's a shame that Google didn't publish numbers. They have very good profiling across all of their servers and probably have incredibly high confidence numbers for the real-world impact on this. (Assuming that your world is lots of copying protocol buffers in C++ and Java)
I've heard Intel does use TLA+ extensively for specifying their designs and verifying their specs. But TLA+ specs are extremely high-level, so they don't capture implementation details that can lead to bugs. And model checking isn't a formal proof, only (tractably small) finite state spaces can be checked with TLC. And even there, you're only checking the invariants you specified.
That said, I'm sure there's some verification framework like SPARK for VHDL, and this feels like exactly the kind of thing it should catch.
Formal methods have been used in CPU design for nearly 40 years [1] but not yet for everything, and the methods tend to not have "round-trip-engineering" properties (e.g. TLA+ is not actually proving validity of the code you will run in production, just your description of its behavior and your idea of exhaustive test cases).
> I’ve written previously about a processor validation technique called Oracle Serialization that we’ve been using. The idea is to generate two forms of the same randomly generated program and verify their final state is identical.
> I found this bug by fuzzing, big surprise [..] In fact, vendors fuzz their own products extensively - the industry term for it is Post-Silicon Validation.
This is such an interesting read, right in the league of "Smashing the stack" and "row hammer". As someone with very little knowledge of security I wonder if CPU designers do any kind of formal verification of the microcode architecture?
Nice find. That indeed sounds terrible for anyone executing external code in what they believe to be sandboxes. Good thing it can be patched (and AFAICT, it seems to be a good fix, rather than a performance-affecting workaround.)
x86 has a built-in memory copy instruction, provided by the combination of the movsb instruction and a rep prefix byte, which says you want the instruction to run in a loop until it runs out of data to copy. This is "rep movsb". This instruction is fairly old, meaning a lot of code still has it, even though there are faster ways to copy memory in x86.
Intel added two features to modern x86 chips that detect rep movsb and accelerate it to be as fast as those other ways. However, those features have a bug. You see, because rep is a prefix byte, you can just keep adding more prefix bytes to the instruction (up to the 15-byte total instruction length limit). x86 has other prefix bytes too, such as rex (used to access registers r8-r15), vex, evex, etc. The part of the processor that recognizes a rep movsb does NOT account for these other prefix bytes, which makes the processor get confused in ways that are difficult to understand. The processor can start executing garbage, take the wrong branch in if statements, and so on.
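To illustrate what that looks like at the byte level (just an illustration of prefix stacking, not necessarily the exact trigger bytes):

    db 0xf3, 0xf3, 0xf3, 0xf3, 0xa4   ; "rep rep rep rep movsb": f3 is the rep prefix, a4 is movsb
    db 0xf3, 0x48, 0xa4               ; "rep rex.w movsb": the rex.w (0x48) is meaningless for a byte copy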
Most disturbingly, when multiple physical cores are executing these "rep rep rep rep movsb" instructions at the same time, they will start generating machine check exceptions, which can at worst force a physical machine reboot. This is very bad for Google because they rent out compute time to different companies and they all need to be able to share the same machine. They don't want some prankster running these instructions and killing someone else's compute jobs. We call this a "Denial of Service" vulnerability because, while I can't read someone else's computations or change them, I can keep them from completing, which is just as bad.
To some extent, anyone with a web browser is sharing their machine with other people. That's Javascript.
If you ever download untrustworthy code and run it in a VM to protect your main set of data, that's another case.
The success of cloud computing is from the idea that multiple people can share the same computer. You only need one core, but CPUs come with 128; with the cloud you can buy just that one core and share 1/128th of the power supply, rack space, motherboard, ethernet cable, sysadmin time, etc. and that reduces your costs. That assumption is all based on virtualization working, though; nobody wants 1/128th of someone else's computer, they want their own computer that's 1/128th as fast. Bugs like these demonstrate that you're just sharing a computer with someone, which is bad for the business of cloud providers.
My point is that for a sufficiently large user, you can probably use enough of the 128 cores by yourself alone, that it's more worthwhile to do that and turn off these mitigations : both because it removes a whole class of threats, and also because the mitigations tend to have a non-negligible performance impact, especially when first discovered, on chips that haven't been designed to protect against them.
I very much agree with that. The reality is that cloud providers can replace entire machines with only a small latency blip in your application (or at least GCP can), so if you are doing things like buying 2 core VMs 64 times to avoid losing more than 1% capacity when a machine dies, you probably don't actually need to do that. You could get a 128 core dedicated machine, and then not share it with anyone, and your availability time in that region/AZ probably wouldn't change much.
That said, machines are really monstrously huge these days, and it can be hard to put them to good use. You also miss out on cost savings like burstable instances, which rely on someone else using the capacity for the 16 hours a day when you don't need it. It's a balance, but I'd say "just buy a computer" would be my starting point for most application deployments.
So your argument is that everyone who wants to run a WordPress blog should be paying $320/mo[0] to rent a whole machine just so we can avoid one specific kind of security problem?
If you don't want to share GCP and AWS both offer ways to rent machines that aren't shared with other users. But for most people the cost isn't worth it because shared machines work well enough and provide much better resource utilization.
Some x86 instructions can have prefixes that modify their behavior in a meaningful way. Such a prefix can be applied generally to any instruction, but it's expected to have no effect when applied to an instruction it doesn't make sense with. But it turns out the CPU actually misbehaves in some cases when this is done. Intel released a CPU firmware update to fix it.
Intel is a known partner of the NSA. If Intel was intentionally creating backdoors at the behest of the NSA, how would they look different from this vulnerability and the many other discovered vulnerabilities before it?
Only the people inserting the backdoor or using it would need to be bound by a National Security Letter's gag order. I doubt anyone at Google (including those subject to NSL gag orders) was made aware of this specific vulnerability.
> Google’s commitment to collaboration and hardware security
> As Reptar, Zenbleed, and Downfall suggest, computing hardware and processors remain susceptible to these types of vulnerabilities. This trend will only continue as hardware becomes increasingly complex. This is why Google continues to invest heavily in CPU and vulnerability research. Work like this, done in close collaboration with our industry partners, allows us to keep users safe and is critical to finding and mitigating vulnerabilities before they can be exploited.
There's a tension between the NSA wanting backdoors and service providers (CPU designers + Cloud hosting) wanting secure platforms. It's possible that by employing CPU and security researchers, Google can tip the scales a bit further in their favor.
the backdoor would just be an encrypted stream of "random" data flowing right out of the RNG. There's some maxim of crypto that encrypted data is indistinguishable from random bytes.
> This bug was independently discovered by multiple research teams within Google, including the silifuzz team and Google Information Security Engineering.
Can we get a better title for this? "Reptar - new CPU vulnerability" or something. I thought it was some random startup ad until I picked up the name somewhere else.
If it is changed to what you suggested a question mark would be warranted, because it is not yet clear what can be done with this "glitch" (as the article calls it).
>A potential security vulnerability in some Intel® Processors may allow escalation of privilege and/or information disclosure and/or denial of service via local access.
This isn't how anyone would backdoor a CPU. An actual backdoor would be done via some instruction sequence that is basically impossible to trigger by accident and hard to detect even when triggered.
Can you give an example of such sequence? Is it really so easy to hide it given that the microcode can be decoded in principle, https://news.ycombinator.com/item?id=32145324? Why is hiding it in a "bug" a worse solution? Why you can't do both?
One approach is to make the backdoor trigger conditional on multiple (unlikely) instructions in sequence. This bug was triggered by a single instruction, so it would have been a pretty easy case for fuzzing. If you need a sequence of 10 specific instructions in a specific order, with no kind of observable side effects for getting just the first 9 right so that nobody can do a guided search? That's not going to be found just by random chance. It doesn't matter what those instructions are, as long as they're not something that would get generated by real compilers on real programs.
The other is to make it dependent on the data rather than just the static instructions. Like, what if you had the SHA1 acceleration instructions trigger a backdoor iff the output of the hash is a certain value? You could probably even arrange for the backdoor to get triggered from managed and sandboxed runtimes like Javascript, rather than needing to get the victim to run native code. And somebody triggering this by accident would be equivalent to a SHA1 preimage collision.
It looks like Intel was cutting corners to be faster than AMD and now all those things come out. How much slower will all those processors be after multiple errata? 10%? 30%? 50%?
In a duopoly market there seems to be no real competition. And yes I know that some (not all) bugs also happen for AMD.
> And yes I know that some (not all) bugs also happen for AMD.
Some of these novel side-channel attacks actually even apply in completely unrelated architectures such as ARM [1] or RISC-V [2].
I think the problem is not (just) a lack of competition (although you're right that the duopoly in desktop/laptop/non-cloud servers for x86 brings its own serious issues, I've written and ranted more often than I can count [3]), it rather is that modern CPUs and SoCs have simply become so utterly complex and loaded with decades worth of backwards-compatibility baggage that it is impossible for any single human, even a small team of the best experts you can bring together, to fully grasp every tiny bit of them.
> and I suspect the situation will worsen when AI will enter the picture.
For now, AI lacks the contextual depth - but an AI that can actually design a CPU from scratch (and not just rehashing prior-art VHDL it has ... learned? somehow), if that happens we'll be at a Cambrian Explosion-style event anyway, and all we can do is stand on the sides, munch popcorn and remember this tiny quote from Star Wars [1].
Not sure what other errata you're referring to, but this looks like an off-by-one in the microcode. I would expect the fix to have zero or minimal penalty.
It's not clear to me this fix will have any performance impact. I strongly suspect it will be negligible or zero.
This seems like a "simple" bug of the type that people write every day, not deep architectural problems like Spectre and the like, which also affected AMD (in roughly equal measure if I recall correctly).
Parent commenter might be thinking of Meltdown, a related architectural bug that only bit Intel and IBM PPC. Everything with speculative execution has Spectre[0], but you only have Meltdown if you speculate across security boundaries.
The reason why Meltdown has a more dramatic name than Spectre, despite being the same vulnerability, is that hardware privilege boundaries are the only defensible boundary against timing attacks. We already expect context switches to be expensive, so we're allowed to make them a little more expensive. It'd be prohibitively expensive to avoid leaking timing from, say, one executable library to a block of JIT compiled JavaScript code within the same browser content process.
(via https://news.ycombinator.com/item?id=38268043, but we merged the comments hither)