Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions (2013)

nkurz · on Dec 4, 2016

At a glance, this seems like a clear explanation of using standard SIMD instructions to solve the problem, but I think the landscape has changed since this was written such that there are now better approaches.

In 2010, Intel released processors with the a dedicated instruction for "packed carry-less multiplication": https://software.intel.com/en-us/articles/intel-carry-less-m.... Unfortunately, the early implementations (through Sandy Bridge) were slow, and could be beaten by combining other SIMD operations as shown in this paper.

With the Haswell generation released in 2013, though, PCLMULQDQ got much faster. Instead of being able to complete one instruction every 8 cycles, it became possible to finish one every 2 cycles (inverse throughput went from 8 to 2). This paper 2015 paper "Faster 64-bit universal hashing using carry-less multiplications" shows the difference this makes: https://arxiv.org/pdf/1503.03465.pdf

If you are looking for an explanation of how the problem could be solved with the basic building blocks of SIMD, the 2013 Plank, Greenan, Miller paper might be a good resource. But if you are hoping to implement high performance solution for modern processors, the 2015 Lemire and Kaser paper is probably a better starting point.

(This is with the caveat that I don't actually understand the theory or terminology of Galois fields, and maybe there is something about applying it to Erasure Coding that makes the faster PCLMULQDQ approach inapplicable.)

nullc · on Dec 4, 2016

Erasure codes normally deal with fairly small fields (e.g. F(2^8)); which might reduce the usefulness of the carryless multiply instruction.

problems · on Dec 4, 2016

This sounds potentially useful for things like GCM too, would it be more helpful there?

nullc · on Dec 5, 2016

AFAICT, the carryless multiply instruction was pretty much added for GCM's benefit.

nickcw · on Dec 4, 2016

Klaus Post did an implementation of this in Go using with the relevant bits in SSE3 assembler: https://github.com/klauspost/reedsolomon

He references the paper in the amd64 code blob: https://github.com/klauspost/reedsolomon/blob/master/galois_...

ms512 · on Dec 5, 2016

Intel's ISA-L ended up implementing this method. Their implementation is interesting because they took this further and took advantage of knowledge of the instruction latency to pipeline multiple iterations of this method to achieve some really amazing throughput.

For reference, see https://01.org/intel%C2%AE-storage-acceleration-library-open... and the source code at https://github.com/01org/isa-l (see the erasure code folder for details).

In general, I've found Prof. Plank's other papers and presentations very interesting, innovative, and accessible.

profquail · on Dec 4, 2016

Looks like this technique is covered by a patent claim by a 3rd party? (See the link on that page to download their software.)

fdej · on Dec 4, 2016

https://www.techdirt.com/articles/20141115/07113529155/paten...

CalChris · on Dec 4, 2016

How is StreamScale a patent troll?

They're most definitely not an NPE, non-practicing entity. They developed their technology and they sell it. Simply because they're asserting their patent rights for technology they've developed doesn't make them a troll.

I can understand, even if I don't agree, that some people are anti-patent altogether. Fine. We part ways. But if you're going to incent inventors+developers with patents to take risks then you're going to have cases like this. And folks, everything looks obvious after the fact.

Is what StreamScale did novel? Apparently yes. If you have a counter-example to novelty then an ex-parte review will probably be a slam dunk. Also, the paper passed the Usenix papers committee review and they kinda like novelty.

Is what they did non-obvious? Intel announced these instructions in 2007 and, by the Usenix authors' own admission:

  >I have verified that StreamScale's solution is faster than GF-Complete or Jerasure in at least some respects.

There's more to non-obviousness than performance but I'm only looking skin deep at this. GF field theory ain't my thang. What it does look like is that the Usenix paper, of which one of the authors works for a competitor of StreamScale, can't even replicate what would be supposedly obvious.

StreamScale filed their patent Dec 30, 2011. The Usenix paper is from 2013.

Troll is not the word I'd use. This just looks like a first to file patent.

davidtgoldblatt · on Dec 4, 2016

Saying that "patent troll" is synonymous with "NPE" is more restrictive than its common usage; for example, wikipedia defines it as "a person or company that attempts to enforce patent rights against accused infringers far beyond the patent's actual value or contribution to the prior art".

If you take a minute to read the patent in question (https://www.google.com/patents/US8683296), it's like a caricature of "what's wrong with the patent system". It describes using matrix blocking and vector instructions to do the matrix mulitplication step of (e.g.) Reed-Solomon. It's a very old matrix multiplication speedup. If a particular technique for a problem isn't novel, then using that technique when you encounter that problem in a particular domain shouldn't count as novel. By contrast, the Usenix paper described an actual mechanism of implementing the arithmetic using vector instructions for table lookups, something the StreamScale patent never did. After reading the StreamScale patent, I came away with no new insights on implementing ECC. After reading the Usenix paper, I did.

The arguments you make for why StreamScale's things don't make any sense to me. It seems like they are:

- No one's fought the patent legally

- The author was threatened into issuing a notice with a super weak statement that StreamScale is sometimes faster than his one-man-shop academic code.

- One of the authors of the paper (though not the software in question as far as I can tell) works for a StreamScale competitor? Not really sure why that was relevant.

But of course no one wants to get into an expensive legal battle merely to advance the public good in this instance; no one is incentivized to. The fact that no one has doesn't mean that the patent is a good thing. Of course the author can be threatened into a non-apology apology with vague wording. That he did so doesn't mean all of those statements should be accepted uncritically.

A lot of times with patents, copyrights, trademarks, we get into a weird sort of cognitive dissonance. We (society) create a new type of property right (here, ownership of IP) in order to incentive people to engage in some type of behavior (here, inventing things). It's then easy to forget that the property right isn't necessarily the thing society wants to protect; it's a means to and end. But in cases like these, it's plainly obvious that the patent is stifling rather than encouraging innovation. The world is a worse place with Professor Plank threatened into removing his code.

CalChris · on Dec 4, 2016

I didn't say troll => NPE. But I will say that NPE => troll and StreamScale isn't an NPE. You might be able to say they're a troll via some other approach, just not NPE.

The University of Tennessee Knoxville isn't a one man shop. The paper even has three authors. What I'm freaking amazed about is that the UofT intellectual property office didn't send back a registered letter saying:

  You don't fuck with them. You fuck with us.

As for the Wikipedia definition, they also go on to say:

  Patent troll is currently a controversial term, susceptible
  to numerous definitions, none of which are considered
  satisfactory from the perspective of understanding how
  patent trolls should be treated in law.

> If a particular technique for a problem isn't novel

Ok, if you're going to attack the patent on novelty grounds, you have an easy case to make:

  produce an example from before the priority date

Obviousness is harder; you can argue obviousness both ways. But I think the Usenix paper (6 years after Intel's announcement) makes an obviousness attack harder and not easier.

davidtgoldblatt · on Dec 4, 2016

I don't totally follow the conclusion you're trying to draw w.r.t. the definition of a patent troll; clearly there's a lot of people who think that filing and then attempting to enforce a bogus patent applies (though people differ in which patents they think are bogus). Is this some sort of prescriptivist/descriptivist thing?

> The University of Tennessee Knoxville isn't a one man shop.

I mean, sure; but so what? Clearly they were unable or unwilling to fight this battle. Maybe you're right that they should have better protected Plank and society from StreamScale, but I think it makes more sense to put the blame on StreamScale for engaging in this behavior, and our broken patent for encouraging it. Likewise, the paper has multiple authors, but that doesn't really affect the argument at all.

> Ok, if you're going to attack the patent on novelty grounds, you have an easy case to make

I feel like you're deliberately missing the point. Where do you think I'll make this case? I'm certainly not going to spend tens of thousands of dollars fighting the good fight in court. "Technique" in the quote referred to the use of blocking and vector instructions for matrix multiplication. Lots of matrix libraries had such techniques well before the priority date of the patent.

The patent is on using those existing techniques for matrix multiplication, when that matrix multiplication happens to be a part of an error correcting code. I don't think those sorts of patents should exist. (The "matrix multiplication" here is tricky because it's multiplication in a Galois field, but that's not what the patent claims cover).

CalChris · on Dec 4, 2016

> I don't totally follow the conclusion you're trying to draw w.r.t. the definition of a patent troll

Then let me make my view of StreamScale not being a troll perfectly clear. I think they're good faith entrepreneurs taking a risk and developing something new, patenting it ($30-50,000) and selling it. That's not trolling. Shutting down academics from the UofTK is not a realistic source of revenue; it is however reasonably defending their rights which is expected of them now if they're to defend themselves later.

> that's not what the patent claims cover

Careful. You're not a patent attorney (and neither am I). But my attorney told me the engineer to not read claims and certainly not to read them to understand them. (Definitely don't write claims either but that's another matter entirely.) The Supreme Court has said that the claims section is the hardest legal document to write. You both want to avoid prior art (narrow) and preclude inventing around (broad). That's hard to do and hard to understand.

Read the specification for what it's teaching. That's what it's there for. The specification has to teach+support the claims and this one says Galois field all over the place.

Lastly, never say I feel like you're deliberately missing the point. Just don't.

countingteeth · on Dec 5, 2016

> Careful. You're not a patent attorney (and neither am I). But my attorney told me the engineer to not read claims and certainly not to read them to understand them.

Yes, a standard engineer's employment agreement prohibits engineers from performing patent searches.

Isn't this all a good argument that patents are hindering their intended purpose, e.g., PUBLIC DISCLOSURE OF INVENTIONS to advance the state of the art? If claims are written virtually incomprehensibly and are so useless as to not even be allowed as references to actual practitioners, I would argue they are not serving their purpose of public disclosure to advance the state of the art.

CalChris · on Dec 5, 2016

> Yes, a standard engineer's employment agreement prohibits engineers from performing patent searches.

Patent searches are one thing and yeah, big companies like Apple don't want their engineers reading patents at all since it puts them in line for treble damages should they lose an infringement suit. Startups on the other hand, are a different breed and they're not going to fail based on infringement. So you won't see those terms in a startup's employment agreement. Later, maybe.

Also, patent search != reading claims. First, a patent is broken into roughly two halves: specification and claims. Claims are written in a very stylized legal manner which is not meant to teach at all. The specification teaches and supports the claims. Everything that's in the claims must be supported in the specification AND the specification must effectively teach the invention.

So when I say that you shouldn't read claims, I'm saying that you the engineer who's trying to learn from the patent's teaching is wasting your time trying to figure things out from the claims. Figure that out from the specification.

But saying don't read the claims doesn't mean don't do a patent search.

> patents are hindering their intended purpose, e.g., PUBLIC DISCLOSURE OF INVENTIONS to advance the state of the art

This is mistaken. The purpose of patents is to reward the development of new inventions with a temporary monopoly. In exchange for that monopoly, the patent must teach. Public disclosure is not the purpose, it's the other half of the deal. In house lawyers who say never do a patent search are being well, unnecessarily careful.

ianai · on Dec 4, 2016

Makes my blood boil.

the8472 · on Dec 4, 2016

at least mirrors of the code exist, so at least projects that do not have to concern themselves with patents can still benefit from it.

mac01021 · on Dec 4, 2016

For my own edification (I'm not an expert on IP law), what kind of project is free to disregard patents?

Is it just a matter of whether or not the project is a commercial product?

wtallis · on Dec 4, 2016

It's a matter of not operating in a jurisdiction that hands out software patents. Europe is generally much more sensible in this respect.

ape4 · on Dec 4, 2016

Can a program legally do:

    if (country == usa) {
       workSlowPatentFreeWay()
    else {
       workScreamingFastWay()
    }

JoshTriplett · on Dec 4, 2016

The usual approach: uncompiled source code implementing a patented algorithm doesn't directly violate the patent, for the same reason a paper describing a patented algorithm doesn't. So, you can ship the necessary source code, and a compile-time option to build with or without it.

Freetype used to do this with bytecode hinting. Various media libraries do this as well.

the8472 · on Dec 4, 2016

things like ffmpeg can be configured to exclude various kinds of code based on licences. I assume similar build-time flags could be set to exclude patent-encumbered code.

The source would include the code, but binaries could be shipped with and without it.

dom0 · on Dec 4, 2016

What do I (a German) care about some u-boat patents some American may or may not or whatever have on some technique I use in my software?

zardo · on Dec 4, 2016

The patent is for calculating erasure codes, if you want to do Galois field arithmetic for any other reason the patent does not apply.

ChuckMcM · on Dec 4, 2016

Time to file an ex-parte review/challenge.

rch · on Dec 4, 2016

The founder, president, and CEO of StreamScale, Michael H. Anderson, seems to have 'over a dozen granted patents in the storage field', some listed here:

http://www.streamscale.com/cgi-bin/complex2/showPage.plx?pid...

CalChris · on Dec 4, 2016

Time to file an ex-parte review/challenge.

That's gonna cost about $20-40,000 although it'd be somewhat less for a micro-entity.

orasis · on Dec 4, 2016

Is there a TL;DR of how much faster Reed Solomon codes are with this?

nneonneo · on Dec 4, 2016

Needs a (2013) tag.

iamwil · on Dec 4, 2016

What is a Galois Field used for?

aisofteng · on Dec 4, 2016

Reading the abstract will answer your question.

Quequau · on Dec 4, 2016

So:

"Galois Field arithmetic forms the basis of Reed-Solomon and other erasure coding techniques to protect storage systems from failures. Most implementations of Galois Field arithmetic rely on multiplication tables or discrete logarithms to perform this operation. However, the advent of 128-bit instructions, such as Intel's Streaming SIMD Extensions, allows us to perform Galois Field arithmetic much faster. This short paper details how to leverage these instructions for various field sizes, and demonstrates the significant performance improvements on commodity microprocessors. The techniques that we describe are available as open source software."

Rebelgecko · on Dec 4, 2016

GCM (Galois Counter Mode) authenticated encryption

speleo_engr · on Dec 4, 2016

AES and Reed-Solomon codes