
Money quote: "... on an AMD computer then you may set the environment variable MKL_DEBUG_CPU_TYPE=5."

When running on an AMD CPU, any program built with Intel's compiler should have this environment variable set. I don't think there is any downside to leaving it on all the time, unless you are measuring how badly Intel has tried to cripple your AMD performance.



My understanding is that that flag is gone as of a couple of months ago. Intel “fixed” it.


Yes, starting with the MKL 2020.01 release. The Wikipedia page has more information and references:

https://en.wikipedia.org/wiki/Math_Kernel_Library#Performanc...

This is quite bad, since a lot of software relies on Intel MKL as the default BLAS implementation (e.g. PyTorch binaries).


Why not patch out the CPUID check as a post compilation step?


That's definitely possible (it probably checks that the manufacturer ID is GenuineIntel), but nobody wants to distribute patched MKL versions, because it most likely violates the MKL license.
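For reference, a vendor check of the kind suspected here might look like the sketch below. This is not MKL's code; it just shows how the "GenuineIntel" string comes out of CPUID leaf 0 (EBX, EDX, ECX in that order) using GCC/Clang's `<cpuid.h>`. The function names are hypothetical.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Fill `out` (13 bytes) with the CPUID vendor string, e.g. "GenuineIntel"
   or "AuthenticAMD". Returns 0 on success, -1 if CPUID is unavailable. */
int cpu_vendor(char out[13]) {
    out[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX. */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return -1;
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 0;
#else
    return -1; /* not an x86 CPU */
#endif
}

/* A pure vendor gate: it never looks at feature bits, it only asks
   "is this an Intel part?". */
int is_genuine_intel(void) {
    char vendor[13];
    return cpu_vendor(vendor) == 0 && strcmp(vendor, "GenuineIntel") == 0;
}
```

On an AMD machine `cpu_vendor` yields "AuthenticAMD", so a gate like `is_genuine_intel()` returns 0 regardless of which SIMD extensions the CPU actually supports.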

It may even be easier to replace the function altogether with LD_PRELOAD.


It does indeed work. A simple trace reveals that the function is called mkl_serv_intel_cpu_true().

Make a file with the following content:

    /* fake.c: always report an Intel CPU to MKL's dispatcher */
    int mkl_serv_intel_cpu_true(void) {
      return 1;
    }
Compile

    gcc -shared -fPIC -o libfake.so fake.c
Run

    LD_PRELOAD=./libfake.so yourprogram
And it uses the optimized AVX codepaths.
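If you want confirmation that the dynamic linker actually mapped the shim, one option is to give it a constructor that sets a flag and prints a note to stderr at load time. A sketch using the GCC/Clang `constructor` attribute; the `fake_shim_loaded` flag and the message are hypothetical additions, and the `int` return type is simply carried over from the snippet above, not taken from Intel documentation.

```c
#include <stdio.h>

/* Set by the constructor below; 1 means this object has been loaded. */
int fake_shim_loaded = 0;

/* Runs when the shared object is loaded (e.g. via LD_PRELOAD),
   before the program's main(). */
__attribute__((constructor))
static void fake_shim_announce(void) {
    fake_shim_loaded = 1;
    fputs("libfake: loaded\n", stderr);
}

/* Same override as above: report "Intel CPU" unconditionally. */
int mkl_serv_intel_cpu_true(void) {
    return 1;
}
```

Build it the same way (`gcc -shared -fPIC -o libfake.so fake.c`). The message only proves the library was loaded; whether MKL resolves its call to this definition is what the AVX code paths (or a trace) tell you.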

Disclaimer: may not be legal in your country. I take no responsibility.


By the way, if you want to make this permanent in a binary, there is no need to set LD_PRELOAD all the time. You could just add a DT_NEEDED entry to the dynamic section. E.g. something like:

    patchelf --add-needed libfakeintel.so yourbinary


Wow. I wasn't quite expecting something as simple as "if CPU is not intel, make everything worse."


I’m sure their justification is that (1) they have no obligation to help AMD, and (2) how could you guarantee AMD implements CPUID the same as Intel (as in: what if AMD implements a feature bit differently?)

Of course, the second one makes no sense, as x86 programs run just as well on AMD as on Intel with the same feature set (albeit at different speeds).
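The alternative to a vendor gate is to dispatch on the feature bits themselves. A minimal sketch using GCC/Clang's `__builtin_cpu_supports` and a per-function `target` attribute; the dot-product kernels are illustrative and have nothing to do with MKL's internals.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Portable fallback that runs on any CPU. */
static double dot_scalar(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#ifdef HAVE_X86
/* AVX2+FMA kernel. Only this function is compiled for those extensions,
   so the rest of the program still runs on older CPUs. */
__attribute__((target("avx2,fma")))
static double dot_avx2(const double *a, const double *b, size_t n) {
    __m256d acc = _mm256_setzero_pd();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm256_fmadd_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i), acc);
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Choose a kernel from what the CPU reports it can do, not from its
   vendor string, so an AMD part with AVX2 and FMA gets the fast path. */
dot_fn select_dot(void) {
#ifdef HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2;
#endif
    return dot_scalar;
}
```

Call `select_dot()` once at startup and reuse the returned pointer; both kernels compute the same result, only the speed differs.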


You distribute a binary patch for a given MKL release, have your package download the official MKL release and then patch it using the binary patch. Nobody suffers, everyone wins.
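A rough sketch of the "apply a patch to the official release" step, assuming a hypothetical table entry per known release that records a file offset, the original bytes and the replacement bytes. No real offsets or bytes for any MKL release are given here. Checking the original bytes first means a mismatched release is refused instead of corrupted.

```c
#include <stddef.h>
#include <string.h>

/* One entry in a hypothetical table of known releases. */
struct known_patch {
    const char *release;          /* label only, e.g. a version string */
    size_t offset;                /* file offset of the bytes to change */
    const unsigned char *expect;  /* original bytes, verified first */
    const unsigned char *replace; /* new bytes, same length */
    size_t len;
};

/* Apply `p` to an in-memory copy of the file. Returns -1 without touching
   the buffer if it is too short or the original bytes don't match
   (wrong release, or already patched); returns 0 on success. */
int apply_known_patch(unsigned char *buf, size_t buf_len,
                      const struct known_patch *p) {
    if (p->offset > buf_len || p->len > buf_len - p->offset)
        return -1;
    if (memcmp(buf + p->offset, p->expect, p->len) != 0)
        return -1;
    memcpy(buf + p->offset, p->replace, p->len);
    return 0;
}
```

Reading the file into memory, calling `apply_known_patch`, and writing the result to a new path keeps the downloaded original intact.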


No need to patch MKL, just your own binaries post compile.


Exactly what I was thinking. For libs like MKL it should even be feasible to have a database of known binary releases with a patch offset so you can speed up your scientific application using a little patch tool. But even for executables my guess is that it should be relatively easy to programmatically find the relevant check and patch it, unless Intel starts to deliberately obfuscate it, like copy protection checks in games.
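For the "programmatically find the check" case, the usual approach (the same one used against game copy-protection checks) is a byte-signature scan with wildcards for the bytes that change between builds, such as relative addresses. A minimal sketch; the function name is hypothetical, and any real signature would have to be derived from the actual binary.

```c
#include <stddef.h>

/* Find the first match of `pat` (length `n`) in `buf`. mask[i] == 0 marks
   a wildcard byte that matches anything. Returns the offset of the match,
   or (size_t)-1 if there is none. */
size_t find_signature(const unsigned char *buf, size_t buf_len,
                      const unsigned char *pat, const unsigned char *mask,
                      size_t n) {
    if (n == 0 || n > buf_len)
        return (size_t)-1;
    for (size_t i = 0; i + n <= buf_len; i++) {
        size_t j = 0;
        while (j < n && (!mask[j] || buf[i + j] == pat[j]))
            j++;
        if (j == n)
            return i;
    }
    return (size_t)-1;
}
```

As a hypothetical example, a compare against the first four bytes of "GenuineIntel" followed by a short jump could be described as the pattern `3D 'G' 'e' 'n' 'u' 74 ??`, with the jump displacement masked out.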


How on earth is the end user supposed to know to do that, know when to do it, or know what to do when the next update to Intel’s compiler puts cpu-type-5 on the pessimal code path?

Is there something I can add to my bashrc to handle that?


If the environment variable still works, it could be set by a distribution (especially scientific ones) or in your .bashrc, e.g. `export MKL_DEBUG_CPU_TYPE=5`.

If that fails, as OP implies, you can still override the function by creating a tiny library with it always returning true. On GNU/Linux systems, you do that using LD_PRELOAD. Perhaps someone's already done that so you just need to download, compile and set it.

Sorry for the lack of specifics, but I do not deal with these libraries, yet I was still hoping to point you in the right direction.
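If you'd rather not touch your shell config globally, a small launcher can set both things only for the program it starts. A sketch of the environment-preparation part, under two assumptions: the shim path is whatever you built above, and `MKL_DEBUG_CPU_TYPE` is only honored by MKL releases older than 2020.01. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set MKL_DEBUG_CPU_TYPE=5 and prepend `shim_path` to LD_PRELOAD, keeping
   any entries already there. These only take effect in processes started
   afterwards (e.g. via exec), not in an already-running MKL process.
   Returns 0 on success, -1 on error. */
int prepare_mkl_env(const char *shim_path) {
    if (setenv("MKL_DEBUG_CPU_TYPE", "5", 1) != 0)
        return -1;

    const char *old = getenv("LD_PRELOAD");
    if (old == NULL || old[0] == '\0')
        return setenv("LD_PRELOAD", shim_path, 1);

    /* ld.so accepts colon-separated LD_PRELOAD lists. */
    size_t len = strlen(shim_path) + 1 + strlen(old) + 1;
    char *joined = malloc(len);
    if (joined == NULL)
        return -1;
    snprintf(joined, len, "%s:%s", shim_path, old);
    int rc = setenv("LD_PRELOAD", joined, 1);
    free(joined);
    return rc;
}
```

In a launcher you would call `prepare_mkl_env` with the absolute path of your shim and then `execvp(argv[1], argv + 1)`, so the target program inherits both variables.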


Not every program built using ICC; rather, any program using Intel’s MKL, a math library that includes a BLAS (Basic Linear Algebra Subprograms) implementation. This is typically limited to scientific computing applications and libraries.


A lot of software depends on a BLAS as a dependency somewhere.


The statement you were responding to only refers to Intel MKL, though. There are many other BLAS libraries. Were you making a more general statement about some set of BLAS implementations? Or about the BLAS interface in general, perhaps?



