I see HN pieces on SIMD optimizations for numerical computing every so often. Are these the sort of optimizations that a hobbyist with, say, a normal laptop (either a MacBook Air or a consumer-grade ThinkPad) can actually tinker with as well? Or am I going to have to rent an AWS machine running some specific architecture to be able to mess around with this?



Absolutely. Most development work for these things is just fine on a laptop.

Each processor family has its own supported SIMD instructions, though. For a long time, SSE / AVX / AVX2 were the only game in town on x86. AVX-512, introduced with Skylake, was primarily available on server parts, was a mixed bag, and didn't have support on AMD machines until just recently (with Genoa / Zen 4).

A modern Mac laptop with an M-series chip will have ARM NEON vector instructions, just like an iPhone or similar. There is a newer ARM vector instruction set called the Scalable Vector Extension (SVE) that Apple doesn't support but Graviton 3 does. Generally, though, improvements on your laptop translate to the server side, and you can definitely test correctness.

There are a few least-common-denominator SIMD libraries out there, but realistically, for many problems you'll get the most mileage from writing some intrinsics yourself for your target platform.
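
To make that concrete, here's a minimal sketch using x86 AVX intrinsics (the function and array names and the build line are just illustrative; on an M-series Mac you'd reach for <arm_neon.h> and the NEON equivalents instead):

    // Minimal sketch: add two float arrays with AVX intrinsics on x86.
    // Build with something like: gcc -O2 -mavx simd_add.c -o simd_add
    #include <immintrin.h>
    #include <stdio.h>

    static void add_avx(const float *a, const float *b, float *out, size_t n) {
        size_t i = 0;
        // Process 8 floats per iteration in a 256-bit AVX register.
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
        }
        // Scalar tail for elements that don't fill a whole vector.
        for (; i < n; i++)
            out[i] = a[i] + b[i];
    }

    int main(void) {
        float a[10], b[10], out[10];
        for (int i = 0; i < 10; i++) { a[i] = (float)i; b[i] = (float)(10 - i); }
        add_avx(a, b, out, 10);
        for (int i = 0; i < 10; i++) printf("%g ", out[i]);
        printf("\n");
        return 0;
    }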


The easiest way for a beginner to do SIMD numerical computing now is with Java.

Java's Panama project is working on its 6th preview (JEP 448) for the September Java 21 (LTS) release, but the already-released Java 20 previews the JEP 426 features, including

- basic math

- transcendental functions (sin, cos...)

- load/store to foreign memory e.g., to interop with C

- compress/expand lanes, for parallel selection or filtering

- bit-wise ops (with count, reverse, compress, expand)

- all of which translate to SSE/AVX on Intel or NEON/SVE on ARM

See e.g.,

- <https://jbaker.io/2022/06/09/vectors-in-java/>

- <https://openjdk.org/jeps/426>

- <https://openjdk.org/jeps/448>

- <https://openjdk.org/projects/panama/>


I use optimized SIMD instructions on iOS to do on-device linear algebra for the cooking app I work on as a hobby.

https://theflavor.app/


I only see a landing page with missing images (on mobile). How do I use it?


Here’s a TestFlight beta-testing link:

https://testflight.apple.com/join/WYT0giJd


If you want to dig around with the lower-level implementation of SIMD instructions, you might need to drop to assembly to really get into the guts of things, just to see it in action. But if you just want your code to take advantage of it, you can use NumPy in Python, which will probe what the CPU supports and optimize for that. SIMD is not really new: the first modern x86 SIMD extension was MMX, introduced by Intel in 1997, and the idea has been around since the '70s. So your laptop should at least support that instruction set, and a whole host of newer ones.

https://numpy.org/doc/stable/reference/simd/index.html


This is a Rust crate doc, but it's a nice place to browse different CPU features and what they unlock: https://docs.rs/target-features/latest/target_features/docs/...

If you go to the x86 CPU list, it will tell you which features are enabled if you optimize for a particular one: https://docs.rs/target-features/latest/target_features/docs/...

On Linux you can run lscpu to check which CPU features are detected.


Basically all CPUs have SIMD nowadays. You can even make use of it in wasm / the web (though it's currently not supported on iOS). Just compile with ‘emcc -msimd128 main.c’.
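
For example, here's a minimal sketch using Emscripten's portable wasm_simd128.h intrinsics (the file name and build line just follow the parent comment):

    // Minimal sketch of WebAssembly SIMD via Emscripten's wasm_simd128.h intrinsics.
    // Build per the parent comment: emcc -msimd128 main.c
    #include <wasm_simd128.h>
    #include <stdio.h>

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        // v128_t holds four 32-bit floats here; wasm_f32x4_add adds all lanes at once.
        v128_t va = wasm_v128_load(a);
        v128_t vb = wasm_v128_load(b);
        v128_t sum = wasm_f32x4_add(va, vb);
        float out[4];
        wasm_v128_store(out, sum);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }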


Yes, regular computers have had SIMD for decades. On Intel, for example, it was SSE and is now AVX. Fancier servers will have the same, except maybe wider and with some other tricks.

If you go down the vectorization path, then unless you're a crufty CPU database vendor, I'd skip ahead to GPUs: the ecosystem is built to do the same thing, but faster and easier. CPU makers are slowly changing to look more like GPUs anyway, so just skip ahead...


If you're asking how widely available SIMD is, it has been common in consumer hardware for two decades. To use SIMD instructions manually, you'll need a compiled language that exposes them, like Rust or C. But the compiler can also apply them for you as an optimization (auto-vectorization).
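
As a rough sketch of the auto-vectorization case (assuming GCC or Clang), a plain loop like the one below will usually be vectorized at -O2/-O3 with -march=native, no intrinsics required:

    // A plain scalar loop that GCC/Clang will typically auto-vectorize, e.g.:
    //   gcc -O3 -march=native -fopt-info-vec -c saxpy.c
    // (-fopt-info-vec is a GCC flag that reports which loops were vectorized.)
    #include <stddef.h>

    // restrict promises the compiler the arrays don't overlap, which helps vectorization.
    void scale_add(float *restrict out, const float *restrict a,
                   const float *restrict b, float s, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] * s + b[i];  // becomes SIMD multiply/add over many lanes
    }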



