Well, don't forget the GPU. Intel Iris is much better than Intel HD. The way I s...

r1ch · on March 23, 2016

I think it's more developers getting lazy. Why bother with SIMD and GPUs when you can write in a high-level language like JavaScript, design with HTML/CSS, deploy your app with embedded libchromium, and have a faster time to market? SSDs and fast CPUs have made efficient software somewhat of a rarity these days.

kristianp · on March 24, 2016

It's not about being lazy, it's more about how I can get cross-platform GUI support that looks good without having to use C++. The answer seems to be HTML these days, unfortunately.

creshal · on March 25, 2016

And even Qt is shifting to using Chromium web views to supplant their native widgets.

krylon · on March 23, 2016

On most workloads typical desktop users run (there are many exceptions, of course, but in terms of numbers of people, those are in the minority), the computational speed of the CPU is not a limiting factor any more. I/O, amount of RAM and probably memory bandwidth are far more important; on a typical mid-range desktop machine running Windows, Office and some line-of-business application, I/O completely dominates, at least from what I have observed working as a sysadmin / helpdesk monkey.

TheOtherHobbes · on March 24, 2016

MS Office: yes. Content creation: no.

3D rendering, video editing, and music production all need as many cycles as you can afford, and then some.

VR and all those AI/ML technologies waiting around the corner are going to be even more greedy.

krylon · on March 24, 2016

That is true.

At work, our CAD people use Autodesk Inventor heavily, and that thing will happily gobble up all the CPU cycles one can throw at it. (It is the one example I have first-hand experience with.)

What I meant was that for most users of desktop PCs in an office environment, a faster CPU is not going to make much of a difference in overall system performance. (I might be a little sore because at work, users will sometimes complain their computer is too slow and then demand a new one with an i7, and then I have to explain to them why that is not going to help, while a RAM upgrade and an SSD are going to make a big difference.)

But you are right, there are plenty of examples where there is no such thing as "fast enough". ;-)

vanderZwan · on March 23, 2016

I blame the latter on language expressiveness more than anything else. Here are two pieces of C++ code, one "clean", one fast, taken from [0]:

    void blur(const Image &in, Image &blurred) {
        Image tmp(in.width(), in.height());
        for (int y = 0; y < in.height(); y++){
            for (int x = 0; x < in.width(); x++){
                tmp(x, y) = (in(x-1, y) + in(x, y) + in(x+1, y))/3;
            }
        }
        for (int y = 0; y < in.height(); y++){
            for (int x = 0; x < in.width(); x++){
                blurred(x, y) = (tmp(x, y-1) + tmp(x, y) + tmp(x, y+1))/3;
            }
        }
    }
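For anyone who wants to actually compile the clean version: it relies on an `Image` type that isn't shown here. Below is a minimal sketch with a hypothetical `Image` struct of my own (the names `get`, `at` and `blur_clamped` are made up for illustration). The snippet above reads `x-1` and `y-1` without saying what happens at the border, so clamping to the edge is purely my assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Image type used above (not from the source).
struct Image {
    int w, h;
    std::vector<uint16_t> data;

    Image(int width, int height)
        : w(width), h(height), data(static_cast<std::size_t>(width) * height, 0) {}

    int width() const { return w; }
    int height() const { return h; }

    // Writable access; the caller must stay in bounds.
    uint16_t &at(int x, int y) { return data[static_cast<std::size_t>(y) * w + x]; }

    // Read access that clamps coordinates to the border (an assumption).
    uint16_t get(int x, int y) const {
        x = std::max(0, std::min(x, w - 1));
        y = std::max(0, std::min(y, h - 1));
        return data[static_cast<std::size_t>(y) * w + x];
    }
};

// Same two-pass 3x3 box blur as the clean version, with clamped reads.
void blur_clamped(const Image &in, Image &blurred) {
    Image tmp(in.width(), in.height());
    // Horizontal pass: average each pixel with its left and right neighbours.
    for (int y = 0; y < in.height(); y++) {
        for (int x = 0; x < in.width(); x++) {
            tmp.at(x, y) = static_cast<uint16_t>(
                (in.get(x - 1, y) + in.get(x, y) + in.get(x + 1, y)) / 3);
        }
    }
    // Vertical pass: average each pixel with the ones above and below.
    for (int y = 0; y < in.height(); y++) {
        for (int x = 0; x < in.width(); x++) {
            blurred.at(x, y) = static_cast<uint16_t>(
                (tmp.get(x, y - 1) + tmp.get(x, y) + tmp.get(x, y + 1)) / 3);
        }
    }
}
```

With this sketch, a 3x3 image that is 0 everywhere except a 9 in the centre blurs to 1 in every pixel, and a constant image stays constant.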

The optimised-for-speed version (order of magnitude difference):

    void fast_blur(const Image &in, Image &blurred) {
        __m128i one_third = _mm_set1_epi16(21846);
        #pragma omp parallel for
        for (int yTile = 0; yTile < in.height(); yTile += 32) {
            __m128i a, b, c, sum, avg;
            __m128i tmp[(256/8)*(32+2)];
            for (int xTile = 0; xTile < in.width(); xTile += 256) {
                __m128i *tmpPtr = tmp;
                for (int y = -1; y < 32+1; y++) {
                    const uint16_t *inPtr = &(in(xTile, yTile+y));
                    for (int x = 0; x < 256; x += 8) {
                        a = _mm_loadu_si128((__m128i*)(inPtr-1));
                        b = _mm_loadu_si128((__m128i*)(inPtr+1));
                        c = _mm_load_si128((__m128i*)(inPtr));
                        sum = _mm_add_epi16(_mm_add_epi16(a, b), c);
                        avg = _mm_mulhi_epi16(sum, one_third);
                        _mm_store_si128(tmpPtr++, avg);
                        inPtr += 8;
                    }
                }
                tmpPtr = tmp;
                for (int y = 0; y < 32; y++) {
                    __m128i *outPtr = (__m128i *)(&(blurred(xTile, yTile+y)));
                    for (int x = 0; x < 256; x += 8) {
                        a = _mm_load_si128(tmpPtr+(2*256)/8);
                        b = _mm_load_si128(tmpPtr+256/8);
                        c = _mm_load_si128(tmpPtr++);
                        sum = _mm_add_epi16(_mm_add_epi16(a, b), c);
                        avg = _mm_mulhi_epi16(sum, one_third);
                        _mm_store_si128(outPtr++, avg);
                    }
                }
            }
        }
    }

I don't know about you, but that looks like an error-prone maintenance disaster waiting to happen.
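To illustrate how opaque it gets: the `_mm_set1_epi16(21846)` followed by `_mm_mulhi_epi16` is a divide by 3 done as a fixed-point multiply. 21846 is 65536/3 rounded up, and `_mm_mulhi_epi16` keeps the high 16 bits of each signed 16x16-bit product, which amounts to `(sum * 21846) >> 16`. Here is a scalar sketch of one lane of that arithmetic (the name `div3_mulhi` is mine, and I'm assuming non-negative inputs that fit in a signed 16-bit lane):

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of _mm_mulhi_epi16(sum, one_third):
// multiply two signed 16-bit values and keep the high 16 bits
// of the 32-bit product.
// Assumption: sum is non-negative, so the shift behaves like floor division.
inline int16_t div3_mulhi(int16_t sum) {
    const int32_t one_third = 21846;  // 65536 / 3, rounded up
    const int32_t product = static_cast<int32_t>(sum) * one_third;
    return static_cast<int16_t>(product >> 16);
}
```

For every input from 0 to 32767 this matches plain integer `sum / 3`. Because the multiply is signed, it also suggests the fast version quietly depends on the three-pixel sum staying below 32768, which is exactly the kind of hidden assumption that makes the code hard to maintain.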

And, just for comparison, Halide code that produces results as fast as the second code:

    Func halide_blur(Func in) {
        Func tmp, blurred;
        Var x, y, xi, yi;

        // The algorithm
        tmp(x, y) = (in(x-1, y) + in(x, y) + in(x+1, y))/3;
        blurred(x, y) = (tmp(x, y-1) + tmp(x, y) + tmp(x, y+1))/3;

        // The schedule
        blurred.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        tmp.chunk(x).vectorize(x, 8);

        return blurred;
    }

(this is kind of a weird coincidence; last time I replied to you I mentioned Halide[1] as well)

[0] http://people.csail.mit.edu/jrk/halide12/halide12.pdf

[1] http://halide-lang.org/

pcwalton · on March 23, 2016

Definitely, we've failed in programming language design as well. The biggest problem is that we keep sticking with C++ :)

AKrumbach · on March 23, 2016

  (defun language-choice (developer) 
    (if (> (developer-hipness developer) (developer-experience developer)) (lang-du-jour)
      (if (developer-scared-of developer 'parenthesis)
        (c-family-language)
        (lisp-family-language))))

vanderZwan · on March 23, 2016

You work in Rust, right? How would you express the above in that language?

pcwalton · on March 23, 2016

SIMD is a work in progress, but we have the foundations laid for a much more ergonomic approach: http://huonw.github.io/blog/2015/08/simd-in-rust/

Zardoz84 · on March 23, 2016

Switch to DLang

creshal · on March 23, 2016

When graphics are a bottleneck it's usually easier and cheaper to pop in an entry-level graphics card than to throw out or replace the whole computer (unless it's a laptop). Low-end graphics cards that are 3-4 years old still beat Iris Pro.

seanp2k2 · on March 23, 2016

Yeah, even though Iris is "good enough for light gaming", integrated still really lags behind dedicated GPUs in how smooth even a desktop experience is.