
Well, don't forget the GPU. Intel Iris is much better than Intel HD.

The way I see it, this is our fault as software developers. A 2016 PC would be much faster than a 2011 PC if we as software developers made good use of SIMD and GPUs. But we don't.



I think it's more developers getting lazy. Why bother with SIMD and GPUs when you can write in a high-level language like JavaScript, design with HTML / CSS, deploy your app with embedded libchromium and have a faster time to market? SSDs and fast CPUs have made efficient software somewhat of a rarity these days.


It's not about being lazy, it's more about how I can get cross-platform GUI support that looks good without having to use C++. The answer seems to be HTML these days, unfortunately.


And even Qt is shifting to using Chromium web views to supplant their native widgets.


On most workloads typical desktop users run (there are many exceptions, of course, but in terms of numbers of people, those are in the minority), the computational speed of the CPU is not a limiting factor any more. I/O, amount of RAM and probably memory bandwidth are far more important; on a typical mid-range desktop machine running Windows, Office and some line-of-business application, I/O completely dominates, at least from what I have observed working as a sysadmin / helpdesk monkey.


MS Office: yes. Content creation: no.

3D rendering, video editing, and music production all need as many cycles as you can afford, and then some.

VR and all those AI/ML technologies waiting around the corner are going to be even more greedy.


That is true.

At work, our CAD people use Autodesk Inventor heavily, and that thing will happily gobble up all the CPU cycles one can throw at it. (It is the one example I have first-hand experience with.)

What I meant was that for most users of desktop PCs in an office environment, a faster CPU is not going to make much of a difference in overall system performance. (I might be a little sore because at work, users will sometimes complain their computer is too slow and then demand a new one with an i7, and then I have to explain to them why that is not going to help, while a RAM upgrade and an SSD are going to make a big difference.)

But you are right, there are plenty of examples where there is no such thing as "fast enough". ;-)


I blame the latter on language expressiveness more than anything else. Here are two pieces of C++ code, one "clean" and one fast, taken from [0]:

    void blur(const Image &in, Image &blurred) {
        Image tmp(in.width(), in.height());
        for (int y = 0; y < in.height(); y++){
            for (int x = 0; x < in.width(); x++){
                tmp(x, y) = (in(x-1, y) + in(x, y) + in(x+1, y))/3;
            }
        }
        for (int y = 0; y < in.height(); y++){
            for (int x = 0; x < in.width(); x++){
                blurred(x, y) = (tmp(x, y-1) + tmp(x, y) + tmp(x, y+1))/3;
            }
        }
    }

The optimised-for-speed version (order of magnitude difference):

    void fast_blur(const Image &in, Image &blurred) {
        __m128i one_third = _mm_set1_epi16(21846);
        #pragma omp parallel for
        for (int yTile = 0; yTile < in.height(); yTile += 32) {
            __m128i a, b, c, sum, avg;
            __m128i tmp[(256/8)*(32+2)];
            for (int xTile = 0; xTile < in.width(); xTile += 256) {
                __m128i *tmpPtr = tmp;
                for (int y = -1; y < 32+1; y++) {
                    const uint16_t *inPtr = &(in(xTile, yTile+y));
                    for (int x = 0; x < 256; x += 8) {
                        a = _mm_loadu_si128((__m128i*)(inPtr-1));
                        b = _mm_loadu_si128((__m128i*)(inPtr+1));
                        c = _mm_load_si128((__m128i*)(inPtr));
                        sum = _mm_add_epi16(_mm_add_epi16(a, b), c);
                        avg = _mm_mulhi_epi16(sum, one_third);
                        _mm_store_si128(tmpPtr++, avg);
                        inPtr += 8;
                    }
                }
                tmpPtr = tmp;
                for (int y = 0; y < 32; y++) {
                    __m128i *outPtr = (__m128i *)(&(blurred(xTile, yTile+y)));
                    for (int x = 0; x < 256; x += 8) {
                        a = _mm_load_si128(tmpPtr+(2*256)/8);
                        b = _mm_load_si128(tmpPtr+256/8);
                        c = _mm_load_si128(tmpPtr++);
                        sum = _mm_add_epi16(_mm_add_epi16(a, b), c);
                        avg = _mm_mulhi_epi16(sum, one_third);
                        _mm_store_si128(outPtr++, avg);
                    }
                }
            }
        }
    }

I don't know about you, but that looks like an error-prone maintenance disaster waiting to happen.
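
One example of how opaque the fast version is: the magic constant 21846 is roughly 65536/3, so taking the high 16 bits of the product (which is what `_mm_mulhi_epi16` does per lane) stands in for the `/3` of the clean version. Below is a minimal scalar sketch of that trick, not code from the paper; the function names are made up for illustration, and it only checks non-negative 16-bit inputs.

```cpp
#include <cstdint>

// Scalar model of one 16-bit lane of _mm_mulhi_epi16(x, 21846):
// keep the high 16 bits of the signed 32-bit product.
// (Hypothetical helper name, for illustration only.)
inline std::int16_t mulhi16_by_one_third(std::int16_t x) {
    const std::int32_t product = static_cast<std::int32_t>(x) * 21846;
    return static_cast<std::int16_t>(product >> 16);
}

// Compare the multiply-high result with plain integer division by 3
// for every non-negative 16-bit input (0..32767).
inline bool matches_division_by_three() {
    for (std::int32_t x = 0; x <= 32767; ++x) {
        if (mulhi16_by_one_third(static_cast<std::int16_t>(x)) != x / 3) {
            return false;
        }
    }
    return true;
}
```

Nothing in the fast code tells you that the equivalence only holds on that range, which is exactly the kind of hidden assumption that makes it hard to maintain.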

And, just for comparison, Halide code that produces results as fast as the second code:

    Func halide_blur(Func in) {
        Func tmp, blurred;
        Var x, y, xi, yi;

        // The algorithm
        tmp(x, y) = (in(x-1, y) + in(x, y) + in(x+1, y))/3;
        blurred(x, y) = (tmp(x, y-1) + tmp(x, y) + tmp(x, y+1))/3;

        // The schedule
        blurred.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        tmp.chunk(x).vectorize(x, 8);

        return blurred;
    }
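
For anyone who hasn't seen a schedule before, here is a rough plain-C++ sketch of the loop order that a 256x32 tiling implies: two outer loops over tiles, two inner loops within a tile. This is not Halide's generated code; the helper names are hypothetical, and clamping partial tiles at the image edge is my assumption (Halide has its own ways of handling edges).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Halide): visit every pixel of a
// width x height image tile by tile, calling visit(x, y) for each one.
template <typename Visit>
void for_each_tiled(int width, int height, int tile_w, int tile_h, Visit visit) {
    for (int y0 = 0; y0 < height; y0 += tile_h) {          // outer loop over tile rows
        for (int x0 = 0; x0 < width; x0 += tile_w) {       // outer loop over tile columns
            const int y1 = std::min(y0 + tile_h, height);  // clamp partial tiles (assumption)
            const int x1 = std::min(x0 + tile_w, width);
            for (int y = y0; y < y1; ++y) {                // inner loop, like yi
                for (int x = x0; x < x1; ++x) {            // inner loop, like xi
                    visit(x, y);
                }
            }
        }
    }
}

// Count how often each pixel is visited; a correct tiling visits each once.
std::vector<int> tiled_visit_counts(int width, int height, int tile_w, int tile_h) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height, 0);
    for_each_tiled(width, height, tile_w, tile_h, [&](int x, int y) {
        ++counts[static_cast<std::size_t>(y) * width + x];
    });
    return counts;
}
```

The point is that in Halide this loop structure is one line of schedule, while in C++ it gets tangled up with the arithmetic itself.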

(this is kind of a weird coincidence; last time I replied to you I mentioned Halide[1] as well)

[0] http://people.csail.mit.edu/jrk/halide12/halide12.pdf

[1] http://halide-lang.org/


Definitely, we've failed in programming language design as well. The biggest problem is that we keep sticking with C++ :)


  (defun language-choice (developer) 
    (if (> (developer-hipness developer) (developer-experience developer)) (lang-du-jour)
      (if (developer-scared-of developer 'parenthesis)
        (c-family-language)
        (lisp-family-language))))


You work in Rust, right? How would you express the above in that language?


SIMD is a work in progress, but we have the foundations laid for a much more ergonomic approach: http://huonw.github.io/blog/2015/08/simd-in-rust/


Switch to DLang


When graphics are a bottleneck it's usually easier and cheaper to pop in an entry-level graphics card than to throw out or replace the whole computer (unless it's a laptop). Low-end graphics cards that are 3-4 years old still beat Iris Pro.


Yeah, even though Iris is "good enough for light gaming", integrated still really lags behind dedicated GPUs in how smooth even a desktop experience is.



