
One programmer costs ~ 300 cores. (Bay Area salary with benefits vs. AWS EC2 pricing)

If your code consumes more than 300 cores, you should care about performance. If your code consumes less, you should care about productivity.

Adding cores is easy while adding programmers is hard. Cores scale linearly while programmers don't. So I set the guideline at 1000 cores. Do not make performance-productivity tradeoffs below that.
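The "~300 cores" figure is back-of-the-envelope math; a sketch with assumed numbers (the salary and per-core-hour price below are illustrative, not the parent's actual inputs) looks like:

```java
// Back-of-the-envelope: how many always-on cloud cores cost as much as
// one programmer? Both constants are assumptions, not quoted prices.
public class CoresPerProgrammer {
    // Assumed fully-loaded annual cost of one Bay Area programmer.
    static final double PROGRAMMER_PER_YEAR = 250_000.0;
    // Assumed on-demand price per core-hour (varies widely by instance type).
    static final double CORE_PER_HOUR = 0.10;

    static long coresPerProgrammer() {
        double corePerYear = CORE_PER_HOUR * 24 * 365;   // ~$876 per core-year
        return Math.round(PROGRAMMER_PER_YEAR / corePerYear);
    }

    public static void main(String[] args) {
        System.out.println(coresPerProgrammer() + " cores per programmer");
    }
}
```

With these assumed inputs the answer comes out near the parent's ~300.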



What on earth makes you think that all problems can be scaled out to as many cores as you like? This is exactly the web developer mindset that people are referring to elsewhere in this thread.


Well you're trading one kind of B.S. for another kind of B.S.

There's a lot of B.S. that comes with C++, and there's an entirely different kind of B.S. involved with writing things in Java + Hadoop.

Personally I stay out of the C/C++ ecosystem as much as I can, because threads are never really going to work well in the GNU world: you can't trust the standard libraries to be thread-safe, never mind all the other libraries.

The LMAX Disruptor shows that if you write code carefully in Java it can scream. They estimate that they could maybe get 10% better throughput in C++ at great cost, but the average C++ programmer would probably screw up the threading and make something buggy, and a C++ programmer that's 2 SD better than the mean would still struggle with cache line and other detailed CPU issues.

The difference between the LMAX Disruptor and the "genius grade" C++ I've seen is that the code for the Disruptor is simple and beautiful, whereas you might spend a week and a half just figuring out how to build a "genius grade" C++ program, with builds taking half an hour a pop.


Really, you're trading execution speed for productivity, not "BS for BS", when you use these so-called "web languages". In some cases there are other concerns, such as memory usage or the software environment (e.g. try installing a Java program on a system that doesn't allow JIT compilation).

Some problems can scale out, but only if latency between nodes is low enough and bandwidth is high enough. For example, an MMO server would not function as well if there was a 50 msec ping between nodes. You may or may not have control over that depending on what cloud service you use.

These are real concerns and should not be trivialized as "BS for BS" or "throw more virtualized CPU cores at it". Every problem is different; it should be studied and the best solution for the problem applied.


I'm talking about parallel programming, in general, as a competitor to high-speed serial programming.

In that case it is a matter of one kind of BS (wondering why you don't get the same answer with the GPU that you do with the CPU, waiting 1.5 hours for your C++ program to build, etc.) vs another kind of BS (figuring out problems in parallel systems.)

Not all problems scale out like that, but you can pick the problems you work on.


Java performs well as long as you're CPU bound. But memory is becoming cheap enough to keep substantial parts of a database in memory. Avoiding all that IO translates into enormous performance gains. Unfortunately, in Java (using the Oracle VM) you can't keep a lot of data in memory without getting killed by garbage collector pauses.


The genius of the Disruptor was in the data structure and access mechanisms, plus the fact that it worked for single-producer/single-consumer circumstances. It is certainly not an example you can tout to show that Java is as fast as C/C++ under all circumstances if you are 'careful'. I think you are just falling prey to confirmation bias w.r.t. the 'beauty' of the code.

Having said that, C++ can be ugly as hell.
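The Disruptor itself is a sizable library, but the single-producer/single-consumer core idea fits in a page. A toy sketch (names and structure are illustrative, not LMAX's actual code): a preallocated ring indexed by ever-growing sequence numbers, with no locks and no per-message allocation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy single-producer/single-consumer ring buffer in the spirit of the
// Disruptor: a preallocated array indexed by ever-increasing sequence
// numbers. One thread calls offer(), one thread calls poll().
final class ToyRing {
    private final long[] slots;
    private final int mask;                                  // capacity must be a power of two
    private final AtomicLong published = new AtomicLong(-1); // last slot written
    private final AtomicLong consumed = new AtomicLong(-1);  // last slot read

    ToyRing(int capacityPow2) {
        slots = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    boolean offer(long value) {                   // producer thread only
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) return false; // ring full
        slots[(int) (next & mask)] = value;
        published.set(next);                      // publish only after the write
        return true;
    }

    Long poll() {                                 // consumer thread only
        long next = consumed.get() + 1;
        if (next > published.get()) return null;  // ring empty
        long v = slots[(int) (next & mask)];
        consumed.set(next);
        return v;
    }

    public static void main(String[] args) {
        ToyRing r = new ToyRing(8);
        r.offer(42);
        System.out.println(r.poll());             // prints 42
    }
}
```

The real Disruptor adds cache-line padding around the sequence counters and batched waiting strategies; this sketch only shows the sequencing idea.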


I'd say in some real life situations the gap is less than people think.

Back in the 1990's, when JIT compilation was new, I wrote a very crude implementation of Monte Carlo integration in Java that wasn't quite fast enough to do the parameter scan I wanted. I rewrote the program in C and switched to a more efficient sampling scheme.

When it was all said and done, I was disappointed with the performance delta of the C code. Writing the more complex algorithm in Java would have been a better use of my time.
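The original program isn't shown, but Monte Carlo integration is only a few lines in either language; a generic stand-in in Java (the integrand and sample count are arbitrary choices, not the commenter's parameter scan):

```java
import java.util.Random;

// Minimal Monte Carlo integration: estimate the integral of x^2 on [0,1]
// (exact value 1/3) as the average of f over uniform random samples.
public class MonteCarlo {
    static double integrate(java.util.function.DoubleUnaryOperator f,
                            int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < samples; i++) {
            sum += f.applyAsDouble(rng.nextDouble());
        }
        return sum / samples;   // interval [0,1] has width 1, so mean = integral
    }

    public static void main(String[] args) {
        System.out.println(integrate(x -> x * x, 1_000_000, 42L));
    }
}
```

The inner loop is exactly the kind of tight numeric code where a JIT gets close to C, which matches the commenter's experience.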


But there are several things Java insists on that are going to cost you performance and that are very, very difficult to fix.

1) UTF-16 strings. Ever notice how sticking to byte[] arrays (which is a pain in the ass) can double performance in Java? C++ supports everything by default: Latin-1, UTF-8, UTF-16, UTF-32, ... with sane defaults, and supports the full set of string operations on all of them. I have a program that caches a lot of string data. The Java version is complete, but uses >10G of memory, where the C++ version storing the same data uses <3G.
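The tax is easy to see for ASCII-heavy data: a String backs each character with a 2-byte char (on JVMs of this era, before compact strings), while a Latin-1 byte[] needs one byte per character. A rough illustration (payload bytes only; object and array headers add more on top):

```java
import java.nio.charset.StandardCharsets;

// Payload size of the same ASCII text held as UTF-16 chars vs Latin-1
// bytes. This ignores headers and alignment, so real heap numbers are
// somewhat larger in both cases.
public class Utf16Tax {
    static int utf16PayloadBytes(String s)  { return s.length() * Character.BYTES; }
    static int latin1PayloadBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String s = "cache me if you can";
        System.out.println(utf16PayloadBytes(s) + " vs " + latin1PayloadBytes(s));
    }
}
```

Double the payload bytes per cached string is roughly where the >10G vs <3G gap in the parent comment comes from (combined with the per-object overhead in point 2).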

2) Pointers everywhere. Pointers, pointers, and yet more pointers, and more than that still. So data structures in Java will never match their equivalents in C++ in lookup speed. Plus, in C++ you can do intrusive data structures (not pretty, but it works), which really wipe the floor with Java's structures. If you intend to store objects with lots of subobjects, this will bite you. As if this weren't bad enough, Java objects feel the need to store metadata, whereas C++ objects pretty much are what you declared them to be (the overhead comes from malloc, not from the language), unless you declared virtual member functions, in which case there's one pointer in there. In Java, it may (sadly) be worth it to not have one object contain another, but rather copy all the fields from the contained object into the parent object. You lose the benefits of typing (especially since using an interface for this will eliminate your gains), but it does accelerate things by keeping everything together in memory.
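The "copy the fields into the parent" trick looks like this (class names invented for the example; both model the same line segment):

```java
// Nested vs flattened layout. LineNested is three heap objects and two
// pointer hops per access; LineFlat is one object with its four ints
// inline, which is friendlier to the cache and the GC.
class Point { int x, y; Point(int x, int y) { this.x = x; this.y = y; } }

class LineNested {                 // parent + two Point objects
    Point a, b;
    LineNested(Point a, Point b) { this.a = a; this.b = b; }
    long lengthSquared() {
        long dx = b.x - a.x, dy = b.y - a.y;
        return dx * dx + dy * dy;
    }
}

class LineFlat {                   // one object, fields copied inline
    int ax, ay, bx, by;
    LineFlat(int ax, int ay, int bx, int by) {
        this.ax = ax; this.ay = ay; this.bx = bx; this.by = by;
    }
    long lengthSquared() {
        long dx = bx - ax, dy = by - ay;
        return dx * dx + dy * dy;
    }
}

public class FlattenDemo {
    public static void main(String[] args) {
        System.out.println(new LineNested(new Point(0, 0), new Point(3, 4)).lengthSquared()
                + " == " + new LineFlat(0, 0, 3, 4).lengthSquared());
    }
}
```

In C++ the nested version already gets the flat layout for free, because a member object is embedded in its parent rather than referenced through a pointer.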

3) Startup time. It's much improved in java 6, and again in java 7, but it's nowhere near C++ startup time.

4) Getting in and out of java is expensive. (Whereas in C++, jumping from one application into a .dll or a .so is about as expensive as a virtual method call)

5) Bounds checks. On every single array access at least one bounds check is done. This is insane. "int a[5]; a[3] = 2;" is 2 assembly instructions in C++, almost 20 in Java. More importantly, it's one memory access in C++ and two in Java (and that's ignoring the fact that Java writes type information into the object too; if that were counted, it'd be far worse). Java still hasn't picked up on Coq's tricks (you prove, mathematically, what the bounds of a loop variable are, then you try to prove the array is at least that big; if that succeeds, no bounds checks).
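The check is observable directly: every out-of-range index throws at run time, which is exactly the compare-and-branch the JIT must either prove away or emit on each access. A minimal demonstration:

```java
// Java verifies every array index at run time; C and C++ do not.
// The JIT can hoist or eliminate checks it can prove redundant, but an
// index it cannot prove in range costs a compare-and-branch each time.
public class BoundsCheck {
    static int writeThenRead(int[] a, int i, int v) {
        a[i] = v;          // bounds check here...
        return a[i];       // ...and (unless eliminated) here again
    }

    public static void main(String[] args) {
        int[] a = new int[5];
        System.out.println(writeThenRead(a, 3, 2));   // in range: prints 2
        try {
            writeThenRead(a, 7, 1);                   // out of range
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In C++ the second call would be silent undefined behavior; the cost of Java's safety is the extra check on every access the JIT can't eliminate.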

6) Memory usage, in general. I believe this is mostly a consequence of 1) and 2), but in general Java apps use a crapload more memory than their C++ equivalents (normal programs, written by normal programmers).

7) You can't do things like "mmap this file and return me a ComplicatedObject[]".
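The closest Java gets is mapping the file and decoding fields out of the buffer by hand, one at a time, rather than casting the mapping to an object array as C or C++ can. A sketch (the two-int record layout here is made up for illustration):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Java's nearest equivalent to "mmap this file as ComplicatedObject[]":
// map the file, then decode each record's fields manually by offset.
// Assumed layout: fixed-size records of two big-endian ints.
public class MappedRecords {
    static int secondFieldOfRecord(Path file, int record) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int recordBytes = 2 * Integer.BYTES;
            return buf.getInt(record * recordBytes + Integer.BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("records", ".bin");
        java.nio.ByteBuffer out = java.nio.ByteBuffer.allocate(16);
        out.putInt(1).putInt(10).putInt(2).putInt(20);   // two (id, value) records
        Files.write(f, out.array());
        System.out.println(secondFieldOfRecord(f, 1));   // prints 20
    }
}
```

You get the zero-copy I/O, but every field access goes through getInt/getLong calls instead of ordinary field reads, and no typed objects ever materialize.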

But yes, in raw number performance, avoiding all the above problems, java does match C++. There actually are (contrived) cases where java will beat C++. Normal C++ that is. In C++ you can write self-modifying code that can do the same optimizations a JIT can do, and can ignore safety (after proving to yourself what you're doing is actually safe, of course).

Of course Java has the big advantage of having fewer surprises. But over time I tend to work on programs making this evolution: python/perl/matlab/mathematica -> java -> C++. Each transition yields at least a factor of 2 difference in performance, often more. Surprisingly, the "java" phase tends to be the phase where new features are implemented, because you can't beat Java's refactoring tools.

Python/Mathematica have the advantage that you can express many algorithms as an expression chain, which is really, really fast to change. "Get the results from database query X, pull out fields x, y, and z, compare with this-and-that other array, sort the result, get me the grouped counts of field b, and graph me a histogram of the result" -> 1 or 2 lines of (not very readable) code. When designing a new program from scratch, you wouldn't believe how much time this saves. IPython notebook FTW!


Hadoop and the latest version of Lucene come with alternative implementations of strings that avoid the UTF16 tax.

Second, I've seen companies fall behind the competition because they had a tangled-up C++ codebase with 1.5-hour compiles and code nobody really understands.

The trouble I see with Python, Mathematica and such is that people end up with a bunch of twisty little scripts that all look alike, you get no code reuse, nobody can figure out how to use each other's scripts, etc.

I've been working on making my Java frameworks more fluent, because I can write maintainable code in Java and skip the 80% of the work it takes to get the last 20% of the way there with scripts.
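The "expression chain" style from the sibling comment translates fairly directly into fluent Java. A sketch of the grouped-counts step using streams (the record and field names are invented; requires Java 16+ for records):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// "Filter the rows, then grouped counts of a field" as one fluent chain.
// Row and its fields are made-up names standing in for query results.
public class GroupedCounts {
    record Row(String category, int value) {}

    static Map<String, Long> countByCategory(List<Row> rows, int minValue) {
        return rows.stream()
                .filter(r -> r.value() >= minValue)
                .collect(Collectors.groupingBy(Row::category, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 1), new Row("b", 5), new Row("a", 7));
        System.out.println(countByCategory(rows, 2));
    }
}
```

It's more ceremony than the one-liner in a notebook, but it's typed, refactorable, and reusable, which is the trade the parent is arguing for.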


Try C++11.


"What on earth makes you think that all problems can be scaled out to as many cores as you like"

It certainly can't. You can't do that for an app that runs on a phone, for example.

But, when possible, this is the cheapest way to do it.

Not to mention other cases where it's the "only" way to do it (CPU-heavy processing, video processing, simulations, etc.). A smart developer can shave off a percentage, but with limited returns.

For example, Facebook invested in their PHP compiler, since their server usage would only increase while the resources for that gain (in terms of people) stay more or less constant.


Cores scale linearly while programmers don't.

Performance does not scale linearly with cores: http://en.wikipedia.org/wiki/Neil_J._Gunther#Universal_Law_o... and http://en.wikipedia.org/wiki/Amdahl%27s_law

Not only does it not, but sometimes it doesn't scale at all. Scaling software is a very difficult problem, especially when you have low-latency requirements.

And what happens when you write desktop software? You ask your customers to open Amazon accounts?

So I set the guideline at 1000 cores. Do not make performance-productivity tradeoffs below that.

In which field do you work?


I'm sorry, but I can't outsource the computation of real time ultrasound denoising to EC2. Nor can I do the work of my LTE radio modem on EC2. Clouds and scaling out on clusters are great answer to a certain set of problems, but far from a panacea.


In that case, you are running on >1000 cores and my guideline gives the right answer.


I'm pretty sure ultrasound denoising does not run on >10000 cores. The point was more that "cloud this, cloud that" solves many types of problems, but real-time is not one of them. Cf. why you can't play a game in the cloud by having it send JPEG screenshots of the rendered scene 60 times per second while polling a joystick at 120 Hz and sending the input back up.



I don't know why you bothered to provide a link and no commentary. This proves that people have attempted it, not that it is common nor that it produces equivalent results. I'm well aware that for some games, it can work "ok". Let me know when you get uncompressed 2K video at 60 FPS and <16 msec input response.


You suggested that it couldn't be done for games, so I gave you a link showing that it can be and has been done. There are a number of reasons why that isn't a popular way to play games, but it's not primarily a technical issue.

You certainly could pull off this architecture in a setting with a good LAN connection. Though I'm not really arguing that it's necessarily a great way to go.


Yeesh, way to mince words. Yes, it "can" be done... I guess what I meant by "can't" was "provides a poor enough experience that it's generally unacceptable, and therefore not a solution; hence the impossibility could just as easily be expressed as 'can't be done [right now]'".


>There are a number of reasons why that isn't a popular way to play games, but it's not primarily a technical issue.

It is entirely technical. Everyone who tried it shit all over it because it was a terrible experience, entirely due to limitations of internet connectivity (both latency and bandwidth).


I suppose the fact that many users have shitty internet connections is a technical issue. But for users with a high bandwidth low latency connection (FiOS, LAN, dedicated fiber) there's really no technical reason it can't work quite well.

The primary reasons these services didn't take off is that many users have shitty connections and almost everyone has a fast enough computer.


> One programmer costs ~ 300 cores. [...] If your code consumes less [than 300 cores], you should care about productivity. Adding cores is easy while adding programmers is hard. [...] Do not make performance-productivity tradeoffs below [1000 cores].

Computing time is cheap these days, but this kind of math doesn't make any more sense than comparing feet to miles per hour. Programming time saved is a one-time gain, whereas the performance loss is continuous. Let's say you write code for a single core, you spend 10 hours instead of 20 by accepting a 50% slowdown, and you manage to compensate by adding an extra core. Depending on how long this code runs, there will be a point at which the ongoing cost of the extra core surpasses the one-time saving of 10 hours of programming.

If all you need is a working prototype, then sure, performance shortcuts may be worthwhile (although you still can't calculate the trade-off as suggested). But for long-term production systems they will always start hurting at some point.
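The break-even point in the example above is simple to compute; a sketch with assumed prices (the hourly rates below are illustrative, not figures from the thread):

```java
// The parent's argument with numbers: a shortcut saves programmer-hours
// up front but costs an extra always-on core for as long as the code runs.
public class BreakEven {
    static final double PROGRAMMER_PER_HOUR = 100.0;  // assumed loaded rate
    static final double CORE_PER_HOUR = 0.10;         // assumed cloud price

    // Core-hours of runtime after which the extra core has consumed the saving.
    static double breakEvenRuntimeHours(double hoursSaved) {
        return hoursSaved * PROGRAMMER_PER_HOUR / CORE_PER_HOUR;
    }

    public static void main(String[] args) {
        // 10 saved hours = $1000; at $0.10/core-hour that buys 10,000
        // core-hours, i.e. roughly 14 months of one always-on core.
        System.out.println(breakEvenRuntimeHours(10) + " core-hours");
    }
}
```

So for short-lived code the shortcut wins comfortably, and for a system that runs for years it loses, which is exactly the parent's point.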


Unless you never make changes to your software, development time is as continuous as run time. As someone pointed out below, the fact that Google finds value in Go seems to point to there being enough of a cost in development time that they're willing to sacrifice run time to reduce it.

That said, what makes sense for Google doesn't necessarily make sense for the rest of us.


Yeah, as I said it depends on how long the code ends up running. For sections of the code that you end up changing all the time you'll have a much higher proportion of developer time to "running core" time, so you obviously can reduce costs more on the productivity side. But there's no simple, 300-cores-per-developer math for it.


Generally I'd say it depends on the component. There are tons of components that never change, at least in my apps. Splitting them off and moving them to java or C++ provides gigantic gains.

In practice, I think a lot of programmers simply don't know how to call C/C++ from python, even though it has become so much easier since ctypes. Thus doing this is derided as a waste of time, dangerous and whatever. You'll soon see that doing this has other advantages (like type safety).


Not all code is server-side code that can be addressed by elastic computing. Most C++ programmers work on desktop, mobile, and embedded programs. In such a domain, it is very likely that your code will be running on more than 10,000 cores on launch day (and with little or no intercommunication between the cores).


> One programmer costs ~ 300 cores. (Bay Area salary with benefits vs. AWS EC2 pricing)

That is gibberish. If I work at a company and we have 5 machines with 12 cores apiece already in place that is what I have to make do with. We don't all live in an elastic world.

Further to that, scaling a large single computation across cores is a costly and often pointless exercise.


Yeah, companies do that all the time. Spend 3 months of a good engineer's time to save a few thousand bucks in hardware because the budget allocation is fixed.

In the long run, I think companies that value human capital appropriately will win.


A lot of problems are going to require quite a bit of engineering to scale to 300 cores...

I mean, if you're just serving some simple webpage, it's easy to just throw servers at it. But if you want to implement say a distributed k-means, the algorithm is different than for the single-threaded case. Not everything is easily scalable...


Yeah, companies do that all the time. Spend hundreds of thousands of dollars on hosting and hardware costs to save a few hours of a programmer's time because they blindly believe in the completely unfounded "truth" you are parroting.


You're lucky.

Some places are so inelastic that developers get hand-me-down laptops from sales people who couldn't sell, or when they buy a new machine from Dell they get one with two cores.


So... they pay you a salary, right? As long as they don't prevent it, you can take some of the money your company allocates to salary and reallocate it to buying a few cheap VMs.


I was working as an enumerator for the U.S. Census in the year 2000 and one of the people on my team realized that for 2 hours worth of pay she could buy office supplies that would save us all (and the government) 60 hours worth of work.

She was stressed because there wasn't any official channel for us to buy office supplies other than the stuff they sent us.

I told her to buy the office supplies and say that she worked another 2 hours; this was breaking the rules but this did not strike me as at all unethical.

Now, not long after the 2008 crunch I was getting pissed about how long builds took on my cheap laptop (on which I was running both the client and server sides of a complex app).

Getting a better machine from management was out of the question, but I liked other aspects of the job, and 2009 wasn't the best time to go job seeking. So I bought myself a top end desktop computer and three inexpensive monitors.

When I left that company they wanted to buy the machine off me so as to keep all the proprietary code and data on it, but as things worked out, the value of my own proprietary code and data on that machine was worth a lot to me so I kept it, and fortunately things never went to court.

This type of decision has risks (for instance, you don't want to be the guy who loses a machine with social security numbers on it and forces his employer to pay for credit monitoring for 70,000 people) but it can be the right thing to do sometimes.


I am surprised that they let you use your own machine. I am also surprised that they didn't get the code and data from you (or take you to court), as most employment contracts state that all work done by the employee is considered company property.


You could also contribute to the company's rent payment. It would have a similar wealth transfer effect.


If a company is making a profit, they owe more to you than you to them....


You're profit-sharing, right? So if you go around the company's self-limiting policies and make more profit by using your salary, then it's a net win for you.


What if I'm distributing my code to thousands or millions of users who are going to be running it dozens of times a day? Suddenly, spending an hour to make it run just a few seconds faster looks like a very good trade-off.


Right tool for the job, etc etc


> Cores scale linearly while programmers don't.

Cores scale linearly, but performance may not scale linearly to cores.


Ok, sweet, so we just need to tell game developers to start packaging additional processors with their games.

You need to get out more, you've completely lost perspective.


And then how do my players (I make games) use more cores?

They hire Amazon before playing my physics game?

Oh, I know, just put the game in the cloud...

Then players will become angry because the game only works when the internet works, and they cannot play on the underground train while commuting.

Sorry, but sometimes I get the impression that web developers forget that other sorts of development exist.


Depending on how you value things, being able to spend fewer clock cycles may be more "environmentally responsible" in the long run, though data centers do run a pretty tight ship. ;)


Setting aside the validity of this comparison, keep in mind that Go is coming out of Google, a place where running on 1000 cores is dinky.

(Note: I work for Google, but on open-source server software.)


This is going to become less true when/if power costs go up


>Cores scale linearly

You don't seriously think that, do you? People struggle with concurrency and parallelism all the time. There are tons of problems we have no way to scale up with more CPU cores; finding ways to do so is an open research problem. Pretending that you can just buy your way to speed is a huge mistake that costs millions of dollars.



