Accelerating Deep Neuroevolution: Train Atari in Hours on a Single Computer (eng.uber.com)
155 points by mkvorwerck on April 23, 2018 | 38 comments



> a run that takes 1 hour on 720 cores can be run on the CPUs of a 48-core personal computer in 16 hours

Is calling a 48-core machine a "personal computer" a bit of a stretch, or am I missing something?


I think the intended implication is that it's accessible to individual researchers who don't work at Facebook/Brain/DeepMind with gigantic clusters (and the engineers able to maintain those clusters). Unfortunately, deep learning is still expensive, but so are many areas of science (most biology research, high-energy physics, etc.). That's why we have grants.


If you don't care about your electricity bill, you can buy used Xeon workstations on your favorite auction site.

A couple years ago, we picked up an HP Z800 workstation with two 12-core processors and 24GB of RAM for $400.

The main downside is that if the power supply (proprietary) or motherboard dies, there's no easy fix other than buying another of the same model.


Absolutely!

I wanted something to host my private and experimental stuff. A friend of mine who tracks used server auctions pointed me at some servers that were Dell systems clearly custom built for one of the large SV companies. I paid $450 for a box with 72GB of RAM and 16 cores. At that price, I just bought a second one for spares. Which I haven't needed to use in the 4 or 5 years I've been using it.

It has been great.


Would you care to share the server auction site? I've thought about doing something like that.


This was 4+ years ago, so I have no idea of the current market, but I bought from this eBay vendor:

http://stores.ebay.com/deepdiscountservers


WeirdStuff also used to be great for this.


i also raised my eyebrows at this. i think "workstation" might have been a better choice of word than "personal computer". but people, especially in the sciences, do actually buy desktop computers with these sorts of specs -- they run like $10k-20k, probably.


Closer to the $20k end, I'm afraid. More options are opening up now with GPU processing power on less expensive desktops (~$13k).


Dunno, check out [1]. You can configure for 48 cores and 256GB ECC DDR4 for under $10k.

[1] http://www.titancomputers.com/Titan-W375-Dual-AMD-EPYC-Proce...


I think they're using "personal computer" there only in the narrow sense that a dual-socket tower workstation conforms to the PC spec. But then they go on to GPU performance, which allows a machine you could reasonably have as your home computer to do this quickly.


Dunno, I have a 32-core, 128GB RAM machine that I bought for $1000. I have a Xeon Phi 60-core card that I bought for $200 sitting by it, ready to go in when I get the chance. When GPU prices drop, I'm probably going to add two 1080 Tis.

I'm just a regular computer user.


You’re a computer hobbyist, you mean. You have to have done a whole lot of bargain-hunting to set that build up at that price. The “regular computer user” doesn’t bother with that—they just buy what they can afford at their local shop, which usually means a rather crap laptop.

(Mind you, it doesn’t take any more technical knowledgeability than most users to bargain-hunt like you have; but knowing that the product can be of varying quality at a given price, and caring enough to put in the time to get something of good quality, makes you an exception, in the same way that people who e.g. comparison-shop for quality suits or shoes are an exception. It’s a general mentality that most people don’t subscribe to—neither the “buying the best” of the rich, nor the “economizing” of the poor, but rather the optimizing ROI of the economics-minded. And even an economist will only apply this kind of thinking to cases where they actually believe this thing will affect their life enough that the research time-cost is worth it—i.e. when it’s a hobby of theirs. Economics-minded computer hobbyists are a rather more niche set than “regular computer users!”)


Define "regular computer user." What are you doing with these machines that comes close to using all of the cores?


I don't know what the OP is doing, but something like XGBoost scales really well across multiple cores.

I've used my 2014 MBP with a 4-core i7 (plus hyperthreading) plus 3 desktop i5 boxes for over 24 hours straight on the same XGB task before. The 8 CPU threads on the MBP let it outperform more recent machines on this kind of task.
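
Something like this is all it takes to keep every core busy (a minimal sketch on made-up data, not the actual task above; the nthread parameter is what spreads tree construction across cores):

    # Minimal multi-core XGBoost sketch: synthetic data, nthread sets the core count.
    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "objective": "binary:logistic",
        "tree_method": "hist",   # fast histogram-based tree construction
        "nthread": 8,            # e.g. all 8 hardware threads on the MBP
    }
    booster = xgb.train(params, dtrain, num_boost_round=200)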


Oh, I get that there are things you can do with a setup like this. I challenge that this is a "regular computer user" level of thing to do, though. Mining bitcoin is not a regular user task. Nor is being a render farm for an animated movie. :)


It's on a post called "Accelerating Deep Neuroevolution". I'm pretty sure it's a regular computer for anyone doing that....

I think the point of "regular" was that it's not exactly some extreme budget needed to do this.


That makes some sense. However, even in an article about Olympic athletes, I would expect a "regular person" to not be one of them.


I think there’s an awful lot of emphasis on the word “regular” and not much on the “32 cores for $1000,” which, given the context of this story, is much more relevant and interesting.


I specifically quoted "regular computer user." This thread was specifically about that being a personal computer.

I mean, yes, I can see how an individual can get one. And it is definitely neat to see them pushed to their limits. But calling yourself a regular computer user with a machine that strong is a stretch.


*eyeroll

You're also one of "those" people


Where'd you get them? I could use an upgrade for that price.


The 32-core machine from Amazon's used listings; the Xeon Phi when it was on sale two years ago.


It's more than a bit of a stretch, I reckon. That's how the all-but-hermetically-sealed SV tech scene warps some engineers' sense of what a regular nerd has access to, I suppose.


If you only had 10% of the cores, that hour would become a day, which is still pretty good.


> Modern desktops also have GPUs, however, which are fast at running deep neural networks (DNNs).

That's the rub: deep learning is effectively restricted to CUDA/cuDNN/NVIDIA GPUs, which excludes a large number of desktops with AMD graphics (e.g. all modern Macs).

Current cloud GPU prices have dropped enough such that this approach may be pretty effective with spot/preemptible instances.


Does anyone know why AMD has dropped the ball so hard on not supporting TensorFlow/PyTorch? It seems like the kind of work 2-5 talented engineers could do (i.e. push harder on HIP, reimplement cuDNN, etc.), and it would seriously impact sales, right? I probably misunderstand either the hardware limitations or the business side from AMD's perspective.


You misunderstand the difficulty in the software side.

Developing a competitive CUDA equivalent is a much harder task than you assume. It takes a lot of tooling and software-level optimization to make these low-level libraries fast, and the amount of testing and benchmarking needed to make the right choices can take many people working full time. Implementing the functionality is not enough.

AMD has had APIs for a long time and it has an open-source deep learning stack, but it's not good enough. AMD also has a CUDA-to-HIP converter, but the results are not competitive and miss features.

Both AMD and Intel are probably getting there eventually, and Nvidia is cashing in on the monopoly phase before the prices drop.


i don’t know that it has to be competitive in speed right away as long as it works and is easy to install.

i’d be thrilled to have the option of doing deep learning on an AMD card even if it ran at 1/3 the speed of a comparable (but probably somewhat more expensive) NVIDIA card. it would open up a lot of options even if it were still less economical in throughput/dollar.

if nothing else, how many machine learning researchers would like to prototype things on their MacBook Pros?


For prototyping (as in making sure you have your matrix shapes right) the CPU versions of TensorFlow and PyTorch are fine.
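
For that kind of shape check, nothing GPU-specific is needed, e.g. (a throwaway sketch, not anyone's real model):

    # Check that the layer shapes line up on CPU before renting GPU time.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    x = torch.randn(32, 128)   # a batch of 32 dummy inputs, CPU only
    print(model(x).shape)      # torch.Size([32, 10]) -- shapes are consistent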

They are even ok for some kinds of training (e.g., if you are doing transfer learning with a fixed embedding/feature representation and have a pretty small parameter space to learn).

But it's so easy to fire up a cloud instance at Paperspace or somewhere and push some code across.


That's fair enough, and I was being a little flippant of a very difficult problem. I guess I was more curious about whether anyone has any insight into their possible upside (i.e. whether they can scale production if demand grows, how big a % of Nvidia's revenue is from scientific computing, etc.).


They are working on it pretty actively [1], you can see a progress chart on their website. I think some of the PyTorch devs have said they'd welcome AMD compatibility. If you go to the ROCmSoftwarePlatform github page you can see some of their ports, most of which seem to be actively developed.

[1] https://rocm.github.io/dl.html


Coriander [1] looks promising: "Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices".

[1] https://github.com/hughperkins/coriander


> cuDNN/NVIDIA GPUs, which exclude a large amount of desktops with AMD graphics

And whose fault is that?


The question isn't about fault, it's about accessibility and availability of commodity hardware. Consider the position of someone who made a financial decision to go with AMD to build a workstation and didn't have the budget to swap out the video card later.


If anyone’s interested, training a single game is affordable on spot instances.

I have a replication, including CloudFormation scripts, here: https://github.com/cshenton/neuroevolution


Nice work

Can you elaborate more on the network requirements? How much bandwidth is required? (I presume latency is less of a problem.)


The algorithm only communicates sequences of seeds, so the communication overhead is very low. The master server doesn't even break 1% CPU servicing a few hundred workers.
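
Roughly, the trick is that a genome is just the list of seeds of the mutations that produced it, so any worker can rebuild the full parameter vector locally. A hedged sketch of that encoding (names and constants are illustrative, not taken from the linked repo):

    import numpy as np

    DIM = 1_000_000   # number of policy parameters (illustrative)
    SIGMA = 0.02      # mutation noise scale (illustrative)

    def genome_from_seeds(seeds):
        """Rebuild the full parameter vector from its compact seed lineage."""
        theta = np.random.RandomState(seeds[0]).randn(DIM) * 0.1   # ancestor init
        for s in seeds[1:]:                                        # one mutation per seed
            theta += np.random.RandomState(s).randn(DIM) * SIGMA
        return theta

    # A worker reports something like ([3, 141, 59], 210.0): the seed lineage
    # plus the episode reward -- a few bytes instead of megabytes of weights.
    theta = genome_from_seeds([3, 141, 59])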



