10nm versus 7nm

dmarcos · on May 1, 2016

I've been following the semiconductor news in the last year closer than ever before. It really feels that the 7nm node will be the end of Moore's law. As a software engineer it intrigues me the constrains is going to impose to our craft in the next decade. We've been riding the performance exponential growth that allowed us to be sloppy programmers and still produce increasingly complex software. I work on the VR space and the conventional wisdom is that headsets and GPUs are going to get drastically smaller and powerful every year. I think this is not a given anymore and as a software practitioners we will have to step up the game and be much more careful about using resources. How is the end of Moore's low going to affect the cost of Software Development? The top software companies will be those that best use the hardware. Performance optimizations usually makes software more complex and increases the rigor you operate in. Many people have been able to got from 0 to make relevant contributions after learning programming in a few months or going through a coding bootcamp. That might get harder and harder since you will need deeper knowledge to produce competitive software. Is this going to increase the barrier of entry for developers? Is the developer pool going to shrink? Salaries go up? Is it going to be more a and more expensive to produce competitive software? Will it make software startups more capital intensive, since gluing 4 pieces of existing code around a good idea won't be enough to be successful?

AnthonyMouse · on May 1, 2016

The interesting thing is going to be the potential for "free as in speech" hardware. Before the problem was that hardware was vertically integrated. If you wanted a state of the art processor you had to use a design from a company with a state of the art fab because nobody else had the resources and inclination to update their design for every new process node.

If we get to the point that TSMC has the same technology as Intel and IBM and it never changes then some hardware hackers will eventually publish something like a 250MHz processor that uses 10 watts on that process, which will be "good enough" for some embedded devices to justify a fairly large production run. Then you have a community and people make incremental design improvements for the same reasons Google and others do for Linux and in a few years the free hardware design is above 2GHz and below one watt.

nickpsecurity · on May 2, 2016

I really doubt that's going to happen given that I've been pushing various parties to do it who have skills, EDA tools, and some academic funding. And it's still not done.

My advise is still the same, though: put Gaisler's Leon3 SOC w/ key I/O on fab at 180nm-90nm using shuttleruns like Tekmos. Its been GPL with no action forever. Already has OSS software on it, too. SPARC compatible trademark only costs $99 if you wanted to be official.

So, if there's such a will or demand, where's our open-source SOC that uses Leon3? It's been one MPW, packaging, and PCB run away forever. Still don't have one...

adapteva · on May 2, 2016

Not sure I see what the purpose of taping out Leon at 180nm? Is it just to have an open source CPU? Hardware is only useful if it's competitive.

nickpsecurity · on May 2, 2016

The parent was satisfied with a low-performing chip that was fully open. There's a niche of people that want fully-open, inspectable hardware they control totally. I keep calling out proponents with Leon given they've been able to accomplish that at 350nm w/ visual inspection and 180-130nm affordably at least with OSS HW. Plus, there's quite a few open, cell libraries at 0.13nm.

Anyway, you know what I wanted: open-source, 28nm FPGA w/ key I/O already included so I could synthesize my own CPU's and such on it. Verify one design in hardware, worry about the rest in simulation. You never got back to me on that one. Whether it was a good idea to solve most people's open-hardware concerns or what it might cost?

(I'll say a Virtex-6 amount of slices or memories with USB 2.0, DDR3, PCI-E, SATA, and 10/100/1000 Ethernet onboard. Should do.)

visarga · on May 2, 2016

We might not need full trust. We could just add a secondary processor isolated from the system which can run encryption and store its keys inside, like the iPhone does, but one that we are sure is not corrupted. Maybe the security processor could also run a firewall to filter out which app can communicate what.

bhouston · on May 1, 2016

Isn't this was raspberry pi is?

azernik · on May 1, 2016

Raspberry Pi does not use open source hardware. The processor core (ARM) is proprietary. The GPU (VideoCore IV) is proprietary hardware, and even includes proprietary software in the form of microcode. And so on and so forth.

Open source hardware refers to the silicon itself using open source designs.

LukeShu · on May 1, 2016

The VC4 GPU is a lot worse than microcode! It's running the complete (proprietary) ThreadX operating system, regardless of what you choose to run on the ARM core.

sklogic · on May 2, 2016

You meant VPU, not GPU? The GPU is fully open.

LukeShu · on May 2, 2016

Yes, I called the VC4 the GPU because the parent post (and many other resources) call it a GPU. But yes, it is the VPU.

kyzyl · on May 1, 2016

Indeed, Broadcom doesn't even allow you to buy their proprietary SoC (at least, last I checked), so you can't make a knock off RPi board, or integrate the SoC and peripherals into a custom application board. You need to use an RPi directly or get some other Cortex-A53 chip with a comparable architecture.

dietrichepp · on May 1, 2016

Open source hardware refers to both the system level design and the IC level design. Credit where it's due, Raspberry PI is open hardware on the system level. You are right that it is not open on the IC level.

adapteva · on May 2, 2016

Nonsense. It's not open at any level by normal definitions (although some of the software is open source). The Raspberry Pi is a wonderful thing. It's cheap and it works great...but it's not open.

chongli · on May 2, 2016

No, but it is what RISC-V is! I'm really excited about RISC-V after reading Andrew Waterman's PhD thesis[0] on it from earlier today.

[0] http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1....

code5fun · on May 2, 2016

Will the US allow something like that happen? The future is developing towards hardware backdoors in all processors and this would make it really hard to implement such system. There is no way this is going to happen.

ChuckMcM · on May 1, 2016

I believe you are correct as well and they mention Moore's Law being broken in the article[1]. Silicon at least is pretty much done for. And if their cost estimate of over a quarter billion to do an SOC in 10/7nm is accurate you won't see very many interesting "side" projects on that node, those will all live on 28nm at $30M/project.

But the more interesting effect for me has been that Moore's law has slowed down but the exponential growth it had given computers allowed them to speed past even our more amazing ideas. We haven't reached Jim Gray's "smoking hairy golfballs"[2] yet but we also don't upgrade our machines nearly as often. When I bought a new laptop, I bought it for its stylus experience, before that I bought a new lap top for its retina display, and two years before that was the last laptop I had purchased because the one before it wasn't powerful enough to do the stuff I needed to do (most e-cad, m-cad, and code development)

So how does the world change? Some software places might consider writing software that doesn't change for change sake. Every year my m-cad vendor wants me to upgrade to the latest version and every year its the same story, tweak here, bug fix there but no new features. Support for new things the OS has imposed sure, but not real features.

This has been great for Linux folks, where the same computer has been fine for 5 - 6 years now. They can look forward to actually seeing a computer fail due to old age rather than forced obsolescence. It will be harder to maintain a distro that runs on a decade's worth of hardware changes. And that latter is the killer.

So computers aren't changing but as I mentioned before the I/O around them is, better screens, better peripherals, different ways to do storage. Since that is the only way to force upgrades these days, I expect to see a lot more of that.

[1] From the article -- “Because Moore’s Law has broken down and you no longer get gains in all areas at the same time by going to the next node, each foundry customer will have a different strategy, depending on which parameter is most important,” according to a source at one large customer, who asked not to be named.

[2] http://research.microsoft.com/en-us/um/people/gray/papers/Su...

speeder · on May 1, 2016

I would like to point out that not all ideas have been outpaced by Moore's Law, some obviously hit serious blocks after the barrier in the early 2000s was hit and caused the invention of multi-core on consumer machines.

That I can personally attest to:

1. Simulation games are mostly stuck in power available since the 2000s, many common game algorithms can't be made parallel, and several algorithms increase in computing cost faster than processors increased in power since then (example: Astar, its computing cost rises quadratically with the search space, and the search space when you want to increase a game map size ALSO increases quadratically, thus to double a city simulator map size, you need to run A* 8 times the processing power... if you pay attention city simulators since SimCity 4 release tended to make instead smaller maps with more miscellaneous simulations).

2. CAD, CAE, etc... In this case the recent advances, including in GPU design made those software much faster, but they are still rather slow, if you need to compute a complex project, or even a simple project with greater precision, you still need some crazy powerful computer... As personal anecdote: I am trying to design a PC case for myself, decided to try my hand with CFD, first to test I made just a empty box with a "fake" fan, that is just a number saying how much air get inside the box... got a good result in 3 minutes. decided to test what happens if you have 4 fans side by side... well, this creates turbulence, thus need a more detailed grid to calculate that, the grid generation alone took 20 minutes, I decided to make half-assed simulation and the result was mostly nonsense. I decided then to test if I could model the entire fans, including their blades, housing, etc... I got a already done precise model of a fan, tried to generate a grid... and my computer overheated after 5 minutes using an i7 at 100% Of course, I knew I was "abusing" things, still, it became obvious to me that CFD for example, for anything "serious", is mostly impractical, if I had infinite money, I would still need to setup some kind of huge cluster or super computer, and the code to run CFD in that would be very complex, the thing is possible, but still impractical, even with infinite money.

yosefk · on May 1, 2016

Their figures for chip costs at all nodes are bullshit including the $30M/chip at 28nm figure. It's inflated out of all proportion and of course it depends tremendously on what you are doing.

petra · on May 1, 2016

Interesting. Could you please give more accurate examples, and maybe a rough breakdown between masks/tools/front-end/back-end ?

yosefk · on May 1, 2016

Adapteva has done chips for <$2M AFAIK, definitely <$5M, at 28 nm. Of course few chips are like theirs in terms of design simplicity relatively to their large size except for bitcoin miners. For $10M however you should be able to do pretty big designs with lots of licensed IP, heavyweight simulation acceleration etc. A breakdown of real projects involves violating NDAs.

Below 28nm their numbers become really deranged. This overstatement is IMO peddled by big name chip companies trying to convince analysts that FinFET nodes give them a moat. In reality the end of Moore's law means they'll face vicious competition.

Of course nothing prevents you from spending huge amounts of money. Buy an ARM architecture license - there goes $30M and that's before you pay for developing your own ARM-compatible CPU.

adapteva · on May 2, 2016

Yosefk, thanks for pitching in and fighting the complete bs that gets flung around regarding chip design costs. Some of these projections are just ludicrous, clearly made by people who are not currently involved in chip design.

For argument's sake, let's pin the tapeout cost at $5M and EDA costs at $1M. Everything else is engineering. The reason these projects are expensive is that the chips are complex (because the have 10B transistors!). It has nothing to do with technology scaling (except for the fact that smaller nodes lets you integrate more transistors). I can design a memory chip (with repeating structures) in any node for almost nothing (..besides the tapeout cost). If they would have costs have escalated from $50K at to $5M at 14nm, that would be reasonable. The $250M number is utter nonsense....

Some more background:

http://www.adapteva.com/andreas-blog/semiconductor-economics...

nickpsecurity · on May 2, 2016

The other comment to yosefk is mine citing your Lean Fabless article to contradict the point you just made slightly. Feel free to clarify on that if I'm misinterpreting something. Yet again, your next link supports "ludicrous" prices if it's a complex, full-featured SOC along lines of Apple products. Perhaps they should put qualifiers in these articles to be clear they're talking about high-end stuff with big numbers of parts and engineering required. If they even know what they're talking about.

Yet, two of your articles suggest me doing a Snapdragon or something without buying third-party I.P. or RTL could easily cost the numbers in the article on 28nm. Or the other nm. Intended audience of those big nodes will also mostly be companies making stuff like that where they can afford development costs. Do shuttleruns even exist for 16/14nm?

nickpsecurity · on May 2, 2016

You're talking about this:

http://www.adapteva.com/andreas-blog/a-lean-fabless-semicond...

Adapteva is pretty clever. Yet, they don't tell us whether they reached $2 million at the 16-core model with 64 costing much more or if that was both. The write-up also says they used third party I.P.. It implies they outsourced as much stuff as possible. You can bet it was mostly standard cell rather than custom in those. They used the shuttle runs I often advocate which are available on their chosen nodes. Biggest part, they recovered this by selling to niche markets that didn't care much about price. Try that in mass market outside being Apple you will fail.

All quite different than doing a whole SOC by yourself on 28nm. They even admitted complex SOC's can cost big money:

"Complicated SOCs are indeed very expensive to develop and there are few if any examples of complete products that have cost less than $10M from start to finish. "

So, now we've shifted from $2mil to $10mil that quickly. Gotta wonder what in-house development of full SOC a la Snapdragon with bells and whistles at 28nm costs with only packaging or PCB's outsourced. Is it maybe $10-30 million? ;)

adapteva · on May 2, 2016

We can't tell you the detailed costs because this would violate our NDAs, but until our 2011 28nm tapeout, we had raised less than $2.5M in capital and taped out 4 chips at 65 and 28nm.

There is no mystery here. EDA costs, tapeouts, engineering costs are all known. Tell me the chip you want to build, I'll tell you how much it will cost.:-)

People seem to have a hard time grasping that the term "SOC" is meaningless when it comes to predicting cost. It's like comparing the cost of a Ford Focus and a Ferrari. Sure they are both cars...

nickpsecurity · on May 2, 2016

Thanks for the clarification. That's hopeful. I just gave you a sample question on cost in another comment. Even a range will be helpful when soliciting academics on it.

adapteva · on May 2, 2016

I provided ranges in the paper. To focus in further, you will need to get actual quotes from vendors. This is because the price is completely opaque and depends on circumstances.

http://www.adapteva.com/andreas-blog/semiconductor-economics...

nickpsecurity · on May 2, 2016

Oh damn. I figured it wouldn't be that easy. Thanks anyway as your write-ups have been helpful. I imagine the chip-to-chip interconnect might be, too, at some point. :)

KSS42 · on May 2, 2016

Were your tapeouts shuttles or full production tapeouts?

adapteva · on May 2, 2016

Until 2011, all of our tapeouts were shuttles. (for our initial market the per die cost was fine). You get 50 dies/wafer on a shuttle run and can order a lot of wafers...with a theoretical per wafer price of $5000 that would mean $100 die, which is not bad for some systems.

Full mask tapeout was done later.

nickpsecurity · on May 2, 2016

What shuttle service did you use for 16 and 64 core development? Might help someone stay close to your cost profile. Btw, your data sheet for 64 core says 65nm, too. I thought it was 28nm or maybe I misread one of your posts. 16 to 64 core on same node would be nice. :)

And good luck with "The Brain." Saw that on your website just now. Funny that I was thinking of proposing to someone a prior product for revival that did NN's with 256 8-bit cores on 0.5 micron with good results. Then, ABC Thinking Machines Super-Connector had badass GA results back in the day with 64,000 simple cores. Your design looks like a Thinking-Machine-on-a-chip level of cores with cores being more versatile. Not 64-65k but close enough. :)

KSS42 · on May 2, 2016

You might find this interesting:

https://www.synopsys.com/dw/ipdir.php?ds=ev52-ev54

nsteel · on May 2, 2016

This.

Also, the design man-hours quoted in the article must be when designing from scratch, and may or may not be accurate. In the context of moving to a new node and iterating on a previous design, the quoted numbers are junk.

WalterSear · on May 1, 2016

> So how does the world change? Some software places might consider writing software that doesn't change for change sake.

Inconsequential feature creep doesn't happen for changes sake, it happens because people want to keep making money.

kbenson · on May 1, 2016

Well, when the performance cost of those features can no longer be waved away easily as something most people won't have to deal with sure to faster hardware, they may need to come up with some different ways of making money, or at least put some more thought into the current strategy.

mabbo · on May 1, 2016

> you will need deeper knowledge to produce competitive software

Alternative theory: there will continue to be bad programmers, but there will also be good money in compilers that can deal with it and turn crap into optimized goodness. Midas compilers.

I also suspect a few new language concepts (like Rusts ownership model) will need to be invented as well, to enforce certain optimizations with minimal cognitive load on the developer.

pstrateman · on May 1, 2016

> Alternative theory: there will continue to be bad programmers, but there will also be good money in compilers that can deal with it and turn crap into optimized goodness. Midas compilers.

No amount of compiler optimization is going to fix algorithmic failures.

chipsy · on May 2, 2016

This has already happened with SQL databases. Performance tuning a database can be messy because sometimes you have to outsmart the query optimizer. But in the average case it is doing the Right Thing without you having to think about what engineering has gone into it.

Demand for new data models is insatiable and for each one, a software ecosystem can develop around a similar degree of automation, tuned towards the specific domain. It's a very, very black-boxed future.

mabbo · on May 1, 2016

Depends on what limits you place on the definition of 'compiler'.

Optimizing via better algorithmic choices is a semi-deterministic process performed by a human. Why can't a computer do it too?

Code is just your way of telling the computer your intentions. Once it can understand those deeply, well, it can choose a better way to achieve those intentions.

petra · on May 1, 2016

To some extent the mobile platform did the same thing:increasing the constraints, both on performance, and on interfaces. So maybe we'll see something similar ? more demand, higher salaries, but non the less, options for relative noobs to do interesting things ?

kolinko · on May 1, 2016

"Nonetheless", ftfy. Sorry for nitpicking.

_nfin · on May 2, 2016

I'm actually of the opinion that when we can't scale smaller we will then scale horizontally/laterally.

More transistors, larger chips.

We have bought so much physical space going smaller and smaller - and I think a lot of people will happily pay for a thicker iPhone if it's not more room allocated for a larger battery.

Eventually - when that becomes ridiculous - we will write better software. More likely we'll buy an extra couple years after the end of Moore's Law and Intel will make some unexpected breakthroughs to go smaller again.

Color me cynical?

randyrand · on May 2, 2016

1 word. Heat.

Zardoz84 · on May 2, 2016

Another words to add : yield/cost

astrodust · on May 2, 2016

Intel had to work their ass off to get them out of the ditch they dug for themselves when pursuing insanely high clock speeds in the Pentium 4 era.

This heat problem can be solved if they make it a priority. Until now they've been able to get away with ignoring it as new nodes have kept them competitive. If that changes, which it is, a new approach is required.

whazor · on May 2, 2016

7nm is not the end of Moore's law. The whole reason why there are problems now is because they are switching from paradigm. ASML is the largest supplier of photolithography systems and has been upgraded from 32 nm to the next 10 nm.

However, as the new EUV machines are being put into production. It does not make financial sense to still make the 10 nm chips. As buying the new machines and infrastructure is a huge investment, you can understand that some companies are focussing on the existing 10 nm machines (maybe focus on lower end of the market). The problems with EUV are mainly starting problems with using different technology, IBM has already made 7nm chips and ASML has 5nm and 3nm in their roadmap[1].

[1] http://arstechnica.com/gadgets/2015/07/ibm-unveils-industrys...

[2] http://www.computerbase.de/bildstrecke/61547/2/

ozmbie · on May 2, 2016

Honest question: what's in the cards for after 2022? That's only 8 years away. How can the trend continue for another 50 years?

visarga · on May 2, 2016

> Is this going to increase the barrier of entry for developers? Is the developer pool going to shrink?

I don't think we are going to see that. Most use cases are not performance constrained (ex: a simple app or website). What you're talking about applies to cutting edge graphics, machine learning, OS and compiler design and a few other computationally intensive domains. Our computers are going to be at least as fast as those of today, so they will be able to run even the sloppy software of today with no problem other than (maybe) excessive power usage.

I think we are going to see an even greater shift towards asynchronous and parallel programming and design of specialized hardware for speed and low power. Also, we're going to have a much more complete AI toolbox than today, containing ready made image, video, audio, text, dialogue and behavior modules. For example, there is no need to reinvent the wheel when we already have a neural net that can distinguish tens of thousands of objects. Once it's trained in one place, it can be easily copied and run everywhere else, like an app.

Joeri · on May 1, 2016

Performance per dollar may still rise, when existing processes are made cheaper. But, I think for software it's clear that parallel processing is the way forward, and it impacts developers who need to upgrade their skillset, and language and tool makers, since just offering threads and locks and leaving the problem of scaling to the developer just won't cut it.

I've been doing a project that's a combination of java play, with its NIO and actor model approach to parallellism, and apache spark, with the rdd approach. Either solution lets you scale horizontally without having to manage locks and threads yourself and I like both approaches, but you can tell it's still early days. These approaches need to mature, standardize, and then percolate throughout all languages and frameworks, but eventually they will and we will be writing parallel code by default.

BuckRogers · on May 1, 2016

That's an idea I've been thinking about. But I have a few responses that have been bouncing around my head on that topic.

- Do we need much faster hardware for most tasks people are actually trying to accomplish? I don't think software is necessarily becoming more complex. It could be simplified it's just rushed out the door. I think complexity has plateaued. A common run of the mill usecase of application-layer software, a CRM, isn't limited by CPython (and definitely not by PyPy) today and there's no reason why it would start tomorrow.

- Something new will come along for hardware. So it actually is inevitable it moves forward. This isn't anywhere near the end. It may stratify hardware. If you truly need more performance, it may be very expensive hardware. The rest of us will rely on servers running on 7nm chips for ages. In connection to this point, the machine I'm typing this on is from 2009. A Q9450, it's still more CPU processing power than I need. I think that says a lot. The only reason I want to upgrade is for less power draw because that means less heat produced. If I needed more power for video, 3D rendering or compilation I'd use networked processing which would annihilate any local big iron I could put up in my basement.

- More development platforms will dump everything that isn't LLVM based. Significant gains could be had if say, Python was reimplemented on top of LLVM.

- For cases where the everyman company needs more power, they won't go for the chips made out of brain tissue or diamonds. They'll use services like AWS to gleam the power needed which will have the more expensive hardware on-top or massive swaths of 7nm CPUs to utilize.

- Chips will increasingly be built for specific tasks. Dropbox will keep having custom designs built by Intel for their specific workload. The general-use CPU will become less popular in the server room but still there for the small to midsize companies to lean on.

So while there's some individual points there, to add the always beloved car analogy. I've downgraded my car from a 325HP model 10 years ago to a 145HP model that I've been using since. Both got the job done within their environment and I suspect at worst, chips will simply become a commodity closer to modern day cars. The advance won't be more HP or better brakes, those are mostly maxed out in capacity. The paradigm shift is where the advancement will be (electrics for cars). But it's been settled for a while that either way, I really don't need over 150HP for what I'm doing.

nickpsecurity · on May 2, 2016

The future is what was started with the old mainframes: general purpose CPU's plus I/O and algorithm acceleration. Possibly onboard FPGA logic. See Cavium Octeon III's w/ their accelerators, the Zynq line which Intel/Altera will likely do for Xeons, AMD's Semi-Custom business that took off, little GPU's in all the SOC's, and so on.

Good news is many accelerators work well at old 90nm and 45nm nodes based on my reading of academic papers. Even startups or FOSS projects can still get plenty of mileage out of the cheaper nodes. The Core Duo 2 I'm typing this on... still running ever bloated web well... was taped out on 65nm.

eslaught · on May 2, 2016

We'll see a lot more co-design of hardware and software. So far, outside of major trends (e.g. multi-core CPUs, GPUs), authors of different abstraction layers (programming languages, compilers, OSes, etc.) have worked towards improvements independent of other areas. I suspect that much of the improvement over the next 50 years will come clever tricks that require collaboration across more than one area.

Consider a simple example: CPUs have had caches basically forever. Caches are the epitome of independent development, because they require modest performance improvement with no effort. (Advanced users and compilers can of course put in more effort to get more improvement, but the maximum improvement is limited because CPU vendors have been reticent to allowing direct user control over the cache.)

What if these caches were instead scratch pads? Frankly, this sounds like painful way to program, especially in traditional languages, because now everything will have to be explicitly moved into the scratch pad. Maybe high-performance users will benefit, but the vast majority will not.

But we can reinvent the language as well. Let's propose a different programming model: tasks (blobs of code) with explicit data usage declarations. The rule is that a task only runs when its data is available. Thus the processor can copy data into the scratchpad asynchronously, and only run the task when all the data is there. Suddenly, there are no more cache misses! We just had to give ourselves permission to modify the programming model too.

Well, actually this is not a hypothetical programming model. I've described Legion, a programming system from Stanford [1]. A software-only implementation of Legion was able to scale to 1000s of nodes on a distributed system [2], so we know the basic approach works. Even better, the Legion folks described a type system which allows you to hide the gory details behind an easy-to-use apparently-sequential programming language [3, 4].

So far, these results have focused on software only. What would it look like if we were able to change the hardware as well? To answer your original question: I think this will require more expertise (and more diverse expertise) on the part of the system designers, but if we do it right, we can still design a system which is easy for everyone else to use.

That's where I think the biggest wins will be in the next 50 years.

(Disclaimer: I work on Regent, the Legion programming language)

[1]: http://legion.stanford.edu/ [2]: http://legion.stanford.edu/pdfs/legion-fields.pdf [3]: http://legion.stanford.edu/pdfs/oopsla2013.pdf [4]: http://legion.stanford.edu/pdfs/regent2015.pdf

thesz · on May 1, 2016

Moore law is about number of transistors per chip (IC). You can make transistors smaller, you can make chips larger, you can stack chips and you can make integrated circuits from several parts.

I think your feeling is wrong.

speeder · on May 1, 2016

What you are saying is PRECISELY why Moore Law ended with 7nm.

Mind you, performance improvements didn't ended, we can still find other ways to do things...

But 7nm is the smallest physically possible transistor (well, theoretically it is 5nm, but that one can't be actually manufactured).

So, you hit 5nm, then what? Your only choice would be make chips larger... but if you do that, you limit your maximum clock (example: a 5cm processor, if it allowed data to travel at light speed, and had perfect cooling, would still be limited to 6ghz)

Or make them taller, but then you can't cool them easily.

That is the problem, Moore Law, as in transistors per chip, ended with 7nm, there is nowhere to go after that, the only thing you can do is add more chips and make them communicate with each other (ie: multi-cpu... see what AMD has been doing in their recent research, trying to figure how to make an API to allow infinite amount of GPUs in the same calculation in a way 100% transparent to the developer).

But making more chips is not improving Moore's Law.

The only thing that can be done now, is invent something else, silicon transistors are done for, you can't improve them much anymore, you can only change the tech (use something else other than silicon transistors) or find novel ways to use current silicon tech (better architectures, multi-chip systems, better code, etc...)

SemiTom · on May 1, 2016

Intel has a roadmap down to 3nm, and I believe Imec does, as well. But the future is probably a combination of different packaging with faster interconnects and better throughput to and from a variety of memory types. There are a couple of ways to drop the heat. One is a different transistor structure, which may allow for a sharp reduction in voltage (0.1 or 0.2 volts vs. 0.7 to 3.5V). The second way is to use silicon photonics between chips in a package, which is coming one of these days.

logicallee · on May 1, 2016

>Or make them taller, but then you can't cool them easily.

Haven't you answered your own question? Your other points are about impossibility - the one I just quoted, about making chips taller, is about difficulty.

If I were the head of Intel I would say "Well. Hmmm. All right. I guess we'd better start making them taller."

Moore's law is not a question of "how hard can it be". As you've pointed out, on the current approach it's a question of "It is physically impossible for us to increase transistor count without dropping the clock rate or moving out of roughly a plane. No thanks to Einstein. (My favorite link on this subject: https://www.google.com/search?q=c+%2F+4+ghz )

You say "Or make them taller, but then you can't cool them easily." Well then, make them taller and cool them with difficulty. Nobody said it would be easy. Off-hand you could:

-> Immerse the taller chip in insane cooling

-> Fabricate little duct work between layers and liquid cool at high velocity

-> Figure out a way to reduce heat waste usage, e.g.: different substrate, not silicon (pretty insane)

-> Possibly combining some of the above points, make it a superconductor.

These are insane ideas from me. But then again, I'm not the head of Intel and you've heard a few minutes of thought on me on this topic, where I'm not a chip designer.

"When you have eliminated the impossible, whatever remains, however [difficult], is your [roadmap.]"

NegativeK · on May 2, 2016

Without getting into semantics, Moore's Law roughly assumes that things will get cheaper or you'll get significantly greater bang for your buck. "Nobody said it would be easy" is a casual dismissal of the fact that consumers won't pay for this, unless someone conjures a materials revolution.

And no, we can't just assume the head of Intel will conjure up a materials revolution solely to adhere to an industry observational law.

logicallee · on May 2, 2016

I don't know, man:

https://en.wikipedia.org/wiki/Moore's_law#Moore.27s_second_l...

This talks about an exponential increase in R&D costs which is expected by Moore. With stuff like cloud computing, I don't see the need for denser and denser CPU's going away any time soon.

thesz · on May 1, 2016

I've seen electronic microscope image of a working polar transistor in few dozens of atoms. Wires were couple atoms wide. The method employed for that scale was able to place about couple of hundreds of tungsten atoms per second.

It was in 2005. I lost contact with these people, I don't know how they do now.

im3w1l · on May 2, 2016

> a 5cm processor, if it allowed data to travel at light speed,

Easy. Just have multiple synchronized clock sources.

yaantc · on May 1, 2016

Strictly speaking, Moore's original paper is not about transitors per chip but about transistors density for the least cost node.

So first it's about transistor density, hence the relation to having smaller and smaller transistor. And also and very important, it's about cost. You could go 3D to increase the amount of transistors per area, but that requires more masks/operations and impact the yield, and increases cost. If in the end the cost per transistor increases, then it's a density increase but it's not advancing Moore law.

In other words, Moore's law can be broken in two different ways: 1) we can't improve anymore the transistor density; 2) we can improve transistor density, but not in a cost effective way. If you want more you need to pay more (vs. getting more for less money when riding Moore's law).

dmarcos · on May 1, 2016

If you care about miniaturization you can't make chips larger. If you stack chips you will have to deal with heat dissipation. What do you mean about making integrated circuits from several parts? I'm not an expert on IC design. The reality is that laptops have not increase performance in the last 5 years as much as they did before. Desktop GPUs are getting more powerful at the expense of being more and more power hungry. I feel that in the smartphone space the claim of "This new model is 2x performance of the previous one" will be harder to make.

solnyshok · on May 1, 2016

re:desktop GPUs - AMD/NVidia stayed at 28nm for the last ~5 years. 14nm coming this summer to both.

kbob · on May 1, 2016

> “It will take chip designers about 500 man-years to bring out a mid-range 7nm SoC to production,” Gartner’s Wang said. Therefore, a team of 50 engineers will need 10 years to complete the chip design to tape-out. In comparison, it could take 300 engineer-years to bring out a 10nm device, 200 for 14nm, and 100 for 28nm, according to Gartner.

My mental model of this process is completely wrong. I thought they'd have a Verilog code base, and moving to a new process was basically a recompile.

What are they doing for 100/200/300/500 man-years, and why do the smaller processes take longer?

sweden · on May 1, 2016

If you are speaking of the digital designs, yes, what you say is absolutely right.

But you need to keep in mind the analog designs. While a typical CPU, a typical GPU or a typical cryptographic IP chip (for a example) can be developed in just Verilog since they are just pure logic, when you move to the Mixed Signal IP, you can't use only Verilog, you have to manually design parts of the chip (creating the schematics of the designs at the transistor level and converting them to layouts).

Examples of mixed signal IP are USB, HDMI, MIPI, Ethernet, PCIe, and so on.

We don't have tools capable of synthetizing really fast PLLs, DCOs, Band Gaps, most of the high speed analog blocks.

But I think the article was exaggerating on the "50 engineers will need 10 years to complete the chip design to tape-out". I work at Synopsys and we are already working on the latest process nodes (10nm, 7nm, 5 nm) and we already have some solutions ready for silicon and it didn't took us 10 years.

All these articles make it sound so dramatic that it's nowhere near the reality. The process of designing the analog blocks again in 10nm, 7nm, 5 nm and so on, it's exactly the same every time. It is very much like porting your C++ code to Python: sure you have to do it from scratch but the process and the fundamental concepts are exactly the same.

It's not rocket science nor quantum physics, it's easier than it sounds. The main problem is the huge cost of the tools and of the technology from the foundry. If you are past that, you just need time and some knowledge in micro-electronics.

Don't be fooled by these over dramatic articles, the hard part of digital and analog design is not getting the knowledge to do it, it's getting the money to have the proper tools to do it.

Cyph0n · on May 1, 2016

Actually, nowadays even some parts of digital designs are full custom, meaning that they are designed manually at the silicon level. This is usually the case with circuits that require high performance, such as adders, multipliers, etc.

Also, you forgot to mention that memory is simply not synthesizable, so on-chip cache for example needs to have a full layout from scratch.

Keep in mind that this is the case for Intel and not ARM. The latter uses standard cells, which means it's basically all synthesized.

Source: my professor.

sweden · on May 1, 2016

The process of design digital circuits will depend of course from company to company, depending on the need of higher performance modules. But the fundamental of digital design is that everything is done using Verilog and the change of process node is completely transparent.

moflome · on May 2, 2016

You're ignoring the complexity of the back-end design process, actually laying out the RTL design is a massive issue at 10nm.

adapteva · on May 2, 2016

If the cost of chip back-end is increasing exponentially from 65nm to 28nm to 14nm you are doing it wrong. (it's that simple). Since 0.25um, I have heard doom and gloom about how difficult "the next node" will be. So far it hasn't happened. If it does, I do agree that it's game over. So far the EDA companies and foundries have always come through.

nickpsecurity · on May 2, 2016

It was that difficult. In my anti-subversion and OSS HW research, I've been collecting papers on EDA techniques and issues going back to the beginning so teams can build replacements for commercial tools over time. The reason it's not doom and gloom for the ASIC developer is that CompSci, EDA startups, and Big Three threw so many PhD's worth of thinking at each set of problems it isn't even funny. Talk about conquering chaos with human ingenuity every time a node shrinks.

As you said, the EDA companies and foundries always came through. It was hard work, though. Work OSS will have to duplicate at least on EDA side.

esmi · on May 1, 2016

I definitely agree with the overall sentiment of your post but don't forget; even pure digital designs need to be powered and cooled and the design problems and available solutions for doing this at 28nm vs 10/7/5nm are different.

bhouston · on May 1, 2016

Great response. Got a question for you then: are we actually hitting the end of Moore's law here or are we just before a new technology transition for CPUs, such as away from silicon.

sweden · on May 1, 2016

We are indeed hitting the end of Moore's law.

The companies that provide IP are watching the market splitting into two: the companies that want the latest process node to show off technology (for example, Apple, Samsung, Huawei, etc) and the companies that want older but stable process nodes (IoT companies and the automotive industry).

When most of the articles speak of the newest process nodes (for example, the 5nm process node) they only touch the economics part of it: "it's expensive, it's costly". But they don't mention the many problems that come with it: it's unstable, there is much more current leakage which increases power consumption unexpectedly, it is not suitable for long longevity and the architecture of your analog blocks have to take into account all these problems which don't exist in the older process nodes.

It is possible to produce solutions in 5 nm, but the question is no longer about if we can do it or not. The downsides os these nodes go beyond the engineering resources and the economic resources that are needed to achieve them.

That is why the automotive industry is not interesting in these kind of nodes. They want the stable and cheap nodes like 28nm and 40nm, they know that these nodes are well tested and have long longevity.

We are also reaching a point in which we already have enough advanced technology for our needs, we need to start to get rid of the need of reaching the next process node in order to have better hardware.

yaantc · on May 1, 2016

There's no single answer here. First, Moore's law is about the density of the least cost process doubling every N months (N changed over time ;). So cost is a key part of Moore's law.

When you talk about cost, there are fixed NRE paid once per chip design, and then marginal cost per transistor (just including production). To get the full view, it's logical to consider a "fully loaded" cost per transistor, where you also amortize the fixed NRE expense taking into account the expected number of produces chips for a given design.

And that's where the increasing NREs bite and make a difference. Even when the marginal production cost per transistor decreases, to benefit from that cost decrease one must have sufficient chip production volume to amortize the NRE and still benefit from a cost decrease. With exploding NREs on more advanced nodes, this requires ever higher volume production to benefit from lower costs on more advanced nodes.

At this stage, if you're a medium chip vendor Moore's law is already over --- because with small volume the NRE leads to an increase in total cost per transistor for newer processes. Vendors like Broadcom and NVidia complained publicly about the end of Moore's law, so we're not talking small volume either.

For very large volumes, it may still be going on. But that's hard to tell. At some point, the cost may go up even for the biggest (Apple, Intel, Qualcomm) but they may still push forward a bit for improved performance --- meaning higher power efficiency --- as long as there are enough people willing to pay for it. It's no longer Moore's law (where you get more for less money), but it's still moving onto more advanced processes. Still, even assuming enough people are willing to pay more for improved performance (a big assumption IMHO) there are quite formidable challenges ahead.

petra · on May 1, 2016

In many places engineers complain about the high NRE costs.Do you as someone from the inside see a solution, in the future, leading for a startup to be able to design with the newest node ?

sweden · on May 1, 2016

No, I don't think there is an easy solution. It is expensive for the foundry (TSMC, GlobalFoundries, Intel, Samsung, etc.) and it is also expensive for the EDA companies that create all the necessary tools (Synopsys, Cadence and Mentor).

Even if the EDA companies were to provide all of the tools for free, the startups would be facing the high costs from the foundries.

But to tell you the truth, there is not much incentive to design in the newest node if you don't have money to spend. The technology not only is expensive but it is also unstable.

adapteva · on May 2, 2016

Agree that there will be less designs that need the advancing nodes (the number of apps needing 1B transistors per chip is limited), but there will be plenty of opportunities at the system level that warrant the $10M type investment needed.

petra · on May 2, 2016

Idk. The recent tsmc 16nm finfet process offers great leakage numbers (30pa/um) . coupled with packages that can manage extremly tiny packages like the one from tere-circuit could be really interesting , if it was affordable.

Or the other option is to build multi for chips like marvel with their Lego concept offered, all at advanced nodes, but most of the dies after reused from previous designs.

adapteva · on May 2, 2016

imho, the only reason to go to 14nm and below is if you are pushing the envelope in terms of performance (HPC) or integration (smartphone). Everything else can probably stay at 28nm or above and be combined with 14-7nm using 2.5D or 3D package integration. Note that unless you are a Tier-1 system player you will "never" be able to get known good dies from a big fabless semi vendor, but taping out twice yourself and combining them yourself is pretty straightforward if you have a good packaging partner (like Amkor/ASE)

williadc · on May 1, 2016

Your mental model only covers the front-end of design. The back-end, where I work, is very dependant on process. You must update the design to take advantage of process improvements (otherwise why bother moving?), and the new process comes with new restrictions for placing transistors and interconnect that must be accounted for.

petra · on May 1, 2016

Your part of the process is a mystery relative to the front end - which is similar to fpga design. Can you explain/link to how your work process looks like ?

vasco · on May 1, 2016

It means you live inside cadence actually drawing where metal will go, where poly will go and so on, you design the actual transistors, connections, everything. You do this design and you're constantly fixing warnings and errors about process limitations such as "this metal line can't be thinner than X amount", or "on level 3 you can't have this interconnect so closer together" and so on. Looks more or less like this: http://i.imgur.com/ECIVChO.png

petra · on May 1, 2016

Thanks.

But how does it Apply to a whole SOC ? you basically go to the critical sections and optimize them by hand, better than the synthesis tool ? Or just supply some basic components to the design team , and if so isn't it the job of the fab and the PDK(process design kit) ?

sweden · on May 1, 2016

I'm not the OP, but the same does not apply to the SoC.

What the OP showed you applies only to the Mixed-Signal IP, when you have a Digital part (developed in Verilog) and an Analog part (developed in tools like the OP showed).

When you start building a SoC, you will build it like you would be building a Lego: you contact the various IP providers (Synopsys, Cadence, ARM, Imagination technologies, etc) and you start buying IP for the CPU, for the HDMI chip, for the memory (DDR), and so on, and then you put them together.

The IP providers are responsible for making sure that you receive the layout of the IPs working for a certain process node. If you are just building the SoC, you won't need to look at the schematics or at the layout.

yaantc · on May 1, 2016

No, it's not only for mixed-signal IPs (unfortunately). It's also for soft IPs, in Verilog. When you start from Verilog you still have to do the back-end work, to meet timing constraints. Only "hard macros" IPs requires no back-end work, as the work has already been done by the IP provider for a very specific process. If you have a soft IP (Verilog) on the other hand, which are portable across nodes, the back-end work is for the SoC vendor to do. And yes, the IP design can make this more or less easy, but there's still work to do, and this back-end work gets heavier and heavier with more advanced nodes.

sweden · on May 1, 2016

Yes, of course that the Place&Route part requires some work at the layout level, but it's not like you are going to place your transistors by hand after the synthesis tool produces a netlist.

kolinko · on May 1, 2016

Is there a chance of there being better / more automated tools?

williadc · on May 1, 2016

The tools improve steadily, but the complexity of the designs themselves, and the process design rules are also increasing. Cadence and Synopsys (the two biggest EDA vendors) have both released major new versions of their automated place and route tools, with major improvements. However, 10nm presents new types of restrictions that need to be modeled and accounted for.

petra · on May 1, 2016

I think this area is controlled by a 2-3 big companies so it depends how they enable and accept startups. Building all the tools is very challenging - very large investments, maybe in the billions.

nickpsecurity · on May 2, 2016

Mentor, the little guy, was spending around $300-400 million a year on R&D. They also often acquire research and tooling from CompSci much to my irritation given I'm digging it up for OSS EDA purposes. So, yeah, ridiculously huge investments.

solnyshok · on May 1, 2016

any Y startups in this area?

typon · on May 2, 2016

These aren't simple 'apps'. These tools solve several NP complete problems, with decades of research in algorithms behind them. It is near impossible for a startup to compete with Cadence. Most startups are just plugins to Cadence that improve a tiny part of the flow, usually written in SKILL. And when they get good enough, Cadence acquires them.

williadc · on May 1, 2016

Is that wrong-way poly? I haven't seen that in years!

williadc · on May 1, 2016

Others mention SoC back-end as being separate from custom design, which is definitely true, but process has an impact on both.

For custom digital design and SoC, you start with RTL.

In custom design, you read the RTL and create a schematic using logic gates which implements the RTL spec. Generally, you have something to start with from the previous project, unless your block has seen massive changes. Then you floorplan the design, placing the interface and gates, and drawing the routes of the critical nets. Then you perform static timing analysis and many other checks to converge the design. This is a very iterative process, especially as some of your timing analysis depends on other blocks.

(I don't have experience in analog, other than working closely with a few analog designers as a customer for some mixed-signal design I did a few years ago) Analog designers use similar tools to create schematics/netlist and physical design. The verification process is much different, with a lot of work going to developing simulations to ensure the circuit works as it needs to in a wide variety of conditions (process variation, operating voltage, and temperature). There's probably a lot of other stuff as well, but I am not privy to all they do.

For SoC design, you compile RTL to netlist, using a logic synthesis tool. The synthesis tools are smart enough to approximate the physical design of the block, and you can refine your recipe to guide the synthesis tool to converge static timing in the netlist itself. Once that's complete you move to the automated place and route (APR) tools. Here you will spend some time and effort to floorplan your block: mostly placing of large sub-blocks and your pin interface to the rest of the SoC. Once you have that done, you'll go though placement, clock-tree synthesis, and routing, refining the recipe that will converge your design. Again, you will iterate on the design to converge static timing analysis and other checks.

In both these cases you will also need to converge the physical design to meet the design rule checks. The automated tools available in both environments take the design rules into account, but the complexity of these rules is not fully understood by the automation, and so user intervention is required. Depending on your group, you may have a mask designer available to help with meeting process design rules, but most groups require the block owner to do the majority of work in this domain, with the mask designer available for final clean-up at tape-in. Hope this helps.

thesz · on May 1, 2016

The "recompile" needs then to be extracted into another Verilog source code with all delays taken into account and all tests should pass on that extracted source. Then, if you are diligent enough, you execute static and dynamic analysis.

One should know that for processes less than 90nm the shape of wire starts to play - delay begin to depend not only on the length of wire, but on the number of turns as well.

All this means that the smaller the process, the more time and computing resources will be spent on post-synthesis, place and route stages.

esmi · on May 1, 2016

I would argue it's a recompile in the same sense that moving a c-based, command line only, project from OSX to Linux is a "recompile". It will most likely kinda work assuming you only touched the most fundamental API calls but the devil is in the details and those are different.

Even if we constrain ourselves to pure "digital" designs that somehow magically meet timing perfectly there will still need to be significant changes in architecture to the chip.

For example, presumably you went to the lower process node to up the transistor density. This will have serious impact on how charge and heat flow in the design so those systems are going to need a serious overhaul no matter what the RTL was. This is going to impact the layout and the area constraints which is likely going to ripple into the RTL hierarchy.

That said, the article's estimates all seem a bit high to me but it's impossible to be certain as they don't specify any of their terms. Is, say, a pipelined ADC in 28nm CMOS a "chip" in their book?

Panoramix · on May 1, 2016

My guess: As you go to faster circuits, parasitic capacitances and a lot of high frequency nuances become even more critical. These effects are relatively more difficult to design for, and not a matter of just recompiling, unless you are making a slower chip (but why would you do that using an advanced node?). So everything is a bit more difficult: metallization, distances between different circuits on chip, etc.

In addition to this, manufacturing design rules get more stringent and complicated, which result in more headaches for the designers. This is not only from the HF standpoint but things like narrower voltage windows, high power dissipation from all those extra transistors, etc.

wallacoloo · on May 1, 2016

> TSMC is moving from a 2D layout scheme at 16nm to 1D technology at 10nm. 1D layouts are easier to make in the fab, but they involve more restrictive design rules.

What is a 1D layout? Surely it must be something of a misnomer, as the only way I can visualize it is as a barcode, where each material you place has to span the entire width of the chip - and that seems utterly worthless.

Panoramix · on May 1, 2016

Some high end processes only allow you to pattern a bunch of parallel, narrow lines (hence 1D). These are subsequently "cut" in a useful way, and you can connect to them using a different layer above or below.

mjrpes · on May 1, 2016

Could anyone help me understand why is it 9 times more expensive to design a chip at 7nm than 28nm?

I would have naively thought chip design is fairly automated, but a 9x increase makes is sound like every gate and path is delicately hand-stitched like a Persian rug.

mikhailt · on May 1, 2016

You nearly got it, it is being "hand-stitched" but with lasers. It's the way the chips are etched with lasers (patterning) that is driving most of the costs.

The lasers have a limit of how small it can get and the smaller the chips get, the more you have to re-etch with different materials (for the laser to go through) to get it to where you want it, which also increases the amount of defects that will happen on the chip, driving up the testing costs as well.

28nm was the limit with a single pattern design. 20nm required a double-pattern and certain 10nm/7nm designs will need triple or quad-patterns which is super expensive and slows down the process.

That's what the article meant with this statement:

> But at 7nm, optical and multi-patterning are simply too complex and expensive, at least according to Samsung. So to make 7nm cost effective, it makes more sense to wait for EUV. In theory, EUV can simplify the patterning process.

You can find more information here: http://www.anandtech.com/show/10272/samsung-foundry-updates-...

This might help as well: http://www.extremetech.com/computing/160509-seeing-double-ts...

deegles · on May 1, 2016

I think it's because at that point they can't use a single "mask" for the lithography and have to do multiple exposures with different masks. The equipment is also fabulously more expensive. The D1X site will have $6 billion invested: http://www.bizjournals.com/portland/news/2012/10/24/intel-pl...

mjrpes · on May 1, 2016

I can understand the dramatically increased costs with chip production. But the article makes it sound like the 9x increase is with the design of the chip (e.g., with an HDL simulator).

yaantc · on May 1, 2016

The design cost is increasing. It's not as in software where you write high level code (Verilog for IC) and let the compiler do its job, and in most cases you're done. For IC design, there's a two part process: the front-end design in Verilog, and then the back-end design to map this to the process implementation. With old nodes, the back-end was a simple affair and mostly automated --- software like. But with more advanced nodes it's no longer the case. The tools need proper constraints to generate something that will work (like timing constraints), and this need human input. And at the most advanced nodes, it's common that when you reach the back-end work one realize that the design is not good enough and need to be tweaked to let the tool generate a good implementation. This creates round-trips between back-end and Verilog level design. And these are slow and expensive round trips (synthesis, tests in the loop are heavy).

So it's far from fully automatic. There's a lot of work done on the tools, but there are also a lot of additional constraints to deal with with each new nodes. It's a real complexity explosion, and this is what makes designs on advanced nodes so much more expensive. This a highly simplified description but hopefully enough to get a feeling for what's going on.

adapteva · on May 2, 2016

Simply not true as a general statement. There is a completely automated path from Verilog to GDS through modern EDA tools. While the back end complexity has increased, the folks at Synopsys, Cadence, Mentor have solved all of the problems for the designer (IR drop, cross talk, DFM, OCV, etc). Good designs have fully automatic flows with a turnaround time from RTL to tapeout of less than 24hrs.

KSS42 · on May 2, 2016

In theory but not in practice. 24hr RTL to GDS has been the promise, but the design and process complexities keep growing.

Depending on the complexity of the chip, the backend process takes several months. It's an iterative process. A reasonable complex chip has to be split into several partitions. The size of the partition determines the turn around. In the designs that I have worked on, we try to limit the turnaround time (RTL->GDS) to be one week for the larger partitions.

I would change your statement to say that the EDA vendors have created tools that allow Physical Design engineers to address the design challenges. It is still a gruelling, iterative, painstaking process.

adapteva · on May 2, 2016

Agree with you. Taping out chips is still like going to the dentist for a root canal. Still, we may have to disagree regarding the absolute numbers. For every one of our chips so far, we made RTL changes less than 24 hours before tapeout. (clearly this means our basic blocks are very small and we don't have a lot of them).

petra · on May 1, 2016

Is it just a matter of time and investments until NRE costs will go down ? and how far can that go ?

Animats · on May 1, 2016

Smaller line size doesn't mean more compute power any more. Clocks maxed out around 3GHz some years ago. For CPUs, getting rid of the heat has become the limiting factor. (Yes, with aggressive cooling you can go faster, but most systems don't go there.) Electromigration is starting to limit component lifespans.

But that's for CPUs. Memory, especially mostly-inactive memory such as flash devices, still has a few iterations ahead. Memory devices can tolerate and recover from bad cells and bad rows. There's also the option of going 3D and stacking memory. That doesn't work for CPUs because getting the heat out of the middle is very tough.

The other big problem is cost. Fabs have become multi-billion dollar projects. The article says that designing an SOIC for 7mm costs upwards of $200 million. (Why? Design tools? Mask making?) If cost per gate declines, we can still get more compute power by making bigger parts with more CPUs. Single CPU performance probably isn't going to climb much more, though.

Fortunately, most of the things we want to do in machine learning and AI can be done in parallel. This problem isn't going to keep us from getting to AI.

aexaey · on May 1, 2016

Wait, did I read this right - Intel is going to be the last one to get 10nm?

> Samsung, for one, plans to ship its 10nm finFET technology by year’s end.

> TSMC will move into 10nm production in early 2017

> Intel will move into 10nm production by mid-2017

hyperpallium · on May 1, 2016

> It’s now becoming clear that what other companies call a ‘10nm’ technology will not be as dense as Intel’s 10nm technology.

slashdev · on May 1, 2016

Yes, that's what a lay reading gets you - and why I was disappointed with the article. Is the author just clueless or are they trying to make one of Intel's competitors look better? Everyone who follows these things knows that the other guy's 14nm node is more comparable to Intel's 22nm node. Apparently a nanometer is not always a nanometer...

petra · on May 1, 2016

I think this journal aims to professional who know the details about intel/others process nodes, so he doesn't mention that specifically.

i73668012 · on May 1, 2016

The only thing you should be disappointed with is your reading comprehension. There are three paragraphs dedicated to your complaint.

>“Not all 10nm technologies are the same,” said Mark Bohr, a senior fellow and director of process architecture and integration at Intel. “It’s now becoming clear that what other companies call a ‘10nm’ technology will not be as dense as Intel’s 10nm technology. We expect that what others call ‘7nm’ will be close to Intel’s 10nm technology for density.”

>It wasn’t always like that. Traditionally, chipmakers scaled the key transistor specs by 0.7X at each node. This, in turn, roughly doubles the transistor density at each node.

>Intel continues to follow this formula. At 16nm/14nm, though, others deviated from the equation from a density standpoint. For example, foundry vendors introduced finFETs at 16nm/14nm, but it incorporated a 20nm interconnect scheme.

zaroth · on May 1, 2016

Would upvote but for your first sentence.

slashdev · on May 1, 2016

Missed that, I withdraw my complaint. The snark is not necessary though, not everyone has time to read the entire article, I read the top and skimmed the rest.

testrun · on May 2, 2016

And you start your comment with:

Yes, that's what a lay reading gets you - and why I was disappointed with the article.

martinald · on May 2, 2016

I think people are really underestimating how poor CPU performance improvement has been recently; it's been totally masked by I/O (SSD) and RAM improvements.

Before SSD it was common to see a CPU wasting >50% of its time waiting for IO. Now you rarely see iowait being a problem.

abecedarius · on May 1, 2016

> For a 10nm chip, it takes $120 million for the design cost, plus 60% for the embedded software.

I don't understand: 60% of what? What embedded software?

adapteva · on May 2, 2016

Put simply "stuff". Take a look at Fred Brooks, mythical man month if you haven't ready it already. The projects that cost $250M will have over 1,000 engineers involved. Running a tightly coupled engineering project of that size is impossible in my opinion, thus the cost. Figure that 50% of the budget goes to communication overhead, 25% to management overhead, and you are left with 25% of true engineering costs.

Lind5 · on May 1, 2016

Pain Points at 7nm also an interesting read http://semiengineering.com/pain-points-at-7nm/

cft · on May 1, 2016

One interesting consequence is that this will probably impose a ceiling on Google's growth.

MichaelBurge · on May 1, 2016

> Then, there is the cost-driven decision. This is for foundry customers who may have lower volume products. They may not recoup their investment at 10nm. So, it makes more sense for them to skip 10nm and move to 7nm.

Why is it cheaper to design a 7nm chip for a given performance than to design a 10nm chip? The article mentions the yields on newer processes are lower, so I would've expected them to be the performance option not the cost option.

quickaccount · on May 1, 2016

It isn't. But if you move straight to 7nm you only need to recoup the costs for 7nm, not 10nm + 7nm.

petra · on May 1, 2016

Not an expert, but even if it was only a recompile, the development tools and fabrication masks are a decent share of the development costs, tens of millions.

dboreham · on May 1, 2016

Cost is often driven by #die/wafer, which in turn is driven by die size which in turn is (often) driven by feature size.

freekh · on May 1, 2016

I wonder if this will create create a market asics and FPGA when the lasts drops of perf has been squeezed out of the cpus (and GPU)?

zanny · on May 1, 2016

Once we hit the wall of general CPU performance we will probably retrace a lot of old school custom hardware design. Expect a lot of the parts integrated into a CPU today to move back out, but simultaneously we will have to start shipping more sophisticated cooling solutions for denser CPU chips (since the main reason bridges were integrated were because dense cores were getting too hot for copper / aluminum fin heat transfer like most heatsinks).

It will take a really long time to actually squeeze out CPU optimization. We are still contending with most software not even properly scaling to four cores, let alone well implemented job queue systems for arbitrarily scaled cores. And once the chips start dropping in price as the node standardizes and companies can lay off the intense R&D investments, we can start introducing more heterogeneous compute clusters to drive performance cheaper even when the silicon itself is stuck until graphene takes the market.

amirhirsch · on May 1, 2016

"Moore's law is dead" is mostly false every time it's written. I would expect fixed costs to drop for 7nm so that more designs can go ASIC and benefit from lower cost per transistor and higher performance. Seems like even if we were locked at 7nm, decreasing NRE for custom SoCs will yield performance improvements for a long time coming.

monocasa · on May 1, 2016

I mean, that's like saying it's not really a car accident because even though the front of the car is crushing into the divider wall, the rear of the car is moving at almost the same speed.

_7sdg · on May 2, 2016

Anyone else getting a 404?

nuand · on May 2, 2016

Yea, I am too