Ask HN: How did you become a hardcore back-end developer?

SpikeGronim · on March 5, 2011

I'll share my experience, which may differ from other people's. The largest system that I've worked on was Amazon S3. At the time that I worked there we were doing 100,000+ requests per second (peak), storing 100+ billion objects (aka files), and growing our stored object count by more than double every year. The most important skills for that job were distributed system theory, managing complexity, and operations. I can't explain all of these skills in depth, but I will try to give you enough pointers to learn on your own.

For distributed systems there are two main things to learn from: good papers and good deployed systems. A researcher named Leslie Lamport invented a number of key ideas such as Lamport timestamps and Byzantine failure models. Some other basic ideas include quorums for replicated data storage and the linearizability consistency model. Google has published some good papers about their systems like MapReduce, BigTable, Dapper, and Percolator. Amazon's Dynamo paper was very influential. The Facebook engineering "notes" blog also has good content. Netflix has been blogging about their move to AWS.
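A minimal sketch of the quorum idea mentioned above, assuming a toy in-memory register with N replicas where each write goes to W of them and each read consults R. The class and method names are made up for illustration, and the single shared version counter is a simplifying assumption; a real system also has to deal with concurrent writers, node failures, and repair.

```python
import random


class QuorumStore:
    """Toy replicated register: N replicas, write to W, read from R."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need W + R > N so read and write quorums overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [(0, None)] * n  # (version, value) held by each replica
        self.version = 0  # assumption: one writer hands out version numbers

    def write(self, value):
        # Store the new version on any W replicas.
        self.version += 1
        for i in random.sample(range(self.n), self.w):
            self.replicas[i] = (self.version, value)

    def read(self):
        # Ask any R replicas and keep the answer with the highest version.
        picked = random.sample(range(self.n), self.r)
        return max((self.replicas[i] for i in picked), key=lambda p: p[0])[1]


store = QuorumStore(n=5, w=3, r=3)
store.write("a")
store.write("b")
print(store.read())  # every read set shares a replica with the last write set
```

Because W + R > N, any R replicas must include at least one that saw the latest write, which is why the read can trust the highest version it finds.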

Every software engineer needs to manage complexity, but there are some kinds of complexity that only show up in big systems. First, your system's modules will be running on many different machines. The most important advice I can give is to have your modules separated by very simple APIs. Joshua Bloch has written a great presentation on how to do that. Think about what happens when you do a rolling upgrade of a 1,000 node system. It might take days to complete. All the systems have to interoperate correctly during the upgrade. The fewer, simpler interactions between components the better.

The best advice I know of about operating a big distributed system is this paper[1] by James Hamilton. I won't repeat its contents, but I can tell you that every time that we didn't follow its guidelines we ended up regretting it. The other important thing is to get really good with the Unix command line. You'll need to run ad-hoc commands on many machines, slice and dice log files, etc.

How did I learn these skills? The usual mix of how people learn anything - independent study, school, and building both experimental and production systems.

1. http://www.usenix.org/event/lisa07/tech/full_papers/hamilton...

tptacek · on March 5, 2011

Logical timestamps are an extremely simple idea that knocked me on my ass when I first worked them into a system. Also a great thing to look up to get a "flavor" of how distributed systems work.

I feel like if you walk into a job interview knowing the corner-cases of a two-phase commit and being able to solve a problem using Lamport timestamps, you're probably in the top 90th percentile of dev applicants.
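For a flavor of the logical timestamps discussed above, here is a small sketch of a Lamport clock. The class and method names are illustrative assumptions; the rules themselves (increment on every local event, and on receive jump past the timestamp carried by the message) are the standard ones.

```python
class LamportClock:
    """Logical clock: counts events, not seconds."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Any local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past both our own clock and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()           # a is at 1
stamp = a.send()   # a is at 2, message carries 2
b.receive(stamp)   # b jumps from 0 to 3
```

The point is that if one event could have caused another, its timestamp is smaller, even though no machine ever consulted a wall clock.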

palish · on March 5, 2011

90th percentile?

Maybe ~99th.

The author has been developing software for 20 years. He is likely a fine applicant for a significant number of software dev positions, since he can learn and apply many different technologies very quickly, from what he has said. And also, from what he has said... he doesn't come close to what you described.

I've been developing software professionally for five years. I've been programming C/C++ for 10. (I'm 23; my passion has been for gamedev.) And I don't come close to what you described.

If I devoted myself to learning what you just described, I could probably achieve a thorough understanding (deep knowledge, an important distinction from superficial knowledge) inside a month. But at the end of that, it seems doubtful I'd be much closer to accomplishing the author's stated goal... I would only know two essentially random cornercases.

All of that said, thank you (and SpikeGronim) for mentioning Lamport timestamps; time to go a-wikipedia'n.

http://en.wikipedia.org/wiki/Lamport_timestamps

SpikeGronim · on March 5, 2011

Maybe tptacek just has a really great applicant pool ;). You also have to account for specialization. I have no clue how to do optimized gamedev.

If you want more Lamport goodies: "Paxos Made Simple" (distributed transactions done right): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69....

"The Byzantine Generals Problem" (harshest failure model known and how to cope with it): http://research.microsoft.com/en-us/um/people/lamport/pubs/p...

An interview where he talks about his approach to systems, particularly formal reasoning and specification: http://www.budiu.info/blog/2007/05/03/an-interview-with-lesl...

His publication list - http://research.microsoft.com/en-us/um/people/lamport/pubs/p...

tptacek · on March 5, 2011

Being able to implement a working 2PC makes you a distributed systems programmer. It's not a piece of trivia.

It won't take you a month to learn these things, but with commit protocols in particular, you have to implement and test them to really grok them.
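As a starting point for that kind of experiment, here is a deliberately naive two-phase commit sketch. Everything in it (the `Participant` class, the `will_vote_yes` flag, the state strings) is a hypothetical in-process stand-in; it leaves out timeouts, durable logging, and coordinator crashes, which is exactly where the corner cases live.

```python
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes  # stand-in for "can I commit?"
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A yes vote is a promise to commit if told to.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: tell everyone the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


nodes = [Participant("a"), Participant("b", will_vote_yes=False)]
print(two_phase_commit(nodes), [p.state for p in nodes])
```

A useful exercise is to ask what each participant should do if the coordinator disappears between the two phases; the sketch above has no answer, and a working implementation needs one.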

SpikeGronim · on March 5, 2011

One follow up re: how I personally learned distributed systems. This was a great course, and the lecture notes are public: http://www.andrew.cmu.edu/course/15-440-sp11/index/lecture_i... .

code_duck · on March 5, 2011

> The usual mix of how people learn anything - independent study, school, and building both experimental and production systems.

Exactly, that along with being in the company of people who know more than you. Studying and experience. One learns things like this incrementally, like anything else, and nothing is better than a good teacher.

Zeu5 · on March 5, 2011

Hi Spike,

I am asking a question on continuous deployment here. http://news.ycombinator.com/item?id=2288382

Was wondering if you can spare some advice for me?

Thank you.

SpikeGronim · on March 5, 2011

Here you go. http://news.ycombinator.com/item?id=2290675

YuriNiyazov · on March 5, 2011

hi spike!

SpikeGronim · on March 5, 2011

Hey Yuri long time no see. Facebook me if you're ever in Seattle.

jerf · on March 5, 2011

1. Find bottleneck.

2. Remove bottleneck.

3. Repeat.

4. Every once in a while, make a bold move to throw something out that can no longer work that way and replace it with something more scalable. But while this is important, it comes up less often than you might think.

The difference is that you spend a lot more time in that loop than a desktop dev, but if you understand programming it isn't a special black art until the very, very top end.

The other thing to get is that it's always about buying time rather than solving the problem forever. The goal is to have bought enough time that you don't have to be stuck in a local optimum or make panicked decisions.
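One rough way to do step 1 of the loop above is to time each stage of a request and look at which one dominates. This is only a sketch: the stage names and the `timed` / `bottleneck` helpers are invented for illustration, and a real system would feed these numbers into proper metrics rather than a module-level dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # seconds spent per stage, summed across calls


@contextmanager
def timed(stage):
    # Accumulate wall-clock time for whatever runs inside the with-block.
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start


def bottleneck(stage_totals):
    # The stage with the largest accumulated time is the first suspect.
    return max(stage_totals, key=stage_totals.get)


with timed("parse"):
    sum(range(1000))
with timed("storage"):
    time.sleep(0.05)  # stand-in for a slow disk or network call

print(bottleneck(totals))
```

Once the slowest stage is fixed, the same measurement points at the next one, which is the "repeat" step.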

ltbarcly3 · on March 5, 2011

This is a good point, to which I would add:

Architect your system so that you have visibility into where it is breaking, and so that no piece has more than one simple job. Otherwise you will end up spending all your time trying to figure out where the bottlenecks are, and every bug will take a day or more to track down. Any part of your system that is complex will basically not be fixable, since nobody will know what the consequences of any change actually are until it breaks something else, which will then take another day to fix. And yes, this logic does lead to an infinite chain of days fixing bugs caused the previous day.

petervandijck · on March 6, 2011

Small pieces, loosely connected.

peterbotond · on March 7, 2011

... and have at least one other os than prod and just fiddle with it, fiddle with compiler switches, optimisation random testing. occasionally look at the results and see what they mean. for example, freebsd's jails can help setup many different things. use llvm/clang and look at the warnings. these help me find potential problems. what's better than the clairvoyant code-improver/bug-fixer. :-)

ismarc · on March 5, 2011

Ride on others' coattails, stand on others' shoulders. It's not that it's any harder, it's that the skills used day to day are different. The single skill I picked up that served me best was being able to reason about what complex, highly concurrent code was doing and the performance implications of it. And I got this by reading code, and not just little programs, but things like the UDP packet handling in the Linux kernel, or the storage and firewall rule insertion mechanisms for iptables.

But, nothing beats working directly with geniuses. Earlier this year I made a change (at my last company) that increased the number of simultaneous users by well over an order of magnitude. The change was known and had been tried by others in the group, but was deemed infeasible. I didn't come up with the magic change needed, I found how to apply it. And what I learned in the process is applicable outside of that. Without working directly solving the problems, it's hard to learn how.

CyberFonic · on March 4, 2011

For me the path to heavy-duty back-ends was Unix and C. Most of the work for large corporations, in addition to the mainframes, involves big systems; IBM: AIX, HP: HP-UX, Sun: Solaris. Helps to know a bit about storage: EMC, Hitachi, NetApp, etc. And of course databases: DB2, Oracle.

The best news is that these days, you can build up these skills using a $1k box with Linux or BSD. Years ago, you needed to get a job first because systems were in the order of $millions and they wouldn't fit in your average spare room.

You'll also need to demonstrate some CS/SE chops, because mucking up a big back-end system is not like a web page that occasionally crashes; it can cost $10k's per hour while it's down.

davidhollander · on March 5, 2011

I would start by viewing it as tree structure optimization problem. Draw a tree where each node is a physical server and the root node is the domain name server. Now try to maximize throughput of random lookups while minimizing height (complexity). For each level of the tree, come up with a list of everything you can think of that might affect the traversal (processing\lookup) time when a node (server) in that level is entered. Also create a list of everything you can think of that might affect the lines (connections) between nodes. This exercise should give you a good idea of what you need to learn and help generate more specific questions.

ltbarcly3 · on March 5, 2011

There are maybe 20 people in the world who 'know' how to scale a website up to millions of users. There are lots of teams of hundreds of people who actually do it.

Don't get worried that you won't be able to go in and run the show on the first day. There isn't any secret sauce, and sites that scale to this level are so rare that they probably each have their own arcane and complex way of doing it that has evolved over years of people trying different approaches and failing.

Anywhere that is worth working isn't looking for someone who knows how to scale a website to millions of users, they are looking for smart people who can contribute. Their development budget is probably in the millions of dollars per year, they will be more than happy if you can help.

TLDR; Nobody is going to write a book on this, since only 500 people in the world would benefit from reading it. There is no single answer.

To address the specifics of what you are asking, there is basically a balancing act of consistency vs performance. You need to find the exact balance that is 'good enough' for every problem. The oft quoted 'there are two hard problems in CS, cache invalidation and naming things' pretty much sums it up.

mathgladiator · on March 5, 2011

The simplest way is to just do it.

You are fortunate that you live in the age of cloud computing. For instance, you can spend $10 for a day and get access to more compute resources than most people could hope for after months of budget proposals.

Find a problem, solve it, launch it, test it, find bottleneck, kill it. Repeat this enough times and you can start to get a feel for where bottlenecks will happen and how fail happens.

chintan · on March 5, 2011

I second this.

I wrote a distributed crawler from scratch with 10 EC2 machines. It was one of the best learning experiences ever!

fingerprinter · on March 5, 2011

I was mostly a web guy, riding the internet from '94 until about '06, when I started to get into more serious stuff...up until that point it was C, Perl, Java, etc., but it was mostly pushing business data around, which is what I think 90% of all commercial programming is these days (so don't knock it...it pays the bills).

In '06 I joined a startup and we needed to scale. I hadn't had experience with this stuff and neither did most people on my team...so here is what we did.

* Try new things, but basically find out what most people are doing that have already gone down this path (stand on shoulders of giants, as someone mentioned)

* Read, read, more reading...talking to other devs...network...DO NOT REINVENT SOMETHING (I also call this the Kiss of Death). Unless you are Google, Amazon or Facebook, use off the shelf if you can.

* Use technologies that will work for your problem. We chose Erlang for ours b/c of what we were doing. Something like Java would have worked, but would have made the job 10x harder. C would have been ideal, but we would have had to reinvent nearly all of Erlang, so we just chose Erlang.

* LEARN about things like good architecture design, SOA and failure (when a system goes down, what happens...).

* Invest in a good test suite or test infrastructure, but realize that it will be nearly impossible to test at scale.

During that time I felt like I was constantly reading every paper and blog I could find on scaling and back-end systems and talking to every dev who had ever done it. It was work, but not the type normally associated w/ dev....but it was 100% worth it.

diego · on March 5, 2011

I started writing my story but it became too long so I posted it here.

http://dbasch.posterous.com/how-did-you-become-a-hardcore-ba...

Tl;dr: in 1998 I created an mp3 search engine that got significant traffic, had to learn on the fly, ended up going to Inktomi where I joined a team tackling much bigger problems. We all learned a lot over the next four years.

jsarch · on March 5, 2011

@andywood,

Can you take a moment tomorrow and add an edit to your post giving a summary of whether you felt the comments answered your questions?

I ask simply because my first read of your post focused on "How do I get there?" and not "what was your path?" As such, I was surprised to be reading life stories of fellow HN'ers. Since we all absorb info differently, I'm curious to know if the stories helped and what you gleaned from them.

All the best in your endeavor. -- A fellow large-scale enthusiast.

andywood · on March 5, 2011

For some reason, I don't have an edit link for this post anymore, but I can answer this right now. All of these responses are exactly what I was looking for, and then some. As far as the phrasing, I'm equally interested in direct advice like "read this paper", and personal stories. I've always been able to intuit how to go about learning any given topic in computing, whether languages, game programming, HTTP, Win32, or what have you. I don't know exactly why this subject in particular seems more esoteric to me - probably a product of my background - but it does. I wanted to know how others learned. Before this thread, my best answer would have been "Get a job at Amazon or Google as a front-end dev, and try to work my way into the back end." Now I have papers to read, algorithms to learn, topics to explore, and ideas about setting up a toy environment for learning. So yes, all of the answers are definitely helping, and I hope a few more people will add their stories. A big thank you to everyone.

petervandijck · on March 6, 2011

To boost your resume, you could work on some of the large-scale open source systems (nosql etc.) That'll look good, and get you some good experience too.

You can run 1000 servers for an hour on Amazon for fairly cheap. If you use that to do some testing/benchmarks etc. of popular nosql systems, for example, and then write about that, you can create some notoriety in the big-systems world fairly fast.

Good luck!

Pahalial · on March 5, 2011

When you discount "learn on the job" and "read books", i'm really not sure what's left, or what you expect the people who have achieved success by doing these things to tell you (while omitting those things.)

andywood · on March 5, 2011

My intention was not to discount them at all. In fact, I've learned everything I know about web development on the job. I just didn't want the existence of obvious answers to deter anybody from sharing the details of their individual experiences. And all of the responses so far have been exactly the kinds of things I'm looking for.

ochekurishvili · on March 8, 2011

By mastering technologies I didn't know.

known · on March 5, 2011

Try developing using http://www.stoneridgetechnology.com/products/pci-e-developme...