Software Development Estimates: Where Do I Start? (diegobasch.com)
143 points by clarkm on Sept 15, 2013 | 98 comments



An analogy I like to use: "Here's a book of sudoku / crossword puzzles. How long would it take you to solve all of them?"

Estimating this is similar to estimating a software system. You've seen puzzles like them before, but you don't know the difficulty level of each or which ones have tricky clues that cause you to rack your brain trying to find the solution. It also nicely illustrates the 10x effect: some puzzle solvers breeze through them while others take forever.

If you could break a software system down into a rigorous formal specification that could be precisely estimated, it would be possible to use that specification to build the system automatically. But that just shifts the estimation one rung up the ladder of abstraction: now you're estimating the time to create the rigorous formal specification.


Suppose you're asked to estimate a single sudoku problem. That's basically impossible, because n=1. In small samples from a large population, variation is enormous.

But what if I ask you to estimate the time for 100 sudoku problems? For 1000? What if, after 2 years of solving sudoku problems full time I ask you for your estimate? What if I develop simple statistical models based on the number of squares that are already filled in?

Already it has become possible to improve your estimate, to make it more accurate. The value of an estimate is balanced on the line between the cost of its uncertainty and the cost of its development. Over time, for any class of problem, the cost of developing estimates falls.
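
To make that last point concrete: here's a minimal sketch of such a "simple statistical model" -- ordinary least squares over made-up historical data, numpy assumed -- predicting solve time from the number of pre-filled squares.

    import numpy as np

    # Hypothetical history: (pre-filled squares, minutes to solve)
    filled = np.array([36, 32, 30, 28, 26, 24, 23, 22])
    minutes = np.array([4, 6, 8, 11, 15, 22, 27, 35])

    # Least-squares line: minutes ~ slope * filled + intercept
    slope, intercept = np.polyfit(filled, minutes, 1)

    def predict(f):
        return slope * f + intercept

    # Crude error band from the residuals of the fit
    band = (minutes - predict(filled)).std()

    f = 25
    print(f"~{predict(f):.0f} min, +/- {band:.0f} min for a puzzle with {f} clues")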


> What if I develop simple statistical models based on the number of squares that are already filled in?

What if you were unable to? NP-hard problems are weird like that. What if the underlying distribution had enormous variance, or some other strange feature that made traditional statistical models useless?

Maybe the issue is that we're trying to use intuitions developed with everyday problems that are essentially linear ("mow the lawn", "drive to the store") in places where they're not appropriate. Unlike usual everyday tasks, NP-hard problems (such as Sudoku or coding) have no known "royal road" - sometimes you just have to try all possible solutions.


NP-hard problems often have "good enough" solutions that don't take heat-death time to find. Consider the TSP. You can wait until the last proton decays, tying up all the matter and energy in the universe, to find the provably optimal tour.

Or you can use an ant colony optimiser and accept that you will probably only come to within a few percent of the best possible solution. Oh noes!

And so it is with estimation. We can build an ever-more-elaborate estimate, each refinement shaving a fraction of a percentage point off its imprecision. Or we can accept that they are estimates, that they are meant to be imprecise, and that accuracy is a range, not a number.
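
To make the "good enough" point concrete, here's a minimal sketch -- a greedy nearest-neighbour tour rather than an ant colony, but the same trade-off: a serviceable route in milliseconds instead of a provably optimal one in aeons. The cities are made-up random points.

    import math, random

    random.seed(42)
    cities = [(random.random(), random.random()) for _ in range(200)]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Greedy nearest-neighbour: always hop to the closest unvisited city.
    unvisited = set(range(1, len(cities)))
    tour, current = [0], 0
    while unvisited:
        nxt = min(unvisited, key=lambda i: dist(cities[current], cities[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
        current = nxt

    length = sum(dist(cities[tour[i]], cities[tour[i + 1]])
                 for i in range(len(tour) - 1))
    print(f"Tour length over {len(cities)} cities: {length:.2f}")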


Definitely true that estimates can be improved from some hypothetical floor of quality.

Estimates still come with a cost in themselves though. They're not an end, merely a means. So we should consider alternatives.

An alternative approach I've heard of from some teams, though I haven't had the pleasure of trying it myself, is to break your work into similarly sized small chunks and measure team speed over time in terms of those chunks. This sounds a lot like the relative, point-based estimation often seen on agile teams. However, the difference is that once you establish this sizing rhythm during analysis (with the help of estimation early on to home in on sizing), you skip estimation altogether.

I suspect this needs to be calibrated from time to time ("our stories have gotten too big") and exceptions need to be made (e.g. "there's just no way to split this work up and keep it releasable"), but those scenarios are exceptions, not the norm.

A world without estimates sounds great to me.


> However, the difference is that once you establish this sizing rhythm during analysis (with the help of estimation early on to home in on sizing), you skip estimation altogether.

I'd argue that really what you're doing now is estimating the size of any given chunk. Chunks that are estimated to be too big are then decomposed.

But you still make some judgement about how big each chunk is before you actually queue it up.

> A world without estimates sounds great to me.

Most people estimate, they just don't think of it as estimating. Similarly, most people can instinctively perform differential calculus; they just think of it as, for example, "catching a ball".

I think that you're right that estimates are never ends in themselves and that their net value needs to be considered. In a sense, you perform a meta-estimate of an estimate's cost and return before doing one. Is it worth developing parametric models of when you will be home for dinner? Probably not; dinner will be cold and moldy by the time you get to it.


"What if I develop simple statistical models based on the number of squares that are already filled in?"

Better to just have the experience to look at a Sudoku and have a good feeling for how difficult it is - it's not just the number of filled in squares but their locations and their values.

After solving thousands of Sudoku problems, experience will allow one to see the relevant patterns, or rather to have a feel for them, when looking at a particular Sudoku. The process is sped up if one starts off estimating Sudoku solution times from day one and then compares those estimates to actual solution times.
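As a sketch of that feedback loop, with made-up numbers: log each estimate against the actual, and apply the resulting bias factor to future gut feels.

    import math

    # Hypothetical log of (estimated minutes, actual minutes) per puzzle
    history = [(10, 18), (12, 19), (15, 21), (20, 26), (8, 14)]

    # Geometric mean of actual/estimate ratios = personal bias factor
    bias = math.exp(sum(math.log(a / e) for e, a in history) / len(history))

    print(f"You average {bias:.2f}x your gut estimate")
    print(f"Gut says 10 min -> calibrated estimate ~{10 * bias:.0f} min")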

A longer way to developing meaningful estimates (and I prefer meaningful over accurate, since a good estimate can still be very wrong), is to avoid estimating one's time as an end in itself.


> A longer way to developing meaningful estimates (and I prefer meaningful over accurate, since a good estimate can still be very wrong), is to avoid estimating one's time as an end in itself.

Could you elaborate? What would one estimate on a book of sudoku problems?


Who is the audience for the estimate?

An estimate of how long it will take me to solve 1000 Sudoku for recreation will not have much meaning to a client who just wants the solution.

What are the resources that will be brought to the task?

An estimate for solving 1000 Sudoku with a staff of professional programmers will be different from one premised upon bribing a dining hall full of seniors with coffee, juice and donuts down at the retirement village.

What is the unit of estimation?

Calendar days until delivery is different from man-hours because I spend so much time on HN.

The value of an estimate is the integrity and relevance of the process by which it was generated, not its accuracy. Accuracy is a fortunate side effect - and then only sometimes. The key is that accuracy correlates with experience at making estimates, and rationalizing the failure to go through the process because past estimates have been inaccurate is the long road I mentioned.


Good analogy. But I would add "this book and the next one that I create" and "you can have help from whoever happens to be with you".

In the best case there is still a little "unknown unknown" about both the scope and team. The project manager can tighten things but you can never completely get rid of it.


Most people would agree that practice makes perfect, right? I think a big issue with software estimation is that it is hard to get practice. I've done full project estimates for 4-6 projects in 3 years of working professionally. Compared to the amount of practice I have writing code, this is nothing. How can I ever hope to get good at something if I only do it once every 3-6 months and it might take 2 years to get feedback on whether my final number was close to the actual cost? And unlike other skills, I can't practice on my own or take a course. (I've never seen an open source project that did round-trip estimates; maybe that's something to try out...)

Another issue is when you have an "indirect estimate" - basically I will estimate the work, but someone else is the one that ends up doing the project. If you aren't careful to consider who will be doing the work, you might estimate too low (if you are an expert and the work is done by a bunch of new hires).

And none of this even touches on misinterpreting client demands, scope creep, dev team turn-over, or often neglected timesinks like documentation and meetings.


Perhaps you could try scaling estimation down, instead of up.

For example, in the SEI's Personal Software Process you estimate very frequently. Like agile, it works by attacking big problems in lots of small pieces. So instead of just doing the 4-6 whole-of-project estimates, you can also do estimates for stories, estimates for modules and so on.

The other thing to consider is re-estimating as you go. The Cone of Uncertainty demonstrates that as a project progresses, uncertainty about the problem and solution domains diminishes and so estimates can be made with tighter ranges.


> Most people would agree that practice makes perfect, right?

Probably, if you're building the same software system, over and over. If you're building a different one, no I shouldn't think it does.


My main trick these days for this problem is to discourage estimation by pointing out the costs of it and putting the burden on the requesters.

One great way to do that is to release early and often, allowing stakeholders to change plans in response to what they've learned. Instead of doing the work of (re-)estimating a bunch of stuff every week, their focus on what's actually going on lets them stop obsessing about arbitrary dates and semi-fictional plans.

People very rarely need estimates. They want a sense of control. They want to manage risk. They want to convince people that money is being well spent. They want to avoid looking like fools. If you solve those problems without using estimates, people generally stop caring.


In the enterprise I work in, people need estimates in order to manage resourcing - which is not something unique just to the software industry.

We always have a prioritised list of projects but only a limited number of developers, so for us estimation lets us know when a developer is available to work on the next project.

Estimation is important and before any project is accepted we go through maybe a day or two working out the main application structure, breaking it down and providing rough, high-level efforts. Then we factor in the usual holidays, risk buffer (illness, estimation uncertainty), rigging, testing, release etc. Once the project's accepted then a more accurate plan is delivered after we have a design and task list ready.

The main thing I've learnt is people are far more willing to accept over-estimated plans at the beginning rather than extend a project that's run overtime.


Resourcing and estimating against plans is one approach, but it's not the only one. You can also use a flow-based approach. Basically, you have teams with queues; whenever a team is out of work, they pop something from the queue and work on it until complete. Or, if the organization is more subtle, you release early and often, ending a project when it looks like the team can be more usefully deployed elsewhere.

Mary and Tom Poppendieck cover this approach a fair bit in their books.


I agree that this is a good alternative; I certainly prefer it.

I prefer to think of it as a derived factory. It used to be that we thought of software projects as being like a factory, which eventually turned out the end product: a single piece of software.

Now we've taken the first derivative in a calculus sense and realised that the actual output of this conceptual factory is changes to software. The manufacturing analogy makes more sense in that respect; though concepts like SPC are hard to map because normal variation of software production can be very high indeed.


I agree that estimates are largely about a [false] sense of control, but this is a tough sell when dealing with any company with budgets and stakeholders that have to go ask their boss for a finite number of dollars to trade for software.

Often times, this is a one-time ask. You get money allocated once, so you better get enough to build the whole thing. You can't go back and say, "hey boss! look what we got with the first $10k, can I have $10k more?"

If you are a startup or dealing with a 4-figure project from a local business, then sure, you can probably skip estimates.


Sure, but that's an artifact of traditional budgeting. In the Lean world (as in Lean Manufacturing), traditional budgeting is considered a dangerous and misleading activity.

Another way to work it is to have a backlog of valuable things to do that get handed out to teams. When somebody needs something to do, you pick the highest-ROI project from the backlog and give it to them.

In that context, detailed specs and careful estimates generally turn out to be wasteful. If you're trying to pick projects by expected ROI, then making the software estimate more precise than the (generally handwavey) business value estimate is pointless.


> you pick the highest-ROI project from the backlog

How do you calculate the return on investment without estimating the investment?


Well, one option is time-boxing. You say, "We'll put a team on this for up to two months: they get done what they get done." It's a fixed investment; it's up to the business stakeholders to judge what kind of return they can get out of that.

But in this context, by "estimation" I mean "formal estimation". Having developers glance at something and ballpark it is basically a zero-cost activity. As long as that I number has about the same error bars as your R number, it's perfectly fine for ROI. Especially given typical requirements volatility, and a project approach that sharpens ROI estimates as you go.


You don't have to know the exact ROI of each project; just compare their relative sizes to select the best from the set of projects with the highest profit ranking.

This assumes, though, that the best of the projects is worth building in the first place.


How do you know their relative size? Isn't that an estimate?


> Often times, this is a one-time ask. You get money allocated once, so you better get enough to build the whole thing. You can't go back and say, "hey boss! look what we got with the first $10k, can I have $10k more?"

This is an area where software engineers who've made serious profits and turned around to become investors have an edge over traditional investors who do not understand the levels of risk involved with large software projects.

Explaining the granularity of the types of risk involved is difficult with people who expect to write a check once and have a finished product come out of it.


That does depend upon your circumstances, however.

Often there are hard dates that can't be moved - because they are external to your organisation; the autumn TV schedule, or launching at a particular conference are two that have hit me in the past.

Of course that does make the estimation a slightly different process - not "when will this be done" but "can you be ready before this date?"


Sure. But instead of estimation, one can take a risk-management approach to that.

For example, I'll get people to talk about everything they want. I'll put each thing on an index card. I'll have them lay the index cards out in order of importance (or ROI) on a big table. Then I'll ask: "Draw a line where, if you didn't get that far, you'd just bow out of the conference."

Typically, that line is pretty early, 10-30% of what people would really like to have by the date. So then you take the backlog, make sure it isn't too lumpy, and start working. Just by measuring cards/week, you can pretty quickly know whether you'll hit the date. And typically, the team will have something shippable well before the date.

The question then becomes: when is the ROI for the next week's work going to be lower than the ROI for something else we might do? And that question ends up being pretty easy, because you've pushed the decision off until you have a lot more data.


That's one thing I really like about agile: putting work in order of value, not in order of build efficiency.

It's a bit like nature-inspired computation approaches to problem solving. They tend to introduce a lot of overhead that more direct algorithmic solutions won't, but you can terminate them at any point and get something. That flexibility is often very useful.


but "can you be ready before this date?

IMHO the right answer to this is, "Yes, we can be ready, but we might have to scale down some of the features." But in reality, most of the time only the initial "Yes" is heard and the caveats are ignored. As mentioned many times, it is "quality, features, schedule" and one can pick only two.


There's a lot of research in various industries that suggests quality is correlated with schedule. Poor quality -> more rework -> schedule blowout.


It's not only about control and risk management. In external-facing products, other departments (sales, marketing) need time in order to prepare materials, etc. in order to plan their own projects. Telling these groups "it'll be ready when it's ready" doesn't really help them.


Not sure where I first heard this (and it applies more to single developers), but I find this very true: when estimating the time needed to perform something, try imagining how much time ANOTHER person would need. In general we underestimate the amount of work when thinking in first person; we are much more accurate when assessing the ability of others.


Knowing a bit about estimating construction, and a bit more about the estimating of design, which is more akin to software, there's more to the ability to execute according to a timetable than the author gives credit for... and the fact that the process can be separated between design and construction gives a clue.

Mature processes for delivering construction start with a budget and something called a program - an architectural program being a design and implementation independent description of the project's components and the relationships among and between those components.

Then there is the matter of age. Architects aren't worth a shit until they hit about sixty. Those running big designs have decades of experience. In the US the median age for initial licensure is 33. It's not twenty-somethings fresh out of college, or even parents with grade-schoolers, organizing the process.

Then, in the US, there tend to be standard contracts which describe industry standard milestones and acknowledge that nobody really knows how things will change over what is often a multi-year cycle. Buildings are delivered reasonably on time and within budget because the process of contracting for the work doesn't require reinvention - even the plumber's subcontract is a standard form and tied to CSI format specifications which are tied to industry standards and to ANSI materials standards and the building codes.

Nobody rolls their own using the coolest new fad. It's Java, not Haskell. If you read Hammurabi, you'll see the source of those traditions.


The software process can not actually be separated into design and construction. Software is a 100% design activity. Construction is what our compilers and installers do. Projects that appear otherwise have created artificial barriers to prevent certain kinds of design at certain times.

You're also missing that repeatability in software is a sign of waste. If developers are doing things that are truly well-understood and predictable, then they're doing something wrong. They should download a library or extract a framework, automate the predictable parts, and shift their attention to something worthwhile.


As an architect, my business is 100% based on estimating my design time. Within that design time I also estimate construction time.

What allows me to do both is not only my personal 20+ years of relevant experience, but also the collective experience of the industry and design methodologies that reflect that experience. I.e. architectural designs are delivered in moderately well understood stages of increasing concreteness and there is an established language for communicating designs to clients (plans, renderings, project manuals).

Furthermore, repeating the same operations in the design of a building is no less a waste than in the design of software. The same may be said for actually constructing a building. Programmers have no monopoly on the ideal qualities of laziness, impatience and hubris. It's just through experience that I have learned that at some point the nail gun has to come out if the plywood is to go up.

Or the pencil gets put to paper if there's going to be a design - and note that it is not uncommon for good designers in the software world to start on the drawing board not the computer.

And this, I think gets at some of why people have so much trouble estimating software. They start designing the building having already decided on its structural system and hvac layout and light switch locations.

Choosing Django over Rails over ASP.NET before diving into the problem means that the tool has to be made to fit the job rather than delaying that decision.

Now, I'm not suggesting that this is always the case. But to put the difference in collective and individual experience in perspective, one of my mentors [a formal mentor as part of the formal process of becoming licensed as an architect] was Ronn Ginn, an architect known locally but not somebody anyone has heard of. Anyway, Ronn is still practicing. He started practicing before there was FORTRAN.

Think about how many Ask HN's there have been based on the idea that 28 is too old.

Frank Gehry, who you probably have heard of, is about the same age. His first project to receive any sort of attention came when he was 42 in 1971. He was nearly 70 when Bilbao gave him international fame.


One thing my Dad always told me about software estimates: "Take the estimate, double it, and increase the time unit by one. So if they tell you it's a couple of days effort, that's really four weeks..."


I came here to say this. I think there is more than a kernel of truth in it - it helps you remember that effort almost never equals duration.


Use the pi rule. Multiply the estimated duration by pi. Works for many other things too, besides software.


Multiplying by a value between 3 and 4 seems pretty common - the explanation I've heard being:

- One to work out what you should be building ("I have to tell you what I want? I might as well build it myself")

- One to build it

- One to get it working in the required way (allowing for "That may be what I asked for, but it's not what I want")

And a bit more for contingency...

NB The above approach is based on an assumption that the real difficulties come from working out what to build - technical risks are generally easier to address (prototypes etc.).
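
For fun, both rules of thumb above as a sketch (tongue-in-cheek; the unit ladder is my own assumption):

    UNITS = ["hours", "days", "weeks", "months", "years"]

    def dads_rule(value, unit):
        """Double it and bump the time unit up by one."""
        return 2 * value, UNITS[min(UNITS.index(unit) + 1, len(UNITS) - 1)]

    def pi_rule(value, unit):
        """Multiply the estimated duration by pi."""
        return 3.14159 * value, unit

    print(dads_rule(2, "days"))  # -> (4, 'weeks')
    print(pi_rule(2, "days"))    # -> (6.28..., 'days')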


A great book on that is Agile Estimating and Planning by Mike Cohn[1].

That said, there has been a lot of controversy about the value of estimates recently:

Estimation is Evil[2]

Purpose of Estimation[3]

I believe their TL;DR comes from the second one:

"For me, estimation is valuable when it helps you make a significant decision."

[1] http://www.amazon.com/Agile-Estimating-Planning-Mike-Cohn/dp...

[2] http://pragprog.com/magazines/2013-02/estimation-is-evil

[3] http://martinfowler.com/bliki/PurposeOfEstimation.html


> The gist of why estimates are hard: every new piece of software is a machine that has never been built before.

I am tired of the argument that we are unique snowflakes amongst all professions and that we consequently deserve special treatment. We aren't. All professions deal with uncertainty, and most of them deal with it deliberately[1]. Throwing your hands in the air because a perfect prediction is impossible is just silly; an estimate is by definition an uncertain statement about an unknown variable.

Estimating software can be difficult because there are many points at which complexity can be multiplied (McConnell's example of the requirement "validate phone numbers" shows a span of minutes to months)[2]. But when you see very wide ranges on an estimate, it's a signal that the problem is poorly understood.

Outside of genuinely novel research, improving estimation accuracy is possible and valuable. And how often do we invent publishable new algorithms for graph traversal?

I have skin in the estimation game, as I am currently developing a tool for performing estimates[3]. I've also been researching the topic of estimations generally.

I'm mostly amazed at how people in my profession try it once, get an inaccurate result, and then decide -- "That's it! It's impossible! I tried it one time and it didn't work!". If you go into a gymnastics club and you can't do a backflip, that doesn't mean backflips are impossible in principle. It means you can't do it. Estimation is a skill too.

[1] See Petroski's To Engineer is Human, I reviewed it here: http://chester.id.au/2013/07/07/review-to-engineer-is-human-...

[2] In Software Estimation: Demystifying the Black Art.

[3] http://confidest.com/


> I am tired of the argument that we unique snowflakes amongst all professions

Aren't we, though? Zero marginal cost of production is at least very rare. It basically means that to the extent that our projects aren't novel, we're wasting time duplicating something that ideally would have been copied from elsewhere.

And that's not even getting into the very high rate of change in tools, techniques, and materials (e.g., http://www.jcmit.com/mem2013.htm), which, if not completely unique, is certainly unusually high.

> But when you see very wide ranges on an estimate, it's a signal that the problem is poorly understood.

That is occasionally a sign of idiocy. But at least on the projects I see, it's much more often a sign that it's an interesting problem. Indeed, if you're delivering software in a competitive market, it's guaranteed the problem will remain poorly understood, because your competitors will be doing their best to create changes in the landscape.


> Aren't we, though? Zero marginal cost of production is at least very rare.

Every profession can come up with a comparable point of uniqueness. Perhaps I should have said "special snowflakes" instead of "unique snowflakes".

> It basically means that to the extent that our projects aren't novel, then we're wasting time duplicating something that ideally would have been copied from elsewhere.

This is a good argument. It reminds me of DeMarco's argument that as time goes on, software engineering becomes less and less about process, because anything that is repeatable will be automated. All that's left is the hard bits that can't be automated.

There is, however, an enormous amount of duplication in our industry. And even when we are adapting an existing system, we're still only rarely performing acts of genuine novelty. Carpenters produce many items, none of them quite identical, but they aren't inventing new methods of carpentry or entirely different kinds of furniture on every job.

> But at least on the projects I see, it's much more often a sign that it's an interesting problem.

I envy you.


> Every profession can come up with a comparable point of uniqueness.

Not in ways that have such dramatic effects on predictability of costs.

> There is, however, an enormous amount of duplication in our industry.

Sure, but if we're going to put our attention somewhere, I'd rather we work on eliminating that waste, rather than standardizing the waste so that we can better estimate the cost of doing work we didn't really need to do.


Your last remark is true, but a bit falsely dichotomous. The choice isn't between perfect estimates and zero wastage. In practice all projects lie on a continuum between totally novel and pointless duplication. Within that continuum we can draft estimates of varying accuracy and I don't see a reason why we shouldn't.


The amount of time we get is fixed. Estimation is, from the Lean perspective, muda (normally translated as waste). If I can spend that time instead on an activity that reduces the waste of duplication, I'd rather do that.

I should be clear that I'm not totally opposed to doing estimates. I've done a lot of them, I'm good at it, and there are circumstances where I would do it again. Some waste is temporarily necessary.

What I'm pushing back against here is the common assumption (which maybe you don't have) that estimates are generally a good thing to do, or, often, the only way to do things.


> If I can be spending that time instead on an activity that reduces the waste of duplication, I'd rather do that.

Estimates can prevent waste by giving us a better idea of where to redirect effort. Time is, as you say, doled out at a constant rate.

The highest ROI possible for an estimate is to prevent wasteful effort from proceeding.

Timeboxed systems still perform estimation; you just hold different variables constant. In flow-based systems you can perform estimates by taking an integral of the current output.

My point is that estimation is always present, even when we go out of our way to say that it isn't. They lie on a continuum from gut feel to supercomputers and conference papers. Estimates are always present because humans must always reason under conditions of uncertainty; the future is strictly unknowable.

Part of the confusion here is that I think there are two or three definitions of estimate in this thread. I am talking both about the universal case and the methodical estimate-with-an-E case. The universal case is that all decisions leading to action or inaction involve estimation. The methodical case is that estimation for software development is possible and tractable outside of genuine research.
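
To make the flow-based case concrete, "taking an integral of the current output" can be as simple as this (hypothetical throughput numbers):

    # Items completed per week so far (hypothetical)
    weekly_done = [3, 5, 4, 6, 4]
    backlog_remaining = 38

    throughput = sum(weekly_done) / len(weekly_done)  # average items/week
    print(f"~{backlog_remaining / throughput:.1f} weeks at current throughput")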


I think that it is on a spectrum from sysadmin work to software development work. The closer something is to engineering doctrine, the closer it is to systems administration. If company A wants a Linux VM with Apache and some static pages, there is some per-customer work versus company B, but it is very easily estimated. All of the differentiation lies in cost or customer service at that point. And there is a lot of corporate programming that gets close to that, and is more estimatable than other stuff.


> If you go into a gymnastics club and you can't do a backflip, that doesn't mean backflips are impossible in principle. It means you can't do it. Estimation is a skill too.

Cute metaphor, but I'm calling bullshit. Fundamentally, if something is completely new, sometimes nobody knows if it's even possible. Granted, that's almost never the case for software algorithms. But holistically speaking, in real software-centric systems, it happens: particularly when hardware, delivery, external system integration, some form of regulation, unique knowledge individuals, utilities, or any other form of third party are involved.

In the real world, people truck on anyway with relative confidence and frequently re-form an informal estimate along the way. The waterfall model sux. For many software projects, estimates are perhaps tied to waterfallesque models and the idea of that most loathed of middle-managers: the non-technical project manager[1]. Perhaps with the 'agile' trend we're finally shifting beyond that. A valuable insight from a project management perspective might be: "We can't realistically estimate[2] this, or even know if it's possible, but a software person can find that out rapidly without wasting time hiring a project manager to estimate[2], and a backup can be prepared and/or developed in tandem."

[1] http://www.urbandictionary.com/define.php?term=project%20man...

[2] http://www.urbandictionary.com/define.php?term=estimate


The Manhattan Project was authorised because it could be shown that an atomic weapon was possible. Nobody had any idea how to build one or how it would work -- they just knew that a calculation based on a runaway nuclear chain reaction implied a large release of energy.

But how often are we doing a Manhattan Project, where research is a major initial output? Not very often at all. Throwing out estimation for all projects because some of them contain genuine, irreducible novelty is an example of the Nirvana Fallacy.

> But holistically speaking, in real software-centric systems, it happens: particularly when hardware, external system integration, some form of regulation, unique knowledge individuals, or any kind of third party are involved.

These are examples of uncertainty, not novelty. They exist in real non-software-centric systems too. The way you deal with them is to widen the ranges on your early estimates and then look for ways to reduce the uncertainty.

Edit for your edits:

> In the real world, people truck on anyway with relative confidence and frequently reform an informal estimate along the way.

Nothing about having a deliberate estimation process means "we will only do it once" (I suppose that's why you said it's a waterfall thing to do). In fact you should be re-estimating as you go to narrow the cone of uncertainty. One of the nice things about agile methods is that this tends to be built into the overall loop.

> Perhaps with the 'agile' trend we're finally shifting beyond that.

Agile estimation works by frequently re-estimating. Traditional estimation works from size and then derives other measures. Agile holds other measures constant and then takes the integral of current velocity, which is ... size!

Even the word velocity correctly points out that both of these are two sides of the same bit of conceptual calculus.

> without wasting time hiring a project manager to estimate

Where did I suggest this? In software estimating the consensus is that the developers should be the ones who create estimates. They know the most about software development in this particular environment, after all.


All fair enough.

Perhaps what we can take from this constructively is that you could consider making the USP for your estimation tool a focus on uncertainty and overall project risk modelling, more so than straight-up timeframes, which we all know are pie-in-the-sky at the best of times and not really as critical as outright SPOFs/third-party deps.


I'm taking the PERT 3-point estimation technique as my starting point (decompose the project; for each component, give best/likely/worst; roll up the results with a formula). It bakes uncertainty and probabilities directly into the outcomes.

I'd encourage users to look at estimating project size first and then deriving effort, schedule and cost. But I won't constrain them to it.
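
For anyone who hasn't met it, the classical 3-point roll-up looks roughly like this (a minimal sketch; the component names and numbers are made up):

    import math

    # (best, most likely, worst) in days per component -- hypothetical
    components = {
        "auth":      (2, 4, 10),
        "reporting": (5, 8, 20),
        "import":    (1, 2, 6),
    }

    total_mean, total_var = 0.0, 0.0
    for name, (o, m, p) in components.items():
        mean = (o + 4 * m + p) / 6   # classical PERT expected value
        sd = (p - o) / 6             # classical PERT standard deviation
        total_mean += mean
        total_var += sd ** 2         # variances add across independent tasks

    upper = total_mean + 2 * math.sqrt(total_var)
    print(f"Expected {total_mean:.1f} days, ~95% upper bound {upper:.1f} days")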


PERT, eh? The 1950s die hard! It's good to have a model, though. I would still ramp up the big-picture risk side. One benefit is that you could consider tying into an operational risk management system, with onselling potential to client projects post-launch. I am looking right now but finding it hard to locate anything meaningful in this area, probably because it crosses many units of conventional businesses. Also, you may like https://github.com/mdaines/viz.js/, which I recently found: it generates high-quality SVG vectors from graphviz input in pure JS, which you can make clickable/interactive and which will print at any resolution. I'm using it in a cloud infrastructure management console myself.


> PERT, eh? The 1950s die hard!

Reading the research has been fascinating. There's a bunch of papers which show that it improves estimate accuracy; still other papers where the problems with PERT are neatly highlighted (imagine the idea of a sub-critical path and wondering how many there might be).

One of my favourite facts about the classical PERT 3-point is that nobody has a good answer for why those particular formulae were chosen. They've simply passed into the literature as-is. Probably everyone looks, recognises something that approximates the normal distribution and mutters to themselves "yes, yes, I guess that's why".

Very silly, of course. Project outcomes don't resemble the normal distribution! So there's literature where this or that alternative formula is substituted for the originals. So I've made sure that part of the design is configurable.
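
One way to see the problem, as a sketch: sample a right-skewed triangular distribution -- just one plausible stand-in for real task outcomes -- and compare its true mean with the classical formula's answer.

    import random

    o, m, p = 2, 4, 20  # heavily right-skewed, as task outcomes tend to be
    pert_mean = (o + 4 * m + p) / 6  # = 6.33

    samples = [random.triangular(o, p, m) for _ in range(100_000)]
    true_mean = sum(samples) / len(samples)  # -> (o + m + p) / 3 = 8.67

    print(f"PERT formula: {pert_mean:.2f}, triangular mean: {true_mean:.2f}")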

> I would still ramp up the big-picture risk side. One benefit is that you could consider tying into an operational risk management system, with onselling potential to client projects post-launch.

I'm hoping to keep it relatively narrowly focused, because there's the real risk of bloating by trying to invade multiple neighbouring problem domains too soon. Estimates have surface features in common with work breakdown structures, with quotes, with risk management ... you can get spread too thin too soon. I'm wary of that. I'm hoping to focus on making 3rd party integration as easy as possible instead.


I have heard that it is actually just as bad on large construction projects and that the trend there is to move towards more lean approaches.


Actually, the statistics show that for very large projects, what matters is making sure to be extremely thorough in the upfront stages. See Industrial Megaprojects by Merrow for a thorough discussion, backed by data.


My favourite book on software estimation is McConnell's Software Estimation: Demystifying the Black Art.

As usual he takes a vast body of literature and boils it down into a chatty, usable book. The tables and checklists are worth the sticker price on their own.


"As usual" for what - a book, the author, a book series called Demystifying the Black Art ?


As is usual for McConnell. All his books are excellent.


My key advice is to not pay attention to the minutiae of the task; if you do, you'll always underestimate. A better method is to forget the specifics and ask yourself, 'If a buddy told me he was doing this, how long would I guess it would take?'. We have more experience hearing about projects and how long they took than we do actually estimating the duration of creative tasks by reduction.


This is a finger-in-the-air approach, which seems doomed to fail. For a start, it relies on your buddy having exactly the same career experience as you, and any new project being an almost exact match to previous projects you have worked on.

It doesn't take into account the unknowns, and doesn't break the task down into components that you could at least attempt to quantify.

Personally I go through a short mental breakdown of the project, trying to partition it in my mind into smaller jobs. Based on my years of experience I will then have a gut feeling for certain aspects of the project.

If any job is more than 3 days long, then I try to break it down further.

If I can't mentally break it down, then I need to spend some time researching why that is (investigate the unknowns).

Once everything has been broken down, and I have reduced the unknowns as much as possible I add contingency. Anything that I feel has the potential to be problematic I will add more contingency to, or add additional research time.


Completely agreed -- and I hit on the same themes in a talk on software engineering management (and its pathologies) at Surge on Friday[1]. The date fetishization so common in software engineering management is a direct result of non-technical management, who do not understand that every piece of software is solving a heretofore unsolved problem (even if a teensy tiny one) -- and that the unknowns often cause software to take much longer to build than one would anticipate. This gives a kind of CAP analogue for software: schedule, quality, features -- pick only two.

[1] http://www.slideshare.net/bcantrill/surge2013


There is a way to get a perfect estimate - and that is to build it and see how long it takes.

Everything else uses an abridged model to short-cut the process. However, I think this is an important point: estimating is somewhat akin to actually building the end product.

The reason I find this important is that the most common spiral of death I see is re-estimation. You take on something unrealistic, plough on regardless, realise too late, then you re-estimate. The re-estimation causes a project stall and takes time. The end result is that you have less time. A few months later you're in the same boat again (repeat). Big organisations are really prone to this.


> There is a way to get a perfect estimate - and that is to build it and see how long it takes.

The concept of a "perfect estimate" is a contradiction in terms, isn't it? Estimates are by definition uncertain statements.


I believe that's his point.


Then the larger point is that if estimates are always imperfect, we need to stop obsessing about their always being imperfect.


Yeah, more or less :) I think admitting/deciding upfront what you want to be imperfect is a good start.


"The gist of why estimates are hard: every new piece of software is a machine that has never been built before."

This statement is simply not true. Every new piece of software can be broken down into pieces that are very similar to other software that has been done before. This is actually the secret to providing a good estimate.

What generally screws up estimates are complications that arise. Much of it actually has to do with pre-existing code as opposed to new code. If you were to build something from scratch and had a team of experienced engineers, I'd bet that you'd get a pretty damn accurate estimate.


Disagree entirely. If "Every new piece of software can be broken down to pieces that are very similar to other software that had been done before", you are talking about assembling library components, not programming (though as you say even that often comes with its own unforeseeable difficulties).

This is the fundamental paradox of software development estimation (and separate to the uncertainty of external forces): when you try to estimate development you are estimating design of something that has never been made before. If you aren't, you should probably be buying a product instead of building. It's a fool's errand basically.


One could argue the same for buildings: "Every newly designed building is simply composed of known elements: columns, pillars, windows, tiles etc". The composition _is_ the innovation.


"...The gist of why estimates are hard: every new piece of software is a machine that has never been built before..."

Yes, and that's why a lightweight, repeatable estimation process beats every other way of doing it. As you continue to estimate, you create and refine a mental model of the project's complexity. For some projects, you're able to create a mental model that has high fidelity quite easily. For others, it takes a bit of work. The entire article here was an exposition on this fact.

Some folks figure that out and want to just give up. Estimation is impossible! Other folks, however, are going to figure it out if it kills them. What happens in these cases is they start to list every possible variable that could be involved in such a model in every scenario, and then create one uber, ultimate, super model that works in all situations.

Over time and through lots of trial and error, both approaches have been found to be bullshit. Instead of giving up, or creating more and more complex models that take more and more time to work, with each project you're better off starting with the most ludicrously simple model you can and then adding complexity as needed. The trick is incremental complexity, repetition, and convergence. If you have that nailed, the details of your actual model, oddly enough, do not matter that much.

Obligatory link to previous comment: https://news.ycombinator.com/item?id=6389227


Interestingly, Thoughtworks have observed that over a project they get the same consistency from simply counting the number of tasks as from summing effort estimates for those tasks. The key for project management is therefore to focus on maintaining a backlog of manageable tasks and on throughput, not on estimating tasks individually.
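
A quick Monte Carlo sketch of why that can hold (made-up numbers): when tasks are similarly sized, per-task estimation noise washes out, and count times historical average forecasts about as well as summed estimates.

    import random

    random.seed(1)
    n = 200
    actuals = [random.uniform(2, 6) for _ in range(n)]           # true hours
    estimates = [a * random.uniform(0.5, 1.5) for a in actuals]  # noisy guesses

    print(f"actual total:     {sum(actuals):.0f}h")
    print(f"summed estimates: {sum(estimates):.0f}h")
    print(f"count * average:  {n * 4.0:.0f}h")  # 4.0 = historical mean size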


We're finding this out in a lot of places. The key issue is variability. Larger sample sizes usually lead to less variability, which means flow-based systems will hold up better.

If you've broken your work out into tasks, instead of stories, presumably you have a lot of them, and they're mostly the same size over a large set. So sure, should work fine.

If, however, you have a small number of highly-variable chunks of work, then flow-based systems fail. It all depends on the nature of the item pool.


Summation can apparently reduce error without actually reducing error. I wrote an article about different ways of measuring estimate accuracy[1] that explains the difference.

It's a question for each business as to whether that matters.

[1] http://confidest.com/articles/how-accurate-was-that-estimate...
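
The gist, as a sketch: with symmetric per-task errors, the errors cancel in the sum, so the total looks accurate even though every individual estimate is badly off.

    import random

    random.seed(7)
    pairs = []
    for _ in range(500):
        estimate = random.uniform(1, 10)
        actual = estimate * random.uniform(0.5, 1.5)  # symmetric +/-50% noise
        pairs.append((estimate, actual))

    total_est = sum(e for e, _ in pairs)
    total_act = sum(a for _, a in pairs)
    sum_error = abs(total_act - total_est) / total_act         # portfolio level
    mmre = sum(abs(a - e) / a for e, a in pairs) / len(pairs)  # per estimate

    print(f"error of summed totals:  {sum_error:.1%}")  # small: errors cancel
    print(f"mean error per estimate: {mmre:.1%}")       # large: each is off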


That's an interesting finding. Do you have a link to the research?



I agree that the estimation process needs to be tuned to fit the local context. There's no point building a 50-parameter statistical model for building a 3-page website.

But likewise, SWAGs aren't appropriate for megaprojects. A more involved process is called for.


Or one can just not do megaprojects.

An instructive example comes from one of Mary Poppendieck's books. The Empire State Building was never planned and estimated in the traditional sense. They picked a deadline and used a flow-based approach to make it happen. They were already building the lower floors before the upper floors were designed.

One of the problems they faced was that nobody had designed an electrical system nearly that size. So they split the building down the middle vertically and in three slices horizontally. That gave them 6 30-story buildings to wire, an understood problem.

Another good example is the Internet: a globe-spanning network that connects a large fraction of humanity. There is no central control, no central design, no plan. Much larger than any megaproject, but done without a megaproject's methods.

I agree that megaprojects use more involved methods, but I think "called for" is, as yet, unproven. I think it's more about the people and the social structures making the decisions than it is about what's being created. When all you have is primates, everything looks like a primate dominance hierarchy.


> The Empire State Building was never planned and estimated in the traditional sense. They picked a deadline and used a flow-based approach to make it happen.

Industrial projects tend to have operability thresholds; that is, they're not composed of relatively homogeneous outputs. Each floor of a skyscraper is similar to the other floors. But a petrochemical process plant can't be built in slices; you have to do a minimum amount of design, planning and construction simply to get to the point of turning it on.

> There is no central control, no central design, no plan.

You may have heard of the Internet Engineering Task Force.

I agree that many systems have useful emergent properties born of simple rules; and where possible this should be used. Or rather, where such systems are found to already exist, they should be left largely alone.

However, sometimes solving a problem with interacting agents is more costly than simply picking the straight solution. If you can get an answer with a simple differential equation, then it's a waste of time to set up a particle swarm optimiser.


> But a petrochemical process plant can't be built in slices; you have to do a minimum amount of design, planning and construction simply to get to the point of turning it on.

Poppendieck also gives examples of using similar processes for 3M's manufacturing plants. Regardless, I think drawing physical analogies for software is risky; software is infinitely soft.

> You may have heard of the Internet Engineering Taskforce.

Yes, I have. And either you don't know what they do or you're drawing a false analogy between traditional planning processes and what the IETF does.


> Regardless, I think drawing physical analogies for software is risky; software is infinitely soft.

Sure, but I also think that this doesn't imply infinite intractability for actual problems. That a problem is NP-hard, for example, doesn't mean we can't find quite-good solutions to it that have business value.

> And either you don't know what they do or you're drawing a false analogy between traditional planning processes and what the IETF does.

You said that the internet was not designed. The protocols didn't evolve without supervision. Every part of them was designed for a purpose.

The question here is whether you think I'm saying "complex systems with emergent properties can be estimated or planned". That's not what I'm saying. I'm saying that not all problems are complex and not all problem systems have emergent properties. Many problems are eminently suitable for estimation.

It does not follow that since in some cases estimation is going to provide very little net business value that we ought to do away with it in all cases.

Could you elaborate on the 3M example, or provide a link? I'd like to read more.


I did not say that the Internet was not designed. But I'm glad to say that now. Specifically, it was not designed in the sense that megaprojects requiring planning and estimation are designed. My point there is that people just assume that big achievements require big plans, but that is demonstrably false.

I do agree, as a long-time reader of RFCs, that some of the protocols were designed, sort of. But if you read more deeply, you'll see that they were just as much evolved. And they were never imposed through a central control structure. It's no accident that the foundational documents of the Internet are all Requests For Comment. An instructive contrast is the OSI protocol suite, a top-down alternative to the Internet. Now dead, of course.

I agree that some projects can be estimated. I'm saying a lot of them shouldn't be, because there are more effective ways to get results.

You can read more about that, and about the 3M example, in Mary Poppendieck's books. I think the specific one I have in mind is Leading Lean Software Development.


> Specifically, it was not designed in the sense that megaprojects requiring planning and estimation are designed.

Right. Some systems can't be designed that way. But that doesn't mean that no system can be designed. And it doesn't mean that all systems are better off being designed or not-designed.

The reason megaprojects tend to be planned, controlled etc is because they haven't spontaneously emerged on their own. Somebody somewhere wishes to make a positive effort over and above the current baseline.

> I agree that some projects can be estimated. I'm saying a lot of them shouldn't be, because there are more effective ways to get results.

Agreed. A lot of the time a formal estimate isn't necessary. But a "lot of the time" is not the same as "always".

> An instructive contrast is the OSI protocol suite, a top-down alternative to the Internet. Now dead, of course.

There was a good history article on OSI in a recent IEEE or ACM magazine. The two major problems were an irreconcilable fight between circuit-oriented and packet-oriented designers (so they did both) and then lashings and lashings of vendor politics. The author of the piece argued that TCP/IP worked because it was driven by a small group of designers who just went ahead and cut code.

Generally the IETF model has worked because it's done by small groups focusing on a narrow problem in an environment of independent, interacting agents. Some systems work really well that way. Some don't.

> You can read more about that, and about the 3M example, in Mary Poppendieck's books. I think the specific one I have in mind is Leading Lean Software Development.

I'll pick it up, thanks for the reference.


I think the problem is that developers (including myself) tend to make estimates assuming that they're not going to find anything that makes them say WTF! Code is never perfect, and it's extremely difficult to know how much time it will take to dissect and restructure code to accommodate your new feature.


This is called optimism bias -- when estimating, we forget the things that can go wrong.


>> "The gist of why estimates are hard: every new piece of software is a machine that has never been built before."

What is not explicitly said here is that a good developer needs to know what would count as a "new" piece of software. He should be at least vaguely aware of what has been built already, what is within reach of the state of the art, etc. And just as in the stock market, it is impossibly hard to know all the information needed to come up with wise judgement, which complicates the estimation process even more. Yet this is an essential piece of estimation anyway. For example, for some projects you may simply declare them to be beyond the state of the art.


I address the problem by giving a range. Giving a single number is the real problem. For example I might estimate something as two weeks plus/minus six weeks, or a month plus/minus a week. That range is because of the usual factors (scope creep, things that have never been combined before, testing, meetings, holidays etc).

Sometimes the questioner picks up on the fact that two weeks minus six weeks is a negative number. I then explain that existing functionality could be used, goals could be achieved in other ways, or maybe it just isn't really needed.


http://www.amazon.com/Software-Estimation-Demystifying-Pract... is fantastic for giving many approaches to the topic, and counseling for non-technical stakeholders where it isn't a negotiation, but math.


Always provide two quotes: the "shoot the moon" quote and the budget quote. The client will always cherry-pick segments from the more expensive quote to arrive right at their budget. You can still lose to the next guy, but if you are in the ballpark, it won't be because you came in (a little) too high with this approach.



Good judgement comes from experience. Experience comes from bad judgement.


Minimum of 2 weeks for anything you think might take more than 4 days; if it's a day, then double it. Etc.


I really like this statement: "Every new piece of software is a machine that has never been built before. The process of describing how the machine works is the same as building the machine."

That's probably the best way of relating the problem to non-engineers that I've heard.


I ALWAYS quote fixed-price projects regardless of scope. Some software engineers would say that's highly risky. For me, it tends to be highly profitable because my estimates are usually accurate. If anything, I've learnt to overestimate and impress clients by delivering on promises.

I am amazed by the number of people who try to portray computer science as a pseudo-science, like voodoo, where estimates are impossible because the problems are unknown. Just because information is abstract doesn't make it any more difficult to estimate than something physical. Computer science IS a science and software engineering IS engineering. Good engineering involves breaking problems down into small, manageable pieces that are easily understood, then planning how to implement them. This practice is called design. On more complex projects it's architecture - a sexier form of design.

Let's get back to basics. Who has heard of "the software development life cycle"? Everyone, I hope. It's modeled on something from the early 20th century called the product development life cycle. It needs to be mentioned because IT people like to think they're special and mysteriously invented the SDLC. Feasibility, analysis, design, development, testing, deployment, etc. Agile methods are simply a minimal version of this repeated fast and frequently. Release the minimum viable product, then iterate. Unfortunately, so-called "hacking culture" has led people to jump straight to development without considering analysis or design. Real hackers design their attacks first. Successful hacks are well thought out and executed in small, brilliant steps.

The analogies of sudoku in defining a problem and construction in defining a project are perfect, so I'll borrow those.

If someone presents me with 1000 sudoku puzzles of varying difficulty, the first step in estimating is to DESIGN a solution. Allocate time to categorize each puzzle by complexity. Then estimate times for each complexity category based on past experience. Anyone who says they can't do this because they have no experience with the problem should be immediately fired. You've obviously misrepresented your experience and/or skills as an engineer of sudoku problem solving.

If a problem is truly unique and novel (and very few are nowadays), then simply factor in time during the analysis & design to perform a few experiments that will more accurately help with estimation. Or training. OR HIRE SOMEONE WHO HAS DONE IT BEFORE to help with design and estimations. Then find the most effective resource to implement (solve) each puzzle.

As for construction, any idiot can slap a few lines down on a piece of paper and call it a design for a house or a building. That's the way some people are building software. Real architects and engineers exist because construction can be broken down into hundreds of thousands of pieces and estimated accurately. They know exactly how many bolts it will take and what types of contingencies to allow for. Software, although abstract, is no different. The software industry has been around for 40+ years and it's modeled off past industries like construction and manufacturing. It can be accurately designed and estimated.

Once a clear design exists, estimation is easy.

My preferred estimation method is function point analysis. It's simple. Break each problem down into the smallest manageable piece. Any piece that takes longer than a couple of hours is too large and should be broken down further. That's an excellent rule of thumb. Anything longer than a couple of hours also suggests the design was poor.
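
A sketch of that rule of thumb, with made-up pieces and hours, flagging anything over the couple-of-hours threshold for further decomposition:

    # Hypothetical breakdown of one feature: (piece, estimated hours)
    pieces = [
        ("login form markup", 1.5),
        ("session handling", 2.0),
        ("password reset flow", 5.0),  # too big -- decompose further
        ("audit logging", 1.0),
    ]

    THRESHOLD = 2.0
    for name, hours in pieces:
        if hours > THRESHOLD:
            print(f"DECOMPOSE: {name} ({hours}h) exceeds {THRESHOLD}h")

    print(f"total: {sum(h for _, h in pieces)}h")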

Of course, in some organizations this fails due to a lack of clear process. An analyst will be tasked with collecting business requirements to write a specification. That is often full of nonsense written by someone with no design experience. That specification is given directly to a developer for estimation and development. Where was the design? The nonsense works its way into the product. It becomes an estimation nightmare. Projects and budgets run over or, worse, fail.

How do I know this works?

When I break a job down for a client into small manageable pieces I know the entire scope and can estimate accurately.

Clients love it because they can see every single piece of the puzzle in the design. The quantity and times for each piece look reasonable when broken down. The clients begin to appreciate the scope but ultimately don't care how it's built or what technology is used. They just want to know how much and when. If there is a deadline, either add more resources or cut functionality. Let them make that choice. It's not my concern.

The cream, and why this industry is so much better than construction and manufacturing, is to look at all the pieces and see the re-usable patterns. Design optimization. Copy, paste and re-use as much as possible. Clients don't care about re-usability. I charge top of market rate and still manage to be cheaper than competitors because my estimates were realistic (as opposed to their crystal ball methodologies). Re-usability and working smarter then gives me 4-5x that hourly rate. Shh! Don't tell the clients :)

If you can't estimate software accurately, you're clearly in the wrong business!


> If you can't estimate software accurately, you're clearly in the wrong business!

That was a terrible closing statement to a pretty helpful writeup. Software estimation isn't something you're going to be good at off the bat. So by your logic, nobody new should be getting into any software development where estimations are requested (most situations)?


I call bullshit.

"Once a clear design exists, estimation is easy."

Nice to have the luxury of putting in free design work in order to produce a fixed bid.


Do you use the ISBSG database at all?

edit: also, could you email me please? I'd be interested in asking some questions.


Does the customer understand what they are asking for?

Can they afford it?


Some additional notes/suggestions on estimation on Stackexchange: http://freelancing.stackexchange.com/q/495/67



