I have 30+ years of industry experience and I've been leaning heavily into spec-driven development at work, and it is a game changer. I love programming, and now I get to program at one level higher: the spec.
I spend hours on a spec, working with Claude Code to first generate and iterate on all the requirements, then going over the requirements with self-reviews, first in Claude using Opus 4.5 and then in Copilot using GPT-5.2. The self-reviews are prompts to review the spec using all the roles and perspectives it thinks are appropriate. This self-review process is critical and really polishes the requirements (I normally run 7-8 rounds of self-review).
Once the requirements are polished and any questions answered by stakeholders, I use Claude Code again to create an extremely detailed and phased implementation plan with full code, again all in the spec (using a new file if the requirements doc is so large it fills the context window). The implementation plan then goes through the same multi-round self-review using two models to polish (again, 7 or 8 rounds), finalized with a review by me.
The result? I can then tell Claude Code to implement the plan and it is usually done in 20 minutes. I've delivered major features using this process with zero changes in acceptance testing.
What is funny is that everything old is new again. When I started in industry I worked in defense contracting, on the project to build the "black box" for the F-22. When I joined the team they were already a year into the spec-writing process with zero code produced, and they had (iirc) another year on the schedule for the spec. At my third job I found a literal shelf containing multiple binders that laid out the spec for a mainframe-hosted publishing application written in the 1970s.
Looking back, I've come to realize the agile movement, which was a backlash against this kind of heavy waterfall process I experienced at the start of my career, was basically an attempt to "vibe code" the overall system design. At least for me, AI-assisted mini-waterfall ("augmented cascade"?) seems a path back to producing better quality software that doesn't suffer from the agile "oh, I didn't think of that".
About 15 years ago, I worked on code that we delivered to customers in working versions, repeatedly, and they used it and reported zero bugs. It simply did what it was meant to, what had been agreed, from the moment they started using it.
The key was this: "the requirements are polished and any questions answered by stakeholders"
We simply knew precisely what we were meant to be creating before we started creating it. I wonder to what degree the magic of "spec driven development", as you call it, is just that, and whether using Claude Code or something similar is actually just the expression of being forced to understand and express clearly what you actually want to create (compared to the much more prevalent model of just making things in the general direction and seeing how it goes).
Waterfall can work great when: 1/ the focus is long-term, both in terms of knowing that the company can take a few years to get the thing live and that it will be around for many more years, 2/ the people writing the spec and the code are largely the same people.
Agile was really pushing to make sure companies could get software live before they died (number 1) and to remedy the anti-pattern that appeared with number 2, where non-technical business people would write the (half-assed) spec and then technical people would be expected to do the monkey work of implementing it.
Agile's core is the feedback loop. I can't believe people still don't get it. Feedback from reality is always faster than guessing in the air.
Waterfall is never great. The only time you need something other than Agile is when lives are at stake; there you need formal specifications and rigorous testing.
SDD allows better output than traditional programming. It is similar to waterfall in the sense that the model helps you to write design docs in hours instead of days and take more into account as a result. But the feedback loop is there and it is still the key part in the process.
The only software I ever worked on that delivered on time, under budget, and with users reporting zero bugs over multiple deliveries, was done with heavy waterfall. The key was knowing in advance what we were meant to be making, before we made it. This did demand high-quality customers; most customers are just not good enough.
> Feedback from reality is always faster than guessing in the air
Only if you have no idea what the results will be.
Professional engineering takes parts with specific tolerances, tested for a specific application, using a tried-and-true design, combines them into a solution that other people have already built, and watches it work, exactly as predicted. That's how we can build a skyscraper "the first time" and have it not fall down. We don't need to build 20 tiny versions of a building until we get a working skyscraper.
But when you build a skyscraper you don’t one-shot a completed building that stays static its entire life - you build a set of empty floors that someone else designs & fits out, sometimes years after the building as a whole is commissioned, usually several times in the lifespan of the superstructure.
And in the fitting out there often are things that exist only to get customer feedback (or sales), such as model apartments, sample cubicle layouts etc.
So yes, you are right that engineering can guide us to building something right the first time - the hard part from a software perspective is usually building the right thing, not building the thing right.
An interesting analogy I came across once but could never find again is that with software systems, we’re not building a building, we’re designing a factory that produces an output - the example was a mattress factory that took in raw rubber feedstock & cloth and produced mattresses.
Are you running a mattress factory? Or are you trying to run a hotel, and need mattresses, so you build a mattress factory? The "software industry" is that - dysfunctional with perverse incentives.
We should not be building the same software over and over and over and over. I've built the same goddamn app 10 times in my career. And I watch other people build it, making the same old mistakes over and over, like a thousand other people haven't already gone through this and could easily tell you how not to do it. In other engineering professions, they write that stuff down, and say "follow this plan" because it avoids all the big problems. Thank god we have a building code and not "agile buildings".
Agile sucks because it incentivizes those obvious mistakes and reinventing of wheels. Planning allows someone to stop and look up the correct way of building the skyscraper before it's 100 feet in the air with a cracked foundation.
Waterfall, weeks of planning, write 3 times anyway.
The point is, people don't know what they want or are asking for, until it's in front of them. No system is perfect, but waterfall leads to bigger disasters.
Any real software (that delivers value over time) is constantly rewritten and that's a good thing. The question is whether the same people are rewriting it that wrote it and what percentage of that rewriting is based off of a spec or based off of feedback from elsewhere in the system.
> The only time you need something other than Agile is when lives are at stake; there you need formal specifications and rigorous testing.
Lives are always at stake, given that we use software everywhere, and often in unintended ways, even outside its spec (isn't that a definition of a "hack"?).
People think of medical appliance software, space/air traffic software, defense systems or real-time embedded systems as the only environments where "lives are at stake", but actually, in subtle ways, a violation of user expectancy (in some software companies, UX issues count as serious bugs) in a word processor, Web browser or the sort command can kill a human.
Two real-life examples:
(1) A few years ago, a Chinese factory worker was killed by a robot. It was not in the spec that a human could ever walk in the robot's path (the first attested example of "AI" killing a human that I found at the time). This was way before deep learning entered the stage, and the factory was a closed and fully automated environment.
(2) Also a few years back, the Dutch software for social benefits management screwed up, and thousands of families just did not get paid any money at all for an extended period. Allegedly, this led to starvation (I don't have details - but if any Dutch read this, please share), and eventually the whole Dutch government was forced to resign over the scandal.
That's a very narrow definition of engineering. What about property? Sensitive information?
It's a fine "whoopsie-doodle," when your software erases the life savings of a few thousand people. "We'll fix that in the next release," is already too little, too late.
This is correct. Agile is control theory applied to software engineering.
The plant to control here isn't something simple like a valve. You're performing cascaded control of another process where the code base is the interface to the plant you're controlling.
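To make that framing concrete, here is a toy sketch (my own illustration with made-up gains, not anything from the thread): an open-loop plan built on a slightly wrong model of the plant drifts and never notices, while a closed-loop controller that measures the error after every step converges despite the same model error.

```python
# Toy feedback-loop sketch (illustrative only, made-up numbers).
# Reality ("the plant") responds with gain 0.8 while the plan assumed 1.0:
# the open-loop plan inherits the model error; the closed loop corrects it.

def plant(state: float, action: float) -> float:
    # Each action moves the state by 0.8x what the planner's model predicted.
    return state + 0.8 * action

target = 100.0

# Open loop ("pure waterfall"): plan ten +10 steps up front, execute blindly.
state = 0.0
for _ in range(10):
    state = plant(state, 10.0)
print(f"open loop:   {state:.1f}")    # 80.0, off by 20%, and nothing notices

# Closed loop ("agile"): measure the error after every step and act on it.
state = 0.0
for _ in range(10):
    error = target - state            # feedback from reality
    state = plant(state, 0.5 * error) # proportional correction
print(f"closed loop: {state:.1f}")    # ~99.4, despite the same wrong model
```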
I spent my career building software for executives that wanted to know exactly what they were going to get and when because they have budgets and deadlines i.e. the real world.
Mostly I’ve seen agile as: let’s do the same thing 3x that we could have done once if we spent time on specs. The key phrase here is “requirements analysis” and if you’re not good at it either your software sucks or you’re going to iterate needlessly and waste massive time including on bad architecture. You don’t iterate the foundation of a house.
I see scenarios where Agile makes sense (scoped, in house software, skunk works) but just like cloud, jwts, and several other things making it default is often a huge waste of $ for problems you/most don’t have.
Talk to the stakeholders. Write the specs. Analyze. Then build. “Waterfall” became like a dirty word. Just because megacorps flubbed it doesn’t mean you switch to flying blind.
> The key phrase here is “requirements analysis” and if you’re not good at it either your software sucks or you’re going to iterate needlessly and waste massive time including on bad architecture. You don’t iterate the foundation of a house.
This depends heavily on the kind of problem you are trying to solve. In a lot of cases requirements are not fixed but evolve over time, either reacting to changes in the real-world environment or by just realizing that things which are nice in theory are not working out in practice.
You don’t iterate the foundation of a house because we have done it enough times and also the environment the house exists in (geography, climate, ...) is usually not expected to change much. If that were the case we would certainly build houses differently than we usually do.
> making it default is often a huge waste of $ for problems you/most don’t have.
It's the opposite — knowing the exact spec of your program up front is vanishingly rare, probably <1% of all projects. Usually you have no clue what you're doing, just a vague goal. The only way to find out what to build is to build something, toss it over to the users and see what happens.
No developer or, dear god, "stakeholder" can possibly know what the users need. Asking the users up front is better, but still doesn't help much — they don't know what they want either.
No plan survives first contact with the enemy and there's no substitute for testing — reality is far too complex for you to be able to model it up front.
> You don’t iterate the foundation of a house.
You do, actually. Or rather, we have — over thousands of years we've iterated and written up what we've learned, so that nobody has to iterate from scratch for every new house anymore. It's just that our physics, environment, and requirements for "a house" don't change constantly the way they do for software, and we've had thousands of years to perfect the craft, not some 50.
Also, civil engineers mess up in exactly the same ways. Who needs testing? [1]. Who needs to iterate as they're building? [2].
> knowing the exact spec of your program up front is vanishingly rare, probably <1% of all projects
I don't have anything useful to add, but both of you speak and write with conviction from your own experience and perspective, yet refuse to accept that the situation might be different for others.
"Software engineering" is a really broad field, some people can spend their whole life working on projects where everything is known up front, others the straight opposite.
Kind of feel like you both need to be clearer up front about your context and where you're coming from, otherwise you're probably both right, but just in your own contexts.
My experience is that such one-shotted projects never survive the collision with reality. Even with extremely detailed specs, the end result will not be what people had in mind, because human minds cannot fully anticipate the complexity of software, and all the edge cases it needs to handle. "Oh, I didn't think that this scheduled alarm is super annoying, I'd actually expect this other alarm to supersede it. It's great we've built this prototype, because this was hard to anticipate on paper."
I'm not saying I don't believe your report - maybe you are working in a domain where everything is super deterministic. Anyway, I don't.
I've been doing spec-driven development for the past 2 months, and it's been a game changer (especially with Opus 4.5).
Writing a spec is akin to "working backwards" (or future backwards thinking, if you like) -- this is the outcome I want, how do I get there?
The process of writing the spec actually exposes the edge cases I didn't think of. It's very much in the same vein as "writing as a tool of thought". Just getting your thoughts and ideas onto a text file can be a powerful thing. Opus 4.5 is amazing at pointing out the blind spots and inconsistencies in a spec. The spec generator that I use also does some reasoning checks and adds property-based test generation (Python Hypothesis -- similar to Haskell's Quickcheck), which anchors the generated code to reality.
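As a concrete illustration of that property-based anchoring, here is a minimal Hypothesis sketch. `normalize_path` and both properties are hypothetical stand-ins of my own, not output from the spec generator mentioned above; they show the kind of invariants a spec review tends to surface ("what about empty segments? repeated separators?").

```python
# Minimal property-based test sketch (Hypothesis). `normalize_path` and the
# two properties are hypothetical examples, not from the commenter's setup.
from hypothesis import given, strategies as st

def normalize_path(path: str) -> str:
    """Collapse repeated '/' separators and drop a trailing slash."""
    parts = [p for p in path.split("/") if p]
    joined = "/".join(parts)
    return "/" + joined if path.startswith("/") else joined

@given(st.text(alphabet="abc/", max_size=30))
def test_idempotent(path):
    # Spec property: normalizing an already-normalized path changes nothing.
    assert normalize_path(normalize_path(path)) == normalize_path(path)

@given(st.text(alphabet="abc/", max_size=30))
def test_no_empty_segments(path):
    # Spec property: the output never contains '//' (an empty segment).
    assert "//" not in normalize_path(path)
```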
Also, I took to heart Grant Slatton's "Write everything twice" [1] heuristic -- write your code once, solve the problem, then stash it in a branch and write the code all over again.
> Slatton: A piece of advice I've given junior engineers is to write everything twice. Solve the problem. Stash your code onto a branch. Then write all the code again. I discovered this method by accident after the laptop containing a few days of work died. Rewriting the solution only took 25% the time as the initial implementation, and the result was much better. So you get maybe 2x higher quality code for 1.25x the time — this trade is usually a good one to make on projects you'll have to maintain for a long time.
This is effective because initial mental models of a new problem are usually wrong.
With a spec, I can get a version 1 out quickly and (mostly) correctly, poke around, and then see what I'm missing. Need a new feature? I tell Opus to first update the spec, then code it.
And here's the thing -- if you don't like version 1 of your code, throw it away but keep the spec (those are your learnings and insights). Then generate a version 2 free of any sunk-cost bias, which, as humans, we're terrible at resisting.
Spec-driven development lets you "write everything twice" (throwaway prototypes) faster, which improves the quality of your insights into the actual problem. I find this technique lets me 2x the quality of my code, through sheer mental model updating.
And this applies not just to coding, but most knowledge work, including certain kinds of scientific research (s/code/LaTeX/).
My experience with both Opus and GPT-codex is that they both just forget to implement big chunks of specs, unless you give them the means to self-validate their spec conformance. I’m finding myself sometimes spending more time coming up with tooling to enable this than on the actual work.
The key is generating a task list from the spec. Kiro IDE (not cli) generates tasks.md automatically. This is a checklist that Opus has to check off.
Try Kiro. It's just an all-round excellent spec-driven IDE.
You can still use Claude Code to implement code from the spec, but Kiro is far better at generating the specs.
p.s. if you don't use Kiro (though I recommend it), there’s a new way too — Yegge’s beads. After you install, prompt Claude Code to `write the plan in epics, stories and tasks in beads`. Opus will -- through tool use -- ensure every bead is implemented. But this is a more high variance approach -- whereas Kiro is much more systematic.
I’ve even built my own todo tool in Zig, which is backed by SQLite and allows arbitrary levels of todo hierarchy. Those clankers just start ignoring tasks or checking them off with a wontfix comment the first time they hit adversity. Codex is better at this because it keeps going at hard problems. But then it compacts so many times over that it forgets the todo instructions.
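For flavor, here is a minimal sketch of how such an arbitrarily nested todo hierarchy can be modeled in SQLite (Python for brevity; the commenter's tool is in Zig and its schema isn't shown, so the table and column names are my own assumptions): an adjacency list via parent_id, plus a recursive CTE to read back a whole subtree so nested tasks can't be silently dropped.

```python
# Sketch of an SQLite-backed todo hierarchy (schema names are assumptions).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES todo(id),  -- NULL for top-level tasks
        title     TEXT NOT NULL,
        done      INTEGER NOT NULL DEFAULT 0
    );
""")
db.executemany(
    "INSERT INTO todo (id, parent_id, title) VALUES (?, ?, ?)",
    [(1, None, "Implement spec"),
     (2, 1, "Parse input"),
     (3, 2, "Handle empty file")],
)

# List the whole subtree under task 1, carrying the nesting depth along.
rows = db.execute("""
    WITH RECURSIVE subtree(id, title, done, depth) AS (
        SELECT id, title, done, 0 FROM todo WHERE id = ?
        UNION ALL
        SELECT t.id, t.title, t.done, s.depth + 1
        FROM todo t JOIN subtree s ON t.parent_id = s.id
    )
    SELECT title, done, depth FROM subtree
""", (1,)).fetchall()

for title, done, depth in rows:
    print("  " * depth + ("[x] " if done else "[ ] ") + title)
```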
I think there's a difference between people getting a system and realising it isn't actually what they wanted, and "never survive collision with reality".
They survive by being modified and I don't think that invalidates the process that got them in front of people faster than would otherwise have been possible.
This isn't a defence of waterfall though. It's really about increasing the pace of agile and the size of the loop that is possible.
I believe the future of programming will be specs, so I’m curious to ask you, as someone who operates this way already: are there any public specs you could point to that are worth learning from, that you revere? I’m thinking that, the same way past generations were referred to John Carmack’s Quake code, next generations will celebrate great specs.
While the environment is changing. That's the key.
If you already know the requirements, and they aren't going to change for the duration of the project, then you don't need agile.
And if you have the time. I recently was on a project with a compressed timeline. The general requirements were known, but not in perfect detail. We began implementation anyway, because the schedule did not permit a fully phased waterfall. We had to adjust somewhat to things not being as we expected, but only a little - say, 10%. We got our last change of requirements 3 or 4 weeks before the completion of implementation. The key to making this work was regular, detailed, technical conversations between the customer's engineers, the requirements writers, and our implementers.
What does the resulting code look like, though? I found that while <insert your favorite LLM> can spit out barely working C++ code fast, I then have to spend 10x time prodding it to refactor the code to look at least somewhat acceptable.
No matter how much I tell it that it is a "professional experienced 10x developer versed in modern C++, a second coming of Stroustrup" in per-project or global config files, it still keeps spewing the same crap, big (manual memory management instead of RAII here and there, initializing fields in the ctor body instead of the initializer list, having manual init/cleanup methods in classes instead of a proper ctor/dtor design to ensure that objects are always in a consistent state, a bunch of other anti-patterns, etc.) and small (checking for nullptr before passing the pointer to delete/free, manually instantiating objects as an argument to the shared_ptr ctor instead of make_shared, endlessly casting stuff around back and forth instead of designing data types properly, etc.).
Which makes sense, I guess, because it is what average C++ code on GitHub looks like, unfortunately, and that is what all those models were trained on. But I keep feeling like my job is turning into performing endless code review for a not-very-bright junior developer that just refuses to learn...
This could be a language specific failure mode. C++ is hard for humans too, and the training code out there is very uneven (most of it pre-C++11, much of it written by non-craftspeople to do very specific things).
On the other hand, LLMs are great at Go because Go was designed for average engineers at scale, and LLMs behave like fast average engineers. Go as a language was designed to support minimal cleverness (there's only so many ways to do things, and abstractions are constrained). This kind of uniformity is catnip for LLM training.
This. I feel like the sentiment on HN is very bimodal. For me, my experience with LLMs is very much like yours. Anything outside of generic tasks fails miserably. I’m really curious how people make it work so well.
Agile isn’t against spec writing. Specs can be a task in your story, and so can automated tests. Both can be deliverables in your acceptance criteria. But that’s not how it went - because human nature is to look for the least effort.
With AI, the least effort is the spec, so that’s the “greatest thing to do” again.
Perhaps a better way than to view them as alternative choices is to view them as alternative modes of working, between which it is sometimes helpful to switch?
We know old-style classic waterfall lacks flexibility and agile lacks planning, but I don't see a reason not to switch back and forth multiple times in the same project.
Yep. I've been into spec-driven development for a long time (when we had humans as agents) and it's never really failed me. We just have literally more attention (hah!) from LLMs than from humans.
What's amusing to me is that PRIDE, the oldest generally available software methodology and perhaps the least appreciated, is basically just "spec driven development with human programmers". Most of the time, and personnel, involved in development is spent on elucidating the requirements and developing the spec; programmers only get involved at the end, and their contribution is about 15%. For a few decades this was considered the "correct" way to develop software. But then PCs happened, mom-and-pop software vendors stuffing floppy disks into Ziploc bags happened, and the myth of the lone "genius programmer" took hold of the industry, and programmers experienced such prestige inflation that they thought they were able to call the shots, and by and large management acquiesced. And that's how we got Agile.
With the rise of AI, maybe programmers will be put back in their rightful place, as contributors of the final small piece of the development process: a translation from business terms to the language of the computer. Programming as a profession should, by all rights, be obsolete. We should be able to express the solution directly in business terms and have the translation take place automatically. Maybe that day will be here soon.
As is so often the case in life, extreme approaches are bad. If you do pure waterfall, you risk finding out very late that your plan might not work out, either because of unforeseen technical difficulties implementing it, the given requirements actually being wrong/incomplete, or simply missing the point in time at which you have planned enough. If you do extreme agile, you often end up with a shit architecture, which, among other things, hurts your future agility; but you get a result which you can validate against reality. The "oh, I didn't think of that" is definitely present in both extremes.
Agile is really about removing managers. The twelve principles do encourage short development cycles, but that's to prevent someone from going off into the weeds — having no manager to tell them to stop.