"Conventional wisdom holds that you should never rewrite your code from scratch, and that’s good advice."
Speaking as someone who has done a fair number of rewrites as well as watching rewrites fail, conventional wisdom is somewhat wrong.
1. Do a rewrite. Don't try to add features, just replace the existing functionality. Avoid a moving target.
2. Rewrite the same project. Don't redesign the database schema at the same time you are rewriting. Try to keep the friction down to a manageable level.
3. Incremental rewrites are best. Pick part of the project, rewrite and release that, then get feedback while you work on rewriting the next chunk.
That's sometimes a good strategy, but it breaks down when you do too much at once. See Boeing's recent fiasco, where they did basically just that with parts and a fuselage that weren't really compatible. The metaphor extends: try to replace too much in one part and something will break.
I would say the Boeing incident more so favors the counterpoint. Boeing tried to keep adding features to an airframe (code base) that could not support it.
The company I work for actually did this with our main product.
Gradually re-written from Java (JBOSS) to Python over about 10 years. Basically the Python side knew what URLs it could handle and proxied the rest over to Javaland. We ended up shutting down the last Java bits early last year.
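For anyone who hasn't seen this pattern, here is a minimal sketch of the routing idea in Python. The URL prefixes and the `new_app` / `legacy_app` callables are made up for illustration; this is not the actual setup described above, just one way the "new side knows what it can handle, everything else goes to the old side" rule could look.

```python
# Hypothetical sketch: prefixes and handler names are assumptions, not from the thread.
MIGRATED_PREFIXES = ("/reports", "/accounts")  # paths the rewritten app already owns


def choose_backend(path: str) -> str:
    """Return "new" if the rewritten app owns this path, otherwise "legacy"."""
    for prefix in MIGRATED_PREFIXES:
        # Match the prefix itself or anything below it, but not "/reportsx".
        if path == prefix or path.startswith(prefix + "/"):
            return "new"
    return "legacy"


def handle(path, new_app, legacy_app):
    """Dispatch a request path to whichever backend currently owns it."""
    backend = new_app if choose_backend(path) == "new" else legacy_app
    return backend(path)
```

Migrating another chunk then amounts to adding one more prefix to the tuple, and switching off the old app is possible once nothing falls through to "legacy" any more.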
Been doing that quite a few times back in PHP-land.
Partial rewrites based on Domain/Model/REST endpoint, and just proxying to one of the two webapps depending on which parts had already been rewritten.
No breaking changes from the outside, either share the parts of the code that are still good (most of the business logic might be) or fork them (and then refactor and maybe you need to fix bugs twice) and after a while you can switch off the old part.
Works like a charm with added benefits of being in the same language to avoid wasting time. But the base idea works even if you use different languages.
It's also not just for web apps. For example, Apache Storm can do Bolts? (been a while) in several languages, so you can also easily rewrite parts if you can serialize your data in and out of it.
I would argue that if you already have a lot of in-house knowledge with Java then you might as well stick with it (and just not make whatever mistakes you made last time around) but Python seems like a reasonable alternative to me.
At the time there was a strong internal directive that it not only not be Java, but not anything that even looked a bit like Java if you squinted at it (e.g. dotnet) - plus we've always been an "anything but Windows" shop anyway
There's a lot of culture that comes with a language. 10 years ago, Java and C# were associated with enterprisey complexity and interchangeable programmers working under architecture astronauts, Python was used by the cool kids. Keeping to simplicity is hard when the available libraries and people you recruit were used to https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
At the time the re-write started, Django hadn't yet seen a public release, Rails barely existed (and had zero traction, and Ruby-the-runtime was horrible), dotnet was barely a thing, and Clojure and Go didn't exist yet. 2005 was a weird time.
Realistically the only options at the time were Python, PHP (4, not 5), or Perl.
I didn't come in until much later, but I don't really think there was a better option at the time.
> Don't redesign the database schema at the same time you are rewriting.
Chances are, that's the problem though. You have to re-write because your design is bad enough that it's necessary. The only time you'd re-write without changing the design is when changing languages or platforms.
"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. "
-- Fred Brooks, The Mythical Man-Month
Attempting a design re-write without changing the data structures will just result in the same design. And you can't change the data design without then having to change the code to match.
I've done several huge rewrites over my sordid career, and a strategy that works well is to abstract your bad data design behind a good one. This involves some kind of transformer layer between your bad data design spec and your abstracted better one. There's of course performance hits of some kind no matter what, but the hope is that they're worth the benefits of the cleaner code/architecture that the better data design spec affords. It's difficult I'll grant, much harder than abstracting simply bad code, but it's possible, and also makes it that much easier to replace the actual data design at some point in the future, and remove the transformer layer. Hope that makes sense!
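A rough sketch of what such a transformer layer might look like, in Python. The legacy row format (`cust_nm`, a pipe-delimited `addr`, a `Y`/`N` flag) and the `Customer` class are invented for the example; the point is only that new code talks to the clean shape while the stored shape stays untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The cleaner model that new code works with (hypothetical)."""
    name: str
    street: str
    city: str
    postal_code: str
    active: bool


def from_legacy(row: dict) -> Customer:
    """Translate a legacy row (assumed format) into the clean model."""
    street, city, postal_code = row["addr"].split("|")
    return Customer(
        name=row["cust_nm"],
        street=street,
        city=city,
        postal_code=postal_code,
        active=row["active_flg"] == "Y",
    )


def to_legacy(customer: Customer) -> dict:
    """Translate the clean model back so the old storage keeps working."""
    return {
        "cust_nm": customer.name,
        "addr": "|".join([customer.street, customer.city, customer.postal_code]),
        "active_flg": "Y" if customer.active else "N",
    }
```

If the stored design is replaced later, only `from_legacy` and `to_legacy` should need to change or disappear, which is the payoff being described. The extra conversion on every read and write is the performance cost.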
I've done the transformation layer in the small to keep components running during a re-write/refactor. Create a new structure and write some code (in the DB or otherwise) to make the new structure look like the old structure long enough to eventually replace that old code.
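One way to do the "in the DB" version is a view that presents the new tables under the old name. The sketch below uses Python's `sqlite3` with made-up table and column names (`person`, `users`, `full_name`); it is an illustration under those assumptions, not a description of any particular system.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create the new structure plus a view that mimics the old one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- New structure (hypothetical): the name is split into two columns.
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name TEXT NOT NULL
        );
        -- Compatibility view: looks like the old "users" table to old code.
        CREATE VIEW users AS
            SELECT id, first_name || ' ' || last_name AS full_name
            FROM person;
        """
    )
    return conn


def legacy_lookup(conn: sqlite3.Connection, user_id: int):
    """Old code path, unchanged: it still queries "users.full_name"."""
    row = conn.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


conn = build_db()
conn.execute(
    "INSERT INTO person (id, first_name, last_name) VALUES (?, ?, ?)",
    (1, "Grace", "Hopper"),
)
```

This only covers reads. Old code that writes through the old structure would need something more, for example INSTEAD OF triggers on the view, which is part of why this tends to be a temporary bridge.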
I can't imagine doing that as an independent step towards a rewrite.
Something similar I've seen is a bad data design that is embedded into a giant monolith, and has dependency tendrils everywhere. In that case it's worthwhile to factor out the logic (which includes cutting the dependencies) even leaving the schema intact. Once the problem layer is sequestered e.g. into its own microservice it is hopefully possible to release it on its own cycle much faster than the monolith, paving the way to improving the schema next.
>Attempting a design re-write without changing the data structures will just result in the same design. And you can't change the data design without then having to change the code to match.
Sure you can. You decouple your data structures from your schema with an abstraction layer, facade, etc.
Point 3 seems to redefine “rewrite” as “partial rewrite”, which is essentially in the spirit of “never rewrite the [whole] code[base] from scratch”. Am I missing something?
Take a typical CRUD app. It'll have, say, something for inserting data, editing data, viewing data, and reporting. Pick one of those chunks (reporting is good; it's usually easy and gives insight into the data) and rewrite that. Release it (if it's a web application, modify the old app to redirect reporting requests to the new app). Lather, rinse, repeat until you have a completely rewritten app.
It is more along the line of "don't try to do a big-bang release of everything" when you are initially writing an app.
Joel's article is the famous classic, but he proved himself wrong when his company, Fog Creek, built Trello from the ground up, sharing no code with FogBugz. Trello became a huge success while FogBugz languished. https://medium.com/@herbcaudill/lessons-from-6-software-rewr...
(I think this article should be the new classic.)
> My takeaway from these stories is this: Once you’ve learned enough that there’s a certain distance between the current version of your product and the best version of that product you can imagine, then the right approach is not to replace your software with a new version, but to build something new next to it — without throwing away what you have.
+1 on all fronts. Rewrites/big refactors can be tremendously helpful 3-4 years into a big project's life. At least on the stuff I've worked on, that's around the time that accumulated tech debt has built up and the problem domain that we were hoping to solve is now well understood with production traffic.
Yep, I've never seen a system that was a) so bad that it requires rewriting and b) had a great schema that rewrite could be done on top of.
Usually a quick glance at the schema will shine a bright light on all sorts of badness - denormalizations, excessive metadata-driven-ness etc. Rewriting code on top of crud like that would be expensive, and result in a lot of ugly code, which would need to be rewritten again.
Right, rewrites are basically invest now vs spend exponentially later.
They make little sense as soon as a project is completed: even if you rewrite, the software hasn't had time to point you in a direction. A 'nice' rewrite of the same wrong solution is just as useless as a bad implementation of the wrong solution.
They may make sense after a product has matured, since the actual scope of the software will be more defined. The 'correct' schema is starting to reveal itself, but usually it's rare that business wise it's worth rewriting at this stage, adding features is the better choice.
They definitely make sense once a product is fully matured, starting to gather legacy issues - since fighting around all the cruft and incorrect schema restrictions is wasting more time than redesigning it from ground up using the existing implementation as the guide.
That's perfectly compatible with the conventional wisdom you claim is wrong, which, again, was “you should never rewrite your code from scratch” [emphasis added]
3) is usually the only way to go if something has a little bit of history and complexity. It’s almost impossible to capture all features and uses of a system completely and then deliver something that does all of these things. Much better to work in small increments which can be understood. It may take longer and feel less heroic but at least you always have something that works.
Incremental rewrites (aka refactoring) are best, but sometimes that's just not viable, which is where I feel conventional wisdom breaks down.
If your code is trying to get updated, you can usually refactor. If your code is having a paradigm shift, it's likely any incremental rewrite will take more time and have more duplication without benefit for far too long to be successful.
Refactoring is recoding a component while maintaining its external interface so that external interactions with the component are unaffected.
Rewrites, incremental or otherwise, do not necessarily preserve external interfaces (an incremental rewrite doesn't start from a clean slate, but instead progresses by gradual replacement of existing code).
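To make the refactoring half of that distinction concrete, here is a small Python sketch. The shipping-cost functions and rates are invented for illustration: the internals change, but the signature and results do not, and a quick comparison against the old version checks that.

```python
def shipping_cost_old(weight_kg, express):
    """Original implementation (hypothetical): nested branches."""
    if express:
        if weight_kg > 10:
            return 30.0
        else:
            return 20.0
    else:
        if weight_kg > 10:
            return 15.0
        else:
            return 5.0


# Keys are (express, heavy); values are the same rates as above.
_RATES = {
    (False, False): 5.0,
    (False, True): 15.0,
    (True, False): 20.0,
    (True, True): 30.0,
}


def shipping_cost(weight_kg, express):
    """Refactored implementation: same inputs and outputs, table lookup inside."""
    return _RATES[(bool(express), weight_kg > 10)]


def same_behaviour(cases):
    """Check that old and new agree on every (weight_kg, express) case."""
    return all(
        shipping_cost_old(w, e) == shipping_cost(w, e) for w, e in cases
    )
```

A rewrite, by contrast, would be free to change the signature or the rules themselves, at which point a check like `same_behaviour` no longer applies as-is.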
#1 and #2 ignore why a rewrite is happening in the first place. No one rewrites a system for fun, they're rewritten because the current codebase has major failings that prevent new features or make the current codebase unworkable. These should be the first things added to a rewrite to prove that it's not a wasted effort.
Thank you! I read the first sentence and had to speak "bullshit" out loud. Sometimes rewriting something feels so good. Especially after some time has passed and you are still in the process of learning. You apply all the things you have learned and you are amazed how much better it has become.
Depends on the size of the re-write. If it's small, no problem. If it's big, it's a death march for a results-driven developer, because at the end of the re-write the product is still more or less the same, just with a different engine.
Some developers prefer to leave the company during the re-write (which usually happens after 3 years of exciting hyper-growth).
Re #2 - sometimes the code isn't all the problem, but it is the schema - and the two can be intertwined in such a way that you can't fix one without fixing the other.
The reason to avoid rewrites is because they generally get cancelled, or they build the wrong thing.
In the first case, you have a system that mostly does what you want, but it has architectural issues that make it hard to change. You plan a rewrite. You get buy in from management. But the problem is that the business still needs those changes. If your rewrite takes more than a few months, then you're still going to have to do the changes.
Over time, you end up with 2 groups: legacy and greenfield. The legacy group is usually made up of crufty old salts that don't care what they work on. They just grind it out. The greenfield group usually contains younger people who want to do something new. They want to "get it right" this time. They care deeply about what they work on and so there is usually a fair amount of conflict on the team about how to approach just about anything.
The legacy team grinds along, slowly solving business problems and the greenfield team provides no business value until they are finished (while constantly promising that it will all be amazing as soon as they are done). But the business types see only that the legacy product is doing what they want and that the team is quietly plugging away. The greenfield product does not do what they want and the team seems to be doing a lot of excited talking, but it still isn't finished yet.
So they say, "Wouldn't it be better to move the greenfield team back on the legacy product so that we can get things done faster? After all, they said they couldn't move forward with the old architecture, but it's moving forward just fine. And with more resources it will go along a lot faster. Besides we have these business emergencies that we have to deal with, so let's just put the greenfield project on hold for a while until we can sort out exactly what's best". And the greenfield project is effectively buried.
Now, it doesn't have to turn out this way, but powerful forces point you in that direction and you have to be very careful not to have it happen to you.
The other main problem is that when people think about doing a rewrite the requirements document usually looks like "Do exactly the same thing as the last system". But the problem is nobody actually knows what the last system does precisely while simultaneously they all think they know precisely how the old system works.
You get a lot of push back from the business when you start gathering requirements because the only thing they really want to say is "Just do it like the old system". Only, after months and months of development you end up realising that there were a lot of corner cases that you missed in the rewrite. Oh... and it's not possible in the new architecture to do that without jumping through a lot of hoops.
So, incremental rewrites are best, but only if you can get the business to actually use the greenfield project. Normally they will completely ignore it because their paycheque and their promotion and their happiness depend on being able to do their job at least as well as they could with the old system. And the new system doesn't do everything that it needs to do (because we've released incrementally). What's more, the developers keep asking stupid questions like "What do you want it to do?" and the business people are responding with "Why can't you listen to me? I've told you one hundred times to do it the same as the old system!"
And so the new system is shunned by the business. It becomes radioactive. Unless some upper management type sends a mandate down to force the business to use the new system, nobody will touch it. The upper management, in the meantime, are wondering, "Why are we doing a rewrite again? The old system seems to do what we want, while everyone is complaining about the old system".
Yep, it can be done, but it requires a considerable amount of help from upper management. You should avoid it at all costs unless you are sure upper management is supporting you all the way.
But even when you are successful, it may just be the beginning of the end. You've managed to stave off cancellation. You've managed to keep user engagement and work through the requirements so that the new system is equivalent. But it turns out that there is some small detail or decision that makes the new product non-viable.
For examples of products I've been personally involved with: Word Perfect 5 was written in assembly code and was not based on Windows. Word Perfect 6 was rewritten in C++ and was based on Windows. The team that did the rewrite were very proud of their work. But they threw out all the keybindings in WP 5 (because GUI is way better!). The new rewrite was also very, very slow compared to WP 5. This was the beginning of the end for WP, even though one could definitely say that a rewrite was necessary. They just made the wrong choices.
Similarly, I once worked for Nortel and they had a telephone switch that they sold for $10 million a pop. It had 31 million lines of code (and stinky, stinky, stinky code at that). They realised that they could rewrite it in a fraction of that amount of code. They had 3000 developers working on the rewrite and after a few years they succeeded. Only... it didn't work with all of the business equipment that the old switch worked with. And it turned out that nobody wanted it unless it worked with that equipment. And... the new architecture was not conducive to making it work with the business equipment. In the end, they gave a few switches to China before they abandoned it.
Rewrites are hard even when you've made the right choice to do it.
"incremental rewrites are best, but only if you can get the business to actually use the greenfield project. "
That completely fits my experience. If the new code is being used early and can grow iteratively while being relevant and useful, it's gonna work out fine.
Having the codebases running side by side somehow is the best choice all around, if it can be managed.