Exploding Software-Engineering Myths (2009) (microsoft.com)
145 points by luu on March 29, 2015 | 87 comments



After reading the paper [1] regarding the empirical study on TDD, I'm pretty dubious about how well backed the conclusions are. If you look at Table 2, in the 4 case studies, 2 of the projects took longer with TDD (2.00x and 2.64x ratio in man months, to 3sf), but the other 2 did better (0.319x and 0.476x ratio in man months, to 3sf). But all of this seems to be ignored in proving the original statement that TDD takes 15-35% longer, which appears to be backed by what is described as "Management Estimates" [Table 3].

In fact, taking the geometric mean of the man month ratios (which I think, but am not sure, is the right metric to use when comparing ratios), we find the average ratio to be 0.947x. Make of this what you will, given the size of the sample.
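For anyone who wants to check that arithmetic, here is a minimal Python sketch. It only uses the four rounded ratios quoted above, so the result can differ slightly in the third decimal from a figure computed on unrounded data. The function names are mine, not anything from the paper.

```python
import math

# Effort ratios (TDD vs. non-TDD, in man months) as quoted above from Table 2,
# already rounded to 3 significant figures.
ratios = [2.00, 2.64, 0.319, 0.476]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Plain average, shown only for comparison."""
    return sum(values) / len(values)


gm = geometric_mean(ratios)
am = arithmetic_mean(ratios)
print(f"geometric mean:  {gm:.3f}")
print(f"arithmetic mean: {am:.3f}")
```

The geometric mean comes out at roughly 0.95, while the arithmetic mean is well above 1. The reason to prefer the geometric mean for ratios is symmetry: a 2x slowdown and a 0.5x speedup should cancel out to 1, which is what `geometric_mean([2.0, 0.5])` gives.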

[1]: http://research.microsoft.com/en-us/groups/ese/nagappan_tdd....


The Nagappan paper gets cited a fair bit, mostly because there is so little hard evidence of any sort that actually backs TDD as presented by the big name evangelists.

However, on a closer reading, even that paper isn't really assessing TDD. It basically defines any test-first coding practice to be TDD, including those that still spend considerable time and resources on up-front design.

What we really need is some properly structured study that compares professional practitioners (not just students or recent grads) over extended periods (not just a brief project that doesn't show the long-term benefits or costs in terms of quality and maintainability) who are using no unit tests but otherwise common development practices (control group), the same but writing unit tests after the main code, the same but with unit tests written before the main code, and full test-driven development where other aspects of the development process, particularly design, are effectively replaced by the test-writing activities as advocated in well-known TDD books and other training materials.

Of course it's all but impossible to really achieve that sort of like-for-like comparison with usefully controlled conditions, which is why these issues are so hard to debate objectively. But try finding any study with pro-TDD conclusions that doesn't obviously fail the objectivity test in at least one of the three ways above (not a sample of professional programmers, too short term to draw general conclusions, or conflating TDD with some other unit-test-related development process).


You're right, this kind of study is really hard to set up properly. However, in this instance, I'm more concerned by the quality of the analysis. It seems to be ignoring actual data in favour of anecdotal evidence, even when the two clearly aren't correlated at a cursory glance.


It would be really interesting to see how much variation there was between two groups using the same method as well, to get an idea of how much different teams vary, and how much controls can be relied on in such comparisons.


"Over a development cycle of 12 months, 35 percent is another four months, which is huge," Nagappan says. "However, the tradeoff is that you reduce post-release maintenance costs significantly, since code quality is so much better."

Another way of describing this is that the non TDD teams weren't actually done when they shipped.


I viewed this as a trade-off. "If you can spare 35% more time, write your tests first. If speed is so important that you can slough off on testing at the end, don't."

Sometimes partially perfect and released beats better but later. (Perhaps not for Microsoft, but certainly for lots of their competitors) How many more bugs can you find with software "in the Wild"?

This isn't to say it's worth tossing crappy code over the transom, just that real tradeoffs do exist.


> Sometimes partially perfect and released beats better but later.

Especially since "later" implies a new market, with new demands on the product. "Better" then becomes a moving target, and with "Best" you might as well say "Stop building the road when you reach the horizon and the sky stops you".


I think this is a fallacy and an easy one to fall into. By shipping buggy code, you will end up creating a larger number of incoming bugs. Consequently, the amount of time you spend on incoming bugs will increase from 5% of a sprint to 30%, up to cases where you have to do bug-only sprints.

In the short term, TDD will look like it's costing more (especially if the product is not released). But if you want developers to maintain confidence in their code or for your team to depend upon the dev team's ability to deliver robust code, it's an important safeguard to have.


Yes, it's a tradeoff. Not doing TDD is trading quality for speed. See the Iron Triangle of PM: fast, good, or cheap, you can pick two at most.


Also, what if the non-TDD were given another four months just to fix bugs after the release?

In that light, TDD might just be a ploy to buy more time.


Yes, call it PRR (Pre Release Refactor) and people will buy into it.


>Another way of describing this is that the non TDD teams weren't actually done when they shipped.

Was ANY code, TDD or not, ever "done" when it shipped?


Absolutely. Off the top of my head, the first thing that comes to mind are video games that were shipped on ROM cartridges. Bugs or not, when shipped, it's done.


Those days are long gone. Even in a deeply embedded world like automotive or AV equipment, code is (or can be) frequently updated.


In the very deeply embedded world, there are often no update mechanisms, period.


Possibly pacemaker firmware? I have no real knowledge of the subject, but I'm guessing it'd be kind of risky to equip a pacemaker with Bluetooth to accept over-the-air firmware updates.


You'd probably be right about the risk. But a buddy of mine has a pacemaker, and the version he has now allows wireless updates and tuning. It's possible that a firmware update still requires direct access, but it's possible it's wireless.

Of course, you'd better be darn sure the update works (firmware updates always scare me a little, never mind on a pacemaker!)


There were headlines in 2012 about pacemaker vulnerabilities. Apparently, they could be commanded to deliver an 830 volt shock, probably fatal, by someone with a laptop nearby.


I've long argued against people who say TDD is 'just as quick' when you get used to it. It's simply not true. I still test everything, however; the payoffs are manifold. It's important to accustom your product managers to your velocity when tests are included.


TDD, or at least effective use of automated tests, can make implementation much faster and code better.

Mappings + conversions are such a case.

For example, if you're using an ORM, being able to test the mappings in seconds will increase velocity. Or if you've written a function to convert type T0 to type T1, then automatically testing all combinations of T is very effective.
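As a rough illustration of the "test all combinations" idea, here is a small Python sketch. The length-unit table and the `convert` function are made up for the example; the point is only that `itertools.product` lets one loop cover every ordered (source, destination) pair with a round-trip check.

```python
import itertools
import math

# Hypothetical conversion table: factor that converts each unit to metres.
TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}


def convert(value, src, dst):
    """Convert a length from unit src to unit dst, going via metres."""
    return value * TO_METRES[src] / TO_METRES[dst]


def check_all_round_trips(samples=(0.0, 1.0, 2.5, 1234.5)):
    """Round-trip every ordered (src, dst) pair; return the number of checks run."""
    checked = 0
    for src, dst in itertools.product(TO_METRES, repeat=2):
        for value in samples:
            back = convert(convert(value, src, dst), dst, src)
            # Compare with a tolerance because these are floating-point values.
            assert math.isclose(back, value, rel_tol=1e-9, abs_tol=1e-12), (src, dst, value)
            checked += 1
    return checked


checked = check_all_round_trips()
print(f"{checked} round-trip checks passed")
```

Adding a fifth unit to the table automatically adds its combinations to the run, which is where the speed-up over hand-written cases comes from.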


Absolutely agree that it can. For cases like you mention, where it's convenient to be able to rapidly test the outcome of a piece of code, it can be orders of magnitude faster and you have a greater confidence in the end result.

On average though, it takes longer (as summarised in the article).


This is why I don't trust TDD proponents. I'm not against TDD itself, or unit testing as a whole, but I'm also not a 20 y/o college kid. There is absolutely no way the increase in workload of TDD doesn't have a cost.

It's a question of tradeoffs, something I've said many many times, and the research appears to be bearing that out.

No one who is being honest can really say with a straight face that development with TDD is actively faster unless they're naive or they're using cherry picked examples.


It's faster in terms of development + time spent fixing bugs.

If you want it out now but with loads of bugs, sure. But I'm pretty sure most managers don't mean that when they say they want it out sooner.


Not even the research bears that out.


> There is absolutely no way the increase in workload of TDD doesn't have a cost.

TDD has both costs and benefits. Like most everything else.


Lots of discussion here about TDD and test coverage... but the most significant factor of "code quality" found was "organizational metrics":

> Organizational metrics, which are not related to the code, can predict software failure-proneness with a precision and recall of 85 percent. This is a significantly higher precision than traditional metrics such as churn, complexity, or coverage that have been used until now to predict failure-proneness. This was probably the most surprising outcome of all the studies.

Which I take to confirm my belief that the freedom and responsibility given to developers, and how good they are, is more significant than the specific testing policy enforced :)


More freedom means more freedom to test correctly, as opposed to constantly being pushed to release as fast as possible.


It's a hell of a lot more complex than that.

A developer with a short deadline is going to design things differently than one with a longer deadline. That superior design can often result in better long term maintenance.

There's also the idea of requirements gathering. Some people take it for granted, but if your developers aren't in a position of power in your company, they can't insist on good requirements for software. I've experienced this firsthand.

I once worked for a company in which the field engineers brought in a large chunk of the money for the company. They were the "big dick swingers" in the company, if you will, while the software development team was an attempt by the company to automate a lot of the work done by engineers and then sell services with the gathered data.

This presented a conflict of interest in that the software would have actively made the company need fewer engineers, as they could do their job faster, but because of the two groups' relative status within the company, there was no way for the software team to insist on accurate specifications.

The day I walked out of that company I had been asked to rewrite a module of the software for the 3rd time and requested written documentation. I was on the phone with both my manager and the engineer we were supposed to be coordinating with. The engineer flat out refused and told me I didn't need it. I asked him how many years of software dev experience he had.

That's one anecdote, but the point is, politics in a company severely affect the performance of developers and not simply because of testing. I honestly think that's an awfully naive view of the world.


I agree with everything you just said.

Freedom gives freedom to adjust anything as required, not just testing.

Having freedom means being able to ask for longer deadlines to do things properly, regardless of politics for example.


One way of looking at the coverage issue is that high coverage isn't a very strong positive signal, but low or absent coverage is a strong negative signal.

It's also interesting to combine metrics like cyclomatic complexity, or relative churn rate, with coverage. Files/modules with high complexity/churn and low coverage are much scarier than those with low complexity and low coverage.
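A rough sketch of what combining those metrics could look like, in Python. Everything here is hypothetical: the file names, the numbers, and especially the `risk_score` formula, which is just one simple way to make "complex, frequently changed, and poorly covered" float to the top.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    path: str
    complexity: int   # e.g. cyclomatic complexity
    churn: int        # e.g. number of recent commits touching the file
    coverage: float   # fraction of lines covered, 0.0 to 1.0


def risk_score(m):
    """Hypothetical score: grows with complexity and churn, shrinks with coverage."""
    return m.complexity * (1 + m.churn) * (1.0 - m.coverage)


# Made-up numbers, purely for illustration.
files = [
    FileMetrics("billing/state_machine.py", complexity=45, churn=12, coverage=0.10),
    FileMetrics("util/strings.py", complexity=3, churn=1, coverage=0.05),
    FileMetrics("api/handlers.py", complexity=30, churn=8, coverage=0.90),
]

# Scariest files first.
ranked = sorted(files, key=risk_score, reverse=True)
for m in ranked:
    print(f"{risk_score(m):8.1f}  {m.path}")
```

Note that `util/strings.py` has the lowest coverage of the three but lands at the bottom, because it is simple and rarely changes.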


Empirically, I have not seen this to be the case.

Test coverage has had little correlation to code quality on the projects I've worked on.

I would agree that well-written tests of the most complex parts of code help contribute towards a quality system. Most of the other tests -- sometimes more so, and sometimes less so, depending on other aspects of the project.


I was trying hard not to make a strong assertion, so I'm not sure what you are disagreeing with entirely...

TL;DR - it's a matter of context

But here's an anecdote where two complex pieces of code interacted...

A few years ago I was working on a fairly big project (~50 developers) and was responsible for much of the architecture, and on an implementation level, implementing some of the lines that connected boxes in said architecture. One of those was a line that crossed what can loosely be thought of as a system/application boundary. This involved both implementing the application hosting layer and writing the marshalling code that moved data in either direction. There was a bunch of complexity because of not quite convergent type systems, some threading issues, etc...

So all this is done, not too many tests have been written, but hey, it works, if it didn't nothing in the system would work... a year or so into the project, the most senior technical person on the team has a flaky test that once in a while just completely fails to make progress, he investigates, and comes to the conclusion that once in a blue moon, the transport layer (my code) is dropping a message.

Needless to say I'm totally flummoxed, there is no code path that can drop messages without also creating a huge stink in logs etc. I spend two months investigating this, and eventually I have a consistent repro: a fairly big hole in the other guy's state machine that causes it to just stop making forward progress.

So despite the fact that a test exposed the bug, neither piece of complex code really had proper tests, but I would assert that the "line" didn't need the same level of tests, it was being exercised constantly by dozens of "boxes", but the unicorn "box" with the logic error could sure as hell have used some unit tests to validate its state transitions.


One thing I have never understood is how to apply TDD when the code that you are writing is nearly 100% dependent on external services and "black-box" library code. For instance, I rely on several more or less undocumented Microsoft APIs in my day job. How do I even know what correct outputs to code interacting with those components would be? It could depend on server policies, registry settings, firewall settings, SQL server settings, Active directory settings, and a hundred other things. My application code is bog-simple next to all of the code handling possible FUBAR states of all of this environment.


In my opinion [1], TDD is best suited for algorithmic logic, but doesn't really give any benefits for coordinating code. This is explained well in "Selective Unit Testing – Costs and Benefits" http://blog.stevensanderson.com/2009/11/04/selective-unit-te...

[1] http://henrikwarne.com/2014/02/19/5-unit-testing-mistakes/


I've read the opposite argued by proponents of TDD (either here or on StackOverflow, I forget): that TDD is definitely not for deriving algorithmic logic, but for run-of-the-mill CRUDs.

For me the definite example that TDD for algorithmic design/exploration is not very useful is the infamous Sudoku Debacle, but I've also never seen anything more than toy examples (e.g. fibonacci) of algorithmic exploration using TDD. Is there any real-world example of complex algorithmic design using TDD where the author doesn't cheat by using hidden domain knowledge to make "magical" leaps of inference in some of the TDD steps?

PS: just read your blog post, but you seem to be talking about unit testing with JUnit & mocking frameworks. Just to be sure we're on the same page: we do agree that TDD and Unit Testing are not the same and do not even share the same goal, right?


So.... it's basically useless for 90% of enterprise development?


You can either have your tests call the external service or you can mock out the service. You could also replicate the service for the purposes of testing; for example, it's pretty standard to have both a test and a production database.


Your issue is that you're not mocking correctly. You need to write mocks for services you depend on, ideally by recording and replaying real api interactions, such that you can verify that given the libraries you depend on your tests will be correct. Then write TDD tests, then write code, then write integ tests which will verify your original mocks work as assumed.


When your software is tightly coupled to some complicated external component, the most likely cause of bugs is bad assumptions about how that component behaves. The integration tests are therefore the most important, since those are what will expose the bad assumptions.

I wrote a program that used the computer's MAC address. The API that reports the MAC strips leading 0s, but my MAC had no leading 0s and so I did not notice this behavior during development. If I had just captured and replayed this result, my tests would have passed on all machines, even affected ones. But because my tests exercised the real API, I encountered the bug and was able to work around it.
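To make that concrete, here is a hedged Python sketch of the kind of workaround involved. I am assuming the stripping happens per octet and that the address is colon-separated; the comment above doesn't say, and the sample addresses are invented. `normalize_mac` is a hypothetical helper, not the actual fix.

```python
def normalize_mac(raw):
    """Pad each colon-separated octet back to two hex digits (assumed format)."""
    octets = raw.strip().lower().split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {raw!r}")
    return ":".join(octet.zfill(2) for octet in octets)


# A value as it might be recorded on the development machine: no octet has a
# leading zero, so a replayed test passes with or without normalization.
recorded_on_dev_machine = "a4:5e:60:c1:d2:e3"

# What the real API might return on an affected machine (assumed, for illustration).
seen_on_affected_machine = "0:1a:2b:3:4:5"

print(normalize_mac(recorded_on_dev_machine))
print(normalize_mac(seen_on_affected_machine))
```

A replay-only test built from the first value can never fail on the missing padding, which is the whole problem: the recording bakes in the one machine where the quirk is invisible.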


Another approach (which is not really so different) is to build adapters around external services and then write your code to use the adapters. The upside is that you can avoid writing integration tests which might be very difficult to set up (depending on the situation). The downside is that you might implement more in the adapter than you need.

In practice, I tend to do a mixture. I will mock/fake the external services and then build adapters and then finally remove the mocks/fakes. In the final part you may have to stub the adapters. The unit tests in the adapters will alert me to changes in the external services, so this is relatively safe to do. Your main sources of error will be subtle bugs where you have insufficient coverage in the adapters. You must also use restraint in working on the adapters independent of the code that uses it, because you have no end to end tests.

Building fakes instead of mocks in this situation is also a good way to really test your understanding of the external services. It takes more time, but it can pay dividends. Many times I will write a fake and then think, "Does it really work that way?" only to realize that I can't actually use the external service in the way I envisioned. With a mock, you completely divorce yourself from implementation so it is relatively easy to assume that the external service can do impossible things. You can go a long way before you realize that your code is unusable.
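Here is a small Python sketch of that difference. The queue service, its size limit, and `publish_report` are all hypothetical; the limit stands in for whatever real constraint you learn about the service while writing the fake.

```python
from unittest.mock import Mock


class FakeQueue:
    """In-memory fake of a hypothetical message-queue service with a size limit."""

    MAX_BYTES = 256  # assumed limit, for illustration only

    def __init__(self):
        self.messages = []

    def send(self, body):
        if len(body) > self.MAX_BYTES:
            raise ValueError("message too large")
        self.messages.append(body)

    def receive(self):
        return self.messages.pop(0) if self.messages else None


def publish_report(queue, report):
    """Code under test: sends the whole report as a single message."""
    queue.send(report)


big_report = b"x" * 1000

# With a bare mock the call "works": a Mock accepts any arguments.
mock_queue = Mock()
publish_report(mock_queue, big_report)
mock_queue.send.assert_called_once_with(big_report)

# With the fake, the same call exposes the bad assumption.
try:
    publish_report(FakeQueue(), big_report)
    fake_rejected = False
except ValueError:
    fake_rejected = True
print("fake rejected oversized message:", fake_rejected)
```

The mock-based check passes even though the design cannot work against the real service; the fake forces the "does it really work that way?" question early.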

One thing I will caution -- mocks, fakes, stubs and integration tests tend to attract people with almost religious views in how they should be used. I tend to believe that this stuff is hard and that current industry practice is very naive. I think there is a lot of room for experimentation with a very big upside. But you are likely to attract a lot of criticism if you wander outside someone's boundaries, so it is best to spend significant amounts of time pairing to make sure that everybody is comfortable with the approaches you use. Nothing is worse than "test wars" where people are going in 100 different directions in the tests.


This is the approach we take: Adapters+mocks for external services, but we're okay with hard dependencies (via injection if possible) on non-network libraries. This includes things like json serializers, other serializers, automappers, math libraries, etc.


If you're doing extensive mocking, then you should be doing integration tests instead.


Here's an answer I posted on SO that might be helpful

http://stackoverflow.com/questions/2611280/how-can-i-effecti...


What you're saying is that it's not clear to you what your code needs to do.

That's a fundamental problem that neither TDD nor any other methodology can really solve.


It's clear to me what my code needs to do, however there are a myriad of environmental factors that can fuck up what my simple code needs to do. For instance, a common thing that I need to do is to look up one Active Directory property based on another. Depending on the environment, I may have access to that property using the logged-in user's credentials, or I may need to use a special service account set of credentials in my application, defined in a database or an app/web.config.

At some point, an over-zealous administrator can disable the rights for the account that my application is using inside a client organization. Then I end up on the phone for an hour or more, figuring out why an account that used to work no longer does, and getting my clients to raise a change request with the admins that broke our application in the first place to put things back.


These two comments (below) seem to be in opposition to one another.

It's clear to me what my code needs to do (Above)

How do I even know what correct outputs to code interacting with those components would be? (From your original comment)

If you know all the possible environmental factors that you need to deal with, then you can write up front tests for them (whether that is economical or not, is a different question).

However, if the complete set of environmental factors is unknown (though potentially discoverable) then you have a requirements problem, not a code problem. TDD can't solve that problem (although it might help make it obvious that the problem exists).

TDD is based on the assumption that it is possible to know when your code "works", and requires you to define those criteria up front.

But any development process you follow must have some way of answering the "am I done yet?" question, or you'd never ship anything.

Perhaps your concern is that TDD doesn't help solve the "I don't have a complete set of requirements" problem. If so, then that's true, but it never claims to. If you don't have the requirements yet, then TDD says you're not ready to write code, and some other process has to be followed in order to produce requirements.

It is possible to do TDD if you only have some (but not all) requirements - but you can only write code for the requirements you do have.


>It's clear to me what my code needs to do, however there are a myriad of environmental factors that can fuck up what my simple code needs to do. For instance, a common thing that I need to do is to look up one Active Directory property based on another. Depending on the environment, I may have access to that property using the logged-in user's credentials, or I may need to use a special service account set of credentials in my application, defined in a database or an app/web.config.

Some stuff you test in unit tests, other you do functional testing for. What you describe is probably higher level than unit tests.

>At some point, an over-zealous administrator can disable the rights for the account that my application is using inside a client organization. Then I end up on the phone for an hour or more, figuring out why an account that used to work no longer does, and getting my clients to raise a change request with the admins that broke our application in the first place to put things back.

So? That's orthogonal to your app. You test that your app works if the rights are working OK, and you test what it shows to the user if the rights are not OK.

Other than that, obviously you cannot ensure it works without things it needs to have, nor do you need to test each and every externally dependent failure mode (e.g. user wiped out his hard drive).


If you actually know all these myriad environmental factors, writing tests for them is no harder than for anything else.

It could be a lot of work, of course, but that just reflects that you're building a complex system.

Test 1: Look up AD prop with user credentials

Test 2: Look up AD prop when user credentials don't work

You'd almost certainly want to mock out the environment. Don't use the actual services.
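A minimal sketch of those two tests in Python with `unittest.mock`. The `directory.get(...)` interface, the `AccessDenied` exception, and the fallback to a service account are assumptions made up for the example, not a real Active Directory API.

```python
from unittest.mock import Mock


class AccessDenied(Exception):
    """Hypothetical error raised when the current credentials lack rights."""


def lookup_property(directory, user, prop, service_account=None):
    """Try the logged-in user's credentials first, then the service account."""
    try:
        return directory.get(user, prop, credentials="user")
    except AccessDenied:
        if service_account is None:
            raise
        return directory.get(user, prop, credentials=service_account)


def test_lookup_with_user_credentials():
    directory = Mock()
    directory.get.return_value = "alice@example.com"
    assert lookup_property(directory, "alice", "mail") == "alice@example.com"
    directory.get.assert_called_once_with("alice", "mail", credentials="user")


def test_lookup_falls_back_when_user_credentials_fail():
    directory = Mock()
    # First call raises AccessDenied, second call returns a value.
    directory.get.side_effect = [AccessDenied(), "alice@example.com"]
    result = lookup_property(directory, "alice", "mail", service_account="svc-app")
    assert result == "alice@example.com"
    assert directory.get.call_count == 2


test_lookup_with_user_credentials()
test_lookup_falls_back_when_user_credentials_fail()
print("both tests passed")
```

These only verify the application's own branching. Whether the real directory behaves the way the mock says it does is a separate question that mocks cannot answer.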


That reply reads disparaging, perhaps implying that the author needs to spend more time designing. But that's not necessarily true! Have you ever changed your code to work around a bug in some version of a web browser, or compiler? If so, you discovered a limitation of TDD as it relates to components outside your control.

Saying that "no methodology can solve this" is defeatist: there are lots of things you can do. One technique is to write your tests to use the actual component, instead of mocks. For example, I never mock the filesystem: I always design tests to run using the real thing. This reduces the chances that I've encoded some bogus assumption into my tests, and makes it more likely that I'll encounter the individual quirks of the filesystem.
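As a sketch of what testing against the real filesystem can look like in Python: the test below runs in a throwaway directory from `tempfile`, so nothing is mocked and nothing is left behind. `save_atomically` is a hypothetical function under test, chosen because write-then-rename is exactly the kind of behavior whose quirks a mock would hide.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def test_save_overwrites_existing_file():
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "settings.bin")
        save_atomically(target, b"first")
        save_atomically(target, b"second")
        with open(target, "rb") as f:
            assert f.read() == b"second"
        # No stray temp files should be left behind.
        assert os.listdir(workdir) == ["settings.bin"]


test_save_overwrites_existing_file()
print("filesystem test passed")
```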


I just meant it sounded like the requirements aren't fully known.

That's hardly OP's fault, and neither more design nor more TDD can solve it.


Agreed. Much of the time the users can't accurately describe what they want. It's easier to build what you think they want quickly and show it to them, then get them to say what they want changed. (Though a problem with this approach is feature creep.)


You know, I understand and agree that code coverage isn't the be-all and end-all of testing. But... we do all secretly agree that if it isn't pretty high, then the testing isn't doing much, right?

Like, does anyone feel really good about saying there's good test coverage of a class with 50% code coverage? 60%?

After I hit 80-90% coverage, I start wanting to know about other attributes of the tests. Are they really really really testing that all the outputs are exactly what we expect? Are they controlling dependencies in a useful, realistic way? Are they exploring a wide range of inputs?

Before that, though, all of the above seem a bit premature. I don't deeply care if you've done a really thorough job of testing 20% of the code if the other 80% isn't even run.


Like most things, the answer is... it depends.

We have several with 0% and I'm just fine with that. They're exceedingly simple. And/or they're 15 years old and only change in very slight ways every few years.

There's other classes that are 100% that I'm not fine with. Because while the code is short, the implications are complex.

I feel like blindly talking about these metrics is like saying "hey, I bought an earring for $100, did I get ripped off?"


I think that you're broadening the question. Sure, sometimes you feel okay with code that's not at all tested. But surely you don't feel that code that is not at all tested is well tested.

I'm talking about to what extent code coverage can and can not be used as a proxy for "well tested," not to what extent "well tested" can and can not be used as a proxy for "I feel good about my code."


> But... we do all secretly agree that if it isn't pretty high, then the testing isn't doing much, right?

No, sorry, not even close.

To me, measuring the value of a test suite by code coverage is like measuring the productivity of developers by lines of code. It's just not what actually matters.

For example, it's well known that bugs in code tend to cluster. Given a large system where some parts are inherently complex and therefore error-prone but most of the code is routine boilerplate stuff, I would much prefer to see test effort concentrated on the challenging areas.


If you write 100% code coverage, you will probably have a lot more lines of code.


I would agree generally (because most code has exposure to the consumer even when it shouldn't), but, in the specific instance that 1. the code is effectively formed into a library/component/service that you only interact with through a well-defined public API, and 2. the tests test every exposed function of the API, then you don't need to test anything other than that API—100% code coverage (covering code-paths the API can't reach) is meaningless. Any code that the tests don't test in such a scenario is actually code that the library doesn't use; it's dead weight, and should be culled (and actually, given a whole-program optimization pass, it would be.)


>Like, does anyone feel really good about saying there's good test coverage of a class with 50% code coverage? 60%?

Depends. 20% might be enough too. You can just test the 20% harder and more failure dependent paths in your app, not the 80% trivial BS code.

Remember the 80/20 rule?


I do remember it! I feel like having bugs in my code that is only run 20% of the time is not an acceptable amount of bugs.

Like, seriously guys, I'm sure we can all come up with some kind of degenerate case where 80% of the code is in exception-handling or some other really, really, really rarely called piece of code, and feel okay with not testing that. But that's hardly the ordinary scenario.

And, for example, in the messaging code I've recently been writing, it certainly is the case that some of the code is exercised a lot more than others. For example, reads are done an order of magnitude more than writes, and writes are done an order of magnitude more than deletes.

But that doesn't make me feel okay about bugs in my delete code. It's a whole feature. It needs to work.


>I do remember it! I feel like having bugs in my code that is only run 20% of the time is not an acceptable amount of bugs.

Then maybe you don't understand the notion of "opportunity cost".

Perhaps you feel "0 is the only acceptable amount of bugs", but

a) nobody cares for an imaginary bug-free product unless it ships. And shipping, for programs intended for the market, means cutting corners and, yes, shipping bugs too.

b) programs that have 0 bugs are so few as to be statistical noise, including among TDD/full coverage programs.

TDD will basically help you check that the behavior you expect is happening and help you track breakage when you need to refactor. It won't eliminate all bugs.

>But that doesn't make me feel okay about bugs in my delete code. It's a whole feature. It needs to work.

Perhaps you conflated "untested" with "not working". People created successful software for ages, software that changed the world, with no or less than 50% "test coverage".


As others have said, code coverage is an extremely misleading statistic. Most tools measure code coverage by the number of lines of code that were executed in a test. I can easily write "tests" with no expectations that hit every line of code. It is merely showing that in the cases the programmer thought of, it doesn't crash.

Consider this as well: many interpreted languages count the definition of a method/function as a "covered line of code". In other words, if a method/class is defined then you have at least one (maybe several) lines of code "covered". Add to that things like declaring variables and you can easily get to a point where (in good code with very short methods) you have 60% coverage if the method is simply defined and run. If you are writing really good OO code that has very few conditionals, then you may get very, very close to 100% test coverage simply by running each method.

Also, tests are code too. You are just as likely to write a bug in your test as you are in your production code. In fact, I've noticed that in some systems, I actually end up writing more lines of test code than production code. I might actually be more likely to write a bug in the tests. Especially for bugs that are due to problems with requirements, you are very, very likely to put the same bug in the tests as you are the production code.

This is why I don't like calling these things "tests". IMHO the very best "tests" document assumptions that the original programmer had. If you break the assumption, then the test should break. It may or may not be a bug, but it is a big hint that you should be thinking very carefully about what you are doing.

Code coverage is a good statistic in the reverse. It doesn't tell you much if you have good code coverage. On the other hand, it can lead you to find places where your tests are unlikely to be sufficient. Depending on the language/environment you use, if you have less than 80% code coverage in an area, it might indicate that nothing of value is being tested. It is a good place to start looking to check.

Note that setting a code coverage target will defeat this strategy, so I recommend never, ever doing it. As I said, getting nearly 100% code coverage without actually testing anything is relatively trivial if the code is in any way decent (and if it isn't, then you have bigger fish to fry). Setting a threshold above the place where poor testing is obvious just means that you no longer have a means of finding places that are poorly tested.


I think that you think you're arguing with me, but are actually agreeing with me.

Like, I agree with every claim about fact that you're making. Of course you can have high code coverage without having good tests. If all you do is, like, touch lines of code, all you're really testing is that they don't raise an exception. That's why I said that once you hit 80% code coverage or so, I start to want to know not "what your code coverage is," but "how good are your tests?"

But if you don't even run the majority of your code, then your tests are kind of by definition... not testing your code well.


I thought programs spent 80% of their time in 20% of the code?

TDD "code coverage" always seemed like a misguided breadth-first approach that needed to be depth-first primarily and breadth-first as needed.

But I think the latter approach then leads to integration testing as the primary approach, so TDD zealots will hate that.


Unit testing is a leading indicator. As a broad rule, the higher the coverage, the better the quality of the product. Engineers hate unit testing to 100% until they do it, and then they don't go back; it becomes the norm for their development process. The confidence and assurance you get from that level of coverage is amazing. It takes the stress out of software.


A more recent paper that's related to this, "Coverage Is Not Strongly Correlated with Test Suite Effectiveness": http://www.linozemtseva.com/research/2014/icse/coverage/cove...

I wouldn't want people to conclude that coverage was a useless metric based on this, but it does seem to support the idea that chasing code coverage dogmatically is not necessarily the best use of your time.


I agree -- IMO being dogmatic about 100% or any other number can be a big waste of time.

But a coverage ratio and/or browsing annotated source is a good feedback mechanism for how many tests you've written.


I'm not sure I fully understood the moral of the article but it seems to me that all of these findings are basically what "people knew for a long time". Okay it's good to put data on "beliefs" but is it really useful when quantifying "promoted usage" and "obligation" has failed?

The article does prove that some people tried to impose metrics on coding (via management or structure in large organizations) as a way to rationalize development quality, and failed miserably, but it clearly lacks the counterpart: what happens when you don't impose these metrics? Experience is important, but that's not new, is it? In the end it just feels like: small programs won't have many problems, and neither will small organizations, so big organizations should function like small companies. Not much of an answer.

Also, I'm not sure that a manager saying something like "according to data, we need one more month to make the program 35% more stable" is a good argument (or even a new one). What would be interesting is to try something like "ship the program one month earlier and dedicate the two remaining months to fixing what's broken according to feedback", then compare with the first approach.


"higher code coverage was not the best measure of post-release failures in the field"

Perhaps they should measure the coverage by taking the number of tested paths through the software divided by the total number of paths.
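
A rough Python sketch of what that ratio would look like, assuming a hypothetical function with three independent conditionals (the function and its numbers are invented for illustration). One call is enough to execute every line, but it covers only one of the eight paths.

```python
from itertools import product


def shipping_cost(express, fragile, international):
    """Hypothetical function with three independent branches."""
    cost = 5
    if express:
        cost += 10
    if fragile:
        cost += 3
    if international:
        cost += 20
    return cost


# Each condition is either true or false, so there are 2**3 paths.
all_paths = set(product([False, True], repeat=3))

# A single call with every flag set executes every line of the function...
tested_inputs = [(True, True, True)]
for flags in tested_inputs:
    shipping_cost(*flags)

# ...but it exercises only one of the eight possible paths.
path_coverage = len(set(tested_inputs)) / len(all_paths)
```

The catch is that the number of paths doubles with every independent conditional, and loops make it unbounded, so the denominator gets out of hand quickly in real code.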


The mentioned technical paper about assertions states:

"The assertions are primarily of two types of specification: specification of function interfaces (e.g. consistency between arguments, dependency of return value on arguments, effect of global state etc.) and specification of function bodies (e.g. consistency of default branch in switch statement, consistency between related data etc.)."

from http://research.microsoft.com/pubs/70290/tr-2006-54.pdf

Is there somebody here who could elaborate on this, ideally with some small examples?


You want to have examples for assertions?

Consistency between arguments: memcpy "The memory areas must not overlap."

Dependency of return value on arguments: max returns one of its arguments. r = max(x,y); assert (r == x || r == y);

Effect on global state: printf might change the global errno variable.
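
Spelled out as a small Python sketch (the function names are made up for illustration, and this is only one reading of the paper's categories): the first two functions carry interface assertions, the third has a body assertion on its default branch.

```python
def copy_within(buf, src, dst, n):
    """Copy n items inside buf from index src to index dst (memcpy-like)."""
    # Interface assertion: both ranges must lie inside the buffer.
    assert 0 <= src and 0 <= dst and max(src, dst) + n <= len(buf)
    # Interface assertion: consistency between arguments (no overlap).
    assert src + n <= dst or dst + n <= src, "source and destination overlap"
    buf[dst:dst + n] = buf[src:src + n]


def checked_max(x, y):
    r = x if x >= y else y
    # Interface assertion: the return value depends on the arguments.
    assert r == x or r == y
    assert r >= x and r >= y
    return r


def describe(kind):
    # Body assertion: the default branch is supposed to be unreachable.
    if kind == "circle":
        return "round"
    elif kind == "square":
        return "four sides"
    else:
        assert False, f"unexpected kind: {kind!r}"
```

Calling `copy_within(buf, 0, 1, 2)` or `describe("triangle")` trips an assertion instead of silently doing the wrong thing.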


So this means embedding small tests into the production code itself while accepting a minor performance loss? And you somehow get a feeling for when this will help through experience? I wonder whether this impacts all programming languages the same.


It is not necessarily about runtime tests. Often assertions are only used for debug builds. Java requires an extra flag to enable assert statements. Alternatively, assertions could be invariants or pre/post conditions for your formal proofs of correctness.

I'm too lazy to check whether the actual paper clarifies what they mean by it. ;)


"Code coverage measures how comprehensively a piece of code has been tested; if a program contains 100 lines of code and the quality-assurance process tests 95 lines, the effective code coverage is 95 percent."

I've never heard of this before. Where does this come from and why did they suspect this would be indicative of anything?


The idea is that shipping untested code is worse than shipping (unit) tested code. Code coverage reporting is quite common and I'm surprised you've never even heard of it if you've ever written software professionally.


It's popular among the hobbyist / activist crowd.

People who like to talk about the "proper" way to develop software tend to like it.

People who run open source projects tend to either like it or bow to peer pressure.

But people working on in-house and enterprisey stuff? Yeah, not so much there.


> But people working on in-house and enterprisey stuff? Yeah, not so much there.

This depends a lot on your corporate development culture, even for in-house enterprisey software.


> But people working on in-house and enterprisey stuff? Yeah, not so much there.

I suspect you are working from a limited sample size there.


"But people working on in-house and enterprisey stuff? Yeah, not so much there."

That is absolutely incorrect, in my experience.


> Where does this come from

It's defining the term "code coverage" to be the ratio of lines covered by tests to the total lines.

> why did they suspect this would be indicative of anything?

If anything, it's a useful feedback mechanism. e.g. "Gee, we wrote sixteen unit tests for this module but we're still only at 34% coverage. Ohh, I see it now -- framistan.cpp doesn't get used by these tests at all. We should write a test that enables framistan mode!"


Take a look at tools like Istanbul for Node.js and Simplecov for Ruby on Rails. https://coveralls.io/ has a nice list of code coverage tools for many languages.

If you don't have tests, it is an extremely manual process to change anything about your codebase (including dependencies) and have any confidence that you haven't broken things. Once you write tests, code coverage lets you figure out what code is not being tested.


It is a fairly common practice in companies that do not have technical managers (http://en.wikipedia.org/wiki/Code_coverage). Basically it is useful to measure the coverage of existing conditions (and ensure stability) but clearly lacks the ability to test the conditions you forgot to write. Kind of defeats half of the purpose.


Companies also track and report it when required by external entities such as the FAA for safety critical systems.

I think it goes deeper than "the conditions you forgot to write" although that's part of it. A developer's test suite tends to make assumptions and these are the same assumptions that exist in the production code. If your assumptions about usage, or input, or edge cases are poor then the tests you write will be poor too and in that case 100% coverage isn't worth much. Big reason I'm a fan of things like QuickCheck in Haskell.
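
A hand-rolled Python sketch of the idea, using only the standard library rather than QuickCheck itself (the `dedupe` function and its hidden assumption are invented for illustration). The example-based check shares the code's assumption that duplicates are adjacent, so it passes. Random inputs don't share it.

```python
import random


def dedupe(items):
    """Remove duplicates, keeping first occurrences.

    Hidden assumption: duplicates are adjacent (i.e. the input is sorted).
    """
    out = []
    for item in items:
        if not out or out[-1] != item:
            out.append(item)
    return out


def example_based_check():
    # Written with the same assumption as the code, so it passes.
    return dedupe([1, 1, 2, 3, 3]) == [1, 2, 3]


def find_counterexample(trials=500, seed=0):
    """Try random inputs; return one that violates the property, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 3) for _ in range(rng.randint(0, 8))]
        result = dedupe(items)
        # Property: the output never contains the same value twice.
        if len(result) != len(set(result)):
            return items
    return None
```

`example_based_check()` returns true, while `find_counterexample()` comes back with an unsorted list such as one shaped like `[a, b, a]`, which is exactly the input the author never thought about.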


A quick Google suggests the FAA uses some version of decision coverage (i.e. for each branch in your code, what percentage were taken during testing), not simple line coverage.
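
The difference shows up even in a tiny Python sketch. This is manual instrumentation for illustration only, with a made-up function, and it is not a claim about how any certification standard defines the metric. One call runs every line, but the `if` has only ever gone one way.

```python
def clamp_nonnegative(x, outcomes):
    """Return x, or 0 if x is negative; record which way the decision went."""
    result = x
    is_negative = x < 0
    outcomes.add(is_negative)  # manual instrumentation, for illustration only
    if is_negative:
        result = 0
    return result


outcomes = set()
clamp_nonnegative(-5, outcomes)

# Every line above has now run, so line coverage is complete.
# The decision has only ever been True, so decision coverage is 1 of 2.
decision_coverage = len(outcomes) / 2
```

A second call with a non-negative argument adds no new lines but brings the decision coverage up to 2 of 2.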


Checking your coverage for tests is a pretty old concept.

Early 1960s or so!


there needs to be a culture of using assertions in order to produce the desired results.

IMO, proper "culture" is a significant part of almost all aspects of project management, and without it teams will never reach their full potential unless some highly motivated dev takes it upon their shoulders to implement these base tenets for everyone else.

I'm not saying it's always easy to implement or that it guarantees success, but the lack of it always rears its ugly head at some point in the development process.


Data helps. But you still have to know how to apply it.


The Mythical Man Month is damn correct.



