Lines of Sight




  1. When will the system-platform-service be accepted? Really? How do you know?
  2. How many more test cycles should we plan for? No... I mean exactly?
  3. Are the client's needs satisfied? Prove it.

System Stabilization Metrics

Early in my career, while working as project manager on a multi-month project, a Marketing Director would periodically stop by to ask how long it would be before her new product was ready for release. I would say "about three or four weeks," which seemed reasonable since we had been testing for two weeks already and things were going well. (I had learned that estimating in date ranges is always better than giving an exact date.) I was pretty happy with "three or four weeks" rather than "June 13." Then the Director asked how I knew it wouldn't be two weeks, or five. I didn't have a good answer, and I didn't want to admit that I was running on gut feel.

Tracking defect inflow and outflow is one of the best ways to verify intuition when forecasting releases. Defect inflow is the rate at which new bugs are reported against the system. Plotted over time, defect inflow looks like an inverted V, because testing is slow going at first. Common causes include:
  1. The system under test contains blocking defects. (A blocking defect prevents deeper characterization of the system under test beyond the point of the defect.)
  2. Setup and testing ran past the end of the last cycle, pushing test-design activities into the current cycle and impeding characterization.
  3. Testers are unfamiliar with the system under test, due to evolving designs or gaps in engagement with development discussions.
Defect outflow represents the number of defects fixed each week. Typically this starts out slow (it cannot exceed the rate of inflow until a backlog of defects builds), then grows and levels off at a steady rate. Just prior to release, there is usually a push to fix only absolutely critical defects; at that point, outflow drops dramatically. The metrics in the Figure are representative of most successful projects.

On larger projects, it's a good idea to plot defect inflow and outflow on a monthly basis; daily and even weekly ranges vary too much. I have found it best to limit defect charts to eight periods of data. If you are following an agile process such as XP or Scrum, a project can change drastically in that time, so older data is of minimal value. Eight months is also long enough to see most trends.

As you monitor defect inflow and outflow, consider all factors that may influence your measurements. You don't want to pronounce the system nearly ready to go live based only on declining inflows. Inflows decline for a number of reasons: for example, testers may be reworking a significant number of tests rather than characterizing functionality.

Since not all defects are equal, you may want to try weighted inflow and outflow tracking. Use a five-tier defect categorization and weighting such as:
  1. Critical: 20
  2. High: 10
  3. Medium: 7
  4. Low: 3
  5. Very Low: 1
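As a minimal sketch of weighted tracking, assuming defect records are simply (period, severity) pairs pulled from whatever tracker you use (the record shape and function name here are illustrative, not from any particular tool):

```python
from collections import defaultdict

# Illustrative five-tier weights, matching the scheme above.
WEIGHTS = {"Critical": 20, "High": 10, "Medium": 7, "Low": 3, "Very Low": 1}

def weighted_counts(defects):
    """Aggregate raw and weighted defect counts per reporting period.

    `defects` is an iterable of (period, severity) pairs. Returns two
    dicts keyed by period: raw count and severity-weighted count.
    The same function works for inflow (reported) or outflow (fixed)
    records, depending on what you feed it.
    """
    raw = defaultdict(int)
    weighted = defaultdict(int)
    for period, severity in defects:
        raw[period] += 1
        weighted[period] += WEIGHTS[severity]
    return dict(raw), dict(weighted)

inflow = [(1, "Critical"), (1, "Low"), (2, "Medium"), (2, "Low")]
raw, weighted = weighted_counts(inflow)
# Period 1 and 2 have the same raw inflow (2 each), but the weighted
# view (23 vs. 10) shows severity falling -- the signal raw counts hide.
```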
By weighting defect inflow and outflow, stakeholders get a much better feel for what is happening on a project. For example, I have found that trends in weighted inflow usually lead trends in unweighted inflow. On most projects, there is a point at which testing is still finding a significant number of bugs, but the bugs are lower in severity. In this case, you'll see weighted inflow drop while unweighted inflow remains roughly constant. In most cases, this indicates that you are one to three periods away from seeing a matching drop in unweighted inflow. By measuring weighted defect inflow, you'll be able to forecast this important turning point in a project a few weeks earlier. Monitoring weighted outflow (the weighted defects fixed) helps ensure that your programmers are addressing the highest-severity defects first. Regardless of how you define Critical and Low, you would almost assuredly prefer that programmers work on Critical defects. Focusing on weighted outflows directs everyone's attention toward higher-value work.
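The "weighted inflow leads unweighted inflow" signal can be checked mechanically. A sketch, assuming per-period inflow totals as plain lists and purely illustrative thresholds (a 20% weighted drop against a roughly flat raw count):

```python
def weighted_leads_raw(raw, weighted, drop=0.2, flat=0.1):
    """Flag the turning point described above: weighted inflow fell
    noticeably between the last two periods while raw inflow stayed
    roughly flat. `raw` and `weighted` are per-period inflow totals,
    oldest first. Threshold values are illustrative, not prescriptive.
    """
    if len(raw) < 2 or raw[-2] == 0 or weighted[-2] == 0:
        return False
    raw_change = abs(raw[-1] - raw[-2]) / raw[-2]          # relative raw movement
    weighted_drop = (weighted[-2] - weighted[-1]) / weighted[-2]  # relative weighted decline
    return weighted_drop >= drop and raw_change <= flat

# Raw inflow flat at 10/period, weighted inflow falling 80 -> 50:
# the early warning that severity is declining ahead of raw counts.
```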

Programmer Quality Metrics

It is useful to track fix rejects (or QA rejects), meaning fixes that QA was unable to verify as, indeed, fixed. You can calculate rejects for the programming team as a whole, for each programmer individually, or both. On complex enterprise teams comprising many programmers and testers, it is common to see reject rates in the 8-12% range.

[Figure: Programmer Quality Metrics]

Be careful when using these metrics. In almost all cases, you should not use them as a significant factor in formal or informal evaluations of programmers. Knowing a team's reject rate is, however, essential in predicting go-live timing.
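The reject rate itself is a simple ratio. A small sketch, with hypothetical field names (no particular tracker's schema is assumed):

```python
def reject_rate(fixes_verified, fixes_rejected):
    """Fraction of attempted fixes that QA bounced back as not fixed.

    `fixes_verified` is the count QA confirmed as fixed;
    `fixes_rejected` is the count QA reopened. Works equally for a
    whole team or a single programmer -- just scope the inputs.
    """
    attempted = fixes_verified + fixes_rejected
    return fixes_rejected / attempted if attempted else 0.0

# A team that shipped 100 attempted fixes and had 12 bounced back
# sits at a 12% reject rate -- the top of the typical range above.
```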

Counting Defect Inflow and Outflow

Counting defect inflow and outflow is more complex than it looks. One common complication is how to count defects that have been reopened. For example, suppose QA reports a defect one day and a programmer marks it as fixed the next. Two weeks later, the defect shows up again. Should it be counted as part of inflow? I say no. Think of defect inflow as defects first reported during the period. This is also the easiest metric to generate from the defect tracking systems I've used: it is easy to get a count of defects marked fixed in a given reporting period, but difficult (albeit not impossible) to get a count of defects closed this week that had also been closed before. Imagine the query: "count all defects opened this week, plus defects reopened this week that had been resolved earlier."

Keep in mind, though, that if you count inflow as defects first reported during the period and outflow as any defect marked closed during the period, you can get some unusual results. For example, over a two-week period, assume one defect is reported during the first week and none during the second. If a programmer fixes the defect the first week, you'll show an outflow of one. During the second week, a tester discovers that the defect was only hidden and is really still there. If the programmer truly fixes it the second week, you'll show no inflow but an outflow of one in that second week. In other words, your aggregate metrics will show one defect reported and two fixed.

Don't go through the extra work of reducing inflow or outflow counts from previous weeks. The metrics should reflect what was known at the time they were collected. There will always be some uncertainty in your latest period of metrics, so it is consistent to allow the same uncertainty in prior weeks.
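The counting convention above (inflow = first report only; outflow = every "fixed" marking, including re-fixes) can be made concrete with a sketch over a hypothetical event log (the event shape is invented for illustration):

```python
from collections import Counter

def flows(events):
    """Count inflow and outflow per week from a defect event log.

    Each event is (week, defect_id, action), action being 'reported'
    or 'fixed'. Inflow counts a defect only in the week it was FIRST
    reported; outflow counts every 'fixed' marking, including re-fixes
    of reopened defects -- the programmer-biased convention above.
    """
    inflow, outflow = Counter(), Counter()
    seen = set()
    for week, defect_id, action in sorted(events):
        if action == "reported" and defect_id not in seen:
            seen.add(defect_id)
            inflow[week] += 1
        elif action == "fixed":
            outflow[week] += 1
    return inflow, outflow

# The anomaly from the text: one defect, reported once, "fixed" twice.
events = [
    (1, "D-1", "reported"),
    (1, "D-1", "fixed"),   # fix later turns out to be incomplete
    (2, "D-1", "fixed"),   # re-fixed after reopening in week 2
]
inflow, outflow = flows(events)
# Aggregates show 1 reported but 2 fixed -- exactly the skew described.
```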
The real solution to situations like this is to make sure that anyone who will make decisions based on your metrics understands how they are calculated and any biases they may contain. The method of counting inflow and outflow described above is what I call a programmer-biased metric: it is consistent with how a programmer (as opposed to a tester) would view the counts. As such, it is likely to yield a slightly optimistic view of the project at any point, because defect inflow is shown as slightly lower than it probably is and defect outflow as slightly better than it probably is. Be sure that those who use your metrics understand the biases built into them.

If your customers aren't happy, you won't be happy for long. Separately tracking field-reported defects, collected during early field trials, and user-acceptance defects, collected during phased iterations, is your last line of defense before going live. Trends should follow the same inverted-V pattern, and thresholds should meet or exceed Field Support constraints in order to meet ongoing cost objectives.

Parting Guidance

With any measurement effort, most people will adjust their behavior to measure well against the metric; others will game the system to avoid detection. Two methods work well for correcting the latter tendency. First, you can make it very clear that there is no consequence to the measurement. That was my recommendation with fix rejects: don't take the worst couple of programmers out behind the corporate woodshed; instead, simply identify them for additional training or assignment to more appropriate work. The second method is to measure other, complementary areas that will expose, and ultimately counteract, those costly behaviors.

At minimum, defect-trend metrics are essential for every significant systems development project. Without them, project managers are forced to rely on gut feel or other intangible, anecdotal evidence in answering those critical questions. The needs of each project and development organization differ, but these metrics at least point you in the right direction.
