Software Estimation and Planning done right

I was recently asked about my approach to software estimation (with a follow-on regarding planning). Here are my thoughts on the matter.

When talking to whoever is asking for the estimate/plan, it is imperative to clarify that these are indeed simply estimates and are going to be wrong. As Steve McConnell has shown in his book Software Project Survival Guide, the further away from the end of the project, the less accurate the estimate:

Cone of uncertainty (Code Complete, Steve McConnell)

Unfortunately, there’s a tendency to take estimates and turn them into deadlines – it is an easy trap to fall into and is something we all do.

Estimating, the right way

Luckily, there’s a simple solution to this issue:

Never provide a single value as an estimate – give a range.

A range? Absolutely – it immediately conveys the amount of uncertainty that is inherent in the estimate. It gives your stakeholders something concrete to base plans on, and the ability to add risk assessment and mitigation options to their plans. Saying that “this project is estimated to take between two weeks and four months” tells people how much is unknown and uncertain about the project. This is a good thing.

But how do I get this range? What estimation process would provide it?

The lower number is simple – it is likely what you already do when you estimate (whether alone or with your team) – it is the estimate most developers give when asked for an estimate, the optimistic, “nothing will go wrong” estimate, the “this is how long things should take, all else being equal”. It is the estimate that ignores many possible issues and delays.

But how to get to the higher number? While estimating, after getting the lower number, ask – how would this change if something delayed the work? Examples include a major refactoring, a new feature in a library that the code depends on, hardware provisioning and so on – I am sure you can come up with other scenarios that suit whatever task is being estimated. I suggest using something plausible – ideally an issue that has actually happened in the past with a similar task. I find that assuming such a monkey wrench thrown into the works tends to change the estimates drastically, which is exactly what I am looking for.

And here you have it – an estimate range, with built in uncertainty :)

But, I hear you ask, what if the scope isn’t clear? What to do if the task cannot be broken down? How do I deal with changes?

These are all valid – and the answers are – if the scope isn’t clear, try to clarify it as much as possible. If this requires research in order to gain some understanding of the issue – that’s good. The research should give you an idea about what is required and the different large items that are needed and how long they should take (again – a range, from optimistic to pessimistic).

Additionally, most tasks *can* be broken down, into large components if nothing else. Again, research helps here.

And changes? They require re-estimation. Figure out what needs to change as the requirements change (if lucky, nothing already done needs to be changed). Estimate the re-work required, estimate the new requirements, remove old tasks that are no longer needed and presto – a new set of estimates, with the changes included.

Planning, the right way

Another thing I like to do with task/project estimations is to break things down to knowns and unknowns, where knowns are tasks that are familiar – these are items that are very similar to things that have already been done by the team, so are a known quantity – people know how long the similar item took, so there’s some confidence in getting an accurate estimate.

But what about the unknowns? I class the following as unknowns: a new language, library or platform, as well as any project/task that deviates significantly from things the team has done before (including things at a much larger scale or that depend on a new vendor). With these, it is best to break the work down into individual items – and keep breaking down until you get to known items or small tasks (small meaning estimated at a few hours). The broken-down items should be estimated in the same way as above.

The end result should be a fully estimated set of tasks – each with a low and high estimate, the range of which should convey the uncertainty in the estimate. A bunch of these tasks would naturally be grouped into larger tasks/projects – the estimate range for those is simply the sum of the individual ranges (i.e. add up the lower estimates to get the aggregate lower estimate, and do the same for the higher estimates).
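As a rough illustration of the aggregation described above, here is a minimal C# sketch. The task names and hour figures are invented for the example, and holding each range in a tuple is just one hypothetical way to model it. It assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tasks: each has an optimistic (Low) and pessimistic (High) estimate in hours.
var tasks = new List<(string Name, double Low, double High)>
{
    ("Login form", 4, 10),
    ("Password reset email", 3, 12),
    ("Audit logging", 6, 30),
};

// Aggregate range: add up the low ends and, separately, the high ends.
(double Low, double High) Aggregate(IEnumerable<(string Name, double Low, double High)> items)
{
    var list = items.ToList();
    return (list.Sum(t => t.Low), list.Sum(t => t.High));
}

var total = Aggregate(tasks);
Console.WriteLine($"Estimated total: {total.Low} to {total.High} hours");
// prints "Estimated total: 13 to 52 hours"
```

The same function can be applied at every level of grouping: sum the tasks into a feature range, then sum the feature ranges into a project range.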

At this point, you have a list of features with estimate ranges that you can show your stakeholders (or, better yet, a product manager, if you are lucky enough to have one) for planning and prioritising.

Hopefully, that list of features makes sense – it is best to cluster tasks into groups that make sense as whole features or related features – and try to get some coherence across the board (you will find that some items conflict with each other or are complete opposites) – you and your product manager will need to make sense of those.

I suggest prioritising by what would bring the most benefit to the business.

If at all possible, ensure tasks can be worked on independently and delivered independently – this means you can have something to show earlier, possibly even something you can deploy so the business/client can get some value from it as soon as possible. This has an added benefit of shortening the feedback cycle – any new features or bug fixes happen earlier. It is a virtuous cycle.

Another thing to do is try and simplify the features and tasks as much as possible. If you can cut 20% of a feature and deliver it 80% faster, that’s a massive win. Most of the time, you will find that gold-plating a feature is wasted time. Being able to only do the simple thing instead of accounting for all edge cases and possible uses is a boon – it is a form of YAGNI, and most of the time, you really won’t miss out on things.

Pro Git 2nd Edition – a review

I found the book to be quite a good introduction as well as a suitable book for power users.

The book starts with the basics – the common uses that most users will have of git. It then goes into workflows, followed by more esoteric uses and an in-depth look into git internals.

In the in-depth chapters, the details are very low level – how git stores different objects, what they look like and what information the different object types contain.

The book is very *nix centric – some chapters and commands assume a *nix environment, and it isn’t clear how these specifics translate to Windows (examples include hooks and hook scripting).

Things I found interesting:

  • I did not know that I could just use a network share for collaboration (using the local protocol). This is very simple to set up – could be perfect for a small office/home office environment where code doesn’t need to be on the Internet.
  • rerere – recording conflict resolution. A git feature that allows recording how merge conflicts have been resolved, so future conflicts can be automatically resolved using the same strategy (this is a rare use but can be very helpful if the same merge needs to be performed repeatedly).
  • Git hooks for customizing actions – for example, special commit rules, running commands when fetching and more (the book doesn’t make it clear if/how to manage this on Windows).
  • filter-branch – a very powerful, but very dangerous feature, that allows rewriting history across the whole repository and the commits in it. Can be useful for removing certain files from all revisions (say a private key file was committed by mistake and should be completely expunged).
  • How reset works – the section explains the different states of a repository being tracked, what HEAD means and how to think about it and what reset does. Clarified quite a bit for me.
  • Splitting a repository – for example, if the repository has grown a lot and only the recent history is of interest, it is possible to split a repository into old historical/current.

I recommend this book if you feel you are not using git effectively.

It contains lots of info about how different commands work, and is a good introduction to many tools, some of which you may not be familiar with (stash, rerere, reset, bundling, rebasing and more), as well as a chapter dedicated to the use of GitHub.

The book can be read online, downloaded for free (supported formats are PDF, EPUB, MOBI and HTML), and a dead tree edition is available on Amazon.

Did you know? A .NET CSV parser that comes with Visual Studio?

In the Microsoft.VisualBasic.FileIO (Microsoft.VisualBasic.dll) namespace lives the TextFieldParser class.

This useful little class can be used to parse structured text files – either delimited or fixed-width. You can iterate over the lines in the file and extract the data through the ReadFields method.

Since it is provided by Microsoft, you can use it in environments that do not allow “third-party” libraries, and as it is a .NET library you can use it in any .NET language (yes, C# and F# included) – just reference the library.

The examples on MSDN are all in VB.NET, but are easily translated to other .NET dialects.

How to: Read From Comma-Delimited Text Files
How to: Read From Fixed-width Text Files
How to: Read From Text Files with Multiple Formats
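For C# readers, here is a minimal sketch of the comma-delimited case. The sample data is made up, and it is read from a string only to keep the example self-contained; a file path can be passed to the TextFieldParser constructor instead. It assumes top-level statements (C# 9 or later) and a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Made-up sample data; the second line has a quoted field containing a comma.
var csv = "name,city\n\"Smith, Jane\",Stockholm\nBob,London\n";

var rows = new List<string[]>();
using (var parser = new TextFieldParser(new StringReader(csv)))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // keeps "Smith, Jane" as a single field

    // Each ReadFields call consumes one line and returns its fields.
    while (!parser.EndOfData)
    {
        rows.Add(parser.ReadFields());
    }
}

foreach (var fields in rows)
{
    Console.WriteLine(string.Join(" | ", fields));
}
```

For fixed-width files, the same class is used with TextFieldType set to FieldType.FixedWidth and the column widths supplied through SetFieldWidths.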

Class libraries do not have configuration

Why is that?

If one takes the time to think about it, class libraries only ever execute in the context of an application – a web application, a console application, a unit test runner or some other executable.

A class library gets loaded into the memory space of the application and gets called and executed there.

It makes sense then that the executing application should be the one deciding on how to configure a class library it is using, rather than the other way around.

The .NET framework configuration subsystem subscribes to this idea – when using the System.Configuration namespace, the application configuration file will be the one queried, even if there is a configuration file matching the dll name.

I would add that configuration should be treated as a dependency – the values should be injected into any class or method that requires them – read the excellent blog post by Paul Hiles – Configuration Settings Are A Dependency That Should Be Injected.
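To make the idea of injected configuration concrete, here is a small C# sketch. ReportClient and its retryCount setting are hypothetical names invented for this illustration, and the value is hard-coded where a real application would read it from its own configuration file. It assumes top-level statements (C# 9 or later).

```csharp
using System;

// Application side: the executable owns configuration. In a real application this
// value would come from the application's own configuration file; it is hard-coded
// here so the sketch stays self-contained.
var retryCount = 3;
var client = new ReportClient(retryCount);
Console.WriteLine(client.Describe());

// Library side (hypothetical class): it never reads a configuration file itself,
// it only uses the values it was given.
public sealed class ReportClient
{
    private readonly int _retryCount;

    public ReportClient(int retryCount)
    {
        if (retryCount < 0)
            throw new ArgumentOutOfRangeException(nameof(retryCount));
        _retryCount = retryCount;
    }

    public string Describe() => $"Will retry up to {_retryCount} times";
}
```

A side benefit of this shape is that a unit test can construct the class with any value it likes, without needing a configuration file at all.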

Date and Time format strings in .NET – Introduction

One of the most common misunderstandings I see on StackOverflow regarding the DateTime structure is the difference between the value of a DateTime instance and how it is displayed.

A DateTime instance (say one representing midnight of March 26th 2011) has an internal representation that has no specific formatting – it is not something that will make sense to any human being in that form (this post is not about what exactly that representation is).

What this means is that every time you see a value for a DateTime instance, you are seeing this internal value after it has been formatted for human eyes.
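A short C# sketch of that distinction: one DateTime value, rendered three different ways. The particular format strings and cultures are my own picks for illustration, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Globalization;

// One value: midnight of March 26th 2011.
var date = new DateTime(2011, 3, 26);

// The same value, formatted three different ways. The value itself never changes.
var custom = date.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
var invariantShort = date.ToString("d", CultureInfo.InvariantCulture);
var longUs = date.ToString("D", CultureInfo.GetCultureInfo("en-US"));

Console.WriteLine(custom);         // 2011-03-26 00:00:00
Console.WriteLine(invariantShort); // 03/26/2011
Console.WriteLine(longUs);         // long date pattern of the en-US culture
```

None of these strings is "the" value of the DateTime; each is just one rendering of it.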

How this formatted value comes to be is the subject of this blog series, starting with this introduction.

In future posts I will discuss the roles of IFormatProvider and format strings, Cultures and the Regional settings.

So you need to format some money

When formatting a currency for display, always use a CultureInfo object when outputting in order to get the correct formatting – different places will have a different thousands separator, decimal separator and more.

In many cases, you can get the CultureInfo from the UI thread, and in a web application you can guess which one is correct by parsing the Accept-Language request header.

If you always want the exact same output, you do not have to specify a format string; simply use CultureInfo.InvariantCulture. This is a dummy culture that does not correspond to any country/region, and its settings are similar to “en-US”.

This example will output the format for the Swedish culture.

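decimal value = -16325.62m; // needs: using System; using System.Globalization;
Console.WriteLine(value.ToString("C", CultureInfo.GetCultureInfo("sv-SE")));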

This example will output the format for the InvariantCulture.

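decimal value = -16325.62m; // needs: using System; using System.Globalization;
Console.WriteLine(value.ToString("C", CultureInfo.InvariantCulture));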

Here is a list of culture names.