Software Estimation and Planning done right

I was recently asked about my approach to software estimation (with a follow-on question about planning); here are my thoughts on the matter.

When talking to whoever is asking for the estimate/plan, it is imperative to clarify that these are indeed simply estimates and are going to be wrong. As Steve McConnell has shown in his book Software Project Survival Guide, the further you are from the end of the project, the less accurate the estimate:

Cone of uncertainty (Code Complete, Steve McConnell)

Unfortunately, there’s a tendency to take estimates and turn them into deadlines – it is an easy trap to fall into and is something we all do.

Estimating, the right way

Luckily, there’s a simple solution to this issue:

Never provide a single value as an estimate – give a range.

A range? Absolutely – it immediately conveys the amount of uncertainty that is inherent in the estimate. It gives your stakeholders something concrete to base plans on, and the ability to add risk assessment and mitigation options to their plans. Saying that “this project is estimated to take between two weeks and four months” tells people how much is unknown and uncertain about the project. This is a good thing.

But how do I get this range? What estimation process would provide it?

The lower number is simple – it is likely what you already produce when you estimate (whether alone or with your team). It is the estimate most developers give when asked: the optimistic, “nothing will go wrong” estimate, the “this is how long things should take, all else being equal” number. It is the estimate that ignores many possible issues and delays.

But how to get to the higher number? While estimating, after getting the lower number, ask: how would this change if something delayed the work? Examples include a major refactoring, needing a new feature in a library the code depends on, hardware provisioning, etc. – I am sure you can come up with other scenarios suitable to whatever task is being estimated. I suggest using something plausible – ideally an issue that has actually happened in the past with a similar task. I find that assuming such a monkey wrench thrown into the works tends to change the estimates drastically – which is exactly what I am looking for.

And here you have it – an estimate range, with built in uncertainty :)

But, I hear you ask, what if the scope isn’t clear? What to do if the task cannot be broken down? How do I deal with changes?

These are all valid questions. If the scope isn’t clear, try to clarify it as much as possible. If this requires research in order to gain some understanding of the issue – that’s good. The research should give you an idea of what is required, the different large items that are needed and how long they should take (again – a range, from optimistic to pessimistic).

Additionally, most tasks *can* be broken down – into large components, if nothing else. Again, research helps here.

And changes? They require re-estimation. Figure out what needs to change as the requirements change (if you are lucky, nothing already done needs to change). Estimate the re-work required, estimate the new requirements, remove old tasks that are no longer needed and presto – a new set of estimates, with the changes included.

Planning, the right way

Another thing I like to do with task/project estimations is to break things down into knowns and unknowns. Knowns are familiar tasks – items very similar to things the team has already done, so they are a known quantity: people know how long the similar item took, so there is some confidence in getting an accurate estimate.

But what about the unknowns? I class the following as unknowns: a new language, library or platform, as well as any project/task that deviates significantly from things the team has done before (including work at a much larger scale, or work that depends on a new vendor). With these, it is best to break the work down into individual items – and keep breaking down until you get to known items or small tasks (small meaning estimated at a few hours). The broken-down items should be estimated in the same way as above.

The end result should be a fully estimated set of tasks – each with a low and high estimate, the range of which should convey the uncertainty in the estimate. Many of these tasks will naturally group into larger tasks/projects – the estimate range for those is simply the sum of the individual ranges (i.e. add up the lower estimates to get the aggregate lower estimate, and the same for the higher estimates).
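To make the aggregation concrete, here is a minimal sketch in Python – the task names and figures are made up for illustration:

```python
# Each task carries a (low, high) estimate range, in days.
# Task names and numbers are hypothetical.
tasks = {
    "login form": (2, 5),
    "password reset": (1, 4),
    "session handling": (3, 10),
}

# The aggregate range is just the sum of the lows and the sum of the highs.
low = sum(lo for lo, _ in tasks.values())
high = sum(hi for _, hi in tasks.values())
print(f"Project estimate: {low} to {high} days")  # Project estimate: 6 to 19 days
```

The resulting 6-to-19-day spread carries the uncertainty forward: the wider the aggregate range, the more unknowns the project contains.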

At this point, you have a list of features with estimate ranges that you can show your stakeholders (or, better yet, a product manager, if you are lucky enough to have one) for planning and prioritising.

Hopefully, that list of features makes sense – it is best to cluster tasks into groups that make sense as whole features or related features – and try to get some coherence across the board (you will find that some items conflict with each other or pull in completely opposite directions) – you and your product manager will need to make sense of those.

I suggest prioritising by what would bring most benefit to the business.

If at all possible, ensure tasks can be worked on independently and delivered independently – this means you can have something to show earlier, possibly even something you can deploy so the business/client can get some value from it as soon as possible. This has an added benefit of shortening the feedback cycle – any new features or bug fixes happen earlier. It is a virtuous cycle.

Another thing to do is try and simplify the features and tasks as much as possible. If you can cut 20% of a feature to deliver it 80% faster? That’s a massive win. Most of the time, you will find that gold-plating a feature is wasted time. Being able to only do the simple thing instead of accounting for all edge cases and possible uses is a boon – it is a form of YAGNI, and most of the time, you really won’t miss out on things.

GitHub for Windows – git bash – remembering the passphrase

I seem to hit this issue every time I set up GitHub for Windows on a new machine.

I set up my SSH connection to the git server, but every single time I use git over the SSH transport, I need to re-enter my passphrase.

After finding and trying a few different solutions, this is what worked for me:

  1. Create a .bashrc file in your home directory (the %USERPROFILE% environment variable).
  2. Add the following lines to it:
    eval $(ssh-agent)
    ssh-add    # loads your default key; this is the single passphrase prompt

Start a git bash session – you will be asked for your passphrase once, at the start, and no more.

Using eval 'ssh-agent' (quoting the command instead of using the $() command substitution) did not work for me (and, strictly speaking, a #!/bin/bash line is not needed).

Would love to hear about better alternatives / ways to achieve this.

Design for developers – how it went

It has been a while since I went through the course, and I wanted to report back on how it went. Some things went well, some not so well…

The good:

  • The course is a great introduction to design.
  • I learned a lot of design jargon (for example: visual hierarchy).
  • I gained a solid understanding of what it takes to do design.
  • I now know what makes good design and what makes bad design.
  • I learned about some of the existing design tools and utilities out there (color scheme selection helpers, style tiles).
  • Having a mentor and getting feedback both on the site and via video (Skype in this case) was incredibly useful and pushed my learning much faster.

The not-so-good:

  • Though advertised as only requiring 10 hours a week, it took significantly longer than that:
    • Large amounts of reading (at least 10 hours worth per week).
    • Being unfamiliar with design tools meant I had to learn and experiment with them as well (GIMP, Inkscape and Balsamiq Mockups to name a few).
  • My mentor disappeared on me during the last week of the course (he had a product launch).
  • I did not realize that iterations were expected from the get-go – the interface wasn’t clear about this (there were a few hints, but not enough for me), nor did my mentor explicitly mention it.
  • Some of the downloadable resources require Photoshop and other commercial tools, which a developer or someone learning about design wouldn’t necessarily have (or want to purchase just for the course).

In general, the course was great, though I didn’t get as much from it as I probably could have.

Coming into it, I didn’t have a specific project that I wanted to apply my learning to. Had I had one, I would have had more motivation and some end goal to work towards during the course (the course did seem geared towards the notion of working on an existing project, in particular the last two weeks of it).

I struggled with the actual time requirements, partially due to the need to learn new tools and partially due to the large amount of required reading.

I was able to gain an understanding of the design workflow, language and tooling. Not enough to become a designer myself, but enough to converse in and understand design jargon, and possibly come up with a decent enough design myself.

I had a feedback session with one of the founders – he was very receptive to everything I had to say, the good and the bad, and I have seen that, since then, they have made some changes. To be fair to them, the course was still very new (I was in the second “class” – only the second time they had run through it) – they have now had plenty of time to refine things.

The course has been renamed from “Design for Developers” to “Design 101”, which I think is a much better name. The reading requirements have been adjusted and there have been many changes to the UI to address some of the issues I raised. I’ve been told that introductions to the different tools and downloadable resources have been added to the site.

DesignLab is also offering a UX Research & Strategy course these days.

If you are thinking of taking this course, I can make a few recommendations:

  • Have a project that needs some design work. This can be something you work towards polishing throughout the course.
  • Expect to spend a good amount of time learning the tools.
  • Make sure you iterate from day one – this is where the value is. When you get feedback from your mentor, follow up on it, make changes and iterate.
  • Enjoy!

Pro Git 2nd Edition – a review

I found the book to be quite a good introduction as well as a suitable book for power users.

The book starts with the basics – the common uses that most users will have of git. It then goes into workflows, followed by more esoteric uses and an in-depth look into git internals.

In the in-depth chapters, the details are very low level – how git stores different objects, what they look like and what information the different object types contain.

The book is very *nix-centric – some chapters and commands assume a *nix environment, and it isn’t clear how these specifics translate to Windows (examples include hooks and hook scripting).

Things I found interesting:

  • I did not know that I could just use a network share for collaboration (using the local protocol). This is very simple to set up – it could be perfect for a small office/home office environment where code doesn’t need to be on the Internet.
  • rerere – recording conflict resolutions. A git feature that allows recording how merge conflicts have been resolved, so future conflicts can be automatically resolved using the same strategy (this is a rare use but can be very helpful if the same merge needs to be performed repeatedly).
  • Git hooks for customizing actions – for example, special commit rules, running commands when fetching and more (the book doesn’t make it clear if/how to manage this on Windows).
  • filter-branch – a very powerful, but very dangerous feature, that allows rewriting history across the whole repository and commits in it. Can be useful for removing certain files from all revisions (say a private key file was committed by error and should be completely expunged).
  • How reset works – the section explains the different states of a repository being tracked, what HEAD means and how to think about it and what reset does. Clarified quite a bit for me.
  • Splitting a repository – for example, if the repository has grown a lot and only the recent history is of interest, it is possible to split a repository into old historical/current.

I recommend this book if you feel you are not using git effectively.

It contains lots of info about how different commands work, and is a good introduction to many tools, some of which you may not be familiar with (stash, rerere, reset, bundling, rebasing and more), as well as a chapter dedicated to the use of GitHub.

The book can be read online or downloaded for free (supported formats are PDF, ePub, Mobi and HTML), and a dead-tree edition is available on Amazon.

Design for developers – the journey begins

So I decided to take a design course.

Graphic design, that is. Not design patterns, not architecture. Graphic design.

Why would I, a programmer for well over a decade, decide to do that?

Because design by developers:

Design by developers

(image taken from You Sank my Battleship on infovark)

I have always been one of those developers that said – I can build it, but I can’t make it pretty.

Not that a design course would magically make my software pretty – but at least I will have the basis to identify whether the design is bad and to talk to designers in their language, without making a fool of myself. I mean – think about how non-programmers talk to you about programming, right?

The course I have taken is Design for Developers, offered by DesignLab. This is a four-week course, where one is expected to put in 5-10 hours a week.

I know this sounds like a lot, in particular considering this is done outside of “office hours” (I work remotely and make my own hours – but my study for this course is strictly off the clock). However, having taken Open University courses in the past while in full-time employment, I am confident this is easily achievable, in particular given that weekends are off the clock.

I intend to document my progress here as I go through the course.

Stay tuned.

What Developers Need to Know about SQL – SQL is not for everything

Here is another interesting tweet from What Developers Need to Know About SQL Server:

What Aaron is talking about here is that SQL is a Query Language – that is what SQL and relational databases are optimized for – getting result sets.

SQL is not a data manipulation language – yes, you can do quite a lot of manipulation of strings, dates and such, but the language does not lend itself to such operations.

It is far more efficient to move such operations to the client/UI code – whatever language you use to interact with the database will almost certainly have better facilities for manipulating strings and dates.
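As a minimal sketch of what this looks like in practice (the row and column names here are made up for illustration), formatting raw values in application code rather than in the query:

```python
from datetime import date

# A row as it might come back from the database driver - raw values only.
# The column names are hypothetical.
row = {"first_name": "ada", "last_name": "lovelace", "joined": date(2014, 3, 1)}

# Presentation concerns live in the client code, not in the SQL query:
display_name = f"{row['first_name'].title()} {row['last_name'].title()}"
joined = row["joined"].strftime("%d %B %Y")
print(f"{display_name}, member since {joined}")  # Ada Lovelace, member since 01 March 2014
```

The query stays a plain SELECT of raw columns, which the database is good at; the casing and date formatting happen in a language built for it.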

What Developers Need to Know about SQL – Set Based Programming

Here is one stand-alone tweet from What Developers Need to Know About SQL Server, which is rather information-dense and worth expanding on:

Stuart Miller compares the mental shift required for someone who is used to OOP to develop in SQL to the mental shift those used to imperative programming (C, COBOL, BASIC and more) need to go through in order to truly understand and use OOP correctly.

This different programming paradigm is set based programming, taking its name from set theory, a mathematical framework from which the relational model of databases was derived by Edgar F. Codd, which in turn gave relational databases their name.

One of the major differences between set based programming and object oriented programming has to do with how they deal with groups of related items. Where in object oriented programming, one would iterate over a collection of objects, dealing with each item at a time, set based programming deals with the group as a whole – there are no iteration constructs. That’s right – there are no loops in set based programming.

As a result, the way to think about and apply operators is in groups of related items (sets, or relations, in Codd’s relational model), not individual items. Operators allow filtering, aggregating, projecting, joining and more – always resulting in groups of related items (where a group can be empty or consist of a single item).

Relational theory is the basis on which relational databases have been built, and the relational engines that power them are optimized for this kind of thinking – set-based thinking. Most relational databases do offer looping constructs – but these are more often than not best avoided, as they go against the grain and can be the cause of performance issues. This is not because relational databases are slow – it is because using loops instead of operating on sets goes against how they were built and what they are optimized to do.
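To illustrate the two mindsets side by side, here is a sketch (not from the tweet) using Python’s built-in sqlite3 module, with a made-up orders table and a half-price discount:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, discounted REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(100.0,), (250.0,), (40.0,)])

# Imperative, row-at-a-time thinking: fetch each row, compute, write it back.
for order_id, total in conn.execute("SELECT id, total FROM orders").fetchall():
    if total >= 50:
        conn.execute("UPDATE orders SET discounted = ? WHERE id = ?",
                     (total * 0.5, order_id))

# Set-based thinking: a single statement operating on the whole set at once,
# leaving the engine free to decide how to apply it.
conn.execute("UPDATE orders SET discounted = total * 0.5 WHERE total >= 50")

print(conn.execute("SELECT id, discounted FROM orders ORDER BY id").fetchall())
# [(1, 50.0), (2, 125.0), (3, None)]
```

Both versions produce the same result, but the single UPDATE expresses the filter and the computation as one operation over the set – which is what the relational engine is built to execute.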

What Developers Need to Know about SQL – Introduction

I will be starting a series of blog posts, inspired by a twitter conversation Brent Ozar had with other database folks, the highlights of which he collated into a blog post titled What Developers Need to Know About SQL Server.

Since these started life on twitter, they are quite short, and though Brent briefly expands on some of the tweets, I believe there is value in expanding on them even further, explaining the rationale and thinking behind them.

Though Brent and the conversation are ostensibly SQL Server oriented, many of the points raised are applicable to any relational database – Oracle, MySQL, PostgreSQL, etc. I plan to mostly deal with these shared points.

Let me know in a comment if there is anything specific you are interested in.

Anatomy of a XSS vulnerability on Stack Overflow

Stack Overflow has had a mobile version of the site for quite a while now, and to make life easy for our users, we have a switcher in the footer – allowing one to toggle between the mobile and the full versions of the site.

Recently, an XSS vulnerability on this link was disclosed via our meta site.

The link markup looked like the following:

<a onclick='StackExchange.switchMobile("on", "/some/path")'>mobile</a>

Where "/some/path" is the current request path – this ensures that when switching between mobile and full, one remains on the same page.

As it turns out, this path was rendered to the page using HTML encoding only, meaning that when accessing the site with a URL whose path contained %22,alert%281%29,%22 (which would throw up a “404 page not found” that contains the footer), clicking the link would execute a JavaScript alert.

How does it work?

When rendering the path %22,alert%281%29,%22, the link ended up looking like this:

<a onclick='StackExchange.switchMobile("on", "",alert(1),"")'>mobile</a>

This is because %22 is the URL encoding of ", %28 of ( and %29 of ).
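One way to see why HTML encoding alone was not enough here: the browser HTML-decodes attribute values before handing the onclick text to the JavaScript engine, so any HTML-encoded quotes come straight back. A small sketch simulating the two decoding steps with Python’s standard library:

```python
from html import escape, unescape
from urllib.parse import unquote

# The attacker-controlled path, as it appears in the URL:
path = unquote("%22,alert%281%29,%22")   # '",alert(1),"'

# Server-side HTML encoding does turn the quotes into entities...
attr = escape(path)                      # '&quot;,alert(1),&quot;'

# ...but the browser decodes entities in attribute values *before*
# the JavaScript engine ever sees the onclick text:
print(unescape(attr))                    # '",alert(1),"'
```

This is why output encoding has to match the context it is rendered into – HTML encoding protects HTML text, not a string inside an inline event handler.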

The initial fix was to ensure " was correctly encoded. It was a quick fix – done to minimize damage from this particular form of attack. It was followed by a change to the StackExchange.switchMobile function where no path parameter exists anymore and which precludes this attack – the path is no longer in a string and URL checking was moved to the server side.

Did you know? A .NET CSV parser that comes with Visual Studio?

In the Microsoft.VisualBasic.FileIO namespace (in Microsoft.VisualBasic.dll) lives the TextFieldParser class.

This useful little class can be used to parse structured text files – either delimited or fixed length. You can iterate over the lines in the file and extract the data through the ReadFields method.

Since it is provided by Microsoft, you can use it in environments that do not allow “third-party” libraries, and as it is a .NET library you can use it from any .NET language (yes, C# and F# included) – just reference the assembly.

The examples on MSDN are all in VB.NET, but are easily translated to other .NET dialects.

How to: Read From Comma-Delimited Text Files
How to: Read From Fixed-width Text Files
How to: Read From Text Files with Multiple Formats