
Things that go bump in the night
Late last year I found it really interesting reading Martin Daly’s ‘Higher Education’ series of posts. Here was a real story about a real project – which had a real resonance with me. As one large project draws to a close, and inspired by Martin’s posts, I thought I’d share some of my experience about the reality of projects.
It shouldn’t be surprising to hear that sometimes projects have their ups and downs – if you’ve sat in a presentation of mine recently you’ll realise why no amount of Gantt charts can help this situation. So it was little surprise to me that towards the end of this large project we had a few bumps – I should say that whilst I wasn’t surprised I was frustrated, because we’d been putting a lot of time and effort into making sure we tried to unearth issues early on.
The bumps
Stability
The first was associated with a data loading component in our solution. This component automates the processing of statewide source data into a complex data model. The software runs from within ArcCatalog and depends upon a combination of geoprocessing models, custom ArcObjects and some PL/SQL packages which move the data into the final schema. We knew it was a pretty intense process and we’d spent a lot of time making it perform because performance was important to the client. However, while we were looking one way we forgot to look the other, and we let a stability issue pass us by. Bump!
Functionality
The second was a particular piece of functionality which we’d implemented in our services tier. This functionality was fairly straightforward – we had to analyse a geometry that was defined and created in memory at runtime against some layers (again defined at runtime) to return some values (once again defined at runtime). All our tests had been successful until we encountered a specific type of geometry. Bump!
Why did we bump
I don’t need to explain these two issues in detail, but it’s useful to try and reflect on why we encountered them. After all, we were using Scrum and reflecting is an important part of the process:
Iterating is not enough – we had a series of sprints, and at each sprint the usual Scrum ceremonies took place. Every few sprints a larger set of the client team joined us and we spent a full day going through all the functionality developed up to that point. But iterating is not enough – we were missing the final press of the GO button: deploying each iteration. There were reasons why we couldn’t deploy in the early stages. We were deploying onto our internal test infrastructure from our development infrastructure and we had our CI server doing the heavy lifting for us, but we underestimated the reality and value of the software being directly in the hands of users from as early as possible.
Look out for the warning signs – there were warning signs for the stability issues. I missed them. Not being able to deploy frequently onto client infrastructure stopped the issue bubbling up earlier, but even in our own testing and deployment we dropped the ball, perhaps because we were focussed on high priority features such as performance.
Testing the boundary – we did a lot of testing, but we missed a crucial boundary area. In fact it’s more accurate to say that we would have benefited from the client scrutinising, earlier on, those crazy boundary cases that only they’re familiar with. This omission was enough to let that functional issue slip through the cracks.
Getting over the bumps
We got over these bumps fairly quickly, and it’s as valid to look at how we did this as to understand why they occurred in the first place:
Review: In the case of our stability issue with the data loader our solution lay in a review – we’d done code reviews during development, but I’ve got a growing belief that continuous review is an extremely valid process. After some careful performance testing we came to the conclusion that we had a memory leak or some nasty fragmentation going on, but we were struggling to isolate the real cause of the problem. I’m lucky enough to work with some very smart people: a phone call or two and a great piece of review later, we had the issue nailed.
Good design…and a good backup: Scrum in some ways forces better design. If you expect change to occur as a result of an iterative process you’re always thinking that your design must account for change. This means that good OO design and constant refactoring can help you overcome changes as they occur. I like to think we did these things and this prepared us well for our functionality issue. I didn’t have to pull the system apart when we found the bug in our initial design; I’d given myself enough room to move. I also had a backup design. It’s really important to record the design process IMO – notes, sketches, and ideas may all be useful at a later date. It’s also useful to not be afraid to throw something away. My design allowed me to throw away the problem part without a knock-on effect. I could fairly rapidly test new designs and, when it came to it, get a fresh perspective on the issue.
A great team: This is the most important factor in overcoming the bumps. Being able to bounce ideas off others in the immediate and not so immediate team helped a great deal. The commitment of the team to get through the issues – the folks that worked on the project really wanted to see it succeed – was a huge, huge factor. I think one of the main reasons why our team got over the hurdles was because our client worked with us, as part of the team. When we encountered the issues we didn’t brush them under the carpet: we discussed the problem, assessed the options, provided advice, shared ideas and in the end helped each other to get it sorted. There is something deeply unsatisfying when you have to work in a client vs vendor, us vs them relationship – I’m happy to say we left all that bullshit behind in about week 3 of the project.
The outcome – there are lots of things I would change, but the project has been a success from my perspective. Hopefully from the client’s too. I’m going to take a great deal of this experience forward and hopefully there will be fewer bumps next time.
Late last year I was really pleased when I read Martin Daly’s ‘Higher Education’ series of posts. Here was a real story about a real project – which had a real resonance with me. As one large project draws to a close (and inspired by Martin’s posts), I thought I’d share some of my experience about the reality of projects.
It shouldn’t be surprising to hear that sometimes projects have their ups and downs – no amount of Gantt charts can help this situation! So it was little surprise to me that towards the end of this large project we had a few bumps – I should say that whilst I wasn’t surprised I was frustrated, because we’d been putting a lot of time and effort into making sure we tried to unearth such bumps early!
The bumps
1. Stability
The first bump was associated with a data loading component in our solution. This component automates the processing of statewide source data into a complex data model. We knew it was a pretty intense process and we’d spent a lot of time making it perform, because performance was important to our client. However, while we were looking one way we forgot to look the other, and we let a stability issue pass us by. Bump!
2. Functionality
The second was a particular piece of functionality which we’d implemented in our services tier. This functionality was fairly straightforward – we had to analyse a geometry that was defined and created in memory at runtime against some layers (again defined at runtime) to return some values (once again defined at runtime). All our tests had been successful until we encountered a specific type of geometry, and a bug in the underlying technology. Bump!
Why did we bump
I don’t need to explain these two issues in detail, but it’s useful to try and reflect on why we encountered them. After all, we were using Scrum and reflecting is an important part of the process:
1. Iterating is not enough
We had a series of sprints, and at each sprint the usual Scrum ceremonies took place. Every few sprints a larger set of the client team joined us and we spent a full day going through all the functionality developed up to that point. But iterating is not enough – we were missing the final press of the GO button: deploying each iteration. There were reasons why we couldn’t deploy in the early stages. We were deploying onto our internal test infrastructure from our development infrastructure and we had our CI server doing the heavy lifting for us, but we underestimated the reality and value of the software being directly in the hands of users from as early as possible.
2. Look out for the warning signs
There were warning signs for the stability issues. I missed them. Not being able to deploy frequently onto client infrastructure stopped the issue bubbling up earlier, but even in our own testing and deployment environments we dropped the ball. I think our focus on the high priority features such as performance was a factor in this.
3. Testing the boundary
We did a lot of testing, but we missed a crucial boundary area. In fact it’s more accurate to say that we would have benefited from the client scrutinising, earlier on, those crazy boundary cases that only they’re familiar with. This omission was enough to let that functional issue slip through the cracks.
Getting over the bumps
It’s important to look at how we resolved these issues as well as understanding why they occurred in the first place:
1. Review
In the case of our stability issue with the data loader our solution lay in a review and memory profiling. We’d done code reviews during development as well as testing and review of UI functionality and of course end-to-end testing of the load process (which takes around 6-7 hours). When we encountered the issue we conducted some careful memory monitoring and came to the conclusion that we had a leak or some nasty fragmentation going on, but we were struggling to isolate the real cause of the problem. I’m lucky enough to work with some very smart people: a phone call or two and a great piece of review later, we had the issue nailed. I’ve got a growing belief that continuous review is an extremely important process – from pair programming to formal code review, these processes are invaluable.
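As a rough illustration of the kind of memory monitoring described above, here is a minimal Python sketch. It is not the tooling we used on the project (our loader ran inside ArcCatalog); it swaps in Python’s standard tracemalloc module, and the names memory_growth and looks_like_leak, the stand-in load step and the tolerance value are all hypothetical. The idea is simply to sample memory after each repeated step and flag growth that never levels off.

```python
import tracemalloc


def memory_growth(step, iterations):
    """Run step(i) repeatedly; return traced memory in bytes after each run."""
    tracemalloc.start()
    try:
        samples = []
        for i in range(iterations):
            step(i)
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


def looks_like_leak(samples, tolerance_bytes=50_000):
    """Flag steady growth: each sample exceeds the previous one by more
    than tolerance_bytes (an arbitrary threshold chosen for this sketch)."""
    if len(samples) < 2:
        return False
    return all(b - a > tolerance_bytes for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    retained = []  # hypothetical bug: each batch is kept alive by mistake
    leaky = memory_growth(lambda i: retained.append(bytearray(100_000)), 10)
    steady = memory_growth(lambda i: bytearray(100_000), 10)
    print("leaky step flagged:", looks_like_leak(leaky))
    print("steady step flagged:", looks_like_leak(steady))
```

A check like this only tells you that memory keeps climbing; it does not tell you why, and it would not distinguish a leak from fragmentation. Finding the cause still took review.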
2. Good design…and a good backup
An iterative process can lead you to design with change in mind, just like unit testing can lead to more careful and considered design. Good OO design and constant refactoring can help you overcome changes as they occur. I like to think we did these things and this prepared us well for our functionality issue. I didn’t have to pull the system apart when we found the issue in our initial design; I’d given myself enough room to move. I also had a backup design. It’s really important to record the design process IMO – notes, sketches, and ideas may all be useful at a later date. It’s also useful to not be afraid to throw something away. My design allowed me to throw away the problem without a knock-on effect. I could fairly rapidly test new designs and, when it came to it, get a fresh perspective on the issue.
3. A great team
This was the most important factor in overcoming the bumps. Being able to bounce ideas off others in the immediate and not so immediate team helped a great deal. The commitment of the team to get through the issues was fantastic – the folks that worked on the project really wanted to see it succeed – and this was a huge, huge factor. I think one of the main reasons why our team got over the hurdles was because our client worked with us, as part of the team. When we encountered the issues we didn’t brush them under the carpet – we discussed the problems openly, assessed the options, provided advice, shared ideas and in the end helped each other to get it sorted. There is something deeply unsatisfying when you have to work in a client vs vendor, us vs them relationship – I’m happy to say we left all that bullshit behind in about week 3 of the project.
The outcome
There are lots of things I would change, but the project has been a success from my perspective. Although we had a little stumble near the end, it’s important to remember that we delivered a complex system comprising database, server and client components. We listened to our client and hopefully they consider the project a success too. All projects have the odd bump here and there (some more so than others); if you hear otherwise you should be suspicious. Accepting that projects occasionally bump is not an acceptance of failure – it’s an acceptance of reality, an acceptance of change. When measuring the success of a project, I think we should include an assessment of how well the project manages its issues along the way. I’m certainly going to take a great deal of this experience forward and hopefully there will be fewer bumps next time.