Building Real Software: Scrum


Wednesday, March 4, 2015

Putting Security into Sprints

To build a secure app, you can’t wait until the end and hope to “test security in”. For teams who follow Agile methods like Scrum, this means you have to find a way to add security into Sprints. Here’s how to do it:

Sprint Zero

A few basic security steps need to be included upfront in Sprint Zero:

Platform selection – when you are choosing your language and application framework, take some time to understand the security functions they provide. Then look around for security libraries like Apache Shiro (a framework for authentication, session management and access control), Google KeyCzar (crypto), and the OWASP Java Encoder (XSS protection) to fill in any blanks.
Data privacy and compliance requirements – make sure that you understand what data needs to be protected and audited for compliance purposes (including PII), and what you will need to prove to compliance auditors.
Secure development training – check the skill level of the team, fill in as needed with training on secure coding. If you can’t afford training, buy a couple of copies of Iron-Clad Java, and check out SAFECode’s free seminars on secure coding.
Coding guidelines and code review guidelines – consider where security fits in. Take a look at CERT’s Secure Java Coding Guidelines.
Testing approach – plan for security unit testing in your Continuous Integration pipeline. And choose a static analysis tool and wire it into Continuous Integration too. Plan for pen testing or other security stage gates/reviews later in development.
Assigning a security lead - someone on the team who has experience and training in secure development (or who will get extra training in it), or someone from infosec, who will act as the point person on risk assessments, lead threat modeling sessions, coordinate pen testing and scanning, triage the vulnerabilities found, and bring new developers up to speed.
Incident Response - think about how the team will help ops respond to outages and to security incidents.
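
As a rough illustration of the "security unit testing" item in the list above, here is a minimal sketch in Python (chosen only for brevity; the libraries named above are Java). The `render_greeting` helper and the test names are hypothetical, and the standard library's `html.escape` stands in for a dedicated encoder such as the OWASP Java Encoder. Tests like these are cheap to run on every build in Continuous Integration.

```python
import html
import unittest


def render_greeting(user_name: str) -> str:
    """Hypothetical view helper: builds an HTML fragment from untrusted input."""
    # Encode untrusted data before placing it into HTML element content.
    return "<p>Hello, " + html.escape(user_name, quote=True) + "</p>"


class OutputEncodingTest(unittest.TestCase):
    def test_script_tag_is_neutralized(self):
        page = render_greeting("<script>alert(1)</script>")
        self.assertNotIn("<script>", page)
        self.assertIn("&lt;script&gt;", page)

    def test_double_quotes_are_encoded(self):
        page = render_greeting('" onmouseover="x')
        self.assertNotIn('"', page)

    def test_plain_names_pass_through(self):
        self.assertEqual(render_greeting("Ada"), "<p>Hello, Ada</p>")
```

Saved in a test module, this can be run with `python -m unittest`. The point is not the encoder itself but that the security expectation is written down as a test that fails the build if someone removes the encoding.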

Early Sprints

The first few Sprints, where you start to work out the design and build out the platform and the first-ofs for key interfaces and integration points, are when the application’s attack surface expands quickly.

You need to do threat modeling to understand security risks and make sure that you are handling them properly.

Start with Adam Shostack’s 4 basic threat modeling questions:

1. What are you building?
2. What can go wrong?
3. What are you going to do about it?
4. Did you do an acceptable job at 1-3?

Delivering Features (Securely)

A lot of development work is business as usual, delivering features that are a lot like the other features that you’ve already done: another screen, another API call, another report or another table. There are a few basic security concerns that you need to keep in mind when you are doing this work. Make sure that problems caught by your static analysis tool or security tests are reviewed and fixed. Watch out in code reviews for proper use of frameworks and libraries, and for error and exception handling and defensive coding.

Take some extra time when a security story comes up (a new security feature or a change to security or privacy requirements), and think about abuser stories whenever you are working on a feature that deals with something important like money, or confidential data, or secrets, or command-and-control functions.
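
One way to make abuser stories concrete is to turn each one into a check in the code and a negative test. The sketch below is a hypothetical money-transfer feature in Python; the `Account`, `transfer` and `TransferRejected` names and the three abuser stories are invented for illustration and are not taken from any real system.

```python
from dataclasses import dataclass


class TransferRejected(Exception):
    """Raised when a transfer request fails a security or business check."""


@dataclass
class Account:
    owner: str
    balance_cents: int


def transfer(requested_by: str, source: Account, target: Account,
             amount_cents: int) -> None:
    # Abuser story: "As an attacker, I move money out of an account I don't own."
    if requested_by != source.owner:
        raise TransferRejected("requester does not own the source account")
    # Abuser story: "As an attacker, I send a negative amount to pull money to me."
    if amount_cents <= 0:
        raise TransferRejected("amount must be positive")
    # Abuser story: "As an attacker, I overdraw the account."
    if amount_cents > source.balance_cents:
        raise TransferRejected("insufficient funds")
    source.balance_cents -= amount_cents
    target.balance_cents += amount_cents


def is_rejected(requested_by: str, source: Account, target: Account,
                amount_cents: int) -> bool:
    """Test helper: True if the transfer is refused."""
    try:
        transfer(requested_by, source, target, amount_cents)
    except TransferRejected:
        return True
    return False
```

Each abuser story then gets a test asserting that `is_rejected(...)` is true and that both balances are unchanged afterwards.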

Heavy Lifting

You need to think about security any time you are doing heavy lifting: large-scale refactoring, upgrading framework code or security plumbing or the run-time platform, introducing a new API or integrating with a new system. Just like when you are first building out the app, spend extra time threat modeling, and be more careful in testing and in reviews.

Security Sprints

At some point later in development you may need to run a security Sprint or hardening Sprint – to get the app ready for release to production, or to deal with the results of a pen test or vulnerability scan or security audit, or to clean up after a security breach.

This could involve all or only some of the team. It might include reviewing and fixing vulnerabilities found in pen testing or scanning. Checking for vulnerabilities in third party and Open Source components and patching them. Working with ops to review and harden the run-time configuration. Updating and checking your incident response plan, or improving your code review or threat modeling practices, or reviewing and improving your security tests. Or all of the above.
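
To show the shape of the "checking for vulnerabilities in third party and Open Source components" step, here is a deliberately simplified Python sketch. Everything in it is an assumption for illustration: the component names, the versions and the `fixed_in` advisory table are made up, and the version parsing only handles plain dotted numbers. A real team would rely on a dedicated dependency scanner and a maintained vulnerability feed rather than a hand-built table.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a plain dotted version such as '1.2.10' into (1, 2, 10)."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(dependencies: Dict[str, str],
                    fixed_in: Dict[str, str]) -> List[str]:
    """Report components whose installed version is older than the fixed one."""
    findings = []
    for name, installed in sorted(dependencies.items()):
        fix = fixed_in.get(name)
        if fix is not None and parse_version(installed) < parse_version(fix):
            findings.append(f"{name} {installed} is below fixed version {fix}")
    return findings


# Hypothetical inventory and advisory data, for illustration only.
dependencies = {"example-http-lib": "1.2.3", "example-json-lib": "2.0.0"}
fixed_in = {"example-http-lib": "1.2.10", "example-xml-lib": "4.1.0"}

for finding in find_vulnerable(dependencies, fixed_in):
    print(finding)
```

With the made-up data above, only `example-http-lib` is reported, because 1.2.3 is older than the fixed version 1.2.10. Note that versions are compared as numbers, not as strings.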

Adding Security into Sprints. Just Do It.

Adding security into Sprints doesn’t have to be hard or cost a lot. A stripped down approach like this will take you a long way to building secure software. And if you want to dig deeper into how security can fit into Sprints, you can try out Microsoft’s SDL for Agile. Just do it.

Wednesday, November 19, 2014

Different Ways of Scaling Agile

At this year's Construx Software Executive Summit one of the problems that we explored was how to scale software development, especially Agile development, across projects, portfolios, geographies and enterprises. As part of this, we looked at 3 different popular methods for scaling Agile: LeSS (Large Scale Scrum), SAFe (Scaled Agile Framework), and DAD (Disciplined Agile Delivery).

LeSS and LeSS Huge - Large Scale Scrum

Craig Larman, the co-author of LeSS (and LeSS Huge - for really big programs), started off by criticizing the "contract game" or "commitment game" that management, developers and customers traditionally play to shift blame upfront for when things (inevitably) go wrong on a project. It was provocative and entertaining, but it had little to do with scaling Agile.

He spent the rest of his time building the case for restructuring organizations around end-to-end cross-functional feature teams who deliver working code rather than specialist component teams and functional groups or matrices. Feature teams can move faster by sharing code and knowledge, solving problems together and minimizing handoffs and delays.

Enterprise architecture in LeSS seems easy. Every team member is a developer - and every developer is an architect. Architects work together outside of teams and projects in voluntary Communities of Practice to collaborate and shape the organization's architecture together. This sounds good - but architecture, especially in large enterprise environments, is too important to try and manage out-of-band. LeSS doesn't explain how eliminating specialization and working without upfront architecture definition and architectural standards and oversight will help build big systems that work with other big systems.

LeSS is supposed to be about scaling up, but most of what LeSS lays out looks like Scrum done by lots of people at the same time. It's not clear where Scrum ends and LeSS starts.

SAFe - Scaled Agile Framework

There's no place for management in LeSS (except for Product Owners, who are the key constraint for success - like in Scrum). Implementing LeSS involves fundamentally restructuring your organization around business-driven programs and getting rid of managers and specialists.

Managers (as well as architects and other specialists) do have a role in SAFe - a more detailed and heavyweight method that borrows from Lean, Agile and sequential Waterfall development approaches. Teams following Scrum (and some XP technical practices) to build working code roll up into programs and portfolios, which need to be managed and coordinated.

In fact, there is so much for managers to do in SAFe as "Lean-Agile Leaders" that Dean Leffingwell spent most of his time enumerating and elaborating the roles and responsibilities of managers in scaling Agile programs and leading change.

Some of the points that stuck with me:

The easiest way to change culture is to have success. Focus on execution, not culture, and change will follow.
From Deming: Only managers can change the system - because managers create systems. Change needs to come from the middle.
Managers need to find ways to push decisions down to teams and individuals, giving them strong and clear "decision filters" so that they understand how to make their own decisions.

DAD - Disciplined Agile Delivery

Scott Ambler doesn't believe that there is one way to scale Agile development, because in an enterprise different teams and projects will deliver different kinds of software in different ways: some may be following Scrum or XP, or Kanban, or Lean Startup with Continuous Deployment, or RUP, or SAFe, or a sequential Waterfall approach (whether they have good reasons, or not so good reasons, for working the way that they do).

Disciplined Agile Delivery (DAD) is not a software development method or project management framework - it is a decision-making framework that looks at how to plan, build and run systems across the enterprise. DAD layers over Scrum/XP, Lean/Kanban or other lifecycles, helping managers make decisions about how to manage projects, how to manage risks, and how to drive change.

Projects, and people working in projects, need to be enterprise-aware - they need to work within the constraints of the organization, follow standards, satisfy compliance, integrate with legacy systems and with other projects and programs, and leverage shared resources and expertise and other assets across the organization.

Development isn't the biggest problem in scaling Agile. Changes need to be made in many different parts of the organization in order to move faster: governance (including the PMO), procurement, finance, compliance, legal, product management, data management, ops, ... and these changes can take a long time. In Disciplined Agile Delivery, this isn't easy, and it's not exciting. It just has to be done.

Scaling Agile is Hard, but it's worth it

Almost all of us agreed with Dean Leffingwell that "nothing beats Agile at the team level". But achieving the same level of success at the organizational level is a hard problem. So hard that none of the people who are supposed to be experts at it could clearly explain how to do it.

After talking to senior managers from many different industries and different countries, I learned that most organizations seem to be finding their own way, blending sequential Waterfall stage-gate development and large-scale program management practices at the enterprise-level with Agile at the team level. Using Agile approaches to explore ideas and requirements, prototyping and technical spikes to help understand viability and scope and technical needs and risks early, before chartering projects. Starting off these projects with planning and enough analysis and modeling upfront to identify key dependencies and integration points, then getting Agile teams to fill in the details and deliver working software in increments. Managing these projects like any other projects, but with more transparency into the real state of software development - because you get working software instead of status reports.

The major advantage of Agile at scale isn't the ability to react to continuous changes or even to deliver faster or cheaper. It's knowing sooner whether you should keep going, or if you should stop and do something else instead.

Monday, April 14, 2014

Agile - What’s a Manager to Do?

As a manager, when I first started learning about Agile development, I was confused by the fuzzy way that Agile teams and projects are managed (or manage themselves), and frustrated and disappointed by the negative attitude towards managers and management in general.

Attempts to reconcile project management and Agile haven't answered these concerns. The PMI-ACP does a good job of making sure that you understand Agile principles and methods (mostly Scrum and XP with some Kanban and Lean), but is surprisingly vague about what an Agile project manager is or does. Even a book like the Software Project Manager’s Bridge to Agility, intended to help bridge PMI's project management practices and Agile, fails to come up with a meaningful job for managers or project managers in an Agile world.

In Scrum (which is what most people mean when they say Agile today), there is no place for project managers at all: responsibilities for management are spread across the Product Owner, the Scrum Master and the development team.

We have found that the role of the project manager is counterproductive in complex, creative work. The project manager’s thinking, as represented by the project plan, constrains the creativity and intelligence of everyone else on the project to that of the plan, rather than engaging everyone’s intelligence to best solve the problems.
In Scrum, we have removed the project manager. The Product Owner, or customer, provides just-in-time planning by telling the development team what is needed, as often as every month. The development team manages itself, turning as much of what the product owner wants into usable product as possible. The result is high productivity, creativity, and engaged customers.

We have replaced the project manager with the Scrum Master, who manages the process and helps the project and organization transition to agile practices.

Ken Schwaber, Agility and PMI, 2011

Project Managers have the choice of becoming a Scrum Master (if they can accept a servant leader role and learn to be an effective Agile coach – and if the team will accept them) or a Product Owner (if they have deep enough domain knowledge and other skills), or find another job somewhere else.

Project Manager as Product Owner

The Product Owner is a command-and-control position responsible for the “what” part of a development project. It's a big job. The Product Owner owns the definition of what needs to be built, decides what gets done and in what order, approves changes to scope and makes scope / schedule / cost trade-offs, and decides when work is done. The Product Owner manages and represents the business stakeholders, and makes sure that business needs are met. The Product Owner replaces the project manager as the person most responsible for the success of the project (“the one throat to choke”).

But they don’t control the team’s work, the technical details of who does the work or how. That’s decided by the team.

Some project managers may have the domain knowledge and business experience, the analytical skills and the connections in the customer organization to meet the requirements of this role. But it’s also likely to be played by an absentee business manager or sponsor, backed up by a customer proxy, a business analyst or someone else on the team without real responsibility or authority in the organization, creating potentially serious project risks and management problems. Some organizations have tried to solve this by sharing the role across two people: a project manager and a business analyst, working together to handle all of the Product Owner’s responsibilities.

Project Manager as Scrum Master

It seems like the most natural path for a project manager is to become the team’s Scrum Master, although there is a lot of disagreement over whether a project manager can be effective – and accepted – as a Scrum Master, whether they will accept the changes in responsibilities and authority, and be willing to change how they work with the team and the rest of the organization.

The Scrum Master is a “process owner” and coach, not a project manager. They help the team – and the Product Owner – understand how to work in an Agile process framework, what their roles and responsibilities are, set up and guide the meetings and reviews, and coach team members through change and conflict.

The Scrum Master works as a servant leader, a (nice) process cop, a secretary and a gofer. Somebody who supports the team and the Product Owner, “carries food and water” for them, tries to protect them from the world outside of the project and helps them solve problems. But the Scrum Master has no direct authority over the project or the team and does not make decisions for them, because Agile teams are supposed to be self-directing, self-organizing and self-managing.

Of course that’s not how things start off. Any group of people must work their way through Tuckman’s 4 stages of team development: Forming-Storming-Norming-Performing. It’s only when they reach the last stage that a group can effectively manage themselves. In the meantime, somebody (the Scrum Master / Coach) has to help the team make decisions that they aren’t ready to make on their own. It can take a long time for a team to reach this point, for people to learn to trust each other – and the organization – enough. And it may not last long, before something outside of the team’s control sets them back: a key person leaving or joining the team, a change in leadership, a shock to the project like a major change in direction or cuts to the budget. Then they need to be led back to a high performing state again.

Coaching the team and helping them out can be a full-time job in the beginning. After the team has got together and learned the process? Not so much. Which is why the Scrum Master is sometimes played part-time by a developer or sometimes even rotated between people on the development team.

But even when the team is performing at a high level, there’s more to managing an Agile project than setting up meetings, buying pizza and trying to stay out of the way. I've come to understand that Agile doesn't make a manager’s job go away. If anything, it expands it.

Managing Upfront

First, there’s all of the work that has to be done upfront at the start of a project – before Iteration Zero. Identifying stakeholders. Securing the charter. Negotiating the project budget and contract terms. Understanding and navigating the organization’s bureaucracy. Figuring out governance and compliance requirements and constraints, what the PMO needs. Working with HR, line managers and functional managers to put the team together, finding and hiring good people, getting space for them to work in and the tools that they need to work with. Lining up partners and suppliers and contractors. Contracting and licensing and other legal stuff.

The Product Owner might do some of this work - but they can't do it all.

Managing Up and Out

Then there’s the work that needs to be managed outside of the team.

Agile development is insular, insulated and inward-looking. The team is protected from the world outside so they can focus on building features together. But the world outside is too important to ignore. Every development project involves more than designing and building software – often much more than the work of development itself. Every project, even a small project, has dependencies and hand-offs that need to be coordinated with other teams in other places, with other projects, with specialists outside of the team, with customers and partners and suppliers. There is forward planning that needs to be done, setting and tracking drop-dead dates, defining and maintaining interfaces and integration points and landing zones.

Agile teams move and respond to change quickly. These changes can have impacts outside of the team, on the customer, other teams and other projects, other parts of the organization, suppliers and partners. You can try using a Scrum of Scrums to coordinate with other Agile teams up to a point, but somebody still has to keep track of dependencies and changes and delays and orchestrate the hand-offs.

Depending on the contracting model and your compliance or governance environment, formal change control may not go away either, at least not for material changes. Even if the Product Owner and the team are happy, somebody still has to take care of the paperwork to stay onside of regulatory traceability requirements and to stay within contract terms.

There are a lot of people who need to know what’s going on in a project outside of the development team – especially in big projects in big organizations. Communicating outwards, to people outside of the team and outside of the company. Communicating upwards to management and sponsors, keeping them informed and keeping them onside. Task boards and burn downs and big visible charts on the wall might work fine for the team, but upper management and the PMO and other stakeholders need a lot more: they need to understand development status in the overall context of the project or program or business change initiative.

And there’s cost management and procurement. Forecasting and tracking and managing costs, especially costs outside of development labor costs. Contracts and licensing need to be taken care of. Stuff needs to be bought. Bills need to be paid.

Managing Risks

Scrum done right (with XP engineering practices carefully sewed in) can be effective in containing many common software development risks: scope, schedule, requirements specification, technical risks. But there are other risks that still need to be managed, risks that come from outside of the team: program risks, political risks, partner risks and other logistical risks, integration risks, data quality risks, operational risks, security risks, financial risks, legal risks, strategic risks.

Scrum purposefully has many gaps, holes, and bare spots where you are required to use best practices – such as risk management.
Ken Schwaber

While the team and the Product Owner and Scrum Master are focused on prioritizing and delivering features and resolving technical issues, somebody has to look further out for risks, bring them up to the team, and manage the risks that aren't under the team’s control.

Managing the End Game

And just like at the start of a project, when the project nears the end game, somebody needs to take care of final approvals and contractual acceptance, coordinate integration with other systems and with customers and partners, data setup and cleansing and conversion, documentation and training. Setting up the operations infrastructure, the facilities and hardware and connectivity, the people and processes and tools needed to run the system. Setting up a support capability. Packaging and deployment, roll out planning and roll back planning, the hand-off to the customer or to ops, community building and marketing and whatever else is required for a successful launch. Never mind helping make whatever changes to business workflows and business processes are required with the new system.

Project Management doesn't go away in Agile

There are lots of management problems that need to be taken care of in any project. Agile spreads some management responsibilities around and down to the team, but doesn’t make management problems go away. Projects can’t scale, teams can’t succeed, unless somebody – a project manager or the PMO or someone else with the authority and skills required – takes care of them.

Thursday, January 23, 2014

Can you Learn and Improve without Agile Retrospectives? Of course you can…

Retrospectives – bringing the team together on a regular basis to examine how they are working and identify where and how they can improve – are an important part of Agile development.

Scrum and “Inspect and Adapt”

So important that Schwaber and Sutherland burned retrospectives into Scrum at the end of every Sprint, to make sure that teams will continuously Inspect and Adapt their way to more effective and efficient ways of working.

End-of-Sprint retrospectives are now commonly accepted as the right way to do things, and are one of the more commonly followed practices in Agile development. VersionOne’s latest State of Agile Development survey says that 72% of Agile teams are doing retrospectives.

Good Retrospectives are Hard Work

Good retrospectives are a lot of work.

For the leader/Coach/Scrum Master who needs to sell them to the team – and to management – and build a safe and respectful environment to hold the meetings and guide everyone through the process properly.

For the team, who need to take the time to learn and understand together and act on what they've learned and then follow-up and actually get better at how they work.

So hard that there are several books written just on how to do retrospectives (Agile Retrospectives: Making Good Teams Great, The Retrospective Handbook, Getting Value out of Agile Retrospectives), as well as several chapters written about retrospectives in other books on Agile, and retrospective websites (including one just on how to make retrospectives fun) and a wiki and at least one prime directive for running retrospectives, and dozens of blog posts with suggestions and coaching tips and alternative meeting formats and collaborative games and tools and techniques to help teams and coaches through the process, to energize retrospectives or re-energize them when teams lose momentum and focus.

Questioning the need for Retrospectives

Because retrospectives are so much work, some people have questioned how useful running retrospectives each Sprint really is, whether they can get by without a retrospective every time, or maybe without doing them at all.

There are good and bad reasons for teams to skip – or at least want to skip – retrospectives.

Because not everyone works in a safe environment where people trust and respect each other, so retrospectives can be dangerous and alienating, a forum for finger pointing and blame and egoism.

Because they don’t result in meaningful change, because the team doesn’t act on what they find – or aren’t given a chance to – and so the meetings become a frustrating and pointless waste of time, rehashing the same problems again and again.

Because the real problems that they need to solve in order to succeed are larger problems that they don’t have the authority or ability to do anything about, and so the meetings become a frustrating and pointless waste of time….

Because the team is under severe time pressure, they have to deliver now or there may not be a chance to get better in the future.

Because the team is working well together, they've “inspected and adapted” their way to good practices and don’t have any serious problems that have to be fixed or initiatives that are worth spending a lot of extra time and energy on, at least for now. They could keep on trying to look for ways to get even better, or they could spend that time getting more work done.

Inspecting and Adapting – without Regular Retrospectives

Regular, frequent retrospectives can be useful – especially when you are first starting off in a new team on a new project. But once the team has learned how to learn, the value that they can get from retrospectives will decline.

This is especially the case for teams working in rapid cycles, short Sprints every 2 weeks or every week or sometimes every few days. As the Sprints get shorter, the meetings need to be shorter too, which doesn’t leave enough time to really review and reflect. And there’s not enough time to make any meaningful changes before the next retrospective comes up again.

At some point it makes good sense to stop and try something different. Are there other ways to learn and improve that work as well, or better than regular team retrospective meetings?

XP and Continuous Feedback

Retrospectives were not part of Extreme Programming as Kent Beck et al. defined it (in either the first or second edition).

XP teams are supposed to follow good engineering (at least coding and testing) practices and work together in an intelligent way from the beginning – it should be enough to follow the rules of XP, and fix things when they are broken.

XP relies on built-in feedback loops: TDD, Continuous Integration and continuous testing, pair programming, frequently delivering small releases of software for review. The team is expected to learn from all of this feedback, and improve as they go. If tests fail, or they get negative feedback from the Customer, or find other problems, they need to understand what went wrong, why, and correct it.

Devops and Continuous Delivery/Deployment

Delivering software frequently, or continuously, to production pushes this one step further. If you are delivering working software to real customers on a regular basis, you don’t need to ask the team to reflect internally, to introspect – your customers will tell you if you are doing a good job, and where you need to improve:

Are you delivering what customers need and want? Is it usable? Do they like it?

Is the software quality good – or at least good enough?

Are you delivering fast enough?

By understanding and acting on this feedback, the team will improve in ways that make a real difference.

Root Cause Analysis

If and when something seriously goes wrong in testing or production or within the team, call everyone together for an in-depth review and carefully step through Root Cause Analysis to understand what happened, why, what you need to change to prevent problems like this from happening again, and put together a realistic plan to get better.

Reviews like this, where the team works together to confront serious problems in a serious way and genuinely understand them and commit to fixing them, are much more important than a superficial 2-hour meeting every couple of weeks. These can be – and often are – make or break situations. Handled properly, this can pull teams together and make them much stronger. Never waste a crisis.

Kanban and Micro-Optimization

Teams following Kanban are constantly learning and improving.

By making work visible and setting work limits, they can immediately detect delays and bottlenecks, then get together and correct them. This micro-optimization at the task level, always tuning and fixing problems as they come up, might seem superficial, but the results are immediate (recognizing and correcting problems as soon as they come up makes more sense than waiting until the next scheduled meeting), and small improvements are all that many teams are actually able to make anyway.
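
The mechanics of work limits are simple enough to sketch. In the hypothetical Python example below, the board columns, card names and limits are all invented for illustration; the only idea taken from the text above is that a column holding more cards than its limit signals a bottleneck that the team should look at right away.

```python
from typing import Dict, List


def over_limit_columns(board: Dict[str, List[str]],
                       wip_limits: Dict[str, int]) -> List[str]:
    """Return the columns holding more cards than their work-in-progress limit."""
    return [
        column
        for column, cards in board.items()
        if column in wip_limits and len(cards) > wip_limits[column]
    ]


# Hypothetical board state: column name -> cards currently in that column.
board = {
    "To Do": ["card-7", "card-8", "card-9"],
    "In Progress": ["card-4", "card-5"],
    "Code Review": ["card-1", "card-2", "card-3"],
    "Done": ["card-0"],
}
# Hypothetical limits; columns without a limit are never flagged.
wip_limits = {"In Progress": 3, "Code Review": 2}

print(over_limit_columns(board, wip_limits))
```

Here only "Code Review" is over its limit (three cards against a limit of two), so that is where the team would stop and swarm before pulling in new work.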

Take advantage of audits and reviews

In large organizations and highly regulated environments, audits and other reviews (for example security penetration tests) are a fact of life. Instead of trying to get through them with the least amount of effort and time wasted, use them as valuable learning opportunities. Build on what the auditors or reviewers ask for and what they find. If they find something seriously missing or wrong, treat it as a serious problem, understand it and correct it at the source.

Moving Beyond Retrospectives

There are other ways to keep learning and improving, other ways to get useful feedback, ways that can be as effective or more effective and less expensive than frequent retrospectives, from continuous checking and tuning to deep dives if something goes wrong.

You can always schedule regular retrospective meetings if the circumstances demand it: if quality or velocity start to slide noticeably, or conflicts arise in the team, or if key people leave, or there’s been some other kind of shock, a sudden change in direction or priorities that requires everyone to work in a much different way, and start learning all over again.

But don’t tie people down and force them to go through a boring, time-wasting exercise because it’s the “right way to do Agile”, or turn retrospectives into a circus because it’s the only way you can keep people engaged. Find other, better ways to keep learning and improving.

Thursday, October 10, 2013

Don't You Know that Support is the Most Important Part of a Developer’s Job?

Agile development – because you are building working software faster and delivering it incrementally – forces development teams to face a common, fundamental problem: how to balance the work of developing new software with the need to support a system that is already being used in production, whether it’s the legacy system that you’re replacing, or the system that you are still building – and sometimes both.

This is especially a problem for Agile teams following Scrum. On the one hand, in order for the team to meet Sprint goals and commitments and to establish a velocity for future planning, the team is not supposed to be interrupted while they are doing their work. On the other hand, the point of working iteratively and incrementally in Scrum is to deliver working software early and frequently to the customer, who will want to use this software as soon as they can, and who will then need support and help using the software – help and support that needs to come from the people who wrote the software.

At some point, often still early in developing a system, these teams have to stop working in a bubble with their pretend Customer, and start working in the real world with real customers who have real demands.

Supporting Customers and Still Building New Software

This means that teams have to find a way to juggle support and maintenance work with design and development, to deal with rapidly changing priorities and interruptions and complaints and questions and the stress of fire fighting when things break, while still trying to deliver good quality software and hit deadlines.

It’s not easy to balance two completely different kinds of work with directly opposed goals and incentives and metrics. As Don Schueler explains in “The Fragile Balance between Agile Development and Customer Support”, development teams – even Agile teams working closely with their Customer – are mostly inward-looking, internally focused on delivery and velocity and cost and code quality and technical concerns. Support teams are outward-looking, focused on customer relationships and customer experience and completeness and minimizing operational risk.

Development is about being predictable and efficient: deliver to schedule and keep development costs down. Support is about being responsive and effective: listen to the customer, answer questions, fit in unplanned work, figure out problems and fix things right away. Development work is about flow, continuity, predictability, velocity, and, if managed correctly, is mostly under control of the team. Support and maintenance work is interrupt-driven, immediate, inconsistent and unpredictable – a completely different way of working and thinking. Development work requires the team to be drawn together so that they can collaborate on common goals and the design. Most maintenance and support work is disjointed and disconnected, smaller tasks that can be done by people working independently. Development, even in high pressure projects, is measured in weeks or months. Support and maintenance work needs to be done in days or hours or sometimes minutes.

Agile Support Models: Maintenance Victims

One way that teams try to handle support and maintenance is by sacrificing someone from the team: offering up a “maintenance victim” who takes on the support burden for the rest of the team temporarily, letting the others focus on design and development work. This includes taking calls from Ops or directly from customers, looking at logs, solving problems, fixing bugs. This could mean staying after hours to help troubleshoot or repairing a production problem or putting out a fix, and being on call after hours and on weekends.

The rest of the team tries to pretend that this victim doesn’t exist. If the victim isn’t busy working on support issues or fixing bugs found in production, they might work on fixing other bugs or maybe some other low-priority development work, but they are subtracted from the team’s velocity – nobody depends on them to deliver anything important.

Teams generally rotate someone through support and triage responsibilities for one or two Sprints. This way everyone at some point “shares the pain” and gets some familiarity with support problems and operational issues. There are also positive sides to being sacrificed to support. Developers get a chance to learn more about the system and stretch some of their technical skills, and get off of the hamster wheel of Sprint-after-Sprint delivery for a bit. And they get a chance to play the hero, step in and fix something important and make the customer happy.

Kent Beck and Martin Fowler in Planning Extreme Programming extend this idea to larger organizations by creating a small production support team: 2-4 developers who volunteer to focus on fixing bugs and dealing with production problems. Developers spend a couple of Sprints in production support, then rotate back to development work. Beck and Fowler recommend staggering rotations, making sure that at least one developer is in the first rotation and another in the second so that at least one member of the support team always knows about what is going on and what problems are being worked on.
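A staggered rotation like this can be sketched in a few lines. This is only an illustration, assuming two-Sprint stints that overlap by one Sprint; the function name and developer names are made up for the example and do not come from Beck and Fowler.

```python
def staggered_rotation(developers, sprints):
    """Return (staying, joining) support pairs, one pair per Sprint.

    Assumption: each developer serves two consecutive Sprints, and stints
    are offset by one Sprint, so whoever joins in one Sprint is the
    experienced person in the next.
    """
    if len(developers) < 2:
        raise ValueError("need at least two developers to stagger rotations")
    n = len(developers)
    schedule = []
    for s in range(sprints):
        staying = developers[s % n]        # second Sprint of their stint
        joining = developers[(s + 1) % n]  # first Sprint of their stint
        schedule.append((staying, joining))
    return schedule


for number, (staying, joining) in enumerate(
        staggered_rotation(["Ana", "Ben", "Caro", "Dev"], sprints=4), start=1):
    print(f"Sprint {number}: {staying} (staying) + {joining} (joining)")
```

The person who joins in one Sprint always carries over into the next, so someone on support always knows what problems are already being worked on.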

Sacrificing a maintenance victim or a team makes it possible for most of the rest of the team to move forward on development, while still meeting support commitments. This approach assumes that anyone on the team is capable of figuring out and fixing any problem in the system – that everyone is a cross-functional generalist. And this means that whoever is on this support rotation has to be good enough and experienced enough that they can deal with most issues without bringing in the rest of the team - you can’t rotate newbies through support and maintenance work, at least not without someone senior backing them up.

And you also have to be prepared for problems that are too big or too urgent for your maintenance victim to take care of on their own. Even with a dedicated team you may still need to build in some kind of slack or buffer to deal with emergencies and general helping out, so that you don’t keep blowing up Sprints. You can come up with a reasonable allowance based on “yesterday’s weather”, on how much support work the team has had to do over the last few weeks or months. If you can't make this work, if the entire team is spending too much time on support and fire fighting and pushing hot fixes, then you are doing something wrong and you have to get things under control before you build more software anyway.
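As a rough sketch of a “yesterday’s weather” allowance: average the support hours from the last few Sprints and set that much aside before planning new work. The three-Sprint window, the function name and the numbers below are all assumptions for illustration, not a prescribed formula.

```python
from statistics import mean


def support_allowance(recent_support_hours, capacity_hours, window=3):
    """Split next Sprint's capacity into (support reserve, hours left to plan).

    recent_support_hours: support hours spent in past Sprints, oldest first.
    capacity_hours: total team hours available in the next Sprint.
    window: how many recent Sprints to average (an assumed default).
    """
    if not recent_support_hours:
        return 0, capacity_hours
    recent = recent_support_hours[-window:]
    # Never reserve more than the team actually has.
    reserve = min(mean(recent), capacity_hours)
    return reserve, capacity_hours - reserve


reserve, planned = support_allowance([30, 42, 36, 48], capacity_hours=320)
print(f"Reserve {reserve:.0f}h for support, plan {planned:.0f}h of new work")
```

If the reserve keeps growing from Sprint to Sprint, that is the signal that support work is getting out of control.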

Kanban instead of – or inside of – Scrum

Rather than trying to shoehorn maintenance and support into time boxes, some teams have found that Kanban is much better structured than Scrum or XP to balance support, maintenance, and operations with new development work.

Kanban’s queuing model and use of task boards makes it easy to see what work needs to be done, what work is being done, who is doing it, what’s getting in the way, and when anything changes.

Kanban makes it easier to track and manage different kinds of work that require different kinds of skills and that don’t always fit nicely into a 1-week or 2-week time-box.

Kanban doesn’t pretend that you won’t be or can’t be interrupted – instead it helps you to manage interruptions and minimize their impact on the team. First, in Kanban you set limits on how much of different kinds of work the team can deal with at a time. This lets the team get control over work coming in, and stay focused on getting things done. Kanban’s queue-and-task model allows emergencies to pre-empt whatever work is in progress through escalation/priority lanes. And priorities can keep changing right up until the last minute – team members just pull the highest priority work item from the ready queue when they are free to take on more work, whether this is designing and developing a new feature, or fixing a bug, or dealing with a support issue.
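The pull model described above can be sketched roughly as follows. This is a toy, assuming a single work-in-progress limit for the whole board and an expedite lane that is allowed to exceed it; the class and item names are hypothetical, and real boards often give the expedite lane its own limit.

```python
import heapq
from itertools import count

EXPEDITE = 0  # assumed convention: lower number = higher priority


class KanbanBoard:
    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self._ready = []     # heap of (priority, sequence, title)
        self._seq = count()  # tie-breaker: first in, first out within a priority
        self.in_progress = []

    def add(self, title, priority):
        """Put a work item in the ready queue."""
        heapq.heappush(self._ready, (priority, next(self._seq), title))

    def pull(self):
        """Start the highest-priority ready item, or return None if blocked.

        Only expedite items may be pulled once the WIP limit is reached.
        """
        if not self._ready:
            return None
        priority, _, title = self._ready[0]
        if len(self.in_progress) >= self.wip_limit and priority != EXPEDITE:
            return None
        heapq.heappop(self._ready)
        self.in_progress.append(title)
        return title

    def finish(self, title):
        """Mark an in-progress item as done, freeing up capacity."""
        self.in_progress.remove(title)


board = KanbanBoard(wip_limit=2)
board.add("New report feature", priority=2)
board.add("Fix rounding bug", priority=1)
board.add("Refactor parser", priority=3)
print(board.pull())  # highest-priority ready item
print(board.pull())
print(board.pull())  # None: the WIP limit is reached
board.add("Production outage", priority=EXPEDITE)
print(board.pull())  # the emergency jumps the queue
```

Because priorities are only read at the moment someone pulls, they can keep changing until then without disturbing work that is already in progress.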

Kanban helps teams focus more on immediate, tactical issues. It’s a better model to follow when you have more maintenance and support work than new design and development, or when you have to assert control over a major problem or manage something with a lot of moving pieces like the launch of a new system.

Devops Changes Everything

Devops, as followed by organizations like Etsy and Facebook and Netflix (where they go so far as to call it NoOps), tries to completely break down the boundaries between development, maintenance, support and operations. Devops engages developers directly and closely in the support, maintenance and operations of the systems that they build. Developers who work in these organizations are not just writing code – they are part of a team running an online service-based business, which means that support work is as important as, and sometimes more important than, designing and writing more software.

In these organizations, developers are held personally responsible for their software, for getting it into production and making sure that it works. They are on call for problems with software that they worked on. They are actively involved in operations of the system, providing insight into how the system works and how it is running, in testing and configuring it and tuning it and troubleshooting problems.

Devops changes what developers work on and how they do it. They move away from project work and more towards fast feature development, fixing, tuning and hardening. Availability and reliability and performance and security and other operational factors become as important as – or more important than – delivery schedules and velocity. Developers spend more time thinking about how to make the system work, how to simplify deployment and setup and about the information that people need to understand what’s going on inside the system, what metrics and tools might be useful, how to handle bad data and infrastructure failures, what could go wrong when they make a change and who they need to check with and what they need to test for.

Maintenance and Support – Responsibility and Feedback

Whether developers need to – or even should – take first line support calls from users, they at least need to be part of second level and third level support, where problems are investigated and solved.

This is not just because they are usually the only people who can actually figure out and fix many problems.

Putting aside moral hazard arguments about whether it’s ethically acceptable for developers not to take full responsibility for the consequences of their decisions and the quality of their work, there are compelling advantages to developers being directly involved in supporting and maintaining the software that they work on.

The most important is the quality of the feedback that developers get from supporting a real system – feedback that is too valuable for them to ignore.

Real feedback on what you did right in building the system, and what you got wrong. Feedback on what you thought the customer needed vs. what they really need. What features customers really find useful (and what they don’t). Where the design is weak. Where most of your problems are coming from (the 20% of the code where 80% of the bugs are hiding), where the weaknesses are in your testing and reviews, where you need to focus and where you need to improve. Valuable insight into what you’re building and how you’re building it and how you plan and prioritize, and how you can get better.
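One simple way to find where most of your problems are coming from is to count production bugs by module and see how few modules account for most of them. The sketch below assumes you can export a list of module names, one entry per bug, from your bug tracker; the module names and the 80% cut-off are made up for illustration.

```python
from collections import Counter


def bug_hotspots(bug_modules, threshold=0.8):
    """Return the smallest set of modules (worst first) that together
    account for at least `threshold` of all recorded bugs."""
    counts = Counter(bug_modules)
    total = sum(counts.values())
    hotspots, covered = [], 0
    for module, n in counts.most_common():
        if covered >= threshold * total:
            break
        hotspots.append((module, n))
        covered += n
    return hotspots


bugs = ["billing"] * 12 + ["auth"] * 5 + ["reports"] * 2 + ["ui"] * 1
for module, n in bug_hotspots(bugs):
    print(f"{module}: {n} bugs")
```

The modules at the top of this list are the obvious first candidates for extra reviews and tests.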

When developers are called into fire fighting production incidents and Root Cause Analysis reviews they can learn enormous amounts about what it takes to build software for the real world. Thinking seriously about how problems happened and how to prevent them can change how you plan, design, build, test and deploy software; and how people work together as a team.

Farming all of this off to someone else, filtering it through a help desk or an offshore maintenance team, breaks these valuable feedback loops, with negative effects for everyone involved.

Peter Gillard-Moss explains how this happens:

In a startup, developers take care of problems themselves, well, because there isn’t anybody else to do it. But at some point things change:

“…managers decided that we were spending far too long investigating users’ problems and not long enough building the new features the business wanted. Developers needed to be more productive, and more productive meant developers developing more new features. To get developers to develop they need to be ‘in the zone’. They need headphones and big screens to glue their eyes to. They did not need petty interruptions like stupid users ringing up because they got a pop up saying their details will be resent when they tried to refresh.”

But by doing this, the development team became disconnected from the results of their work, and from their customers…

“A systems thinker would tell you this is wrong. You’ve gone from a system that connected a user to the team responsible with one degree of separation, to one that has three degrees of separation. Or think of it another way: the team producing the product, and responsible for improvements and fixes used to be one degree away from their end users, who use the product and are feeding back the product’s shortcomings and issues, but are now three degrees. And not even three degrees all of the time. The majority of the time the team won’t ever hear about most of the support issues. And most of the time the team won’t even have that much interaction with the team that does hear about most of the support issues.”

The result: Customers don’t get the support that they need. Developers don’t get the information that they need to understand how to make the system work better. A support team stuck in the middle with people just trying to keep things from getting worse and hoping to find a better job someday. It’s a self-reinforcing, negative spiral.

In our shop, support takes priority over development – always. Our senior developers work with operations to support the system, and are on call when we put new software in and on call if something goes wrong after hours. They can bring in anyone else from any team that they need for help. As a result, we have very few serious problems and these problems get fixed fast and fixed right. The experience that everyone gets from working in support helps them to design and write better, safer code. This has made the system more resilient and easier and less expensive to support and safer to set up and run and easier and safer to change. And it has made our organization better too. It’s brought developers and operations closer together, and closer to what’s important to the business.

Whether you call it “Agile” or not, there’s nothing more agile than a team that is working directly with customers, responding immediately to problems and changing requirements in a live system. While some developers and managers think of this as overhead, as sustaining engineering, and try to push it off to somebody else so that they can focus on “more strategic” work, others recognize that this is really the leading edge of software development, and the only way to run a successful software organization, and the only way to make software, and developers, better.

Wednesday, May 22, 2013

7 Agile Best Practices that You Don’t Need to Follow

There are many good ideas and practices in Agile development, ideas and practices that definitely work: breaking projects into Small Releases to manage risk and accelerate feedback; time-boxing to limit WIP and keep everyone focused; relying only on working software as the measure of progress; simple estimating and using velocity to forecast team performance; working closely and constantly with the customer; and Continuous Integration – and Continuous Delivery – to ensure that code is always working and stable.

But there are other commonly accepted ideas and best practices that aren’t important: if you don’t follow them, nothing bad will happen to you and your project will still succeed. And there are a couple that you are better off not following at all.

Test-Driven Development

Teams that need to move quickly need to depend on a fast, efficient testing safety net. With Test First Development or Test-Driven Development (TDD), there’s no excuse for not writing tests – after all, you have to write a failing test before you write the code. So you end up with a good set of working automated tests that ensure a high level of coverage and regression protection.
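For anyone who hasn’t worked this way, here is a minimal sketch of the test-first cycle using Python’s built-in unittest module. The parse_price function and its rules are invented for the example, and it assumes prices have at most two decimal places.

```python
import unittest


# Step 1: write the tests first. Run at this point, before parse_price
# exists, they fail with a NameError.
class ParsePriceTest(unittest.TestCase):
    def test_parses_dollars_and_cents(self):
        self.assertEqual(parse_price("$12.50"), 1250)

    def test_parses_whole_dollars(self):
        self.assertEqual(parse_price("$3"), 300)

    def test_rejects_negative_amounts(self):
        with self.assertRaises(ValueError):
            parse_price("-$1.00")


# Step 2: write just enough code to make the tests pass.
def parse_price(text):
    """Convert a price string such as "$12.50" into integer cents."""
    if text.startswith("-"):
        raise ValueError("negative prices are not allowed")
    dollars, _, cents = text.lstrip("$").partition(".")
    # Pad "5" to "50" and "" to "00" so tenths and whole dollars work.
    return int(dollars) * 100 + int(cents.ljust(2, "0"))
```

Saved in a file, the tests can be run with `python -m unittest <file>`. Step 3, not shown here, is to refactor with the tests as a safety net.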

TDD is not only a way of ensuring that developers test their code. It is also advocated as a design technique that leads to better quality code and a simpler, cleaner design.

A study of teams at Microsoft and IBM (Realizing Quality Improvement through Test Driven Development, Microsoft Research, 2008) found that while TDD increased upfront development costs by 15-35% (TDD demands developers change the way that they think and work, which slows developers down, at least at first), it reduced defect density by 40% (IBM) or as much as 60-90% (Microsoft) over teams that did not follow disciplined unit testing.

But in Making Software Chapter 12 “How Effective is Test-Driven Development” researchers led by Burak Turhan found that while TDD improves external quality (measured by one or more of test cases passed, number of defects, defect density, defects per test, effort required to fix defects, change density, % of preventative changes) and can improve the quality of the tests (fewer mistakes in the tests, tests that are easier to maintain), TDD does not consistently improve the quality of the design. TDD seems to reduce code complexity and improve reuse; however, it also negatively impacts coupling and cohesion. And while method and class-level complexity is better in code developed using TDD, project/package level complexity is worse.

People who like TDD like it a lot, so if you like it, do it. And even if you are not TDD-infected, there are times when working test first is natural – when you have to solve a specific problem in a specific way, or if you’re fixing a bug where the failing test case is already written up for you. But the important thing is that you write a good set of tests and keep them up to date and run them frequently – it doesn't matter if you write them before, or after, you write the code.

Pair Programming

According to the VersionOne State of Agile Development Survey 2012, almost 1/3 of teams follow pair programming – a surprisingly high number, given how disciplined pair programming is, and how few teams follow XP (2%) or Scrum/XP Hybrid (11%) methods where pair programming would be prescribed.

There are good reasons for pairing: information sharing and improving code quality through continuous, informal code reviews as developers work together. And there are natural times to pair developers, or sometimes developers and testers, together: when you’re working through a hard design problem; or on code that you’ve never seen before and somebody who has worked on it is available to help; or when you’re over your head in troubleshooting a high-pressure problem; or testing a difficult part of the system; or when a new person joins the team and needs to learn about the code and coding practices.

Some (extroverted) people enjoy pairing up, the energy it creates and the opportunities it provides to get to know others on the team. But forcing people who prefer working on their own or who don’t like each other to work closely together is definitely not a good idea. There are real social costs in pairing: you have to be careful to pair people up by skill, experience, style, personality type and work ethic. And sustained pair programming can be exhausting, especially over the long term – one study (Vanhanen and Lassenius 2007) found that people only pair between 1.5 and 4 hours a day on average, because it’s too intense to do all day long.

In Pair Programming Considered Harmful? Jon Evans says that pairing can also have negative effects on creativity:

“Research strongly suggests that people are more creative when they enjoy privacy and freedom from interruption … What distinguished programmers at the top-performing companies wasn’t greater experience or better pay. It was how much privacy, personal workspace and freedom from interruption they enjoyed,” says a New York Times article castigating “the new groupthink”.

And in “Still Questioning Extreme Programming” Pete McBreen points out some other disadvantages and weaknesses of pair programming:

Exploration of ideas is not encouraged, pairing makes a developer focus on writing the code, so unless there is time in the day for solo exploration the team gets a very superficial level of understanding of the code.
Developers can come to rely too much on the unit tests, assuming that if the tests pass then the code is OK. (This follows on from the lack of exploration.)
Corner cases and edge cases are not investigated in detail, especially if they are hard to write tests for.
Code that requires detail thinking about the design is hard to do when pairing unless one partner completely dominates the session. With the usual tradeoff between partners, it is hard to build technically complex designs unless they have been already been worked out in a solo session.
Personal styles matter when pairing, and not all pairings are as productive as others.
Pairs with different typing skills and proficiencies often result in the better typist doing all of the coding with the other partner being purely passive.

And of course pairing in distributed teams doesn't work well if at all (depending on distance, differences in time zones, culture, working styles, language), although some people still try.

While pairing does improve code quality over solo programming, you can get the same improvements in code quality, and at least some of the information sharing advantages, through code reviews, at less cost. Code reviews – especially lightweight, offline reviews – are easier to schedule, less expensive and less intrusive than pairing. And as Jason Cohen points out, even if developers are pair programming, you may still need to do code reviews, because pair programming is really about joint problem solving, and doesn’t cover all of the issues that a code review would.

Back to Jon Evans for the final word on pair programming:

The true answer is that there is no one answer; that what works best is a dynamic combination of solitary, pair, and group work, depending on the context, using your best judgement. Paired programming definitely has its place. (Betteridge’s Law strikes again!) In some cases that place may even be “much of most days.” But insisting on 100 percent pairing is mindless dogma, and like all mindless dogma, ultimately counterproductive.

Emergent Design and Metaphor

Incremental development works, and trying to keep design simple makes good sense, but attempting to define an architecture on the fly is foolish and impractical. There’s a reason that almost nobody actually follows Emergent Design: it doesn't work.

Relying on a high-level metaphor (the system is an "assembly line" or a "bill of materials" or a "hive of bees") shared by the team as some kind of substitute for architecture is even more ridiculous. Research from Carnegie Mellon University found that

… natural language metaphors are relatively useless for either fostering communication among technical and non-technical project members or in developing architecture.

Almost no one understands what a system metaphor is anyway, or how it is to be used, or how to choose a meaningful metaphor or how to change it if you got it wrong (and how you would know if you got it wrong), including one of the people who helped come up with the idea:

Okay I might as well say it publicly - I still haven't got the hang of this metaphor thing. I saw it work, and work well on the C3 project, but it doesn't mean I have any idea how to do it, let alone how to explain how to do it.
Martin Fowler, Is Design Dead?

Agile development methods have improved development success and shown better ways to approach many different software development problems – but not architecture and design.

Daily Standups

When you have a new team and everyone needs to get to know each other and needs more time to understand what the project is about; or when the team is working under emergency conditions trying to fix something or finish something under extreme pressure, then getting everyone together in regular meetings, maybe even more than once a day, is necessary and valuable. But whether everyone stands up or sits down and what they end up talking about in a meeting should be up to you.

If your team has been working well together for a while and everyone knows each other and knows what they are working on, and if developers update cards on a task board or a Kanban board or the status in an electronic system as they get things done, and if they are grown up enough to ask for help when they need it, then you don’t need to make them all stand up in a room every morning.

Collective Code Ownership

Letting everyone work on all of the code isn't always practical (because not everyone on the team has the requisite knowledge or experience to work on every problem) and collective code ownership can have negative effects on code quality.

Share code where it makes sense to do so, but realize that not everybody can – or should – work on every part of the system.

Writing All Requirements as Stories

The idea that every requirement specification can be written as User Stories in 1 or 2 lines on cards, that requirements should be too short on purpose (so that the developer has to talk to someone to explain what’s really needed) and insisting that they should all be in the same template form

“As a type of user I want some goal so that some reason…”

is silly and unnecessary. This is the same kind of simple-minded orthodoxy that led everyone to try to capture all requirements in UML Use Case format with stick men and bubbles 15 years ago.

There are many different ways to effectively express requirements. Sometimes requirements need to be specified in detail (when you have to meet regulatory compliance or comply with a standard or integrate with an existing system or implement a specific algorithm or…). Sometimes it’s better to work from a test case or a detailed use case scenario or a wire frame or some other kind of model, because somebody who knows what’s going on has already worked out the details for you. So pick the format and level of detail that works best and get to work.

Relying on a Product Owner

Relying on one person as the Product Owner, as the single solitary voice of the customer and the “one throat to choke” when the project fails, doesn't scale, doesn't last, and puts the team and the project and eventually the business at risk. It’s a naïve, dangerous approach to designing a product and to managing a development project, and it causes more problems than it solves.

Many teams have realized this and are trying to work around the Product Owner idea because they have to. To succeed, a team needs real and sustained customer engagement at multiple levels, and they should take responsibility themselves for making sure that they get what they need, rather than relying on one person to do it all.

Wednesday, May 15, 2013

Certified Agile: The PMI-ACP Exam

I sat for the Project Management Institute’s Agile Certified Practitioner (PMI-ACP) exam earlier this week. The PMI-ACP tests your understanding of common Agile development methods, values and practices. It focuses on basic Agile principles, and on Scrum and XP in detail, as well as fundamentals of Lean and Kanban.

Unlike the PMP, there is no Book of Knowledge which defines best practices and a process framework for this certification. Instead there is a certification content outline that explains at a high level the tools, techniques, knowledge and skills that you will be expected to know and will be tested on, and a reference list of books to read which includes some of the usual suspects. Out of this list I’d recommend reading Mike Cohn’s books on Agile Estimating and Planning and User Stories - they are useful for the exam and they're worth reading regardless. If you’re not working in an XP shop you should also read Kent Beck’s Extreme Programming Explained to make sure that you understand XP, and you must read up on the basics of Lean and Kanban. And of course you need to memorize the Agile Manifesto and the Twelve Principles of Agile Software Development front to back.

But I know from writing the PMP several years ago that experience and general reading aren’t enough to prepare for a PMI certification exam. PMI wants everyone who holds a certification to know the same things, and to share the same values and to think and act the same way. There’s an emphasis on orthodoxy – you’re tested not on what you would do (based on your experience and common practical knowledge), but what you should do according to PMI's definition of what “the right way” is to do something. And PMI’s exams are as much a test of your ability to read and write an exam as they are of the subject matter, with trick questions and trip-up answers and questions which are purposefully hard to understand, and even some extra questions thrown in which don’t make sense at all. Writing a test like this is not fun, although the PMI-ACP exam is certainly not as hard as the PMP exam - you shouldn’t need the 3+ hours that you’re given to complete this test.

So like others, I decided to use an exam prep guide to finish my studying.

The PMI-ACP Exam: How to Pass on Your First Try by Andy Crowe is a quick overview of the material that you should know for the exam. Easy to read and easy to follow, it defines key terms and “doing Agile right”, roles and responsibilities and rituals and tools, and covers communication and collaboration issues, and includes some sample questions (and access to a sample online exam). This is not an especially insightful book, but I found it useful for last minute review and cramming.

I did most of my studying with Mike Griffiths’ PMI-ACP Exam Prep: A Course in a Book for Passing the PMI Agile Certified Practitioner (PMI-ACP) Exam, a much more complete study guide, and a good overview of Agile development that is worth keeping and reading on its own. This book builds on materials that Griffiths published earlier on his blog and it is especially good on Agile reporting tools.

Griffiths is one of the experts who created the PMI-ACP program and so he understands what you need to know in depth, and he is a good writer. However, his book is harder to study from than Crowe’s, because it contains a lot more details and because it is structured around the artificial domains that PMI uses to describe Agile development. This results in several discontinuities, where an idea or practice is introduced under “Value Driven Delivery” and then continues later under “Adaptive Planning” or “Continuous Improvement” or one of the other domains (it is not necessary by the way to learn the domains for the exam).

If you have solid experience with Agile development (which you need in order to meet the qualifying bar), especially Scrum and XP, you should be able to pass the exam with the help of Griffiths’ guide and some general reading to fill in gaps.

Studying for the PMI-ACP has made me examine Agile development ideas and practices in more detail (which is why I decided to apply for the certification). But it hasn't changed how I think about Agile practices and methods or how I think you should follow them. I am just as convinced today as I was before that the key is not following some method in a pure way, but instead to build your own toolkit, to borrow what works from different methods and adapt it to your specific requirements, constraints and situation. And the more that you know and understand about Agile methods and practices, the more tools you have for your toolkit.

Wednesday, January 23, 2013

Design Doesn't Emerge from Code

I know a lot of people who are transitioning to Agile or already following Agile development methods. Almost all of them are using something based on Scrum at the core, mixed with common XP practices like Continuous Integration and refactoring and automated unit testing – pretty much how Mike Cohn says things should be done in his book Succeeding with Agile.

Emergent Design in Scrum and XP

But none of them are doing emergent design as Cohn describes it, or as Kent Beck explains how design is done in Extreme Programming: trying to get away without any upfront design and architecture work, coding features right away and relying on test-first development, refactoring and technical spikes to work out a design on the fly, one week or two weeks at a time.

“For the first iteration, pick a set of simple, basic stories that you expect will force you to create the whole architecture. Then narrow your horizon and implement the stories in the simplest way that can possibly work. At the end of this exercise you will have your architecture. It may not be the architecture you expected, but then you will have learned something.” Kent Beck

You don’t need upfront architecture and design?

Maybe it’s because everyone I know is working at scale – building big enterprise systems and online systems used by lots of customers, systems that have a lot of constraints and dependencies. Many of them are working on brownfield projects where you need to understand the existing system’s design and implementation first, before you can come up with a new design and before you can make any changes. Performance-critical, mission-critical systems in highly-regulated environments.

Emergent, incremental design doesn’t work for these cases. And it doesn’t scale to large projects or any project that has to be delivered along with other projects and that has specified integration points and dependencies – which is pretty much every project that I've ever worked on.

Bob Martin, another one of the people who helped define how Agile development should be done, thinks that this incremental approach to design is, well…

“One of the more insidious and persistent myths of agile development is that up-front architecture and design are bad; that you should never spend time up front making architectural decisions. That instead you should evolve your architecture and design from nothing, one test-case at a time. Pardon me, but that’s Horse Shit.”

Martin goes on to say that

“there are architectural issues that need to be resolved up front. There are design decisions that must be made early. It is possible to code yourself into a very nasty cul-de-sac that you might avoid with a little forethought.”

Architecture and Design in Disciplined Agile Delivery

The way that most people that I know approach Agile development is better described by Scott Ambler in Disciplined Agile Delivery, a model for scaling Agile to larger systems, projects and organizations. As Ambler’s research shows, almost all teams (86%) spend at least some time upfront (on average a month or more) on planning, scoping and architecture envisioning – what he calls the “Inception Phase” (borrowing from Rational’s Unified Process) or what most others call “Sprint 0” or “Iteration 0”.

This is time spent to understand the scope of the system at a high-level at least, and the constraints and dependencies that the project needs to work within. Time to model the main chunks of the system and their interfaces, and to choose a technical direction to start with.

Upfront architectural and design work doesn't have to take a lot of time. As Ambler points out, for many teams (except for some startups), a lot of architectural decisions have already been made for you:

“In practice, it’s likely you won’t need to do much initial architectural modeling: a large majority of project teams work with technical architecture decisions that were made years earlier. Your organization will most likely already have chosen a network infrastructure, an application-server platform, a database platform, and so on. In these situations your team will need to invest the time to understand those decisions and to consider whether the existing infrastructure build-out is sufficient (and if not, identify potential improvements).”

It’s when you have a real greenfield development project, when you don’t have anything to leverage and you’re doing something completely new, that you should spend more time on upfront thinking about design – not less.

Can you “be Agile” without Emergent Design?

Of course you can. Bob Martin points out that there’s nothing in “Agile Development” that says that you shouldn't do design upfront – as much design as you need to for the size of the system that you are building and the environment that you are working in.

You can and should do iterative, incremental design and development starting with a plan of where you are going and how you think that you are going to get there. As you go along and prove out your design and respond to feedback and deal with changes in requirements, this is where incremental design actually does come into play – handling changes in direction, filling in gaps, correcting misunderstandings. The design will change and maybe become something that you didn't expect. But you need a place to start from – designs don’t just emerge from code.

Wednesday, January 9, 2013

Hardening Sprints. What are they? Do you need them?

For anyone who is developing software using Scrum, XP or another incremental development approach, the idea of a “hardening sprint” or a “release iteration” is bound to come up. But people disagree about what a “hardening sprint” should include, when you need to do one, and if you should do them at all. There is a deep divide between people who recognize that spending some time on hardening is needed for many environments, and people who are adamant that allocating some time for hardening is a sign that you are doing some things – or everything – wrong.

Hardening to make sure that Done means Done

In a hardening sprint, the team stops focusing on delivering new features or architecture, and instead spends their time on stabilizing the system and getting it ready to be released.

For some people, hardening sprints are for completing testing and fixing work that couldn't be done – or didn't get done – earlier. This might include UAT or other final acceptance testing if this is built into a contract or governance model.

Mike Cohn recognizes that teams may need a “release sprint” at the end of each release cycle, because the team’s definition of “done” may not be enough – that a "potentially shippable product" and a system that is actually “shippable” or ready for production aren't the same thing. He suggests that after every 3-5 feature iterations, the team may want to schedule a release sprint to do work like expensive manual system and integration testing and extra reviews, whatever is needed to make sure that what they think is done, is actually done.

Anand Vishwanath, in “The end of regression, stabilisation, hardening or release sprints”, describes a common approach where teams schedule 1 or 2 stabilization sprints every 4-6 iterations to do regression testing and system testing in a staging environment, and then fix whatever bugs are found. As he points out, it’s hard to predict how much testing might be required and how long it will take to fix whatever problems are found, so the idea is to time box this work and then triage the results.

Because this can be an expensive and risky and stressful way to work, Vishwanath recommends following Continuous Delivery to build an automated test pipeline through to staging in order to catch as many problems as early as possible. This is a good idea, but most large projects, especially projects starting from a legacy code base, will still probably need some kind of hardening or integration testing phase at regular points regardless of what kind of continuous testing they are doing.

Some testing, like interoperability testing with other systems and operational testing, can’t be done effectively until later, when there is enough of a working system to do end-to-end testing, and some of this testing can only be done in staging (if you have a staging environment), or in production. For some systems, load testing, stress testing and soak testing also need to be left to later, because these teams don’t have access to a big enough test system to run high load scenarios before they get to production.

Is Hardening a sign that you aren't doing things right?

Not everyone thinks that scheduling a hardening sprint for testing and fixing like this is a good idea:

“[a hardening sprint] might take the cake for stupid things invented that has lead to institutionalized delusion and ‘Agile’ dysfunction.” Janelle Klein, Who Came up with the “Hardening Sprint”?

For many people, a hardening sprint or release sprint is a bad “process smell”: a sign that the team isn't working properly or thinking clearly:

“The problem with “hardening sprints” is that you are lying. You make believe your imaginary burndown during the initial sprints shows that you are approaching Done. But it’s a lie--you aren't getting any closer to being ready for Production until you begin your Test phase. You wrote a pile of code that you didn't test adequately. You don’t know how good it is, you don’t know how much work you have left to do, and you don’t know how much longer it will take, until you are deep into your Test phase.” Richard Kasperowski, Hardening sprints? Sorry, you’re not Agile

Ron Jeffries says that a hardening sprint for testing and fixing is a clear anti-pattern. I agree: if you need a separate sprint to fix bugs, then you’re doing something wrong. But that doesn't mean that you won’t need extra time to fix things before the system goes live – knowing that it is wrong doesn't make the bugs go away, you still have to fix them. As somebody else on this same discussion thread points out, there is a risk that your “definition of done” could fall short of what is actually needed by the customer, so you should plan for 1 or more hardening sprints before release, to double-check and stabilize things, just in case.

In these cases, the need for hardening sprints is a sign of a team’s immaturity (from a post by Paul Beavers):

A beginning agile team will prefer to schedule 6 hardening iterations after a 12 iteration development plan. This is “agile” to the hard core “waterfall guy”.
As time goes by, the team will mature a bit and you will see the seasoned agile team will shrink the number of required hardening iterations at the end, just because they understand they need to “fix” the high severity bugs as they go and QA understands they need to test closer and better early up in the release cycle.
Further down the road the team will notice that by adding a hardening iteration in the middle of the development cycle (and flushing out even lesser priority bugs earlier on in the process), it will help them to maintain cadence later on.
The final step of maturity is there when the team starts understanding “hardening is not required any more”, because they made fixing bugs part of their daily routines.

Hardening is whatever you need to do to Make the System Ready for Production

Another way of looking at hardening is that this is when you stop thinking about features and focus all of your time on the detailed steps of deploying, installing and configuring the system and making sure that everything is working from end to end. In a hardening sprint, your most important customers are operations and support, the people who are going to make sure that the system is running, rather than the end users.

For some teams, this kind of hardening can come as an ugly and expensive surprise, after they understand that what they need to do is to take a working functional prototype and make it ready for the real world:

“All those things that got skipped in the first phase - error handling, monitoring, administration - need to get put into the product.” Catherine Powell, The "Hardening Myth"

But a hardening sprint can also be when you take care of what operations calls hardening: reviewing and preparing the production environment and securing the run-time, tightening up access to production data, double-checking system and application configs, making sure that auditing is enabled properly, wiring the system in to operations monitoring and metrics collection, checking system dependencies like platform software versions and patch levels (and making sure that all of the systems are consistent, that there aren't any snowflakes), completing final security reviews and other review and release gates, and making sure that the people installing and running the software have the correct instructions.

This is also when you need to prepare your roll-back plan or recovery plan in case something bad happens with the release, and test your roll-back and recovery steps. Walk through and rehearse the release process and checklists, and make sure that everyone is prepared to roll out patches quickly after the release is done.
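The consistency check on platform versions and patch levels is one of the easier hardening steps to automate. Here is a minimal sketch in Python, not a definitive implementation. It assumes you can already collect an inventory of package versions per host; the host names, package names, version strings and the find_snowflakes helper are all hypothetical, made up for illustration. The sketch treats the most common version of each package as the expected one and reports any host that deviates.

```python
from collections import Counter


def find_snowflakes(inventory):
    """Return {host: {package: (actual, expected)}} for hosts whose
    package version differs from the most common version in the fleet.

    `inventory` maps host name -> {package name: version string}.
    A package missing on a host is reported with an actual value of None.
    """
    packages = {pkg for versions in inventory.values() for pkg in versions}
    snowflakes = {}
    for pkg in sorted(packages):
        # Assumption: the version installed on most hosts is the intended one.
        counts = Counter(versions.get(pkg) for versions in inventory.values())
        expected, _ = counts.most_common(1)[0]
        for host, versions in inventory.items():
            actual = versions.get(pkg)
            if actual != expected:
                snowflakes.setdefault(host, {})[pkg] = (actual, expected)
    return snowflakes


# Hypothetical inventory, as it might be gathered from your servers.
inventory = {
    "app01": {"openjdk": "1.7.0_11", "nginx": "1.2.6"},
    "app02": {"openjdk": "1.7.0_11", "nginx": "1.2.6"},
    "app03": {"openjdk": "1.7.0_09", "nginx": "1.2.6"},
}

for host, diffs in find_snowflakes(inventory).items():
    for pkg, (actual, expected) in diffs.items():
        print(f"{host}: {pkg} is {actual}, expected {expected}")
```

Using the majority version as the baseline is only a convenience for the sketch. In a real environment you would more likely compare each host against an explicitly approved baseline, because the majority can be wrong too.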

Hardening is something that you have to do

Some people see an obvious need for hardening sprints. For example, Dean Leffingwell includes hardening sprints in his “Scaled Agile Framework”, because there is some work that can only really be done in a final hardening phase:

Final exploratory and field testing
Checklist validation against release, QA and standards governance
Release signoffs if you need them
Ops documentation
Deployment package
Communicate release to everyone (hard to do in big companies)
Traceability etc for high assurance and regulatory compliance

Leffingwell makes it clear that hardening shouldn't include system integration, fixing high priority bugs, automating test scripts, user documentation, regression testing and code cleanup. There is other work that should be done earlier – but that, in the first year or so, will probably need to be done in a late hardening phase:

Cross-component integration, integration with third-party/customer
Integrated system-level testing
Final QA sign-offs
User doc finalization
Localization

Dan Rawsthorne explains that teams need at least one release sprint at first to get ready for release to production, because until you've actually done it, you don’t really know what you need to do. Release sprints include tasks like:

Exploratory testing to double check that key features are working properly
Stress testing/load testing/performance testing – testing that is expensive to set up and do
Interoperability testing with other production systems
Fix whatever comes out of this testing
Review and finish off any documentation
Train support and sales and customers on new features
Help with press releases and other marketing material

The Software Project Manager’s Bridge to Agility anticipates that teams will need at least a short hardening iteration before the system is ready for release, even if they frontload as much testing as possible. A release iteration is not a test-fix phase – it’s when you prepare for the release: capturing screenshots for marketing materials, final tests, small tweaks, finish documentation for whoever needs it, training. The authors suggest however that if some developers have any time left over in the release iteration, they can do some refactoring and other cleanup – which I think is bad advice, given that at this point you don’t want to be introducing any new variables or risks.

Disciplined Agile Delivery, a method that was developed by Scott Ambler at IBM to scale Agile practices to large organizations and large projects, includes a Transition Phase before each release to take care of:

Transition planning and coordination
End-of-lifecycle testing and fixing
Testing and rehearsing deployment
Data setup and migration
Pilots and beta testing (short UAT if necessary)
Reviewing and finalizing documentation
Preparing operations and support
Stakeholder training

This kind of transition can take almost no time, or it can take several weeks, depending on the situation.

Hardening – taking some time to make sure that the system is really ready to be released – can’t be avoided. The longer your release cycles, the further away development is from day-to-day production, the more hardening you need. Even if you've been doing disciplined testing and reviews in stream, you’re going to find some problems at the end. Even if you planned ahead for transition, you’re going to run into operational details that you didn't know about or didn't understand until the end.

When we first launched our platform from startup, we had to do hardening and stabilization work before going live to get the system ready, and some more work afterwards to deal with operational issues and requirements that we weren't prepared for. We included time at the end of subsequent releases for extra testing, deployment and roll back planning, and release coordination.

But as we shortened our release cycle, releasing less but more often, and as we built more fail-safes into the system and as we learned more about what we needed to do in ops, and as we invested more in simplifying and automating deployment and everything else that we could, we found that we didn't need any time outside of our regular iterations for hardening. We’re still doing hardening – but now this is part of the day-to-day job of building and releasing software.