There are some people who love to bake. Then there are others who cook because they don’t like to follow a predefined recipe. They like to mix and match ingredients, taste the food, smell the aroma, and sometimes throw it all away and start anew.
Fast-growing companies do more cooking than baking. There is a level of uncertainty that comes with the business changing quickly. At Rally we’ve been fortunate to be part of a fast-growing organization that builds technology which helps people get better and lead healthier lives. We’ve done it by making product and engineering trade-offs, sometimes knowingly and sometimes without knowing.
Over the past four years we’ve grown from a monolithic architecture supporting a single web-only product to a 100+ microservice ecosystem supporting six major products, all unified via our identity platform on both web and mobile.
This post does not focus on how we grew to 100+ microservices. If you’d like to know that, please read our CTO’s Road to SOA part 1 and Road to SOA part 2. This story is about the debt we picked up along the way and how we’ve convinced our business partners not to sweep it under the rug.
With Growth Came Engineering Debt
Engineers do not intentionally jump at a chance to create more work for themselves down the road. Often, engineering debt is the result of successful growth. What worked for us a year ago may not necessarily work for us today. We’ve had to make our decisions based on the best knowledge available at the time.
In certain cases, our business mandated that we design with flexibility in mind. For example, in the insurance industry we let users through the front door based on employee data provided by the employer. This is called eligibility data. We built our eligibility backend to support multiple insurance companies right off the bat. The driver was that we had business commitments to multiple employers and insurance companies at the same time.
In contrast, when we built the tool which allows our patients to find doctors, we built it mostly to support our first insurance client. Only when we got our second partner did we invest in refactoring and adding multi-insurer support.
Similarly, when we started off we had a wellness product that was a monolithic app with its own identity system. Once our business grew, we realized that we’d need to re-use our identity and eligibility systems across our platform, which contained multiple products. It would have been too complex and costly to build a multi-app identity system when we did not know where the business would be heading. Once the business matured, the technology decision to split out the identity system became more apparent as we built more applications used by the existing user base.
Engineering investments that are directly aligned with what customers want are much easier to sell to your business partners. Investments that are a few degrees removed from an immediate business need are harder to sell.
All the Dependencies
Not all of our engineering debt comes from the changing business landscape. Like most product-driven organizations, we need to move quickly. To do that, we don’t write all of our software from scratch. We use libraries and frameworks to scale quickly so that we can focus on writing business logic rather than reinventing the wheel.
We’re a Scala shop and use the Play framework extensively. In 2017, we made the call across Rally Engineering to drive a unified upgrade of our services to a newer version of the Play framework. The driver for the effort was that we needed continued support plus all the great features that the updated version of the framework had to offer. Most of the trouble came from coordinating the upgrade of libraries and testing properly. We underestimated the complexity of the upgrade.
We had just come out of a library hell where we untangled a myriad of libraries by reducing them to a smaller set that we depend on and by making sure that these do not change very often. It is when they do change that things get painful. The Play framework upgrade required changes to these few fundamental libraries. It also required thorough testing (automated and manual) to make sure that our software still worked.
We failed to complete the upgrade. Our goal was 100% migration to the new Play framework. We got to less than 30% across Rally by the end of 2017.
Lessons We Learned
We failed for multiple reasons, some good and some bad. In the end it all boiled down to not having business support for the engineering investment work and not coordinating the complex upgrades more closely across Engineering.
It was a good lesson and reflected our growth as a company. What worked for us previously didn’t scale anymore. So we needed to change.
The 2018 Pivot
We’ve learned our lesson from past failure. In 2018, we made the following critical changes:
- Created two categories of engineering investments: centrally tracked and team-level tracked
- Implemented a visible accountability mechanism via a monthly leadership update
- Added all of the engineering investments we committed to into product roadmaps
Managing Investments Centrally or in a Distributed Manner
We did not want to swing the pendulum all the way away from how we’ve been managing engineering, as distributed and product-oriented teams. Instead, we picked six key engineering investments that we wanted to drive centrally; the rest were up to the individual teams.
The criteria we used to separate central from distributed investments were whether the investment impacted more than 50% of our teams across Rally and whether it had a high impact/urgency. Examples of central investments included the Play framework upgrade and a project called Neptune, which lets us create various environments on the fly using containers running the last known good versions of our production software.
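The criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule, not a tool we actually ran: the `Investment` type, the `is_central` function, and the team counts in the usage note are all hypothetical, and treating the two criteria as both required is an assumption about how they were combined.

```python
from dataclasses import dataclass


@dataclass
class Investment:
    """Hypothetical record for one engineering investment."""
    name: str
    teams_affected: int
    high_impact_or_urgency: bool


def is_central(investment: Investment, total_teams: int) -> bool:
    """Return True when the investment should be tracked centrally.

    Assumption: both criteria must hold, i.e. the investment touches
    more than 50% of teams AND has high impact/urgency.
    """
    if total_teams <= 0:
        raise ValueError("total_teams must be positive")
    share_of_teams = investment.teams_affected / total_teams
    return share_of_teams > 0.5 and investment.high_impact_or_urgency
```

With made-up numbers, `is_central(Investment("Play framework upgrade", 18, True), total_teams=20)` would classify the upgrade as central, while an investment touching 2 of 20 teams would stay with the individual team regardless of urgency.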
Central investments are complex and usually have a web of dependencies to tackle. Assigning a part-time program manager to manage these dependencies helps. To reduce risk, front-load your most important dependencies in the schedule so as not to block or slow down the rest of the teams.
For each central investment we assigned a central owner. Each owner was a leader who had deep knowledge in that specific area and strong respect across the organization. The owner acted as part project manager and part product owner. She defined the scope, made sure there were estimates from every team, ensured the initiative was in each team’s roadmap, put together the overall project plan, and tracked each effort’s progress.
We also gave individual teams the freedom to pick and commit to a minimum of two additional investments on their own. We settled on a total of 25+ investments across Engineering. These included everything from automation of semantic versioning to refactoring. We managed these investments in a distributed manner, with individual teams owning the delivery.
Getting Product Support
Rally teams have strong partnerships between product and engineering. Engineering leaders work through their product counterparts to include these investments in the common roadmaps. This approach is more personal and has an aspect of accountability.
Start by identifying the business reason why you’re doing this in the first place. To make sure you’re on the right path, look at alternative options and trade-offs. For example, what would happen if you did not make the investment? Turn the value gained into a metric: dollars, time saved, or quality improved through fewer bugs. Engineering investment means that you are going to move a bit slower now, but will move a lot faster when done. You’ll end up with updated, robust, and secure software as a result.
For example, we got product teams to agree to allocate time for Neptune because we promised to reduce a 30-day process to a 30-minute one. It helped that the individual contributor engineers were transparent with their product counterparts that this was a big area of inefficiency. We also explained to the product teams that the alternative to setting up standard environments with Neptune would be to custom build each one. That would leave us without an easy way to catch configuration issues before they reached production. Given that a significant portion of production issues were configuration-related, this point helped as well.
Our product teams have found different ways to allocate time to investments. Some have allocated the equivalent of 2-3 developers to doing nothing but investments. Some manage investments together with product work. Make sure your hiring plans account for the capacity needed to deliver on investments.
While a sense of urgency is important, be transparent about the delivery timing of the investment as well. Can it wait 1-2 months, or is your current framework version going to lose support by then? Every situation is different. If you have a lot of database performance issues, being able to get support for the version you run is paramount. We had this issue in the past, when we were using a MongoDB version that was about to lose support at a time when we had scalability problems.
Being thoughtful about the scope of the work is also important. Do you need to do everything, or can you split the work into smaller parts and prioritize the parts that are more critical? For example, one of our investments was related to security. We split out the highest risk areas and prioritized these above everything else. Not everything is a P0.
Finally, check in often on the progress you are making. Keep product updated on any changes in scope, priority, and completion of your investment projects. We do this on a monthly cadence.
We’re doing much better in 2018 in terms of completing our central investments. Even though we’ve hit roadblocks, checking in on a monthly basis helped us move these projects along.
The two most important lessons for central investments were to have strong engineering leaders help individual managers get unblocked and to make sure that all commitments are in product roadmaps. The central leaders defined the scope, created roadmaps with timelines, and unblocked individual managers when they hit a wall.
Finally, get product buy-in early and include business partners during planning. Be able to defend your investments on both the value they drive and the priority they’ve been given. Eventually you will be asked to rank these investments on the same scale as product initiatives. Use data to drive those decisions.