Implementing a team topology

Posted by Alex on February 24, 2021

Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation’s communication structure.

— Melvin E. Conway

Introduction

Team topologies are a concept created and popularised by Manuel Pais and Matthew Skelton. They describe a way of reshaping your organisation that improves the architecture of your software while reducing cognitive load and directing more energy and focus towards the streams of value that flow through your organisation.

Performing an inverse-Conway manoeuvre is a big deal. We’re talking about people’s livelihoods here. To do it, you need to collect skills in change management, communication, design, and potentially many more things. Critically you’ll need to figure out the communications pathways and relationships in your business that will get your plan actioned. You’ll need to figure out the logistics behind the action as well. If you can do that, you’re golden.

Those top-level streams of value are paramount. Getting them right helps your design resonate. It helps you build shared understanding. It helps you account for change over time and growth. Streams are flexible by nature, but it’ll help if your streams make sense from the start.

To do this, you’ll need a team behind you. More than that you’ll need friends. Folks who come together to work towards an answer to the questions:

Why am I spending so much time fighting my architecture? How do we build a sense of ownership? How do we balance short term priorities with long term strategy?

I’m going to talk about how my org did it — an org that is large by New Zealand standards but small globally.

I’ll walk you through the symptoms we had and how my org struck at the root cause, how I developed an idea and worked to get it signed off. I’ll finish with how we’re rolling out that change and offer some advice for you to take away.

One time I cooked two-minute noodles in 112 seconds, and that was pretty cool. This is much cooler.

Symptoms

In our technology department, we used to be extremely busy. We still are, but we used to be, too. We had various streams of work in progress such that every ten weeks, all of our teams would end up working on something different to what they worked on in the previous ten weeks. High WIP made things difficult to manage, and it caused us pain.

Teams could never really “own” a bit of software for longer than ten weeks. We couldn’t build tribal knowledge because we chopped and changed so regularly that by the time we returned to something (if we returned to it at all), we had lost the expertise we gained earlier to skill fade.

Mostly we took one pass at solving a problem for someone and never got to iterate on an issue. Our teams were (and still are but to a far lesser degree) funded on a project basis, and the project already had a tick in the box.

Because we never iterated on our work, we never returned to address technical debt. We never got to enhance successful features nor swing around to flourish the scythe at poorly performing features.

Knowing we would never return, we would oscillate between gold plating our work (and delivering late) and piling on technical debt (which would cause late delivery and more down the track instead).

Never returning leads to ‘ninja teams’ that are typical firefighters who do an excellent job of tackling small urgent work but struggle to create features of substance. This type of work focuses on short term aims and optics at the cost of long term progress.

Knowing we would never return, our stakeholders would ask us to reduce our estimates and do more work in less time. Or worse, ask us to cut non-functional work or work that other projects relied on (also, we had dependencies between work that ought to have been independent).

None of our teams had distinct specialities, focus areas or parts of the architecture they resonated with or cared for.

To deliver a unit of valuable work to a customer, an engineer was expected to work across 5+ codebases, 3+ languages, queues, streams, containers, lambda functions or services deployed on bare metal.

The cognitive load across these tasks precluded us from developing expertise. The footprint of our architecture relative to our level of staffing trapped us in a place where we could never gain or retain enough proficiency to deliver to our partners and stakeholders’ expectations.

Our engineers did a great job of treading water — but they should never have had to do that! We as leaders put them into a place where systemically, it was an effort to remain in place.

With continually increasing demand from our partners across the organisation, we couldn’t keep doing this. Not only did we need to staff up. We needed to organise ourselves better. We still do, but we used to, too.

What is a Team Topology?

Briefly, team topologies are the answer to the question, “How can I reshape my organisation to be more focused on the streams of value that flow through my business?” Developed by Matthew Skelton and Manuel Pais, team topologies gave me the tools to describe what I felt needed to change in my org: more focus on value, on products, and on teams and streams of business value.

Team topologies are about optimising for flow and reducing cognitive load of all kinds. They provide a ubiquitous language to talk to anyone in our business about value, interactions, and what it might look like to evolve our organisation humanly and humanely.

If you’re not familiar with the concepts and have some time, I highly recommend you have a quick read of the website.

Team topologies contain four basic topologies — stream aligned teams, enabling teams, complicated subsystem teams, and platform teams.

Topology | Description
Stream Aligned Team | Stream aligned teams cover the streams of value that flow through your business. These streams could be internal processes that differentiate your product or service; we could align them to the product or service itself.
Enabling Team | Enabling teams are experienced facilitators. We intend for them to unblock other teams around them.
Complicated Subsystem Team | Complicated subsystem teams have the experience, expertise, or business and domain context needed to maintain and improve complex or critical software safely and effectively.
Platform Team | Platform teams provide high-quality internal products to other groups in the organisation. Platform teams can cover cross-cutting concerns like authorisation and authentication, billing or internal developer tooling.

The fractal nature of these streams allows for growth. They also enable incremental adoption — this is a critically important factor for my enterprise, and I imagine others will find the value there as well.

As a concrete example, right now, we have one platform team, my team. Our platform team needs to support cross-cutting concerns over a business of many hundreds of staff and many development teams. We have a backlog that’s about four years long, and we’ve existed for four months.

That’s okay!

I’ll expand on this more later, but as buy-in grows, we can hire more teams to work in the platform space, and we can start splitting our platform space into streams of value of their own. That is, right now we have one platform team; in the future, I expect we’ll have an animal platform team, a customer platform team, et cetera. They will one day cover all of the valuable parts of our domain where cross-cutting concerns exist.

Teams interact with other groups, but often it’s not clear how they do so, or the how is not a considered (or even conscious) decision. Out of the box with team topologies, you get three modes for how your team should interact.

Mode | Description
Collaboration | In collaboration mode, teams pair up for a defined period to learn together things they share a mutual interest in.
X-as-a-Service | In as-a-service mode, one team provides something as a service. The consuming team treats the thing as a product; the consuming team can choose to buy it off the shelf or choose to do something else.
Facilitating | In facilitation mode, one team focuses on helping another team go faster or be better for some value of faster and better. I expect that the vast majority of enabling teams use this interaction mode.
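For readers who think in code, the four topologies and three interaction modes can be written down as a tiny data model. This is only a rough sketch: the class names, the `describe` helper and the two example teams are all hypothetical, invented here for illustration rather than taken from any real tooling.

```python
from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass(frozen=True)
class Team:
    name: str
    team_type: TeamType


@dataclass(frozen=True)
class Interaction:
    provider: Team
    consumer: Team
    mode: InteractionMode


def describe(interaction: Interaction) -> str:
    """Render one interaction as a line you could read off a whiteboard."""
    provider, consumer = interaction.provider, interaction.consumer
    return (
        f"{provider.name} ({provider.team_type.value}) -> "
        f"{consumer.name} ({consumer.team_type.value}): {interaction.mode.value}"
    )


# Hypothetical teams, for illustration only.
api_platform = Team("API platform", TeamType.PLATFORM)
mobile = Team("Mobile", TeamType.STREAM_ALIGNED)

# The same pair of teams can use different modes at different times.
interactions = [
    Interaction(api_platform, mobile, InteractionMode.COLLABORATION),
    Interaction(api_platform, mobile, InteractionMode.X_AS_A_SERVICE),
]

for interaction in interactions:
    print(describe(interaction))
```

Writing the interaction down as an explicit value is the point: it turns "how do these two teams work together?" into a decision someone has to make on purpose.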

Why Team Topologies?

Team topologies, an extension by the same authors on DevOps topologies (links below), is well-researched, well-cited, and the adoption is snowballing. It’s cutting edge, and it directly answers the pain my organisation felt.

It’s not the case with all frameworks or concepts someone will try to sell you, but team topologies directly address the root issues of that pain as well!

Some of the frameworks or concepts that I’ve read about seemed fluffy, difficult to apply to the real world or hard to implement. This is not the case with team topologies. I found it easy to re-model my organisation on a whiteboard. I found it easy to grab the nearest available executive (who happened to be the CIO) and talk through what I was thinking. We both found it easy to talk about what it might mean for us, how we might start it, and how we might measure how we are doing.

I rolled my whiteboard around the office and shopped it to different parts of my organisation, and each time I was heartened to see folks intuitively grasp the ideas. Because the model was simple, the ability to give feedback on the whiteboard was very accessible.

In this way, it was easy to get buy-in on the strength of the idea.

That ease put folks at peace while also enabling them to research for themselves; it cemented the ideas as credible and actionable. Even better, we had a clear place to start with rolling out a platform team focusing on our API platform that interacted in a collaborative and sometimes as-a-service manner. The API platform team was a great first team, and we could use it to build comfort and familiarity with the system. We could use that platform team for prototyping the wider rollout. We could roll out incrementally, starting with a top-level team for our API platform, and then convert existing units into top-level teams for their streams. We could later split those streams and add additional humans who would be more tightly focused on their fractal streams.

The ability to roll out incrementally is a crucial enabler. It means we don’t have to invest millions of dollars — we can change the org like we change software, quickly, adaptively and responsively. We don’t need to ask for an infusion of trust and an investment of money — we only need to ask for the first team and then show what we’ve measured and learned.

Incremental rollout is easier on the rest of our business function as well. It’s easier to hire, onboard and retain small numbers of staff. It’s easier to help those staff build relationships across the business, and it’s easier to track how we’re going — making it easier to ask for the next team and do it all over again. I can’t stress enough how critical this aspect of team topologies is for our business.

This low-restructure (frankly, you can never guarantee it) approach stands apart from more traditional ways of defining and creating a system of humans. I love it because of the degree of humanism I see in it. Nowhere in team topologies or any of the conversations around it do I hear humans referred to as resources.

The Process

Most organisation change is long-lived. It can be messy. It can be difficult. Humans are prone to making mistakes. We’re inclined to miss things. Before I asked my colleagues and friends to embark on any change, I needed a clear, shared understanding of where the organisation is at.

For my organisation, this meant defining what teams we have, what they are working on, how they act and how they feel. Do we have the coverage we need to support all of the software running in production? Are teams happy and healthy? Is there anything that’s causing them pain right now?

Defining these things on paper, visually rather than with paragraphs, is a forcing function for clarity and understanding (if folks can see it on a whiteboard, they’ll tell you when you have it wrong).

In my case, I saw teams that wanted to gold-plate solutions; we (systemically) had piled them high with code they would never have time to maintain. We asked them to context switch on an almost daily basis. Making a single change in a product could require changes in many supporting systems. This did not spark joy.

We knew the power of teams that cover vertical slices and focus on delivering incremental improvements to our customers, but our domain is complicated, and our architecture is large. A number of us across the technology department sensed (and still feel today) that only having vertical teams in a domain as complicated as ours holds us back. Those vertical teams needed support.

We burdened teams with too much cognitive load. We asked them to take on so much intrinsic and extraneous cognitive load that they had very little thinking space left for the germane cognitive load!

   
Cognitive load | Description
Intrinsic | Intrinsic cognitive load is the inherent level of difficulty associated with a certain system or domain.
Extraneous | Extraneous cognitive load is generated as a result of poor design by the designers of the system, and negatively affects those who interact with the system.
Germane | Germane cognitive load is the processing, construction and automation of knowledge. In other words, it’s the stuff we want to spend as much time on as possible.

We struggled through our symptoms, and credit to the teams — given what we asked of them, I think they did a great job. I think we treated them unfairly.

The Enemy of Art is the Absence of Limitations

Orson Welles

Our situation felt dire when catalogued this way — but constraints breed creativity. It took some research, but I eventually discovered the ideas Matthew Skelton and Manuel Pais had packaged up as “team topologies”.

I spent about a week on my initial drawings. I spent another three refining my ideas with the feedback of anyone who chanced a walk by my desk and everyone who had five minutes for a chat that inevitably ended an hour and a half later. I filled up 3.6m x 5.4m of whiteboard area — 20 square metres if you round up. The digital versions are far tidier (full credit to an excellent UI designer named Luiza) and more compact. I’ve inserted them below.

[Image: The front page of the team topologies I designed in April 2020]
[Image: How I thought we could roll this out]
[Image: Some business rules I thought we needed at the time]

There are a few caveats here — this isn’t the final concept we went with. It took further refinement at the exec level, we had to make it fit into the financial model of my org, and we had to think about the types of skills we needed in each team.

Additionally, while the work captured in the images above was all mine, we had an existing strategic plan for technology, and had spent a lot of time fulfilling that strategy. The work that I had done fed into these plans and offered language, context and shape to what we already felt we had to do.

At a very high level, I produced an organisational design with both vertical and horizontal teams, where all teams align to a stream of value. My organisation already has business units aligned around the value that we bring, as you’d expect. So each business unit/stream gets a top-level team initially. I designed these teams to be overwhelmed, and I expect them to have a long backlog initially. Once they’re up and running, the most valuable work or critical systems should naturally bubble up to the top of the backlog. The key for these teams is to remember they don’t need to boil the ocean, only demonstrate how we all work together.

I thought that once top-level teams are up and running, and work is being done, the next large job is to begin planning for additional teams. This means doing the work and getting value out to users. It also means using the demand in the business and bubbling up of work in the backlog as signals to identify fracture planes in each stream of value.

You can align streams of value in several ways. I don’t think there is any single way to do it. My org has streams of value that map to workflows in our products, to user journeys and platforms like mobile and web. I’m currently aligned with the top-level team of the API platform. One day, I’ll start cutting that up into streams about animal data, scientific processes, customer data, authentication and authorisation, or more. Nothing is true; everything is permitted.
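The way a top-level stream later splits into finer streams can be pictured as a small tree. Here is a minimal sketch, assuming a hypothetical `Stream` class with `split` and `leaves` helpers that I have made up purely for illustration; the stream names come from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stream:
    name: str
    teams: list[str] = field(default_factory=list)
    children: list[Stream] = field(default_factory=list)

    def split(self, *names: str) -> list[Stream]:
        """Add one child stream per name and return the new children."""
        new_streams = [Stream(name) for name in names]
        self.children.extend(new_streams)
        return new_streams

    def leaves(self) -> list[Stream]:
        """Return the streams that have not been split any further."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Start with a single top-level stream and one team.
api_platform = Stream("API platform", teams=["API platform team"])

# Later, cut the stream up along whatever fracture planes demand reveals.
api_platform.split("Animal data", "Customer data", "Authentication and authorisation")

print([leaf.name for leaf in api_platform.leaves()])
```

Nothing about the tree is fixed. A stream that stops making sense can be merged back or split a different way, which is the flexibility the fractal idea is meant to give you.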

This structure addresses most of the pain I had identified and validated. This structure is by no means perfect or final. I don’t believe there is such a structure. It seems clear that as our organisation grows and our needs change, we’ll need to revisit our design. The good news is that aligning to streams of value means that we can modify the way we are shaped by applying or removing teams from value streams.

One key issue that this structure does not address (and it’s worth stating explicitly) is product or value stream funding vs project funding. Arguably, many of the challenges we have had are directly attributable to our organisation being funded for discrete projects and not long-lived products or value streams. This discussion isn’t one I want to spend much time on because funding and team topologies are distinct and separate concerns. To a degree, we can map incoming projects to streams of value, so it’s not a huge deal now, though it certainly was before we were able to map stop/start work items onto long-lived bits of software.

A critical pivot point for value stream vs project funding is addressed by my model in that I’m advocating for a far greater influence in our system for product people. Each team in my design contains either a product owner, or a product manager, or both. Product folks should proactively engage with the wider business and collect all of those would-be projects and map them to products. With time I would expect to see the number of projects coming in from the wider business reduce as we scoop up increasing amounts of work and drive from within the technology department.

With some whiteboards to focus conversation onto ideas, the part that I struggle with most is next — communication. I’m not great at packaging what I’m saying for different audiences; I’m not proud to say that I’m responsible for more than a few phone calls between executives!

Communicating well what your model looks like, what it’s optimising for and importantly, what it’s not optimising for is the bread and butter of getting this change across the line. Your organisation will be unique in this area — it comes down to people. Who are the people you need to show this to? Whom do you need to convince? Who can you recruit as a signal booster to get your message into parts of the organisation you don’t have access to?

I probably spent the better part of 100 hours talking about this to people, taking feedback and refining the message, format and substance. Once I started hearing people talk about this in reference to other issues they’ve encountered in their work, I knew that they got it.

The next step was to do it. Both intimidating and exciting, it was time to turn the model into something tangible. Again every organisation will be different and have different levers. For my org, it was a mixture of strategy and finance. I had to convert this arbitrary concept I found on the internet into something that our leadership team could act on - that meant turning whiteboards into the same shapes that go into our budgets, with the same language we use in our strategy documents.

Something that helped a lot was to frame this idea in the same language we use today. Department leaders and I spent time collaborating on the next version of our plan. This plan was an extension of what was already occurring, and would help us scale effectively from five teams to more than fifteen! My input in this process was language and shape. The streams, the teams and the meaning behind them became a part of our strategy and a part of our plan. That input morphed from being a single data point to being “just the way we do things”. Importantly, this collaborative process made it easy for all of us to tie the benefits to current opportunities we had to improve on.

At that point, we had a plan that looked budget-able if you squint at it. We had a plan that we felt would give us room to grow while addressing pain and history. We published the idea in the relevant documents, and the machinery of the organisation swung into action to analyse, approve and implement.

The critical takeaway here is this: Don’t fight the plumbing. Co-opt it. Redirect the ship from the bridge, not the bow.

The Rollout

My organisation started with just one team, responsible for the shared core. The idea is that we start slow and take the time to get it right. Team interaction modes, the way we embody the agile manifesto, the way we apply software product management principles, and how we collaborate to create technology all need conscious thought and planning. Importantly, all of the ways the team come together to deliver value also need road testing. We planned to work in very different ways compared to what some parts of the business have seen before, and we must speak clearly about why we’re doing that and why we think it’s essential.

Consequently, we planned to do the road test, collect some wins, and earn some scars, informing us for when it comes time to scale up. Starting small is the best way to make these changes stick. The alternative is asking for all the teams upfront, having to hire 10+ teams while you’re trying to build a brand new way of working together. Organised Chaos would be a gentle description, and in doing this, we would be failing the business in what we promised. We’d be acting big and thinking small when really we want to think big and act small. We want iterative, not big bang.

Boyd Multerer, Xbox’s father of innovation, once told me this: Think very carefully about who the first member of the team is. Who are the founding members? You only get to make your culture once. It’s those people who will set it, and after those founders set it, it’ll be tough to change that culture.

I’m incredibly proud of the first team. The team is a mixture of internal and external hires, with a broad technical base and deep knowledge of the technology stack and our domain. We have a culture of writing, of collaboration first, and of always delivering incremental value. Boyd’s words ring true.

With the team in place, the time comes to loudly and explicitly make space for experimentation. There is so much discovery to do for a new team to move from forming to performing while figuring out the domain, the architecture, the strategy and everything else they’ll need to think big and act small. You might consider that first team to be the groundbreakers, but every new team you hire that looks and works that same way will need to make discoveries of their own.

All of the value streams that run through your business will be different, and those teams will need to engage and model themselves differently. What are the engagement modes, and what team topology will each stream require? How will we hire and onboard supporting teams? How will we draw the wider org into the conversation?

Most challenging of all, how will we build community? How will we move from being a bunch of people who don’t know each other well to being many people who have strong community links across the organisation? Those relationships will see those teams through hard times; building community is a top priority.

Every team will be different, and that’s okay.

We’re rolling out right now, so a lot of this is planned-but-not-actual, or actual-but-only-since-Tuesday, so forgive me that. These lessons are hot off the press and our journey is incomplete. I hope to perpetually remain with that feeling of incompleteness at the same time as I watch us grow, deliver, reshape and grow again.

Conclusions

Wrapping up, I have some closing thoughts:

  • Team Topologies can work well, but when you’re potentially throwing the whole org into upheaval, you’d better have done the homework.
  • Communicating well and getting the logistics right are the two most significant determinants of whether you deliver. Invest the time and energy to get those things right.
  • Your top-level streams of value are important; if you can identify those initial streams accurately, you’ll have an easier time figuring out how to split them when the time comes.
  • Humans are social creatures. Get buy-in. Get help. You can’t do it on your own.
  • We’re rolling out more teams and more streams faster than we thought we would. The ultimate mark of success in my book is more teams delivering more value to customers.

Citations and Further Reading