Lesson 18: The Operational Risk Nobody Talks About (Until It’s Too Late)

When your team leans on just one person for a key job, you’re always on thin ice. One sick day, one family emergency, one person leaving for another opportunity, and suddenly you’re stuck and scrambling. So, how do you make your team truly resilient without anyone feeling like you’re pushing them out? You start with redundancy.

Most companies get this risk in theory, but they usually ignore it until it smacks them in the face. There’s no early warning. It’s all smooth sailing until the one person who knows the ropes disappears, and the rest are left floundering because no one else actually knows what to do.

Everyone has seen this play out: maybe there’s an engineer who is the only one who can fix a broken system, or a support lead who holds all the history with a tough customer, or an ops lead whose process lives in their head and nowhere else. These are single points of failure, hidden until they’re not. It’s nobody’s fault. Cross-training feels like a luxury, and there is never enough time, so the problem stays low on the to-do list until things go sideways.

Maybe you already have great processes, detailed SOPs, checklists, the works. That’s good. But here’s the uncomfortable part: would everything still run if that one expert wasn’t here?

The Principle: Real Resilience Means No Heroes, Only Redundancy

A lot of teams lean on their superstars. It feels safe, until those stars are out. The only real safeguard is making sure at least two people know every critical process. Not because you want clones, but so the expertise is spread across the team instead of locked up with one person.

This isn’t about doubting your stars. It’s just reality. People leave, get sick, take parental leave, or need to unplug. If your whole process hangs on one person, that’s not a process; it’s just a hidden risk.

The solution’s not complicated, but teams keep putting it off. For every mission-critical role, there needs to be a main person and a backup who’s actually trained on the job. Not just a checkbox: the backup needs to do it for real, live. Keep a record. Give credit when they step up. Make it as visible as any other part of your process.

Here’s What This Looks Like for Real

Quick story: There was one engineer who handled every rollback on a core system. Not officially; that’s just how things evolved, because no one else had the experience. When he took time off, you guessed it: the system broke and needed rolling back. The “documentation” existed, but it was vague. No one had practiced. Cue confusion and stress. Eventually the team got it done, but it was messy, slow, and nerve-racking.

After that, we did things differently. We picked a specific backup for rollbacks and set up shadowing sessions so the backup watched the pro handle real problems. Then we flipped it: the backup did a live rollback, with the expert watching, not leading. Not a rehearsal, but a real event. We logged it. Every month, the backup runs a real rollback. No theory, just actual practice.

Next time the main engineer was away, the backup jumped in. No panic, no fumbling. Everything just worked. The time to fix dropped, and everyone felt calm because they’d been there before.

Redundancy worked because we didn’t just “train” the backup; we actually let them do the job.

How to Put Redundancy Into Practice

First, make a list of your most critical roles: Where would you be in trouble if someone didn’t show up? For those, you need a backup. Be honest: if a key person disappeared tomorrow, what would fall apart?
Next, assign a main and a backup. Not “anyone on the team.” Pick a name. Write it down. Make sure both people know their roles.
Time for cross-training. That means:
1. Shadowing: the backup watches the main do the job for real. Not a demo. Actual work with real questions.
2. Then, within a month, switch: the backup does the job live, with the main watching but not taking over. That’s the difference between knowing and actually being ready.
Check every month: Did the backup do the task at least once? Yes or no. Don’t overcomplicate. If you don’t keep it up, backups go stale and you’re back where you started.
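
If you keep the list in a spreadsheet or a script, the monthly yes/no check can be automated. What follows is only a minimal sketch in Python, not a prescribed tool: the names CriticalTask and monthly_check, the 31-day window, and the sample people and dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Everything here is hypothetical and for illustration only.

@dataclass
class CriticalTask:
    name: str
    main: str
    backup: Optional[str] = None             # a named person, never "anyone on the team"
    last_backup_run: Optional[date] = None   # last time the backup did the task live

def monthly_check(tasks, today, window_days=31):
    """Return (task name, reason) for every task that fails the yes/no check."""
    cutoff = today - timedelta(days=window_days)
    failures = []
    for task in tasks:
        if task.backup is None:
            failures.append((task.name, "no named backup"))
        elif task.last_backup_run is None:
            failures.append((task.name, "backup has never done it live"))
        elif task.last_backup_run < cutoff:
            failures.append((task.name, "backup run is stale"))
    return failures

# Sample register with invented entries.
tasks = [
    CriticalTask("core system rollback", main="engineer A", backup="engineer B",
                 last_backup_run=date(2024, 5, 20)),
    CriticalTask("key customer escalations", main="support lead"),
    CriticalTask("ops process", main="ops lead", backup="analyst C",
                 last_backup_run=date(2024, 3, 1)),
]

for name, reason in monthly_check(tasks, today=date(2024, 6, 1)):
    print(f"{name}: {reason}")
```

With the sample entries, the rollback passes, while the customer escalations and the ops process get flagged. The point of the sketch is that the check stays a plain yes or no per task, with the reason attached so someone can act on it.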

One thing about team culture: celebrate when the backup handles the job. Announce it in standups or retros. Everyone likes feeling needed, and a lot of folks worry that teaching others everything they know means they’ll be easy to replace. It helps when you praise them for teaching and lifting the team. Backup success means the main taught well.

The Honest Truth

You’ll get pushback. People guard their special expertise because it feels like job security. Sharing it, even when it makes sense, feels risky. It’s awkward, so respect that. But instead of saying, “We need a backup in case you’re gone,” try: “You’re the expert. Can you teach the team?” The first way sounds like you’re making someone disposable. The second way turns them into the teacher, the go-to person, the one who made everyone better.

You absolutely need redundancy, but the way you ask for it makes all the difference.

Frame it as growth, not threat. Backups succeeding proves the main person taught them well. Make it a win for the whole team.

So make the list. Name the backups. Get them hands-on, and track it. Celebrate when the system works. Seriously, that’s worth cheering.

Take five minutes and ask yourself: if your three key people all vanished next week, what would fall apart? That’s your starting list. Pick backups, plan some shadowing, and get started now, before an emergency forces your hand.

Next, we’ll talk about the “Release Boundary That Stopped Rollbacks.” Stay tuned.
