Surviving on-call

12 May 2020

For the majority of my engineering career, I’ve been on-call in some capacity, whether that was looking after customer-facing systems serving 50k rpm, an internal service handling only internal traffic, or serving as a subject matter expert for a technology used in a bunch of systems I’ve rarely touched.

Much of the experience was trial by fire (my first couple of months at Envato included one of the longest sustained DDoS attacks we’ve seen). Despite this, I’ve managed to come out of it with lessons and tips I’d love to share.

Look after yourself

It should go without saying but here we are :slightly_smiling_face: Get enough sleep, eat properly and make sure you’re getting enough exercise. While it’s important to maintain this in your normal routine, it becomes even more important when you’re under a bit more stress than usual. You don’t have to be a power-lifting chef, but you should be avoiding the usual unhealthy food suspects and getting at least some exercise to disconnect.

Don’t wait for alerts

My number one tip: don’t sit around waiting to be paged. Being on call doesn’t mean you’re under house arrest, or even that you can’t do other activities. The premise of being on-call is that someone is available to take on emerging issues when they happen. Don’t sit on Slack (or whatever other team messenger tool you use). Don’t sit on your email. I’ve taken alerts on the side of the road, while herding cattle and while riding dirt bike trails. None of these scenarios changed the outcome of the event and, if anything, they helped form the mindset that on-call isn’t an impediment to my day-to-day life but just another part of it.

Be prepared

Over the course of my on-call career I’ve come up with a few tools that have made my on-call experience more enjoyable and less nerve-racking. Now, I want to preface this by saying that the tools don’t make the craftsman. You can do on-call without ANYTHING in this list but, if you can afford it and want to make your on-call experience that little bit better, this is a great start.

  • An on-call bag. Yeah, not the most fashionable thing, but if you have a single place where your on-call stuff all lives, you can grab and go as needed. The only thing I need to add to my on-call bag is my laptop, so I can grab it, put it in the bag and leave knowing I have everything I need for a call out. For a long time, I had the Cocoon Slim backpack, which also had a nice Grid-It securing system in the front that worked a treat. I’ve since moved on to a Dakine backpack, which I also use for air travel carry-on and snowboarding. Any backpack will do though.
  • An internet connection. Pretty self-explanatory. I use my iPhone via tethering but others have used dedicated hotspots (like MiFi).
  • Corded headphones. I’m a big fan of my Bose QC35s for day-to-day use but when I’m on call, I need something that doesn’t rely on batteries and works with both my iPhone and my laptop. For this reason, I use the previous generation Apple earphones with the 3.5mm jack. When I need to use them with my iPhone, I plug them into the 3.5mm-to-Lightning adapter.
  • Battery pack. For whatever reason, the universe always decides the day you’re going to get a longer call out is also the day your devices aren’t going to be fully charged. I added a battery pack large enough to also charge my 13” MacBook Pro. This way, if I’m away from an outlet, I can plug the laptop into the battery pack and get another couple of hours of work done.
  • Charger for laptop. Complementary to the previous point. You’ll often be within reach of an outlet; take a charger and plug in where you can.
  • A mouse. I’m not a track pad fan, so on any call out longer than a few minutes it’s infuriating to be stuck on the laptop track pad while trying to fix issues.

The biggest improvement from this list was putting everything in one bag. I could leave it near the door and, when I left the house, grab it and go without needing to think too much about it.

Use your secondary

Every on-call rotation should have a primary and a secondary. This ensures that if the primary misses an alert, it is automatically escalated to another person to look at. However, secondaries shouldn’t only be used when the primary misses an alert. Sometimes life happens. Sometimes you can’t plan ahead. And sometimes, you just need a break (especially on those hectic shifts). Use these times to poke your secondary and ask if they can take the override and share the load. I’ve seen far too many people not utilise a secondary, end up not fully present, and make mistakes that drew out incidents, all because they didn’t think they needed to schedule an override. This is give and take, so if you’re constantly offloading to a secondary, you need to return the favour.
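If it helps to see the primary, secondary and override relationship written down, here is a minimal sketch in Python. It isn’t how any particular paging tool works; the names (`Alert`, `who_to_page`, `escalate_after_minutes`) and the 15 minute default are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    summary: str
    minutes_unacknowledged: int  # how long the alert has gone without an ack


def who_to_page(
    alert: Alert,
    primary: str,
    secondary: str,
    override: Optional[str] = None,
    escalate_after_minutes: int = 15,  # assumed threshold, tune to taste
) -> str:
    """Return the person who should be paged for this alert right now."""
    # A scheduled override takes the primary's place as first responder.
    first_responder = override or primary
    # If nobody has acknowledged in time, escalate to the secondary.
    if alert.minutes_unacknowledged >= escalate_after_minutes:
        return secondary
    return first_responder


if __name__ == "__main__":
    fresh = Alert("checkout latency high", minutes_unacknowledged=2)
    print(who_to_page(fresh, primary="alice", secondary="bob"))
    print(who_to_page(fresh, primary="alice", secondary="bob", override="bob"))
```

The override is just a swap of the first responder, which is the point: asking for one changes who gets paged first, not how the escalation works.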

Have a preflight checklist

Do you know why first responders, pilots and astronauts all use checklists? It’s because they have proven time and time again to be effective at mitigating human error and limiting on-the-fly decision making. Using a checklist ensures that when someone is going on call you can confirm the required access is assigned, silly or commonly overlooked things are covered explicitly, and important details such as escalation or communication processes are made clear to the incoming on-caller. Regardless of how often you’re on-call, you should be going through the checklist every time. This ensures people don’t get complacent or miss critical updates, and it keeps the checklist up to date.
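A checklist doesn’t need to be anything fancy; a document or a wiki page works fine. If you’d rather keep it next to your tooling, a rough sketch in Python could look like the one below. The class names and the example items are hypothetical, so swap in whatever your team actually needs to confirm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckItem:
    description: str
    done: bool = False


@dataclass
class PreflightChecklist:
    items: List[CheckItem] = field(default_factory=list)

    def outstanding(self) -> List[str]:
        """Descriptions of every item that has not been ticked off yet."""
        return [item.description for item in self.items if not item.done]

    def ready(self) -> bool:
        """True only when every item has been confirmed."""
        return not self.outstanding()


# Hypothetical items; replace with your own team's list.
checklist = PreflightChecklist(
    items=[
        CheckItem("Required access is assigned and tested"),
        CheckItem("Paging app is installed and notifications are loud"),
        CheckItem("Escalation path is known"),
        CheckItem("Incident communication process is known"),
    ]
)

if __name__ == "__main__":
    checklist.items[0].done = True
    for description in checklist.outstanding():
        print("TODO:", description)
```

Every item starts unticked on each new shift, which matches the idea of going through the whole list every time rather than assuming last rotation’s answers still hold.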

Have you got something that you can’t do your on-call shifts without? Got a ProTip™ I missed? Let me know! I’d :heart: to hear them.