Properly closing out / solving Customer Emergency tickets
Request for comments
Need
We frequently do not properly close/solve emergency tickets. This often leads to unsatisfactory outcomes for customers and stress for SEs. We already have pretty clear guidance: https://about.gitlab.com/handbook/support/workflows/customer_emergencies_workflows.html#when-the-emergency-is-resolved – but I know, both from my own actions and from what I frequently see happening, that we don't follow it as consistently as we should.
Here are two recent examples of my own:
https://gitlab.zendesk.com/agent/tickets/422214
An emergency from this Saturday: the call went almost 30 minutes past the end of my shift because of concurrent emergencies happening over the weekend. The situation was defused by then, but the expectation on both sides was that this would continue (as an emergency) the next day; see my Slack message. Despite that, I should have solved the emergency with my summary and included the usual "send a NEW email" stanza. I could even have merged it back into https://gitlab.zendesk.com/agent/tickets/420990 and said we'd continue any async work (besides possible follow-up emergency work) there, which would have closed the emergency ticket: even better.
Why didn't I solve it? The situation was defused, but it was very much not resolved. I was still coming down from the adrenaline rush of "several hours troubleshooting uncharted waters alone on the weekend", and the last thing we agreed was that they would provide some logs on the ticket. Solving it just didn't feel right. (But it would have been!) Additionally, I was already working longer hours on a weekend shift at that point, so I'll happily admit that my motivation to stay even longer and create the follow-up ticket wasn't particularly high. I could (should?) have asked my AMER successors to do it, but there were already other emergencies they were handling.
Unfortunately, the customer kept posting on the emergency ticket shortly afterwards, and I didn't see it until I came back from PTO today. It had been STARred. Overall, a pretty bad outcome.
https://gitlab.zendesk.com/agent/tickets/414170
A license emergency with a normal ticket running in tandem. You can (for two more months, until the retention period runs out!) see how I made the completely incorrect decision to merge the normal ticket into my emergency ticket, and immediately regretted it. There was really no good reason for it other than "it felt more like going with the flow to do it that way". The consequence was an "Emergency" ticket that stayed open for days and breached NRT a couple of times because of an SLA that no longer made sense, while we were trying to hunt down the right Sales people to handle the situation. I could have kept advocating for that customer as a CC on a follow-up ticket just as well.
Approach
Use existing guidance
Considering we already have the guidance linked above:
- Follow it.
- Identify when and why we don't follow it and find solutions.
- If you're already following it strictly and are wondering what the heck I'm talking about: share your secrets and wisdom! 😅
Tooling
Is there anything to be done with "tooling" to help? When I merged #414172 (normal) into #414170 (emergency) instead of the other way around, is there anything we could do with Zendesk triggers/automations to either prevent it, or at least annoy and shame me into not doing it anymore in the future?
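To make the question concrete, here is a minimal sketch of the rule such a trigger or automation would need to encode. It is not Zendesk's API: `Ticket`, `is_emergency` and `merge_warning` are hypothetical names, and it assumes (as in the first example above) that merging closes the source ticket and keeps the target open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    # Hypothetical minimal model; "is_emergency" stands in for however the
    # emergency form/tag is actually represented in Zendesk.
    ticket_id: int
    is_emergency: bool


def merge_warning(source: Ticket, target: Ticket) -> Optional[str]:
    """Return a warning if merging source into target keeps an emergency open.

    Assumes a merge closes `source` and keeps `target` open, so merging a
    normal ticket into an emergency ticket leaves the emergency (and its
    emergency SLA) running.
    """
    if target.is_emergency and not source.is_emergency:
        return (
            f"Merging #{source.ticket_id} into emergency #{target.ticket_id} "
            f"keeps the emergency open; merge #{target.ticket_id} into "
            f"#{source.ticket_id} instead."
        )
    return None
```

For example, `merge_warning(Ticket(414172, False), Ticket(414170, True))` returns a warning (the direction I actually merged), while the reverse merge returns `None`. Whether Zendesk can run a check like this before a merge, or only flag it afterwards, is exactly the open question.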
Shift duration
As I identified above, one of the reasons I slipped up in my first example was that "doing it right" would basically have meant working overtime. With an 8h CEOC shift that is almost guaranteed to happen, because I rarely start work exactly when my shift begins, and I definitely never stop working exactly when it ends. Shifts shorter than a full working day would leave room for emergency follow-up work within normal working hours.
This is a bigger topic and I already said that I'm not keen on personally driving another EMEA CEOC schedule change in the near future. But I'd be happy to participate if someone wants to trial what AMER is currently doing with CEOC (three shorter, overlapping shifts).
Benefit
Better outcomes for customers in emergency situations, and better psychological safety for SEs in the CEOC rotation. (Not sure how true that second point is for everyone else, but I feel awful when I'm not 100% sure that an emergency will be properly handled after my shift is over.)
Competition / Alternatives
Not sure; that's kinda what the RFC is for. I'm open to hearing any input and ideas you might have!