GUIDE
Major incidents start to impact businesses within just minutes, often before incident resolution teams can even be engaged. With this immediate impact on the business, meeting your major incident SLAs is no longer enough. You are under a microscope from customers who expect that all systems are ‘always on,’ or that you’ll inform them immediately and keep communicating along the way.
What’s more, given the sensitive nature of incidents and the variety of audiences requiring communication, knowing how to adequately manage communication plans can be tricky. Handling them poorly can be even worse than not handling them at all. When it comes to communicating during major incidents, business customers are like Goldilocks: They want information that’s just right. They don’t want too little, they don’t want too much, they want it relevant to them, and they want it right on time. This paper outlines best practices to empower and engage your key stakeholders during major incidents.
PROACTIVE COMMUNICATIONS DURING MAJOR INCIDENTS
www.xMatters.com
THE REAL COST OF POOR COMMUNICATION
Sending notifications too late, to the wrong stakeholders, or with the wrong message can damage your organization (and your job security).
Extend your communication beyond resolvers:
• Executives kept in the dark can take your time with inquiries, damage your internal reputation, and cost you your job.
• Customers will contact you by the thousands. Customer satisfaction drops by 15 percent every time a customer has to call back about the same issue.1
• Third-party service providers who lack information cannot help with resolution. A service outage starts to affect most businesses within just 15 minutes.2
Align Your Communications with Your Process During the Planning Stage
Having processes and systems in place to resolve major incidents means you’re already ahead of many large enterprises. Monitoring, contact, triage, resolution, post mortem all constitute areas where pre-planning can bring value. Major incidents happen. The processes and systems in place will determine how successfully you resolve them. The right technology, proper planning and configuration can produce
a turnkey, repeatable process that does not require manual communications and produces faster resolution times.
Each of these steps produces a unique set of data. It would be easy to use that data to send one communication to incident resolvers and all stakeholders. Instead, parse the data to produce multiple messages for unique audiences. For instance, in the Triage step, messages might look like this:
• To incident resolvers: Servers SRDC-RHL09 through RHL16 in the third bank in Server Room 1 at Headquarters are failing intermittently. Please test the servers and all the connections to determine the exact cause so we can take corrective action ASAP.
• To the IT department head: Servers SRDC-RHL09 through RHL16 in the third bank in Server Room 1 at Headquarters are failing intermittently. We are beginning tests. We will let you know when we have determined the exact cause and can give you an estimated completion time.
• To executives: We’re having an issue with some servers at Headquarters. We are testing to determine the cause, and we will let you know the estimated completion time as soon as possible.
• To customers: We are experiencing some service issues. We are working diligently on a resolution, and will restore service as soon as possible.
Automate Messages with Templates
You need at least four distinct messages. There could also be messages to partners, utilities, PR, corporate communications, and more. Creating all these messages on the fly in the midst of trying to resolve a major incident would cause delay and increase the risk of error.
Instead, create a series of templates to automate the tailoring of messages at each stage of the process. Choose a template, update recipients if necessary, update messaging if necessary, add relevant data, and send. Even better, tight integration between monitoring, ticketing, and other systems with communication systems could allow for messages to auto-populate based on the type of incident.
Right-Sizing Communications: How to Get it Just Right
As an IT professional, you likely didn’t sign up for one of the trickiest communication jobs in the business, but getting the communications right is now officially part of your job description. A good rule of thumb on tackling this new task: communicate as broadly as the impact of your incident, as detailed as required (but no more) for any given audience, as far ahead of impact as possible.
BREADTH
Major incidents are no longer isolated to back-end batch processing nor even live systems that impact a small set of constituents. With so many applications and services interconnected, a single incident could have a domino effect impacting multiple stakeholders. Consider the following audiences when building communication plans:
Direct Stakeholders – Communications should be sent to anyone whose business processes may be impacted by the incident. Depending on the type of incident, this could be owners of a process dependent on an impacted service, an internal customer reliant on the interrupted service or even end-customers (business or consumer) who will feel the impact in a significant manner. For example, an incident that might simply slow the services of an end consumer might not require communications to consumers, but a business customer reliant on a service that experiences an outage will be more significantly impacted and should be considered in your communications plan.
Help Desks and Call Centers – When services or applications are slow or disrupted, your typical 2015 worker seems to have a trigger finger when it comes to calling your help desk. Even a 20-minute e-mail outage can result in a flood of hundreds of tickets and phone calls in the help desk of a typical large enterprise. Alerting the staff manning the lines not only prevents them from opening duplicate tickets, but it’s also an opportunity to arm them with outbound communication to alleviate the concerns of inbound callers.
Management – While extra inbound calls have hard dollar costs associated with them, an out-of-the-loop executive has an immeasurable reputation cost. Your highest value customers most likely have direct access to company executives and when they feel a disruption, chances are they’re not calling into a call center. Executives should be kept in the loop in real time and never caught off-guard by an inbound client.
Customer-Facing Staff – In addition to call centers and executives, any customer-facing staff should be kept informed of incidents that might impact their customers. This might include field staff, account executives, sales staff, etc.
Downstream Dominoes – In today’s inter-connected enterprise, a single incident may trigger a multitude of additional incidents. Understanding interdependencies could help prevent or mitigate damage downstream. For example, a power outage might not have an immediate impact outside of restoration teams if a back-up generator kicks in. But if a generator gets below a certain threshold and restoration is not imminent, giving application owners ample time to shut down apps properly could save hours in restart time once the power is back up and running.
TIMING
The modern business customer, whether internal or external, is forgiving yet impatient. Intermingled with their reliance on digital systems is an acknowledgement that they will break, but the forgiveness comes packaged with expectations that communications will be transparent and timely. As 451 Research Analyst Donnie Berkholz stated in a blog post, “We all understand that sometimes things break, because clouds are incredibly complex systems. We’re only really looking for two things out of it: (1) don’t have the same problem twice, and (2) keep us informed.”
Optimal timing would allow you to reach stakeholders and clients before they feel the impact of an incident. Let the call center know before an influx of calls, let the client know before their own systems are impacted, let your executives know before someone else tells them.
CONTENT
As mentioned earlier in the paper, tailoring the content of your communication to specific audiences is key. Resolution teams need to have specific details as well as calls to action to engage and inform them rapidly. External clients don’t need or want details. They simply want to know that the problem is being addressed and, when possible, what to expect next. Don’t confuse transparency with over-communication. Too much information can result in lost confidence. The following guidelines can help with crafting incident messaging for a variety of audiences:
1: Is the information of value to the recipient?
2: Is the information actionable by the recipient?
3: Does the information help to reduce dissatisfaction or confusion? For example, updates at timed intervals can be beneficial in assuring constituents that teams are working on the issue, even if there is no known additional information.
4: Does the information use vernacular familiar to the audience? For example, don’t bog down executives with fancy tech acronyms and internal process abbreviations. Use plain language that they can relay as necessary.
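Guideline 4 lends itself to a simple automated check before a message goes out to a non-technical audience. The sketch below is only an illustration, not a feature of any particular product: the regular expression is a rough heuristic chosen for this example (it flags all-caps tokens such as acronyms and server IDs), and the function name and the `allowed` parameter are hypothetical.

```python
import re

# Heuristic pattern (an assumption for this sketch, not a standard):
# all-caps tokens of two or more characters, optionally containing digits
# or hyphens, such as "ASAP" or "SRDC-RHL09".
JARGON = re.compile(r"\b[A-Z][A-Z0-9]+(?:-[A-Z0-9]+)*\b")


def flag_jargon(message, allowed=frozenset()):
    """Return all-caps tokens in the message that are not explicitly allowed."""
    return [token for token in JARGON.findall(message) if token not in allowed]


# Sample text adapted from the Triage-step messages earlier in this guide.
resolver_msg = (
    "Servers SRDC-RHL09 through RHL16 in the third bank in Server Room 1 "
    "at Headquarters are failing intermittently."
)
executive_msg = (
    "We're having an issue with some servers at Headquarters. We are "
    "testing to determine the cause."
)

print(flag_jargon(resolver_msg))   # ['SRDC-RHL09', 'RHL16']
print(flag_jargon(executive_msg))  # []
```

A check like this cannot judge whether a message is valuable or actionable (guidelines 1 through 3); it only catches one obvious way that technical detail can leak into executive or customer messaging.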
Tactics for Communicating During an Incident
When a major incident occurs, time is rarely your friend. The following tactics help you use communication to save time rather than waste precious minutes.
Utilize Message Templates
During the heat of an incident, the last thing your teams need to be doing is crafting prose. Message templates can save time by providing messaging and response options. Templates can be rapidly altered during incidents for specifics.
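As a rough illustration of the template approach, the sketch below renders one message per audience from a single incident record using Python's standard `string.Template`. The wording is adapted from the Triage-step examples in this guide. The audience keys, placeholder names (`servers`, `location`, `site`), and the `render_messages` function are assumptions made for this example and do not reflect any specific product's API.

```python
from string import Template

# Hypothetical templates, one per audience. Placeholder names are
# illustrative assumptions, not fields from a real monitoring system.
TEMPLATES = {
    "resolvers": Template(
        "Servers $servers in $location are failing intermittently. "
        "Please test the servers and all the connections to determine "
        "the exact cause so we can take corrective action ASAP."
    ),
    "it_head": Template(
        "Servers $servers in $location are failing intermittently. "
        "We are beginning tests. We will let you know when we have "
        "determined the exact cause and can give you an estimated "
        "completion time."
    ),
    "executives": Template(
        "We're having an issue with some servers at $site. We are testing "
        "to determine the cause, and we will let you know the estimated "
        "completion time as soon as possible."
    ),
    "customers": Template(
        "We are experiencing some service issues. We are working "
        "diligently on a resolution, and will restore service as soon "
        "as possible."
    ),
}


def render_messages(incident):
    """Return one tailored message per audience for a single incident.

    substitute() raises KeyError if a template needs a field the incident
    record lacks, so incomplete data is caught before anything is sent.
    """
    return {
        audience: template.substitute(incident)
        for audience, template in TEMPLATES.items()
    }


# In an integrated setup these values would come from monitoring or
# ticketing data; here they are typed in by hand.
incident = {
    "servers": "SRDC-RHL09 through RHL16",
    "location": "the third bank in Server Room 1 at Headquarters",
    "site": "Headquarters",
}

messages = render_messages(incident)
for audience, text in messages.items():
    print(f"To {audience}: {text}")
```

Keeping the detail in the incident record and the tone in the template means only the record changes during an incident, while each audience still receives wording sized to its needs.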
Integrate When Possible
Even better, when communications templates are integrated with monitoring systems, details can be auto-populated into message templates.
Automate When Possible
Tight integration between your communication and ticketing & monitoring systems allows for messages to automatically be sent without taking your resolution and ops teams off the tasks of restoring services. Enterprise-grade systems will allow different messages to be triggered for different audiences.
Synchronicity
Synchronize contact information via data sync from systems of record. Augment basic contact information with contextual information like on-call schedules, groups and skill sets. During incidents, you are rarely looking for a specific person, but for a role or skill set. For example, you don’t want ‘Joan,’ you want the ‘call center manager on duty’.
Give Stakeholders Control
We’ve already covered the requirement of communicating to stakeholders beyond resolution teams. We’ve also let you know they can be fickle with specific needs. Allowing stakeholders to subscribe to communications lets them control frequency of alerts, mode of communication and subject matter on which they want to be informed.
Use Communication to Trigger Collaboration
Resolution teams still need to take action quickly. In today’s distributed, heterogeneous environments that often requires collaboration. Something as simple as a conference call for a team representing a dozen skill sets can take well over an hour to assemble. Automating notification to conference bridge recipients (with escalation processes for alternates) can assemble the same team in 3-5 minutes, drastically reducing resolution times.
Partner with Your Communication Team
Once communications leave the walls of your enterprise, they’re fair game. While transparency and honesty are important aspects of communicating to stakeholders, you may want a professional to craft the exact wording. Proactively set up meetings with your client or corporate communications team to create the templates and messages to keep in your arsenal for when a major incident hits.
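Two of the tactics on this page, targeting a role rather than a named person and escalating to alternates when assembling a conference bridge, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Role` class, the `assemble_team` function, and every name other than ‘Joan’ are hypothetical, and the `is_available` callback stands in for a real notify-and-wait-for-acknowledgement step.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A role with an ordered on-call roster: primary first, then alternates."""
    name: str
    roster: list = field(default_factory=list)


def assemble_team(roles, is_available):
    """Pick one responder per role, escalating through alternates in order.

    is_available is a callable(person) -> bool; in a real system it would
    notify the person and wait for an acknowledgement.
    Returns (team, unfilled): a dict of role name -> person, and a list of
    role names for which nobody on the roster responded.
    """
    team, unfilled = {}, []
    for role in roles:
        responder = next(
            (person for person in role.roster if is_available(person)), None
        )
        if responder is None:
            unfilled.append(role.name)
        else:
            team[role.name] = responder
    return team, unfilled


# Hypothetical rosters, as they might look after a sync from a system of record.
roles = [
    Role("call center manager on duty", ["Joan", "Priya"]),
    Role("database administrator", ["Sam"]),
]

# Simulate that only Priya acknowledges the notification.
acknowledged = {"Priya"}
team, unfilled = assemble_team(roles, lambda person: person in acknowledged)

print(team)      # {'call center manager on duty': 'Priya'}
print(unfilled)  # ['database administrator']
```

Returning the unfilled roles explicitly gives the incident manager an immediate list of skill sets that still need a manual follow-up.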
Communication Goes Both Ways
Individuals impacted by a major incident can be some of your best field agents. Use proactive communications as a chance to gather data points if you think it might help with troubleshooting or post mortem. Your customers want to know that you’ll learn from the experience and feel confident that the same issue won’t happen again. Including their input as part of that process gives them a sense of partnership in improving the services you provide.
After an Incident: Post Mortem
During an incident, your company communicates as events change, which is crucial. But it’s only after an incident is over, without the distraction of firefighting, that one can gain true perspective and start to sort out all of the facts. With this 20/20 hindsight, it’s beneficial to give stakeholders the confidence that you have identified the root cause and that processes are now in place to mitigate similar incidents and reduce risk.3
From resolution teams and IT management to executives, from service desks and call centers to partners and clients, a major incident sends ripples of impact throughout your extended organization. Learning how to trigger and tier proactive communication during times of disruption allows resolution teams to focus on resolution, mitigates impact to client-facing teams, and can drastically improve reputation and stakeholder satisfaction.
1. First-call resolution: how important is it, really?, https://www.atlassian.com/help-desk/first-call-resolution, 2014
2. Business Impact of IT Incident Communications, http://info.xmatters.com/rs/alarmpoint/images/xMatters-2015-Survey-Report.pdf, 2015
3. Cloud Outages, Transparency, and Trust, http://redmonk.com/dberkholz/2015/01/12/cloud-outages-transparency-and-trust, 2015
CONTACTS
INTERNATIONAL +1 925.226.0300 and press 2
US/CAN TOLL FREE +1 877.XMATTRS (962.8877)
EMEA +44 (0) 20 3427 6333
AUSTRALIA/APJ SUPPORT +61-2-8038-5048 opt 2
MORE ONLINE RESOURCES
https://www.xmatters.com/resources/
ABOUT US
xMatters’ cloud-based solutions enable any business process or application to trigger two-way communications (text, voice, email, etc.) throughout the extended enterprise during time-sensitive events. With over a decade of experience in rapid communication, xMatters serves more than 1,000 leading global firms to ensure business operations run smoothly and effectively during incidents such as IT failures, product recalls, natural disasters, dynamic staffing, service outages, medical emergencies and supply-chain disruption. xMatters is headquartered in San Ramon, CA with additional offices in London and Sydney.
Copyright 2015 xMatters. All rights reserved. All other products and brand names are trademarks or registered trademarks of their respective holders.