IBM Systems Magazine, Power Systems - February 2018 - 27
categories: workforce shortage, loss of technology, loss of
facilities or failure in the supply
chain. This article discusses loss
of technology and facilities, and
what organizations can do to
prepare ahead of a disaster.
Ideally, operations should
continue or resume quickly after
a disaster. Failure to do so can
mean lost business and eroded trust.
The recovery methodology should
be reliable and predictable, at a
manageable cost. Critical personal and business data should
remain protected and secure throughout the process. A typical recovery
involves a bottom-up approach
with the following steps:
1. Identify the location. When
an outage occurs, management should assess a course
of action and decide where
the recovery will occur. This
could be another location you
own, a third-party facility or a
cloud service provider.
2. Recover the data. Do you
know what data you had at
the time of the disaster? What
data can you recover? How
will you handle the rest?
3. Re-host applications.
The applications need to
be restarted, possibly on
bare-metal servers that are
different from what you had
at the primary location. The
recovery may require OSes
and device drivers unique to
the new equipment. The use
of VMs might help hide some
of those differences, but now
isn't the time to figure out a
4. Human success factors.
Don't focus only on the
technological aspects. Also
look at what I call my "five
Cs": command and control; communication
and network connectivity;
contingency; and counseling.
Your staff may be scattered,
with some at the primary
location, some at the DR facility
and others somewhere else altogether.
Two metrics are used to measure DR: recovery point objective
(RPO) and recovery time objective
(RTO). The technology employed
determines the RPO. This is the
time from when the data was last
backed up to the time the disaster
happened, which is the window of
recent data you stand to lose. Backing
up to tape once per day represents a
worst-case RPO of 24 hours. Mirroring
data to flash and disk located at the
DR facility reduces this to seconds.
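As a rough illustration of that arithmetic, the sketch below treats RPO as the gap between the last good copy and the disaster. It also lists the transactions that fall inside that gap, which are the ones step 2 asks "How will you handle the rest?" about. The timeline and function names are hypothetical, chosen only for this example; the worst case for a nightly backup is a disaster that strikes just before the next run.

```python
from datetime import datetime, timedelta


def rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: time from the last good copy to the disaster."""
    if last_backup > disaster:
        raise ValueError("the last backup must precede the disaster")
    return disaster - last_backup


def transactions_to_reconcile(txn_times, last_backup, disaster):
    """Transactions committed after the last copy and up to the disaster
    are missing from the recovered data and must be handled separately."""
    return [t for t in txn_times if last_backup < t <= disaster]


if __name__ == "__main__":
    # Hypothetical timeline: nightly tape backup at midnight, disaster at 18:30.
    backup = datetime(2018, 2, 1, 0, 0)
    disaster = datetime(2018, 2, 1, 18, 30)
    print(rpo(backup, disaster))  # 18:30:00
    # Worst case: the disaster hits just as the next nightly backup is due.
    print(rpo(backup, backup + timedelta(days=1)))  # 1 day, 0:00:00
    txns = [datetime(2018, 1, 31, 23, 0), datetime(2018, 2, 1, 9, 15)]
    print(transactions_to_reconcile(txns, backup, disaster))
```

With mirroring, `last_backup` trails the disaster by seconds, so both the RPO and the list of transactions to reconcile shrink accordingly.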
The automation employed
determines the RTO. This is the
time from when the disaster happened to the time your business
process is operational again. This
includes the time for management
to assess the situation, recover
the data, re-host the applications
and correct any partial or incomplete transactions. Depending on
how manual or automated your
recovery is, this can be measured
in days, hours or minutes.
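One way to see why automation drives RTO is to add up the phases listed above: assessing the situation, recovering the data, re-hosting the applications and correcting partial transactions. The sketch below assumes the phases run one after another. If they overlap in practice, the sum is an upper bound. All durations are invented for illustration.

```python
from datetime import timedelta
from typing import Mapping


def rto(phases: Mapping[str, timedelta]) -> timedelta:
    """Estimate RTO by adding up recovery phases that run one after another."""
    return sum(phases.values(), timedelta())


def meets_target(phases: Mapping[str, timedelta], target: timedelta) -> bool:
    """True if the estimated RTO is within the business's target."""
    return rto(phases) <= target


# Hypothetical durations for illustration only.
manual = {
    "assess situation": timedelta(hours=4),
    "recover data": timedelta(hours=12),
    "re-host applications": timedelta(hours=8),
    "correct partial transactions": timedelta(hours=6),
}
automated = {
    "assess situation": timedelta(minutes=30),
    "recover data": timedelta(minutes=10),
    "re-host applications": timedelta(minutes=15),
    "correct partial transactions": timedelta(minutes=5),
}

if __name__ == "__main__":
    print(rto(manual))     # 1 day, 6:00:00
    print(rto(automated))  # 1:00:00
    print(meets_target(manual, timedelta(hours=24)))  # False
```

Breaking RTO into phases also shows where automation pays off most: here the data-recovery phase dominates the manual total.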
Plan of Action
While the recovery is
bottom-up, the BC/DR plan itself
should be developed top-down.
Focus first on the business
process as a unit of recovery.
Let's take payroll as an example.
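The contrast between top-down planning and bottom-up recovery can be sketched as a dependency graph. Planning starts at the business process and walks down to everything it needs. Recovery then brings those pieces up in the reverse, dependencies-first order. This is a minimal sketch, assuming the dependencies can be written as a simple map. Every component name below, including what payroll depends on, is hypothetical and is not the author's breakdown. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: each item lists what must be running first.
DEPENDS_ON = {
    "payroll process": ["payroll app", "bank file transfer"],
    "payroll app": ["app server VM", "HR database"],
    "HR database": ["storage replica"],
    "bank file transfer": ["network connectivity"],
    "app server VM": ["DR site"],
    "storage replica": ["DR site"],
    "network connectivity": ["DR site"],
    "email": ["DR site"],  # hypothetical item not needed for payroll
    "DR site": [],
}


def plan_top_down(process, graph):
    """Planning: start at the business process and collect everything it needs."""
    needed, stack = set(), [process]
    while stack:
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(graph.get(item, []))
    return needed


def recovery_order(process, graph):
    """Recovery: bring up dependencies before the things that use them."""
    needed = plan_top_down(process, graph)
    subgraph = {n: [d for d in graph.get(n, []) if d in needed] for n in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for step, item in enumerate(recovery_order("payroll process", DEPENDS_ON), 1):
        print(step, item)
```

Because the business process is the unit of recovery, anything it does not depend on (the hypothetical "email" entry here) drops out of its plan. The facility always comes first and the process itself comes last.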