Steps to good BCP

There are five steps to good business continuity planning (BCP):

  • Analysis
  • Solution Design
  • Implementation
  • Testing and organisational acceptance
  • Maintenance

We at IProtecht.net have designed and implemented IT solutions which go to the heart of each of the above.

Analysis

An impact analysis separates critical (urgent) functions and activities from non-critical (non-urgent) ones. A function is critical if the consequences of damage to it are unacceptable, even when the failure lasts only a few seconds. IProtecht.net focuses on critical C&S and other MNF network functionality. Whether a disruption is acceptable is usually judged against the cost of establishing and maintaining appropriate business or technical recovery solutions. IProtecht.net is designed to lower the cost of what have historically been very expensive solutions, used especially in C&S.

A function may also be considered critical if the law requires it. IProtecht.net works with government departments to meet such legal and regulatory requirements.

Each critical network function is assigned two values:

  • Recovery Point Objective (RPO) - the maximum acceptable age of the data that will be recovered, i.e. how much data, measured as a period of time, may be lost
  • Recovery Time Objective (RTO) - the acceptable amount of time to fully restore the function.

The Recovery Point Objective ensures that the Maximum Tolerable Data Loss for each activity is not exceeded. The Recovery Time Objective ensures that the Maximum Tolerable Period of Disruption for each activity is not exceeded.
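The two checks in the paragraph above can be sketched as follows. This is a minimal illustration, not IProtecht's tooling, and it assumes that all values are in minutes and that the worst-case data loss equals the interval between successive backups or replications (a simplification that ignores transfer lag). The class `CriticalFunction`, its fields `replication_interval_min` and `expected_restore_min`, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriticalFunction:
    """One critical network function from the impact analysis (hypothetical fields, minutes)."""
    name: str
    rpo_min: float                       # Recovery Point Objective
    rto_min: float                       # Recovery Time Objective
    max_tolerable_data_loss_min: float
    max_tolerable_disruption_min: float
    replication_interval_min: float      # assumed: worst-case data loss equals this interval
    expected_restore_min: float          # assumed: estimated time to fully restore the function


def objective_problems(f: CriticalFunction) -> list[str]:
    """Return the problems found; an empty list means both objectives hold."""
    problems = []
    # The RPO must sit within the Maximum Tolerable Data Loss...
    if f.rpo_min > f.max_tolerable_data_loss_min:
        problems.append(f"{f.name}: RPO exceeds maximum tolerable data loss")
    # ...and the replication schedule must actually achieve the RPO.
    if f.replication_interval_min > f.rpo_min:
        problems.append(f"{f.name}: replication interval cannot meet RPO")
    # The RTO must sit within the Maximum Tolerable Period of Disruption...
    if f.rto_min > f.max_tolerable_disruption_min:
        problems.append(f"{f.name}: RTO exceeds maximum tolerable period of disruption")
    # ...and the expected restore time must actually achieve the RTO.
    if f.expected_restore_min > f.rto_min:
        problems.append(f"{f.name}: expected restore time cannot meet RTO")
    return problems


# Illustrative values only.
payments = CriticalFunction("payments link", rpo_min=1, rto_min=5,
                            max_tolerable_data_loss_min=2, max_tolerable_disruption_min=10,
                            replication_interval_min=15, expected_restore_min=4)
print(objective_problems(payments))
# ['payments link: replication interval cannot meet RPO']
```

Keeping the objective and the means of achieving it as separate fields makes visible the common case where a sensible RPO is written down but the replication schedule cannot deliver it.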

IProtecht's impact analysis, carried out in cooperation with your BCP managers, establishes the recovery requirements for each critical network function; in short, a Service Level Agreement (SLA) that forms the basis of any BCP. Recovery requirements consist of the following information:

  • The business requirements for recovery of the critical function, and/or
  • The technical requirements for recovery of the critical function.

Threat analysis

Our 17 years of experience in intranet and internet network provision and maintenance has highlighted the most common threats to data and voice networks. In order of likelihood, they are:

  • Power failure
  • Upstream provider maintenance/upgrade/failure
  • In-house human error
  • Cyber attack
  • Outside contractor work causing damage
  • Extreme weather
  • Provider partner business failure
  • Earthquake

Any of these threats can cause potentially fatal damage to a business that relies on the integrity of its data and voice networks. That is why IProtecht.net is designed and configured to meet the challenges your business faces in a networked world, addressing risk at the highest level rather than taking a piecemeal approach to smaller-scale problems. Our experience has shown that what at first looks like a minor problem is often a symptom of, and a warning about, high exposure to one or more of the disasters above. Your business may not survive even one such event, so relying on learning from experience is not an option; neither is spending heavily on specialty connectivity in multiple locations.

Once the analysis phase is complete, the business and technical plan requirements are documented so that solution design and implementation can begin. The online network risk self-analysis tool on this site helps raise awareness of your business's existing asset management program (or the lack of one), and so helps you quickly identify any need to acquire or upgrade auto-failover resources. For an IP/T-intensive business, the plan requirements need to cover the following elements, which may be classed as ICE (In Case of Emergency):

  • In-house core network equipment, links and power supply
  • Auto-failover network structures, in-house and via the internet, to domestic and foreign alternative network entry points (loops)
  • Staff needed during an adverse event, with their contact and availability details
  • Server and software applications, the availability of their data from backup locations, and backup peripherals

Solution design

Identify the most cost-effective disaster recovery solution that meets the two main requirements from the impact analysis stage (items 1 and 2), taking into account the trade-off in item 3.

1 The minimum application and application data requirements.

2 The time frame in which the minimum application and application data must be available.

3 The economies of scale of larger centralised systems versus the risk reduction of distributed systems.
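The selection step can be sketched as a simple filter-then-minimise: discard candidate solutions that miss either requirement, then take the cheapest of the rest. This is a minimal illustration assuming each candidate can be summarised by an estimated annual cost, the worst-case data loss it allows and the time it needs to restore service, all in minutes. The `Candidate` class, `cheapest_adequate` function, candidate names and figures are hypothetical and are not real pricing. The centralised-versus-distributed trade-off in item 3 is not scored here and would still be weighed by judgement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate disaster recovery solution (hypothetical fields)."""
    name: str
    annual_cost: float
    achieved_rpo_min: float   # worst-case data loss the solution allows, in minutes
    achieved_rto_min: float   # time the solution needs to restore service, in minutes


def cheapest_adequate(candidates: list[Candidate], required_rpo_min: float,
                      required_rto_min: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate meeting both requirements, or None if none does."""
    adequate = [c for c in candidates
                if c.achieved_rpo_min <= required_rpo_min
                and c.achieved_rto_min <= required_rto_min]
    return min(adequate, key=lambda c: c.annual_cost, default=None)


# Illustrative figures only.
options = [
    Candidate("tape backup, cold site", 10_000, 1440, 2880),
    Candidate("nightly replication, warm site", 40_000, 1440, 240),
    Candidate("continuous replication, hot site", 120_000, 1, 5),
]
print(cheapest_adequate(options, required_rpo_min=60, required_rto_min=60).name)
# continuous replication, hot site
```

Tightening either requirement can only shrink the set of adequate candidates, which is why the impact analysis figures, rather than the budget, should drive the choice.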

Disaster recovery plans may also be required outside IT applications, for example to preserve paper documents and/or restore core operating-system software and peripherals. This BCP phase overlaps with disaster recovery planning. Solution design requires the best available information about the company, namely:

  • the crisis management command structure
  • the location and specifications of any secondary work site(s) deemed necessary, which may include work-from-home capability
  • data link architecture between primary and secondary work sites
  • data replication methodology between primary and secondary work sites
  • the applications and software required at the secondary work site, and
  • the physical data requirements at the secondary work site.

Implementation

It is essential that paper manuals are written and distributed to staff, and that a seminar is held at which the manual is explained and roles are assigned. Then…

Work package testing may take place during implementation of the solution; however, it does not replace organisational testing.

Testing and organisational acceptance

The purpose of testing is to achieve organisational acceptance that the business continuity solution satisfies the organisation's recovery requirements. Plans may fail to meet expectations because of insufficient or inaccurate recovery requirements, solution design flaws, or solution implementation errors. Finding such failures through testing is vital to the integrity of the BCP, because any weakness will be magnified during an adverse event, exactly when automatic backup must work faultlessly. At a minimum, testing includes:

  • Integrity of UPS and generator power systems
  • Data circuit auto-failover to alternative loop path(s)
  • Crisis command team call-out testing
  • Technical swing test in which all staff move from the primary to the secondary work location and back
  • Restoration of core operating systems and peripherals within the chosen downtime window.
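The data circuit auto-failover test in the list above can be illustrated with a minimal sketch: paths are tried in priority order, traffic goes to the first one whose health check passes, and a test simulates loss of the primary path and confirms that an alternative loop path is chosen. The path names, `select_path` and `failover_test` are hypothetical stand-ins; a real deployment would rely on routing protocols or the provider's network equipment rather than application code.

```python
from typing import Callable, Optional


def select_path(paths: list[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy path in priority order, or None if every path is down."""
    for path in paths:
        if is_healthy(path):
            return path
    return None


def failover_test(paths: list[str]) -> bool:
    """Simulate loss of the primary path and check that traffic moves to an alternative."""
    primary = paths[0]
    down = {primary}                                   # simulated outage of the primary circuit
    chosen = select_path(paths, lambda p: p not in down)
    return chosen is not None and chosen != primary


# Hypothetical path names, highest priority first.
paths = ["primary-fibre", "domestic-loop", "offshore-loop"]
print(select_path(paths, lambda p: True))   # primary-fibre
print(failover_test(paths))                 # True
```

A site with a single path fails this test by construction, which is the point: the test only passes when at least one independent alternative exists and is healthy.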

At a minimum, power failure testing should be carried out fortnightly, while the remaining aspects above are tested twice a year, or at the very least annually. Problems identified in the initial testing phase may be rolled into the maintenance phase and retested during the next test cycle.
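A minimal sketch of tracking this schedule, assuming 14 days for "fortnightly" and about 182 days for "twice a year" (use 365 where only the annual minimum is achievable). The test labels, the `INTERVAL_DAYS` table and the example log are hypothetical.

```python
from datetime import date, timedelta

# Assumed intervals in days: "fortnightly" = 14, "twice a year" ~ 182.
# Replace 182 with 365 where only the annual minimum is achievable.
INTERVAL_DAYS = {
    "power failure": 14,
    "data circuit auto-failover": 182,
    "crisis command call-out": 182,
    "technical swing test": 182,
    "core systems restoration": 182,
}


def next_due(last_run: date, interval_days: int) -> date:
    """Date by which the test must be run again."""
    return last_run + timedelta(days=interval_days)


def overdue(last_runs: dict[str, date], today: date) -> list[str]:
    """Tests that are past due; a test with no recorded run counts as overdue."""
    return [name for name, days in INTERVAL_DAYS.items()
            if name not in last_runs or today > next_due(last_runs[name], days)]


# Hypothetical test log.
log = {
    "power failure": date(2024, 3, 1),
    "data circuit auto-failover": date(2024, 1, 10),
    "crisis command call-out": date(2023, 6, 1),
    "technical swing test": date(2024, 2, 1),
}
print(overdue(log, today=date(2024, 3, 20)))
# ['power failure', 'crisis command call-out', 'core systems restoration']
```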

Maintenance

Maintenance of the BCP manual is broken down into three periodic activities.

The first activity is confirming the information in the manual, rolling it out to ALL staff for awareness, and providing specific training for individuals whose roles are identified as critical to response and recovery. The second activity is the testing and verification of technical solutions established for recovery operations. The third activity is the testing and verification of documented organisational recovery procedures. A twice-yearly or annual maintenance cycle is typical.

Information update and testing

All organisations change over time, so a BCP manual must change with them to stay relevant. Once the data has been verified, a call tree test is normally conducted to evaluate the efficiency of the notification plan and the accuracy of the contact data. Changes that should be identified and updated in the manual include:

  • Technical, software or server changes of any sort
  • Changes in upstream/downstream supplier routes/networks
  • Staffing changes
  • Changes to important clients and their contact details
  • Changes to important vendors/suppliers and their contact details
  • Internal changes like new, closed or fundamentally changed sections
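The call tree test described above can be modelled with a minimal sketch: each person calls the people below them one after another, and the test records when each person is reached and who never received the message. The per-call duration, the names and the `reachable` set (standing in for contact details that turned out to be correct) are assumptions for illustration, and `run_call_tree` is a hypothetical helper. Someone who cannot be reached cannot pass the message on, so their whole branch is lost, which is exactly the kind of weakness the test is meant to expose.

```python
from collections import deque


def run_call_tree(tree: dict[str, list[str]], root: str, reachable: set[str],
                  minutes_per_call: int = 3) -> tuple[dict[str, int], list[str]]:
    """Simulate a call tree in which each person has a single caller.

    Returns (minute at which each reached person was notified, sorted list of people
    never notified). Every call attempt, answered or not, is assumed to take
    minutes_per_call, and each caller works through their list sequentially.
    """
    notified = {root: 0}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        clock = notified[caller]
        for callee in tree.get(caller, []):
            clock += minutes_per_call          # the attempt uses time either way
            if callee in reachable:
                notified[callee] = clock
                queue.append(callee)           # they now call their own list
    everyone = set(tree) | {c for callees in tree.values() for c in callees}
    return notified, sorted(everyone - set(notified))


# Hypothetical tree and test result.
tree = {
    "crisis manager": ["network lead", "ops lead"],
    "network lead": ["engineer A", "engineer B"],
    "ops lead": ["clerk C"],
}
reachable = {"network lead", "ops lead", "engineer A", "clerk C"}
times, missed = run_call_tree(tree, "crisis manager", reachable)
print(max(times.values()), missed)   # 9 ['engineer B']
```

The maximum notification time measures the plan's efficiency, and the missed list points at contact data that needs updating.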

Testing and verification of technical solutions

As a part of ongoing maintenance, any specialised technical deployments must be checked for functionality. Some checks include:

  • Availability of new technology and upgrades from existing suppliers
  • Continued availability of shelved equipment held for redundancy
  • Virus definition distribution
  • Application security and service patch distribution
  • Hardware operability check
  • Application operability check
  • Data verification
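For the data verification check, one common approach (not necessarily the one IProtecht uses) is to compare cryptographic fingerprints of each item at the primary site with the copy at the backup location, so that only short hashes, not the data itself, need to be exchanged between sites. This minimal sketch uses SHA-256 from Python's standard `hashlib` and holds the data in memory; a real check would read files or database exports, and the item names and contents are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one item."""
    return hashlib.sha256(data).hexdigest()


def verify_backup(primary: dict[str, bytes], backup: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare primary data with its backup copy, item by item."""
    missing = sorted(set(primary) - set(backup))
    mismatched = sorted(name for name in set(primary) & set(backup)
                        if digest(primary[name]) != digest(backup[name]))
    return {"missing": missing, "mismatched": mismatched}


# Hypothetical items.
primary = {"clients.csv": b"id,name\n1,Acme\n", "routes.cfg": b"loop=secondary\n"}
backup = {"clients.csv": b"id,name\n1,Acme\n"}      # routes.cfg was never replicated
print(verify_backup(primary, backup))
# {'missing': ['routes.cfg'], 'mismatched': []}
```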

Testing and verification of organisation recovery procedures

As work processes change over time, the previously documented recovery procedures may no longer be suitable. Some checks include:

  • Are all work processes for critical functions documented?
  • Have the systems used in the execution of critical functions changed?
  • Are the documented work checklists meaningful and accurate for staff?
  • Do the documented work process recovery tasks and supporting disaster recovery infrastructure allow staff to recover within the predetermined recovery time objective?
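The last question above can be checked roughly by treating the documented recovery tasks as a dependency graph. Tasks that can run in parallel overlap, so the earliest possible completion is the longest chain of dependent tasks (the critical path), which is then compared with the RTO. This is a minimal sketch assuming task durations in minutes are known and enough staff are available to run independent tasks in parallel; the task names and figures are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter


def earliest_completion(durations: dict[str, int], depends_on: dict[str, set[str]]) -> int:
    """Critical-path length in minutes, assuming independent tasks run in parallel."""
    graph = {task: depends_on.get(task, set()) for task in durations}
    finish: dict[str, int] = {}
    # static_order() yields every task after all of the tasks it depends on.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


def meets_rto(durations: dict[str, int], depends_on: dict[str, set[str]], rto_min: int) -> bool:
    """True if the documented tasks can finish within the recovery time objective."""
    return earliest_completion(durations, depends_on) <= rto_min


# Hypothetical recovery tasks.
durations = {"declare incident": 5, "switch to secondary site": 20, "restore servers": 45,
             "bring up backup data links": 30, "staff log in remotely": 15}
depends_on = {
    "switch to secondary site": {"declare incident"},
    "restore servers": {"switch to secondary site"},
    "bring up backup data links": {"switch to secondary site"},
    "staff log in remotely": {"restore servers", "bring up backup data links"},
}
print(earliest_completion(durations, depends_on))    # 85
print(meets_rto(durations, depends_on, rto_min=60))  # False
```

If the check fails, shortening a task that is not on the critical path will not help; only tasks on the longest chain affect the result.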

Treatment of test failures

As suggested by the diagram included in this article, there is a direct relationship between the test and maintenance phases and the impact analysis phase. When a BCP manual for recovery of infrastructure is established from scratch, issues found during the testing phase often must be fed back into the analysis phase.

For C&S functionality, however, relying on recovery after an outage is not an option: continuity through network integrity, even during disaster events, is the essence of business continuity.

Contact Us

  • Street: PO Box 6594
  • Suburb: Auckland
  • Zip Code: 1141
  • Country: New Zealand

  • Phone: +64 9 880 1450
  • Mobile: +64 21 770 173