Even with the best maintenance practices, data can become corrupted, which interrupts Operations Manager 2007 functionality and causes data loss. Failures have various causes. The most common causes can be classified as follows:
- Hardware failure, such as a storage system failure that affects data
availability or integrity
- Security breach or virus infection
- Accidental deletion or corruption of Active Directory Domain
Services (AD DS) security information, such as accounts or
groups
- Physical disaster
This section describes various Operations Manager failure scenarios and how to restore Operations Manager components so that services can resume.
Impact of Failure in Operations Manager 2007
Various Operations Manager servers and components can fail, which affects Operations Manager functionality. The amount of data and functionality lost differs in each failure scenario. It depends on the role of the failing component, on the Operations Manager deployment, on the length of time it takes to restore the failing component, and on the availability of backups.
Reduce the Impact of Failure
The effects of some server failures can be reduced significantly by adding redundancy or implementing a failover solution, such as clustering. When redundancy or failover is in place, restoring the failed server is also far less urgent.
The following list includes configuration options that add redundancy and clustering to the Operations Manager deployment. Implementing any of these options reduces the impact of failure and contributes to high availability of Operations Manager in your organization:
- Add Management Servers
- Install the Root Management Server into a Microsoft Cluster
service failover cluster
- Store databases in a Microsoft Cluster service failover
cluster
- Configure gateway servers for failover
- Configure log shipping
- Configure cross Management Group failover
Each option is further described in the Operations Manager 2007 Operations Guide. For further information about deployment options that help ensure high availability and help reduce the impact of failure, see the Operations Manager 2007 Deployment Guide.
Failure Recovery Scenarios for the Root Management Server
This topic describes three failure recovery scenarios for Root Management Servers:
- If the Root Management Server fails, promote another management
server to be the Root Management Server.
- If the Root Management Server fails, promote another management
server to be the Root Management Server. After the original Root
Management Server is available again, you have the option of
promoting it back to being the Root Management Server.
- If the Root Management Server fails, promote another management
server to be the Root Management Server. In the future, you can set
up a new management server and promote it to be the Root Management
Server.
Each scenario is further described in the Operations Manager 2007 Operations Guide.
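The three scenarios share the same first step and differ only in what happens to the Root Management Server role afterward. The following Python sketch is a toy model of that role movement, not an Operations Manager API or tool: the `ManagementGroup` class, its methods, and the server names `MS01`, `MS02`, and `MS03` are all hypothetical and exist only for illustration. The actual promotion procedure is the one described in the Operations Guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagementGroup:
    """Toy model of a management group (hypothetical, illustration only)."""

    servers: set[str]
    rms: str  # name of the server currently holding the RMS role
    failed: set[str] = field(default_factory=set)

    def fail(self, server: str) -> None:
        """Mark a server as unavailable."""
        self.failed.add(server)

    def restore(self, server: str) -> None:
        """Mark a previously failed server as available again."""
        self.failed.discard(server)

    def add_server(self, server: str) -> None:
        """Add a newly set up management server to the group."""
        self.servers.add(server)

    def promote(self, server: str) -> None:
        """Move the RMS role to an available management server."""
        if server not in self.servers:
            raise ValueError(f"{server} is not in the management group")
        if server in self.failed:
            raise ValueError(f"{server} is unavailable and cannot be promoted")
        self.rms = server


def failed_over_group() -> ManagementGroup:
    """Common first step: the RMS (MS01) fails and MS02 is promoted."""
    group = ManagementGroup(servers={"MS01", "MS02"}, rms="MS01")
    group.fail("MS01")
    group.promote("MS02")
    return group


def scenario_1() -> str:
    """The promoted management server stays the RMS."""
    return failed_over_group().rms


def scenario_2() -> str:
    """The original RMS becomes available and is promoted back."""
    group = failed_over_group()
    group.restore("MS01")
    group.promote("MS01")
    return group.rms


def scenario_3() -> str:
    """A new management server is set up later and promoted."""
    group = failed_over_group()
    group.add_server("MS03")
    group.promote("MS03")
    return group.rms
```

In this model, `promote` refuses a server that is still marked as failed, which mirrors the point in the second scenario that the original server can be promoted back only after it is available again.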
High Level Restore Guidelines in Operations Manager 2007
See Table: General Restore Steps in the Operations Manager 2007 Operations Guide for information about possible failure scenarios and the general steps required to restore operations and data in each scenario.