Manage Replication and Replay

Managing replication in a cluster continuous replication (CCR) environment involves the following main activities:

Handling failovers when replication is halted.
Halting replication to storage group copies.
Restarting replication to storage group copies.

Handling Failovers When Replication Is Halted

Halting replication stops all propagation of changes from the active storage group to the copy for as long as the suspension lasts. If a failover happens during that time, the storage group copy will not have the latest changes. Depending on the volume of change that has occurred on the active node, the missing changes are likely to prevent the system from automatically mounting the copy on the passive computer. In that case, you can either use the available version of the storage group on the passive node or wait until the original server recovers. To minimize this exposure, keep the time that replication is halted as short as possible. If you don't mount the version of the data on the passive node, then when the original computer becomes available, the replication system will copy the missing logs and automatically mount the copy of the database on the new active node.

A failover after replication is resumed can occur while the passive copy is still missing logs, or after it has received all the logs but before they have been replayed. If the logs have been copied but not replayed, a failover triggers replay of the outstanding logs into the database. This storage group will therefore experience an extended recovery time as part of the failover, although other storage groups will not be affected. As long as enough logs are available to meet the configured automatic mount criteria, the system will eventually mount the database with the latest available data. There is one risk to this process: one of the logs to be replayed could be corrupted, which prevents successful replay. In this case, the replay results in an error and all further replay is blocked. When this happens, the storage group copy goes into an error state referred to as Failed. In this state, you may be able to recover by using the version of the database up to that point. Otherwise, you must wait until the original server becomes available and a non-corrupted version of the log is copied again.
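
The failover outcomes described above can be summarized as a small decision model. The following Python sketch is illustrative only and does not call any Exchange API. The function name, the Outcome values, and the max_missing_logs parameter (a stand-in for the configured automatic mount criteria) are hypothetical, and the order in which the two checks run is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    MOUNTED = "mounted"          # copy mounts with the latest available data
    NOT_MOUNTED = "not mounted"  # mount criteria not met; administrator decides
    FAILED = "failed"            # replay stopped on a corrupted log


def failover_outcome(missing_logs, corrupt_log_in_replay_queue, max_missing_logs):
    """Model what happens to a storage group copy at failover.

    missing_logs: logs never copied to the passive node (hypothetical count).
    corrupt_log_in_replay_queue: True if a copied log cannot be replayed.
    max_missing_logs: hypothetical stand-in for the automatic mount criteria.
    """
    # A corrupted log blocks all further replay and puts the copy in Failed.
    if corrupt_log_in_replay_queue:
        return Outcome.FAILED
    # Enough logs are present: the database is mounted after replay completes.
    if missing_logs <= max_missing_logs:
        return Outcome.MOUNTED
    # Otherwise use the available version manually or wait for the original server.
    return Outcome.NOT_MOUNTED
```

For example, under these assumptions a copy that has every log and no corruption mounts, while a copy that is missing more logs than the criteria allow stays dismounted until an administrator acts or the original server returns.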

Halting and Restarting Replication to Storage Group Copies

It may occasionally be necessary to control the activities of the CCR copy, for example by halting and restarting replication activity. Replication is controlled at the storage group level. Because a storage group in a CCR environment can contain only one database, replication is localized to one database.

Replication occurs when both nodes in the cluster are operational, the Microsoft Exchange Replication Service is running on the target node, and the storage group copy has copying enabled. If either the source or target location for CCR becomes unavailable, you must halt replication. In addition, some CCR administration tasks, such as seeding, performing an integrity check, or reconfiguring storage, require replication to the storage group copy to be halted. If you need to stop all access to the target's log files and log directory, you must also halt replication.

Exchange Server 2007 requires that all replication activity be halted when the location of the storage group or database is being changed.
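
The conditions under which replication runs, and the tasks that require it to be halted first, can be expressed as two simple checks. This Python sketch is a planning aid under stated assumptions, not Exchange tooling. All names in it, including the task labels, are hypothetical.

```python
# Tasks that the text says require replication to be halted first.
# The string labels are hypothetical names chosen for this sketch.
HALT_REQUIRED_TASKS = frozenset({
    "seeding",
    "integrity check",
    "storage reconfiguration",
    "changing storage group or database location",
})


def replication_can_run(active_node_up, passive_node_up,
                        replication_service_running, copying_enabled):
    """Return True only when every condition for replication holds."""
    return all((active_node_up, passive_node_up,
                replication_service_running, copying_enabled))


def must_halt_first(task):
    """Return True if the named task requires replication to be halted."""
    return task in HALT_REQUIRED_TASKS
```

A checklist script could call must_halt_first("seeding") before maintenance to remind the operator to halt replication, and call replication_can_run afterward to confirm that every prerequisite for resuming is met.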

For more information about managing CCR, see the Microsoft Exchange document Managing Cluster Continuous Replication.