Fault tolerance and load balancing in Microsoft Provisioning System (MPS) depend on the interactions between its components: the MPF Client, Provisioning Engines, Configuration database, Audit and Recovery Service, Transaction logs, and the Provisioning Manager.

MPF Client functions

The MPF Client component of MPS is responsible for submitting XML requests to the Provisioning Engine. The MPF Client is a COM object that can be invoked by the MPS .NET Client Wrapper residing on a front-end application such as an ASP.NET Web service. The MPS .NET Client Wrapper allows a .NET application to generate base XML requests that are submitted to the MPF Client. When requests are submitted, they can be processed by the Provisioning Engine in real time, or requests can be queued within MPS. Multiple MPF Clients can coexist in an MPS deployment.

The MPF Client is stateless because it does not maintain the details of the processes in which it participates. It only provides a method to accept parameters passed by the request, such as an XML string and other values, and submits the request to the Provisioning Engine. When provisioning tasks are complete, the method then returns results to the caller in XML format.

The MPF Client obtains from the Configuration database a list of the Provisioning Engines that are available to accept requests. The client then load-balances requests across these engines using a round-robin algorithm that gives preference to the Provisioning Engine with the fewest open requests.
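The selection policy described above can be read in more than one way. The following Python sketch shows one possible interpretation, in which the engine with the fewest open requests is chosen and engines with equal load take turns in round-robin order. The Engine and EngineSelector names are hypothetical and are not part of MPS.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str               # hypothetical engine identifier
    open_requests: int = 0  # requests currently in flight


class EngineSelector:
    """Assumed policy: fewest open requests wins; ties rotate round-robin."""

    def __init__(self, engines):
        self.engines = list(engines)
        self._start = 0  # rotating position used to break ties

    def pick(self):
        n = len(self.engines)
        # Walk the list from the rotating position so equal loads take turns.
        order = [self.engines[(self._start + i) % n] for i in range(n)]
        chosen = min(order, key=lambda e: e.open_requests)  # first minimum wins
        self._start = (self.engines.index(chosen) + 1) % n
        chosen.open_requests += 1
        return chosen


# Three idle engines are used in turn.
idle = EngineSelector([Engine("A"), Engine("B"), Engine("C")])
rotation = [idle.pick().name for _ in range(4)]

# A busy engine is skipped in favor of a less loaded one.
loaded = EngineSelector([Engine("A", open_requests=5), Engine("B")])
preferred = loaded.pick().name
```

In this sketch the open-request count is only incremented. A fuller model would also decrement it when a request completes.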

If the MPF Client attempts to submit a request to a Provisioning Engine that has failed or is unavailable because the server or network is down, the client then receives an error message indicating that the engine should be block listed. When this occurs, the MPF Client block lists the engine in the MPS Configuration database so that no new incoming requests are sent to the server hosting the engine. The MPF Client then selects a different engine for submitting requests. Meanwhile, a background thread tries to re-establish the connection with the server. If it is successful, the MPF Client then removes that server from the block list in the Configuration database.
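The block-list behavior can be illustrated with a small Python sketch. It assumes a hypothetical send function that raises an error when an engine is unreachable, and it uses an in-memory set in place of the block list kept in the Configuration database. The retry_blocked method represents a single pass of the background reconnection thread.

```python
class EngineUnavailable(Exception):
    """Raised by the hypothetical transport when an engine cannot be reached."""


class FailoverClient:
    def __init__(self, engines, send):
        self.engines = list(engines)
        self.block_list = set()  # stand-in for the block list in the Configuration database
        self.send = send         # hypothetical callable: send(engine, request) -> result

    def submit(self, request):
        for engine in self.engines:
            if engine in self.block_list:
                continue
            try:
                return self.send(engine, request)
            except EngineUnavailable:
                self.block_list.add(engine)  # no new requests go to this engine
        raise RuntimeError("no Provisioning Engine is available")

    def retry_blocked(self, probe):
        """One pass of the reconnect loop that a background thread would run."""
        for engine in list(self.block_list):
            if probe(engine):
                self.block_list.discard(engine)


down = {"engine1"}  # engines that are currently unreachable


def fake_send(engine, request):
    if engine in down:
        raise EngineUnavailable(engine)
    return f"{engine} handled {request}"


client = FailoverClient(["engine1", "engine2"], fake_send)
first_result = client.submit("<request/>")       # falls over to engine2
blocked_after_failure = set(client.block_list)

down.clear()                                      # engine1 comes back
client.retry_blocked(lambda engine: engine not in down)
second_result = client.submit("<request/>")       # engine1 is used again
```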

IIS Crash

If a transaction is proceeding normally in MPS and the Internet Information Services (IIS) application crashes before the transaction completes (for example, because of a server reboot), the transaction is unaffected because the failure is outside the realm of MPS. The transaction therefore completes successfully and does not roll back.

Provisioning Engine functions

When a Provisioning Engine starts up, it gathers topology information containing Transaction log and other server locations from the Configuration database and then registers existing Namespace configurations and associated stored procedures. At startup, it also retrieves the Provisioning Engine properties set by the Provisioning Manager.

Configuration and namespace information is cached within the registry and a master copy is stored in the Configuration database. If a failure occurs when a Provisioning Engine attempts to connect with the Configuration database, the cached copy is then used. In addition, whenever there are configuration changes within the Configuration database, the changes are instantly pushed out to the affected MPS components.
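As an illustration of the cache fallback, the following Python sketch uses a plain dictionary in place of the registry cache and a hypothetical fetch_master function in place of the call to the Configuration database. The setting name and server name are invented for the example.

```python
class ConfigUnavailable(Exception):
    """Raised when the (hypothetical) Configuration database call fails."""


def load_config(fetch_master, cache):
    """Return (config, source), preferring the master copy over the cache."""
    try:
        config = fetch_master()
    except ConfigUnavailable:
        return dict(cache), "cache"   # fall back to the locally cached copy
    cache.clear()
    cache.update(config)              # refresh the cache from the master copy
    return config, "master"


def offline():
    raise ConfigUnavailable()


cache = {}  # stand-in for the registry cache
first = load_config(lambda: {"transaction_log": "SQL01"}, cache)
second = load_config(offline, cache)  # database unreachable, cache is used
```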

When a Provisioning Engine accepts a request, the request is bound to the procedure called by the request. The tasks specified by the procedure are then determined and stored within a Transaction log. Each Provisioning Engine communicates with a Transaction log, which is a Microsoft SQL Server database. For asynchronous requests, the requests are also stored in the transaction log for queuing purposes. The queue manager is responsible for initiating the processing of queued requests.

Typical server configurations include multiple coexisting Provisioning Engines to support load balancing and to avoid a single point of failure.

Provisioning Engine Crash

If a Provisioning Engine crashes during a provisioning transaction, these actions occur:

  1. The Audit and Recovery Service begins to see a number of orphaned transactions and marks them for rollback in the status column of the Request Table for the transaction.
  2. The Audit and Recovery Service selects an available Provisioning Engine and sends it a request, through a Distributed Component Object Model (DCOM) call, to roll back the set of transactions that were being handled by the failed Provisioning Engine.
  3. The selected Provisioning Engine marks all of these transactions to identify itself as the new owner and then rolls them back.
  4. The Provisioning Engine updates the heartbeat.
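The recovery steps above can be sketched in Python as follows. This is a simplified model only: the Request fields, the status strings, and the choice of the first available engine are assumptions made for illustration, and the DCOM call is replaced by a plain function call.

```python
from dataclasses import dataclass


@dataclass
class Request:
    tx_id: str
    owner: str            # engine currently responsible for the transaction
    status: str = "running"
    heartbeat: int = 0    # simplified timestamp


def recover(request_table, failed_engine, available_engines, now):
    # Step 1: mark the failed engine's in-flight transactions for rollback.
    orphans = [r for r in request_table
               if r.owner == failed_engine and r.status == "running"]
    for r in orphans:
        r.status = "rollback"
    if not orphans or not available_engines:
        return None
    # Step 2: choose an available engine (a DCOM call in MPS; a plain pick here).
    new_owner = available_engines[0]
    for r in orphans:
        r.owner = new_owner        # Step 3: take ownership, then roll back
        r.status = "rolled_back"
        r.heartbeat = now          # Step 4: refresh the heartbeat
    return new_owner


table = [Request("tx-1", "engineA"), Request("tx-2", "engineB"),
         Request("tx-3", "engineA")]
new_owner = recover(table, "engineA", ["engineB"], now=100)
summary = [(r.tx_id, r.owner, r.status, r.heartbeat) for r in table]
```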

Database functions

In MPS, all components connect to the Configuration database, including the MPF Client, Provisioning Engine, and the Audit and Recovery Service. When a new MPS component is installed on a server, the Configuration database is updated to reflect that a new component is available for service.

Whenever you install a Transaction log database on a computer running SQL Server, an associated database for the MPS Audit and Recovery Service is also installed. Every Transaction log must have an associated MPS Audit and Recovery Service. Transaction logs store transaction data and state. Provisioning Engines load-balance across Transaction logs using a round-robin algorithm.

In addition, all requests submitted to MPS are assigned a ClientTransactionID attribute under the ClientContext property in the Transaction log. If you need to verify whether a transaction succeeded or not, you can use transaction IDs to obtain the status of any transaction from the Transaction log database.

All configuration data is stored by the Configuration database in a dual-redundant configuration (clustered on MPSSQL02 and MPSSQL03). Additional factors that mitigate the single point of failure for either cluster node include the following:

  • All MPS components, including the MPF Client, Provisioning Engine, and Audit and Recovery Service, store their associated data in the registry or in an in-memory data store.
  • All transactions on which a Provisioning Engine is currently working are stored in memory. After each stage of a transaction is performed, the engine updates its local in-memory data store and then updates the appropriate Transaction log.

While running a transaction, if the SQL Server database containing the Transaction log becomes unavailable because of a failed network connection or power outage, the following steps are taken to ensure fault tolerance:

  1. The Provisioning Engine recognizes that it cannot complete the transaction because the Transaction log is unavailable.
  2. MPS rolls back all steps in the current transaction. MPS is able to do this because the Provisioning Engine maintains a local in-memory copy of all transactions upon which it is currently working.
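The following Python sketch shows how an in-memory record of completed stages makes this rollback possible. It is an illustration under stated assumptions: each stage is modeled as a hypothetical (do, undo) pair, and write_log stands in for the update to the Transaction log.

```python
class LogUnavailable(Exception):
    """Raised when the (hypothetical) Transaction log write fails."""


def run_transaction(steps, write_log):
    """steps is a list of (do, undo) callables; returns the outcome."""
    completed = []  # in-memory copy of the undo actions for finished stages
    try:
        for do, undo in steps:
            do()
            completed.append(undo)     # update the in-memory store first
            write_log(len(completed))  # then update the Transaction log
    except LogUnavailable:
        for undo in reversed(completed):
            undo()                     # roll back in reverse order
        return "rolled back"
    return "committed"


state = []
steps = [
    (lambda: state.append("mailbox"), lambda: state.remove("mailbox")),
    (lambda: state.append("user"), lambda: state.remove("user")),
]


def failing_log(stage):
    if stage == 2:  # the log becomes unreachable after the second stage
        raise LogUnavailable()


outcome = run_transaction(steps, failing_log)
```

Because the undo actions are held in memory, the rollback does not need to read anything back from the unavailable Transaction log.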

Audit and Recovery Service functions

The Audit and Recovery Service, sometimes referred to as "the Listener," is responsible for monitoring the Provisioning Engines and moving transactions marked as failed to the Auditing database. The Audit and Recovery Service determines whether a transaction has failed by checking for orphaned transactions at a specific heartbeat interval. In addition, if the Provisioning Engine handling the transaction fails, the Audit and Recovery Service directs another available Provisioning Engine to roll back the transaction that was running at the time of the failure.

You can use a configuration of multiple coexisting Audit and Recovery Services; however, only one will have the primary role. In this configuration, if the primary Audit and Recovery Service fails, another available Audit and Recovery Service replaces it.

In a non-clustered environment, there is a one-to-one mapping between the Transaction log and Audit and Recovery Service. In a clustered environment, one Audit and Recovery Service services all Transaction logs within the cluster.

Transaction Rollback Security

If rollback is initiated during an active transaction, the rollback uses the security context associated with the original task. If the rollback is initiated because of a hardware failure, MPS will roll back the transaction using the MPFServiceAcct as the security context.

Provisioning Manager functions

The Provisioning Manager has knowledge of all MPS components that are currently registered in the Configuration database. When using Provisioning Manager, the data that you view is being retrieved from the Configuration database. When you make a change to the configuration of MPS components using the Provisioning Manager, these actions occur:

  1. Provisioning Manager submits a Windows Management Instrumentation (WMI) call to each server where registered components are located, including clients, engines, and the Audit and Recovery Service, and then updates the registries.
  2. Because components are registered to receive events, they are notified when their registry keys are modified.
  3. Components connect to the MPS Configuration database and gather the latest data.
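The notification sequence above resembles an observer pattern, which the following Python sketch illustrates. The Registry and Component classes, the listener mechanism, and the setting names are hypothetical stand-ins for the WMI call, the registry change events, and the Configuration database.

```python
class Registry:
    """Stand-in for a server registry that raises change events."""

    def __init__(self):
        self.values = {}
        self.listeners = []

    def set(self, key, value):
        self.values[key] = value       # step 1: the registry is updated
        for notify in self.listeners:
            notify()                   # step 2: registered components are notified


class Component:
    def __init__(self, config_db):
        self.config_db = config_db
        self.settings = dict(config_db)

    def on_registry_changed(self):
        # Step 3: reconnect to the Configuration database and reload.
        self.settings = dict(self.config_db)


config_db = {"queue_size": 10}  # hypothetical setting
engine = Component(config_db)
registry = Registry()
registry.listeners.append(engine.on_registry_changed)

config_db["queue_size"] = 20               # change made in Provisioning Manager
stale = engine.settings["queue_size"]      # component has not been notified yet
registry.set("config_version", 2)          # registry update triggers the event
fresh = engine.settings["queue_size"]
```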

Orphaned Transactions

When a failure occurs and one or more transactions are orphaned, the Audit and Recovery Service becomes aware of these orphaned items in this manner:

  • At five-minute intervals, all of the Provisioning Engines update the heartbeat data for each transaction in progress by applying a date/time stamp.
  • The Audit and Recovery Service looks for transactions with a heartbeat that is older than five minutes. If any are found, they are identified as orphaned transactions.
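The heartbeat check can be expressed as a short Python sketch. The transaction identifiers and the dictionary of heartbeats are hypothetical. In MPS the heartbeat data is kept in the Transaction log.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=5)


def find_orphans(heartbeats, now):
    """Return IDs of transactions whose heartbeat is older than the interval."""
    return sorted(tx for tx, stamp in heartbeats.items()
                  if now - stamp > HEARTBEAT_INTERVAL)


now = datetime(2024, 1, 1, 12, 0)
heartbeats = {
    "tx-1": now - timedelta(minutes=2),  # updated recently, still owned
    "tx-2": now - timedelta(minutes=9),  # stale, so treated as orphaned
}
orphans = find_orphans(heartbeats, now)
```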