Fault tolerance and load balancing in the Microsoft Provisioning System (MPS) depend on the interactions between its components: the MPF Client, Provisioning Engines, Configuration database, Audit and Recovery Service, Transaction logs, and Provisioning Manager.
MPF Client functions
The MPF Client component of MPS is responsible for submitting XML requests to the Provisioning Engine. The MPF Client is a COM object that can be invoked by the MPS .NET Client Wrapper running in a front-end application, such as an ASP.NET Web service. The MPS .NET Client Wrapper allows a .NET application to generate the base XML requests that are submitted to the MPF Client. Submitted requests can either be processed by the Provisioning Engine in real time or be queued within MPS. An MPS deployment can include multiple coexisting MPF Clients.
The MPF Client is stateless: it does not keep track of the processes in which it participates. It only provides a method that accepts the parameters passed with a request, such as an XML string and other values, and submits the request to the Provisioning Engine. When the provisioning tasks are complete, the method returns the results to the caller in XML format.
The MPF Client obtains from the Configuration database a list of the Provisioning Engines that are available to accept requests. The client then load-balances requests across these engines using a round-robin algorithm that gives preference to the Provisioning Engine with the fewest open requests.
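The text describes the selection as round-robin with a preference for the least-loaded engine. The sketch below is one hypothetical way to combine the two: engines are examined in rotation order, the engine with the fewest open requests wins, and ties go to whichever engine comes next in the rotation. The class, field names, and tie-breaking rule are assumptions for illustration, not the actual MPF Client code.

```python
from dataclasses import dataclass, field


@dataclass
class EngineSelector:
    """Hypothetical sketch of MPF Client engine selection.

    Engines are visited in round-robin order; among engines that are not
    block-listed, the one with the fewest open requests is chosen, and ties
    are broken by rotation order.
    """
    engines: list[str]                      # engine names read from the Configuration database
    open_requests: dict[str, int] = field(default_factory=dict)
    blocked: set[str] = field(default_factory=set)
    _next: int = 0                          # current round-robin start position

    def pick(self) -> str:
        n = len(self.engines)
        # Build the rotation starting at the current round-robin position.
        rotation = [self.engines[(self._next + i) % n] for i in range(n)]
        candidates = [e for e in rotation if e not in self.blocked]
        if not candidates:
            raise RuntimeError("no Provisioning Engine available")
        # min() returns the first minimum, so ties follow rotation order.
        chosen = min(candidates, key=lambda e: self.open_requests.get(e, 0))
        # Advance the rotation past the chosen engine.
        self._next = (self.engines.index(chosen) + 1) % n
        return chosen
```

With equal load the selector cycles through the engines; with unequal load it favors the engine with the fewest open requests.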
If the MPF Client attempts to submit a request to a Provisioning Engine that has failed, or that is unreachable because its server or the network is down, the client receives an error message indicating that the engine should be block-listed. The MPF Client then block-lists the engine in the MPS Configuration database so that no new incoming requests are sent to the server hosting the engine, and selects a different engine for submitting requests. Meanwhile, a background thread tries to re-establish the connection with the server. If it succeeds, the MPF Client removes that server from the block list in the Configuration database.
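The block-list behavior can be sketched as follows. This is a minimal illustration under stated assumptions: `send` is a hypothetical stand-in for the real submission call, `is_reachable` for the connectivity check run by the background thread, and the in-memory `blocked` set stands in for the block list kept in the Configuration database. Engine order is simplified to list order rather than the load-balancing rule.

```python
from typing import Callable


class BlockListingClient:
    """Hypothetical sketch of MPF Client block listing and recovery."""

    def __init__(self, engines: list[str],
                 send: Callable[[str, str], str],
                 is_reachable: Callable[[str], bool]) -> None:
        self.engines = engines
        self.send = send                  # hypothetical: submits XML to one engine
        self.is_reachable = is_reachable  # hypothetical: connectivity probe
        self.blocked: set[str] = set()    # stand-in for the database block list

    def submit(self, request_xml: str) -> str:
        for engine in self.engines:
            if engine in self.blocked:
                continue  # no new requests go to a block-listed engine
            try:
                return self.send(engine, request_xml)
            except ConnectionError:
                # Engine failed or is unreachable: block-list it, try the next one.
                self.blocked.add(engine)
        raise RuntimeError("no Provisioning Engine accepted the request")

    def retry_blocked(self) -> None:
        # Intended to be called periodically by a background thread:
        # engines whose servers respond again are removed from the block list.
        for engine in list(self.blocked):
            if self.is_reachable(engine):
                self.blocked.discard(engine)
```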
IIS Crash
If the Internet Information Services (IIS) application crashes (for example, because of a server reboot) while a transaction is running normally in MPS but before it completes, the transaction is unaffected, because the failure occurs outside MPS. The transaction therefore completes successfully and is not rolled back.
Provisioning Engine functions
When a Provisioning Engine starts up, it gathers topology information from the Configuration database, including the locations of the Transaction logs and other servers, and then registers the existing namespace configurations and their associated stored procedures. At startup, it also retrieves the Provisioning Engine properties set through the Provisioning Manager.
Configuration and namespace information is cached in the registry, and the master copy is stored in the Configuration database. If a Provisioning Engine cannot connect to the Configuration database, it uses the cached copy. In addition, whenever configuration changes are made in the Configuration database, they are immediately pushed out to the affected MPS components.
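The fallback from the master copy to the cached copy can be sketched as a small function. Here `fetch_master` is a hypothetical stand-in for reading the Configuration database, and a plain dictionary stands in for the registry cache. Refreshing the cache after a successful read is an assumption made for the sketch; the text only states that the cache exists and is used on failure.

```python
from typing import Callable


def load_engine_config(fetch_master: Callable[[], dict], cache: dict) -> dict:
    """Prefer the master copy; fall back to the cached copy on failure.

    `fetch_master` (hypothetical) reads the Configuration database and raises
    ConnectionError when it is unreachable; `cache` stands in for the registry.
    """
    try:
        config = fetch_master()
    except ConnectionError:
        if not cache:
            raise  # nothing cached to fall back on
        return dict(cache)  # database unreachable: use the cached copy
    # Assumption: a successful read refreshes the cached copy.
    cache.clear()
    cache.update(config)
    return config
```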
When a Provisioning Engine accepts a request, the request is bound to the procedure it calls. The tasks specified by that procedure are then determined and stored in a Transaction log. Each Provisioning Engine communicates with a Transaction log, which is a Microsoft SQL Server database. Asynchronous requests are also stored in the Transaction log so that they can be queued. The queue manager is responsible for initiating the processing of queued requests.
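The request flow above can be modeled in a few lines. In this sketch an in-memory object stands in for the SQL Server Transaction log, and the procedure name, task names, integer transaction IDs, and `queue_manager_step` helper are all hypothetical; the point is only the order of operations: bind to a procedure, record its tasks, and queue asynchronous requests for the queue manager.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TransactionLog:
    """In-memory stand-in for the SQL Server Transaction log (hypothetical)."""
    tasks: dict[int, list[str]] = field(default_factory=dict)  # txn id -> tasks
    queue: deque = field(default_factory=deque)                 # queued async txn ids


class Engine:
    def __init__(self, procedures: dict[str, list[str]], log: TransactionLog):
        self.procedures = procedures  # hypothetical: procedure name -> task list
        self.log = log
        self._ids = itertools.count(1)

    def accept(self, procedure: str, asynchronous: bool = False) -> int:
        # Bind the request to its procedure and record the tasks in the log.
        txn_id = next(self._ids)
        self.log.tasks[txn_id] = list(self.procedures[procedure])
        if asynchronous:
            self.log.queue.append(txn_id)  # left for the queue manager
        return txn_id


def queue_manager_step(log: TransactionLog) -> Optional[int]:
    """One pass of a hypothetical queue manager: start the oldest queued request."""
    return log.queue.popleft() if log.queue else None
```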
Typical server configurations include multiple coexisting Provisioning Engines to support load balancing and to eliminate a single point of failure.
Provisioning Engine Crash
If a Provisioning Engine crashes during a provisioning transaction, these actions occur:
- The Audit and Recovery Service detects the orphaned transactions
and marks them for rollback in the status column of the Request
Table for the transaction.
- The Audit and Recovery Service selects an available Provisioning
Engine and sends it a request, through a Distributed Component
Object Model (DCOM) call, to roll back the set of transactions that
were being handled by the failed Provisioning Engine.
- The Provisioning Engine marks all of the transactions to
identify itself as the new owner and then rolls back all of the
transactions.
- The Provisioning Engine updates the heartbeat.
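The recovery sequence can be sketched as below. This is an illustration, not the actual service: the `Txn` record, its status strings, and the choice of the first live engine are assumptions, and a plain function call replaces the DCOM request sent to the selected engine.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    txn_id: int
    owner: str                 # engine currently handling the transaction
    status: str = "running"    # stand-in for the Request Table status column
    heartbeat: float = 0.0     # last heartbeat time stamp


def recover_from_crash(txns: list[Txn], failed_engine: str,
                       live_engines: list[str], now: float) -> str:
    """Hypothetical sketch of the crash-recovery steps listed above."""
    orphaned = [t for t in txns
                if t.owner == failed_engine and t.status == "running"]
    # 1. Mark the orphaned transactions for rollback.
    for t in orphaned:
        t.status = "rollback"
    # 2. Select an available engine (the real service reaches it over DCOM).
    if not live_engines:
        raise RuntimeError("no Provisioning Engine available for rollback")
    new_owner = live_engines[0]  # assumption: first available engine
    # 3. The new engine takes ownership and rolls the transactions back.
    for t in orphaned:
        t.owner = new_owner
        t.status = "rolled_back"
        t.heartbeat = now        # 4. the engine updates the heartbeat
    return new_owner
```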
Database functions
In MPS, all components connect to the Configuration database, including the MPF Client, Provisioning Engine, and the Audit and Recovery Service. When a new MPS component is installed on a server, the Configuration database is updated to reflect that a new component is available for service.
Whenever you install a Transaction log database on a computer running SQL Server, an associated database for the MPS Audit and Recovery Service is also installed. Every Transaction log must have an associated MPS Audit and Recovery Service. Transaction logs store transaction data and state. Provisioning Engines load-balance across Transaction logs using a round-robin algorithm.
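Plain round-robin distribution across Transaction logs can be sketched with `itertools.cycle`. The log names are hypothetical; in a real deployment the list would come from the topology information in the Configuration database.

```python
import itertools

# Hypothetical Transaction log names.
transaction_logs = ["TxLog01", "TxLog02"]
next_log = itertools.cycle(transaction_logs)

# Each new transaction is assigned to the next Transaction log in turn.
assignments = [next(next_log) for _ in range(5)]
```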
In addition, all requests submitted to MPS are assigned a ClientTransactionID attribute under the ClientContext property in the Transaction log. If you need to verify whether a transaction succeeded or not, you can use transaction IDs to obtain the status of any transaction from the Transaction log database.
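A status lookup keyed by the transaction ID might look like the sketch below. The `RequestStatus` table and its columns are hypothetical, and an in-memory SQLite database stands in for the SQL Server Transaction log purely to keep the example self-contained; the real schema is not described here.

```python
import sqlite3
from typing import Optional


def transaction_status(conn: sqlite3.Connection, client_txn_id: str) -> Optional[str]:
    """Return the status recorded for a ClientTransactionID, or None if unknown.

    Table and column names are hypothetical placeholders.
    """
    row = conn.execute(
        "SELECT Status FROM RequestStatus WHERE ClientTransactionID = ?",
        (client_txn_id,),
    ).fetchone()
    return row[0] if row else None


# Usage against an in-memory stand-in for the Transaction log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RequestStatus (ClientTransactionID TEXT PRIMARY KEY, Status TEXT)")
conn.execute("INSERT INTO RequestStatus VALUES (?, ?)", ("client-0001", "Succeeded"))
print(transaction_status(conn, "client-0001"))
```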
All configuration data is stored in the Configuration database in a dual-redundant configuration (clustered on MPSSQL02 and MPSSQL03). Additional factors that reduce the impact of a failure of either cluster node include the following:
- All MPS components, including the MPF Client, Provisioning
Engine, and Audit and Recovery Service, store their associated data
in the registry or in an in-memory data store.
- All transactions on which a Provisioning Engine is currently
working are stored in memory. After each stage of a transaction is
performed, the engine updates its local in-memory data store and
then updates the appropriate Transaction log.
If the SQL Server database containing the Transaction log becomes unavailable while a transaction is running (for example, because of a failed network connection or a power outage), the following steps are taken to ensure fault tolerance:
- The Provisioning Engine recognizes that it cannot complete the
transaction because the Transaction log is unavailable.
- MPS rolls back all steps in the current transaction. MPS is
able to do this because the Provisioning Engine maintains a local
in-memory copy of all transactions upon which it is currently
working.
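The two behaviors described above (update the in-memory store, then the Transaction log, and roll back from the in-memory copy if the log becomes unavailable) can be sketched together. `write_log` is a hypothetical stand-in for the Transaction log update that raises `ConnectionError` when SQL Server is unreachable, and modeling each stage as a `(name, do, undo)` triple is an assumption made for illustration.

```python
from typing import Callable

Stage = tuple[str, Callable[[], None], Callable[[], None]]  # (name, do, undo)


class StagedTransaction:
    """Hypothetical sketch of in-memory stage tracking with rollback."""

    def __init__(self, write_log: Callable[[str], None]):
        self.write_log = write_log  # hypothetical Transaction log update
        self.completed: list[tuple[str, Callable[[], None]]] = []  # in-memory store

    def run(self, stages: list[Stage]) -> bool:
        for name, do, undo in stages:
            do()
            self.completed.append((name, undo))  # 1. update the in-memory store
            try:
                self.write_log(name)             # 2. then update the Transaction log
            except ConnectionError:
                self.rollback()                  # log unavailable: undo every step
                return False
        return True

    def rollback(self) -> None:
        # Undo completed stages in reverse order using the in-memory copy.
        while self.completed:
            _, undo = self.completed.pop()
            undo()
```

Because the in-memory record is written before the log, every step that has been performed is known to the engine even when the log write fails.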
Audit and Recovery Service functions
The Audit and Recovery Service, sometimes referred to as "the Listener," is responsible for monitoring the Provisioning Engines and moving transactions marked as failed to the Auditing database. The Audit and Recovery Service determines whether a transaction has failed by checking for orphaned transactions at a specific heartbeat interval. In addition, if the Provisioning Engine handling a transaction fails, the Audit and Recovery Service instructs another available Provisioning Engine to roll back the transaction that was running at the time of the failure.
You can deploy multiple coexisting Audit and Recovery Services; however, only one holds the primary role. In this configuration, if the primary Audit and Recovery Service fails, another available Audit and Recovery Service replaces it.
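The text does not say how the replacement is chosen, so the following is only a sketch under an assumed policy: keep the current primary while it is healthy, otherwise promote the first healthy service in configured order. Service names are hypothetical.

```python
from typing import Optional


def elect_primary(services: list[str], healthy: set[str],
                  current: Optional[str]) -> Optional[str]:
    """Assumed failover policy for the primary Audit and Recovery Service."""
    if current in healthy:
        return current        # the current primary keeps its role
    for name in services:
        if name in healthy:
            return name       # promote the first available service
    return None               # no Audit and Recovery Service is available
```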
In a non-clustered environment, there is a one-to-one mapping between the Transaction log and Audit and Recovery Service. In a clustered environment, one Audit and Recovery Service services all Transaction logs within the cluster.
Transaction Rollback Security
If rollback is initiated during an active transaction, the rollback uses the security context associated with the original task. If the rollback is initiated because of a hardware failure, MPS rolls back the transaction using MPFServiceAcct as the security context.
Provisioning Manager functions
The Provisioning Manager is aware of all MPS components that are currently registered in the Configuration database, and the data that you view in Provisioning Manager is retrieved from the Configuration database. When you use the Provisioning Manager to change the configuration of MPS components, these actions occur:
- Provisioning Manager submits a Windows Management
Instrumentation (WMI) call to each server that hosts registered
components, including clients, engines, and the Audit and Recovery
Service, and updates the registry on each of those servers.
- Because components are registered to receive events, they are
notified when their registry keys are modified.
- Components connect to the MPS Configuration database and retrieve
the latest data.
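The notification chain can be sketched as an observer pattern. Everything here is a stand-in: `Registry` models a server's registry with change events, a dictionary models the Configuration database, `apply_change` replaces the WMI call, and the setting name is hypothetical. Writing the change to the Configuration database before touching the registries is an assumption; the text only says components read the latest data from it after being notified.

```python
from typing import Callable


class Registry:
    """Stand-in for a server's registry with change notifications (hypothetical)."""

    def __init__(self) -> None:
        self.values: dict[str, str] = {}
        self.subscribers: list[Callable[[str], None]] = []

    def write(self, key: str, value: str) -> None:
        self.values[key] = value
        for notify in self.subscribers:
            notify(key)  # components registered for events learn the key changed


class Component:
    def __init__(self, name: str, registry: Registry, config_db: dict[str, str]):
        self.name = name
        self.config_db = config_db  # stand-in for the MPS Configuration database
        self.config: dict[str, str] = {}
        registry.subscribers.append(self.on_registry_change)

    def on_registry_change(self, key: str) -> None:
        # On notification, reload the latest data from the Configuration database.
        self.config = dict(self.config_db)


def apply_change(config_db: dict[str, str], registries: list[Registry],
                 key: str, value: str) -> None:
    """Hypothetical Provisioning Manager change (the real tool uses WMI)."""
    config_db[key] = value      # assumption: master data is updated first
    for reg in registries:
        reg.write(key, value)   # then each server's registry is updated
```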
Orphaned Transactions
When a failure occurs and one or more transactions are orphaned, the Audit and Recovery Service becomes aware of these orphaned items in this manner:
- At five-minute intervals, all of the Provisioning Engines
update the heartbeat data for each transaction in progress by
applying a date/time stamp.
- The Audit and Recovery Service looks for transactions with a
heartbeat that is older than five minutes. If any are found, they
are identified as orphaned transactions.
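The staleness check can be sketched as a comparison against a five-minute threshold. The mapping from transaction ID to the last date/time stamp is a hypothetical shape for the heartbeat data. The sketch uses a strict "older than" comparison, so a stamp exactly five minutes old is not yet treated as orphaned; the text does not specify the boundary case.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=5)


def find_orphans(heartbeats: dict[int, datetime], now: datetime) -> list[int]:
    """Return IDs of in-progress transactions whose heartbeat stamp is older
    than five minutes (hypothetical data shape: transaction ID -> last stamp)."""
    return sorted(txn for txn, stamp in heartbeats.items()
                  if now - stamp > HEARTBEAT_INTERVAL)
```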