FME Server includes a number of failover mechanisms that allow various components to connect to backup components in the event of failure. However, FME Server administrators must understand that a standard FME Server cluster does not provide complete fault tolerance. Configuring a fault-tolerant cluster requires the following components:
Note: To ensure failover capability of any source data files or databases used in your FME workspaces, the following additional components are required:
The following diagram shows the basic structure of FME Server failover.
When the Core A machine fails, clients connecting to Core A fail over to Core B. Because the failover core is now the “active” core of the cluster, clients subsequently connect to it so that processing continues. When a heartbeat failure is detected, the failover core takes over jobs and schedules.
Note: Any jobs currently running when the active core fails restart automatically on the failover core.
When the Core A machine and its associated Engines are restored, Core B remains the active core, and Core A comes back online as the failover core.
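To illustrate what "a heartbeat failure is detected" can mean in practice, here is a minimal Python sketch of a timeout-based heartbeat check. It is not FME Server's implementation: the class name, the timeout value, and the injectable clock are all assumptions made for illustration.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: tracks the last heartbeat received from a peer core."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be exercised without waiting
        self.last_seen = clock()

    def beat(self):
        # Record that the peer was heard from just now.
        self.last_seen = self.clock()

    def peer_failed(self):
        # The peer is presumed failed once no heartbeat has arrived
        # within the timeout window.
        return self.clock() - self.last_seen > self.timeout_seconds
```

In this sketch, the failover core would call beat() each time it hears from the active core and would take over jobs and schedules once peer_failed() returns True.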
Note: Notification Service publishers (including UDP, Email and JMS clients) do not fail over. These clients must be manually reconfigured to connect to the active core.
Note: To limit hardware costs, two instances of Engine 1 can be on the same machine, with one instance connected to the Primary host and the other to the Failover host. See Configuring FME Engines for Failover below.
The first FME Server Core that starts up in a fault-tolerant cluster automatically becomes the primary “active” core. The second FME Server Core to start up becomes the failover core. Once configured, the fault-tolerant cluster automatically reconfigures itself depending on the state of the system.
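The role assignment described above can be modeled in a few lines of Python. This is a simplified sketch of the documented behavior only, not FME Server code; the Cluster class and the core names "A" and "B" are hypothetical.

```python
class Cluster:
    """Toy model of active/failover role assignment for two cores."""

    def __init__(self):
        self.active = None
        self.failover = None

    def start(self, core):
        # The first core to start becomes active; the second becomes failover.
        if self.active is None:
            self.active = core
        elif self.failover is None and core != self.active:
            self.failover = core

    def fail(self, core):
        if core == self.active:
            # The failover core is promoted, leaving the failover slot empty.
            self.active, self.failover = self.failover, None
        elif core == self.failover:
            self.failover = None


cluster = Cluster()
cluster.start("A")   # A starts first and becomes active
cluster.start("B")   # B starts second and becomes failover
cluster.fail("A")    # B is promoted to active
cluster.start("A")   # A is restored and rejoins as the failover core
```

Note that in this model a restored core never reclaims the active role on its own, which matches the behavior described for Core A and Core B.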
Before installation, you must create and share the directory that will hold the common FME Server files (the repository directory), so that each FME Server Core has read and write access to it. The repository directory should be on its own redundant file server, not on the same machine as an FME Server Core installation. Then, during installation, provide the UNC/mount path to the repository directory, ensuring that you use the same directory for each FME Server Core installation.
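Before running the installer, it can help to confirm from each core machine that the shared directory really is readable and writable. The following Python sketch is one way to do that; it is not part of FME Server, and the function name is hypothetical. The path you pass in is whatever UNC or mount path you plan to give the installer.

```python
import os
import tempfile


def check_shared_directory(path):
    """Return True if a file can be created, read back, and removed in path."""
    if not os.path.isdir(path):
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, mode="w+") as probe:
            probe.write("probe")
            probe.flush()
            probe.seek(0)
            return probe.read() == "probe"
    except OSError:
        # Raised on permission problems or when the share is unreachable.
        return False
```

Run the check on every machine that will host an FME Server Core, using the account that the FME Server services will run under, because share permissions are usually granted per account.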
When you bring the primary server back online, it is not necessary to shut down the secondary server unless you want clients to fail back to the primary server. You might want to fail back if the primary server is a more capable system than the failover server.
You can configure individual FME Server services to switch to a backup FME Server Core when the connection to the primary FME Server Core is lost. This capability is called "service failover".
When a service loses its connection to the primary FME Server Core, the service attempts to connect to a backup FME Server Core that you define. In a fault-tolerant cluster, a primary FME Server Core can have only one backup FME Server Core.
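The primary-then-backup pattern described above can be sketched generically in Python. This shows the general technique only and says nothing about how FME Server services are actually implemented or configured; the function name and the injectable connect parameter are assumptions.

```python
import socket


def connect_with_failover(primary, backup, connect=socket.create_connection,
                          timeout=3.0):
    """Try the primary (host, port) address first, then the single backup."""
    last_error = None
    for address in (primary, backup):
        try:
            return connect(address, timeout=timeout)
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
    raise ConnectionError(f"neither core is reachable: {last_error}")
```

A caller would pass two (host, port) pairs, for example hypothetical hosts "core-a" and "core-b" with whatever port your deployment uses. Only one backup address is accepted, mirroring the one-backup limit noted above.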
We recommend configuring redundant FME Engines for both the primary FME Server Core and the secondary FME Server Core. If you wish to limit the number of machines used, you can start up FME Engines with the same name on the same machine, with one engine connected to the primary FME Server Core and the other connected to the failover FME Server Core.