Planning for Fault Tolerance
Fault tolerance, or "high availability," is critical to any successful business operation. To ensure that requests are still processed when a failure occurs, FME Server supports fault tolerance at multiple levels of an integrated system. FME Server provides fault tolerance in two ways:
- Recovery: Restarting components and jobs when crashes occur. FME Server provides component and job recovery automatically, so no additional planning is needed.
- Failover: Ensuring that if a hardware component fails, FME Server remains online.
About Recovery
Component Recovery
FME Server includes component recovery out of the box. This means that, even on a single system, FME Server monitors and restarts components that fail, including the FME Engines and the FME Server Core. This is handled by the FME Server Process Monitor. Because FME Server monitors its own components, it can recover from component crashes without manual intervention, which helps keep it available and dependable.
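The restart behavior can be pictured as a supervisor loop. The following Python sketch is a conceptual illustration only, not FME Server's Process Monitor: the `Component` and `ProcessMonitor` classes and the single `check_once` pass are hypothetical, and components are simulated in memory rather than launched as real processes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for a monitored component (e.g. an FME Engine)."""
    name: str
    running: bool = True
    restarts: int = 0

    def restart(self) -> None:
        # A real monitor would relaunch an OS process here; this only flips state.
        self.running = True
        self.restarts += 1


@dataclass
class ProcessMonitor:
    """Hypothetical supervisor: each pass restarts every component that has stopped."""
    components: list[Component] = field(default_factory=list)

    def check_once(self) -> list[str]:
        restarted = []
        for comp in self.components:
            if not comp.running:
                comp.restart()
                restarted.append(comp.name)
        return restarted


engine = Component("engine-1")
core = Component("core")
monitor = ProcessMonitor([engine, core])
engine.running = False          # simulate an engine crash
print(monitor.check_once())     # ['engine-1']
print(engine.restarts)          # 1
```

A real monitor would run such a pass on a timer and relaunch operating system processes; the sketch leaves that out so the restart logic stays visible.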
Job Recovery
FME Server can also restart a translation (job) when a crash occurs. FME Server resubmits a failed translation up to a specified number of attempts. As a result, jobs that fail because of temporary issues, such as a brief network interruption, are resubmitted and run again. Job recovery is configurable and can be turned off entirely. For more information, see Job Recovery.
Note: Resubmitted jobs may cause data duplication, such as when writing to database formats.
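Job recovery and the duplication risk described in the note can both be seen in a short Python sketch. It is a conceptual model under stated assumptions, not FME Server's implementation or configuration API: `run_with_recovery`, `TransientError`, and `FlakyLoader` are hypothetical names, `max_attempts` stands in for the configurable number of attempts, and setting it to 1 approximates turning recovery off.

```python
from __future__ import annotations

from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for a recoverable failure, such as a dropped connection."""


def run_with_recovery(job: Callable[[], None], max_attempts: int) -> int:
    """Run `job`, resubmitting after a TransientError, up to `max_attempts` total runs.

    Returns the attempt number that succeeded; re-raises the last error if all fail.
    (Hypothetical helper; max_attempts=1 means no resubmission.)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            job()
            return attempt
        except TransientError:
            if attempt >= max_attempts:
                raise


class FlakyLoader:
    """Hypothetical job: writes one row, then fails on its first run only."""

    def __init__(self) -> None:
        self.table: list[str] = []
        self.runs = 0

    def __call__(self) -> None:
        self.runs += 1
        self.table.append("row-1")  # already written when the failure happens
        if self.runs == 1:
            raise TransientError("connection dropped mid-write")
        self.table.append("row-2")


loader = FlakyLoader()
print(run_with_recovery(loader, max_attempts=3))  # 2
print(loader.table)                               # ['row-1', 'row-1', 'row-2']
```

Because the first attempt wrote `row-1` before failing, the resubmitted attempt writes it again. Writers that can safely run more than once, such as updates keyed on a unique ID, avoid this kind of duplication.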
About Failover
For more information, see the related FME Knowledge Center article.