Planning for Disaster Recovery

Disaster recovery is primarily concerned with recovering FME Server operations and data in the event of a major data center failure. The general concept is that if one data center fails, the second data center takes over, and the FME Server Core located there becomes the "active" Core. The time frame for disaster recovery is typically longer than for fault-tolerant recovery: disaster recovery may take anywhere from minutes to hours, or even days, while fault-tolerant recovery typically takes seconds to minutes.

Disaster recovery can be incorporated into a fault-tolerant architecture. Alternatively, if you are primarily concerned with disaster recovery and less concerned with the fast recovery that fault tolerance provides, you may want to implement a different architecture.

WARNING: We recommend installing all FME Servers on systems that are configured for the same time zone. If time zones differ across FME Servers, unexpected issues may arise, including:
- Improper timing of FME Server Schedule triggers.
- Inconsistent or misleading timestamps in log files (accessed from Resources).
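As a rough illustration of this warning, the following Python sketch compares the UTC offset reported by each FME Server host against a reference host and lists any hosts that differ. The hostnames, and the idea of collecting offsets by running `local_utc_offset()` on each host, are assumptions for illustration only; this is not a check that FME Server itself provides. Matching UTC offsets also do not guarantee identical time zones, since two zones can share an offset today but follow different daylight saving rules.

```python
from datetime import datetime, timedelta


def local_utc_offset() -> timedelta:
    """Return this machine's current UTC offset (run on each FME Server host)."""
    # astimezone() with no argument converts to the system's local time zone.
    offset = datetime.now().astimezone().utcoffset()
    return offset if offset is not None else timedelta(0)


def find_timezone_mismatches(utc_offsets: dict[str, timedelta], reference_host: str) -> list[str]:
    """Return hosts whose UTC offset differs from the reference host's offset."""
    reference = utc_offsets[reference_host]
    return sorted(host for host, offset in utc_offsets.items() if offset != reference)


if __name__ == "__main__":
    # Hypothetical offsets gathered from three hosts across two data centers.
    offsets = {
        "dc1-core": timedelta(hours=-8),
        "dc1-engine": timedelta(hours=-8),
        "dc2-core": timedelta(hours=-5),
    }
    print(find_timezone_mismatches(offsets, "dc1-core"))  # prints ['dc2-core']
```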

This example disaster recovery architecture combines an Express installation and a fault-tolerant installation. However, instead of a third-party load balancer distributing traffic between systems, FME Server clients must be manually redirected to the FME Server Core host in the second data center in the event of a disaster. Each data center houses a full ("Express") installation of FME Server, configured to provide similar functionality. To keep FME Server system data synchronized between data centers, run Backup & Restore operations regularly. (Otherwise, workspaces and other updates must be published twice: once to the FME Server Core host in each data center.)
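Because the second data center is only as current as the most recent backup restored to it, anything published to the active Core after that backup was taken is missing from the standby installation. The Python sketch below illustrates that reasoning with hypothetical item names and timestamps; the publish log and the `missing_on_standby()` helper are assumptions, not an FME Server feature.

```python
from datetime import datetime


def missing_on_standby(published: dict[str, datetime], last_backup: datetime) -> list[str]:
    """Return items published to the active Core after the backup that was
    last restored to the standby data center (hypothetical bookkeeping)."""
    # Anything newer than the backup snapshot did not make it into the restore.
    return sorted(name for name, ts in published.items() if ts > last_backup)


if __name__ == "__main__":
    # Hypothetical publish log from the active Core.
    published = {
        "repo/roads.fmw": datetime(2024, 3, 1, 9, 0),
        "repo/parcels.fmw": datetime(2024, 3, 2, 14, 30),
        "repo/zoning.fmw": datetime(2024, 3, 3, 8, 15),
    }
    last_backup = datetime(2024, 3, 2, 0, 0)
    print(missing_on_standby(published, last_backup))
    # prints ['repo/parcels.fmw', 'repo/zoning.fmw']
```

The shorter the interval between Backup & Restore runs, the fewer items can fall into this gap if the first data center fails.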

Keep in mind that when planning for disaster recovery, all FME Server clients, including web browsers, the FME Server Console, and applications that use the FME Server REST API, must connect to the "active" FME Server Core host.
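Since there is no load balancer in this architecture, one way to make the manual redirection manageable is for each client application to derive every FME Server URL from a single "active host" setting. The Python sketch below assumes hypothetical hostnames and a hypothetical `FmeServerClientConfig` class; the `/fmerest/v3/` base path is likewise an assumption to verify against the REST API documentation for your FME Server version.

```python
from dataclasses import dataclass


@dataclass
class FmeServerClientConfig:
    # Hypothetical hostnames for the FME Server Core in each data center.
    primary_host: str
    secondary_host: str
    active: str = "primary"

    def active_host(self) -> str:
        """Return the Core host that clients should currently connect to."""
        return self.primary_host if self.active == "primary" else self.secondary_host

    def fail_over(self) -> None:
        """Manually redirect this client to the second data center's Core."""
        self.active = "secondary"

    def rest_url(self, path: str) -> str:
        # Assumed REST API base path; every request goes to the active host.
        return f"https://{self.active_host()}/fmerest/v3/{path.lstrip('/')}"


if __name__ == "__main__":
    cfg = FmeServerClientConfig("fme-dc1.example.com", "fme-dc2.example.com")
    print(cfg.rest_url("repositories"))
    cfg.fail_over()  # performed by an administrator after a disaster
    print(cfg.rest_url("repositories"))
```

Keeping the host in one place reduces failover to a single change per client, rather than a search for hard-coded URLs during an outage.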