Planning for Disaster Recovery
Disaster recovery is primarily concerned with restoring FME Flow operations and data after a major failure of a data center. The general concept is that if one data center fails, a second data center takes over, and the FME Flow Core located there becomes the 'active' Core. Disaster recovery typically takes longer than fault-tolerant recovery: it may take minutes, hours, or even days, whereas fault-tolerant recovery typically takes seconds to minutes.
Disaster recovery can be incorporated into a fault-tolerant architecture. Alternatively, if you are primarily concerned with disaster recovery, and less concerned about the fast recovery provided by fault tolerance, you may want to implement a different architecture.
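If time zone or clock settings differ between the machines that host FME Flow components, unexpected issues may arise, including:
- Improper timing of FME Flow Schedule triggers.
- Inconsistent or misleading timestamps in log files (accessed from Resources).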
This example of disaster recovery combines an Express installation with a fault-tolerant installation. However, instead of placing a third-party load balancer between the systems, FME Flow clients must be manually redirected to the FME Flow Core host of the second data center in the event of a disaster. Each data center houses a full ("Express") installation of FME Flow, configured to provide similar functionality. To keep FME Flow system data synchronized between data centers, Backup & Restore operations are performed regularly. (Otherwise, workspaces and other updates must be published twice: once to the FME Flow Core host in each data center.)
Keep in mind that when planning for disaster recovery, all clients of FME Flow, including web browsers, the FME Flow CLI, and the FME Flow REST API, must connect to the "active" FME Flow Core host.
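One way to make the manual redirect less error-prone is to keep the address of the "active" Core host in a single place that every client script reads. The following is a minimal sketch of that idea, not part of FME Flow itself: the `FlowEndpoints` class, the data center labels, and the `example.com` host names are all hypothetical and would be replaced with your own values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlowEndpoints:
    """Hypothetical client-side record of which data center is active."""

    hosts: Dict[str, str]  # data center label -> Core host base URL (assumed names)
    active: str            # label of the data center that is currently active

    def active_base_url(self) -> str:
        """Return the base URL that all clients should connect to."""
        if self.active not in self.hosts:
            raise ValueError(f"Unknown data center: {self.active!r}")
        return self.hosts[self.active].rstrip("/")

    def failover_to(self, label: str) -> "FlowEndpoints":
        """Return a new record that points clients at another data center."""
        if label not in self.hosts:
            raise ValueError(f"Unknown data center: {label!r}")
        return FlowEndpoints(self.hosts, label)


# Hypothetical host names for the two data centers.
endpoints = FlowEndpoints(
    hosts={
        "dc1": "https://fmeflow-dc1.example.com/",
        "dc2": "https://fmeflow-dc2.example.com",
    },
    active="dc1",
)

# After a disaster, an operator switches the active label manually.
after_disaster = endpoints.failover_to("dc2")
```

Scripts that call the REST API or the CLI would then build their requests from `active_base_url()` rather than from a hard-coded host, so redirecting them is a single change. Web browser users still need to be told the new address, or redirected by other means such as a DNS update.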