Custom Transformers and Parallel Processing

Each FME translation is usually run as a single process on your computer. This means that normally, FME sequentially processes each group of features specified in the Group By parameter. Versions of FME 2012+ can use multiple-core processors, which, on modern personal computers, allows multiple tasks to be performed in parallel. FME also uses hyper-threading, a technology used to make each physical core appear as two logical processors to the host operating system. By splitting the workload between cores/processors, FME performance can improve.

In custom transformers that support this feature, parallel processing lets you run several simultaneous processes. The Group By parameter allows you to assign features to processes. The Parallel Processing parameter allows you to define different levels of processing, from No Parallelism to Extreme.

Task Manager Processes

When parallel processing is enabled, FME processes groups of features in parallel by spawning a new fmeworker.exe instance for each group of features.

In the task manager, the processes are visible as additional fmeworker.exe instances:

FME Translation Log

In the Translation Log, information messages show license limit (if applicable), the request, process memory usage for each "worker", and identifying information about each -WORKER_KEY.

Note When FME spawns an additional process, it needs to send the input features to the new process and receive the output features from the process. This adds additional CPU overhead when compared to single-process mode.

Generating Separate Log Files for Each Process

When Data Caching is off, FME generates separate logs for each process of a translation, in addition to a primary log for the overall translation. These process-specific logs are placed in a folder in the same location as the primary log, as specified in Log Settings.

Parallel Processing Parameter

To use parallel processing, the workflow should have several groups of features that can each be processed independently. Each group becomes a separate (parallel) process. Some grouping techniques are discussed below.

The level of parallelism (how many processes can be executed at a single time) depends on the Parallel Processing parameter, which has five modes:

No Parallelism
Minimal
Moderate
Aggressive
Extreme

Depending on the operation performed, one mode can be more advantageous than another, and Aggressive or Extreme does not always provide the best performance. In some workspaces, parallel processing does not provide any advantage; in other workspaces, minimal or moderate levels of parallelism are the best choice.

To use Parallel Processing on a custom transformer, click the Transformer Parameters in the Navigator pane:

A custom transformer with parallel processing does not have to be limited to a single transformer within it: you can use multiple transformers.

For more information on using parallel processing with custom transformers, see The Safe Software Blog.

Usage Notes

Parallel processing can improve FME performance; but it can also degrade it or have very little effect. When using parallel processes, it is important that the processing (CPU) time for each group is anticipated to be significantly more than the overhead of launching a new process and sending the features back and forth between processes. If this is not the case, then enabling parallel processing will be slower than using no parallelism.

Trying a small subset of your data in multi-processing mode will help you determine whether there is an advantage to using it on an entire dataset.

Many, Small Groups

Parallel processing is not recommended when you have many groups, each with a small number of features. Each group spawns an FME process and that takes time. For example, with 10,000 groups of 10 features, you might find it costs more performance to start and stop FME 10,000 times than you save in parallel processing. Conversely, 10 groups of 10,000 features might be more worthwhile.

Data Volumes

Parallel processing only provides an advantage when data volumes are large enough: for smaller datasets, the overhead of running multiple processes can easily make the translation slower than a single process.

Other System Resources

You need to ensure other system resources, such as memory, are adequate for the task. Firing up eight processes to do heavy polygon dissolving when you have eight cores is fine, but if you only have 2GB of memory, you may actually slow down a translation.

Parallel processing is extremely efficient when the task is being offloaded elsewhere. For example, if you have multiple HTTP requests to make, it might be worth using parallel processing because the impact on system resources is small.

Writing to Disk

When the task involves writing to disk, spawning multiple processes will not speed up the task.

Additional Information and Examples

FME Community: Parallel Processing