RCaller
Executes an R script that can access feature data through temporary R data frames. Input data is defined as tables, each of which becomes an R data frame. An R data frame is a table, similar to a relational database table, whose columns can have different types. More information on R data frames can be found at:
http://www.r-tutor.com/r-introduction/data-frame
This transformer requires R and the R sqldf package to be installed on the system. See Installing R in the Usage Notes section.
Any number of input data frames can be created, and each is assigned its own input port. Any features can be routed to an input port, as long as they supply values for each column defined for that table. The R script can use any of the data frames and columns defined for the inputs. Output is taken from the fmeOutput data frame, which the script populates with its results, for example the results of statistical analysis on any of the input tables.
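As a minimal sketch, the script below reads from two input data frames and writes one summary row to fmeOutput. The data frames Parcels and Buildings and their columns are hypothetical; they are defined at the top only so the script runs in a standalone R session, whereas in RCaller they would be created from the input ports.

```r
# Hypothetical stand-ins for two input data frames. In RCaller these would
# be created from the input ports, so these two lines would be omitted.
Parcels <- data.frame(Area = c(120.5, 340.0, 89.25))
Buildings <- data.frame(Height = c(6, 12, 9, 15))

# The script may use any defined data frame and column. Each column of
# fmeOutput becomes an attribute on the output feature(s).
fmeOutput <- data.frame(
  ParcelCount = nrow(Parcels),
  TotalArea = sum(Parcels$Area),
  MaxHeight = max(Buildings$Height)
)
print(fmeOutput)
```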
Input ports can be created by connecting a feature type to the Connect Input port, by editing the transformer parameters and adding inputs manually, or by importing port definitions from existing feature types. Imported table definitions do not update when their source changes: if an attribute is renamed upstream, the corresponding table column must be renamed manually in the table parameters. Output attributes must also be exposed manually; they are taken from the column names of the fmeOutput data frame at runtime.
The translation succeeds only if the user supplies a valid R script that follows proper R syntax. A guide to the R language is available at:
https://cran.r-project.org/doc/manuals/r-release/R-lang.html
To learn more about how to use R and to get ideas for different types of statistical analysis that may be possible, the following links are recommended:
http://www.r-bloggers.com/how-to-learn-r-2/
http://www.r-tutor.com/r-introduction
Parameters
Inputs
The RCaller requires one or more tables to be defined; each table becomes an input port on the transformer. The Import… button provides a quick way to populate the input table definitions from the source feature types in the workspace.
Note: For performance reasons, define only the columns the script actually needs.
Note that some FME attribute and table names are not valid R data frame or column names (notably attribute names that start with an underscore, "_"). To avoid issues, these names are converted to valid R names. The adjusted names are shown in the Data frames section on the left of the script editor.
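R's own rules for valid names can be checked with the base function make.names(). This is only an illustration of R's rules: FME's conversion may differ, so the names shown in the Data frames section remain the ones to use in the script. The attribute names below are hypothetical examples.

```r
# Candidate FME attribute names (hypothetical examples).
fme_names <- c("_id", "tree height", "2ndValue", "Diameter")

# make.names() prefixes "X" where a name cannot start as-is and replaces
# invalid characters with "."; already-valid names are left unchanged.
r_names <- make.names(fme_names)
print(data.frame(fme = fme_names, r = r_names))
```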
R Script
The RCaller has a single output port. Attributes created by the script must be entered in the Attributes to Expose parameter to appear in subsequent transformers or in the Data Inspector table view. The attributes set on each output feature are determined by the columns of the fmeOutput data frame at runtime. The script editor provides convenient drag-and-drop access to data frames, columns, and published and private parameters that can be used within the script.
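The number of features output depends on the length of the longest column in the fmeOutput data frame. (When a data frame is built, R recycles shorter columns whose length divides that length evenly, so this equals the number of rows.) In this way the RCaller can be used to output a single value, a list, or a matrix of values.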
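The sketch below returns one row per quartile, so under the rule above it would produce three output features. The Samples data frame and its Value column are hypothetical stand-ins for an input port. It also shows R's recycling: the single value assigned to N is repeated on every row, so the row count follows the longest column.

```r
# Hypothetical stand-in for an input data frame created by RCaller.
Samples <- data.frame(Value = c(4, 8, 15, 16, 23, 42))

q <- quantile(Samples$Value, probs = c(0.25, 0.5, 0.75))

# Three rows, so three output features. N has length 1 and is recycled
# to every row; Quartile and Value have length 3.
fmeOutput <- data.frame(
  Quartile = names(q),
  Value = unname(q),
  N = nrow(Samples),
  stringsAsFactors = FALSE
)
print(fmeOutput)
```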
A note on table contents: The data frames defined for the input ports need only the attributes of the source features that the script uses. They do not need to, and should not, contain additional attributes.
A note on features routed to input ports: Features routed to input ports should have attributes that match the schema defined for the input port's data frame. If they do not, null values are inserted for any missing attributes in the columns defined for the input table. An upstream AttributeRenamer or NullAttributeMapper can be used to ensure that attribute values are present for the defined columns.
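When a column may contain nulls, summary functions need to skip them. This sketch assumes that a null inserted for a missing attribute reaches R as NA, R's usual missing value; the Trees data frame is a hypothetical stand-in for an input port.

```r
# Hypothetical stand-in: the second feature lacked Diameter, so its value
# is assumed to arrive in R as NA.
Trees <- data.frame(Diameter = c(0.30, NA, 0.50))

fmeOutput <- data.frame(
  # na.rm = TRUE drops NA values; without it, mean() would return NA.
  MeanDiameter = mean(Trees$Diameter, na.rm = TRUE),
  MissingCount = sum(is.na(Trees$Diameter))
)
print(fmeOutput)
```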
Example
Sample R script to calculate the mean and median of a list of numbers
list<-c(1,2,2,2,2,3,3,3)
fmeOutput<-data.frame(mean=mean(list), median=median(list))
Note: in the above example, list can instead be assigned the values of a numeric input table column (e.g., list<-myInput$testColumn).
The output of this script is a single feature with the attributes mean and median.
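A minimal sketch of the variant described in the note, reading the numbers from an input column instead of a literal vector. The myInput data frame is defined here only so the script runs outside FME, and the variable is named values rather than list, since list is also the name of a base R function.

```r
# Hypothetical stand-in for an input data frame with a numeric column.
myInput <- data.frame(testColumn = c(1, 2, 2, 2, 2, 3, 3, 3))

values <- myInput$testColumn
fmeOutput <- data.frame(mean = mean(values), median = median(values))
print(fmeOutput)  # one row: mean 2.25, median 2
```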
Extending the above example to work with values from a dataset
Imagine that you have source tree data from a nearby park, including a tree trunk diameter attribute, Diameter.
If the dataset name is Trees, when you connect the dataset to the RCaller, an input port Trees will be created.
In the Columns section of the Inputs, ensure that Diameter is set to a numeric data type.
In the R Script section, specifying fmeOutput<-data.frame(MeanDiameter=mean(Trees$Diameter)) will calculate the mean diameter of the trees.
Specifying MeanDiameter in the Attributes to Expose parameter will expose the mean diameter attribute, making it possible to use it later on in the workflow.
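Put together, the script for this example might look like the sketch below. The Diameter values are made up, and the Trees assignment stands in for the data frame RCaller would build from the Trees input port. Printing names(fmeOutput) lists the attribute names to enter in Attributes to Expose.

```r
# Hypothetical stand-in for the Trees input data frame (omit in RCaller).
Trees <- data.frame(Diameter = c(0.30, 0.45, 0.52, 0.61, 0.38))

fmeOutput <- data.frame(MeanDiameter = mean(Trees$Diameter))

# These column names are the attributes to list under Attributes to Expose.
print(names(fmeOutput))
```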
Usage Notes
Installing R
To use this transformer, you must install both R and the sqldf package.
Installing the R Interpreter
Download R Installers from:
Windows:
Install R following the usual Windows installation steps.
For more information, please see https://cran.r-project.org/bin/windows/base/README.R-3.2.3.
Mac and Linux:
To install R on Mac OS X, you can either download the latest R image from:
Or, alternatively, use the Homebrew package manager command:
brew install R
This requires the Homebrew package manager to be installed on the system. Homebrew can be downloaded from:
For installing R on Linux, the use of a package manager is recommended.
Installing the sqldf package
- Open an R command prompt.
- Windows: Run the R GUI as an administrator by right-clicking it in the Start menu and selecting “Run as administrator…”. Use the version that matches your FME version (64-bit or 32-bit).
- Mac: Launch the R Console.
- Run the following command at the R command prompt:
install.packages("sqldf") - This will launch a window prompting you to select a download mirror. Once a mirror is selected, the sqldf package will be installed to the system-wide R library. It is important that this is done with administrative privileges, otherwise the package will be installed to a user library and FME will not be able to use it.
- To verify that sqldf was installed correctly, check the location listed when you run
.libPaths()
at the R command prompt. There should be a folder called “sqldf” in that location.
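The check can also be scripted. This sketch uses only base R: .Library is the default (system-wide) library of the running R installation, so sqldf_in_system_lib tests the location the step above says FME relies on. Run it in the same R installation that FME uses.

```r
# All library folders this R session searches, in order.
print(.libPaths())

# .Library is the default library inside the R installation itself.
sqldf_in_system_lib <- dir.exists(file.path(.Library, "sqldf"))

# Any installed copy, including one in a per-user library
# (system.file() returns "" when the package is not found).
sqldf_anywhere <- nzchar(system.file(package = "sqldf"))

cat("sqldf in system library:", sqldf_in_system_lib, "\n")
cat("sqldf installed anywhere:", sqldf_anywhere, "\n")
```

If the first value is FALSE and the second TRUE, the package most likely went to a user library, and reinstalling with administrative privileges as described above should place it in the system-wide library.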
Using Shared Resources
Optionally, you can place your R libraries in a shared resources folder. The location of this folder is set in Workbench under Tools > FME Options > Default Paths > Shared FME Folders.
In Windows, the R folder is located by default in Documents > FME > Plugins.
Additional modules dropped into the R folder will be picked up by FME.
Troubleshooting Tips
Double-check the types on your input tables. If you want to do numeric calculations on certain columns, make sure that they are configured as numeric types.
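Inside the script, str() shows the type each column arrived with. Setting the column type in the input table is the fix described above; as a fallback, this sketch converts the column in R with as.numeric(), turning values that cannot be parsed into NA. The character-typed Trees data frame is a hypothetical stand-in.

```r
# Hypothetical stand-in: Diameter arrived as text rather than numbers.
Trees <- data.frame(Diameter = c("0.30", "0.45", "n/a"),
                    stringsAsFactors = FALSE)
str(Trees)                       # Diameter shows as chr, not num
is.numeric(Trees$Diameter)       # FALSE

# Convert to numbers; "n/a" cannot be parsed and becomes NA (warning muted).
d <- suppressWarnings(as.numeric(Trees$Diameter))
fmeOutput <- data.frame(MeanDiameter = mean(d, na.rm = TRUE))
```

If a column ever arrives as a factor instead, convert it with as.numeric(as.character(x)), because as.numeric() on a factor returns the internal level codes rather than the values.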
Specifying the R Interpreter
FME attempts to locate the R installation on your system automatically. However, if R is installed in a non-default location, or multiple R interpreters are installed, you may need to specify the R Interpreter Path under Tools > FME Options > Translation > R Interpreter.
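To identify the installation to point FME to, the following base-R sketch can be run in the R console of the installation you want to use. It reports the folder holding that build's executables, its version, and whether it is a 64-bit or 32-bit build (see the note on matching your FME version above). Which executable in that folder the R Interpreter Path should name is not stated here; check the FME documentation for that.

```r
# Folder with the R executables for the running build (on some Windows
# builds this is an architecture-specific subfolder such as bin/x64).
bin_dir <- R.home("bin")
print(bin_dir)

# Version of this installation.
print(R.version.string)

# 8-byte pointers mean a 64-bit build, 4-byte pointers a 32-bit build.
arch_bits <- .Machine$sizeof.pointer * 8
print(arch_bits)
```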
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Search FME Knowledge Center
Search for samples and information about this transformer on the FME Knowledge Center.