StatisticsCalculator

Calculates statistics based on a designated attribute or set of attributes of the incoming features and adds the results as attributes.


Typical Uses

Inspecting and analyzing features
Calculating statistics for use in further operations

How does it work?

The StatisticsCalculator receives features and calculates the selected statistics for each chosen attribute. Statistics can be calculated independently on groups of input features, as specified by the Group By parameter. The results are added to each feature, which is output through the Complete port. A summary of the results is output through the Summary port.

Available statistics include:

Minimum
Maximum
Total Count
Sum
Mean
Median
Numeric Count
Value Count
Range
Standard Deviation (Sample or Population)
Mode
Histogram

The following statistics may also be calculated cumulatively on attributes using advanced settings: min, max, range, mean, stdev, sum, total count, numeric count, and value count. When calculating cumulative statistics, the results so far are output on each feature through the Cumulative port.
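As a rough illustration of what "cumulative" means here, the following Python sketch yields running statistics after each incoming value. It is not FME code: the function name cumulative_stats and the dictionary keys are hypothetical and only mirror the statistic names listed above.

```python
def cumulative_stats(values):
    """Yield the running statistics after each value (illustrative only)."""
    running_min = running_max = None
    running_sum = 0.0
    count = 0
    for v in values:
        count += 1
        running_sum += v
        running_min = v if running_min is None else min(running_min, v)
        running_max = v if running_max is None else max(running_max, v)
        yield {
            "min": running_min,
            "max": running_max,
            "range": running_max - running_min,
            "sum": running_sum,
            "mean": running_sum / count,
            "total_count": count,
        }


# One result per input value, each reflecting everything seen so far.
rows = list(cumulative_stats([4, 10, 1]))
for row in rows:
    print(row)
```

Each yielded dictionary corresponds to the values that would accompany one feature at that point in the stream.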

Statistics are stored as attributes, named <attribute>.<statisticname>, where <statisticname> will be one of the following: min, max, range, mean, stdev, stdev_p, sum, median, mode, total_count, numeric_count, value_count, histogram.

Histogram statistics are stored as list attributes that pair the attribute value and count for each unique value of the attribute and are named <attribute>.histogram{#}.value and <attribute>.histogram{#}.count where # is a zero-based index of the unique attribute values.
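The naming pattern for histogram list attributes can be sketched in Python as follows. This is an illustration only: histogram_attributes is a hypothetical helper, the attribute name landuse is invented, and the order in which unique values receive their index (first seen, in this sketch) is an assumption rather than documented behavior.

```python
from collections import Counter


def histogram_attributes(attribute, values):
    """Build <attribute>.histogram{#}.value/.count names for the given values."""
    counts = Counter(values)  # unique values, kept in first-seen order
    out = {}
    for i, (value, count) in enumerate(counts.items()):
        out[f"{attribute}.histogram{{{i}}}.value"] = value
        out[f"{attribute}.histogram{{{i}}}.count"] = count
    return out


hist = histogram_attributes("landuse", ["park", "road", "park", "lake", "park"])
for name, value in hist.items():
    print(name, "=", value)
```

The zero-based index in braces pairs each unique value with its count, so landuse.histogram{0}.value and landuse.histogram{0}.count describe the same entry.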

Input Data Handling

Invalid, <null>, and <missing> values are skipped when calculating statistics other than total count. If no valid values are processed when calculating a statistic, the result will be <missing>. Trying to calculate standard deviation on only one value will also result in <missing>. Total count, numeric count, and value count will never be <missing>. NaN values are explicitly ignored when calculating Min, Max, Range, and Numeric Count.

For example, a calculated sum on all <missing> values will be <missing> rather than a potentially misleading and less informative 0. However, if only some values are <missing> or invalid the resulting sum will be the same as if those values were 0.
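The rules above can be approximated with a short Python sketch. It is a simplification under stated assumptions: None stands in for both <null> and <missing>, any value that cannot be converted to a number is treated as invalid, NaN handling is not modeled, and the names to_number and summarize are hypothetical.

```python
import statistics

MISSING = None  # stand-in for <missing> and <null> in this sketch


def to_number(value):
    """Return the value as a float, or None if it is missing or not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize(values):
    """Skip invalid values; report MISSING when a statistic has no valid input."""
    numbers = [n for n in map(to_number, values) if n is not None]
    return {
        "total_count": len(values),  # counts every input, never MISSING
        "sum": sum(numbers) if numbers else MISSING,
        "stdev": statistics.stdev(numbers) if len(numbers) >= 2 else MISSING,
    }


print(summarize([MISSING, MISSING]))       # no valid values at all
print(summarize([5, MISSING, "abc", 7]))   # invalid values are skipped
print(summarize([3]))                      # a single value has no sample stdev
```

In the second call the sum equals what it would be if the skipped values were 0, while total_count still reflects all four inputs.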

Example

Usage Notes

The StatisticsCalculator uses default suffixes for statistic names and always prepends the source attribute name to output attributes. To maintain backwards compatibility and avoid disrupting existing workspaces, upgraded transformers retain their previous suffix names, and keep their previous setting if output attribute names were not prepended before the upgrade.

Configuration

Input Ports

Output Ports

Parameters

Group Processing

Group By

If Group By attributes are chosen, statistics will be calculated independently within each group of features. This can be used to create a pivot-table-like analysis of values in a data stream.
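A minimal Python sketch of per-group calculation is shown below. The features, the zone and area attribute names, and the grouped_stats function are invented for illustration; the two return values loosely mirror what the Complete port (every feature plus its group's results) and the Summary port (one result set per group) carry.

```python
from collections import defaultdict
from statistics import mean

features = [
    {"zone": "A", "area": 10.0},
    {"zone": "B", "area": 4.0},
    {"zone": "A", "area": 30.0},
]


def grouped_stats(features, group_by, attribute):
    """Compute min, max, and mean of `attribute` independently per group."""
    groups = defaultdict(list)
    for f in features:
        groups[f[group_by]].append(f[attribute])
    summary = {
        key: {
            f"{attribute}.min": min(vals),
            f"{attribute}.max": max(vals),
            f"{attribute}.mean": mean(vals),
        }
        for key, vals in groups.items()
    }
    # Attach each group's results to every feature in that group.
    complete = [{**f, **summary[f[group_by]]} for f in features]
    return complete, summary


complete, summary = grouped_stats(features, "zone", "area")
print(complete[0])
print(summary["B"])
```

Features in zone A share one set of results and features in zone B another, which is what gives the pivot-table-like view.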

Complete Groups

When All Features Received: This is the default behavior. Processing will only occur in this transformer once all input is present.

When Group Changes (Advanced): This transformer will process input groups in order. Changes of the value of the Group By parameter on the input stream will trigger processing on the currently accumulating group. This may improve overall speed (particularly with multiple, equally-sized groups), but could cause undesired behavior if input groups are not truly ordered.
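The risk with unordered input can be demonstrated with Python's itertools.groupby, which also closes a group whenever the key changes. This is only an analogy for the When Group Changes behavior described above, not FME's implementation; the function name and the zone attribute are hypothetical.

```python
from itertools import groupby


def counts_when_group_changes(features, group_by):
    """Emit (group value, feature count) each time the group value changes."""
    return [
        (key, sum(1 for _ in run))
        for key, run in groupby(features, key=lambda f: f[group_by])
    ]


ordered = [{"zone": "A"}, {"zone": "A"}, {"zone": "B"}]
unordered = [{"zone": "A"}, {"zone": "B"}, {"zone": "A"}]

print(counts_when_group_changes(ordered, "zone"))
print(counts_when_group_changes(unordered, "zone"))
```

With ordered input each zone is processed once. With unordered input zone A is split into two separate groups, each with its own partial results.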

Statistics to Calculate

Attribute

Select which attributes to include, one per line.

Statistics

The following statistics may be calculated:

Minimum	The numerical minimum for numeric attributes. The lexical minimum for string attributes.
Maximum	The numerical maximum for numeric attributes. The lexical maximum for string attributes.
Total Count	The input feature count.
Sum	The sum of all values. Undefined for string attributes.
Mean	The average value, calculated as the sum of values divided by the number of values. Undefined for string attributes.
Median	The middle value of the ordered attribute values. If the number of values is even, Median returns the average of the two middle values. For string attributes, the first middle value is always used.
Numeric Count	The number of numeric values that entered the transformer. In particular, <missing>, <null>, and NaN values are ignored and are not included in this count.
Value Count	The number of values that entered the transformer, ignoring <null> and <missing>. When Calculation Method is Numeric, only values that can be converted to numbers will be counted.
Range	The maximum minus the minimum. Undefined for string attributes.
Sample Standard Deviation	The sample standard deviation. Sample standard deviation is measured by the "non-biased" or "n-1" method. Undefined for string attributes.
Population Standard Deviation	The population standard deviation. Undefined for string attributes.
Mode	The most frequent of all the values. If the dataset is multimodal (two or more values occur with the highest frequency), one of those values will be returned at random.
Histogram	A count for each unique value encountered for the analyzed attribute. The results are given as a structured list of attributes which represent (value,count) pairs.
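Several of the definitions in the table can be checked against Python's standard statistics module. This is a sketch for comparison only: the sample values are invented, and Python's string ordering is used here as a stand-in for the lexical ordering mentioned above, which may not match FME exactly.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(values)        # "n-1" method
population_sd = statistics.pstdev(values)   # divides by n
numeric_median = statistics.median(values)  # even count: average of the two middle values

names = ["pear", "apple", "fig", "kiwi"]
string_median = statistics.median_low(names)  # even count: first of the two middle values

# Two values tie for the highest frequency; a mode statistic must pick one of them.
tied_modes = statistics.multimode([1, 2, 2, 3, 3])

print(sample_sd, population_sd, numeric_median, string_median, tied_modes)
```

The sample standard deviation is always at least as large as the population standard deviation for the same values, because it divides by n-1 rather than n.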

Select All

This toggle works in conjunction with currently selected rows in the Statistics to Calculate table and will enable or disable all statistic type choices.

Add Attributes...

Provides a pick list of all currently available attributes, so that multiple attributes can be added to the Statistics to Calculate table at once.

Editing Transformer Parameters

Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click the drop-down menu beside the applicable parameter. For more information, see Transformer Parameter Menu Options.

Defining Values

There are several ways to define a value for use in a transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters. A number of tools and shortcuts can assist in constructing values, and are generally available from the drop-down context menu adjacent to the value field.

How to Set Parameter Values

Using the Text Editor

The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.

Text Editor

Using the Arithmetic Editor

The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.

Arithmetic Editor

Conditional Values

Set values depending on one or more test conditions that either pass or fail.

Parameter Condition Definition Dialog

Content

Expressions and strings can include a number of functions, characters, parameters, and more.

When setting values, whether entered directly in a parameter or constructed using one of the editors, strings and expressions containing String, Math, Date/Time, or FME Feature Functions will have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.

Content Types

String Functions	These functions manipulate and format strings.
Special Characters	A set of control characters is available in the Text Editor.
Math Functions	Math functions are available in both editors.
Date/Time Functions	Date and time functions are available in the Text Editor.
Math Operators	These operators are available in the Arithmetic Editor.
FME Feature Functions	These return primarily feature-specific values.
FME Parameters	FME and workspace-specific parameters may be used.
Creating and Modifying User Parameters	Create your own editable parameters.

Dialog Options - Tables

Transformers with table-style parameters have additional tools for populating and manipulating values.

Table Tools

Row Reordering

Enabled once you have clicked on a row item. Choices include:

Add a row
Remove a row
Move current row up one
Move current row down one
Move current row to top
Move current row to bottom

Cut, Copy, and Paste

Enabled once you have clicked on a row item. Choices include:

Cut a row - delete and copy to clipboard
Copy a row to the clipboard
Paste a row from the clipboard

Cut, copy, and paste may be used within a transformer, or between transformers.

Filter

Start typing a string, and the table will display only the rows matching those characters. All columns are searched. This only affects the display of attributes within the transformer; it does not alter which attributes are output.

Import

Import populates the table with a set of new attributes read from a dataset. Specific application varies between transformers.

Reset/Refresh

Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers.

Note: Not all tools are available in all transformers.

FME Community

The FME Community is the place for demos, how-tos, articles, FAQs, and more. Get answers to your questions, learn from other users, and suggest, vote, and comment on new features.

Search for samples and information about this transformer on the FME Community.