NLPTrainer

Input

This transformer expects a corpus of labeled texts, supplied as features that carry the label and the text as attributes. Every incoming feature must use the same attribute name for the label and the same attribute name for the text. For example, if one input feature uses 'my_label' for the label and 'my_text' for the text, all other input features must also use 'my_label' for labels and 'my_text' for texts.
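As a rough illustration of the naming requirement, the following Python sketch checks a corpus before it is sent to the transformer. It is not part of FME: features are modeled as plain dictionaries, `split_usable` is a hypothetical helper, and the attribute names are the ones from the example above.

```python
def split_usable(features, label_attr="my_label", text_attr="my_text"):
    """Separate features that carry both attributes from those that do not."""
    usable, unusable = [], []
    for feature in features:
        label = feature.get(label_attr)
        text = feature.get(text_attr)
        # A feature is usable only if both values are present and non-empty.
        if label not in (None, "") and text not in (None, ""):
            usable.append(feature)
        else:
            unusable.append(feature)
    return usable, unusable


corpus = [
    {"my_label": "sports", "my_text": "The match ended in a draw."},
    {"my_label": "finance", "my_text": "Shares fell sharply today."},
    {"label": "sports", "text": "Different attribute names."},  # inconsistent
]
usable, unusable = split_usable(corpus)
print(len(usable), len(unusable))
```

Running a check like this upstream makes it easier to see which features would fail to supply a label or a text.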

Summary

This port outputs a single feature once training is complete. The feature’s attributes report the sizes of the training and test sets, the accuracy of the model on the test set (on a scale of 0 to 1, where 1 is perfect accuracy), and some information about which NLP features the model finds helpful. (The exact structure of this information depends on the model type.) This information can be reviewed by connecting a logger.
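To make the 0 to 1 accuracy scale concrete, here is a minimal Python sketch of the usual definition of accuracy (correct predictions divided by total predictions). It assumes that definition applies here; the function name and the sample labels are hypothetical and are not taken from FME.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of test texts whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set is empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Three of the four hypothetical predictions match the true labels.
score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
print(score)
```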

Features that cannot be used to train the model, often because either the text or the label cannot be retrieved, are routed to the Rejected port.

Rejected Feature Handling can be set either to terminate the translation or to continue running when a rejected feature is encountered. This setting is available both as a default FME option and as a workspace parameter.

NLP Model

Model Type to Train	This parameter specifies what kind of model the NLPTrainer will produce. All model types are suitable for use with the NLPClassifier transformer.
Label	The attribute on incoming features which specifies the label.
Output Model Filename	The save location for the model, in the form of an *.fmd file.

Training and Testing

Text

The attribute on incoming features which specifies the text.

Case Sensitive

Determines whether the model will be case sensitive or not.

Data Percentage for Testing

Determines what percentage of the input data will be used for testing the completed model. The remainder of the input data will be used for training.
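The split described above can be sketched in Python as follows. This only illustrates the idea of holding out a percentage for testing. The source does not say whether the transformer shuffles the data or how it rounds, so the shuffle, the `seed` argument, and the function name are assumptions made for this example.

```python
import random


def split_corpus(features, test_percentage, seed=0):
    """Return (training, testing) lists from a percentage held out for testing."""
    if not 0 <= test_percentage <= 100:
        raise ValueError("test_percentage must be between 0 and 100")
    shuffled = list(features)
    # Assumption: shuffle so the test set is not biased by input order.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_percentage / 100)
    return shuffled[test_size:], shuffled[:test_size]


# 25 percent of 200 features is held out; the remainder is used for training.
train, test = split_corpus(range(200), 25)
print(len(train), len(test))
```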

Text Type

Indicates whether this model is being trained to look at single-word or multiword texts.

NLP Features

Allows the user to tell the NLPTrainer what information about the text to take into account. Different NLP feature specifications will produce different models, which may vary considerably in their accuracy.

Contains Common Words

Number of Words to Track

This parameter decides how many common words from the training corpus to keep track of. The words selected will always be the most common ones. For example, if the value is 500, the NLPTrainer will determine the 500 most common words throughout the training corpus and the model will pay attention to whether each of these words is used in each text.
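The idea can be sketched in Python as below. This is a simplified illustration, not the transformer's implementation: the `\w+` tokenization, the function names, and the `case_sensitive` argument are assumptions chosen for the example.

```python
import re
from collections import Counter


def most_common_words(texts, number_of_words, case_sensitive=False):
    """Return the most frequent words across the whole training corpus."""
    counts = Counter()
    for text in texts:
        if not case_sensitive:
            text = text.lower()
        counts.update(re.findall(r"\w+", text))
    return [word for word, _ in counts.most_common(number_of_words)]


def contains_features(text, tracked_words, case_sensitive=False):
    """Record, for each tracked word, whether it occurs in the given text."""
    if not case_sensitive:
        text = text.lower()
    present = set(re.findall(r"\w+", text))
    return {word: word in present for word in tracked_words}


training_texts = ["the cat sat", "the dog sat", "the bird flew"]
tracked = most_common_words(training_texts, 2)
features = contains_features("The dog barked", tracked)
print(tracked, features)
```

With a value of 2, only "the" and "sat" are tracked in this toy corpus, so each text is described by two true/false values.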

Format attribute set: _nlp_number_words

Words to Ignore

This parameter, together with the next, allows the user to specify words that, however common, should not be tracked by the model. This is helpful, for example, for ignoring “stopwords”, words that are expected to be common throughout the texts regardless of category. This parameter offers a dropdown choice between no stopwords, FME’s list of English-language stopwords, or a custom list.

Format attribute set: _nlp_stopwords

Custom Stopword List Filename

If using a custom list of stopwords, or words to ignore, this parameter supplies the filepath for that list. The NLPTrainer expects this to be a newline-delimited plain text (*.txt) file.
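A newline-delimited stopword file of this kind could be read with a few lines of Python, as in the sketch below. The UTF-8 encoding, the handling of blank lines, and the lowercasing are assumptions made for the example; the source only states that the file is plain text with one entry per line.

```python
import tempfile
from pathlib import Path


def load_stopwords(path, case_sensitive=False):
    """Read one stopword per line, skipping blank lines."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if not word:
            continue
        words.add(word if case_sensitive else word.lower())
    return words


# Write a small hypothetical list to a temporary file and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "stopwords.txt"
    path.write_text("the\nAnd\n\nof\n", encoding="utf-8")
    stopwords = load_stopwords(path)
print(sorted(stopwords))
```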

Format attribute set: _nlp_stopwords_filename

Mean Length

Mean Length Of

Determines whether the model takes the mean length of sentences or words in the text.

Format attribute set: _nlp_mean_length_of

Mean Length In

Determines whether the model measures mean length of the specified components in words or characters.

Format attribute set: _nlp_mean_length_in
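The two Mean Length parameters combine as sketched in the following Python example. It is an illustration only: the option values ("words", "sentences", "characters"), the punctuation-based sentence splitting, and the function name are assumptions, not the transformer's actual behavior.

```python
import re


def mean_length(text, of="words", measured_in="characters"):
    """Mean length of the words or sentences in a text."""
    if of == "words":
        if measured_in != "characters":
            raise ValueError("words can only be measured in characters")
        lengths = [len(word) for word in re.findall(r"\w+", text)]
    elif of == "sentences":
        # Assumption: sentences end at '.', '!' or '?'.
        sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
        if measured_in == "words":
            lengths = [len(re.findall(r"\w+", s)) for s in sentences]
        elif measured_in == "characters":
            lengths = [len(s) for s in sentences]
        else:
            raise ValueError("measured_in must be 'words' or 'characters'")
    else:
        raise ValueError("of must be 'words' or 'sentences'")
    return sum(lengths) / len(lengths) if lengths else 0.0


sample = "The cat sat. It purred loudly!"
sentence_mean = mean_length(sample, of="sentences", measured_in="words")
word_mean = mean_length(sample, of="words", measured_in="characters")
print(sentence_mean, word_mean)
```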

How to Set Parameter Values

Defining Values

There are several ways to define a value for use in a transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters.

Using the Text Editor

The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.

Using the Arithmetic Editor

The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.

Conditional Values

Set values depending on one or more test conditions that either pass or fail.

Content

Expressions and strings can include a number of functions, characters, parameters, and more.

When setting values, whether entered directly in a parameter or constructed using one of the editors, any String, Math, Date/Time, or FME Feature Functions contained in strings and expressions are evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.

Content Types

String Functions	These functions manipulate and format strings.
Special Characters	A set of control characters is available in the Text Editor.
Math Functions	Math functions are available in both editors.
Date/Time Functions	Date and time functions are available in the Text Editor.
Math Operators	These operators are available in the Arithmetic Editor.
FME Feature Functions	These return primarily feature-specific values.
FME Parameters	FME and workspace-specific parameters may be used.
Creating and Modifying User Parameters	Create your own editable parameters.

Dialog Options - Tables

Table Tools

Transformers with table-style parameters have additional tools for populating and manipulating values.

Row Reordering

Enabled once you have clicked on a row item. Choices include:

Add a row
Remove a row
Move current row up one
Move current row down one
Move current row to top
Move current row to bottom

Cut, Copy, and Paste

Enabled once you have clicked on a row item. Choices include:

Cut a row - delete and copy to clipboard
Copy a row to the clipboard
Paste a row from the clipboard

Cut, copy, and paste may be used within a transformer, or between transformers.

Filter

Start typing a string, and the table will display only the rows that match those characters. The filter searches all columns. It affects only the display of attributes within the transformer and does not alter which attributes are output.

Import

Import populates the table with a set of new attributes read from a dataset. Specific application varies between transformers.

Reset/Refresh

Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers.

Note: Not all tools are available in all transformers.

Processing Behavior	Feature-Based
Feature Holding	Yes
Dependencies
Aliases	NaturalLanguageProcessingTrainer
History	Released FME 2019.0