NLPTrainer
Trains a natural language processing (NLP) classification model based on the user’s specifications and the provided data.
NLPTrainer expects tagged (labelled) data as input, with each feature carrying a single text and its label. Some preprocessing of the training data may be required; the AttributeCreator transformer can be useful for this. From this training data and the NLP features (specific types of information about the text) that the user specifies, NLPTrainer creates a model and writes it to an *.fmd (FME MoDel) file. Its companion transformer, NLPClassifier, uses these *.fmd files to perform natural language classification, sorting texts into the categories labelled in the training data.
Usage Notes
- For more information about natural language processing with FME, see the documentation for the companion NLPClassifier transformer.
Configuration
Input Ports
This transformer expects a corpus of labelled texts, supplied as features that store the label and the text as attributes. All incoming features should use the same attribute names for both pieces of information. For example, if one input feature uses 'my_label' as the attribute name for the label and 'my_text' as the attribute name for the text, every other input feature should also use 'my_label' for labels and 'my_text' for texts.
Output Ports
The output port emits a single feature once training is complete. Its attributes report the sizes of the training and test sets, the accuracy of the model over the test set (on a scale of 0 to 1, where 1 is perfect accuracy), and some information about which NLP features the model finds helpful. (The exact structure of this information depends on the model type.) To review this information, connect a Logger transformer to the port.
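The relationship between the Data Percentage for Testing parameter and the set sizes and accuracy reported on this feature can be illustrated with a short Python sketch. It is not NLPTrainer's implementation: the helper names, the seeded shuffle, the rounding of the test-set size, and the stand-in majority-label "model" are all assumptions chosen only to show how the reported numbers relate.

```python
import random
from collections import Counter


def split_train_test(records, test_percentage, seed=0):
    """Split (text, label) pairs into training and test sets.

    Hypothetical helper: the shuffle and rounding rule are assumptions,
    not documented NLPTrainer behavior.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_percentage / 100)
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(predictions, expected):
    """Fraction of correct predictions, on a scale of 0 to 1."""
    if not expected:
        return 0.0  # no test data: report 0 rather than divide by zero
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


records = [("great product", "positive"), ("terrible", "negative"),
           ("loved it", "positive"), ("awful", "negative"),
           ("works well", "positive")]
train, test = split_train_test(records, test_percentage=40)

# Stand-in "model": always predicts the most common training label.
majority = Counter(label for _, label in train).most_common(1)[0][0]
predicted = [majority for _ in test]
expected = [label for _, label in test]

print("training set size:", len(train))  # 3
print("test set size:", len(test))       # 2
print("accuracy:", accuracy(predicted, expected))
```

Here 40 percent of five records gives a test set of two and a training set of three; accuracy is the fraction of the test set whose predicted label matches the true label.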
Features that cannot be used to train the model, typically because either the text or the label cannot be retrieved, are routed to the rejected-feature port.
Rejected Feature Handling: can be set to either terminate the translation or continue running when it encounters a rejected feature. This setting is available both as a default FME option and as a workspace parameter.
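The input and rejection rules above can be sketched in Python: every feature is read with the same label and text attribute names, and a feature whose label or text cannot be retrieved is either set aside or, when rejected-feature handling is set to terminate, stops the run. This is a hypothetical sketch; the dictionary representation of features, the function and exception names, and the treatment of empty strings as unretrievable are assumptions, not FME behavior.

```python
class RejectedFeatureError(Exception):
    """Hypothetical error raised when rejected-feature handling terminates."""


def partition_features(features, label_attr, text_attr, terminate=False):
    """Separate usable training features from rejected ones.

    Each feature is modeled as a dict of attribute names to values
    (an assumption for this sketch). A feature is rejected when its
    label or text attribute is missing or empty.
    """
    usable, rejected = [], []
    for feature in features:
        label = feature.get(label_attr)
        text = feature.get(text_attr)
        if label in (None, "") or text in (None, ""):
            if terminate:
                raise RejectedFeatureError(f"cannot train on feature: {feature!r}")
            rejected.append(feature)
        else:
            usable.append((text, label))
    return usable, rejected


features = [
    {"my_label": "positive", "my_text": "loved it"},
    {"my_label": "negative"},                # no text attribute
    {"label": "positive", "my_text": "ok"},  # label stored under a different name
]
usable, rejected = partition_features(features, "my_label", "my_text")
print(len(usable), len(rejected))  # 1 2
```

The third feature shows why consistent attribute names matter: its label exists, but not under the name the transformer was told to read, so it cannot be used.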
Parameters
Model Type to Train | Specifies what kind of model the NLPTrainer will produce. All model types are suitable for use with the NLPClassifier transformer. |
Label | The attribute on incoming features that holds the label. |
Output Model Filename | The path where the model is saved, as an *.fmd file. |
Text | The attribute on incoming features that holds the text. |
Case Sensitive | Determines whether the model treats uppercase and lowercase letters as distinct. |
Data Percentage for Testing | Determines what percentage of the input data is used to test the completed model. The remainder of the input data is used for training. |
Text Type | Indicates whether the model is being trained on single-word or multiword texts. |
NLP Features | Specifies what information about the text the NLPTrainer takes into account. Different NLP feature specifications produce different models, which may vary considerably in accuracy. The available NLP features are Beginning Characters, Contains Common Words, Contains Regular Expression, Contains String, Ending Characters, Length, and Mean Length. |
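To give a sense of what the NLP Features options describe, the following Python sketch computes one plausible interpretation of each for a single text. The parameter list names these features but does not define them, so every definition here is an assumption: Beginning and Ending Characters are taken as fixed-length prefixes and suffixes, Length as the number of characters, Mean Length as the mean word length, and Contains Common Words as a check against a small hypothetical word list. The case_sensitive flag loosely mirrors the Case Sensitive parameter.

```python
import re

# Hypothetical stand-in for "common words"; the list NLPTrainer uses
# (if any) is not documented here.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def extract_features(text, case_sensitive=False, n_chars=3,
                     search_string="", pattern=None):
    """Compute one illustrative value per NLP feature for a single text.

    Every definition below is an assumption made for this sketch.
    """
    if not case_sensitive:
        # Fold to lowercase so "The" and "the" are treated alike.
        text = text.lower()
        search_string = search_string.lower()
    words = text.split()  # whitespace tokenization (assumed)
    flags = 0 if case_sensitive else re.IGNORECASE
    return {
        # Beginning/Ending Characters: fixed-length prefix and suffix.
        "beginning_characters": text[:n_chars],
        "ending_characters": text[-n_chars:] if n_chars > 0 else "",
        # Length: number of characters in the text.
        "length": len(text),
        # Mean Length: mean word length in characters.
        "mean_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "contains_common_words": any(w in COMMON_WORDS for w in words),
        "contains_string": bool(search_string) and search_string in text,
        "contains_regex": bool(pattern) and re.search(pattern, text, flags) is not None,
    }


features = extract_features("The Quick brown fox",
                            search_string="QUICK", pattern=r"fox$")
print(features)
```

With case sensitivity off, the sample text is lowercased first, so the prefix is "the", the suffix is "fox", the length is 19 characters, and the mean word length is 4.0.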
Editing Transformer Parameters
Transformer parameters can be assigned by referencing other elements in the workspace through a set of menu options. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click the button beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Defining Values
There are several ways to define a value for use in a transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters. A number of tools and shortcuts can help construct values; they are generally available from the drop-down context menu next to the value field.
Using the Text Editor
The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.
Using the Arithmetic Editor
The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.
Conditional Values
Set values depending on one or more test conditions that either pass or fail.
Parameter Condition Definition Dialog
Content
Expressions and strings can include a number of functions, characters, parameters, and more.
When setting values, whether entered directly in a parameter or constructed using one of the editors, strings and expressions containing String, Math, Date/Time, or FME Feature Functions will have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.
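Because text of the form @<function_name> is evaluated rather than kept literally, it can help to check a value before entering it. The Python sketch below flags substrings that look like function calls, such as @Value(...). The regular expression is an assumption about what counts as a function reference (an @, an identifier, then an opening parenthesis); it does not reproduce FME's own parser, and the function name is hypothetical.

```python
import re

# Assumed shape of a function reference: "@", an identifier, then "(".
FUNCTION_REFERENCE = re.compile(r"@([A-Za-z_]\w*)\(")


def find_function_references(value):
    """Return names of @-functions that would likely be evaluated in `value`."""
    return FUNCTION_REFERENCE.findall(value)


literal = "Contact support@example.com"
expression = "Label: @Value(my_label)"
print(find_function_references(literal))     # []
print(find_function_references(expression))  # ['Value']
```

Requiring the opening parenthesis keeps ordinary text such as email addresses from being flagged, at the cost of missing any reference written without one.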
String Functions | These functions manipulate and format strings. |
Special Characters | A set of control characters is available in the Text Editor. |
Math Functions | Math functions are available in both editors. |
Date/Time Functions | Date and time functions are available in the Text Editor. |
Math Operators | These operators are available in the Arithmetic Editor. |
FME Feature Functions | These primarily return feature-specific values. |
FME and Workspace Parameters | FME and workspace-specific parameters may be used. |
Creating and Modifying User Parameters | Create your own editable parameters. |
Dialog Options - Tables
Transformers with table-style parameters have additional tools for populating and manipulating values.
Row Reordering | Enabled once you have clicked on a row item. Choices include: |
Cut, Copy, and Paste | Enabled once you have clicked on a row item. Cut, copy, and paste may be used within a transformer or between transformers. |
Filter | Start typing a string, and the table displays only the rows matching those characters. All columns are searched. This affects only the display of attributes within the transformer; it does not alter which attributes are output. |
Import | Populates the table with a set of new attributes read from a dataset. Specific application varies between transformers. |
Reset/Refresh | Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers. |
Note: Not all tools are available in all transformers.
Reference
Processing Behavior | |
Feature Holding | Yes |
Dependencies | |
Aliases | |
History | Released FME 2019.0 |
FME Community
The FME Community is the place for demos, how-tos, articles, FAQs, and more. Get answers to your questions, learn from other users, and suggest, vote, and comment on new features.
Search for all results about the NLPTrainer on the FME Community.
Examples may contain information licensed under the Open Government Licence – Vancouver and/or the Open Government Licence – Canada.