GoogleVisionConnector
Accesses the Google Vision AI API for image recognition.
Typical Uses
Submit an image to the Google Vision API to:
- detect labels and objects
- detect faces and emotions
- detect text
How does it work?
The GoogleVisionConnector uses your Google Cloud account credentials to access the Google Vision client services.
It will submit an image to the service, and return features with attributes about that image. Each input image may result in several output features.
All confidence scores are returned between 0 and 1.
For more information, see Google’s documentation:
https://cloud.google.com/vision/
Configuration
Input Ports
This transformer accepts any feature. Raster geometries may be used as input if Raster Geometry is selected as the image source.
Output Ports
Output will depend on the analysis chosen. Each input feature may result in multiple output features. For example, a single image is likely to contain multiple objects.
Object Detection
Detects and extracts information about objects in an image, across a broad group of categories. Labels can identify general objects, locations, activities, animal species, products, and more. Each detected object is returned with a bounding box geometry. For raster and local file inputs, bounding boxes are in pixel units and align with the input image. For URL inputs, bounding boxes are returned as normalized values between 0 and 1.
Attributes
_label | Labels that describe detected entities in the image. |
_confidence | The confidence score, which ranges from 0 (no confidence) to 1 (very high confidence). |
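Because bounding boxes for URL inputs are normalized to the range 0 to 1, they generally need to be scaled by the image dimensions before they can be overlaid on the image. The following is a minimal Python sketch of that scaling and is not part of the transformer. The function name to_pixel_box, the (x, y) tuple layout of the vertices, and the example image size are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_pixel_box(normalized: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale normalized (0..1) vertices to pixel coordinates for an image of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    pixels = []
    for x, y in normalized:
        # Clamp to [0, 1] in case a vertex falls slightly outside the image.
        x = min(max(x, 0.0), 1.0)
        y = min(max(y, 0.0), 1.0)
        pixels.append((round(x * width), round(y * height)))
    return pixels


# Hypothetical box covering the centre of a 640 x 480 image.
box = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
print(to_pixel_box(box, 640, 480))
# → [(160, 120), (480, 120), (480, 360), (160, 360)]
```

The image width and height are not described as output attributes here, so they would have to come from elsewhere in the workspace.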
Face Detection
Each successfully identified face results in an output feature with attributes describing the face and a bounding box geometry for the face. Optionally, facial "landmarks," such as LEFT_EYE, NOSE_TIP, or LEFT_EYE_PUPIL, are added to the feature as point geometries.
When using any input image source, the bounding box is in pixel units, and will align with the input.
Likelihood attributes have possible values of:
UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
Attributes
_confidence | Overall confidence score of the feature, which ranges from 0 (no confidence) to 1 (very high confidence). |
_landmark_confidence | Face landmarking confidence score, which ranges from 0 (no confidence) to 1 (very high confidence). |
_joy_likelihood | Joy likelihood. |
_sorrow_likelihood | Sorrow likelihood. |
_anger_likelihood | Anger likelihood. |
_surprise_likelihood | Surprise likelihood. |
_under_exposed_likelihood | Under-exposed likelihood. |
_blurred_likelihood | Blurred likelihood. |
_headwear_likelihood | Headwear likelihood. |
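Likelihood attributes are strings rather than numbers, so comparing them downstream requires an explicit ordering. The sketch below assumes the values are ordered as listed above, with UNKNOWN treated as the lowest rank; that treatment of UNKNOWN is an illustrative choice, not documented behavior. The helper at_least and the sample face dictionaries are hypothetical.

```python
# Likelihood values in the order listed in this documentation.
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: i for i, name in enumerate(LIKELIHOOD_ORDER)}


def at_least(value: str, threshold: str) -> bool:
    """Return True if `value` ranks at or above `threshold` in LIKELIHOOD_ORDER."""
    return RANK[value] >= RANK[threshold]


# Hypothetical face features, represented here as plain dictionaries.
faces = [
    {"_confidence": 0.98, "_joy_likelihood": "VERY_LIKELY", "_blurred_likelihood": "VERY_UNLIKELY"},
    {"_confidence": 0.71, "_joy_likelihood": "UNLIKELY", "_blurred_likelihood": "POSSIBLE"},
    {"_confidence": 0.55, "_joy_likelihood": "UNKNOWN", "_blurred_likelihood": "LIKELY"},
]

# Faces that are likely joyful, and faces that are probably not blurred.
joyful = [f for f in faces if at_least(f["_joy_likelihood"], "LIKELY")]
sharp = [f for f in faces if not at_least(f["_blurred_likelihood"], "POSSIBLE")]
print(len(joyful), len(sharp))
# → 1 1
```

Note that with this ranking, an UNKNOWN blur likelihood would count as "not blurred"; a stricter filter could exclude UNKNOWN values explicitly.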
Text Detection
Detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. Results include the entire extracted string, as well as individual words, and their bounding boxes.
When using any input image source, the bounding box is in pixel units, and will align with the input.
Attributes
_text | The text detected in the image. |
_type | Type of detected text. The type is one of PAGE, BLOCK, PARAGRAPH, WORD, or SYMBOL. The hierarchy of text structures contained in text detection is: PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL. |
_id | The ID of the detected text, determined by the order in which text is detected. |
_confidence | The confidence of the OCR results of the text structure type. This will be a value between 0 and 1. |
_break_type | The type of break found. The possible break types are UNKNOWN, SPACE, SURE_SPACE, EOL_SURE_SPACE, HYPHEN, and LINE_BREAK. See the following Break Types table for more information about these types. |
_parent_id | The ID of the parent text structure that contains the detected text. This value is null when the text has no parent. |
Break Types
UNKNOWN | Unknown break label type. |
SPACE | Regular space. |
SURE_SPACE | Sure space (very wide). |
EOL_SURE_SPACE | Line-wrapping break. |
HYPHEN | End-line hyphen that is not present in text; does not co-occur with SPACE or LINE_BREAK. |
LINE_BREAK | Line break that ends a paragraph. |
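The _id and _parent_id attributes make it possible to rebuild the PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL hierarchy from the flat set of output features. The following Python sketch shows one way to do that. It assumes that _id values are unique across all structure types and that a top-level structure has a null (None) _parent_id; the sample features and the helper names build_children and outline are hypothetical.

```python
from collections import defaultdict

# Hypothetical text features, represented as plain dictionaries.
features = [
    {"_id": 0, "_parent_id": None, "_type": "PAGE", "_text": "STOP AHEAD"},
    {"_id": 1, "_parent_id": 0, "_type": "BLOCK", "_text": "STOP AHEAD"},
    {"_id": 2, "_parent_id": 1, "_type": "PARAGRAPH", "_text": "STOP AHEAD"},
    {"_id": 3, "_parent_id": 2, "_type": "WORD", "_text": "STOP"},
    {"_id": 4, "_parent_id": 2, "_type": "WORD", "_text": "AHEAD"},
]


def build_children(features):
    """Map each parent id to the list of its child features, ordered by _id."""
    children = defaultdict(list)
    for feature in sorted(features, key=lambda f: f["_id"]):
        children[feature["_parent_id"]].append(feature)
    return children


def outline(children, parent_id=None, depth=0):
    """Return indented lines describing the hierarchy below parent_id."""
    lines = []
    for feature in children.get(parent_id, []):
        lines.append("  " * depth + f'{feature["_type"]}: {feature["_text"]}')
        lines.extend(outline(children, feature["_id"], depth + 1))
    return lines


print("\n".join(outline(build_children(features))))
```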
Document Text Detection
Detects and extracts text from an image, but is optimized for dense text and documents. For example, an image of a handwritten document may contain blocks, paragraphs, words and symbols. Results include the entire extracted strings for blocks and paragraphs, as well as individual words and symbols.
Attributes
_text | The text detected in the image. |
_type | Type of detected text. The type is one of PAGE, BLOCK, PARAGRAPH, WORD, or SYMBOL. The hierarchy of text structures contained in text detection is: PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL. |
_id | The ID of the detected text, determined by the order in which text is detected. |
_confidence | The confidence of the OCR results of the text structure type. This will be a value between 0 and 1. |
_break_type | The type of break found. The possible break types are UNKNOWN, SPACE, SURE_SPACE, EOL_SURE_SPACE, HYPHEN, and LINE_BREAK. See the following Break Types table for more information about these types. |
_parent_id | The ID of the parent text structure that contains the detected text. This value is null when the text has no parent. |
Break Types
UNKNOWN | Unknown break label type. |
SPACE | Regular space. |
SURE_SPACE | Sure space (very wide). |
EOL_SURE_SPACE | Line-wrapping break. |
HYPHEN | End-line hyphen that is not present in text; does not co-occur with SPACE or LINE_BREAK. |
LINE_BREAK | Line break that ends a paragraph. |
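The break types can be used to reassemble running text from individual WORD features. The sketch below is one possible interpretation, not the transformer's own logic: it assumes that _break_type describes the break that follows a word, that words are read in _id order, and that a HYPHEN break means the word continues on the next line. The separator chosen for each break type and the sample words are illustrative assumptions.

```python
# Separator appended after a word for each break type (an illustrative choice).
SEPARATORS = {
    "UNKNOWN": "",
    "SPACE": " ",
    "SURE_SPACE": " ",
    "EOL_SURE_SPACE": "\n",
    "HYPHEN": "",  # the hyphen is not present in the text, so the word continues
    "LINE_BREAK": "\n",
}


def join_words(words):
    """Concatenate WORD features in _id order, using _break_type to pick separators."""
    parts = []
    for word in sorted(words, key=lambda w: w["_id"]):
        parts.append(word["_text"])
        # A missing or null break type is treated as UNKNOWN (no separator).
        parts.append(SEPARATORS.get(word.get("_break_type") or "UNKNOWN", ""))
    return "".join(parts).rstrip()


# Hypothetical WORD features from a two-line document.
words = [
    {"_id": 3, "_text": "Invoice", "_break_type": "SPACE"},
    {"_id": 4, "_text": "number", "_break_type": "EOL_SURE_SPACE"},
    {"_id": 5, "_text": "docu", "_break_type": "HYPHEN"},
    {"_id": 6, "_text": "ment", "_break_type": "LINE_BREAK"},
]
print(join_words(words))
```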
Summary
Output through the Summary port will depend on the analysis chosen.
Object Detection
Detects and extracts information about entities in an image. The service may return multiple label guesses for an individual request. For more information, see https://cloud.google.com/vision/docs/labels.
Attributes
_labels{}.name | A list of labels that describe entities in the image. |
_labels{}.confidence | A list of confidence scores, one per label, each ranging from 0 (no confidence) to 1 (very high confidence). |
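Because the labels arrive as a list attribute, the name and confidence at the same list index belong together. The following Python sketch pairs them up and keeps only labels above a confidence threshold. It assumes the list elements are exposed as indexed attribute names such as _labels{0}.name, and it uses a plain dictionary in place of a feature; the helper collect_labels, the threshold, and the sample values are hypothetical.

```python
import re

# Matches indexed list element names such as "_labels{0}.name".
LABEL_KEY = re.compile(r"^_labels\{(\d+)\}\.(name|confidence)$")


def collect_labels(attributes, min_confidence=0.0):
    """Pair _labels{i}.name with _labels{i}.confidence, keeping confident labels."""
    by_index = {}
    for key, value in attributes.items():
        match = LABEL_KEY.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            by_index.setdefault(index, {})[field] = value
    labels = []
    for index in sorted(by_index):
        entry = by_index[index]
        confidence = float(entry.get("confidence", 0.0))
        if "name" in entry and confidence >= min_confidence:
            labels.append((entry["name"], confidence))
    return labels


# Hypothetical attributes of a summary feature.
attributes = {
    "_labels{0}.name": "Dog", "_labels{0}.confidence": 0.97,
    "_labels{1}.name": "Grass", "_labels{1}.confidence": 0.81,
    "_labels{2}.name": "Frisbee", "_labels{2}.confidence": 0.42,
    "source_file": "park.jpg",
}
print(collect_labels(attributes, min_confidence=0.5))
# → [('Dog', 0.97), ('Grass', 0.81)]
```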
Face Detection
A summary feature with the original geometry and attributes preserved will always be output through the Summary port. Attributes will be added to indicate the number of faces detected.
Attributes
_detected_faces | The number of faces that were detected in the image |
Text Detection
A summary feature with the original geometry and attributes preserved will always be output through the Summary port. Attributes will be added to indicate the number of text structures found.
Attributes
_detected_pages | The number of pages that were detected in the image |
_detected_blocks | The number of blocks that were detected in the image |
_detected_paragraphs | The number of paragraphs that were detected in the image |
_detected_words | The number of words that were detected in the image |
_detected_symbols | The number of symbols that were detected in the image |
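As a cross-check, the summary counts can be compared against the detected text features themselves. The sketch below tallies features by _type and maps each type to the corresponding summary attribute name. It assumes that each count equals the number of output features of that type, which may not hold if some text structures are excluded from output; the helper summarize and the sample features are hypothetical.

```python
from collections import Counter

# Summary attribute name for each text structure type.
SUMMARY_ATTRIBUTES = {
    "PAGE": "_detected_pages",
    "BLOCK": "_detected_blocks",
    "PARAGRAPH": "_detected_paragraphs",
    "WORD": "_detected_words",
    "SYMBOL": "_detected_symbols",
}


def summarize(features):
    """Count features per _type and return them keyed by summary attribute name."""
    counts = Counter(feature["_type"] for feature in features)
    return {attr: counts.get(kind, 0) for kind, attr in SUMMARY_ATTRIBUTES.items()}


# Hypothetical detected text features (only _type matters here).
detected = [{"_type": t} for t in ["PAGE", "BLOCK", "PARAGRAPH", "WORD", "WORD"]]
print(summarize(detected))
```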
Document Text Detection
A summary feature with the original geometry and attributes preserved will always be output through the Summary port. Attributes will be added to indicate the number of text structures found.
Attributes
_detected_pages | The number of pages that were detected in the image |
_detected_blocks | The number of blocks that were detected in the image |
_detected_paragraphs | The number of paragraphs that were detected in the image |
_detected_words | The number of words that were detected in the image |
_detected_symbols | The number of symbols that were detected in the image |
The incoming feature is output through this port.
Features that cause the operation to fail are output through this port. An fme_rejection_code attribute with the value ERROR_DURING_PROCESSING is added, along with an fme_rejection_message attribute that describes the reason for the failure in more detail.
Note: If a feature comes in to the GoogleVisionConnector already having a value for fme_rejection_code, this value will be removed.
Rejected Feature Handling: can be set to either terminate the translation or continue running when it encounters a rejected feature. This setting is available both as a default FME option and as a workspace parameter.
Parameters
Credential Source | The GoogleVisionConnector can use credentials from different sources. Using a Service Account File integrates best with FME, but in some cases, you may wish to use a web connection. |
Account | Available when the credential source is Web Connection. To create a Google Cloud AI connection, click the 'Account' drop-down box and select 'Add Web Connection...'. The connection can then be managed via Tools -> FME Options... -> Web Connections. |
Image Source |
The source where the images can come from. Choices are:
|
Detection Type |
The type of operation to perform. Choices are:
|
The remaining parameters available depend on the value of the Request > Detection Type parameter. Parameters for each Detection Type are detailed below.
Included Text Detection Features
Pages | Whether page text structures should be detected. |
Blocks | Whether block text structures should be detected. |
Paragraphs | Whether paragraph text structures should be detected. |
Words | Whether word text structures should be detected. |
Symbols | Whether symbol text structures should be detected. |
Face Detection Options
Face detection does not require any additional parameters.
Object Detection Options
Object detection does not require any additional parameters.
Included Text Detection Features
Pages | Whether page text structures should be detected. |
Blocks | Whether block text structures should be detected. |
Paragraphs | Whether paragraph text structures should be detected. |
Words | Whether word text structures should be detected. |
Symbols | Whether symbol text structures should be detected. |
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Defining Values
There are several ways to define a value for use in a Transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters. A number of tools and shortcuts can assist in constructing values, generally available from the drop-down context menu adjacent to the value field.
Using the Text Editor
The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.
Using the Arithmetic Editor
The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.
Conditional Values
Set values depending on one or more test conditions that either pass or fail.
Parameter Condition Definition Dialog
Content
Expressions and strings can include a number of functions, characters, parameters, and more.
When setting values - whether entered directly in a parameter or constructed using one of the editors - strings and expressions containing String, Math, Date/Time or FME Feature Functions will have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.
String Functions | These functions manipulate and format strings. |
Special Characters | A set of control characters is available in the Text Editor. |
Math Functions | Math functions are available in both editors. |
Date/Time Functions | Date and time functions are available in the Text Editor. |
Math Operators | These operators are available in the Arithmetic Editor. |
FME Feature Functions | These return primarily feature-specific values. |
FME and Workspace Parameters | FME and workspace-specific parameters may be used. |
Creating and Modifying User Parameters | Create your own editable parameters. |
Dialog Options - Tables
Transformers with table-style parameters have additional tools for populating and manipulating values.
Row Reordering
|
Enabled once you have clicked on a row item. Choices include:
|
Cut, Copy, and Paste
|
Enabled once you have clicked on a row item. Choices include:
Cut, copy, and paste may be used within a transformer, or between transformers. |
Filter
|
Start typing a string, and the matrix will only display rows matching those characters. Searches all columns. This only affects the display of attributes within the transformer - it does not alter which attributes are output. |
Import
|
Import populates the table with a set of new attributes read from a dataset. Specific application varies between transformers. |
Reset/Refresh
|
Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers. |
Note: Not all tools are available in all transformers.
Reference
Processing Behavior | |
Feature Holding | No |
Dependencies | Google Cloud Account with access to the Cloud Vision API |
FME Licensing Level | FME Base Edition and above |
Aliases | |
History | Released FME 2019.2 |
Examples may contain information licensed under the Open Government Licence – Vancouver