GoogleVisionConnector

Connects to the Google Cloud Vision API to detect labels, objects, faces, and text in images.

Typical Uses

Detecting labels, objects, faces, and text

How does it work?

The GoogleVisionConnector uses your Google Cloud account credentials to connect to Google Vision and submit images for analysis.

Images may be provided as local files, Google Cloud Storage objects, URLs, or raster geometry, and each image may produce multiple output features.

Services supported are:

Detection Type	Analysis
Document Text Detection	Detects and extracts text from an image. It is optimized for dense text and documents, such as an image of a handwritten document with blocks, paragraphs, words, and symbols. Results include the entire extracted strings for blocks and paragraphs, as well as individual words and symbols.
Face Detection	Each successfully identified face results in an output feature with attributes describing the face. Each feature has a bounding box for the face. Facial landmarks such as LEFT_EYE, NOSE_TIP, or LEFT_EYE_PUPIL are added as additional point geometries on the feature. For all image sources, the bounding box is in pixel units and aligns with the input.
Object Detection	Detects and extracts information about objects in an image, across a broad group of categories. Labels can identify general objects, locations, activities, animal species, products, and more. Detected objects are returned with a bounding box geometry. Bounding boxes are in pixel units and align with raster and local file inputs. Bounding boxes for URL inputs are returned as normalized values between 0 and 1.
Text Detection	Detects and extracts text from any image. Results include the entire extracted string, as well as individual words and their bounding boxes. For all image sources, the bounding box is in pixel units and aligns with the input.
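
Because URL inputs return Object Detection bounding boxes as normalized values rather than pixels, you may need to scale them yourself. The Python sketch below shows that scaling. It assumes you already know the image's pixel width and height, and that the normalized y axis runs downward from the top edge, as in the Vision API's normalized vertices. The function name is illustrative and is not part of the connector.

```python
from typing import List, Tuple


def normalized_to_pixels(
    vertices: List[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (0 to 1) bounding-box vertices to pixel coordinates.

    Assumes x is measured from the left edge and y from the top edge of the
    image, as in the Vision API's normalized vertices.
    """
    pixels = []
    for x, y in vertices:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"vertex ({x}, {y}) is not normalized")
        # round() with a single argument returns an int.
        pixels.append((round(x * width), round(y * height)))
    return pixels


# Example: a box covering the central quarter of a 640 x 480 image.
box = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
print(normalized_to_pixels(box, 640, 480))
# [(160, 120), (480, 120), (480, 360), (160, 360)]
```

If your raster's y axis points up rather than down, you would additionally flip each y value (height - y).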

Optional Input Port

This transformer has two modes, depending on whether a connector is attached to the Input port or not:

Input-driven: When input features are connected, the transformer runs once for each feature it receives in the Input port.
Run Once: When no input features are connected, the transformer runs one time.

When the Input port is in use, the Initiator output port is also enabled.

Configuration

Input Ports

Output Ports

Output

Features with added attributes, as specified in parameters and according to Detection Type.

Detection Type	Output - Input-Driven	Output - Run Once
Document Text Detection	Input features, one copy for each text element, with details about the text.	New features, one copy for each text element, with details about the text.
Face Detection	Input feature(s), one copy for each face identified, with details about the face.	New feature(s), one for each face identified, with details about the face.
Object Detection	Input feature(s), one copy for each object identified, with details about the object.	New feature(s), one for each object identified, with details about the object.
Text Detection	Input features, one copy for each text element, with details about the text.	New features, one copy for each text element, with details about the text.

Summary

One feature per image is output here, with added attributes summarizing the detection results.

Detection Type

Summary Feature Attributes

Document Text Detection

_detected_pages	The number of pages that were detected in the image
_detected_blocks	The number of blocks that were detected in the image
_detected_paragraphs	The number of paragraphs that were detected in the image
_detected_words	The number of words that were detected in the image
_detected_symbols	The number of symbols that were detected in the image

Face Detection

_detected_faces	The number of faces that were detected in the image

Object Detection

_labels{}.name	Labels that describe entities in the image, as a list attribute. The service may return multiple label guesses for an individual request. See Google Cloud Vision - Detect Labels.
_labels{}.confidence	Label confidence scores, as a list attribute. Scores range from 0 (no confidence) to 1 (very high confidence).

Text Detection

_detected_pages	The number of pages that were detected in the image
_detected_blocks	The number of blocks that were detected in the image
_detected_paragraphs	The number of paragraphs that were detected in the image
_detected_words	The number of words that were detected in the image
_detected_symbols	The number of symbols that were detected in the image
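
For Object Detection, the Summary feature's _labels{}.name and _labels{}.confidence lists are parallel: entry i of one belongs with entry i of the other. The Python sketch below pairs them and keeps labels at or above a confidence threshold. It assumes the Summary feature's attributes are available as a dictionary whose keys use FME's flattened list form, such as _labels{0}.name; the function name and the 0.8 threshold are illustrative.

```python
import re
from typing import Dict, List, Tuple

# Matches flattened list attribute names such as "_labels{0}.name".
LIST_KEY = re.compile(r"^_labels\{(\d+)\}\.(name|confidence)$")


def labels_above(attrs: Dict[str, object], threshold: float) -> List[Tuple[str, float]]:
    """Pair _labels{i}.name with _labels{i}.confidence and keep confident labels."""
    entries: Dict[int, Dict[str, object]] = {}
    for key, value in attrs.items():
        match = LIST_KEY.match(key)
        if match:
            entries.setdefault(int(match.group(1)), {})[match.group(2)] = value
    result = []
    for index in sorted(entries):  # preserve list order by numeric index
        entry = entries[index]
        confidence = float(entry.get("confidence", 0.0))
        if "name" in entry and confidence >= threshold:
            result.append((str(entry["name"]), confidence))
    return result


summary = {
    "_labels{0}.name": "Dog", "_labels{0}.confidence": 0.97,
    "_labels{1}.name": "Grass", "_labels{1}.confidence": 0.64,
}
print(labels_above(summary, 0.8))  # [('Dog', 0.97)]
```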

Parameters

Authentication

Credential Source

Select the type of credentials to use:

Web Connection (Recommended): Use a Google AI OAuth (recommended) or Google AI JSON Key web connection.
Application Default Credentials: Authenticate based on the system environment. This will attempt to resolve credentials using the following sources, in order:
- The credential file pointed to by the environment variable GOOGLE_APPLICATION_CREDENTIALS
- The credential file created by using the Google Cloud CLI
- The metadata server on a Google Compute Engine virtual machine instance.
  See Google Cloud SDK > Authentication > How Application Default Credentials works.
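
To check which of the sources above would supply credentials on a given machine, the lookup order can be sketched in Python as follows. This is a simplified illustration, not Google's client library: it only checks for the files named above (assuming the Google Cloud CLI's default locations, ~/.config/gcloud on Linux and macOS and %APPDATA%\gcloud on Windows), ignores overrides such as CLOUDSDK_CONFIG, and neither validates credentials nor contacts the metadata server.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_adc_source(env: Mapping[str, str], home: Path, is_windows: bool = False) -> str:
    """Report which Application Default Credentials source would be used.

    Follows the order listed above; it does not read or validate credentials.
    """
    # 1. A credential file named by the GOOGLE_APPLICATION_CREDENTIALS variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"env:{explicit}"
    # 2. The file written by `gcloud auth application-default login`,
    #    assuming the CLI's default configuration directory.
    if is_windows:
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = home / ".config" / "gcloud"
    well_known = config_dir / "application_default_credentials.json"
    if well_known.is_file():
        return f"gcloud:{well_known}"
    # 3. Otherwise, fall back to the Compute Engine metadata server.
    return "metadata-server"


print(resolve_adc_source(os.environ, Path.home(), is_windows=os.name == "nt"))
```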

Account

When Credential Source is Web Connection, select or create a Web Connection connecting to a Google AI OAuth or Google AI JSON Key Web Service.

See Using Web Connections.

Request

Image Source	Select the source of the image: Local File (a JPEG or PNG file on disk), URL (an image located at a URL), Google Storage Object (an image in Google Cloud Storage), or Raster Geometry (raster geometry on a feature).
Input Filename	When Image Source is Local File, provide the path and filename.
URL	When Image Source is URL, provide the URL.
Bucket	When Image Source is Google Storage Object, specify the bucket that contains the image.
Object Path	When Image Source is Google Storage Object, provide the full path of the image.
Detection Type	Select the type of detection to perform: Document Text Detection, Face Detection, Object Detection, or Text Detection.
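
For orientation, the parameters above roughly correspond to the fields of a Cloud Vision images:annotate REST request. The Python sketch below builds such a request body using only the standard library. The mapping from Detection Type choices to Vision feature types is an assumption (in particular, Object Detection is assumed here to combine OBJECT_LOCALIZATION and LABEL_DETECTION), the connector's actual requests may differ, and Raster Geometry input is assumed to be encoded to PNG or JPEG bytes before being sent inline. The bucket and object names in the example are hypothetical.

```python
import base64
import json
from typing import Dict, List, Optional

# Assumed mapping from Detection Type choices to Vision API feature types.
FEATURE_TYPES: Dict[str, List[str]] = {
    "Document Text Detection": ["DOCUMENT_TEXT_DETECTION"],
    "Face Detection": ["FACE_DETECTION"],
    "Object Detection": ["OBJECT_LOCALIZATION", "LABEL_DETECTION"],
    "Text Detection": ["TEXT_DETECTION"],
}


def build_request(detection_type: str, *, image_bytes: Optional[bytes] = None,
                  url: Optional[str] = None, bucket: Optional[str] = None,
                  object_path: Optional[str] = None) -> str:
    """Build a JSON body for POST https://vision.googleapis.com/v1/images:annotate."""
    image: Dict[str, object]
    if image_bytes is not None:
        # Local File (or encoded Raster Geometry): send the bytes inline as base64.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    elif url is not None:
        # URL source: let the service fetch the image.
        image = {"source": {"imageUri": url}}
    elif bucket and object_path:
        # Google Storage Object source: refer to the object by its gs:// URI.
        image = {"source": {"imageUri": f"gs://{bucket}/{object_path.lstrip('/')}"}}
    else:
        raise ValueError("no image source given")
    features = [{"type": t} for t in FEATURE_TYPES[detection_type]]
    return json.dumps({"requests": [{"image": image, "features": features}]})


print(build_request("Face Detection", bucket="my-bucket", object_path="photos/team.jpg"))
```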

Detection Type > Document Text Detection

Document Text Detection Options

Included Text Detection Features

Pages	Detect page text structures: Yes or No.
Blocks	Detect block text structures: Yes or No.
Paragraphs	Detect paragraph text structures: Yes or No.
Words	Detect word text structures: Yes or No.
Symbols	Detect symbol text structures: Yes or No.

Added Attributes

Output features will receive these attributes:

_text

The text detected in the image.

_type

The type of detected text: PAGE, BLOCK, PARAGRAPH, WORD, or SYMBOL. Text structures form the hierarchy PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL.

_id

The ID of the detected text, assigned in the order in which the text was detected.

_confidence

The OCR confidence for this text structure, a value between 0 and 1.

_break_type

The type of break found.

UNKNOWN	Unknown break label type.
SPACE	Regular space.
SURE_SPACE	Sure space (very wide).
EOL_SURE_SPACE	Line-wrapping break.
HYPHEN	End-line hyphen that is not present in text; does not co-occur with SPACE or LINE_BREAK.
LINE_BREAK	Line break that ends a paragraph.

_parent_id

The _id of the parent structure that contains the detected text. This value can be null if the text has no parent.
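
Because each feature carries its own _id and, where applicable, a _parent_id, the PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL hierarchy can be rebuilt downstream. The Python sketch below does this, assuming output features have been exported as dictionaries with _id, _parent_id, _type, and _text keys, and that a null _parent_id appears as None or is absent. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_tree(features: List[Dict[str, object]]) -> Dict[Optional[object], List[object]]:
    """Map each _parent_id (None for top-level text) to its children's _id values."""
    children: Dict[Optional[object], List[object]] = defaultdict(list)
    for feature in features:
        children[feature.get("_parent_id")].append(feature["_id"])
    return dict(children)


def outline(features: List[Dict[str, object]]) -> List[str]:
    """Return an indented 'TYPE: text' line per feature, in depth-first order."""
    by_id = {feature["_id"]: feature for feature in features}
    tree = build_tree(features)
    lines: List[str] = []

    def walk(parent: Optional[object], depth: int) -> None:
        for child_id in tree.get(parent, []):
            feature = by_id[child_id]
            lines.append("  " * depth + f"{feature['_type']}: {feature['_text']}")
            walk(child_id, depth + 1)

    walk(None, 0)
    return lines


page = [
    {"_id": 1, "_parent_id": None, "_type": "PAGE", "_text": "Hello world"},
    {"_id": 2, "_parent_id": 1, "_type": "BLOCK", "_text": "Hello world"},
    {"_id": 3, "_parent_id": 2, "_type": "PARAGRAPH", "_text": "Hello world"},
    {"_id": 4, "_parent_id": 3, "_type": "WORD", "_text": "Hello"},
    {"_id": 5, "_parent_id": 3, "_type": "WORD", "_text": "world"},
]
print("\n".join(outline(page)))
```

Features whose parent is not among the inputs (for example, because a structure type was excluded) are not reached by this sketch.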

Detection Type > Face Detection

Face Detection Options

Face detection has no parameters to configure.

Added Attributes

Output features will receive these attributes.

Likelihood attributes have possible values of UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.

_confidence	Overall confidence score of the feature, which ranges from 0 (no confidence) to 1 (very high confidence).
_landmark_confidence	Face landmarking confidence score, which ranges from 0 (no confidence) to 1 (very high confidence).
_joy_likelihood	Joy likelihood.
_sorrow_likelihood	Sorrow likelihood.
_anger_likelihood	Anger likelihood.
_surprise_likelihood	Surprise likelihood.
_under_exposed_likelihood	Under-exposed likelihood.
_blurred_likelihood	Blurred likelihood.
_headwear_likelihood	Headwear likelihood.
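
Likelihood attributes are ordered categories rather than numbers, so filtering on them needs an explicit ranking. The Python sketch below ranks the values in the order listed above and keeps faces whose _joy_likelihood is at least LIKELY. Treating UNKNOWN as the lowest rank, the 0.7 confidence cutoff, and the dictionary representation of features are all assumptions for illustration.

```python
from typing import Dict, List

# Likelihood values from lowest to highest; ranking UNKNOWN lowest is an assumption.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: rank for rank, name in enumerate(LIKELIHOODS)}


def at_least(value: str, minimum: str) -> bool:
    """Return True when a likelihood value is at or above the given minimum."""
    return RANK[value] >= RANK[minimum]


def smiling_faces(faces: List[Dict[str, object]], min_confidence: float = 0.7) -> List[Dict[str, object]]:
    """Keep faces detected with enough confidence whose joy is LIKELY or better."""
    return [
        face
        for face in faces
        if float(face["_confidence"]) >= min_confidence
        and at_least(str(face["_joy_likelihood"]), "LIKELY")
    ]


faces = [
    {"_confidence": 0.95, "_joy_likelihood": "VERY_LIKELY"},
    {"_confidence": 0.95, "_joy_likelihood": "POSSIBLE"},
    {"_confidence": 0.40, "_joy_likelihood": "LIKELY"},
]
print(len(smiling_faces(faces)))  # 1
```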

Detection Type > Text Detection

Text Detection Options

Included Text Detection Features

Pages	Detect page text structures: Yes or No.
Blocks	Detect block text structures: Yes or No.
Paragraphs	Detect paragraph text structures: Yes or No.
Words	Detect word text structures: Yes or No.
Symbols	Detect symbol text structures: Yes or No.

Added Attributes

Output features will receive these attributes:

_text

The text detected in the image.

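_type

The type of detected text: PAGE, BLOCK, PARAGRAPH, WORD, or SYMBOL. Text structures form the hierarchy PAGE -> BLOCK -> PARAGRAPH -> WORD -> SYMBOL.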

_id

The ID of the detected text, assigned in the order in which the text was detected.

_confidence

The OCR confidence for this text structure, a value between 0 and 1.

_break_type

The type of break found.

UNKNOWN	Unknown break label type.
SPACE	Regular space.
SURE_SPACE	Sure space (very wide).
EOL_SURE_SPACE	Line-wrapping break.
HYPHEN	End-line hyphen that is not present in text; does not co-occur with SPACE or LINE_BREAK.
LINE_BREAK	Line break that ends a paragraph.

_parent_id

The _id of the parent structure that contains the detected text. This value can be null if the text has no parent.
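
The _break_type values can be used to rebuild readable text from SYMBOL features. The Python sketch below assumes SYMBOL features sorted by _id and available as (_text, _break_type) pairs. How each break type is rendered (for example, writing EOL_SURE_SPACE as a newline and restoring the hidden hyphen for HYPHEN) is a choice made for this illustration, not behavior defined by the connector.

```python
from typing import Iterable, Optional, Tuple

# One possible rendering of each _break_type value (an illustrative choice).
BREAK_TEXT = {
    "SPACE": " ",
    "SURE_SPACE": " ",
    "EOL_SURE_SPACE": "\n",  # line wrap inside a paragraph
    "HYPHEN": "-\n",         # the hyphen is not present in _text, so add it back
    "LINE_BREAK": "\n",
}


def join_symbols(symbols: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Rebuild text from (_text, _break_type) pairs of SYMBOL features in _id order."""
    parts = []
    for text, break_type in symbols:
        parts.append(text)
        # UNKNOWN, null, or any unlisted value adds nothing after the symbol.
        parts.append(BREAK_TEXT.get(break_type or "", ""))
    return "".join(parts)


symbols = [("H", None), ("i", "SPACE"), ("a", None), ("l", None), ("l", "LINE_BREAK")]
print(repr(join_symbols(symbols)))  # 'Hi all\n'
```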

Editing Transformer Parameters

Transformer parameters can be set by directly entering values, using expressions, or referencing other elements in the workspace such as attribute values or user parameters. Various editors and context menus are available to assist. To see what is available, click beside the applicable parameter.

How to Set Parameter Values

Defining Values

There are several ways to define a value for use in a Transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters.

Using the Text Editor

The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.

Text Editor

Using the Arithmetic Editor

The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.

Arithmetic Editor

Conditional Values

Set values depending on one or more test conditions that either pass or fail.

Parameter Condition Definition Dialog

Content

Expressions and strings can include a number of functions, characters, parameters, and more.

When setting values, whether entered directly in a parameter or constructed using one of the editors, strings and expressions containing String, Math, Date/Time, or FME Feature Functions will have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.

Content Types

String Functions	These functions manipulate and format strings.
Special Characters	A set of control characters is available in the Text Editor.
Math Functions	Math functions are available in both editors.
Date/Time Functions	Date and time functions are available in the Text Editor.
Math Operators	These operators are available in the Arithmetic Editor.
FME Feature Functions	These return primarily feature-specific values.
FME Parameters	FME and workspace-specific parameters may be used.
Creating and Modifying User Parameters	Create your own editable parameters.

Dialog Options - Tables

Table Tools

Transformers with table-style parameters have additional tools for populating and manipulating values.

Row Reordering

Enabled once you have clicked on a row item. Choices include:

Add a row
Remove a row
Move current row up one
Move current row down one
Move current row to top
Move current row to bottom

Cut, Copy, and Paste

Enabled once you have clicked on a row item. Choices include:

Cut a row - delete and copy to clipboard
Copy a row to the clipboard
Paste a row from the clipboard

Cut, copy, and paste may be used within a transformer, or between transformers.

Filter

Start typing a string, and the matrix will only display rows matching those characters. Searches all columns. This only affects the display of attributes within the transformer; it does not alter which attributes are output.

Import

Import populates the table with a set of new attributes read from a dataset. Specific application varies between transformers.

Reset/Refresh

Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers.

Note: Not all tools are available in all transformers.

For more information, see Transformer Parameter Menu Options.

Reference

Processing Behavior	Feature-Based
Feature Holding	No
Dependencies	Google Cloud Account with access to the Cloud Vision API
Aliases
History	Released FME 2019.2

FME Online Resources

The FME Community and Support Center Knowledge Base have a wealth of information, including active forums with 35,000+ members and thousands of articles.

Search for all results about the GoogleVisionConnector on the FME Community.

Examples may contain information licensed under the Open Government Licence – Vancouver, Open Government Licence - British Columbia, and/or Open Government Licence – Canada.

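Detection Type > Object Detection

Added Attributes

Object Detection output features will receive these attributes:

_label	Labels that describe detected entities in the image.
_confidence	The confidence score, which ranges from 0 (no confidence) to 1 (very high confidence).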
