Microsoft Excel Reader (XLSXR) and Writer (XLSXW)
FME provides read and write access to Microsoft Excel spreadsheets (also known as workbooks). Excel spreadsheets may contain multiple worksheets (often called sheets) and each worksheet may optionally contain specifically named cell ranges called Named Ranges.
Format Abbreviations
- Reader: XLSXR
- Writer: XLSXW
Microsoft Excel Product and System Requirements
Format |
Product |
Operating System |
||||
---|---|---|---|---|---|---|
Reader/Writer |
FME Desktop License |
FME Server |
FME Cloud |
Windows |
Linux |
Mac |
Reader |
Available in FME Professional Edition and higher |
Yes |
Yes |
32-bit: Yes 64-bit: Yes |
Yes |
Yes |
Writer |
Available in FME Professional Edition and higher |
Yes |
Yes |
32-bit: Yes 64-bit: Yes |
Yes |
Yes |
- More about FME Licenses and Subscriptions.
- More about FME Desktop Editions and Licenses.
Versions
This format supports the following Excel file types:
- Excel 97-2003 binary formats (.xls)
- Excel 2007+ xml formats (.xlsx/.xlsm). .xlsm files are .xlsx files with macros enabled, and use the same file format.
- Excel 2007+ binary formats (.xlsb). Reading .xlsb is only available on Windows systems with Excel 2007 or greater installed. Writing .xlsb is not supported. Note: Due to technical limitations in Excel, reading .xlsb is not possible on FME Server.
The Excel reader/writer does not support Macro-enabled Templates (.xltm) or Excel Add-in (.xlam).
This reader/writer can only process one Excel file type per reader or writer.
Differences Between XLS and XLSX
Starting with Office 2007, Microsoft changed the default Excel file format from .xls to .xlsx.
While .xls is a proprietary binary format, the newer .xlsx version is an XML-based spreadsheet, defined in the Office Open XML (OOXML) specification.
How FME Works with Excel Data
To get the most value from an Excel spreadsheet, you also have to organize, analyze, and filter its data. Large datasets can easily exceed the limits of the Excel format: FME can read and analyze large amounts of data, filter it, organize it into worksheets, and then output the results into a manageable spreadsheet. FME can also integrate spreadsheets with data from other formats (for example, demographic databases, GIS, CAD drawings, web services, and orthophotos).
You can use FME to perform simple operations, like merging or filtering spreadsheet data, but you can also use FME to perform more complex tasks like spatial operations. For example, if your data contains addresses, coordinate pairs, street names or asset codes, FME recognizes common names for geometry columns (like latitude/longitude), and then automatically converts the rows to geometry. By integrating geocoding services like ArGIS Online or Google, you can map Excel data and then view it in the FME Data Inspector.
For more information on using FME to automate Excel tasks, see FME and Excel.
Terminology in this Chapter
Term | Definition or FME Representation |
---|---|
spreadsheet and workbook | reader dataset |
worksheet or named range | feature type |
column in a worksheet or named range | attribute |
row | feature |
Usage Notes
The reader will match attribute names to column values on the header row, to allow for reordering of the schema in FME or reordering of the columns in the source files. However, renaming a column in the file (but not changing the workspace attributes) will cause that column to be ignored.
Formatting or style information consists of font, text color, background colors, numeric formatting, fill patterns, border styles, etc. and can be set in the Excel application at either the column, row, or individual cell level.
Formatting is supported by both the Excel reader and writer. Row and cell formatting is read and written, except for formatting from the header row, and will be preserved from the reader to the writer. Additionally, the Excel writer can write column formatting.
Note that both row and cell formatting in Excel files is read by the Excel reader as cell formatting.
- Per-cell formatting is stored per feature as a separate attribute named <attributename>.formatting and setting the attribute value with the same name-value pairs used in per-feature formatting.
- Per-feature (row) formatting is stored in the xlsx_row_formatting attribute per feature. The attribute value is a list of name-value pairs, separated by semicolons. For example: version;1;horizontal_alignment;right;row_height;37;background_color;0,0,1;font;Times New Roman,BOLD,UNDERLINE.
- Note that both per-cell and per-feature formatting can be set using the ExcelStyler transformer.
- Per-column formatting can be set by clicking the Edit button under the Writer User Attributes tab (Formatting column), and setting the desired options.
Specifying formatting explicitly per column, row, or cell will be merged with formatting in a template sheet or existing sheet; otherwise, formatting within existing sheets or templates will be preserved if new formatting is not specified. Merging of formatting for a column, row, or cell is specific to each format setting. For example, setting Background Color for a cell does not override or invalidate existing formatting for Text Alignment or Custom Number formatting.
Note that the column type specified may also affect formatting. Specifically Date, Time, and DateTime column types will enforce some custom number formatting to structure the date, and .hyperlink fields will set the color and underline values in the font format.
This format supports formulas.
The reader can optionally read in formulas used to calculate the values of a cell and store the formula in <attributename>.formula.
For the writer, a formula for calculating the value of an attribute can be set in a separate attribute named <attributename>.formula. The formula appears in the worksheet, in the cell in which the attribute is written.
Note that in order for the formula to be applied to the cell, the writer must contain both attributename (which can be empty) and attributename.formula. For example:
An Excel file contains three columns, col1, col2, and col3. To fill col3 with a formula, you can create an attribute col3 (which can be empty) and an attribute col3.formula, which is filled with the formula. In the resulting Excel file, col3 will contain the correct formula.
Example values are:
=SUM(C2:C18)
SUM(C2:C18)
AVERAGE($A$1:$A$A5)
Note that these are not FME expressions (so you would not use the Arithmetic Editor) but you can use concatenation and FME attributes to build the formula. For example:
SUM(E@Value(_rowID):P@Value(_rowID))
would return the formula
SUM(E3:P3) when _rowID = 3
This format supports embedded hyperlinks. The reader can optionally read in embedded hyperlinks in a worksheet and store them in <attributename>.hyperlink.
For the writer, a hyperlink can be set for an attribute by setting a separate attribute named <attributename>.hyperlink:
In the output Excel spreadsheet, clicking the cell in which the attribute is written will open the hyperlinked URL. The value of the cell does not need to be same as the hyperlink. Note that if the attribute name is uppercase (for example, attribute.HYPERLINK), this functionality will not work.
Example values are:
https://www.safe.com
file://c:/documents/readme.txt
FME identifies named ranges and tables using the format <sheetName>/<namedRangeName>, followed by the names and types of the attributes.
FME currently does not support creating new Pivot tables, charts, or graphs, but can add data to them if they exist in template files or worksheets.
The writer can set protection levels for worksheets only. Encryption of the spreadsheet is not yet possible. The reader can read Excel files even if they have sheet (password) protection, but cannot read encrypted spreadsheets.
The Microsoft Excel writer allows the specification of a template file to use as a starting point for creating a new dataset. If a template file name is specified, the writer will make a copy of the template file in the location specified in the Dataset parameter. All write operations will be performed on the copy. The template file specified remains unchanged. If a template sheet is selected, in addition to the template file being copied, all new sheets being written will be written on a copy of the selected template sheet.
This is particularly useful when populating an area of the template with data such as an existing chart, graph, or pivot table or when creating many spreadsheets from a single template. If the destination dataset already exists, and a template file is specified, the destination dataset will always be overwritten, regardless of the specified choice in the Overwrite File: option. Likewise, the writer feature type option to Drop Existing Sheet/Named Range is irrelevant if a template file or template sheet is specified, as the existing sheet will always be replaced by the template sheet. However, the writer feature type option to Truncate Existing Sheet/Named Range is applied to template worksheets after being copied to the destination file.
FME automatically recognizes some common attribute names as potential x,y,z coordinates and sets their types. See Reader Parameters for additional information.
This data may not necessarily have a spatial component, but columns can be identified as x, y, or z coordinates to create point geometries. If a schema scan is performed and field labels contain variations of x/y, east/north, or easting/northing, FME will create the point geometry.
FME also detects latitude and longitude column names (for example, Latitude/Longitude or Lat/Long), and sets the source coordinate system to LL-WGS84.
About the Reader (XLSXR)
The XLSX reader supports reading multiple worksheets and named ranges from the same file. Because of this, multiple files are not supported, so you will need to create a separate reader for each Microsoft Excel spreadsheet.
About the Writer (XLSXW)
The Microsoft Excel writer writes attribute records into a spreadsheet. The writer provides the following capabilities:
- Table Creation
- Images
- Multiple Worksheets and/or Named Ranges
- Writer Mode Specification: The writer allows the user to specify a writer mode, which determines the operation to perform for each feature received.