What file formats are acceptable for importing data into the Open Data Portal?

Several file formats are acceptable for importing data into data.iowa.gov.  The most common is CSV, which is recommended for datasets without complex geometric structures.  Data or attributes associated with lines or polygons must be formatted as a shapefile or a KML file, or imported through a Map Server Layer.

CSV file

A CSV file stores tabular data (both numbers and text) in plain-text form (see figure).  Many programs and applications support some variation of CSV for exporting, which makes it possible to move tabular data between programs with different and incompatible formats.

Figure: Example CSV file format.  One record per line with a hard return at end of line.  Text is wrapped in this figure.

Here are some basic characteristics of CSV files:

  • There is typically one record per line (hard return after the record)
  • Records are divided into fields that are separated by delimiters (e.g. commas, semicolons, tabs).  The individual fields within a record become the dataset’s columns when imported.
  • Each record contained in the file should have an identical list of fields.  That is, if a dataset consists of ten fields, every row in the dataset must contain ten fields.  In a CSV file, an empty field is represented by including its separating commas with nothing in between.
  • The first row in the file should serve as your column titles or headings.
  • Fields containing a line break, double quote, or comma should be quoted (i.e. wrapped in a text qualifier) so the file can be processed correctly[1].  Commas and quotation marks have special meaning in CSV files: commas separate field values, and quotation marks indicate where a text value begins and ends (particularly important when a text value itself contains an embedded comma).  Export utilities provided by database platforms typically handle this well, but if your agency is developing its own export program, it is important to know how to handle this situation.  To signal that a quotation mark is part of the text value and not the beginning or end of it, immediately precede the quotation mark with another quotation mark and surround the whole text value with quotation marks.  See the examples in the table below.

Text Value                      Export As
This is some "quoted" data      "This is some ""quoted"" data"
"This" is some quoted data      """This"" is some quoted data"
This is some quoted "data"      "This is some quoted ""data"""
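
For agencies writing their own export program, the quoting rules described above can be sketched with Python's standard csv module, which doubles embedded quotation marks and quotes only the fields that need it. This is a minimal illustration rather than the portal's own tooling; the column names and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sample rows: a header, a record with an embedded quotation
# mark and an embedded comma, and a record with an empty field.
rows = [
    ["name", "note", "amount"],
    ['This is some "quoted" data', "Des Moines, Iowa", "10"],
    ["No special characters", "", "20"],
]

# QUOTE_MINIMAL quotes only fields containing a comma, a quotation mark,
# or a line break; embedded quotation marks are doubled by default.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original values, including the
# empty field, which was written as two commas with nothing in between.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Writing to a real file works the same way if the file is opened with newline="" so that the csv module controls the line endings.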

Read common issues with converting Excel to CSV.
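
Before uploading an exported file, it can help to confirm that every record has the same number of fields as the header row. The following is a rough sketch of such a check; the function name find_ragged_records and the sample data (including the population figure) are made up for illustration.

```python
import csv
import io

def find_ragged_records(csv_text):
    """Return (record_number, field_count) for each record whose field
    count differs from the header row. The header is record 1."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against
    expected = len(header)
    problems = []
    for record_number, record in enumerate(reader, start=2):
        if len(record) != expected:
            problems.append((record_number, len(record)))
    return problems

# Made-up sample: the second record keeps its empty field (two commas
# in a row), while the third record is missing a field entirely.
sample = "county,population,area\nPolk,,592\nStory,97117\n"
print(find_ragged_records(sample))
```

The check counts records rather than physical lines, because a quoted field may legitimately contain a line break.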

Shapefile

A shapefile is a digital vector storage format for storing geometric location and associated attribute information.   It is a preferred format for more complex geographic structures, such as lines and polygons.  While the name implies a single file, it is actually a set of several files.  Your agency should organize the files into a zip file (.zip).  At minimum, each shapefile (.zip) should contain the following:

  • File defining the geometry (shapes) (e.g. geographic location of county boundaries) (.shp)
  • File providing the attribute table (e.g. population and other demographic characteristics associated with each county) (.dbf)
  • Projection file to ensure the feature locations are accurately rendered on the map (.prj)
  • Shape indexing file for efficient processing (.shx)
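
One way to check a zip file against the list above before uploading is sketched below with Python's standard zipfile module. The function name and the demo archive are hypothetical, and the check looks only at file extensions (.shp, .dbf for the attribute table, .prj, .shx), not at file contents.

```python
import io
import os
import zipfile

REQUIRED_EXTENSIONS = {".shp", ".dbf", ".prj", ".shx"}

def missing_shapefile_parts(zip_source):
    """Return the required extensions that are absent from the archive.
    zip_source may be a path or a file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        present = {
            os.path.splitext(name)[1].lower() for name in archive.namelist()
        }
    return sorted(REQUIRED_EXTENSIONS - present)

# Build a small in-memory archive (empty placeholder files) that lacks
# the projection file, then run the check against it.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for name in ("counties.shp", "counties.dbf", "counties.shx"):
        archive.writestr(name, b"")
demo.seek(0)
missing = missing_shapefile_parts(demo)
print(missing)
```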

Shapefiles should use the WGS-84 Geographic Coordinate System (EPSG/WKID: 4326) or the Web Mercator (Auxiliary Sphere) Projected Coordinate System (EPSG/WKID: 3857/102100).
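
To illustrate how the two coordinate systems relate, the sketch below applies the standard spherical Web Mercator forward formula to a WGS-84 longitude/latitude pair given in degrees. It is meant only as an illustration; actual reprojection of a shapefile should be done in GIS software. The function name and sample coordinates are assumptions.

```python
import math

# Sphere radius used by Web Mercator (the WGS-84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS-84 degrees (EPSG:4326) to Web Mercator meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y

# Approximate location of Des Moines, Iowa (illustrative values).
print(wgs84_to_web_mercator(-93.6, 41.6))
```

Coordinates in the first system are small numbers in degrees, while the second uses meters that run to roughly 20 million at the edge of the map, which is a quick way to tell which one a file is using.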

Shapefiles containing only point data will render better when converted to tabular CSV format.  Shapefiles containing point data are often rich sources of tabular data that can be mapped and analyzed on the platform in many ways.  The .dbf file included in a shapefile can be opened using Excel and the data converted to CSV, similar to other sources of tabular data.

Keyhole Markup Language

The Keyhole Markup Language (KML) format, commonly associated with Google Maps and Google Earth, specifies a set of features (place marks, images, polygons, 3D models, textual descriptions, etc.) for display on a map.  KML and its zipped or compressed version, KMZ, can be used to import geospatial data into data.iowa.gov.  Each feature should provide only a single geometric structure (e.g. point, line, or polygon).  Including multiple geometric structures inside the MultiGeometry tag is not supported by the system.
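
A KML file can be scanned for MultiGeometry elements before import. The sketch below uses Python's standard xml.etree.ElementTree module and compares tag names without their namespace prefix; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def placemarks_with_multigeometry(kml_text):
    """Return the names of Placemark elements that contain a MultiGeometry."""
    root = ET.fromstring(kml_text)
    flagged = []
    for element in root.iter():
        if local_name(element.tag) != "Placemark":
            continue
        if any(local_name(d.tag) == "MultiGeometry" for d in element.iter()):
            # Use the Placemark's direct <name> child as its label, if any.
            name = next(
                (c.text for c in element if local_name(c.tag) == "name"), None
            )
            flagged.append(name or "(unnamed)")
    return flagged

SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Single point</name>
      <Point><coordinates>-93.6,41.6</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Point and line</name>
      <MultiGeometry>
        <Point><coordinates>-93.6,41.6</coordinates></Point>
        <LineString><coordinates>-93.6,41.6 -93.5,41.7</coordinates></LineString>
      </MultiGeometry>
    </Placemark>
  </Document>
</kml>"""

flagged = placemarks_with_multigeometry(SAMPLE_KML)
print(flagged)
```

Any placemark reported by the check would need to be split into separate single-geometry features before the file is imported.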

Map Server Layer

Agencies can also publish geospatial data in data.iowa.gov by providing the URL to an endpoint for a map or individual map layer on an ArcGIS Server version 10.0 or above.  Maps published in this manner can have points, lines and polygons in different layers.  Map layers contained in a feature service need to be entered individually, but can be combined into a single map later on.  This approach allows maps to have customized point, line and boundary colors.  The map is controlled and updated on your own server, not in data.iowa.gov.  The system, data.iowa.gov, will call your server every time a user loads the page.  Agencies need to:

  • Set up your services to reproject on the fly into WGS-84/Web Mercator, which is one of the features of ArcGIS 10.x.
  • Ensure your ArcGIS server runs over HTTPS with an SSL certificate.  If not, most browsers will reject the non-authenticated content.
  • Run ArcGIS 10.0 or above, and provide a REST and SOAP endpoint to point at.
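
A quick sanity check on an endpoint URL, before submitting it, might look like the sketch below. It tests only that the URL uses https and that its path follows the usual ArcGIS REST layout (/rest/services/.../MapServer or FeatureServer, optionally followed by a layer number). That pattern is an assumption about a typical server configuration, the host name is hypothetical, and the check does not contact the server.

```python
import re
from urllib.parse import urlsplit

# Assumed typical layout of an ArcGIS REST service path, optionally
# ending in a numeric layer id.
SERVICE_PATH = re.compile(
    r"/rest/services/.+/(MapServer|FeatureServer)(/\d+)?/?$"
)

def check_endpoint(url):
    """Return a list of problems found in the endpoint URL (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        problems.append("endpoint is not served over https")
    if not SERVICE_PATH.search(parts.path):
        problems.append(
            "path does not look like an ArcGIS REST map or feature service"
        )
    return problems

# Hypothetical endpoint for an individual map layer (layer 0).
print(check_endpoint(
    "https://gis.example.gov/arcgis/rest/services/Transport/Roads/MapServer/0"
))
```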

[1] If you are saving an Excel file as CSV, Excel will automatically put quotes around text fields requiring them.

Program Area: Transparency
Topic(s): Open Data, Files, CSV, Shapefile, KML, Map Server Layer, Importing Data

Printed from the Iowa Department of Management website on April 24, 2018 at 3:45pm.