What is metadata and what elements should you use?

Metadata describes the context, structure, and format of data – it is data about the data.  It is a tool we can use to manage our information resources.  Much like you would use the index in the back of a book to find a topic of interest, metadata provides an "index" for information and data holdings that greatly simplifies searching.  It also gives potential users a better understanding of the data – how it was collected, its purpose and use, and definitions for the data it contains.  Some general tips for writing metadata include:

  • Avoid large words
  • Limit or avoid use of jargon and acronyms
  • Use present tense
  • Use active voice

Metadata Elements

Establishing a common metadata vocabulary is critical to communicating effectively and to sharing our information with others.  The metadata elements used on data.iowa.gov are listed below, along with descriptions of what each should include.  The elements are organized into the following categories:

General Information

Dataset Title

Create a title for the dataset.  It should be in plain English and include sufficient detail to facilitate search and discovery.   Some basic elements should be considered when coming up with a title for your dataset:

  • The main numeric data available within your dataset should provide the foundation for your title (e.g. Vendor Payments; Assessed Property Values; Local Option Sales Tax Rates & Payments)
  • Known timeframes your dataset is limited to should also be used if applicable (e.g. FY 2013 Vendor Payments; 2012 Assessed Property Values)
  • Groupings used to summarize underlying data where record level detail is either not available or not provided due to sensitive or confidential data (e.g. 2012 Assessed Property Values by Tax District; FY 2014 Monthly Medicaid Payments by Vendor)

Description

Highlight information about what the dataset contains, and why it is important.  The description needs to provide sufficient detail to enable a user to quickly understand whether the dataset is of interest.  You should think about the types of questions the data will help people answer when drafting your description.

You will want to ensure your description is easily understood – appropriate to the public’s reading skills and knowledge.  It should also be clear and direct, free of unnecessary jargon, acronyms and abbreviations.  Acronyms and abbreviations often have different meanings in different areas of government, industry, or even walks of life.  As such, unintended meanings for abbreviations and acronyms can cause confusion and uncertainty about what your data conveys.

Your dataset's data dictionary is a place where you can further describe the individual data elements contained within the dataset.

Category

Select the main thematic category for the dataset.  The following options are available to select from:

  • Communities & People - Data about the characteristics of Iowa communities and our people
  • Economy - Data about economic activities, employment, agriculture, and business and industry in Iowa
  • Education - Data about student achievement, and elementary, secondary and post-secondary education in Iowa
  • Environment - Data about Iowa's landscapes, habitats, environment and natural resources as well as the protection and conservation of those resources
  • Government - Data about government, and its spending, taxes, performance and operations
  • Health - Data about factors affecting health, health conditions and services available to Iowans
  • Public Safety - Data about law enforcement, fire protection, emergency services, crime and incarceration in Iowa
  • Transportation & Utilities - Data about conveyance of people and goods across Iowa, and our transportation, energy, and communications infrastructure

Tags/Keywords

Keywords should include terms you think citizens and other stakeholders, both technical and non-technical, might enter into a search engine to find your dataset.  Exclude generic words from your list.  The more specific the keywords are, the more likely the person searching for those keywords will find your dataset among the search results.

Settle on no more than 10 keywords and list them in descending importance.  Each term should be separated by a comma.  All terms will be converted so characters are lower case.
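The keyword rules above can be illustrated with a short sketch.  The function name `normalize_keywords` is hypothetical, and dropping repeated terms is an assumption of this example rather than a stated rule; the order of the terms as entered (descending importance) is preserved.

```python
def normalize_keywords(raw: str, limit: int = 10) -> str:
    """Return a comma-separated, lower-case keyword list with at most `limit` terms.

    Assumption: repeated terms are dropped; the source only requires
    lower case, comma separation, and no more than 10 keywords.
    """
    seen = set()
    keywords = []
    for term in raw.split(","):
        cleaned = " ".join(term.split()).lower()  # trim and collapse whitespace
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            keywords.append(cleaned)
    # Keep the first `limit` terms, i.e. the most important ones as entered.
    return ", ".join(keywords[:limit])


print(normalize_keywords("Vendor Payments, Expenditures ,vendor payments, Fiscal Year"))
# vendor payments, expenditures, fiscal year
```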

Row Label

Describe what each row in the dataset represents (if applicable) (e.g. case, licensee, business, etc.).

Licensing and Attribution

License Type

Licenses generally specify how your data can be used - whether it can be copied, distributed, edited, remixed and built upon.  You have the following options available:

For most datasets, Public Domain will be the most appropriate, as much of our data is not covered by a copyright.  If the dataset is subject to a copyright, this should be left null, and the use restrictions should be described in the Limitations under Disclaimers.

Below are choices that are not applicable to our data:

  • Public Domain U.S. Government
  • Italian Open Data License 2.0
  • UK Open Government License v3
  • Nova Scotia Open Government License
  • Canada Open Government License
  • See Terms of Use

Data Provided by

Cite the agency, division, bureau and/or program as well as database, survey, report or related resource (where applicable) from which the described dataset is derived (e.g. Iowa Department of Administrative Services, State Accounting Enterprise, I3 Data Warehouse).

Source Link

Where available, provide the publicly accessible web address for the database, survey, report or related resource from which the described dataset is derived.

Coverage

Time

The date or time interval applicable to the dataset.  It is important to provide this so old data is not presumed to be current.   If you require time details, please contact the Department of Management.

Date Representation

A date represents a time period.  Agencies can represent dates using either calendar dates or week dates.  A calendar date is represented by the format YYYY-MM-DD.  YYYY is the year in the Gregorian calendar (e.g. 2014), MM is the month of the year falling between 01 (i.e. January) and 12 (i.e. December), and DD is the day of the month falling between 01 and 31.  A calendar date can be shortened to only reflect the month (e.g. YYYY-MM) or year (e.g. YYYY).  A week date is represented by the format YYYY-Www-DD.  Again YYYY is the year in the Gregorian calendar, W indicates weeks, ww is the week of the year falling between 01 and 52, and DD is the day of the week with Monday being 01 and Sunday being 07.
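As a rough illustration, the date formats described above could be checked before they are entered.  This is a minimal sketch: the function names are hypothetical, and the week-date check follows the ranges exactly as stated in this guidance (weeks 01 to 52, days 01 to 07).

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD
CALENDAR_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")
# YYYY-Www-DD, as described in this guidance
WEEK_RE = re.compile(r"(\d{4})-W(\d{2})-(\d{2})")


def is_calendar_date(text: str) -> bool:
    """True for a valid YYYY, YYYY-MM, or YYYY-MM-DD calendar date."""
    match = CALENDAR_RE.fullmatch(text)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the remaining parts are still checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def is_week_date(text: str) -> bool:
    """True for YYYY-Www-DD with week 01-52 and day 01 (Monday) to 07 (Sunday)."""
    match = WEEK_RE.fullmatch(text)
    if not match:
        return False
    week, day = int(match.group(2)), int(match.group(3))
    return 1 <= week <= 52 and 1 <= day <= 7


print(is_calendar_date("2014-06-30"), is_calendar_date("2014-02-30"))  # True False
print(is_week_date("2014-W26-01"), is_week_date("2014-W26-08"))        # True False
```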

Duration Representation

Duration is a component of time (e.g. one month or one year), and is represented by the format P[n]Y[n]M[n]D or P[n]W.  In this format, P indicates a time period, and [n] represents the number of components.  The Y, M, and D represent years, months, and days respectively.  W represents weeks.  (e.g. P2Y = two years, P5M = five months, P2Y6M = two years, six months, P10W = ten weeks, etc.).
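The duration format can be read mechanically, as in the sketch below.  The function name `parse_duration` and the dictionary it returns are choices made for this example only.

```python
import re

# P[n]Y[n]M[n]D (each component optional) or P[n]W
DURATION_RE = re.compile(r"P(?:(\d+)W|(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?)")


def parse_duration(text: str) -> dict:
    """Split a duration such as P2Y6M or P10W into its named components."""
    match = DURATION_RE.fullmatch(text)
    if not match or text == "P":  # a bare "P" carries no components
        raise ValueError(f"not a valid duration: {text!r}")
    weeks, years, months, days = match.groups()
    parts = {"years": years, "months": months, "days": days, "weeks": weeks}
    # Keep only the components that were actually present.
    return {name: int(value) for name, value in parts.items() if value is not None}


print(parse_duration("P2Y6M"))  # {'years': 2, 'months': 6}
print(parse_duration("P10W"))   # {'weeks': 10}
```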

Using a single date

A single date (e.g. 2014-06-30; 2014-06; 2014) should be used where the data was captured or is applicable to a single period in time, and no further updates are planned.

Using a time interval

A time interval is the intervening time between two dates represented by <start date>/<end date> (e.g. 2013-07-01/2014-06-30).  It should be used where data being published was collected over a period of time, and no further updates are planned.
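A time interval in the <start date>/<end date> form can be split and checked as in the sketch below.  The function names are hypothetical.  Treating a shortened date (YYYY or YYYY-MM) as the first day of that year or month is an assumption of this example, not a rule from this guidance.

```python
from datetime import date


def parse_calendar_date(text: str) -> date:
    """Parse YYYY, YYYY-MM, or YYYY-MM-DD; a missing month or day defaults to 01."""
    parts = text.split("-")
    expected_lengths = [4, 2, 2][: len(parts)]
    if (
        len(parts) > 3
        or [len(p) for p in parts] != expected_lengths
        or not all(p.isdigit() for p in parts)
    ):
        raise ValueError(f"not a calendar date: {text!r}")
    padded = parts + ["01", "01"][: 3 - len(parts)]
    year, month, day = (int(p) for p in padded)
    return date(year, month, day)  # raises ValueError for impossible dates


def parse_interval(text: str) -> tuple:
    """Split <start date>/<end date> and confirm the end is not before the start."""
    start_text, separator, end_text = text.partition("/")
    if not separator:
        raise ValueError(f"missing '/' in interval: {text!r}")
    start, end = parse_calendar_date(start_text), parse_calendar_date(end_text)
    if end < start:
        raise ValueError(f"end date precedes start date: {text!r}")
    return start, end


start, end = parse_interval("2013-07-01/2014-06-30")
print((end - start).days + 1)  # 365 days, counting both endpoints
```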

Using a repeating time interval or duration

Repeating time intervals or durations should be used where the dataset will be updated on a periodic, repeating basis.  Typically these are represented by R[n]/<start date>/<duration> or R/<start date>/<duration>.  R indicates a repeating interval, and [n] represents the number of repetitions (e.g. R12/2013-01/P1M means data is updated monthly, 12 times, starting January 2013).  If [n] is not included, the number of repetitions is unbounded (i.e. endless) (e.g. R/2010/P1Y means data is updated annually starting in 2010 and continuing through the present).
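The two examples above, R12/2013-01/P1M and R/2010/P1Y, can be taken apart as in the following sketch.  It handles only the three-part form with a start date, checks the duration loosely, and uses a hypothetical function name and return shape.

```python
import re

# R[n]/<start date>/<duration>, with [n] optional
REPEAT_RE = re.compile(
    r"R(\d*)/"                          # optional number of repetitions
    r"(\d{4}(?:-\d{2}(?:-\d{2})?)?)/"   # start date: YYYY, YYYY-MM, or YYYY-MM-DD
    r"(P[0-9YMDW]+)"                    # duration, only loosely checked here
)


def parse_repeating(text: str) -> dict:
    """Split a repeating interval into repetitions, start date, and duration."""
    match = REPEAT_RE.fullmatch(text)
    if not match:
        raise ValueError(f"not a repeating interval: {text!r}")
    count, start, duration = match.groups()
    return {
        "repetitions": int(count) if count else None,  # None means unbounded
        "start": start,
        "duration": duration,
    }


print(parse_repeating("R12/2013-01/P1M"))
# {'repetitions': 12, 'start': '2013-01', 'duration': 'P1M'}
print(parse_repeating("R/2010/P1Y"))
# {'repetitions': None, 'start': '2010', 'duration': 'P1Y'}
```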

Area

Provides the name of the geographic place to which the dataset relates.  Agencies should use names of geographic features provided in the USGS Geographic Names Information System (e.g. Iowa; Muscatine County, IA; Des Moines, IA; Rathbun Lake, Appanoose County, IA; Big Creek State Park, Polk County, IA; Walnut Township, Madison County, IA).  Multiple geographic places can be entered, where appropriate, and should be separated by a semicolon.

Disclaimers

Completeness

Provides information related to missing or incomplete data that would prevent users from being able to effectively aggregate and compare values.  This could be either the result of data quality issues, or due to the need to protect confidential data.  Agencies should also highlight any items that the public may expect to be in the data but that are not included (e.g. State of Iowa Expenditures do not include expenditures made by Regents institutions).

Limitations

Provides information related to any limitations on how the data can be used and/or summarized.  For instance, aggregate monthly data providing the number of unique recipients cannot be totaled to determine the number of unique recipients over a year, as one recipient may be included in multiple months.
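The unique-recipient limitation is easy to see with a small worked example.  The recipient identifiers and months below are made up for illustration.

```python
# Hypothetical recipient IDs seen in each month.
monthly_recipients = {
    "2014-01": {"A", "B", "C"},
    "2014-02": {"B", "C", "D"},
    "2014-03": {"A", "D"},
}

# Adding the monthly unique counts counts repeat recipients more than once.
sum_of_monthly_counts = sum(len(ids) for ids in monthly_recipients.values())

# The true number of unique recipients requires the record-level identifiers.
unique_over_period = len(set().union(*monthly_recipients.values()))

print(sum_of_monthly_counts)  # 8
print(unique_over_period)     # 4
```

Because published aggregate data usually does not carry the record-level identifiers, users cannot recover the second figure from the first, which is why the limitation should be stated in the metadata.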

Updates

Agency

Designates the state agency or institution that owns the dataset and is responsible for updates.

Update Frequency

Designates the frequency associated with data updates (to be associated with the update time period below).  Options include: As available (noting updates are random, and not on a set frequency), Every, and options for Every 2 through Every 30.

Update Time Period

Designates the time period associated with data updates (related to the frequency noted above).  Options include: Day, Days, Week, Weeks, Month, Months, Year, Years

Update Notes

Provides information on when new data is typically available (e.g. for datasets updated every three months, agencies can list the months in which data updates will be published; for annual updates, agencies can list the month of the year in which new updates are typically available).

Dataset Creation Steps

Data Export Steps

Field is not publicly viewable.  Used to highlight the source(s) for the data, name the queries or procedure run to produce the data extract, and any steps taken to transform the data to make it available for public consumption.  This documentation is intended to help future owners of the dataset ensure the dataset continues to be maintained in a consistent manner.

Quality Assurance Process

Field is not publicly viewable.  Highlights data checking and review steps used.  Summarizes checks used and steps taken to review your data, and clean or correct data where appropriate.

Confidential Data Redaction Steps

Field is not publicly viewable.  Highlights steps taken to de-identify or redact data that is confidential or sensitive.  Should state any disclosure thresholds that are used.
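One common redaction step is to suppress small counts that fall below a disclosure threshold.  The sketch below is only an illustration: the function name, the threshold of 5, and the counts are all assumptions, and it does not address related concerns such as values that can be derived from published totals.

```python
from typing import Dict, Optional


def suppress_small_counts(counts: Dict[str, int], threshold: int) -> Dict[str, Optional[int]]:
    """Replace any count below `threshold` with None so it is not published."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return {
        group: (count if count >= threshold else None)
        for group, count in counts.items()
    }


# Hypothetical counts by county, with an assumed threshold of 5.
published = suppress_small_counts({"Adair": 12, "Adams": 3, "Allamakee": 27}, threshold=5)
print(published)  # {'Adair': 12, 'Adams': None, 'Allamakee': 27}
```

Whatever threshold an agency actually uses is what should be recorded in this field.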

API Endpoint

Row Identifier

Select the column containing a permanent identifier for rows in the dataset.  This gives developers a level of comfort knowing that they can use this column to power their applications.  Even if other columns are deleted or added, developers are assured that applications built on key identifying information within the dataset (for example, an ID number for each row) will not break.
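Before selecting a column as the row identifier, it can help to confirm that every row has a value and that no value repeats.  The sketch below assumes the dataset has been loaded as a list of dictionaries; the function name and the sample column `payment_id` are hypothetical.

```python
from collections import Counter


def check_row_identifier(rows: list, column: str) -> dict:
    """Report missing and duplicated values in a candidate identifier column."""
    values = [row.get(column) for row in rows]
    present = [value for value in values if value not in (None, "")]
    duplicates = sorted(value for value, n in Counter(present).items() if n > 1)
    return {"missing": len(values) - len(present), "duplicates": duplicates}


rows = [
    {"payment_id": "1001", "vendor": "Acme"},
    {"payment_id": "1002", "vendor": "Birch"},
    {"payment_id": "1002", "vendor": "Cedar"},
    {"payment_id": "", "vendor": "Dune"},
]
print(check_row_identifier(rows, "payment_id"))
# {'missing': 1, 'duplicates': ['1002']}
```

A column is a reasonable candidate only when the report shows no missing values and no duplicates.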

Thumbnail Image

Thumbnail Image

A public domain image, or one to which your agency holds the copyright, that represents the data contained in the dataset and can be used on data.iowa.gov’s homepage to feature the dataset.  It should be cropped so that it is a square.

Attachments

Attachments

Related documents such as a dataset glossary, quality assurance/quality control documentation, technical information about the resource, developer documentation, etc.

Contact Information

Contact Email

Email for the contact responsible for answering questions related to and receiving feedback about the dataset.  Address will not be displayed publicly, and will default to the account’s email if left blank.  Unless agencies have a specific organizational email to use for this purpose, it is recommended that the field be left blank and default to the email on the account.

Program Area: Transparency
Topic(s): Open Data, Metadata, Importing Data

Printed from the Iowa Department of Management website on April 21, 2018 at 8:13am.