What are common issues when converting an Excel file to CSV?

One of the most common ways in generating CSV file format is to convert an Excel file to CSV.  In doing so, there are a few things you need to be sure to do to prevent the data from being corrupted.

Improper numeric interpretation

When importing data into Excel, ensure you are importing data as text.  Doing so ensures that the cell is displayed exactly as entered. If this is not done, numeric content is often subject to interpretation by Excel.  For example, if ’11-1961’ is imported as ‘General’ format, it will appear as ‘Nov-61’, since Excel recognizes this as a date format.  If formatted as ‘Text’, then ’11-1961’ is displayed as entered.  Importing data as text also ensures that any codes that contain leading zeros retain the leading zeros.

Blank columns and rows

Excel will allow for blank rows and columns at the end of datasets if you click outside the active data and save it.  That is, if you click outside the border of the active data and save it as CSV, Excel will read these blank rows and columns as part of the data and save the blank rows and/or columns.  This may not appear to be part of the spreadsheet but will, in fact, result in blank rows and columns being included when saving as CSV.  Care must be taken to ensure that any blank rows and columns have been removed prior to creating your CSV files.  An easy way to determine whether such blank rows or columns are present in Excel is to press [Ctrl]+[End] inside the spreadsheet and see if this takes you beyond your data in the spreadsheet.  If it does, the data file contains blank rows and columns that need to be deleted.  Delete the blank rows and columns, save the Excel file, and then press [Ctrl]+[End] again – this should now take you to the last row and column in your data.

Merged cells

Merged cells cannot be reproduced in a CSV, and should be removed from any excel file before saving as a CSV.

No data components for calculated fields

Data should include the data components used for calculated fields.  For example, if two figures were added together to create a summary value, you should include three columns: one for the first added value, a second column for the second added value, and a third column for the sum of the two.

Multiple lines of data in single cell

A cell should only contain one item of information.  Multiple lines within a cell will cause the import process to fail.  Cells that contain collections of data are impossible to evaluate and could cause problems for end-users of your data.

Program Area
Transparency
Topic(s)
Open Data, Files, Excel, CSV, Importing Data

Printed from the Iowa Department of Management website on April 24, 2018 at 3:40pm.