Manually uploading data on the open data portal

Note: The features detailed below are only available to City and County staff.

There are two ways to import data to the open data platform: manually through the user interface, or programmatically. For programmatic access, DataSF provides automation support as part of the standard publishing process.

Regardless of which data upload process you use, you must start by submitting dataset information on the publisher portal. If you've already done this and have received the go-ahead from DataSF, you can follow the directions below to manually upload your data.

Who can import data?

You need to be either an administrator, publisher or editor on the open data portal. Publisher and Editor roles are granted by the open data administrators. If you do not already have an account, you will be assigned the appropriate permissions for your City email address after your publishing submission has been reviewed.

Accessing the import user interface

  1. Sign in to the portal.
  2. A link will appear that says ‘Hello, {your username}’.
  3. Clicking on this link will take you to your profile on the site.
  4. Near the center of the page, above the data view catalog, you will find a ‘Create a new dataset’ button. This is where the import process starts. After clicking the button, you will proceed through a series of steps to upload your data file onto the platform.

Selecting the type of data file

You are presented with four options for the kind of file you want to import:

Design from Scratch: Choose this if you don’t have a data file to import yet. You will be able to define a dataset schema and input the data later online.

Import a Data File: Choose this if you already have a data file on your computer which you wish to import. Four file types are allowed: .csv, .xls, .xlsx and .tsv.

Upload a Non-Data File: If your file is not in any of the four allowed formats, such as a PDF or an image file, you are still able to host it on the data portal via this option. Note that though the file will still be searchable, it cannot be interacted with in the way the four data file types can.

Link to External Data: Choose this if you want to link to data hosted on another site. This data will not be imported.

Select the Import a Data File option, as this is the option used for almost all new datasets (exceptions are made on a case-by-case basis).
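
Before you start the import, it can help to confirm that your file has one of the four allowed extensions. The following is a minimal Python sketch of such a check. The function name is_importable is hypothetical, and comparing extensions case-insensitively is an assumption; the platform's own validation may behave differently.

```python
from pathlib import Path

# The four file types this article lists for the Import a Data File option.
ALLOWED_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx"}


def is_importable(filename: str) -> bool:
    """Return True if the file name ends in one of the allowed extensions."""
    # Path.suffix is the final extension (for example ".csv").
    # Lowercasing it is an assumption about how the platform treats case.
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ["budget_2015.csv", "Permits.XLSX", "report.pdf"]:
        print(name, is_importable(name))
```

A file that fails this check, such as a PDF, would go through the Upload a Non-Data File option instead.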

Choosing the source of your data file

In the second stage of manually importing a data file, after you click on the Import a Data File option, you will be asked where your file is located:

On my computer: Choose this if the data file is on the computer you are using. It must be either a .csv, .tsv, .xls, or .xlsx file.

If you select this option, you will next be prompted to select the file from your machine.

On the internet: Choose this if your data lives on the internet in .csv, .xls, .xlsx or .tsv format, and you have the HTTP(S) URL to it.

If you select this option, you will next be prompted to provide this URL.
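
If you plan to use the On the internet option, a quick check of the URL beforehand can catch obvious problems. The sketch below is only illustrative: check_source_url is a hypothetical helper, and it assumes that the URL path ends in the file extension, which is not true of every download link. The example.org addresses are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The four data file formats named in this article.
ALLOWED_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx"}


def check_source_url(url: str) -> list[str]:
    """Return a list of problems found with the URL (empty if none)."""
    problems = []
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parts.netloc:
        problems.append("URL has no host name")
    # Assumption: the file type is visible as the extension of the URL path.
    if PurePosixPath(parts.path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("path does not end in .csv, .tsv, .xls or .xlsx")
    return problems


if __name__ == "__main__":
    print(check_source_url("https://example.org/files/permits.csv"))
    print(check_source_url("ftp://example.org/files/permits.csv"))
```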

Review the data file schema

This stage allows you to review the columns of data which are being imported onto the platform and the overall schema of the dataset, including the column ordering, column datatype, and the number of rows which are headers.

Name: This is a text field which automatically takes the text from the first row of the file as the header, which will be the column header in the platform dataset. These fields can be edited if you want to change the name of a column during this stage, for example changing ‘% of world population’ to ‘Percentage of world population’. Column headers can also be changed after the dataset is imported.

Data Type: The platform reads the first few rows of each column to make an educated guess on the data type of the column. For example, the entries under ‘Country (or dependent territory)’ are read as Plain Text. For more on the different data types the platform supports, see the article ‘Importing, Data Types and You!’

Note: Sometimes the platform’s guess is wrong. You can correct it by selecting the appropriate type from the menu which appears when you click on the data type. Your data is heavily processed when it is imported, and optimized for consumption according to each data type. In addition, most data types have special features that are only activated if the column is set to that type; location columns, for example, are necessary to create maps. Though it is possible to change data types later on, doing so can be difficult, especially if the dataset is large (more than roughly 50,000 rows), and can result in loss of data. We therefore advise a thorough review at this stage to avoid the hassle later on.
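
To see why a guess based on the first few rows deserves a review, it can help to try a rough version of the idea on your own file. The Python sketch below is not the platform's algorithm. It is a simplified illustration that labels each column as number, date or text from a handful of sample rows. The function names, the three type labels, the ISO date format and the sample values are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def _is_number(value):
    # Treat "1,200" and "3.5" as numbers; thousands separators are dropped.
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_date(value):
    # Assumes ISO dates such as 2015-06-30; other formats count as text.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def guess_type(values):
    """Guess a rough type label for a column from a few sample values."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return "text"
    if all(_is_number(v) for v in samples):
        return "number"
    if all(_is_date(v) for v in samples):
        return "date"
    return "text"


def preview_types(csv_text, sample_rows=5):
    """Map each column name to a guessed type using the first few data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return {
        name: guess_type([row[name] or "" for row in rows])
        for name in reader.fieldnames
    }


# Made-up sample data, for illustration only.
SAMPLE = (
    "Country (or dependent territory),Population,Date\n"
    'Exampleland,"1,200",2015-06-30\n'
    'Otherland,"3,400",2015-07-01\n'
)

if __name__ == "__main__":
    for column, guessed in preview_types(SAMPLE).items():
        print(column, "->", guessed)
```

Because only the sampled rows are inspected, a column whose later rows contain a different kind of value would still be labelled by its first few entries, which is the kind of mistake the review at this stage is meant to catch.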

Source columns: These are the columns from the underlying dataset whose entries will populate the platform dataset under the ‘Name’ columns.

Other options include:

  • adding a row

  • deleting a row

  • clearing all rows to start with an empty dataset

  • resetting the schema configuration changes to the original

Under Headers, the platform will by default read the first row as the header. If the header is not just the first row, you can use the ‘Fewer Rows’ and ‘More Rows’ controls to set the correct number of header rows.
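
If you are unsure how many header rows your file has, you can inspect it before importing. The sketch below, with a hypothetical split_header_rows helper and made-up sample data, simply separates the first N rows of a CSV from the rest. It only illustrates the idea of a header row count and does not reproduce how the platform handles headers.

```python
import csv
import io


def split_header_rows(csv_text, header_rows=1):
    """Split parsed CSV rows into (header rows, data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[:header_rows], rows[header_rows:]


# Made-up file whose first line is a title, so the column names are on line 2.
SAMPLE = (
    "Population estimates (illustrative)\n"
    "Country,Population\n"
    "Exampleland,1200\n"
)

headers, data = split_header_rows(SAMPLE, header_rows=2)
print(headers)  # [['Population estimates (illustrative)'], ['Country', 'Population']]
print(data)     # [['Exampleland', '1200']]
```

With the default of one header row, the line of column names in this sample would be treated as data, which is the situation the header row setting is there to fix.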

Define metadata at import

Metadata is descriptive data about the dataset. You can reference your publishing submission for the relevant metadata. You will receive a copy of your submission from DataSF staff before manual publishing.

Dataset Title (required): The minimum amount of metadata required is a title for the platform dataset. The title must be unique among all datasets and views on the site.

Brief Description (required): Any details you want to provide to elaborate on the title, even summarizing the different metadata information you are providing.

Category (required): Select, from the list of default categories, the one which best describes the kind of data your dataset will hold.

Tags and Keywords: You can add words which will help make your dataset more searchable on the data portal. The individual tags/keywords should be separated by commas.

Additionally required: Publishing Department, Geographic Unit, Data Steward name, Data Steward email, Publishing Frequency, Data Change Frequency
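
Since several fields are required, you may find it useful to check your metadata against the list before you begin typing it into the form. The Python sketch below treats the field names from this article as dictionary keys. The dictionary layout, the helper names missing_fields and format_tags, and the sample values are all hypothetical and are not part of the portal.

```python
# Required fields as named in this article.
REQUIRED_FIELDS = [
    "Dataset Title",
    "Brief Description",
    "Category",
    "Publishing Department",
    "Geographic Unit",
    "Data Steward name",
    "Data Steward email",
    "Publishing Frequency",
    "Data Change Frequency",
]


def missing_fields(metadata):
    """Return the required fields that are absent or blank in metadata."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


def format_tags(tags):
    """Join tags/keywords with commas, as the form expects."""
    return ", ".join(tag.strip() for tag in tags if tag.strip())


if __name__ == "__main__":
    # Placeholder values for illustration only.
    draft = {
        "Dataset Title": "Example dataset",
        "Brief Description": "An illustrative description.",
        "Category": "",
    }
    print(missing_fields(draft))
    print(format_tags(["permits", " housing ", ""]))
```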



Licensing & Attribution

The City has adopted a single license for all datasets. Please select Open Data Commons > Public Domain Dedication and License from the available drop-down menus.

Attachments

Please upload your data dictionary (definition of fields) here.

You will also be able to attach files such as PDFs and images after the data file has been fully imported, by going to the About panel > Edit Metadata.

Privacy and Security

By default, all imported data files are set to private so that you can publish the file without it being visible to other users on the portal. Do not set the dataset to public on initial upload, as the data will need additional review before it is made available. DataSF staff will mark the dataset public after you and other reviewers give the final okay.

…and that’s the last step! Click on ‘Finish’ and your file will be fully imported onto the platform. Send the URL of your private dataset to DataSF staff and we'll get back to you as soon as we are able. If there are no issues, the data will be made public and you can celebrate your new dataset!
