When sharing data it is important to consider how you will make your data available. The two main options for making your data available are to:
- Share data via an API
- Share data in another machine readable format.
This guide gives advice on preparing for either option.
The most powerful way to make your data available for sharing is via an API.
An API (application programming interface) is a set of software instructions and standards that allows machine-to-machine communication. APIs are often called web services or web APIs. They allow data to be shared automatically between applications.
APIs allow automatic querying of data at a granular level. Publishing APIs allows developers to build tools like apps and websites that automatically query your data and use it as part of their app, tool or service for consumers. Transport for NSW makes its timetable and real time running data available as APIs. Developers access these real time API data feeds and have used this data to create many transport apps and services that NSW commuters use every day.
Through an API:
- one application can use the functions of another application (e.g. process a payment)
- one application can use the data in another application – for discovery (e.g. search a register), visualisation (e.g. show electoral boundaries on a map) or query (e.g. check if a licence is current)
APIs provide up-to-date data
APIs automatically query the information from your designated business application.
This saves the data owner from having to routinely and manually update data, and ensures the data user always has the most current available information. This is very useful in dynamic environments, or in environments where data currency and accuracy are needed for good customer service. APIs can ensure that the most current and complete data is available on demand.
APIs provide users with a lot of flexibility
People accessing your data via an API have a lot of flexibility in how they use it. This creates many opportunities for innovation and new services.
APIs create a range of opportunities for your data to add value
APIs amplify the potential of your data, and its benefits and returns for consumers, by making it more easily available to a large number of users. This can allow many new tools and services to be developed that your agency would not be able to deliver itself.
APIs are increasingly supported by other tools
APIs work well with software that supports them, such as geospatial and mapping tools, which can connect directly to an API and remove the need to extract or host the data independently.
APIs enable users to only access the data they need
APIs support querying and filters, which enable users to request only the data they need rather than download and store the entire dataset.
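As an illustration of filtering, the sketch below builds a request URL that asks only for the records a user needs. The endpoint and the filter names (`status`, `postcode`, `limit`) are hypothetical and stand in for whatever parameters a real API documents.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.gov.au/v1/licences"


def build_query_url(base_url, **filters):
    """Return base_url with the supplied filters encoded as a query string."""
    # Skip filters that were not supplied so the URL carries only real criteria.
    supplied = {name: value for name, value in filters.items() if value is not None}
    if not supplied:
        return base_url
    return f"{base_url}?{urlencode(supplied)}"


url = build_query_url(BASE_URL, status="current", postcode="2000", limit=50)
print(url)
# https://api.example.gov.au/v1/licences?status=current&postcode=2000&limit=50
```

A client would then send this URL to the API and receive only the matching records, not the whole dataset.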
APIs can combine data sources
APIs can bring together data from several databases or applications into a single view for users to access.
APIs require investment
APIs involve technical complexity. They require good design to establish and manage the web requests needed to retrieve the data, translate it and save it as a single dataset. They also require ongoing maintenance to ensure the data remains fit for purpose and response times are adequate.
APIs should be built in standard formats and be documented
An API is usually a standardised service based on a common protocol (rules for how the service works) and formats (schema for using the service).
APIs are routinely described as RESTful JSON because they follow the REST architectural style and use JSON as their data representation format.
All APIs should have freely accessible documentation that explains to developers how they can make use of your API.
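To show what a JSON response can look like from the developer's side, here is a minimal sketch that parses a sample response body and checks whether a licence is current. The field names (`licence_number`, `status`, `expiry_date`) and the values are invented for illustration and do not describe any real API.

```python
import json

# Invented sample response body; the field names are hypothetical.
response_body = """
{
  "licence_number": "ABC123",
  "status": "current",
  "expiry_date": "2026-06-30"
}
"""


def is_licence_current(body):
    """Parse a JSON response body and report whether its status is 'current'."""
    record = json.loads(body)
    return record.get("status") == "current"


print(is_licence_current(response_body))  # True
```

Good documentation would tell developers which fields to expect and what values each can take, so they do not have to guess as this sketch does.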
Different API models have different security needs
APIs can be:
- Public: available for anyone to use to build their own applications, including commercial services
- Private: available within an agency or to authorised government users, to improve access to key data and the efficiency of using it
- Hybrid: available both internally and externally, with some data made available to the public and other data restricted to internal use and business partners.
When building an API, it is important to understand which of these models you are building and to apply the appropriate security, legal, and technical rules.
Agencies should consider appropriate strategies to mitigate risks, such as using separate servers or networks for data exposed through APIs.
APIs may create uptime and availability expectations and dependencies with your users
When making data available via APIs, make sure you inform your users whenever there are uptime issues.
Agencies using your data may have service dependencies on your API, and so will want to be informed if there are any service interruptions to your data.
You also may wish to apply certain throttle limits to your popular APIs, so that you can control access levels and ensure usage levels do not overload your servers.
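One simple way to throttle is a fixed-window limit, which allows each client a set number of requests per time window. The sketch below is an assumption about how such a limit could work, not a recommended production design. The class name, the client key and the limits are all hypothetical.

```python
import time


class FixedWindowThrottle:
    """Allow at most `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._windows = {}  # client key -> (window start time, request count)

    def allow(self, client_key):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        start, count = self._windows.get(client_key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._windows[client_key] = (start, count + 1)
        return True


throttle = FixedWindowThrottle(limit=2, window_seconds=60)
print([throttle.allow("app-1") for _ in range(3)])  # [True, True, False]
```

In practice throttling is often configured in an API gateway rather than written by hand. A throttled request would normally receive a clear error response, such as HTTP status 429 (Too Many Requests), so that the client knows to retry later.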
If you don’t want to make your data available via an API, make it available in a machine-processable format like CSV.
CSV stands for comma separated values. It is a simple format for tabular data. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
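The structure described above can be seen by reading a small CSV with Python's standard `csv` module. The column names and values below are invented sample data.

```python
import csv
import io

# Invented sample data: a header line followed by two records.
sample = "id,name,status\n1,Alpha,current\n2,Beta,expired\n"


def read_records(text):
    """Return each CSV record as a dict keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = read_records(sample)
print(records[0])  # {'id': '1', 'name': 'Alpha', 'status': 'current'}
```

Every value is read as text. CSV carries no type information, so it is up to the consumer to interpret numbers and dates.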
To ensure your CSV is machine readable you should:
- Have valid encoding, with no odd characters
- Have consistent line breaks throughout the file
- Have declared machine readable headings
- Ensure all rows have the same number of columns
- Ensure there are no blank rows
- Ensure there is no whitespace between commas and double quotes
- Use consistent values
- Ensure all columns have names
You can use the tool http://csvlint.io/about to check that your CSV is machine readable, and whether it contains the columns and types of values it should.
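A few of the checks in the list above are easy to automate before publishing. The sketch below covers only three of them (all columns named, no blank rows, and the same number of columns in every row). It is an illustration written for this guide, not a replacement for a full validator, and the function name `check_csv` is hypothetical.

```python
import csv
import io


def check_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # Every column should have a name.
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {position} has no name")

    # Row numbers count the header as row 1.
    for row_number, row in enumerate(rows[1:], start=2):
        if all(not cell.strip() for cell in row):
            problems.append(f"row {row_number} is blank")
        elif len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} column(s), expected {len(header)}"
            )
    return problems


print(check_csv("id,name\n1,Alpha\n\n2\n"))
# ['row 3 is blank', 'row 4 has 1 column(s), expected 2']
```

An empty list means none of these three problems was found. Encoding, line breaks and value consistency would still need to be checked separately.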
Avoid publishing data in PDF or other formats designed mainly for text, as this substantially limits its reuse value.
If you are making data available as a CSV, you will be releasing it as a data snapshot and not as a real time data feed. A data snapshot:
- Is a point in time extract or copy of your data
- Can be created in a variety of formats that work with a wide range of software
- Is simply uploaded and hosted on a server
- Does not contain real time, or necessarily up to date data. The timeliness of your data for users depends on how frequently you are able to update the snapshots.
- Cannot be filtered before download and so the full extract needs to be downloaded before any relevant data can be extracted or used.
Last updated: 19 June 2019