When sharing data, it is important to consider how you will make it available. The two main options are to:
- Share data via an API
- Share data in another machine-readable format.
This guide gives advice on preparing for either option.
Sharing data via an API
The most powerful way to make your data available for sharing is via an API.
An API (application programming interface) is a set of software instructions and standards that allows machine-to-machine communication. APIs are often called web services or web APIs. They allow data to be shared automatically between applications.
APIs allow automatic querying of data at a granular level. Publishing APIs allows developers to build tools like apps and websites that automatically query your data and use it as part of their app, tool or service for consumers. Transport for NSW makes its timetable and real-time running data available as APIs. Developers access these real-time API data feeds and have used this data to create many transport apps and services that NSW commuters use every day.
Through an API:
- one application can use the functions of another application (e.g. process a payment)
- one application can use the data in another application – for discovery (e.g. search a register), visualisation (e.g. show electoral boundaries on a map) or query (e.g. check if a licence is current)
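As a minimal sketch of the query case above (checking whether a licence is current), the Python below shows how one application might call another's API. The endpoint `https://api.example.gov.au/licences`, the `status` field in the response and the function names are all assumptions made for illustration; they do not describe any real service.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.gov.au/licences"


def build_request(licence_number: str) -> Request:
    """Build a GET request for a single licence record."""
    url = f"{BASE_URL}/{quote(licence_number, safe='')}"
    return Request(url, headers={"Accept": "application/json"})


def is_current(response_body: str) -> bool:
    """Interpret an assumed JSON response such as {"status": "current"}."""
    record = json.loads(response_body)
    return record.get("status") == "current"


def check_licence(licence_number: str) -> bool:
    """Call the API and report whether the licence is current."""
    with urlopen(build_request(licence_number), timeout=10) as response:
        return is_current(response.read().decode("utf-8"))
```

The calling application never needs a copy of the register itself; it asks one question and receives one answer.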
APIs provide up-to-date data
APIs automatically query the information from your designated business application.
This saves the data owner from having to routinely and manually update data, and ensures the data user always has the most current available information. This is very useful for dynamic environments, or environments where data currency and accuracy are needed for good customer service.
APIs provide users with flexibility
People accessing your data via an API have greater flexibility in how they use it. This creates opportunities for innovation and new services.
APIs create opportunities for your data to add value
APIs amplify the potential of your data, and the potential of its benefits and returns, by making it more easily available to consumers. This allows for new tools and services to be developed that your agency may not have been able to deliver.
APIs are increasingly supported by other tools
APIs work well with supported software. For example, geospatial software and mapping tools integrate with APIs and remove the need to extract or host the data independently.
APIs enable users to only access the data they need
APIs support querying and filters, so users can query and access only the data they need, rather than download and maintain the entire dataset.
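To illustrate, a filtered request is often expressed as query parameters on the URL. The sketch below builds such a URL in Python. The base URL and the parameter names (`suburb`, `fields`, `limit`) are hypothetical; a real API defines its own filters in its documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.gov.au/v1/stops"


def build_query_url(filters: dict, fields: list, limit: int = 100) -> str:
    """Return a URL that asks for only the matching rows and named fields."""
    params = dict(filters)               # e.g. {"suburb": "Parramatta"}
    params["fields"] = ",".join(fields)  # columns the caller wants back
    params["limit"] = limit              # cap on the number of records
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url({"suburb": "Parramatta"}, ["id", "name"], limit=10)
print(url)
```

Only the records matching the filter travel over the network, which is the contrast with downloading a full extract.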
APIs can combine data sources
APIs can bring together data from several databases or applications into a single view for users to access.
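A rough sketch of that idea: the function below joins records from two sources on a shared key and returns a single view. The field names (`licence_id`, `holder`, `suburb`) and the two in-memory lists stand in for separate databases or applications and are assumptions for illustration only.

```python
def combine_records(licences: list, addresses: list) -> list:
    """Join two record sets on 'licence_id' into one combined view."""
    # Index the second source by the shared key for quick lookup.
    address_by_id = {row["licence_id"]: row for row in addresses}
    combined = []
    for licence in licences:
        address = address_by_id.get(licence["licence_id"], {})
        # Keep every licence, adding the suburb when the other source has it.
        combined.append({**licence, "suburb": address.get("suburb")})
    return combined


licences = [{"licence_id": 1, "holder": "A. Citizen"}, {"licence_id": 2, "holder": "B. Resident"}]
addresses = [{"licence_id": 1, "suburb": "Newtown"}]
print(combine_records(licences, addresses))
```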
APIs require investment
APIs involve technical complexity. They require good design to establish and manage the web requests needed to retrieve, translate and save data as a single dataset. They also require ongoing maintenance to ensure data remains fit for purpose and response times are adequate.
APIs should be built in standard formats and be documented
An API is usually a standardised service based on a common protocol (rules for how the service works) and formats (schema for using the service).
APIs are typically built using either SOAP (Simple Object Access Protocol) or REST (Representational State Transfer). REST is preferred by many because it’s based on the familiar HTTP web protocol.
All APIs should have freely accessible documentation that explains to developers how they can make use of your API.
Different API models have different security needs
APIs can be:
- Public: available for anyone to use to build their own applications, including commercial services
- Private: available within an agency or to authorised government users, to improve access to key data and make its use more efficient
- Hybrid: available both internally and externally, with some data made available to the public, and other data available only for internal use and to business partners.
When building an API, it is important to understand which of these models you are building and to apply the appropriate security, legal, and technical rules.
Agencies should consider appropriate strategies to mitigate risks, such as using separate servers or networks for data exposed through APIs.
APIs may create uptime and availability expectations with your users
When making data available via APIs, make sure you inform your users whenever there are uptime issues.
Agencies using your data may have service dependencies on your API, and so will want to be informed if there are any service interruptions to your data.
You also may wish to apply certain throttle limits to your popular APIs, so that you can control access levels and ensure usage levels do not overload your servers.
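One common way to implement a throttle limit is a token bucket: each caller has a bucket that refills at a steady rate, and a request is served only if a token is available. The sketch below is one possible technique, not a recommendation of a specific product; the class name and the rate and capacity values are illustrative assumptions.

```python
import time


class TokenBucket:
    """Allow short bursts up to 'capacity', then 'rate_per_second' on average."""

    def __init__(self, rate_per_second: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so the logic can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=5, capacity=10)
if not bucket.allow():
    print("Too many requests, try again shortly")
```

In a web API, a throttled request is typically answered with HTTP status 429 (Too Many Requests) so that callers know to slow down.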
Sharing other forms of machine readable data
If you don’t want to make your data available via an API, make it available in a machine-processable format like CSV.
CSV stands for comma separated values. It is a simple format for tabular data. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
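For illustration, here is a small CSV file and a sketch of how a program might read it with Python's standard `csv` module. The column names and values are made up for the example.

```python
import csv
import io

# Made-up sample: a header line followed by one record per line.
SAMPLE = "station,line,platforms\nStation A,T1,4\nStation B,T2,2\n"


def read_records(text: str) -> list:
    """Parse CSV text into a list of dictionaries keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


for record in read_records(SAMPLE):
    print(record["station"], record["platforms"])
```

Every value is read as text; the consuming program decides which fields to treat as numbers or dates.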
To ensure your CSV is machine readable you should:
- Have valid encoding, with no odd characters
- Have consistent line breaks throughout the file
- Have declared machine readable headings
- Ensure all rows have the same number of columns
- Ensure there are no blank rows
- Ensure there is no whitespace between commas and double quotes
- Use consistent values
- Ensure all columns have names
You can use the tool http://csvlint.io/about to check that your CSV is machine readable, and whether it contains the columns and types of values it should.
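A few of the checks listed above can also be sketched in a short script. The function below is a simplified illustration, not a substitute for a full validator: it covers only mixed line breaks, unnamed columns, blank rows and rows with the wrong number of columns. The function name and the message wording are assumptions.

```python
import csv
import io


def check_csv(text: str) -> list:
    """Return a list of human-readable problems found in CSV text."""
    problems = []

    # Consistent line breaks: flag files that mix CRLF and bare LF.
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf
    if crlf and lf:
        problems.append("file mixes CRLF and LF line breaks")

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header has an unnamed column")

    # Record 1 is the header, so data records are numbered from 2.
    for number, row in enumerate(rows[1:], start=2):
        if not row:
            problems.append(f"record {number} is blank")
        elif len(row) != len(header):
            problems.append(
                f"record {number} has {len(row)} columns, expected {len(header)}"
            )
    return problems


print(check_csv("name,postcode\nLibrary,2000\nPool\n"))
```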
Avoid publishing data in PDF or other formats designed mainly for text, as this substantially limits reuse value.
If you are making data available as a CSV, you will be releasing it as a data snapshot and not as a real-time data feed. A data snapshot:
- Is a point in time extract or copy of your data
- Can be created in a variety of formats that work with a wide range of software
- Is simply uploaded and hosted on a server
- Does not contain real-time, or necessarily up-to-date, data. The timeliness of your data for users depends on how frequently you can update the snapshots.
- Cannot be filtered before download, so the full extract needs to be downloaded before any relevant data can be extracted or used.