The data quality statement aims to help you understand how a particular dataset could be used and whether it can be compared with other, similar datasets. It provides a description of the characteristics of the data to help you decide whether the data will be fit for your specific purpose.
About the data quality rating:
The reporting questionnaire asks five questions for each of these data quality dimensions:
- Institutional Environment
- Accuracy
- Coherence
- Interpretability
- Accessibility
For each question, “yes” = 1 point and “no” = 0 points.
The number of points determines the Quality Level for each dimension (high, medium, low).
Only dimensions with four or five points receive a star.
Points | Quality Level | Star / No Star |
0 | LOW | No Star |
1 | LOW | No Star |
2 | LOW | No Star |
3 | MEDIUM | No Star |
4 | MEDIUM | Star |
5 | HIGH | Star |
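The scoring rule in the table can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not code from the NSW Government Standard; the function name `rate_dimension` and the representation of answers as a list of five booleans are assumptions made for the example.

```python
# Illustrative sketch of the rating table above.
# rate_dimension and the boolean answer list are hypothetical names/choices,
# not part of the NSW Government Standard for Data Quality Reporting.

DIMENSIONS = [
    "Institutional Environment",
    "Accuracy",
    "Coherence",
    "Interpretability",
    "Accessibility",
]


def rate_dimension(answers):
    """Return (points, quality_level, has_star) for one dimension.

    answers: five booleans, True for "yes" (1 point), False for "no" (0 points).
    """
    if len(answers) != 5:
        raise ValueError("each dimension has exactly five questions")

    points = sum(1 for answer in answers if answer)

    # Quality Level thresholds taken from the table: 0-2 LOW, 3-4 MEDIUM, 5 HIGH.
    if points == 5:
        level = "HIGH"
    elif points >= 3:
        level = "MEDIUM"
    else:
        level = "LOW"

    # Only dimensions with four or five points receive a star.
    has_star = points >= 4
    return points, level, has_star


if __name__ == "__main__":
    # Made-up answers for demonstration only.
    example = {name: [True, True, True, True, False] for name in DIMENSIONS}
    for name, answers in example.items():
        points, level, has_star = rate_dimension(answers)
        print(f"{name}: {points} points, {level}, {'Star' if has_star else 'No Star'}")
```

Note that the Quality Level and the star are derived separately from the same point total, which is why a dimension with four points is rated MEDIUM yet still receives a star.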
More information?
The data quality reporting questionnaire and further explanation of the data quality dimensions are provided in the NSW Government Standard for Data Quality Reporting, published at https://data.nsw.gov.au/data-policy
Quality relates to the data’s “fitness for purpose”. Users can make different assessments about the quality of the same data, depending on their “purpose” or the way they plan to use the data.
The following questions may help you evaluate data quality for your requirements. This list is not exhaustive. Generate your own questions to assess data quality according to your specific needs and environment.
- What was the primary purpose or aim for collecting the data?
- How well does the coverage (and exclusions) match your needs?
- How useful are these data for small geographic areas?
- Does this data source provide all the relevant items or variables of interest?
- Does the population represented by the data match your needs?
- To what extent does the method of data collection seem appropriate for the information being gathered?
- Have standard classifications (e.g. industry or occupation classifications) been used in the collection of the data? If not, why? Does this affect the ability to compare or bring together data from different sources?
- Have rates and percentages been calculated consistently throughout the data?
- Is there a time difference between your reference period and the reference period of the data?
- What is the gap of time between the reference period (when the data were collected) and the release date of the data?
- Will there be subsequent surveys or data collection exercises for this topic?
- Are there likely to be updates or revisions to the data after official release?