The Personal Information Factor (PIF) tool has been used by the Data Analytics Centre (DAC) to assess re-identification risk so that data on COVID-19 cases and tests can be safely published as open data.
The DAC has played an important role in supporting the NSW Government’s COVID-19 pandemic response in collaboration with the other clusters. The Ministry of Health provided the DAC with de-identified data on COVID-19 cases and tests by postcode, age group and likely source of infection. This data was released as open data on Data.NSW, which provided transparency and helped inform the community about the impact of COVID-19 as the situation evolved.
Although the data provided by NSW Health was de-identified, there remained a small risk that an individual could be re-identified in the data. The DAC assessed the risk of re-identification using the Personal Information Factor (PIF) tool, which was developed as a collaborative effort with State governments, Commonwealth agencies, research organisations and the private sector, led by the NSW Chief Data Scientist. Further information is available in the ACS Privacy-preserving data sharing frameworks report.
The PIF tool assesses the risk of identifying an individual in a dataset and provides a measure of the information that could be gained about that individual by accessing the dataset. Based on this measure, decisions can be made about aggregating or suppressing information to reduce the risk of re-identification while maintaining the utility of the dataset. For example, the data on COVID-19 cases was split into four tables and data for certain cases was suppressed before being released as open data.
Used in conjunction with an assessment of the sensitivity of the data, the PIF tool can assist in determining the appropriate governance model to apply to the data. The higher the sensitivity of the data (and the higher the PIF score), the higher the level of trust required, and high-trust environments demand stronger data governance. However, high-sensitivity data that has been transformed to reduce its PIF score may be suitable for release into a low-trust environment, for example as open data.
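As a rough illustration of what suppressing information to reduce re-identification risk can look like in practice, the Python sketch below drops records whose combination of attributes is shared by too few people. This is not the PIF algorithm, which is not described here; it is a simple small-cell threshold rule of the kind used in k-anonymity. The function name, the field names, the sample records and the threshold of 5 are all assumptions made for the example.

```python
from collections import Counter


def suppress_small_cells(records, quasi_identifiers, k=5):
    """Drop records whose quasi-identifier combination occurs fewer than k times.

    Hypothetical helper for illustration only; it does not compute a PIF score.
    Returns the retained records and the number of records suppressed.
    """
    def cell(record):
        # The "cell" is the combination of attribute values that could
        # single out a person when taken together.
        return tuple(record[name] for name in quasi_identifiers)

    counts = Counter(cell(r) for r in records)
    kept = [r for r in records if counts[cell(r)] >= k]
    return kept, len(records) - len(kept)


# Synthetic example data (not real case data): six records share one
# combination of attributes, and one record is unique.
records = (
    [{"postcode": "2000", "age_group": "20-29", "source": "local"}] * 6
    + [{"postcode": "2850", "age_group": "70-79", "source": "overseas"}]
)

kept, suppressed = suppress_small_cells(
    records, ["postcode", "age_group", "source"], k=5
)
print(len(kept), suppressed)
```

In this sketch the unique record is suppressed because it could single out one person, while the six records that share the same attributes are retained. Splitting a dataset into several tables, as was done for the COVID-19 cases data, reduces risk in a related way: each table exposes fewer attributes together, so fewer combinations are rare.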
Last updated 07 Jun 2021