WRITTEN ON December 5th, 2011 BY William Heath AND STORED IN Uncategorized
Here’s a note I sent Francis Maude’s office after a Cabinet Office “Industry User Panel” meeting at 70 Whitehall on 28 July.
Open data is good
The government’s “Open data” agenda, from the “Power of Information” through data.gov.uk to Linked data is an acknowledged success politically and economically. The government is right to seek to build on this success. The PM’s letter to Ministers is almost entirely spot on. Cabinet Office’s determination to make rapid strides towards further announcements in the autumn is welcome.
More open data: easy wins
Open, public data means essentially data about things: stats, finance, geography, physical assets eg infrastructure of all sorts. There’s a great deal more that could helpfully be released: timetables for transport and much else, locations of assets, restaurant health ratings, keying data/indices, org structures eg of local authorities/NHS trusts/education, weather data, more detailed mapping data (under open licence), Companies House data, contracts/procurement/tendering data (with reliable feeds), Ofcom data re radio frequencies. These are easy “more of the same” open-data wins.
A dangerous path: the false promise of “anonymised” data
One path under consideration (outlined at a 21 July meeting held at 70 Whitehall under the Chatham House rule) is to release anonymised individual-level health, welfare, education or census records. The suggestion is that existing wealthy “big data” companies could apply data mining and deliver added-value services with consequent economic benefit. This is presented as attractive, and the government is in a mood to remove obstacles.
It’s entirely to be expected that today’s big data companies welcome this (though they would prefer if the data were not anonymised or pseudonymised so they could be matched to their existing highly granular records). There’s a case study from the previous administration which exploits health records, and cites performance improvement as well as commercial success as a result.
This path is highly problematic for reasons the convenor of the 21 July meeting chose not to explore. Given the richness of data and the processing power now available, “anonymised” records have proven easy to de-anonymise. This problem is not “philosophical” or “merely a theory”: it has been demonstrated both in academic studies and in practice. It means that individual-level data, even if anonymised, must be treated as personal data: morally, politically, legally and practically.
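To make the de-anonymisation risk concrete, here is a minimal sketch of the simplest kind of attack, a linkage on quasi-identifiers. It is not the method used in the academic studies, which handle far sparser and noisier data. Every record, field name and function here is invented for illustration. The point is only that removing names does not help when the remaining fields (here postcode, birth year and sex) are unique to one person and also appear in some other dataset that does carry names.

```python
from collections import defaultdict

# Hypothetical "anonymised" release: names removed, quasi-identifiers kept.
anonymised_records = [
    {"postcode": "AB1 2CD", "birth_year": 1961, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "XY9 8ZW", "birth_year": 1961, "sex": "M", "diagnosis": "hypertension"},
    {"postcode": "XY9 8ZW", "birth_year": 1961, "sex": "M", "diagnosis": "eczema"},
]

# Hypothetical auxiliary data an attacker might already hold (all values invented).
auxiliary_records = [
    {"name": "Person A", "postcode": "AB1 2CD", "birth_year": 1961, "sex": "M"},
    {"name": "Person B", "postcode": "AB1 2CD", "birth_year": 1975, "sex": "F"},
    {"name": "Person C", "postcode": "XY9 8ZW", "birth_year": 1961, "sex": "M"},
]

# Assumed set of fields shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_by_key(records):
    """Group records by their quasi-identifier tuple."""
    groups = defaultdict(list)
    for record in records:
        groups[quasi_key(record)].append(record)
    return groups


def link(anonymised, auxiliary):
    """Re-identify people whose quasi-identifiers are unique in both datasets."""
    anon_groups = group_by_key(anonymised)
    aux_groups = group_by_key(auxiliary)
    matches = {}
    for key, people in aux_groups.items():
        candidates = anon_groups.get(key, [])
        # Only an unambiguous one-to-one match counts as a re-identification.
        if len(people) == 1 and len(candidates) == 1:
            matches[people[0]["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(anonymised_records, auxiliary_records)
print(reidentified)
```

In this toy data two of the three named people are matched to a diagnosis. The third escapes only because another record happens to share the same postcode, birth year and sex; the richer the released record, the rarer that kind of protection becomes.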
This path therefore holds high legal risk. It would undoubtedly bring political opposition as the hitherto largely Coalition-friendly opponents of the “database state” find new cause in the for-profit exploitation of an asset that is often highly personal but has been demanded from individuals as a precondition of providing public services.
But it’s also not the most effective way to unleash the economic power of the data.
Much more promising path: unleashing the power of personal information
The far more promising next step is to unleash the economic power of personal data in responsible collaboration with individuals. This is entirely in keeping with:
• the Conservative manifesto promise to restore control over personal data to the individual
• the emerging Cabinet Office ID assurance programme which replaced the benighted national ID scheme
• the BIS/Cabinet Office Mydata policy which sees structured data returned to individuals
• policies on empowerment, personalisation, participation and self-service in health, education and jobs
This depends on the individual being equipped with a personal data service to allow them to manage, verify and share their personal data online under their control. Such services are rapidly becoming available, from dozens of startups around the world. Mydex is one, and the UK is to date the only country to have shown such a service working live.
When individuals control their own shopping, health, finance and general administrative data with a personal data store, they can make it available in a manner that is permissioned, structured, scalable and discoverable. Data of this sort is called “volunteered personal information”: personal, permissioned, and verified where necessary. Small examples today are the online search term, monetised to good effect by Google, and shared personal social data, monetised by Facebook. When individuals have a proper platform and control over their personal data, they can realise the fuller value of their correct name, address and contact details (which saves huge administrative costs when shared correctly), their real needs and the questions to which they seek answers, their future buying intentions, and all the feedback, criticism and advice they can offer.
This means immense savings (personal data holdings at DVLA or health services can be cleaned up, removing huge inefficiencies; census data could be submitted virtually free, as often as ONS needed it). But it also opens up the sort of ambitious economic growth agenda the Government seeks.
The value of these flows of volunteered personal information is estimated at £20bn/year in the UK by 2020.
So what should Cabinet Office do short term?
The answer is to do what it’s doing, but explicitly join the dots between various initiatives:
• ID assurance
• midata (formerly Mydata)
• Restoration of control over personal data to the individual
• Personalised, participative public services with more self-service
Make clear that government understands and respects the distinction between “open data” (about money, assets, infrastructure, stats, geography) and personal information, including anonymised or pseudonymised information. Consider a new “power of personal information” agenda which unleashes the power and value of volunteered personal information under the explicit control of the individual. This is the ethical and legal way to do it, and politically and economically the most attractive.
What not to do
Do not heed the call to market “anonymised” individual-level records data as if this were open data. It isn’t. Any attempt to do so will compromise the good work and reputation of the authentic open data initiative. It will bring serious legal and political consequences. And it misses the bigger economic opportunity of volunteered personal information.
4 August 2011
2. See eg “Robust De-anonymization of Large Sparse Datasets”, Arvind Narayanan and Vitaly Shmatikov, The University of Texas at Austin; “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization”, Paul Ohm, University of Colorado Law School
3. Database State – Joseph Rowntree Reform Trust
4. This pilot is described in Government and IT – “A recipe for rip-offs”: Time for a new approach, Public Administration Select Committee, July 2011
5. Source: Ctrl-Shift research, “The rise of volunteered personal information”, 2008. Already available within Cabinet Office