Search Mailing List Archives



[liberationtech] Opinion on a paper?

Alec Muffett alec.muffett at gmail.com
Sun Sep 9 14:50:23 PDT 2012


On 9 September 2012 17:44, Paul Bernal (LAW) <Paul.Bernal at uea.ac.uk> wrote:
> I've just come across this paper: "The 'Re-Identification' of Governor
> William Weld's Medical Information: A Critical Re-Examination of Health Data
> Identification Risks and Privacy Protections, Then and Now"
...
> I wondered if anyone had an opinion on it

I find it a) misconceived, b) naive, and c) illiberal.

The aspect that I find misconceived is the part dealing with the
proposition that (to paraphrase) there were more people in Cambridge
Massachusetts than were on the voter register, therefore THE PERSON
IDENTIFIED MIGHT NOT HAVE BEEN THE GOVERNOR.

ie: there could be a shred of doubt, or something, BECAUSE WE DID NOT
START WITH A PERFECT DATABASE OF EVERYONE.

My response: "whoopee, if that's your argument then you have missed the point."

This is a security problem I've had to spend years explaining to
students in a different context: I used to run a university computer
system and occasionally would run into a smartass student who would
tell me: "I don't need a good password because I only use this system
for printing".

My response would be that there was also a collective responsibility
and that I wasn't trying to protect him, I was trying to protect
everyone, and the mainframe as a whole, and also that I would ban him
unless he fixed his password.

Hence b) naive: to restate the flaw in the context of the paper, tell
me which of the following states is the most desirable:

1) we have an anonymised database and *all records* can be
uncloaked/de-anonymised with certainty

2) we have an anonymised database and *all records* can be
uncloaked/de-anonymised with reasonable likelihood

3) we have an anonymised database and *some records* can be
uncloaked/de-anonymised with certainty

4) we have an anonymised database and *some records* can be
uncloaked/de-anonymised with reasonable likelihood

5) we have an anonymised database and *given persons* can be
found/de-anonymised with certainty

6) we have an anonymised database and *given persons* can be
found/de-anonymised with reasonable likelihood

7) we have an anonymised database and no-one can be
de-anonymised/identified at all

I believe that - except for option 7 - none of these constitute an
"anonymised" database.
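To make those states concrete, here is a toy sketch of the usual linkage test. It assumes a "de-identified" table that still carries ZIP code, birth date and sex, plus a named register (a voter list, say) with the same fields. All records, names, and the helpers `anonymity_set_sizes` and `describe` are hypothetical. For each record it counts the register entries sharing those quasi-identifiers: a count of 1 is a unique match, a small count is "reasonable likelihood", and 0 only means that this particular register misses the person.

```python
from collections import Counter

# Hypothetical quasi-identifiers present in both tables.
QUASI = ("zip", "dob", "sex")

# Hypothetical "de-identified" medical records: no names, but the
# quasi-identifiers are still there.
medical = [
    {"zip": "02138", "dob": "1950-03-15", "sex": "M", "diagnosis": "X"},
    {"zip": "02139", "dob": "1960-01-02", "sex": "F", "diagnosis": "Y"},
    {"zip": "02139", "dob": "1971-05-05", "sex": "M", "diagnosis": "Z"},
]

# Hypothetical named register (e.g. a voter list); it need not be complete.
register = [
    {"name": "Person A", "zip": "02138", "dob": "1950-03-15", "sex": "M"},
    {"name": "Person B", "zip": "02139", "dob": "1960-01-02", "sex": "F"},
    {"name": "Person C", "zip": "02139", "dob": "1960-01-02", "sex": "F"},
]


def qkey(row):
    """Tuple of quasi-identifier values for one row."""
    return tuple(row[f] for f in QUASI)


def anonymity_set_sizes(medical, register):
    """For each medical record, count register entries sharing its key."""
    counts = Counter(qkey(r) for r in register)
    return [counts[qkey(m)] for m in medical]


def describe(k):
    """Map a match count onto the rough categories in the list above."""
    if k == 1:
        return "unique match in register"
    if k > 1:
        return f"one of {k} candidates"
    return "not in this register"


for rec, k in zip(medical, anonymity_set_sizes(medical, register)):
    print(rec["diagnosis"], describe(k))
```

Only the last record escapes, and only because this register happens not to contain that person.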

So in a sense: who cares if it was the Governor who was identified, if
you're fretting about the possibility that the Governor might not have
been registered to vote (upon which the paper hinges, viz: the
author's "Myth Of The Perfect Population Register") - then you should
equally well fret that Sweeney might just have rolled more
databases into the mix until enough data was present to nail the
Governor's medical records.
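The "roll more databases into the mix" point can be sketched the same way. Assume, hypothetically, that the target is absent from the voter list but present in a second named source with the same fields; merging the sources recovers the match. The data and the `matches` helper are illustrative only.

```python
QUASI = ("zip", "dob", "sex")


def matches(record, sources):
    """Sorted names, across all sources, whose quasi-identifiers equal the record's."""
    key = tuple(record[f] for f in QUASI)
    return sorted(
        {row["name"] for src in sources for row in src
         if tuple(row[f] for f in QUASI) == key}
    )


# Hypothetical target: quasi-identifiers from a "de-identified" medical record.
medical_record = {"zip": "02138", "dob": "1950-03-15", "sex": "M"}

# Hypothetical voter list that happens not to contain the target...
voter_list = [{"name": "Person B", "zip": "02139", "dob": "1960-01-02", "sex": "F"}]
# ...and a hypothetical second named source that does.
second_list = [{"name": "Person A", "zip": "02138", "dob": "1950-03-15", "sex": "M"}]

print(matches(medical_record, [voter_list]))               # []
print(matches(medical_record, [voter_list, second_list]))  # ['Person A']
```

An incomplete first register is a gap for the attacker to fill, not a defence.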

Restating the naivety one last time:

i) you don't have to start with a census of every human being alive in
order to prove in a court of law that you have identified the
Governor's medical records by correlating databases; you just have to
start with *enough* data to make the attack worthwhile

ii) there is a class of attacks apparently completely ignored by the
paper, largely because it focuses with tunnel vision upon the "snipe
the Governor's data" problem (state 5, above) - the other problems are
based around states 1 thru 4, ie: start with a disease and find some
people who have it, rather than start with a person and discover their
diseases.

iii) the paper as a whole seems to think that introduction of
reasonable doubt brings down de-anonymisation like some house of
cards, whereas (eg) insurance companies are more than happy to reject
people's applications on the basis of demographics, let alone specific
information about them, so absolute certainty is not a requirement for
many businesses based upon having better information than their
competition.
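Points ii) and iii) can be sketched together: start from a diagnosis rather than a person, and attach a naive likelihood to each candidate. This is a toy illustration with hypothetical data and a hypothetical `disease_first` helper. The 1/k figure treats the register as complete for each quasi-identifier combination, which is exactly the assumption the paper questions, so real figures would be lower; but as argued above, a business does not need certainty to act on them.

```python
from collections import defaultdict

QUASI = ("zip", "dob", "sex")

# Hypothetical "de-identified" records and a hypothetical named register.
medical = [
    {"zip": "02138", "dob": "1950-03-15", "sex": "M", "diagnosis": "X"},
    {"zip": "02139", "dob": "1960-01-02", "sex": "F", "diagnosis": "X"},
    {"zip": "02139", "dob": "1971-05-05", "sex": "M", "diagnosis": "Y"},
]
register = [
    {"name": "Person A", "zip": "02138", "dob": "1950-03-15", "sex": "M"},
    {"name": "Person B", "zip": "02139", "dob": "1960-01-02", "sex": "F"},
    {"name": "Person C", "zip": "02139", "dob": "1960-01-02", "sex": "F"},
]


def disease_first(medical, register, diagnosis):
    """Map each register name linked to a record with `diagnosis` to a naive
    1/k likelihood, where k is how many register entries share that record's
    quasi-identifiers (assumes the register is complete for that key)."""
    index = defaultdict(list)
    for row in register:
        index[tuple(row[f] for f in QUASI)].append(row["name"])
    likelihood = {}
    for rec in medical:
        if rec["diagnosis"] != diagnosis:
            continue
        names = index.get(tuple(rec[f] for f in QUASI), [])
        for name in names:
            likelihood[name] = 1 / len(names)
    return likelihood


print(disease_first(medical, register, "X"))
# {'Person A': 1.0, 'Person B': 0.5, 'Person C': 0.5}
```

Even the ambiguous 0.5 entries single out a handful of people from a whole population, which is plenty for an insurer sorting applications.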

Finally, on the illiberality, a quote:

---- begin ----
Several recommended best practices for the use of de-identified data
which should be considered by regulators as possible mandatory
de-identified data use conditions include:

1) Prohibiting of the re-identification, or attempted
re-identification, of individuals and their relatives, family or
household members.

HHS should establish civil and criminal penalties for any unauthorized
re-identification of de-identified data (and for limited data sets). A
carefully designed prohibition on re-identification attempts could
still allow Institutional Review Board (IRB) approved
re-identification research to be conducted, but would ban any
re-identification attempts conducted without essential human subjects
research protections.
---- end ----

Much as I don't like Big Brother, I dislike even more the bigger brothers
that try to impose forms of thoughtcrime, and this is one such.

Let me know if you want to collaborate on a response; I'll blog this
later this week after a cleanup.

    - alec

-- 
http://dropsafe.crypticide.com/aboutalecm
