[liberationtech] [sunlightlabs] need advice on using hashes for preserving PII's utility for disambiguation while protecting sensitive info

Michael Rogers michael at briarproject.org
Fri Feb 7 04:46:18 PST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 06/02/14 20:56, Margie Roswell wrote:
> For all I know, the lack of implementations using this kind of 
> one-way transformation isn't about government sluggishness but 
> rather about its feasibility. I'd be very curious to hear folks' 
> ideas on this score, though.  My general hunch is that something 
> must be possible -- even a few bits' worth of disambiguating 
> information would be hugely useful to us, and presumably you're
> not leaking important amounts of information by, say, sharing the
> last digit of a DLN. So there must be a spectrum of options. But as
> is probably apparent, I don't think I've got a handle on how to
> think about this problem rigorously.

Even if you had a perfect method of anonymising the individual
records, they might be reidentifiable by examining the whole dataset:

http://33bits.org/2010/06/21/myths-and-fallacies-of-personally-identifiable-information/
http://randomwalker.info/social-networks/
http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

At the level of individual records, you could use modular
exponentiation to anonymise the data. All the organisations agree on a
public prime modulus p, and each organisation that's going to publish
anonymised data picks its own random secret value. Each piece of data
d is first encoded as a number between 2 and p-2 (for example by
hashing it). Organisation X with secret value x anonymises d by
publishing d_x = d^x mod p, and organisation Y with secret value y
anonymises the same data by publishing d_y = d^y mod p.
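A rough sketch of the publishing step, to make it concrete. Everything specific here is an assumption for illustration, not part of the scheme as described: the shared modulus is the Mersenne prime 2**127 - 1 (picked only because it is a well-known prime, not because it is a good cryptographic choice), records are encoded by hashing with SHA-256, and the record string is made up.

```python
import hashlib
import secrets

# Hypothetical shared modulus: the Mersenne prime 2**127 - 1. It is used here
# only because it is a well-known prime; it is NOT a vetted parameter choice.
P = 2**127 - 1


def encode(data: str) -> int:
    """Hash a record to an integer in [2, P-2] (assumed encoding step)."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return 2 + int.from_bytes(digest, "big") % (P - 3)


def new_secret() -> int:
    """Pick an organisation's random secret exponent in [2, P-2]."""
    return 2 + secrets.randbelow(P - 3)


def anonymise(data: str, secret: int) -> int:
    """Publish d^secret mod P instead of the raw record."""
    return pow(encode(data), secret, P)


x = new_secret()  # organisation X's secret
d_x = anonymise("Jane Doe, DLN 12345678", x)  # hypothetical record
print(d_x)  # a large integer; the same record always maps to the same value under x
```

Python's three-argument pow() does the modular exponentiation efficiently, so even a 127-bit exponent is cheap.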

If X and Y want to know which records they have in common, X takes the
data published by Y and calculates d_x' = d_y^x mod p = d^(yx) mod p,
and Y takes the data published by X and calculates d_y' = d_x^y mod p
= d^(xy) mod p. Since yx = xy, every record they have in common gives
d_x' = d_y', so comparing the results reveals the overlap, but neither
can de-anonymise records published by the other that they don't have
in common.
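A sketch of the two-party comparison under the same assumptions as before (the prime 2**127 - 1, SHA-256 encoding, made-up record lists). One extra assumption that the description leaves open: Y sends its d_y' values back to X in the same order X published them, so X can tell which of its own records matched.

```python
import hashlib
import secrets

P = 2**127 - 1  # hypothetical shared prime, not a vetted parameter choice


def encode(data: str) -> int:
    """Hash a record to an integer in [2, P-2] (assumed encoding step)."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return 2 + int.from_bytes(digest, "big") % (P - 3)


def new_secret() -> int:
    return 2 + secrets.randbelow(P - 3)


x, y = new_secret(), new_secret()       # X's and Y's secret values
records_x = ["alice", "bob", "carol"]  # hypothetical example records
records_y = ["dave", "carol", "bob"]

d_x = [pow(encode(r), x, P) for r in records_x]  # published by X
d_y = [pow(encode(r), y, P) for r in records_y]  # published by Y

# X raises Y's published values to x: d_x' = d_y^x = d^(yx) mod P
d_x_prime = {pow(v, x, P) for v in d_y}
# Y raises X's published values to y: d_y' = d_x^y = d^(xy) mod P.
# Assumption: Y returns these to X in the order X published them.
d_y_prime = [pow(v, y, P) for v in d_x]

# A record is shared when its two doubly-exponentiated values match.
common = [r for r, v in zip(records_x, d_y_prime) if v in d_x_prime]
print(common)  # ['bob', 'carol']
```

"alice" and "dave" never match anything, and without x or y neither side can invert the other's values to recover them.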

This can be extended to more than two organisations: pass the records
round in a circle, and when they get back to you they've been
exponentiated by all the secret values (order doesn't matter). Now you
can see which records you have in common with all the other organisations.
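The circular version might look roughly like this, again assuming the prime 2**127 - 1, SHA-256 encoding, and three hypothetical organisations X, Y and Z with made-up records. Each organisation's list starts with its own secret and then visits every other organisation once; because exponents commute, every shared record ends up as the same value d^(xyz) mod p whatever route it took.

```python
import hashlib
import secrets

P = 2**127 - 1  # hypothetical shared prime, not a vetted parameter choice


def encode(data: str) -> int:
    """Hash a record to an integer in [2, P-2] (assumed encoding step)."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return 2 + int.from_bytes(digest, "big") % (P - 3)


def new_secret() -> int:
    return 2 + secrets.randbelow(P - 3)


# Hypothetical organisations, in the order they sit around the circle.
records = {
    "X": ["alice", "bob", "carol"],
    "Y": ["bob", "carol", "dave"],
    "Z": ["carol", "erin", "bob"],
}
ring = list(records)
secret = {name: new_secret() for name in ring}


def round_the_circle(start: str) -> list:
    """Exponentiate start's records by every secret, beginning with start's
    own and passing them to each next organisation until they return."""
    values = [encode(r) for r in records[start]]
    i = ring.index(start)
    for step in range(len(ring)):
        holder = ring[(i + step) % len(ring)]
        values = [pow(v, secret[holder], P) for v in values]
    return values


# Every organisation's records end up as d^(x*y*z) mod P, whatever the order.
blinded = {name: round_the_circle(name) for name in ring}

# X checks which of its own records appear in every other organisation's set.
others = [set(blinded[n]) for n in ring if n != "X"]
in_all = [r for r, v in zip(records["X"], blinded["X"])
          if all(v in s for s in others)]
print(in_all)  # ['bob', 'carol']
```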

(Maybe. IANAC.)

Cheers,
Michael

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAEBCAAGBQJS9NWaAAoJEBEET9GfxSfMyWcH/1Au9/066O/3AaPkkid8nBhq
2uuNjjLgDWzE+5aTIQGMzk9yy85TRKlXKdC4c9/n0UXxJjAUYxkLSoNkAD33ej36
s/oi3pI0C9OQ1MffJVCSImA+NwQ0QqDG6DOUBNPRoBUTr/nd5efbBRwWVtLSn50D
0QlLJYXUGGB+fSMZKyy368rrx5Ue8ICQOzIUyNJ3sWZsQEJo0nE8WJd1+89GlR45
XPRSUUma/5DCECl9gWBFq5pVuEtf29KoXV6QLCzagWCaAa2dNlCspoGp4bVlkBz9
UWMJRFHYDj9AxzUKt5Vi++uh6nYrTu++a7bXqOGJHb9y8VL54JHweEXNW2xWyog=
=BrUY
-----END PGP SIGNATURE-----
