[liberationtech] [sunlightlabs] need advice on using hashes for preserving PII's utility for disambiguation while protecting sensitive info

Margie Roswell mroswell at
Thu Feb 6 12:56:04 PST 2014

 PII = personally identifiable information

(Anyone who can address the question probably already knows that... but I
was curious, and figured I'd spare others the look-up.)

-- (Please send events; This site is hungry.)

On Thu, Feb 6, 2014 at 3:49 PM, Tom Lee <tlee at> wrote:

> We've been kicking around an idea at Sunlight that aims to use
> cryptographic ideas to resolve some of the concerns around the publication
> of personally identifiable information in government disclosures. I could use
> some smart people to tell me what's dumb about it.
> We often face challenges related to disambiguating entities: is the John
> Smith who gave political donation A the same John Smith that gave political
> donation B? One obvious solution to this problem is to push to expand the
> information that's collected and disclosed -- if we had John's driver's
> license number (DLN), for instance, it'd be easy to disambiguate these
> records. But that could introduce privacy concerns for John. One approach
> to this problem (which I don't think government has tried) is employing a
> one-way hash.
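Here is a minimal Python sketch of the unsalted one-way hash idea. The DLN values, the record layout, and the normalization step (trimming whitespace and upper-casing) are hypothetical choices for illustration. Neither Sunlight nor any agency has specified them.

```python
import hashlib

def hash_dln(dln: str) -> str:
    """Return an unsalted SHA-256 hex digest of a driver's license number.

    Illustrative only: the normalization (strip + uppercase) is an assumption,
    and the DLN strings used below are made up.
    """
    normalized = dln.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two hypothetical donation records for "John Smith" plus one for a different John.
donation_a = {"name": "John Smith", "id_hash": hash_dln("S1234567")}
donation_b = {"name": "John Smith", "id_hash": hash_dln("s1234567 ")}
donation_c = {"name": "John Smith", "id_hash": hash_dln("S7654321")}

print(donation_a["id_hash"] == donation_b["id_hash"])  # True: same underlying DLN
print(donation_a["id_hash"] == donation_c["id_hash"])  # False: different DLN
```

The published digest links records without printing the DLN itself. The next paragraph explains why that protection is weaker than it looks.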
> Obviously the input key space for DLNs and most other personal ID numbers
> is so small that reversing this with a dictionary attack would be trivial.
> You can add a salt, but only on a per-entity basis (not a per-record basis)
> if you want to preserve the capacity to disambiguate. That in turn calls
> for a lookup table in which the input keys are stored, which kind of
> defeats the point of using a hash (you might as well just assign random
> output IDs for each input ID). I would worry about government's ability to
> keep this lookup table secure, and I worry about the brittleness of such a
> system.
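The dictionary attack on an unsalted hash takes only a few lines. This sketch assumes, hypothetically, that IDs are fixed-length strings of decimal digits. Real DLN formats differ by state, but the argument holds whenever the input space is small enough to enumerate, as the message says it is.

```python
import hashlib
from typing import Optional

def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def dictionary_attack(target_digest: str, digits: int = 6) -> Optional[str]:
    """Recover an ID from its unsalted SHA-256 digest by trying every candidate.

    Hypothetical assumption: IDs are zero-padded numeric strings of length
    `digits`. Even a few million candidates take seconds on a laptop.
    """
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None  # the ID was not in the enumerated space

published = sha256_hex("048213")       # what a disclosure might publish
print(dictionary_attack(published))    # 048213
```

A per-entity salt stops this loop only if the salt stays secret. Keeping the salts secret means storing them per person, which is the lookup table the message worries about.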
> Alternately, you can use a single system-wide secret (or set of secrets)
> to transform inputs into reliable outputs. I think this is less brittle and
> maybe easier to preserve as a secret, but this system might be too easily
> reversible given the ability to observe its outputs and know the universe
> of possible inputs. I'm unsure of the cryptographic options that might be
> appropriate here.
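A keyed hash such as HMAC-SHA256, available in Python's `hmac` module, is one standard construction for a single system-wide secret. The message does not name it. The key handling shown is a placeholder assumption, and `AGENCY_KEY` and `pseudonym` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical system-wide secret held by the disclosing agency. A real
# deployment would keep it in a secrets manager or HSM, never alongside the data.
AGENCY_KEY = secrets.token_bytes(32)

def pseudonym(dln: str, key: bytes = AGENCY_KEY) -> str:
    """Deterministic keyed pseudonym: HMAC-SHA256(key, normalized DLN)."""
    normalized = dln.strip().upper()  # normalization is an assumption
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Same input and same key give the same output, so records still link up.
assert pseudonym("S1234567") == pseudonym(" s1234567")

# A different key yields an unrelated token for the same DLN.
other_key = secrets.token_bytes(32)
assert pseudonym("S1234567") != pseudonym("S1234567", key=other_key)
```

Assume HMAC-SHA256 behaves as a pseudorandom function. Then someone who observes outputs and knows every possible input still cannot invert the mapping without the key. The remaining risks are operational. A leaked key makes the enumeration attack trivial again. Anyone who can get the system to tag inputs of their choice can also build the dictionary directly.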
> For all I know, the lack of implementations using this kind of one-way
> transformation isn't about government sluggishness but rather about its
> feasibility. I'd be very curious to hear folks' ideas on this score, though.
> My general hunch is that something must be possible -- even a few bits'
> worth of disambiguating information would be hugely useful to us, and
> presumably you're not leaking important amounts of information by, say,
> sharing the last digit of a DLN. So there must be a spectrum of options.
> But as is probably apparent, I don't think I've got a handle on how to
> think about this problem rigorously.
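The "few bits" idea can be quantified. Publishing only k bits of a keyed hash splits people into 2^k buckets. Two different people then share a tag with probability 2^-k, and each tag value covers roughly population / 2^k people. The sketch assumes an ideal, uniformly distributed hash and a hypothetical population of one million. `truncated_tag` and `tradeoff` are illustrative names.

```python
import hashlib
import hmac
import math

def truncated_tag(dln: str, key: bytes, bits: int) -> int:
    """Hypothetical scheme: publish only the top `bits` bits of HMAC-SHA256(key, dln)."""
    digest = hmac.new(key, dln.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def tradeoff(bits: int, population: int) -> tuple[float, float]:
    """Return (chance two different people share a tag, expected people per tag),
    assuming tags are uniformly distributed."""
    buckets = 2 ** bits
    return 1 / buckets, population / buckets

# A final decimal digit carries at most log2(10) bits (less if it is a check digit).
print(round(math.log2(10), 2))        # 3.32
print(tradeoff(8, 1_000_000))         # (0.00390625, 3906.25)
```

Moving k up or down traces the spectrum the message describes. Larger k separates more John Smiths but shrinks the crowd each person hides in.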
> Tom

More information about the liberationtech mailing list