Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

What if we used Scrypt with one-minute work factor for SSNs?

I've only lived about 24 years on this planet, and already I feel like my SSN has probably been compromised somewhere, thanks to the leaky channels I've had to transmit it over, database hacks (published and unpublished) of big organizations (government or not), a couple of mistakes on my part, and maybe some other reasons. It'd be nice if this weren't so: just about every story I hear from people who do get hit by an attacker is unpleasant. I don't want it to happen to me.

Would Scrypt be a panacea? No. But it would surely be better than what we have now. Today there are many places where an attacker can easily find the link between a person's name and their SSN. We could cut that down to a single point of failure by letting the unhashed association exist within one government agency and nowhere else. (Make it illegal to store SSNs unhashed.)

If my SSN is required, the party requiring it can give me a publicly known salt (for instance, their company name) and request that I send them my full legal name and the result of scrypt(SSN, genSalt(work_factor) + salt + interactionNumber), where work_factor is chosen so that generating the hash takes, say, one minute on the latest AMD GPU. On an average person's CPU, this could take quite a bit longer. interactionNumber is just an integer counting how many times the company has had to request the SSN. Say they're a loan company: you want a loan, so you give them an SSN hash (and probably a hash with a credit agency's salt+interactionNumber too, so the loan company can look up your credit score). If you want a new loan, or an extended loan, you give a new SSN hash with a new interactionNumber.
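As a rough sketch of the hash described above, here is how it might look with Python's hashlib.scrypt. Several things here are assumptions, not part of the proposal: the function name ssn_hash, the "|" separator, the tiny cost parameters (chosen so the demo runs instantly, nowhere near one minute), and the made-up placeholder SSN. It also assumes the work factor is a fixed public parameter instead of a random genSalt value, so that another party can recompute the same hash.

```python
import hashlib

# Assumption: scrypt's n, r, p stand in for the post's work_factor.
# These values are tiny for demonstration; a real deployment would raise
# them until one hash takes about a minute on the target hardware.
DEMO_PARAMS = {"n": 2**10, "r": 8, "p": 1}


def ssn_hash(ssn: str, company_salt: str, interaction_number: int) -> str:
    """Hash an SSN bound to one company and one interaction (hypothetical helper)."""
    # The separator keeps ("Acme", 12) and ("Acme1", 2) from producing the same salt.
    salt = f"{company_salt}|{interaction_number}".encode("utf-8")
    digest = hashlib.scrypt(ssn.encode("utf-8"), salt=salt, dklen=32, **DEMO_PARAMS)
    return digest.hex()


# Made-up placeholder SSN, two interactions with the same company.
first = ssn_hash("000-12-3456", "Example Loan Co", 1)
second = ssn_hash("000-12-3456", "Example Loan Co", 2)
```

The two results differ even though the SSN and company are the same, which is what makes a leaked hash useless for the next interaction.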

The company verifies your hash by sending the same request to the government agency, which computes the same hash (or several if there's a name collision, prioritizing the people who have used the service most often in the past). If the company finds a match, they know the customer is who they say they are (or has compromised an SSN, which is a risk with or without this scheme). Now suppose they decide to store the hash, and later their database gets leaked, or a printout gets thrown into the dump without being shredded, or the customer had to write down the hash and mail it and someone intercepted the mail or saw its unshredded, thrown-away copy. That's okay. In the current system, one look and you know the person's SSN; under this scheme you just know their hash for this company, for this one interaction. You could pose as a company and get the government agency to reveal that yes, the hash matches that person for that company ID and interactionNumber, but you couldn't -do- anything with that hash. If you want to do anything with those companies as that customer, you need to know the real SSN so you can compute a new hash with the incremented interactionNumber they'd give you.
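The agency-side check might look roughly like the following. This is only a sketch: AGENCY_RECORDS, agency_verify, the names, and the placeholder SSNs are all hypothetical, and the scrypt cost is again set far too low so the example runs quickly.

```python
import hashlib
import hmac

DEMO_PARAMS = {"n": 2**10, "r": 8, "p": 1}  # tiny demo cost, not one minute


def ssn_hash(ssn: str, company_salt: str, interaction_number: int) -> bytes:
    salt = f"{company_salt}|{interaction_number}".encode("utf-8")
    return hashlib.scrypt(ssn.encode("utf-8"), salt=salt, dklen=32, **DEMO_PARAMS)


# Hypothetical agency records: legal name -> SSNs of everyone with that name.
AGENCY_RECORDS = {
    "Jane Q. Public": ["000-12-3456", "000-65-4321"],
}


def agency_verify(name: str, claimed_hash: bytes, company_salt: str,
                  interaction_number: int) -> bool:
    """Return True if the hash matches anyone registered under this name."""
    matched = False
    for ssn in AGENCY_RECORDS.get(name, []):
        candidate = ssn_hash(ssn, company_salt, interaction_number)
        # compare_digest avoids leaking how many leading bytes matched.
        if hmac.compare_digest(candidate, claimed_hash):
            matched = True
    return matched


# The customer computes a hash for interaction 3 with this company.
customer_hash = ssn_hash("000-65-4321", "Example Loan Co", 3)
accepted = agency_verify("Jane Q. Public", customer_hash, "Example Loan Co", 3)
replayed = agency_verify("Jane Q. Public", customer_hash, "Example Loan Co", 4)
```

Here accepted is True, while replayed is False: the old hash does not verify once the company has moved on to the next interactionNumber.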

You could try brute-forcing the SSN with the hash. For this one hash, every attempted SSN will take one minute on the best GPU. You can narrow the search space significantly due to the nature of SSN distribution, but for the moment let's just assume you can only randomly try 10-digit numerical guesses. There are 10,000,000,000 (10 billion) possibilities. To try them all will require ten billion minutes of computational time, or roughly 19,000 years. And this is just for this one hash with the salt. If you had a database dump of lots of hashes, and you wanted to compromise them all, it'd take in the worst case about 19,000 computational years per hash.

This isn't impossible. If you had a large botnet, targeting Litecoin miners and such, and also put up your own capital, it's not infeasible that you could set up a farm equivalent to, say, 40,000 of the best GPUs, and be able to crack any hash you wanted in at most 6 months. But who has the time and money for this, and is the payoff worth it? Identity fraud isn't that lucrative, is it? I mean, suppose I'm right that my SSN has already been compromised: why haven't I become a victim yet? (My credit isn't bad either! Just lacking a long history since I'm young and refuse to get a credit card.) The only other party with such computational power for this purpose would likely be the government agency itself, which has to handle so many requests.
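The back-of-the-envelope numbers above can be checked with a few lines of arithmetic. The function name worst_case_years and the 365.25-day year are my own choices. Note that the estimate follows the post's assumption of 10-digit guesses; real SSNs have nine digits, which would make every figure ten times smaller.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25


def worst_case_years(unknown_digits: int, minutes_per_guess: float = 1.0,
                     machines: int = 1) -> float:
    """Years needed to try every candidate, split evenly across machines."""
    guesses = 10 ** unknown_digits
    return guesses * minutes_per_guess / machines / MINUTES_PER_YEAR


# One best-case GPU against the post's 10-digit search space.
single_gpu_years = worst_case_years(10)

# A farm equivalent to 40,000 such GPUs, expressed in months.
farm_months = worst_case_years(10, machines=40_000) * 12

print(f"single GPU: about {single_gpu_years:,.0f} years")
print(f"40,000 GPUs: about {farm_months:.1f} months")
```

This lands at roughly 19,000 years for a single GPU and a little under 6 months for the farm, matching the figures in the text.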

A one-minute work factor seems a little short when you realize that the first 3 digits can often be divined from public knowledge associated with a person's name. This brings the worst-case time down to a mere 19 years, given you have the first three digits. I have a friend who had that many GPUs just as a hobbyist Litecoin miner. If you have the last four digits too, this scheme buys little.
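To make the effect of known digits concrete, here is a toy brute-force loop. Everything in it is illustrative: the 7-digit made-up identifier, the helper names, and the deliberately weak scrypt parameters that let the search finish in well under a second. The point is only that each known digit divides the number of guesses by ten.

```python
import hashlib

# Deliberately weak parameters so the demo finishes quickly.
FAST_PARAMS = {"n": 2**4, "r": 1, "p": 1}


def toy_hash(digits: str, salt: bytes) -> bytes:
    return hashlib.scrypt(digits.encode("ascii"), salt=salt, dklen=32, **FAST_PARAMS)


def crack_with_known_prefix(target: bytes, salt: bytes, known_prefix: str,
                            total_digits: int):
    """Try every completion of known_prefix; return (match, attempts made)."""
    unknown = total_digits - len(known_prefix)
    if unknown < 1:
        raise ValueError("at least one digit must be unknown")
    for guess in range(10 ** unknown):
        candidate = known_prefix + str(guess).zfill(unknown)
        if toy_hash(candidate, salt) == target:
            return candidate, guess + 1
    return None, 10 ** unknown


salt = b"Example Loan Co|1"
target = toy_hash("1234567", salt)  # made-up 7-digit identifier

# Knowing the first four digits leaves only 10**3 candidates to try.
found, attempts = crack_with_known_prefix(target, salt, "1234", total_digits=7)
```

With four of seven digits known, the search needs at most 1,000 hashes instead of 10,000,000. At one minute per hash that is the difference between under a day and about 19 years.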

In this scheme, keeping the whole SSN safe, and trying to randomize distribution of future SSNs (even increasing their length), is the most critical point. In fact it's the same critical point as now, except under this scheme you're not immediately hosed when a party you trusted your data with violates the trust.

Will anything like this be implemented any time soon? Not likely.

A strictly better solution would be to just issue every citizen a public/private 4096-bit RSA key pair. Then the only risk is if the citizen themselves compromises their own private key. But then they can just get a new one reissued.
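For a feel of how a key pair would replace the shared secret, here is a toy challenge-response using textbook RSA with tiny primes. It is not secure and not what a real system would use (that would be a 4096-bit key with proper padding from a vetted library). The primes, the challenge format, and the helper names are all assumptions for illustration.

```python
import hashlib

# Textbook RSA with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q   # public modulus (3233)
E = 17      # public exponent
D = 2753    # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_mod_n(message: bytes) -> int:
    """Reduce a SHA-256 digest into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Citizen side: sign the company's challenge with the private key."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Company side: check the signature using only the public key."""
    return pow(signature, E, N) == digest_mod_n(message)


# Hypothetical challenge issued by the company for this interaction.
challenge = b"Example Loan Co|interaction 1|nonce 8f3a"
signature = sign(challenge)
is_valid = verify(challenge, signature)
is_forged_valid = verify(challenge, (signature + 1) % N)
```

The company never sees anything secret: is_valid is True for the real signature, is_forged_valid is False for an altered one, and a leaked signature only covers that one challenge.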

Posted on 2014-07-05 by Jach

Tags: pithy, thought



Ty February 10, 2015 12:20:39 PM The problem with using SSNs for authentication is that they were neither designed nor intended for this. But this is how we use them. I think it is interesting that your conclusion is that we should use something better for authentication, because that is the only way for it to be secure.
Jach February 17, 2015 07:33:51 PM Yeah, the hashing idea was just a fun thought experiment. Today I learned Taiwan actually has a citizen digital certificate database, even if it has problems.

One aspect that gets overlooked in the "just use public key encryption" idea is what to do when the private key is compromised, or when a con artist claims some person's key is compromised and manages to get a revocation signature from the target. In that case, to issue a new key we have to fall back on more primitive forms of identification, like possession of something signed with the previous key (such as a driver's license), or in-person interviews where the government takes a blood sample and compares your DNA with the sample taken at birth and initial key assignment. (Having every citizen's blood on record itself leads to some interesting possibilities beyond the identity verification discussion.)