If you're like most sophisticated Internet aficionados, you probably have a very clear idea of what a good password is, and have had to follow the formula to create one over and over again. And, admit it: You probably have just a handful that you re-use across all the websites you visit.
You can recite "good" password rules by heart: eight or more characters comprising a mix of upper- and lower-case letters, numbers, and punctuation, and no words found in dictionaries, even with substitutions (such as @ for a in p@ssword or 1 for lowercase l in fai1).
[Image: A typical list of "safe password" requirements]
You repeat these rules to less-technical friends and family, and hope they take the same kind of care, although you're pretty sure they don't. As you type a new password (or one from your repertoire) into a website's form field and watch a little color bar go from red for a weak password to green for a strong one, you relax a little.
You probably also think that you're savvy enough to avoid being phished: mistaking a fake site for a real one and entering your credentials into it. You're attentive to site details, and on sites that offer it, you pick an image that is supposed to jog your memory on your return so you know it's the legitimate site.
"It's a gut feeling when a password has all of these things—uppercase and numbers—how could anyone guess this?" says Markus Jakobsson, an applied security researcher who has written extensively about passwords and studied real user behavior.
Take a deep breath, because most of what you've been told about safe passwords is incorrect. Observing the rules might result in you creating a password that resists typical cracking techniques. But you could also be devising one that could be cracked in not much more time than "password" or "123456". And even if your password is good, the possibility of entering it at a fake site that resembles the one you intended to use is real.
Let's start with the basics.
No sensible site stores a password as plain, unencrypted text. For any financial, legal, or medical site, or for sites that process credit cards, doing so is either illegal or against various regulations or terms of service. Instead, sites run the password through a one-way function known as a cryptographic "hash," which performs a series of operations on the password that transform it into an output that can't be reverse engineered.
Even a tiny change to the input produces a dramatically different and unpredictable output. Using the common (but outdated) SHA-1 hashing method, the password 1234 becomes 7110eda4d09e062aa5e4a390b0a572ac0d2c0220, while 1235 becomes ac1ab23d6288711be64a25bf13432baf1e60b2bd. Knowing the output doesn't help you figure out the original text.
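Python's standard hashlib module is enough to reproduce this kind of comparison. This is a minimal sketch; SHA-1 appears here only because the paragraph above uses it as its example, not as a recommendation.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a UTF-8 string as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Inputs that differ by one character yield unrelated-looking digests.
for candidate in ("1234", "1235"):
    print(candidate, sha1_hex(candidate))
```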
Whenever you log into a site, your password (presumably carried over an encrypted HTTPS session) is hashed using whatever algorithm the site employs, and the result is compared with the hash stored in the database.
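The login check can be sketched in a few lines. This assumes, purely for illustration, that the site stores unsalted SHA-1 digests; the `STORED_HASHES` table and `check_login` function are hypothetical, and real systems add a per-user salt and a deliberately slow hash.

```python
import hashlib
import hmac

# Hypothetical stored record: username -> hex SHA-1 digest of the password.
# Illustration only; production systems should salt and use a slow hash.
STORED_HASHES = {
    "alice": hashlib.sha1(b"Spooning1!").hexdigest(),
}

def check_login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored digest."""
    stored = STORED_HASHES.get(username)
    if stored is None:
        return False
    submitted = hashlib.sha1(password.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(submitted, stored)

print(check_login("alice", "Spooning1!"))
print(check_login("alice", "spooning1!"))
```

The site never needs the plain password it stored; matching hashes are treated as proof that the visitor typed the same text.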
A cracker who gets hold of a file of hashed passwords uses brute-force methods to work out which password belongs to which account. This starts with the most commonly used passwords, which are easy to find in lists compiled from previous large-scale attacks and cracks of large databases. It then moves on to all words found in dictionaries (English and others, depending on the site), and then to combinations of words.
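The guessing order described above can be sketched as a tiny dictionary attack. The leaked accounts, the common-password list, and the word list below are hypothetical stand-ins for the much larger lists real crackers use; the sketch also assumes unsalted SHA-1, which lets one hashed guess be checked against every account at once.

```python
import hashlib
from itertools import product

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical leaked file: account -> unsalted SHA-1 digest.
leaked = {
    "user1": sha1_hex("123456"),
    "user2": sha1_hex("sunshine"),
    "user3": sha1_hex("blueocean"),
}

# Tiny stand-ins for real lists of common passwords and dictionary words.
common_passwords = ["123456", "password", "qwerty"]
dictionary_words = ["sunshine", "blue", "ocean", "apple"]

def candidates():
    """Yield guesses in order: common passwords, single dictionary
    words, then two-word combinations."""
    yield from common_passwords
    yield from dictionary_words
    for first, second in product(dictionary_words, repeat=2):
        yield first + second

def crack(leaked_hashes):
    # Index accounts by digest so each guess costs one hash and one lookup.
    by_hash = {}
    for account, digest in leaked_hashes.items():
        by_hash.setdefault(digest, []).append(account)
    found = {}
    for guess in candidates():
        for account in by_hash.get(sha1_hex(guess), []):
            found[account] = guess
        if len(found) == len(leaked_hashes):
            break  # every account recovered; stop early
    return found

print(crack(leaked))
```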
Passwords that are shorter or drawn from a smaller character set, such as upper- and lower-case letters only, can be cracked exponentially faster than ones that are longer or that draw from the entire range of characters one can type. Until a few years ago, crackers built "rainbow tables" of precomputed hashes for popular algorithms to speed up cracks against the most frequently chosen passwords. All this would seem to indicate that a complicated and uncommon password like Spooning1! would be a great choice, and that the common wisdom about passwords is accurate. It isn't.
Brute-force methods can churn through billions to hundreds of billions of passwords per second with the SHA-1 algorithm, using modestly priced computers souped up with affordable arrays of graphics cards whose processors supply the raw computational power. (Newer, deliberately slow algorithms might allow only tens of thousands to hundreds of thousands of checks per second, but SHA-1 remains in wide use out of inertia.)
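The arithmetic behind "exponentially faster" is simple: the number of candidates is the character-set size raised to the password length. This sketch assumes a rate of 10 billion SHA-1 guesses per second, a hypothetical figure at the low end of the range above, and counts the 94 printable non-space ASCII characters as the full keyboard.

```python
import string

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def keyspace(charset_size: int, length: int) -> int:
    """Count every possible password of exactly this length."""
    return charset_size ** length

def human(seconds: float) -> str:
    """Express a duration in the largest convenient unit."""
    for unit, size in (("years", SECONDS_PER_YEAR), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds:.1f} seconds"

letters = len(string.ascii_letters)                                    # 52
full = len(string.ascii_letters + string.digits + string.punctuation)  # 94

RATE = 1e10  # assumed guesses per second; not a measured figure

for label, size, length in [
    ("letters only, 8 chars", letters, 8),
    ("full keyboard, 8 chars", full, 8),
    ("full keyboard, 12 chars", full, 12),
]:
    worst_case = keyspace(size, length) / RATE
    print(f"{label}: {keyspace(size, length):.2e} candidates, "
          f"worst case {human(worst_case)}")
```

At that assumed rate, the letters-only eight-character space falls in about an hour and a half, the full-keyboard eight-character space in about a week, and the random twelve-character space would take roughly 1.5 million years.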
That amount of computational power means that crackers can now try the likeliest candidates in enormous quantities. Randomly constructed passwords of 11 or 12 characters that draw from the entire character set remain highly resistant to all but the most determined cracking. If you use a password generator and storage program like 1Password or LastPass to create a unique, random password for each site, you're minimizing your risk enormously.
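The core of such a generator fits in a few lines of Python using the standard secrets module. This is a sketch of the general idea, not how 1Password or LastPass actually implement it; the site names are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols

def random_password(length: int = 12) -> str:
    """Draw each character independently and uniformly from ALPHABET
    using the operating system's cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# One unique password per site; a password manager would store these.
vault = {site: random_password() for site in ("bank.example", "mail.example")}
print(vault)
```

Using `secrets` rather than the `random` module matters here, because `random` is not designed to be unpredictable to an attacker.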
But most people don't. And that's where the trouble lies.
Markus Jakobsson works at the intersection of security and usability, and his concerns center on arbitrarily complex password requirements that leave users more exposed to cracking, even as a site claims that the password is strong and resistant.
In an interview, he notes that our minds aren't built to recollect arbitrary letters, numbers, and punctuation. Rather, we remember stories. When asked to create a password, instead of creating something random, users "take their favorite word or concept and then just massage it into the correct shape." He says, "We don't tell people strange character sequences, and we're not really wired to remember them."
In his research, he finds that someone might try to use the word "apples" because they like the fruit, but be told a capital letter is required. The user transforms this into "Apples" but then must add a number and punctuation. Taking the easiest course, they arrive at "Apples1!" This has eight characters and the requisite variety, and would pass muster at most signup pages. (Some might flag the use of a dictionary word or repeated letters, however.) Of these password-quality indicators, Dr. Jakobsson says, "They measure your likely inability to remember your password."
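A hypothetical signup-page meter of the kind described here, set against a handful of equally hypothetical "easiest course" mangling rules, shows the mismatch: the meter calls Apples1! strong, yet the same edits a user makes are among the first a cracker tries. Neither function reproduces any real site's checker or any real cracking tool's rule set.

```python
import string

def naive_meter(pw: str) -> str:
    """Hypothetical checker: length plus one of each character class."""
    checks = [
        len(pw) >= 8,
        any(c.isupper() for c in pw),
        any(c.islower() for c in pw),
        any(c.isdigit() for c in pw),
        any(c in string.punctuation for c in pw),
    ]
    return "strong" if all(checks) else "weak"

def mangle(word: str):
    """Apply the simple edits users make, in the order a cracker might."""
    for base in (word, word.capitalize()):
        for suffix in ("", "1", "!", "1!", "123", "123!"):
            yield base + suffix

print(naive_meter("Apples1!"))                 # strong
guesses = list(mangle("apples"))
print(guesses.index("Apples1!") + 1, "guesses")  # 10 guesses
```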
Crackers check dictionary words first, along with substitutions and common extensions like the ones above. Based on passwords uncovered in early rounds of computation, crackers can use Markov chains to predict which candidates are most likely, reducing the number of passwords they need to check. In mid-2013, Ars Technica had three cracking experts work on a leaked database of hashed passwords. They recovered most of the passwords, with success rates ranging from 60% in one hour to 90% in 20 hours. While the number of possible passwords is huge, winnowing candidates with Markov chains cuts it down dramatically.
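A character-level Markov chain can be sketched by counting which character tends to follow which in passwords already cracked, then scoring new candidates so the likeliest are tried first. The training list and candidates are hypothetical, the smoothing constant is an assumption, and real tools generally use such tables to generate candidates in probability order rather than rank a fixed list as this sketch does.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"
VOCAB = 96  # assumed: roughly the printable characters plus the end marker

def train(passwords):
    """Count character-to-character transitions in already-cracked passwords."""
    counts = defaultdict(Counter)
    for pw in passwords:
        chars = [START] + list(pw) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def log_prob(model, pw):
    """Log-probability of a candidate under the bigram chain, with add-one
    smoothing so unseen transitions are unlikely rather than impossible."""
    chars = [START] + list(pw) + [END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = model[prev]
        total += math.log((row[nxt] + 1) / (sum(row.values()) + VOCAB))
    return total

cracked = ["Apples1!", "Monkey1!", "Dragon12", "Summer1!", "Orange1!"]
model = train(cracked)
candidates = ["Banana1!", "xQ7#vL2p", "Purple1!"]
ranked = sorted(candidates, key=lambda pw: log_prob(model, pw), reverse=True)
print(ranked)
```

Candidates shaped like earlier finds (capital first, "1!" at the end) score higher, so the random string is pushed to the back of the queue.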
Dr. Jakobsson proposes a very different type of password he calls "fastwords," which tie together storytelling, password strength, and probability. Reminiscent of a well-known xkcd comic, his approach is to come up with a story and distill it to a few words: stepping on a squirrel while running, for example, becomes "running forest squirrel."
A phrase longer than 10 to 12 characters can't be cracked effectively by brute force, so crackers would need to try word combinations and other techniques. That makes the improbability of the three words appearing together, in that order, paramount.
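The probability of a word sequence is usually estimated with the chain rule, multiplying each word's probability given the words before it; working in base-2 logarithms gives the exponent directly. The probabilities below are made up for illustration and do not come from any corpus.

```python
import math
from typing import Sequence

def phrase_bits(conditional_probs: Sequence[float]) -> float:
    """Chain rule: P(w1 w2 w3) = P(w1) * P(w2 | w1) * P(w3 | w1, w2).
    Summing log2 terms yields the exponent and avoids floating-point underflow."""
    return sum(math.log2(p) for p in conditional_probs)

# Made-up values, for illustration only.
common_phrase = [1e-3, 1e-1, 1e-1]  # each word strongly predicts the next
story_phrase = [1e-4, 1e-4, 1e-5]   # unusual words in an unusual order

print(f"common phrase: about 2^{phrase_bits(common_phrase):.1f}")
print(f"story phrase:  about 2^{phrase_bits(story_phrase):.1f}")
```

Because the terms multiply, a single unexpected transition in the story lowers the whole phrase's probability far more than adding a character to a short password would.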
To determine whether a fastword is as secure as possible, it's important to check how likely it is to occur in a large corpus of text. "The fretful porpentine" appears in Shakespeare, and would therefore likely be found in a modest amount of time. In one paper, Jakobsson and a co-author consulted the petabytes of data in Microsoft Research's Web N-gram Services, which provides word-combination frequencies. The common phrase "I love you honey" occurs at a frequency of roughly 2 in 100,000,000 (2 to the -25.8th power), making for a very poor password. But a phrase like "frog work flat" (a story about accidentally squashing a frog on the way to work) is estimated at a rate of 2 to the -49.5th power, or roughly 1 in 800 trillion, a very viable defense against cracking.
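Converting the paper's base-2 exponents into everyday odds is a one-line calculation. This sketch simply takes the two exponents quoted above as given.

```python
def one_in(exponent: float) -> float:
    """Turn a probability of 2**exponent into 'one in N' odds."""
    return 2.0 ** -exponent

for phrase, exponent in [("I love you honey", -25.8), ("frog work flat", -49.5)]:
    print(f"{phrase!r}: 2^{exponent} is about 1 in {one_in(exponent):,.0f}")
```

Worked through, 2^-25.8 is about 1 in 58 million, and 2^-49.5 is about 1 in 800 trillion. Every extra bit halves the probability.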
Storytelling has an additional advantage: resistance to phishing. Many sites use a mnemonic in which you select an image from a variety of choices, or a word from a list, and are asked to remember it on your next visit. Almost no one does: We don't visit most sites often enough to form an arbitrary association.
In Jakobsson's formulation, an image of the first word in a password could be shown so long as the first word is sufficiently uncommon; if not, another word in the password sequence could be shown instead. That word would serve both as an acknowledgement that you were visiting the legitimate site and as a jog to the memory.
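The selection rule can be sketched as "show the first word that is rare enough, otherwise try the next one." The article doesn't say how "sufficiently uncommon" is measured, so the frequency table, the threshold, and the `cue_word` function here are all hypothetical; a real system might consult a large corpus such as the n-gram service mentioned above.

```python
from typing import Optional

# Hypothetical per-word frequencies (fraction of all words in some corpus).
WORD_FREQUENCY = {
    "running": 3e-5,
    "forest": 2e-5,
    "squirrel": 1e-6,
    "frog": 5e-6,
    "work": 4e-4,
    "flat": 6e-5,
}

def cue_word(fastword: str, max_frequency: float = 1e-5) -> Optional[str]:
    """Return the first word rare enough to show as an image cue,
    falling back to later words when earlier ones are too common."""
    for word in fastword.lower().split():
        # Words missing from the table are treated as rare.
        if WORD_FREQUENCY.get(word, 0.0) <= max_frequency:
            return word
    return None

print(cue_word("running forest squirrel"))  # squirrel
print(cue_word("frog work flat"))           # frog
```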
In the short term, he hopes for a departure from an outdated password regimen that is less likely to help most users than to harm them. In the long term, he believes that a second factor, such as the fingerprint scanners on phones like the iPhone 6 and Galaxy S5, will become a strict requirement rather than an option that is not always even available. With a second factor in place, bulk password cracking no longer opens accounts, even though individuals will remain vulnerable, to varying degrees, to someone with a determined reason to crack a single password.
"If they're going to spend 200 hours to break into your bank account and they find you have $500," it's not worth it, notes Jakobsson. Making the task so difficult that it's no longer worth the effort removes the economic incentive for theft, which has nearly the same effect as the impossible job of plugging every last technical chink in the armor.