Password entropy varies between different checks


I'm massively under-informed when it comes to in-depth security, but my understanding was that password entropy should be roughly similar between different algorithms.



I ran a couple of passwords through entropy checkers, and the algorithms gave me massively varying results.



Is this to be expected, and is there a reliable/standard check for password entropy?

passwords privacy

asked Sep 5 at 9:37 by Neekoy

  • 12




    To add to Connor's nice answer: the password entropy checks you find on the web are mostly garbage and don't tell you anything about the strength of your password, especially with regard to guessability.
    – Tom K.
    Sep 5 at 10:57






  • 1




    security.stackexchange.com/questions/185236/…
    – Luis Casillas
    Sep 6 at 18:41

7 Answers

















Accepted answer (12 votes), answered Sep 5 at 9:50 by Connor J, edited Sep 6 at 15:55 by Braiam

Password entropy is calculated as the size of the character set raised to the power of the password's length, i.e. an 8-character password of both upper- and lowercase letters gives (26*2)^8 possibilities (26 letters of the alphabet, times 2 for upper and lower case).

If you include the digits 0-9 as well, the base becomes 26*2+10 = 62, and if you include special characters too, the number of possibilities becomes very large.

So as an example, we'll take the password QweRTy123456 (a horrible password, I know). This password is 12 characters long, so the exponent is 12; it uses uppercase, lowercase and numbers, so we have 62^12, which gives a total of about 3.2262668e+21 possibilities.

Now if we take that same password but all in lowercase, i.e. qwerty123456, the value we have is 36^12, giving a potential of about 4.7383813e+18 - still an enormous number, but much smaller than with uppercase included.

Password strength relies on two major factors: length and complexity. As an example, compare a numeric PIN of 4 vs 8 digits. A 4-digit PIN (10^4) gives 10,000 possible combinations; an 8-digit PIN (10^8) gives 100,000,000. So by doubling the length, we have increased the number of potential candidates 10,000-fold.

The second factor is complexity: upper and lower case, special characters, etc. The example above shows how quickly the count grows as you add character sets.

Just a final note: a password is not as strong as its potential, because a 12-character password could be someone's place of birth, a pet's name, etc. The contents of a password are also extremely important. The four-random-words model tends to be quite popular and secure (mandatory xkcd).

ConorMancone's answer gives an excellent explanation and example of the contents of a password vs. its entropy, so I'd suggest giving that a read too for more info on this subject.

So in summary: take the size of the character set, raise it to the power of the password's length, and divide by 2 (the average number of guesses) for an estimate of password strength against brute-force attacks.
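A minimal Python sketch of that brute-force arithmetic (the rules for detecting which character sets are in use are simplifying assumptions of mine, not a standard):

import math
import string

def naive_entropy_bits(password: str) -> float:
    # Size of the assumed character pool, based on which classes appear.
    pool = 0
    if any(c.islower() for c in password):
        pool += 26
    if any(c.isupper() for c in password):
        pool += 26
    if any(c.isdigit() for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # 32 ASCII special characters
    # log2(pool ** length) = length * log2(pool)
    return len(password) * math.log2(pool) if pool else 0.0

print(naive_entropy_bits("QweRTy123456"))  # ~71.5 bits (62^12 possibilities)
print(naive_entropy_bits("qwerty123456"))  # ~62.0 bits (36^12 possibilities)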



Hopefully this answers your question; if you have any more, leave a comment and I'll update this answer to reflect them.
























  • 2




    Excellent, I understand it now. Basically it's just a measurement of the complexity of a password, but not of password strength, since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
    – Neekoy
    Sep 5 at 10:11






  • 2




    "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
    – AndrolGenhald
    Sep 5 at 13:37







  • 1




    @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
    – AndrolGenhald
    Sep 5 at 14:45






  • 2




    @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
    – Qwerty01
    Sep 5 at 14:58






  • 4




    How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
    – Kevin
    Sep 6 at 13:38

















Answer (43 votes), by Conor Mancone

Attempting to add to the other Connor's answer:



Something important to keep in mind is that entropy is, in essence, "the amount of randomness" in the password. Therefore, part of why different entropy checkers disagree is that entropy is a measure of how the password was generated, not what the password contains. An extreme example is usually the best way to show what I mean. Imagine that my password was frphevgl.fgnpxrkpunatr.pbzPbabeZnapbar.



An entropy checker will probably rate that as having a high amount of entropy because it contains no dictionary words and is long. It doesn't contain numbers, but using a simple calculation (like the one Connor outlined in his answer, and like most entropy calculators use), you might estimate 216 bits of entropy - far more than a typical password needs these days (38 characters with a mix of upper and lower case gives 52^38 ≈ 2^216).



However, looking at that, someone might suspect that my password isn't really random at all, and might realize that it is just the rot13 transformation of the site name + my name. Therefore the reality is that there is no entropy in my password at all, and anyone who knows how I generate my passwords knows my password for every site I log in to.
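For the curious, the rule is trivial to reproduce; a quick Python sketch of this deliberately terrible, deterministic scheme:

import codecs

def derived_password(site: str, name: str) -> str:
    # Deterministic rule: rot13(site + name). No randomness goes in, so no
    # entropy comes out: knowing the rule reveals the password for every site.
    return codecs.encode(site + name, "rot13")

print(derived_password("security.stackexchange.com", "ConorMancone"))
# prints: frphevgl.fgnpxrkpunatr.pbzPbabeZnapbar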



This is an extreme example but I hope it gets the point across. Entropy is determined not by what the password looks like but by how it is generated. If you use some rules to generate a password for each site then your passwords might not have any entropy at all, which means that anyone who knows your rules knows your passwords. If you use lots of randomness then you have a high entropy password and it is secure even if someone knows how you make your passwords.



Entropy calculators make assumptions about how passwords were generated, and therefore they can both disagree with each other and be wildly wrong. Also, XKCD is always applicable when it comes to these sorts of things.




























  • Great example, I updated my answer to point here, since this is an excellent description of the contents of a password, and not just the entropy of it!
    – Connor J
    Sep 5 at 11:34







  • 16




    "Entropy is determined not by what the password looks like but by how it is generated" This is the key. You should make that a bold statement, so that it stands out more. +1
    – code_dredd
    Sep 5 at 19:29






  • 4




    This reminds me of this Dilbert comic, too: dilbert.com/strip/2001-10-25
    – ilkkachu
    Sep 6 at 7:53






  • 2




    @ilkkachu my favorite Dilbert ever :)
    – Conor Mancone
    Sep 6 at 14:26










  • Could you argue that the entropy of your password decays to the probability that you are using the rot13 method you described? For a random user, that probability is quite low. The probability that they use 123456 or password is much higher. Or do you assume that an attacker can identify you personally and knows your password generation method from leaks (e.g. the leaks listed on haveibeenpwned.com)?
    – craq
    Sep 19 at 19:06

















Answer (11 votes)

Password entropy is calculated by knowing (or guessing) the algorithm used to generate the password, and counting the number of different branch points used in generating the password you chose.



Let me give some examples:



Password: password



This isn't 26^8 (or 2^38), because the algorithm wasn't "choose 8 random lowercase characters". The algorithm was: choose a single, very easy to remember word. How many such words are there? If you decide "there are 200 such words", then you're looking at about 8 bits of entropy (not 38).



Password: password6



Similar to the previous entry, this isn't 36^9 (or 2^47), because the algorithm is: choose a single, very easy to remember word, and then decorate it at the end with a single digit. The entropy here is around 11 bits (not 47).



Password: carpet#terraform2



By now you can guess what's going on. Two relatively uncommon words, with a punctuation character between them and a digit at the end. If you estimate that those words were chosen from a dictionary of 10,000 words (2^13), you're looking at something like 33 bits of entropy (13 for the first word + 4 for the punctuation + 13 for the second word + 3 for the final digit).



So, now, to answer your direct question: why do the various entropy checkers give different values?



Well, let's use that last password: carpet#terraform2.



One entropy evaluator might say, "Hey, I have no clue how you generated this. So it must just be random characters among lowercase, punctuation, and numbers. Call it 52^17, or 97 bits of entropy (2^97)."



Another, slightly smarter entropy evaluator might say, "Hey, I recognize that first word, but that second string of letters is just random. So the algorithm is a single uncommon word, a punctuation character, nine random letters, and then a digit. So 10000 x 16 x 26^9 x 10, or 63 bits of entropy."



A third and a fourth entropy evaluator might correctly figure out the algorithm used to generate it. But the third evaluator thinks both words come from a dictionary of 5,000 words, while the fourth thinks you have to break into a 30,000-word dictionary to find them. So one comes up with 32 bits of entropy while the other thinks there are 37 bits.



Hopefully it's starting to make sense. The reason different entropy evaluators come up with different numbers is that they each make different guesses about how the password was generated.
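To make the branch-point arithmetic concrete, here is a small sketch that sums the bits contributed by each choice in the word + punctuation + word + digit scheme (the dictionary and punctuation-set sizes are the guesses used above):

import math

def scheme_entropy_bits(dict_size: int, punct_choices: int = 16, digits: int = 10) -> float:
    # bits(first word) + bits(separator) + bits(second word) + bits(final digit)
    return (2 * math.log2(dict_size)
            + math.log2(punct_choices)
            + math.log2(digits))

print(scheme_entropy_bits(10000))  # ~33.9 bits (the 10,000-word dictionary guess)
print(scheme_entropy_bits(5000))   # ~31.9 bits (the third evaluator)
print(scheme_entropy_bits(30000))  # ~37.1 bits (the fourth evaluator)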

































    Answer (7 votes), by supercat
    There is no way to judge whether a sequence of numbers or characters happens to be random. A machine asked to produce a random seven-digit number would produce 8675309 about once every ten million requests. A Tommy Tutone fan might offer up that number every time. If a security application happens to need a seven-digit number (perhaps it needs to be accessible via a numeric pad), it might judge 8675309 as having good entropy, unless that application happens to have been written by a Tommy Tutone fan, in which case it might regard that number as being just as bad as 1111111 or 1234567.



    Basically, the only thing an entropy checker can do is check whether a passcode matches any patterns that are regarded as having low entropy, and observe the lowest-entropy pattern that it matches [every passcode will match some pattern--a passcode of e.g. 23 characters will match patterns like "any other passcode of 20-25 characters" if it doesn't match anything else]. If different people are asked to produce a list of all the low-entropy passcode patterns they can think of, they'll almost certainly think of different things. What really matters, though, is whether a particular passcode would happen to match a pattern that an attacker decides to try. While there are some sets of passcodes that attackers would be particularly likely to try (e.g. 1111111 and 1234567), the choices of passcode once one gets beyond that are likely to vary significantly by attacker.
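    As a rough sketch of that "lowest-entropy matching pattern" idea, with a purely illustrative pattern list and made-up search-space sizes (a real checker such as zxcvbn has far more patterns):

    import math
    import re

    # Illustrative low-entropy patterns mapped to guessed search-space sizes.
    PATTERNS = [
        (re.compile(r"^(.)\1+$"), 100),     # a repeated single character
        (re.compile(r"^\d+$"), 10**7),      # all digits (seven-digit-number territory)
        (re.compile(r"^[a-z]+$"), 200000),  # a single lowercase dictionary word
    ]

    def lowest_entropy_bits(passcode: str) -> float:
        # Every passcode matches at least the catch-all "any printable ASCII
        # string of this length" pattern; specific patterns lower the bound.
        spaces = [size for regex, size in PATTERNS if regex.match(passcode)]
        fallback = 95 ** len(passcode)
        return math.log2(min(spaces, default=fallback))

    print(lowest_entropy_bits("1111111"))  # repeated character -> ~6.6 bits
    print(lowest_entropy_bits("8675309"))  # all digits -> ~23.3 bits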


























    • "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
      – Cody P
      Sep 7 at 16:26











    • @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one attempted to reject 60% of lists that might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one can do is hope that certain non-random means of generating data are much more likely to be used than others.
      – supercat
      Sep 7 at 17:06










    • While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
      – Cody P
      Sep 7 at 21:07











    • @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
      – supercat
      Sep 7 at 22:50

















    Answer (6 votes)

    Broadly speaking, the security of a password depends on how hard it is for your attacker to guess it. This is what those websites try to measure, with different algorithms for guessing.



    A very stupid checker may think that the password "00000000000000000000" has high entropy because it's long and the website's algorithm for guessing is "try every combination of all characters", which would indeed take a long time to correctly guess this password. A smarter checker will try numeric passwords first, thus applying a penalty to this password (it'll be found faster). A still smarter guesser will try patterns first (like repeated characters or dictionary words) and rightly conclude this password is garbage.



    This may sound useless, but note that even a bad guesser will give you a useful upper bound: "your password can be broken with at most this many guesses". That is the value these checkers provide. By trying several guessers and taking the lowest number you have a more conservative estimate.
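    A minimal sketch of taking the most conservative estimate across several toy guessing models (the models and their numbers are illustrative assumptions):

    import math

    def brute_force_bits(pw: str) -> float:
        # Dumbest guesser: every combination of 95 printable ASCII characters.
        return len(pw) * math.log2(95)

    def digits_only_bits(pw: str) -> float:
        # Smarter guesser: tries all-numeric passwords first.
        return len(pw) * math.log2(10) if pw.isdigit() else float("inf")

    def repeated_char_bits(pw: str) -> float:
        # Smarter still: tries repeated-character patterns first
        # (one character choice times an assumed maximum length of 20).
        return math.log2(95 * 20) if len(set(pw)) == 1 else float("inf")

    def conservative_bits(pw: str) -> float:
        # Keep the lowest upper bound that any guesser produces.
        return min(brute_force_bits(pw), digits_only_bits(pw), repeated_char_bits(pw))

    print(conservative_bits("00000000000000000000"))  # ~10.9 bits, not ~131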

































      Answer (1 vote), by Future Security

      Other answers do not use the information-theory definition of entropy. The mathematical definition is a function based solely on a probability distribution: for a distribution p, entropy is the sum of p(x) * -log(p(x)) over each possible outcome x.



      Units of entropy are logarithmic. Typically two is used as the base of the logarithm, so we say that a stochastic system has n bits of entropy. (Other bases could be used. For base e you would instead measure entropy in "nats", for natural logarithm, but base two and "bits" are more common.)



      The term comes from information theory. Entropy is, of course, related to information. (Another reason why we say "bits".) But you can also describe entropy as a measure of "unpredictability".



      Flipping a fair coin once (in the ideal world) produces 1 bit of entropy. Flipping a coin twice produces two bits. And so on. Uniform (discrete) distributions are the simplest distributions to calculate the entropy of because every term summed is identical. You can simplify the equation for entropy of a discrete uniform variable X with n outcomes each with 1/n probability to



      H(X) = n * 1/n * -log(1/n) = log(n)



      You should be able to see how one can get the entropy out of a system that is one or more coin flips with just the knowledge of how many coin flips are to be recorded.



      The entropy of non-uniform (discrete) distributions is also easy to compute. It just requires more steps. I'll use the coin flip example to relate entropy to unpredictability again. What if you instead use a biased coin? Well then the entropy is



      H = p(heads) * -log(p(heads)) + p(tails) * -log(p(tails))



      If you plot that you get this:

      [Plot: entropy of a biased coin flip vs. p(heads), rising from 0 at p = 0 or 1 to a maximum of 1 bit at p = 0.5]



      See? The less fair (uniform) a coin flip (distribution) is the less entropy it has. Entropy is maximized for the type of coin flip which is most unpredictable: a fair coin flip. Biased coin flips still contain some entropy as long as heads and tails are both still possible. Entropy (unpredictability) decreases when the coin becomes more biased. Anything 100% certain has zero entropy.
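      Those coin-flip numbers are easy to reproduce from the definition above; a minimal Python sketch:

      import math

      def shannon_entropy(probs):
          # H = sum of p(x) * -log2(p(x)) over outcomes with p(x) > 0
          return sum(p * -math.log2(p) for p in probs if p > 0)

      print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: a fair coin flip
      print(shannon_entropy([0.9, 0.1]))  # ~0.47 bits: a biased coin
      print(shannon_entropy([1.0]))       # 0.0 bits: a certain outcome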



      It is important to know that it is the idea of flipping coins that has entropy, not the result of the coin flips themselves. You cannot infer from one sample x from distribution X what the probability p(x) of x is. All you know is that it is non-zero. And with just x you have no idea how many other possible outcomes there are or what their individual probabilities are. Therefore you're not able to compute entropy just by looking at one sample.



      When you see a string "HTHTHT" you don't know if it came from a sequence of six fair coin flips (6 bits of entropy), a biased coin flip sequence (< 6 bits), a randomly generated string from the uniform distribution of all 6-character uppercase-letter strings (6 * log_2(26), or about 28 bits), or if it's from a sequence that simply alternates between 'H' and 'T' (0 bits).



      For the same reason, you cannot calculate the entropy of just one password. Any tool that tells you the entropy of a password you enter is incorrect or misrepresents what it means by entropy. (And may be harvesting passwords.) Only systems with a probability distribution can have entropy, and you cannot calculate the true entropy of such a system without knowledge of the exact probabilities; there is no way around that.



      One may be able to estimate entropy, however, but it still requires some knowledge (or assumptions) about the distribution. If you assume a k-character password was generated from a uniform distribution over some alphabet A, then you can estimate the entropy to be k * log_2(|A|), where |A| is the size of the alphabet. A lot of password strength estimators use this (naive) method. If they see you use only lowercase, they assume |A| = 26. If they see a mix of upper and lower case, they assume |A| = 52. This is why a supposed password strength calculator might tell you that "Password1" is thousands of times more secure than "password". It makes assumptions about the statistical distribution of passwords that aren't necessarily justified.



      Some password strength checkers don't exhibit this behavior, but that doesn't mean their estimates are accurate. They're just programmed to look for more patterns. We can make more informed guesses of password strength based on observed or imagined human password behaviors, but we can never calculate an entropy value that isn't an estimate.



      And as I said earlier, it's wrong to say that passwords themselves have an entropy associated with them. When people say that a password has ___ entropy then, if they know what they're talking about, they are really using it as shorthand for "this password was generated by a process that has ___ entropy."



      Some people advocate for passwords to be computer generated instead of human generated. This is because humans are bad at coming up with high-entropy passwords. Human bias produces non-uniform password distributions. Even long passwords people choose are more predictable than expected. For computer-generated passwords we can know the exact entropy of a password space because a programmer created that password space. We're able to know machine-generated password entropy without estimation, assuming we use a secure RNG. Diceware is another password generating method that people advocate for, with the same properties. It doesn't require any computers and instead assumes you have a fair six-sided die.
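      As a sketch of why machine generation gives exactly known entropy (the alphabet and length here are arbitrary example choices):

      import math
      import secrets
      import string

      ALPHABET = string.ascii_letters + string.digits  # 62 symbols
      LENGTH = 16

      # Each character is drawn uniformly and independently by a CSPRNG, so
      # the entropy of the generating process is known exactly, by construction.
      password = "".join(secrets.choice(ALPHABET) for _ in range(LENGTH))
      entropy_bits = LENGTH * math.log2(len(ALPHABET))  # ~95.3 bits

      print(password, round(entropy_bits, 1))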



      If a password is generated with n bits of entropy, we can estimate the number of guesses a password cracker needs to make as 2^(n-1). However, this is a conservative estimate. Password strength and entropy are not synonymous. The entropy-based estimate assumes that the cracker knows your individual password generation process. Entropy-based password strength follows Kerckhoffs's principle: it measures password strength (in the sense that entropy measures unpredictability) under the mindset that security through obscurity is not security. If you consciously keep entropy in mind while generating passwords, then you can pick a password generating method with high entropy and get a lower bound on how secure your password is.



      As long as the password is memorable/usable, there is nothing wrong with using true entropy calculations as a conservative estimate of password strength; better to under-estimate (and use computer-generated or dice-generated passwords) than to over-estimate password strength (as password strength estimating algorithms do with human-generated passwords).



      The answer to your question is No. There is no such thing as a reliable way to check password entropy. It's actually mathematically impossible.




























      • +1, but I believe your H(X) should be log_2(n) not log_2(1/n). And I'm not sure why you say other answers don't use the information theory definition of entropy, some of them don't but most do.
        – AndrolGenhald
        Sep 6 at 19:39










      • @AndrolGenhald I fixed the H(X) error. I forgot the minus signs. Other answers go straight to how many bits of strength a password has, but they skip the formulas. Maybe I didn't read closely enough. Skipping the nuance of entropy being defined based on a probability distribution leads to misunderstanding. Someone could easily take away, instead, the idea that strength = length * alphabet_size, which is only true for uniform distributions.
        – Future Security
        Sep 6 at 19:53










      • Some suggestions for possible improvements: "And may be harvesting passwords." could be clearer if you linked examples of known good and known malicious entropy checkers. Also note that non-uniform (discrete) distributions are not preferred in password or especially cryptography because some outputs are more common than the entropy alone would suggest (in exchange for other outputs being less common). Finally, most password strength checkers provide an upper bound or "blind" entropy. Good checkers provide a smaller upper bound based on realistic scenarios. I don't view that as disingenuous.
        – Cody P
        Sep 7 at 16:39


















      Answer (1 vote), by Jeffrey Goldberg

      There are two crucial things that need to be understood, each of which I will elaborate on:



      1. Anything trying to judge the entropy of a password from the password alone is grasping at straws. This is because the strength is a feature of the system that generates the password and not the password itself.


      2. Entropy is not a coherent concept for password strength unless the password is the result of a process that generates passwords uniformly. Entropy is a piss poor metric when the distribution of possible passwords is not uniform.


      It's the creation scheme that matters



      The strength of a password (whether measured in entropy or not) is a function of the scheme that was used to create it. To quote myself from something I wrote in 2011:




      I can’t over-emphasize the point that we need to look at the system instead of at a single output of the system. Let me illustrate this with a ridiculous example. The passwords F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA each look like extremely strong passwords. Based on their lengths and the use of upper and lower case and digits, any password strength testing system would say that these are extremely strong passwords. But suppose that the system by which these were generated was the following: Flip a coin. If it comes up heads use F9GndpVkfB44VdvwfUgTxGH7A8t, and if it comes up tails use rE67AjbDCUotaju9H49sMFgYszA.



      That system produces only two outcomes. And even though the passwords look strong, passwords generated by that system are extremely weak.




      Any password strength meter is going to report those as really strong. To also quote myself describing how 1Password's strength meter works (Disclosure: I work for 1Password), I introduced my 2013 answer with:




      The usual way to determine strength of a specific password is to sacrifice a chicken and then read the entrails. But in 1Password, we do better. We compute the horoscope of the chicken before sacrificing it. Given that some of our staff are vegetarians, we are always looking for alternative approaches.



      The sad fact of the matter is that accurate password strength meters are impossible (short of spending years or decades trying to crack the offered password). The reason is that the strength of a password depends largely on the system by which it was generated. That is not something that can be determined with confidence by inspecting a single password.




      So it is not at all surprising that different strength meters produce different results. And all of them should be taken with a large grain of salt. As others have noted, the strength meters have to guess what sort of system you used to create the password from the single instance of the password itself. And that guesswork is hard.



      Suppose, for example, the password that you test has mixed case. Most human-created passwords will have the occasional capitalization at the beginning of a chunk. That is, humans are far more likely to create a password that looks like Password1234 than they are to create paSsword1234. Does your strength system account for that kind of difference? Or does it just spot that there is mixed case and make a computation assuming (incorrectly) that upper and lower case letters are equally likely throughout the password?



      Now strength meters have improved over the past few years, but even the best of them are deeply and fundamentally limited because they are trying to figure out what system you used to create the password from the password itself.



      Entropy is the wrong notion



      Entropy only makes sense if every possible password that the scheme can create is equally likely. In a 2013 PasswordsCon talk, I illustrated this with some extreme examples. (View the PDF of the slides in single page mode to get the overlays).



      Imagine a password creation scheme that 9 times out of 10 picks a fixed password, but the other 1 time out of ten picks uniformly from 2^512 possibilities. The Shannon entropy of that scheme is around 52 bits (which is very respectable for a password), but clearly it is a terrible password creation scheme. I argued, instead, that when there is a non-uniform distribution we should just use the likelihood of the most likely result (min-entropy) as a strength measure of the scheme.



      Although that is a contrived example to make a point, the fact of the matter is that the distribution of human-chosen passwords is a fat-headed distribution. It doesn't really matter how many different passwords could be generated if lots of people all end up picking from the most common ones.



      So I strongly recommend that if you use the word "entropy" you use "min-entropy", which is -log2(p(xm)), where xm is the most likely password and p(xm) is its probability of occurring. Note that when the probability distribution is uniform, min-entropy and Shannon entropy work out to be the same. But when the distribution is not uniform, they can be very different.
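      A small sketch contrasting the two measures on the contrived scheme above (the sum over the 2^512 rare outcomes collapses to a single term because they are all equally likely):

      import math

      # Contrived scheme: one fixed password with probability 0.9; otherwise
      # pick uniformly from 2**512 possibilities.
      p_fixed = 0.9
      p_rare = 0.1 / 2**512  # probability of each individual rare password

      shannon = p_fixed * -math.log2(p_fixed) + 0.1 * -math.log2(p_rare)
      min_entropy = -math.log2(p_fixed)  # -log2 of the most likely outcome

      print(round(shannon, 1))      # ~51.7 bits: looks respectable
      print(round(min_entropy, 3))  # ~0.152 bits: the scheme is terrible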


























      • While it's true that the entropy of a password depends on the creation scheme used to generate it, I don't think it follows that the strength does. Firstly, it requires the creation scheme to be known: F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA are only weak if the attacker knows that there is a scheme which selects those two strings. Secondly, different schemes may generate the same password: if I use a Geiger counter to select a sequence of 8 Unicode code points, it might select "password"; clearly, the strong randomness hasn't led to a strong password.
        – IMSoP
        Sep 7 at 8:54










      • In security you should assume that the attacker knows the system. And the people who seriously try to crack passwords know more about the schemes that people use than anyone else. So yes, you can get strong passwords out of low-entropy schemes by using schemes that are unknown (and don't produce the kinds of common results from common schemes). But think about what this means for strength meters. Once a scheme is known, the meter needs to magically adjust.
        – Jeffrey Goldberg
        Sep 10 at 22:03










      • The sentence I am challenging is "The strength of a password ... is a function of the scheme that was used to create it." Yes, attackers know the schemes generally used to generate passwords, but they don't in practice know the specific scheme used to create any particular password. The meter doesn't need to "magically adjust", it just needs to be based on the techniques attackers are likely to use. I would say the strength of a password is a function of how likely an attacker is to guess it, regardless of how it was actually created.
        – IMSoP
        Sep 11 at 9:09











      7 Answers
      7






      active

      oldest

      votes








      7 Answers
      7






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      12
      down vote



      accepted










      Password entropy is calculated by the number of possibilities it could be to the power of the length ie. 8 character password of both upper and lowercase letters = (26*2)8, (26 characters of the alphabet * 2 for upper and lowercase).



      If you include the numbers 0-9 as well, then that power becomes 26*2+10, and if you include special characters as well this number can become quite large for number of possibilities.



      So as an example, we'll take a password QweRTy123456, (a horrible password I know). This password is 12 characters long, so the power is the number 12, it uses both uppercase, lowercase and numbers, so we have 6212. Which gives a total number of possibilities of 3.2262668e+21.



      Now if we take that same password, but all in lowercase, ie. qwerty123456, the value we have is 3612, giving us a potential of 4.7383813e+18, still an enormous number but much smaller than using uppercase as well.



      Password strength relies on two major factors, length and complexity. As an example I'll show a numeric PIN of 4 vs 8 characters to show the difference. So for a 4 digit pin number, 104 we have 10,000 possible combinations, and if we use an 8 digit pin, ie. 108 we have 100,000,000 possibilities. So by doubling the length of the password, we have increased the potential candidates 10,000-fold.



      The second factor of passwords relies on complexity, ie. upper and lowercase, special characters, etc. I gave an example above to show how this increases quickly by using more character sets.



      Just a final note, a password is not as strong as its potential, because a 12 character password could be someones place of birth, a pets name, etc. The contents of a password are also extremely important. The 4 random words model tends to be quite popular and secure, mandatory xkcd.



      ConorMancone's answer gives an excellent explanation and example of the contents of a password vs the entropy, so i'd suggest giving that a read too for more info on this subject.



      So in summary, take the number of possibilities from the character set, to the power of the length of the password, and divide by 2 for a reliable method of getting password strength based on brute-forcing techniques.



      Hopefully this answers your question, if you have any more leave a comment and I'll update this to reflect your questions.






      share|improve this answer


















      • 2




        Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
        – Neekoy
        Sep 5 at 10:11






      • 2




        "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
        – AndrolGenhald
        Sep 5 at 13:37







      • 1




        @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
        – AndrolGenhald
        Sep 5 at 14:45






      • 2




        @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
        – Qwerty01
        Sep 5 at 14:58






      • 4




        How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
        – Kevin
        Sep 6 at 13:38














      up vote
      12
      down vote



      accepted










      Password entropy is calculated by the number of possibilities it could be to the power of the length ie. 8 character password of both upper and lowercase letters = (26*2)8, (26 characters of the alphabet * 2 for upper and lowercase).



      If you include the numbers 0-9 as well, then that power becomes 26*2+10, and if you include special characters as well this number can become quite large for number of possibilities.



      So as an example, we'll take a password QweRTy123456, (a horrible password I know). This password is 12 characters long, so the power is the number 12, it uses both uppercase, lowercase and numbers, so we have 6212. Which gives a total number of possibilities of 3.2262668e+21.



      Now if we take that same password, but all in lowercase, ie. qwerty123456, the value we have is 3612, giving us a potential of 4.7383813e+18, still an enormous number but much smaller than using uppercase as well.



      Password strength relies on two major factors, length and complexity. As an example I'll show a numeric PIN of 4 vs 8 characters to show the difference. So for a 4 digit pin number, 104 we have 10,000 possible combinations, and if we use an 8 digit pin, ie. 108 we have 100,000,000 possibilities. So by doubling the length of the password, we have increased the potential candidates 10,000-fold.



      The second factor of passwords relies on complexity, ie. upper and lowercase, special characters, etc. I gave an example above to show how this increases quickly by using more character sets.



      Just a final note, a password is not as strong as its potential, because a 12 character password could be someones place of birth, a pets name, etc. The contents of a password are also extremely important. The 4 random words model tends to be quite popular and secure, mandatory xkcd.



      ConorMancone's answer gives an excellent explanation and example of the contents of a password vs the entropy, so i'd suggest giving that a read too for more info on this subject.



      So in summary, take the number of possibilities from the character set, to the power of the length of the password, and divide by 2 for a reliable method of getting password strength based on brute-forcing techniques.



      Hopefully this answers your question, if you have any more leave a comment and I'll update this to reflect your questions.






      share|improve this answer


















      • 2




        Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
        – Neekoy
        Sep 5 at 10:11






      • 2




        "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
        – AndrolGenhald
        Sep 5 at 13:37







      • 1




        @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
        – AndrolGenhald
        Sep 5 at 14:45






      • 2




        @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
        – Qwerty01
        Sep 5 at 14:58






      • 4




        How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
        – Kevin
        Sep 6 at 13:38












      up vote
      12
      down vote



      accepted







      up vote
      12
      down vote



      accepted






      Password entropy is calculated by the number of possibilities it could be to the power of the length ie. 8 character password of both upper and lowercase letters = (26*2)8, (26 characters of the alphabet * 2 for upper and lowercase).



      If you include the numbers 0-9 as well, then that power becomes 26*2+10, and if you include special characters as well this number can become quite large for number of possibilities.



      So as an example, we'll take a password QweRTy123456, (a horrible password I know). This password is 12 characters long, so the power is the number 12, it uses both uppercase, lowercase and numbers, so we have 6212. Which gives a total number of possibilities of 3.2262668e+21.



      Now if we take that same password, but all in lowercase, ie. qwerty123456, the value we have is 3612, giving us a potential of 4.7383813e+18, still an enormous number but much smaller than using uppercase as well.



      Password strength relies on two major factors, length and complexity. As an example I'll show a numeric PIN of 4 vs 8 characters to show the difference. So for a 4 digit pin number, 104 we have 10,000 possible combinations, and if we use an 8 digit pin, ie. 108 we have 100,000,000 possibilities. So by doubling the length of the password, we have increased the potential candidates 10,000-fold.



      The second factor of passwords relies on complexity, ie. upper and lowercase, special characters, etc. I gave an example above to show how this increases quickly by using more character sets.



      Just a final note, a password is not as strong as its potential, because a 12 character password could be someones place of birth, a pets name, etc. The contents of a password are also extremely important. The 4 random words model tends to be quite popular and secure, mandatory xkcd.



      ConorMancone's answer gives an excellent explanation and example of the contents of a password vs the entropy, so i'd suggest giving that a read too for more info on this subject.



      So in summary, take the number of possibilities from the character set, to the power of the length of the password, and divide by 2 for a reliable method of getting password strength based on brute-forcing techniques.



      Hopefully this answers your question, if you have any more leave a comment and I'll update this to reflect your questions.






      share|improve this answer














      Password entropy is calculated by the number of possibilities it could be to the power of the length ie. 8 character password of both upper and lowercase letters = (26*2)8, (26 characters of the alphabet * 2 for upper and lowercase).



      If you include the numbers 0-9 as well, then that power becomes 26*2+10, and if you include special characters as well this number can become quite large for number of possibilities.



      So as an example, we'll take a password QweRTy123456, (a horrible password I know). This password is 12 characters long, so the power is the number 12, it uses both uppercase, lowercase and numbers, so we have 6212. Which gives a total number of possibilities of 3.2262668e+21.



      Now if we take that same password, but all in lowercase, ie. qwerty123456, the value we have is 3612, giving us a potential of 4.7383813e+18, still an enormous number but much smaller than using uppercase as well.



      Password strength relies on two major factors, length and complexity. As an example I'll show a numeric PIN of 4 vs 8 characters to show the difference. So for a 4 digit pin number, 104 we have 10,000 possible combinations, and if we use an 8 digit pin, ie. 108 we have 100,000,000 possibilities. So by doubling the length of the password, we have increased the potential candidates 10,000-fold.



      The second factor of passwords relies on complexity, ie. upper and lowercase, special characters, etc. I gave an example above to show how this increases quickly by using more character sets.



      Just a final note, a password is not as strong as its potential, because a 12 character password could be someones place of birth, a pets name, etc. The contents of a password are also extremely important. The 4 random words model tends to be quite popular and secure, mandatory xkcd.



      ConorMancone's answer gives an excellent explanation and example of the contents of a password vs the entropy, so i'd suggest giving that a read too for more info on this subject.



      So in summary, take the number of possibilities from the character set, to the power of the length of the password, and divide by 2 for a reliable method of getting password strength based on brute-forcing techniques.



      Hopefully this answers your question, if you have any more leave a comment and I'll update this to reflect your questions.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Sep 6 at 15:55









      Braiam

      15615




      15615










      answered Sep 5 at 9:50









      Connor J

      83519




      83519







      • 2




        Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
        – Neekoy
        Sep 5 at 10:11






      • 2




        "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
        – AndrolGenhald
        Sep 5 at 13:37







      • 1




        @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
        – AndrolGenhald
        Sep 5 at 14:45






      • 2




        @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
        – Qwerty01
        Sep 5 at 14:58






      • 4




        How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
        – Kevin
        Sep 6 at 13:38












      • 2




        Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
        – Neekoy
        Sep 5 at 10:11






      • 2




        "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
        – AndrolGenhald
        Sep 5 at 13:37







      • 1




        @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
        – AndrolGenhald
        Sep 5 at 14:45






      • 2




        @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
        – Qwerty01
        Sep 5 at 14:58






      • 4




        How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
        – Kevin
        Sep 6 at 13:38







      2




      2




      Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
      – Neekoy
      Sep 5 at 10:11




      Excellent I understand it now. Basically it's just a measurement of the complexity of a password, but not password strength since hash tables, dictionary checks, common information checks and such exist. Thank you for the explanation. :)
      – Neekoy
      Sep 5 at 10:11




      2




      2




      "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
      – AndrolGenhald
      Sep 5 at 13:37





      "you only need to get 1/2+1 possibilities to gain the correct answer" - That's not what the birthday attack is, that's just a 50% probability of guessing correctly based on having searched half the space.
      – AndrolGenhald
      Sep 5 at 13:37





      1




      1




      @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
      – AndrolGenhald
      Sep 5 at 14:45




      @ConnorJ Good correction, I actually missed that one. You're still mistakenly applying the birthday attack though. When you say "1/2+1 possibilities" (idk where you're getting that +1) you mean trying 50M out of 100M of the potential passwords, correct? That's not what the birthday attack is, and it doesn't guarantee finding the correct password, it's the average # of guesses required.
      – AndrolGenhald
      Sep 5 at 14:45




      2




      2




      @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
      – Qwerty01
      Sep 5 at 14:58




      @ConnorJ The birthday attack would apply to two plaintexts having the same hash, not finding an input that has the same hash as a specific plaintext; ie, the birthday attack cannot be used to find a collision with a known hash.
      – Qwerty01
      Sep 5 at 14:58




      4




      4




      How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
      – Kevin
      Sep 6 at 13:38




      How is this the accepted answer? It's not how password entropy is calculated at all, and it doesn't even answer the OP's question of why different checkers come up with different entropy levels.
      – Kevin
      Sep 6 at 13:38












      up vote
      43
      down vote













      Attempting to add to the other Connor's answer:



      Something important to keep in mind is that entropy of course is, in essence, "the amount of randomness" in the password. Therefore, part of why different entropy checkers will disagree is because the entropy is a measure of how the password was generated, not what the password contains. An extreme example is usually the best way to show what I mean. Imagine that my password was frphevgl.fgnpxrkpunatr.pbzPbabeZnapbar.



      An entropy checker will probably rate that with a high amount of entropy because it contains no words and is long. It doesn't contain numbers, but taking a simple calculation (like what Connor outlined in his answer, and what most entropy calculators do), you might guess an entropy of 216 bits of entropy - far more than a typical password needs these days (38 characters with a mix of upper and lower case gives 52^38 ≈ 2^216).



      However, looking at that, someone might suspect that my password isn't really random at all and might realize that it is just the rot13 transformation of site name + my name. Therefore the reality is that there is no entropy in my password at all, and anyone who knows how I generate my passwords will know what my password is to every site I log in as.



      This is an extreme example but I hope it gets the point across. Entropy is determined not by what the password looks like but by how it is generated. If you use some rules to generate a password for each site then your passwords might not have any entropy at all, which means that anyone who knows your rules knows your passwords. If you use lots of randomness then you have a high entropy password and it is secure even if someone knows how you make your passwords.



Entropy calculators make assumptions about how the passwords were generated, and therefore they can both disagree with each other and also be wildly wrong. Also, XKCD is always applicable when it comes to these sorts of things.
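To make this concrete, here is a minimal Python sketch of that deterministic "generator" (the function name is hypothetical, not from the answer). Because no randomness enters the rule, the generating process has zero bits of entropy even though its output looks random:

    import codecs

    def derive_password(site: str, name: str) -> str:
        # Deterministic rule: rot13 of site + name. No random input anywhere,
        # so anyone who knows the rule can reproduce the password exactly.
        return codecs.encode(site + name, "rot_13")

    print(derive_password("security.stackexchange.com", "ConorMancone"))
    # -> frphevgl.fgnpxrkpunatr.pbzPbabeZnapbar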




























      • Great example, I updated my answer to direct here since this is an excellent description about the contents of a password, and not just the entropy of it!
        – Connor J
        Sep 5 at 11:34







      • 16




        "Entropy is determined not by what the password looks like but by how it is generated" This is the key. You should make that a bold statement, so that it stands out more. +1
        – code_dredd
        Sep 5 at 19:29






      • 4




        This reminds me of this Dilbert comic, too: dilbert.com/strip/2001-10-25
        – ilkkachu
        Sep 6 at 7:53






      • 2




        @ilkkachu my favorite Dilbert ever :)
        – Conor Mancone
        Sep 6 at 14:26










      • Could you argue that the entropy of your password decays to the probability that you are using the rot13 method you described? For a random user, that probability is quite low. The probability that they use 123456 or password is much higher. Or do you assume that an attacker can identify you personally and knows your password generation method from leaks (e.g. the leaks listed on haveibeenpwned.com)?
        – craq
        Sep 19 at 19:06














edited Sep 6 at 10:39
ilkkachu

answered Sep 5 at 10:55
Conor Mancone












      up vote
      11
      down vote













Password entropy is calculated by knowing (or guessing) the algorithm used to generate the password, and then counting the number of different branch points in generating the password you chose.



      Let me give some examples:



      Password: password



      This isn't 26^8 (or 2^38) - because the algorithm wasn't "choose 8 random lowercase characters". The algorithm was: choose a single, very easy to remember word. How many such words are there? If you decide, "there are 200 such words", then you're looking at about 8 bits of entropy (not 38.)



      Password: password6



Similar to the previous entry, this isn't 36^9 (or 2^47) - because the algorithm is: choose a single, very easy to remember word, then decorate it at the end with a single-digit number. The entropy here is around 11 bits (not 47).



      Password: carpet#terraform2



By now you can guess what's going on. Two relatively uncommon words, with a punctuation character between them and a digit at the end. If you estimate that those words were chosen from a dictionary of 10000 words (2^13), you're looking at something like 33 bits of entropy (13 for the first word + 4 for the punctuation + 13 for the second word + 3 for the final digit).



      So, now, to answer your direct question: why do the various entropy checkers give different values?



      Well, let's use that last password: carpet#terraform2.



One entropy-evaluator might say, "Hey, I have no clue how you generated this. So it must just be random characters drawn from lowercase letters, punctuation, and digits - a 52-character alphabet. Call it 52^17, or 97 bits of entropy (2^97)."



Another, slightly smarter entropy-evaluator might say, "Hey, I recognize that first word, but that second string of letters is just random. So the algorithm is a single uncommon word, a punctuation character, nine random letters, and then a number. So 10000 x 16 x 26^9 x 10, or 63 bits of entropy."



A third and fourth entropy-evaluator might correctly figure out the algorithm used to generate it. But the third evaluator thinks both words should come from a dictionary of 5000 words, while the fourth evaluator thinks you have to break into a 30,000 word dictionary to find them. So one comes up with 32 bits of entropy while the other thinks there are 37 bits.



Hopefully it's starting to make sense. The reason different entropy evaluators come up with different numbers is that they each make different guesses about how the password was generated.
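To see how much the assumed generation model matters, here is a small Python sketch that reproduces the estimates above (the dictionary and alphabet sizes are the assumptions from the examples, not measured values):

    import math

    def bits(choices: float) -> float:
        # log2 of the number of equally likely choices at one branch point
        return math.log2(choices)

    # Three models an evaluator might assume for "carpet#terraform2":
    naive = 17 * bits(52)                                        # 17 random chars, 52-symbol alphabet
    word_plus_random = bits(10000) + bits(16) + 9 * bits(26) + bits(10)
    two_words = bits(10000) + bits(16) + bits(10000) + bits(10)

    print(f"random characters:      {naive:.0f} bits")             # ~97
    print(f"word + random tail:     {word_plus_random:.0f} bits")  # ~63
    print(f"two words + decoration: {two_words:.0f} bits")         # ~34

Same password, three defensible models, three very different numbers.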






























          answered Sep 6 at 13:33









          Kevin





















              up vote
              7
              down vote













There is no way to judge whether a sequence of numbers or characters happens to be random. A machine asked to produce a random seven-digit number would produce 8675309 about once every ten million requests. A Tommy Tutone fan might offer up that number every time. If a security application happens to need a seven-digit number (perhaps it needs to be accessible via numeric pad), it might judge 8675309 as having good entropy - unless that application happens to have been written by a Tommy Tutone fan, in which case it might regard that number as being just as bad as 1111111 or 1234567.



              Basically, the only thing an entropy checker can do is check whether a passcode matches any patterns that are regarded as having low entropy, and observe the lowest-entropy pattern that it matches [every passcode will match some pattern--a passcode of e.g. 23 characters will match patterns like "any other passcode of 20-25 characters" if it doesn't match anything else]. If different people are asked to produce a list of all the low-entropy passcode patterns they can think of, they'll almost certainly think of different things. What really matters, though, is whether a particular passcode would happen to match a pattern that an attacker decides to try. While there are some sets of passcodes that attackers would be particularly likely to try (e.g. 1111111 and 1234567), the choices of passcode once one gets beyond that are likely to vary significantly by attacker.
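A checker built along these lines might look like the following Python sketch. The pattern list is purely illustrative (a real checker would have far more patterns - and a different list than its competitors, which is exactly why results vary):

    import math

    # (description, predicate, assumed number of passcodes matching the pattern)
    PATTERNS = [
        ("well-known number (Tommy Tutone fan)", lambda s: s == "8675309", 1),
        ("all identical digits", lambda s: s.isdigit() and len(set(s)) == 1, 10),
        ("ascending digit run", lambda s: s in "1234567890", 10),
        ("any 7-digit passcode", lambda s: s.isdigit() and len(s) == 7, 10**7),
    ]

    def lowest_entropy_match(code: str):
        # Every passcode matches *some* pattern; report the weakest match.
        matches = [(desc, math.log2(size)) for desc, test, size in PATTERNS if test(code)]
        return min(matches, key=lambda m: m[1])

    print(lowest_entropy_match("8675309"))  # ('well-known number (Tommy Tutone fan)', 0.0)
    print(lowest_entropy_match("1111111"))  # ('all identical digits', ~3.3)
    print(lowest_entropy_match("4815162"))  # ('any 7-digit passcode', ~23.3)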


























              • "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
                – Cody P
                Sep 7 at 16:26











• @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one would attempt to reject 60% of lists that might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one can do is hope that certain non-random means of generating data are much more likely to be used than others.
                – supercat
                Sep 7 at 17:06










              • While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
                – Cody P
                Sep 7 at 21:07











              • @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
                – supercat
                Sep 7 at 22:50














              answered Sep 5 at 15:32









              supercat












              • "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
                – Cody P
                Sep 7 at 16:26











              • @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one would attempt to reject 60% of lists that could be might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one do is hope that certain non-random means of generating data are much more likely to be used than others.
                – supercat
                Sep 7 at 17:06










              • While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
                – Cody P
                Sep 7 at 21:07











              • @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
                – supercat
                Sep 7 at 22:50
















              • "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
                – Cody P
                Sep 7 at 16:26











              • @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one would attempt to reject 60% of lists that could be might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one do is hope that certain non-random means of generating data are much more likely to be used than others.
                – supercat
                Sep 7 at 17:06










              • While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
                – Cody P
                Sep 7 at 21:07











              • @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
                – supercat
                Sep 7 at 22:50















              "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
              – Cody P
              Sep 7 at 16:26





              "There is no way to judge whether a sequence of numbers or characters happens to be random." Yes, while checking randomness of past values against any possible pattern is bad, it's fairly easy to check whether a future trial meets basic patterns. I can't say 1111111 isn't random, but I can design a test so that if a generator spits out 1111111 during the test then I will, with high certainty, reject the null hypothesis that the numbers are random and independent. Regardless, here's a relevant Dilbert comic: dilbert.com/strip/2001-10-25
              – Cody P
              Sep 7 at 16:26













              @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one would attempt to reject 60% of lists that could be might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one do is hope that certain non-random means of generating data are much more likely to be used than others.
              – supercat
              Sep 7 at 17:06




              @CodyP: For any practical N, the number of theoretically-possible non-random ways of generating a list of N numbers will exceed the number of possible lists of N numbers. If the sole criterion one uses to reject a list is whether 11111111 appears more than it should, one would erroneously accept most non-random lists. If one would attempt to reject 60% of lists that could be might conceivably have been produced by non-random means, one would also reject 60% of random lists. The best one do is hope that certain non-random means of generating data are much more likely to be used than others.
              – supercat
              Sep 7 at 17:06












              While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
              – Cody P
              Sep 7 at 21:07





              While that's factually correct and you bring up good caveats, my main point is that the intuitive idea that 1111111 is a sign of a bad RNG has merit, despite academic considerations like false negatives and obscure patterns. I don't think anyone would suggest you set your p-value at 60% or check only for repeated 1's. Given mature tests like the NIST test suite or the zxcvbn password checker, we don't have to pretend that rejecting 1111111 is a hard problem. This is especially true for passwords where we have a good idea of what patterns are common.
              – Cody P
              Sep 7 at 21:07













              @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
              – supercat
              Sep 7 at 22:50




              @CodyP: My point is that assessments of whether particular passcodes are more likely to be chosen via high-entropy method than a low-entropy method are predicated almost entirely upon guesses as to the likelihood of various low-entropy methods being employed.
              – supercat
              Sep 7 at 22:50










              up vote
              6
              down vote













              Broadly speaking, the security of a password depends on how hard it is for your attacker to guess it. This is what those websites try to measure, with different algorithms for guessing.



A very stupid checker may think that the password "00000000000000000000" has high entropy because it's long, and its algorithm for guessing is "try every combination of all characters", which would indeed take a long time to correctly guess this password. A smarter checker will try numeric passwords first, thus introducing a security penalty for this password (it'll be found faster). A still smarter guesser will try patterns first (like repeated characters or dictionary words) and rightly conclude this password is garbage.



              This may sound useless, but note that even a bad guesser will give you a useful upper bound: "your password can be broken with at most this many guesses". That is the value these checkers provide. By trying several guessers and taking the lowest number you have a more conservative estimate.
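A sketch of that idea in Python - three toy estimators standing in for real guessing strategies, with the lowest (most pessimistic) estimate winning:

    import math

    def brute_force_bits(pw: str) -> float:
        # "Try every combination of all printable ASCII characters"
        return len(pw) * math.log2(95)

    def digits_only_bits(pw: str) -> float:
        # "Try numeric passwords first"
        return len(pw) * math.log2(10) if pw.isdigit() else float("inf")

    def repeated_char_bits(pw: str) -> float:
        # "Try repeated-character patterns first": ~95 character choices
        # times a handful of plausible lengths
        return math.log2(95 * len(pw)) if len(set(pw)) == 1 else float("inf")

    def conservative_bits(pw: str) -> float:
        # An attacker only needs the cheapest strategy that works.
        return min(f(pw) for f in (brute_force_bits, digits_only_bits, repeated_char_bits))

    print(conservative_bits("00000000000000000000"))  # ~10.9 bits, not ~131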






























                  answered Sep 5 at 15:01









                  BoppreH





















                      up vote
                      1
                      down vote













Other answers do not use the information theory definition of entropy. Mathematically, entropy is a function based solely on a probability distribution: for a probability mass function p, it is defined as the sum of p(x) * -log(p(x)) over each possible outcome x.



Units of entropy are logarithmic. Typically two is used as the base of the logarithm, so we say that a stochastic system has n bits of entropy. (Other bases could be used. For base e you would instead measure entropy in "nats", for natural logarithm, but base two and "bits" are more common.)



                      The term comes from information theory. Entropy is, of course, related to information. (Another reason why we say "bits".) But you can also describe entropy as a measure of "unpredictability".



                      Flipping a fair coin once (in the ideal world) produces 1 bit of entropy. Flipping a coin twice produces two bits. And so on. Uniform (discrete) distributions are the simplest distributions to calculate the entropy of because every term summed is identical. You can simplify the equation for entropy of a discrete uniform variable X with n outcomes each with 1/n probability to



                      H(X) = n * 1/n * -log(1/n) = log(n)



You should be able to see how one can get the entropy out of a system that is one or more coin flips with just knowledge of how many coin flips are to be recorded.



                      The entropy of non-uniform (discrete) distributions is also easy to compute. It just requires more steps. I'll use the coin flip example to relate entropy to unpredictability again. What if you instead use a biased coin? Well then the entropy is



                      H = p(heads) * -log(p(heads)) + p(tails) * -log(p(tails))



If you plot that, you get the binary entropy curve: [Plot: entropy of a biased coin flip versus p(heads), 0 bits at p = 0 and p = 1, peaking at 1 bit for a fair coin at p = 0.5.]



                      See? The less fair (uniform) a coin flip (distribution) is the less entropy it has. Entropy is maximized for the type of coin flip which is most unpredictable: a fair coin flip. Biased coin flips still contain some entropy as long as heads and tails are both still possible. Entropy (unpredictability) decreases when the coin becomes more biased. Anything 100% certain has zero entropy.
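For concreteness, a small Python sketch of the definition above:

    import math

    def shannon_entropy(probs) -> float:
        # H = sum over outcomes of p * -log2(p); outcomes with p = 0 contribute 0
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit
    print(shannon_entropy([0.9, 0.1]))  # biased coin: ~0.47 bits
    print(shannon_entropy([1.0]))       # certain outcome: 0.0 bits
    print(shannon_entropy([1/6] * 6))   # fair six-sided die: ~2.58 bits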



                      It is important to know that it is the idea of flipping coins that has entropy, not the result of the coin flips themselves. You cannot infer from one sample x from distribution X what the probability p(x) of x is. All you know is that it is non-zero. And with just x you have no idea how many other possible outcomes there are or what their individual probabilities are. Therefore you're not able to compute entropy just by looking at one sample.



When you see a string "HTHTHT" you don't know if it came from a sequence of six fair coin flips (6 bits of entropy), a biased coin flip sequence (< 6 bits), a randomly generated string from the uniform distribution of all 6-character uppercase strings (6 * log_2(26), or about 28 bits), or if it's from a sequence that simply alternates between 'H' and 'T' (0 bits).



For the same reason, you cannot calculate the entropy of just one password. Any tool that tells you the entropy of a password you enter is incorrect or misrepresents what it means by entropy. (And may be harvesting passwords.) Only systems with a probability distribution can have entropy, and you cannot calculate the true entropy of such a system without knowledge of the exact probabilities; there is no way around that.



One may be able to estimate entropy, however, but it still requires some knowledge (or assumptions) about the distribution. If you assume a k-character-long password was generated from one of a few uniform distributions with some alphabet A, then you can estimate the entropy to be k * log_2(|A|), where |A| is the size of the alphabet. A lot of password strength estimators use this (naive) method. If they see you use only lowercase then they assume |A| = 26. If they see a mix of upper and lower case they assume |A| = 52. This is why a supposed password strength calculator might tell you that "Password1" is thousands of times more secure than "password". It makes assumptions about the statistical distribution of passwords that aren't necessarily justified.
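The naive method is easy to reproduce. Here is a Python sketch (the character-class rules are typical ones, though real checkers vary):

    import math
    import string

    def naive_entropy_estimate(pw: str) -> float:
        # Infer the alphabet from the character classes present, then assume
        # every character was drawn uniformly at random: k * log2(|A|).
        pool = 0
        if any(c.islower() for c in pw): pool += 26
        if any(c.isupper() for c in pw): pool += 26
        if any(c.isdigit() for c in pw): pool += 10
        if any(c in string.punctuation for c in pw): pool += len(string.punctuation)
        return len(pw) * math.log2(pool)

    print(naive_entropy_estimate("password"))   # ~37.6 "bits" - wildly optimistic
    print(naive_entropy_estimate("Password1"))  # ~53.6 "bits" - ~2^16 times "stronger"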



Some password strength checkers don't exhibit this behavior, but that doesn't mean their estimates are accurate. They're just programmed to look for more patterns. We can make more informed guesses of password strength based on observed or imagined human password behaviors, but we can never calculate an entropy value that isn't an estimate.



And as I said earlier, it's wrong to say that passwords themselves have an entropy associated with them. When people say that a password has ___ entropy then, if they know what they're talking about, they are really using it as shorthand for "this password was generated using a process that has ___ entropy."



Some people advocate for passwords to be computer generated instead of human generated. This is because humans are bad at coming up with high entropy passwords. Human bias produces non-uniform password distributions. Even long passwords people choose are more predictable than expected. For computer generated passwords we can know the exact entropy of a password-space because a programmer created that password space. We're able to know machine generated password entropy without estimation, assuming we use a secure RNG. Diceware is another password generating method that people advocate for that has the same properties. It doesn't require any computers and instead assumes you have a fair six-sided die.
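For machine-generated passwords the entropy really is known by construction. A minimal sketch using Python's secrets module (the alphabet and length are arbitrary illustration choices):

    import math
    import secrets
    import string

    ALPHABET = string.ascii_letters + string.digits  # |A| = 62
    LENGTH = 16

    # Each character is drawn uniformly and independently by a CSPRNG,
    # so the process has exactly LENGTH * log2(|A|) bits of entropy.
    password = "".join(secrets.choice(ALPHABET) for _ in range(LENGTH))
    entropy = LENGTH * math.log2(len(ALPHABET))      # ~95.3 bits

    print(password, f"({entropy:.1f} bits by construction)")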



If a password is generated with n bits of entropy, we can estimate the number of guesses a password cracker needs to make, on average, as 2^(n-1). However, this is a conservative estimate. Password strength and entropy are not synonymous. The entropy-based estimate assumes that the cracker knows your individual password generation process. Entropy-based password strength follows Kerckhoffs's principle: it measures password strength in a sense (since entropy measures unpredictability) under the "security through obscurity is not security" mindset. If you consciously keep entropy in mind while generating passwords, then you can pick a password generating method with high entropy and get a lower bound on how secure your password is.



As long as the result is memorable/usable, there is nothing wrong with using a true entropy calculation as a conservative estimate of password strength. It is better to under-estimate strength (and use computer-generated or dice-generated passwords) than to over-estimate it, as password strength estimators do with human-generated passwords.



So the answer to your question is no: there is no reliable way to check password entropy from the password alone. It's mathematically impossible.






edited Sep 6 at 19:46
answered Sep 6 at 19:22

Future Security
1487

• +1, but I believe your H(X) should be log_2(n) not log_2(1/n). And I'm not sure why you say other answers don't use the information theory definition of entropy, some of them don't but most do.
– AndrolGenhald
Sep 6 at 19:39

• @AndrolGenhald I fixed the H(X) error. I forgot the minus signs. Other answers go straight to how many bits of strength a password has, but they skip the formulas. Maybe I didn't read closely enough. Skipping the nuance of entropy being defined based on a probability distribution leads to misunderstanding. Someone could easily take away, instead, the idea that strength = length * alphabet_size, which is only true for uniform distributions.
– Future Security
Sep 6 at 19:53

• Some suggestions for possible improvements: "And may be harvesting passwords." could be clearer if you linked examples of known good and known malicious entropy checkers. Also note that non-uniform (discrete) distributions are not preferred in passwords or especially cryptography because some outputs are more common than the entropy alone would suggest (in exchange for other outputs being less common). Finally, most password strength checkers provide an upper bound or "blind" entropy. Good checkers provide a smaller upper bound based on realistic scenarios. I don't view that as disingenuous.
– Cody P
Sep 7 at 16:39















up vote
1
down vote













There are two crucial things that need to be understood, each of which I will elaborate on below:

1. Anything that tries to judge the entropy of a password from the password alone is grasping at straws. This is because strength is a feature of the system that generates the password, not of the password itself.

2. Entropy is not a coherent concept for password strength unless the password is the result of a process that generates passwords uniformly. Entropy is a piss-poor metric when the distribution of possible passwords is not uniform.


It's the creation scheme that matters

The strength of a password (whether measured in entropy or not) is a function of the scheme that was used to create it. To quote myself from something I wrote in 2011:




I can’t over-emphasize the point that we need to look at the system instead of at a single output of the system. Let me illustrate this with a ridiculous example. The passwords F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA each look like extremely strong passwords. Based on their lengths and the use of upper and lower case and digits, any password strength testing system would say that these are extremely strong passwords. But suppose that the system by which these were generated was the following: Flip a coin. If it comes up heads use F9GndpVkfB44VdvwfUgTxGH7A8t, and if it comes up tails use rE67AjbDCUotaju9H49sMFgYszA.

That system produces only two outcomes. And even though the passwords look strong, passwords generated by that system are extremely weak.




Any password strength meter is going to report those as really strong. To also quote myself describing how 1Password's strength meter works (disclosure: I work for 1Password), I introduced my 2013 answer with:

The usual way to determine strength of a specific password is to sacrifice a chicken and then read the entrails. But in 1Password, we do better. We compute the horoscope of the chicken before sacrificing it. Given that some of our staff are vegetarians, we are always looking for alternative approaches.

The sad fact of the matter is that accurate password strength meters are impossible (short of spending years or decades trying to crack the offered password). The reason is that the strength of a password depends largely on the system by which it was generated, and that is not something that can be determined with confidence by inspecting a single password.




So it is not at all surprising that different strength meters produce different results, and all of them should be taken with a large grain of salt. As others have noted, strength meters have to guess what sort of system you used to create the password from the single instance of the password itself. And that guesswork is hard.



Suppose, for example, that the password you test has mixed case. Most human-created passwords will have the occasional capital at the beginning of a chunk. That is, humans are far more likely to create a password that looks like Password1234 than one like paSsword1234. Does your strength system account for that kind of difference? Or does it just spot that there is mixed case and make a computation assuming (incorrectly) that upper and lower case letters are equally likely throughout the password?
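As an illustration of the kind of pattern a better meter has to look for, here is a toy check in Python; it is my own sketch, not 1Password's actual logic.

import re

def looks_like_human_capitalization(password: str) -> bool:
    # Humans tend to capitalize only the first letter of a word-like
    # chunk, as in "Password1234". If every letter chunk follows that
    # pattern, a meter should discount the mixed-case bonus.
    chunks = re.findall(r"[A-Za-z]+", password)
    return all(chunk[1:] == chunk[1:].lower() for chunk in chunks)

print(looks_like_human_capitalization("Password1234"))  # True  -> predictable capitalization
print(looks_like_human_capitalization("paSsword1234"))  # False -> less predictable placement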



Now strength meters have improved over the past few years, but even the best of them are deeply and fundamentally limited, because they are trying to figure out what system you used to create the password from the password itself.



Entropy is the wrong notion



Entropy only makes sense if every possible password that the scheme can create is equally likely. In a 2013 PasswordsCon talk, I illustrated this with some extreme examples. (View the PDF of the slides in single-page mode to get the overlays.)



Imagine a password creation scheme that 9 times out of 10 picks one fixed password, and the remaining 1 time out of 10 picks uniformly from 2^512 possibilities. The Shannon entropy of that scheme is around 52 bits (which would be very respectable for a password), but it is clearly a terrible password creation scheme. I argued, instead, that when the distribution is non-uniform we should use the likelihood of the most likely outcome (min-entropy) as the strength measure of the scheme.
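A quick sketch of that calculation in Python; the numbers follow directly from the scheme described above:

import math

p_fixed = 0.9        # probability of the one fixed password
pool = 2 ** 512      # size of the uniform pool used the other 10% of the time
p_rare = 0.1 / pool  # probability of each individual rare password

# Shannon entropy: sum of p * -log2(p). The pool's total contribution
# simplifies to 0.1 * -log2(p_rare) because all rare outcomes are equal.
shannon = p_fixed * -math.log2(p_fixed) + 0.1 * -math.log2(p_rare)

# Min-entropy: -log2 of the single most likely outcome.
min_entropy = -math.log2(p_fixed)

print(f"Shannon entropy: {shannon:.1f} bits")  # ~51.7 bits
print(f"Min-entropy: {min_entropy:.2f} bits")  # ~0.15 bits

The Shannon figure looks respectable, but the min-entropy correctly reflects that an attacker's first guess succeeds 90% of the time.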



Although that is a contrived example to make a point, the fact of the matter is that the distribution of human-chosen passwords is a fat-headed distribution: it doesn't really matter how many different passwords could be generated if lots of people end up picking from the same most common ones.



So I strongly recommend that if you use the word "entropy" you use min-entropy, which is -log_2 p(x_m), where x_m is the most likely password and p(x_m) is its probability of occurring. Note that when the probability distribution is uniform, min-entropy and Shannon entropy work out to be the same; when the distribution is not uniform, they can be very different.






answered Sep 7 at 2:17

Jeffrey Goldberg
3,482713

• While it's true that the entropy of a password depends on the creation scheme used to generate it, I don't think it follows that the strength does. Firstly, it requires the creation scheme to be known: F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA are only weak if the attacker knows that there is a scheme which selects those two strings. Secondly, different schemes may generate the same password: if I use a Geiger counter to select a sequence of 8 Unicode code points, it might select "password"; clearly, the strong randomness hasn't led to a strong password.
– IMSoP
Sep 7 at 8:54

• In security you should assume that the attacker knows the system. And the people who do seriously try to crack passwords know more about the schemes that people use than anyone else. So yes, you can get strong passwords out of low entropy schemes by using schemes that are unknown (and don't produce the kinds of common results from common schemes). But think about what this means for strength meters. Once a scheme is known, the meter needs to magically adjust.
– Jeffrey Goldberg
Sep 10 at 22:03

• The sentence I am challenging is "The strength of a password ... is a function of the scheme that was used to create it." Yes, attackers know the schemes generally used to generate passwords, but they don't in practice know the specific scheme used to create any particular password. The meter doesn't need to "magically adjust", it just needs to be based on the techniques attackers are likely to use. I would say the strength of a password is a function of how likely an attacker is to guess it, regardless of how it was actually created.
– IMSoP
Sep 11 at 9:09














                      up vote
                      1
                      down vote













                      There are two crucial things that need to be understood, each of which I will elaborate on



                      1. Anything trying to judge the entropy of a password from the password alone is grasping at straws. This is because the strength is a feature of the system that generates the password and not the password itself.


                      2. Entropy is not a coherent concept for password strength unless the password is the result of a process that generates passwords uniformly. Entropy is a piss poor metric when the distribution of possible passwords is not uniform.


                      It's the creation scheme that matters



                      The strength of a password (whether measured in entropy or not) is a function of the scheme that was used to create it. To quote myself from something I wrote in 2011




                      I can’t over-emphasize the point that we need to look at the system instead of at a single output of the system. Let me illustrate this with a ridiculous example. The passwords F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA each look like extremely strong passwords. Based on their lengths and the use of upper and lower case and digits, any password strength testing system would say that these are extremely strong passwords. But suppose that the system by which these were generated was the following: Flip a coin. If it comes up heads use F9GndpVkfB44VdvwfUgTxGH7A8t, and if it comes up tails use rE67AjbDCUotaju9H49sMFgYszA.



                      That system produces only two outcomes. And even though the passwords look strong, passwords generated by that system are extremely weak.




                      Any password strength meter is going to report those as really strong. To also quote myself describing how 1Password's strength meter works (Disclosure: I work for 1Password), I introduced my 2013 answer with:




                      The usual way to determine strength of a specific password is to sacrifice a chicken and then read the entrails. But in 1Password, we do better. We compute the horoscope of the chicken before sacrificing it. Given that some of our staff are vegetarians, we are always looking for alternative approaches.



                      The sad fact of the matter is that accurate password strength meters are impossible (short of spending years or decades trying to crack the offered password). The reason is because the strength of a password depends largely on the system by which it was generated. That is not something that can be determined with confidence by inspected a single password.




                      So it is not at all surprising that different strength meters are going to produce different results. And all of them should be taken with a large grain of salt. As others have noted, the strength meters have to guess what sort of system you used to create the password from the single instance of the password itself. And that guess work in hard.



                      Suppose for example, the password that you test has mixed case. Now most human created passwords will have the occasional capitalization at the beginning of chunk. That is, humans are far more likely to create a password that looks like
                      Password1234 than they are to create paSsword1234? Does your strength system account for that kind of difference? Or does it just spot that there is mixed case and make a computation assuming (incorrectly) that upper and lower case letters are equality likely throughout the password?



                      Now strength meters have improved over the past few years, but even the best of them are deeply and fundamentally limited because they are trying to figure out what system you used to create the password from the password itself.



                      Entropy is the wrong notion



                      Entropy only makes sense if every possible password that the scheme can create is equally likely. In a 2013 PasswordsCon talk, I illustrated this with some extreme examples. (View the PDF of the slides in single page mode to get the overlays).



                      Imagine a password creation scheme that 9 times out of 10 picks a fixed password, but the other 1 time out of ten picks uniformly from 2512 possibilities. The Shannon entropy of that scheme is around 52 bits (which is very respectable for a password), but clearly it is a terrible password creation scheme. I argued, instead, that when there is a non-uniform distribution we should just use the likelihood of the most likely result (min-entropy) as a strength measure of the scheme.



                      Although that is a contrived example to make a point, the fact of the matter is that the distribution of human chosen passwords is a fat headed distribution. It doesn't really matter how many different passwords could be generated if lots of people all end up picking from the most common ones.



                      So I strongly recommend that if you use the word "entropy" that you use "min-entropy" which is log2p(xm), where xm is the most likely password and p is its probability of occurring. Note that when the probability distribution is uniform, min-entropy and Shannon entropy work out to be the same. But when the distribution is not uniform, they can be very different.






                      share|improve this answer




















                      • While it's true that the entropy of a password depends on the creation scheme used to generate it, I don't think it follows that the strength does. Firstly, it requires the creation scheme to be known: F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA are only weak if the attacker knows that there is a scheme which selects those two strings. Secondly, different schemes may generate the same password: if I use a Geiger counter to select a sequence of 8 Unicode code points, it might select "password"; clearly, the strong randomness hasn't led to a strong password.
                        – IMSoP
                        Sep 7 at 8:54










                      • In security you should assume that the attacker knows the system. And the people who do seriously try to crack passwords know more about the schemes that people use than anyone else. So yes. You can get strong passwords out of low entropy schemes by using schemes that are unknown (and don't produce the kinds of common results from common schemes). But think about what these means for strength meters. Once a scheme is known, the meter needs to magically adjust.
                        – Jeffrey Goldberg
                        Sep 10 at 22:03










                      • The sentence I am challenging is "The strength of a password ... is a function of the scheme that was used to create it." Yes, attackers know the schemes generally used to generate passwords, but they don't in practice know the specific scheme used to create any particular password. The meter doesn't need to "magically adjust", it just needs to be based on the techniques attackers are likely to use. I would say the strength of a password is a function of how likely an attacker is to guess it, regardless of how it was actually created.
                        – IMSoP
                        Sep 11 at 9:09












                      up vote
                      1
                      down vote










                      up vote
                      1
                      down vote









                      There are two crucial things that need to be understood, each of which I will elaborate on



                      1. Anything trying to judge the entropy of a password from the password alone is grasping at straws. This is because the strength is a feature of the system that generates the password and not the password itself.


                      2. Entropy is not a coherent concept for password strength unless the password is the result of a process that generates passwords uniformly. Entropy is a piss poor metric when the distribution of possible passwords is not uniform.


                      It's the creation scheme that matters



                      The strength of a password (whether measured in entropy or not) is a function of the scheme that was used to create it. To quote myself from something I wrote in 2011




                      I can’t over-emphasize the point that we need to look at the system instead of at a single output of the system. Let me illustrate this with a ridiculous example. The passwords F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA each look like extremely strong passwords. Based on their lengths and the use of upper and lower case and digits, any password strength testing system would say that these are extremely strong passwords. But suppose that the system by which these were generated was the following: Flip a coin. If it comes up heads use F9GndpVkfB44VdvwfUgTxGH7A8t, and if it comes up tails use rE67AjbDCUotaju9H49sMFgYszA.



                      That system produces only two outcomes. And even though the passwords look strong, passwords generated by that system are extremely weak.




                      Any password strength meter is going to report those as really strong. To also quote myself describing how 1Password's strength meter works (Disclosure: I work for 1Password), I introduced my 2013 answer with:




                      The usual way to determine strength of a specific password is to sacrifice a chicken and then read the entrails. But in 1Password, we do better. We compute the horoscope of the chicken before sacrificing it. Given that some of our staff are vegetarians, we are always looking for alternative approaches.



                      The sad fact of the matter is that accurate password strength meters are impossible (short of spending years or decades trying to crack the offered password). The reason is because the strength of a password depends largely on the system by which it was generated. That is not something that can be determined with confidence by inspected a single password.




                      So it is not at all surprising that different strength meters are going to produce different results. And all of them should be taken with a large grain of salt. As others have noted, the strength meters have to guess what sort of system you used to create the password from the single instance of the password itself. And that guess work in hard.



                      Suppose for example, the password that you test has mixed case. Now most human created passwords will have the occasional capitalization at the beginning of chunk. That is, humans are far more likely to create a password that looks like
                      Password1234 than they are to create paSsword1234? Does your strength system account for that kind of difference? Or does it just spot that there is mixed case and make a computation assuming (incorrectly) that upper and lower case letters are equality likely throughout the password?



                      Now strength meters have improved over the past few years, but even the best of them are deeply and fundamentally limited because they are trying to figure out what system you used to create the password from the password itself.



                      Entropy is the wrong notion



                      Entropy only makes sense if every possible password that the scheme can create is equally likely. In a 2013 PasswordsCon talk, I illustrated this with some extreme examples. (View the PDF of the slides in single page mode to get the overlays).



                      Imagine a password creation scheme that 9 times out of 10 picks a fixed password, but the other 1 time out of ten picks uniformly from 2512 possibilities. The Shannon entropy of that scheme is around 52 bits (which is very respectable for a password), but clearly it is a terrible password creation scheme. I argued, instead, that when there is a non-uniform distribution we should just use the likelihood of the most likely result (min-entropy) as a strength measure of the scheme.



                      Although that is a contrived example to make a point, the fact of the matter is that the distribution of human chosen passwords is a fat headed distribution. It doesn't really matter how many different passwords could be generated if lots of people all end up picking from the most common ones.



                      So I strongly recommend that if you use the word "entropy" that you use "min-entropy" which is log2p(xm), where xm is the most likely password and p is its probability of occurring. Note that when the probability distribution is uniform, min-entropy and Shannon entropy work out to be the same. But when the distribution is not uniform, they can be very different.






                      share|improve this answer












                      There are two crucial things that need to be understood, each of which I will elaborate on



                      1. Anything trying to judge the entropy of a password from the password alone is grasping at straws. This is because the strength is a feature of the system that generates the password and not the password itself.


                      2. Entropy is not a coherent concept for password strength unless the password is the result of a process that generates passwords uniformly. Entropy is a piss poor metric when the distribution of possible passwords is not uniform.


                      It's the creation scheme that matters



                      The strength of a password (whether measured in entropy or not) is a function of the scheme that was used to create it. To quote myself from something I wrote in 2011




                      I can’t over-emphasize the point that we need to look at the system instead of at a single output of the system. Let me illustrate this with a ridiculous example. The passwords F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA each look like extremely strong passwords. Based on their lengths and the use of upper and lower case and digits, any password strength testing system would say that these are extremely strong passwords. But suppose that the system by which these were generated was the following: Flip a coin. If it comes up heads use F9GndpVkfB44VdvwfUgTxGH7A8t, and if it comes up tails use rE67AjbDCUotaju9H49sMFgYszA.



                      That system produces only two outcomes. And even though the passwords look strong, passwords generated by that system are extremely weak.




                      Any password strength meter is going to report those as really strong. To also quote myself describing how 1Password's strength meter works (Disclosure: I work for 1Password), I introduced my 2013 answer with:




                      The usual way to determine strength of a specific password is to sacrifice a chicken and then read the entrails. But in 1Password, we do better. We compute the horoscope of the chicken before sacrificing it. Given that some of our staff are vegetarians, we are always looking for alternative approaches.



                      The sad fact of the matter is that accurate password strength meters are impossible (short of spending years or decades trying to crack the offered password). The reason is because the strength of a password depends largely on the system by which it was generated. That is not something that can be determined with confidence by inspected a single password.




                      So it is not at all surprising that different strength meters are going to produce different results. And all of them should be taken with a large grain of salt. As others have noted, the strength meters have to guess what sort of system you used to create the password from the single instance of the password itself. And that guess work in hard.



                      Suppose for example, the password that you test has mixed case. Now most human created passwords will have the occasional capitalization at the beginning of chunk. That is, humans are far more likely to create a password that looks like
                      Password1234 than they are to create paSsword1234? Does your strength system account for that kind of difference? Or does it just spot that there is mixed case and make a computation assuming (incorrectly) that upper and lower case letters are equality likely throughout the password?



                      Now strength meters have improved over the past few years, but even the best of them are deeply and fundamentally limited because they are trying to figure out what system you used to create the password from the password itself.



                      Entropy is the wrong notion



                      Entropy only makes sense if every possible password that the scheme can create is equally likely. In a 2013 PasswordsCon talk, I illustrated this with some extreme examples. (View the PDF of the slides in single page mode to get the overlays).



Imagine a password creation scheme that nine times out of ten picks a fixed password, but the other one time out of ten picks uniformly from 2^512 possibilities. The Shannon entropy of that scheme is around 52 bits (which is very respectable for a password), but it is clearly a terrible password creation scheme. I argued, instead, that when there is a non-uniform distribution we should use the likelihood of the most likely result (min-entropy) as the strength measure of the scheme.
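A quick Python check (a sketch of my own; the 2^512 identical rare outcomes are collapsed into a single term rather than enumerated) confirms the roughly 52-bit figure:

```python
import math

p_fixed = 0.9               # nine times out of ten: the one fixed password
n_rare = 2 ** 512           # otherwise: uniform over 2^512 alternatives
p_rare = (1 - p_fixed) / n_rare

# Shannon entropy H = -sum(p * log2(p)); the 2^512 identical rare terms
# collapse to the single term -(1 - p_fixed) * log2(p_rare).
shannon = -p_fixed * math.log2(p_fixed) - (1 - p_fixed) * math.log2(p_rare)
print(f"Shannon entropy: {shannon:.1f} bits")   # about 51.7 bits
```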



Although that is a contrived example to make a point, the fact of the matter is that the distribution of human-chosen passwords is a fat-headed distribution: it doesn't really matter how many different passwords could be generated if lots of people all end up picking from the most common ones.



So I strongly recommend that if you use the word "entropy" you use "min-entropy", which is −log₂ p(x_m), where x_m is the most likely password and p(x_m) is its probability of occurring. Note that when the probability distribution is uniform, min-entropy and Shannon entropy work out to be the same. But when the distribution is not uniform, they can be very different.
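Continuing the sketch above, for the contrived scheme the two measures diverge dramatically:

```python
import math

p_most_likely = 0.9  # the fixed password is the most likely outcome

# Min-entropy = -log2 p(x_m): about 0.15 bits, versus roughly 52 bits of
# Shannon entropy; an attacker's first guess succeeds 90% of the time.
print(f"Min-entropy: {-math.log2(p_most_likely):.2f} bits")
```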

















                      answered Sep 7 at 2:17









                      Jeffrey Goldberg

                      3,482713















                      • While it's true that the entropy of a password depends on the creation scheme used to generate it, I don't think it follows that the strength does. Firstly, it requires the creation scheme to be known: F9GndpVkfB44VdvwfUgTxGH7A8t and rE67AjbDCUotaju9H49sMFgYszA are only weak if the attacker knows that there is a scheme which selects those two strings. Secondly, different schemes may generate the same password: if I use a Geiger counter to select a sequence of 8 Unicode code points, it might select "password"; clearly, the strong randomness hasn't led to a strong password.
                        – IMSoP
                        Sep 7 at 8:54










• In security you should assume that the attacker knows the system. And the people who do seriously try to crack passwords know more about the schemes that people use than anyone else. So yes, you can get strong passwords out of low-entropy schemes by using schemes that are unknown (and don't produce the kinds of common results from common schemes). But think about what this means for strength meters. Once a scheme is known, the meter needs to magically adjust.
                        – Jeffrey Goldberg
                        Sep 10 at 22:03










                      • The sentence I am challenging is "The strength of a password ... is a function of the scheme that was used to create it." Yes, attackers know the schemes generally used to generate passwords, but they don't in practice know the specific scheme used to create any particular password. The meter doesn't need to "magically adjust", it just needs to be based on the techniques attackers are likely to use. I would say the strength of a password is a function of how likely an attacker is to guess it, regardless of how it was actually created.
                        – IMSoP
                        Sep 11 at 9:09

































                       
