How can the sha1sum function give you a unique hash? [closed]

I read that using the sha1sum command will run the SHA-1 algorithm and give you a "unique" result, but how can that be?



sha1sum gives you a 40-character hash, like this:



e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e


This is 40 characters long, and each character is a hexadecimal digit, so the hash can have at most 16^40 combinations.



This should mean that, for any two random files, there is a small chance that their hash sum would be exactly the same, right?
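The arithmetic in the question can be checked directly; here is a minimal Python sketch (the input bytes are arbitrary, chosen just for illustration):

```python
import hashlib

# SHA-1 of some arbitrary input; any byte string works.
digest = hashlib.sha1(b"hello world\n").hexdigest()

# The hex digest is always 40 characters, i.e. 160 bits.
assert len(digest) == 40
assert all(c in "0123456789abcdef" for c in digest)

# 40 hex digits give 16^40 possible digests, which equals 2^160.
assert 16**40 == 2**160
```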

























closed as off-topic by dr01, schily, muru, Isaac, msp9011 Aug 22 at 4:35



  • This question does not appear to be about Unix or Linux within the scope defined in the help center.
If this question can be reworded to fit the rules in the help center, please edit the question.








  • See also a couple of related questions on crypt.SE: Is it fair to assume that SHA1 collisions won't occur on a set of <100k strings, How secure is SHA1? What are the chances of a real exploit?, Why is SHA-1 considered broken?
    – ilkkachu, Aug 21 at 9:29
  • This does not seem to be related to Unix, but to mathematics.
    – Kusalananda, Aug 21 at 9:34
  • I'm voting to close this question as off-topic because, even if it mentions the sha1sum command, it has nothing to do with Unix & Linux; it's a generic question about hash functions.
    – dr01, Aug 21 at 12:50
















asked Aug 21 at 8:53 by iamAguest
edited Aug 21 at 9:05 by Stephen Kitt






1 Answer

















Accepted answer (score 10):
A SHA-1 hash, like any hash, isn’t unique; in fact collisions have been found (and more importantly, crafted).



The desirable property of a hash function is that it's difficult to construct two files which produce the same hash, not that the hashes produced are unique (as you mention, a SHA-1 sum has 160 bits, so there are only 2^160 different hashes); "difficult" meaning that you can't find a collision faster than brute-force search. If you pick one file, another randomly-chosen file has a one-in-2^160 chance of having the same hash. SHA-1 has been considered insecure since 2005; you'll see the odds given there as one-in-2^80, thanks to the birthday attack (the odds there are those of finding two colliding files in a large haystack, not those of finding a colliding file for a specific target).
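The birthday-attack odds can be approximated numerically. This is a sketch using the standard birthday approximation p ≈ 1 − exp(−n²/2N), not anything specific to sha1sum itself:

```python
import math

N = 2**160  # number of possible SHA-1 digests

def collision_probability(n, space=N):
    """Approximate chance that n random files contain at least one
    colliding pair (birthday approximation: 1 - exp(-n^2 / 2N))."""
    return 1 - math.exp(-n * n / (2 * space))

# Two random files: the chance is so small it underflows to zero.
print(collision_probability(2))      # ~0.0

# Around 2^80 files, a collision becomes likely:
print(collision_probability(2**80))  # ~0.39
```

This is why the bit-strength of a hash against collisions is quoted as half its output size: at roughly 2^80 inputs the probability is already around 39%, and it climbs rapidly from there.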
























  • Two files have a 2^-160 chance of colliding, but having 2^80 files to test against each other gives about a 50% chance of a collision, which is why the bit-strength of a hash is usually divided by two to get the strength against collisions. (It's not a question of having a collision between the two files Stephen has, but having one within ~all files, ever.)
    – ilkkachu, Aug 21 at 9:18
  • 2^80 is about 10^24, which is about the mass of the Earth in kilograms, just to give the number some scale. Random collisions are astronomically improbable; it's the brokenness of the hash you need to care about.
    – ilkkachu, Aug 21 at 9:23
  • Ah yes, thanks @ilkkachu, it's really about the average number of operations required to find a collision (statistically), correct? So the basic odds are 1:2^160, but you don't need that many operations to find a collision because the start point isn't fixed. As in, if you pick one file, then another, there's a 1:2^160 chance they'll collide, but that's not the same as finding a collision in a 2^80-sized haystack.
    – Stephen Kitt, Aug 21 at 9:30
  • @StephenKitt, yea, exactly. Collisions are the case where you don't care what the colliding files are, so the birthday attack applies, hence 2^-80. Preimage resistance is different: that's where you're trying to find another file that gives the same hash as some single particular file, so the odds are 2^-160 for each file.
    – ilkkachu, Aug 21 at 9:39
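The "mass of the Earth" comparison in the comments is easy to sanity-check; a throwaway Python check (the Earth-mass figure, ~5.97e24 kg, is the usual approximate value):

```python
EARTH_MASS_KG = 5.97e24  # approximate mass of the Earth

n = 2**80
print(f"2^80 = {n:.3e}")  # 1.209e+24

# Same order of magnitude as the Earth's mass in kilograms.
assert 1e24 < n < 1e25
assert n / EARTH_MASS_KG < 1  # a bit under one Earth mass
```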

















answered Aug 21 at 8:55 by Stephen Kitt
edited Aug 21 at 9:44