Hashing email addresses for GDPR compliance













21















UPDATED



We have an unusual scenario: we have several old databases of user accounts. We'd like a new system to be able to connect these old accounts to new accounts on the new system, if the user wishes it.



So for example, on System X you have an old account with an old (let's say) RPG character. On System Y you have another old account, with another RPG character on it.



On our new system, with their new account, we'd like our users to be able to search these old databases and claim their old RPG characters. (Our users want this functionality, too.)



We'd like to keep users' old account PII in our database for the sole purpose of allowing them to reconnect old accounts to their new accounts. This would benefit them and be a cool feature, but under GDPR and our privacy policy we will eventually need to delete this old PII from our databases.



BUT: what if we stored this old PII in such a way that it was irreversible? I.e., only someone who already has the information would ever get a positive match.



I'm not a security expert, but I understand that simple hashing (e.g. MD5) is far too easy to hack (to put it mildly), and (technically) doesn't require "additional information" (i.e. a key).



The good thing about MD5 is that it's fast (in the sense that it's deterministic), meaning we could scan a database of 100,000s of rows very quickly looking for a match.



If MD5 (and SHA) are considered insecure to the point of being pointless, what else can we do to scan a database looking for a match? I'm guessing modern hashing, like bcrypt, is designed to be slow for this very reason, and given that it's not deterministic, it's unsuitable.



If we merged several aspects of PII into a field (eg. FirstnameLastnameEmailDOB) and then hashed that, it would essentially become heavily salted. Is this a silly solution?





























  • 2





    Why do you need to pseudonymize them? You might have specific need to, but it is not a typical thing to need to do in this use case.

    – schroeder
    Jan 23 at 12:34












  • @schroeder Sorry, I thought I'd explained. Some of this PII is about to expire as per our privacy policy. Pseudonymization would allow us to keep this functionality without keeping their data.

    – Django Reinhardt
    Jan 23 at 13:52






  • 6





    Yep, that is a great situation for this use case. Kudos to your team for such great understanding of your policies!

    – schroeder
    Jan 23 at 13:54






  • 17





    "The good thing about MD5 is that it's fast, however, meaning we could scan a database of 100,000s rows" - not sure how the speed of MD5 plays a part here, since you are presumably only hashing the email once and searching a database of hashed emails? (And the DB search presumably uses an index...?)

    – MrWhite
    Jan 23 at 16:25






  • 3





    Isn't the point of that bit of the GDPR specifically to stop this? If I tell you "delete everything you have on me, GDPR says so", I want that gone from your records and never again relatable to me. I don't want an undo button for that.

    – Adam Barnes
    Jan 24 at 14:11
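
MrWhite's point above can be illustrated with a small sketch: the candidate address is hashed once, and the match is an indexed equality lookup rather than a scan that rehashes every row. The table layout, column names, and helper functions here are hypothetical, and plain SHA-256 only stands in for whatever digest is finally chosen; it is not a recommendation.

```python
import hashlib
import sqlite3


def email_digest(email: str) -> str:
    # Placeholder digest used only to illustrate the lookup pattern.
    # Assumption: addresses are trimmed and lower-cased before hashing.
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


def build_db(rows):
    # rows: iterable of (account_id, email) pairs from a hypothetical old system.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE old_accounts (id INTEGER PRIMARY KEY, email_hash TEXT NOT NULL)"
    )
    # The index is what keeps the lookup fast, regardless of hash speed.
    conn.execute(
        "CREATE INDEX idx_old_accounts_email_hash ON old_accounts (email_hash)"
    )
    conn.executemany(
        "INSERT INTO old_accounts (id, email_hash) VALUES (?, ?)",
        [(account_id, email_digest(email)) for account_id, email in rows],
    )
    return conn


def find_accounts(conn, candidate_email: str):
    # One hash computation for the candidate, then an indexed equality match.
    cur = conn.execute(
        "SELECT id FROM old_accounts WHERE email_hash = ? ORDER BY id",
        (email_digest(candidate_email),),
    )
    return [row[0] for row in cur.fetchall()]
```

Only the digests are stored in this sketch; the plaintext addresses are used once at seeding time and then discarded.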















hash privacy anonymity gdpr pseudonymization














asked Jan 23 at 12:33 by Django Reinhardt
edited Jan 24 at 16:59

















2 Answers


















36














MD5 or SHA is not the concern. Hashes can be used for pseudonymization. The problem is that the hash would need to be salted (or peppered) so that data from other sources could not be used to identify the person.



My email is the same everywhere. A hash of it would also be the same. So that means that, in this case, the hash and my email become synonymous, just like a username paired with a person's legal name. If you use a plain hash in this case, you actually gain nothing in terms of GDPR.



Hashing with a salt (or pepper) makes de-anonymising nearly impossible without knowing the added value. The salt (or pepper) almost becomes the token, in this case.
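
A minimal sketch of the difference between an unkeyed hash and a peppered one, in Python. HMAC-SHA256 is used here as one possible keyed construction; the answer does not name a specific one, and the function names, the normalization rule, and the pepper values are illustrative assumptions.

```python
import hashlib
import hmac


def normalize_email(email: str) -> str:
    # Assumption: addresses are compared case-insensitively after trimming.
    return email.strip().lower()


def plain_hash(email: str) -> str:
    # Unkeyed SHA-256: anyone who knows the address can recompute this value,
    # so it links the record to the person just as the address itself would.
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def peppered_hash(email: str, pepper: bytes) -> str:
    # Keyed hash: the pepper is a secret kept outside the database.
    # Without it, the stored value cannot be recomputed from the address alone.
    return hmac.new(
        pepper, normalize_email(email).encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

With the same pepper the output is deterministic, so it can still be matched when the user supplies the address. Someone holding only the stored values, without the pepper, cannot simply recompute them from a list of known addresses.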



As always, check with your DPO.


























  • 2





    You probably should still use a password hash, not one designed for speed. Email addresses follow common patterns and may only have very short unique parts, which would leave some of them equivalent to short passwords that can be brute-forced if only protected by a single pass of MD5 or SHA.

    – Dan Neely
    Jan 23 at 16:03






  • 5





    "Hashing with a salt makes de-anonymising nearly impossible without knowing the salt." Since the salt is usually stored right next to the hash, shouldn't it be assumed that the salt is known?

    – kapex
    Jan 23 at 17:33






  • 9





    For efficient database lookups, consider using a pepper instead.

    – NieDzejkob
    Jan 23 at 20:49






  • 6





    @DanNeely using a password-grade hash and a proper salt (unique for each user) would make the lookups prohibitively expensive; with password verification, you have already selected the user and know which salt to use, but in this case, you don't know which user it is and so have to try all of the salts

    – kbolino
    Jan 24 at 2:42







  • 2





    @kbolino the lookup should still be fast; as NieDzejkob pointed out, you just can't use a unique salt. Since the actual recovery process should be rarely run, you can compensate for that with much higher difficulty factors than would otherwise be acceptable for a login. 10 or 20 seconds to hash the candidate email is fine, since once you've done it once you can do a fast DB lookup afterward, while the extreme slowness of the hash means that even without the need to do each user separately, a brute-force attack is prohibitively expensive. Just rent a big cloud VM for the initial seeding.

    – Dan Neely
    Jan 24 at 3:09
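
The approach discussed in these comments (one shared pepper instead of per-user salts, a deliberately slow hash, then an ordinary fast lookup) might look roughly like this in Python. scrypt and its cost parameters are my assumptions, as are the names `PEPPER`, `slow_email_hash`, `seed_index`, and `claim`. The cost is kept low here so the example runs quickly, whereas the comment above suggests far higher settings.

```python
import hashlib
from collections import defaultdict

# Hypothetical site-wide secret; in practice it would come from a secrets
# store kept apart from the database, never from source code.
PEPPER = b"example-pepper-not-for-production"


def slow_email_hash(email: str, pepper: bytes = PEPPER) -> bytes:
    # The same pepper is passed as scrypt's salt for every row, so the result
    # is deterministic and can be used as a lookup key.
    # Assumption: addresses are trimmed and lower-cased before hashing.
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.scrypt(normalized, salt=pepper, n=2**14, r=8, p=1, dklen=32)


def seed_index(old_accounts):
    # One-off seeding pass: hash every old address once, keep only the digest.
    # old_accounts: iterable of (account_id, email) pairs.
    index = defaultdict(list)
    for account_id, email in old_accounts:
        index[slow_email_hash(email)].append(account_id)
    return dict(index)


def claim(index, candidate_email: str):
    # One slow hash for the candidate, then a constant-time dictionary lookup.
    return index.get(slow_email_hash(candidate_email), [])
```

A per-row random salt would break this pattern, because the candidate would have to be rehashed against every row's salt, which is the cost kbolino describes.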


















3














Realistically, pseudonymization is any method of obfuscating someone's PII/NPI so that it can't reasonably be traced back to a specific individual. GDPR doesn't dictate which hashing algorithm you must use to comply with its standard, and to be honest, it's best that it doesn't: if everyone used the exact same method of obfuscation, you'd be creating a massive single point of failure all around. Your best bet (as mentioned above) is to use some form of tokenization with a salt, to add extra randomness to your algorithm so that it can't be easily brute-forced.
























  • 8





    From an information security perspective, the idea that it's bad to have a single widely used obfuscation method is dubious (it's either secure or not). However, it is accurate that standardizing the method by law could pose a problem, since it could become outdated.

    – Christoph Burschka
    Jan 23 at 16:18











  • The legislation that the GDPR replaced (the Data Protection Directive 95/46/EC) is over 20 years old. IIRC, in the mid-1990s, MD5 was a pretty decent choice, and certainly among the better ones that were generally available; these days it's considered horribly inadequate, and even SHA-1 (which was designed to replace it) is a bad choice. Who knows what will happen to hash algorithms in the next 20-25 years? I agree, mandating any particular method or algorithm in the regulations themselves would be a bad thing to do.

    – a CVn
    Jan 24 at 9:42












































Answer by schroeder (score 36): answered Jan 23 at 12:42, edited Jan 24 at 20:28







  • 2





    You probably should still use a password hash not one designed for speed. Email addresses follow common patterns and may only have very short unique parts; which would leave some of them equivalent to short passwords that can be bruteforced if only protected by a single pass of MD5 or SHA.

    – Dan Neely
    Jan 23 at 16:03






  • 5





    "Hashing with a salt makes de-anonymising nearly impossible without knowing the salt." Since the salt is usually stored right next to the hash, shouldn't it be assumed that the salt is known?

    – kapex
    Jan 23 at 17:33






  • 9





    For efficient database lookups, consider using a pepper instead.

    – NieDzejkob
    Jan 23 at 20:49






  • 6





    @DanNeely using a password-grade hash and a proper salt (unique for each user) would make the lookups prohibitively expensive; with password verification, you have already selected the user and know which salt to use, but in this case, you don't know which user it is and so have to try all of the salts

    – kbolino
    Jan 24 at 2:42







  • 2





    @kbolino the lookup should still be fast, as NieDzejkob pointed out you just can't use a unique salt. Since the actual recovery process should be rarely run you can compensate for that with much higher difficulty factors than would otherwise be acceptable for a login. 10 or 20 seconds to hash the candidate email is fine, since once you're done it once you can do a fast DB lookup afterward; while the extreme slowness of the hash means that even without the need to do each user separately a brute force attack is prohibitively expensive. Just rent a big cloud VM for a for the initial seeding.

    – Dan Neely
    Jan 24 at 3:09












  • 2





    You probably should still use a password hash not one designed for speed. Email addresses follow common patterns and may only have very short unique parts; which would leave some of them equivalent to short passwords that can be bruteforced if only protected by a single pass of MD5 or SHA.

    – Dan Neely
    Jan 23 at 16:03






  • 5





    "Hashing with a salt makes de-anonymising nearly impossible without knowing the salt." Since the salt is usually stored right next to the hash, shouldn't it be assumed that the salt is known?

    – kapex
    Jan 23 at 17:33






  • 9





    For efficient database lookups, consider using a pepper instead.

    – NieDzejkob
    Jan 23 at 20:49






  • 6





    @DanNeely using a password-grade hash and a proper salt (unique for each user) would make the lookups prohibitively expensive; with password verification, you have already selected the user and know which salt to use, but in this case, you don't know which user it is and so have to try all of the salts

    – kbolino
    Jan 24 at 2:42







  • 2





    @kbolino the lookup should still be fast, as NieDzejkob pointed out you just can't use a unique salt. Since the actual recovery process should be rarely run you can compensate for that with much higher difficulty factors than would otherwise be acceptable for a login. 10 or 20 seconds to hash the candidate email is fine, since once you're done it once you can do a fast DB lookup afterward; while the extreme slowness of the hash means that even without the need to do each user separately a brute force attack is prohibitively expensive. Just rent a big cloud VM for a for the initial seeding.

    – Dan Neely
    Jan 24 at 3:09




















Realistically, pseudonymization is any method of obfuscating someone's PII/NPI so that it can't reasonably be traced back to a specific individual. GDPR doesn't dictate which hashing algorithm you must use to comply with its standard, and to be honest, it's best that it doesn't: if everyone used the exact same method of obfuscation, that would create a massive single point of failure. Your best bet (as mentioned above) is to use some form of tokenization with a salt, which adds extra randomness so that the tokens can't easily be brute-forced.
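
To make the brute-force concern concrete, here is a minimal Python sketch. It is an illustration only: the function names and the candidate generator are hypothetical, and it swaps in a secret key (HMAC-SHA256) for the extra randomness rather than a salt stored next to the hash, on the assumption that an attacker who copies the table does not also obtain the key.

```python
import hashlib
import hmac
from itertools import product
from typing import Callable, Iterable, Iterator, Set


def plain_sha256(email: str) -> str:
    """Fast, unkeyed hash: anyone can recompute it for a guessed address."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def keyed_token(email: str, key: bytes) -> str:
    """HMAC-SHA256 token: recomputing it requires the secret key."""
    return hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()


def guess_emails(names: Iterable[str], domains: Iterable[str]) -> Iterator[str]:
    """Enumerate first.last@domain candidates from small word lists."""
    names = list(names)
    for first, last, domain in product(names, names, domains):
        yield f"{first}.{last}@{domain}"


def crack(stored: Set[str], hash_fn: Callable[[str], str],
          candidates: Iterable[str]) -> Set[str]:
    """Return every candidate whose hash appears in the stored set."""
    return {c for c in candidates if hash_fn(c) in stored}
```

Run against a table of `plain_sha256` values, `crack` recovers any address that follows the guessed pattern; run against `keyed_token` values without the key, it recovers nothing.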

















answered Jan 23 at 14:50 by GhostInTheShell








  • 8





    From an information security perspective, the idea that it's bad to have a single widely used obfuscation method is dubious (it's either secure or not). However, it is accurate that standardizing the method by law could pose a problem, since it could become outdated.

    – Christoph Burschka
    Jan 23 at 16:18











  • The legislation that the GDPR replaced (the Data Protection Directive 95/46/EC) is over 20 years old. IIRC, in the mid-1990s, MD5 was a pretty decent choice, and certainly among the better ones that were generally available; these days it's considered horribly inadequate, and even SHA-1 (which was designed to replace it) is a bad choice. Who knows what will happen to hash algorithms in the next 20-25 years? I agree, mandating any particular method or algorithm in the regulations themselves would be a bad thing to do.

    – a CVn
    Jan 24 at 9:42




































