Does Format Preserving Encryption have significant advantages over a randomly generated lookup table?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












6












$begingroup$


I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:



  1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


  2. I should always transform the digits in the same way - abcd should always be mapped to efgh


  3. The transformation should be one-to-one - efgh should be that output for only one input


  4. I should only output digits


If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



I've considered two approaches to this requirement:



  1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


  2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library


In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:



  1. Using a random number generator in 1 may be slightly less robust than using FPE


  2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


  3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation


Are there any other considerations I should have in mind?










share|improve this question











$endgroup$
















    6












    $begingroup$


    I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:



    1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


    2. I should always transform the digits in the same way - abcd should always be mapped to efgh


    3. The transformation should be one-to-one - efgh should be that output for only one input


    4. I should only output digits


    If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



    I've considered two approaches to this requirement:



    1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


    2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library


    In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



    Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:



    1. Using a random number generator in 1 may be slightly less robust than using FPE


    2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


    3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation


    Are there any other considerations I should have in mind?










    share|improve this question











    $endgroup$














      6












      6








      6





      $begingroup$


      I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:



      1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


      2. I should always transform the digits in the same way - abcd should always be mapped to efgh


      3. The transformation should be one-to-one - efgh should be that output for only one input


      4. I should only output digits


      If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



      I've considered two approaches to this requirement:



      1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


      2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library


      In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



      Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:



      1. Using a random number generator in 1 may be slightly less robust than using FPE


      2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


      3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation


      Are there any other considerations I should have in mind?










      share|improve this question











      $endgroup$




      I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:



      1. A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6


      2. I should always transform the digits in the same way - abcd should always be mapped to efgh


      3. The transformation should be one-to-one - efgh should be that output for only one input


      4. I should only output digits


      If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.



      I've considered two approaches to this requirement:



      1. Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch


      2. Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library


      In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.



      Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:



      1. Using a random number generator in 1 may be slightly less robust than using FPE


      2. The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it


      3. A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation


      Are there any other considerations I should have in mind?







      substitution-cipher format-preserving






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Feb 4 at 17:11







      sumidid

















      asked Feb 4 at 14:57









      sumididsumidid

      333




      333




















          1 Answer
          1






          active

          oldest

          votes


















          5












          $begingroup$


          any other considerations?




          Yes.



          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            Feb 4 at 17:14










          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "281"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67049%2fdoes-format-preserving-encryption-have-significant-advantages-over-a-randomly-ge%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          5












          $begingroup$


          any other considerations?




          Yes.



          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            Feb 4 at 17:14















          5












          $begingroup$


          any other considerations?




          Yes.



          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$








          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            Feb 4 at 17:14













          5












          5








          5





          $begingroup$


          any other considerations?




          Yes.



          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.






          share|improve this answer











          $endgroup$




          any other considerations?




          Yes.



          • In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

          • The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

          • The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

          • When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

          • Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

          • FPE's performance is predictable, when some map implementation have no specified worst-case performance.

          • FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

          Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Feb 4 at 17:15

























          answered Feb 4 at 15:45









          fgrieufgrieu

          80.5k7171340




          80.5k7171340







          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            Feb 4 at 17:14












          • 1




            $begingroup$
            Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
            $endgroup$
            – sumidid
            Feb 4 at 17:14







          1




          1




          $begingroup$
          Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
          $endgroup$
          – sumidid
          Feb 4 at 17:14




          $begingroup$
          Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
          $endgroup$
          – sumidid
          Feb 4 at 17:14

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Cryptography Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67049%2fdoes-format-preserving-encryption-have-significant-advantages-over-a-randomly-ge%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown






          Popular posts from this blog

          Peggy Mitchell

          Palaiologos

          The Forum (Inglewood, California)