Does Format Preserving Encryption have significant advantages over a randomly generated lookup table?

I have a need to anonymise phone numbers so that I can carry out testing and analysis work on telecoms data sets and comply with GDPR. I typically receive a batch of a few hundred thousand events containing phone numbers, and need to anonymise all the phone numbers in that batch under the following conditions:

A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6

I should always transform the digits in the same way - abcd should always be mapped to efgh

The transformation should be one-to-one - efgh should be that output for only one input

I should only output digits

If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.

I've considered two approaches to this requirement:

Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch

Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library

In some cases I may need to reconstruct the original phone numbers in the future in which case I will store the encryption key or lookup table in some secure fashion. In some cases I know I will not need to reconstruct them in which case I will discard the key directly after encryption.

Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:

Using a random number generator in 1 may be slightly less robust than using FPE

The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it

A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation

Are there any other considerations I should have in mind?

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

add a comment |

A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6

I should always transform the digits in the same way - abcd should always be mapped to efgh

The transformation should be one-to-one - efgh should be that output for only one input

I should only output digits

If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.

I've considered two approaches to this requirement:

Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch

Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library

Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:

Using a random number generator in 1 may be slightly less robust than using FPE

The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it

A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation

Are there any other considerations I should have in mind?

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

add a comment |

A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6

I should always transform the digits in the same way - abcd should always be mapped to efgh

The transformation should be one-to-one - efgh should be that output for only one input

I should only output digits

If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.

I've considered two approaches to this requirement:

Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch

Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library

Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:

Using a random number generator in 1 may be slightly less robust than using FPE

The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it

A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation

Are there any other considerations I should have in mind?

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

A definable prefix of the number should remain the same - I should only transform the right-most n digits. n will typically be between 2 and 6

I should always transform the digits in the same way - abcd should always be mapped to efgh

The transformation should be one-to-one - efgh should be that output for only one input

I should only output digits

If I then receive a further batch of events condition 2 is removed - I can use a new transformation for a new batch.

I've considered two approaches to this requirement:

Randomly create a lookup table mapping each n digit string to another n digit string - I will do this for all n digit strings as a one-time exercise prior to encrypting a batch, and generate a new lookup table for a new batch

Use one of the Format Preserving Encryption algorithms - e.g. Format-preserving, Feistel-based encryption (FFX) as implemented in the PyFFX or libffx libraries, or FE1 in the botan library

Is there any advantage to using approach 2 over approach 1? I can see the following possibilities:

Using a random number generator in 1 may be slightly less robust than using FPE

The encryption key for FPE can be substantially smaller than the mapping table in point 1 if there is a need to retain it

A publicly available FPE implementation may contain fewer vulnerabilities than a home-grown random lookup implementation

Are there any other considerations I should have in mind?

substitution-cipher format-preserving

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

edited Feb 4 at 17:11

asked Feb 4 at 14:57

sumidid

333

asked Feb 4 at 14:57

sumidid

333

asked Feb 4 at 14:57

sumidid

333

add a comment |

1 Answer
1

active

oldest

votes

any other considerations?

Yes.

In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

FPE's performance is predictable, when some map implementation have no specified worst-case performance.

FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

1

$begingroup$
Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
$endgroup$
– sumidid
Feb 4 at 17:14

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "281"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67049%2fdoes-format-preserving-encryption-have-significant-advantages-over-a-randomly-ge%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

any other considerations?

Yes.

In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

FPE's performance is predictable, when some map implementation have no specified worst-case performance.

FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

1

$begingroup$
Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
$endgroup$
– sumidid
Feb 4 at 17:14

add a comment |

any other considerations?

Yes.

In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

FPE's performance is predictable, when some map implementation have no specified worst-case performance.

FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

1

$begingroup$
Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
$endgroup$
– sumidid
Feb 4 at 17:14

add a comment |

any other considerations?

Yes.

In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

FPE's performance is predictable, when some map implementation have no specified worst-case performance.

FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

any other considerations?

Yes.

In many common use cases the mapping table needs to be retained. That map changes each time a number is added; that's a backup / continuity of service headache.

The map is security-sensitive: it contains all the clear phone numbers, and information which (combined with other information) allows getting back to users.

The map can be large (at least about twice as large as the numbers themselves); that's a (perhaps minor) storage issue.

When the table grows large, there will be collisions in the mapping table, that's a special case which must be handled, and tested.

Timing attack on the map search code has the potential to leak information (that's also true of some FPE implementations, but proper FPE is less likely to leak something meaningful; in particular FPE won't leak if the creation of the entry was recent, which can happen with a map).

FPE's performance is predictable, when some map implementation have no specified worst-case performance.

FPE allows efficient implementation of direct and reverse map, when not all map implementations allow that.

Bottom line: encryption is desirable there. FPE is useful only if size or interoperability is a concern.

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

edited Feb 4 at 17:15

answered Feb 4 at 15:45

fgrieu

80.5k7171340

answered Feb 4 at 15:45

fgrieu

80.5k7171340

answered Feb 4 at 15:45

fgrieu

80.5k7171340

1

$begingroup$
Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
$endgroup$
– sumidid
Feb 4 at 17:14

add a comment |

1

$begingroup$
Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering
$endgroup$
– sumidid
Feb 4 at 17:14

Thanks fgrieu! I wasn't clear on a few points: * It's not always necessary to retain the mapping table * I will create an exhaustive map once prior to encryption, rather than in online manner * I will typically need to encrypt the last between 2 and 6 digits of each phone number I've edited the question to reflect these changes. I think that given these points, your view on the advantages of FPE would be: - Less vulnerability to timing attacks - Predictable performance (maybe less important with small data) - Efficient implementation- definitely worth considering

– sumidid
Feb 4 at 17:14

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Cryptography Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu