Hardware RNG Bias
I have a generator based on sampled noise (a reverse-biased transistor, actually a ChaosKey, but that's not that important).
The noise source is fed into an ADC and sampled at 16-bit resolution (2 bytes). The problem is that the signal level is not high enough to cover the whole range of the ADC, so instead of 0-65535 I am getting something like 0-4096.
If I plot a histogram of this data I see a normal distribution, and the spectrum of the signal is more or less flat (white noise; it looks fine on the graph).
But when I write this to a file or use it as a bit stream, I have heavy bias. That is understandable: the top 4 bits of each 16-bit sample are always zero, so the file is guaranteed to contain many more zeroes than ones (at least 12.5 percentage points more zero bits than an unbiased stream, probably even more). So the amount of entropy in this data is pretty low.
Taking into account the NIST SP 800-90B requirement for monitoring entropy sources, I should monitor the raw output to check that the generator hasn't failed (stuck at some value, etc.), but I would still like to get something with more entropy and less bias.
My idea is simple: throw out the upper byte of each sample (the one which is mostly zero) and keep just the lower one. Is this OK? Can I still treat such a stream as unconditioned?
BTW, on the original bit stream the NIST non-IID test gives me min(H_original, 8 × H_bitstring) = 2.527848.
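To be concrete, this is roughly the post-processing I have in mind (a rough sketch; the file name and the little-endian sample format are just placeholders for my setup):

    import numpy as np

    # Read 16-bit unsigned little-endian ADC samples and keep only the low byte of each.
    raw = np.fromfile("raw_samples.bin", dtype="<u2")
    low_bytes = (raw & 0xFF).astype(np.uint8)
    low_bytes.tofile("low_bytes.bin")  # the candidate "unconditioned" output stream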
Tags: random-number-generator, entropy
asked Feb 15 at 13:24 by pigster
What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
– Paul Uszak, Feb 15 at 17:39
3 Answers
"Can I still treat such a stream as unconditioned?"
This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12-bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 functionality classes for random number generators, in which the "conditioned" term doesn't even appear.
I typically just measure the mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that is adequate quality control. Remember that at this point you're effectively dealing with a simple quantized analogue signal, so stochastic wave methods are fine. An RMS level calculation might be appropriate too. Bear in mind that thermal effects will alter the standard deviation/RMS. You should also read "Are reverse biased transistors stable?"
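Something along these lines is usually enough (a sketch only; the threshold ranges are placeholders you would calibrate against a known-good run of your own device):

    import numpy as np

    def block_health(samples, mean_range=(1800.0, 2300.0), std_range=(300.0, 700.0)):
        """Crude quality control on a block of raw ADC samples."""
        s = np.asarray(samples, dtype=np.float64)
        mean, std = s.mean(), s.std()
        rms = np.sqrt(np.mean(s ** 2))
        healthy = mean_range[0] < mean < mean_range[1] and std_range[0] < std < std_range[1]
        return healthy, {"mean": mean, "std": std, "rms": rms}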
Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with budget USB-type TRNGs, and the long-term drift can be difficult to diagnose.
"I have heavy bias"
This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases, so all biases/correlations are factored in when you measure the entropy rate. Ultimately the bias is taken care of within the randomness-extraction mechanism, manifesting itself as a simple reduction of the output rate.
So yes, it's okay to drop the high bits. Bit dropping is extremely common; virtually all of the large, fast TRNGs do it for various reasons.
– Paul Uszak, answered Feb 15 at 17:36 (edited Feb 15 at 22:49)
If the high nybble of each sample is always exactly zero, then obviously you can't lose entropy by discarding it: the output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.
First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of the ways it could fail, informed by physics and engineering. Part of your model might be that the high nybble of each output sample is zero: if that ever changes, could it be evidence of a failure mode, given your understanding of the physical device?
You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with a high true-positive rate or statistical power to report failure, and with whatever false-positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.
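For illustration only, a test of this shape might look like the sketch below; the expected mean and spread are invented placeholders, and a real test would be derived from your own probabilistic model of the device:

    import numpy as np

    def alarm(samples, expected_mean=2048.0, expected_std=500.0, k=6.0):
        """True if a block of 16-bit samples is inconsistent with the (assumed) model."""
        s = np.asarray(samples, dtype=np.uint16)
        if np.any(s >> 12):              # the model says the high nybble is always zero
            return True
        n = s.size
        block_mean = s.astype(np.float64).mean()
        # Under the model the block mean is roughly Normal(expected_mean, expected_std/sqrt(n)).
        z = abs(block_mean - expected_mean) / (expected_std / np.sqrt(n))
        return z > k                     # k sets the false-positive rate you tolerate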
This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like the proportion of zero bits being excessively far from 1/2, but they're very limited because their designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)
But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device says the data you read has $\geq 256$ bits of min-entropy—if you have to read a kilobyte out of the device to get that, so be it.
Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, as long as you're 100% sure in advance that they will be zero. As long as the input has $\geq 256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.
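A minimal sketch of that conditioning step (the 1024-byte read size is only an example of ‘enough data under your model’; read_raw is a stand-in for however you pull raw bytes from the device):

    import hashlib

    def conditioned_seed(read_raw, raw_bytes_needed=1024):
        """read_raw(n) is assumed to return n raw bytes from the entropy source."""
        raw = read_raw(raw_bytes_needed)       # enough data for >= 256 bits of min-entropy
        return hashlib.sha3_256(raw).digest()  # 32 bytes, treated as a uniform seed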
"My idea is simple: throw out the upper byte of each sample (the one which is mostly zero) and keep just the lower one. The question is, is this OK? Can I still treat such a stream as unconditioned?"
If you are 100% sure that those bits will always—in functioning and broken devices alike—be zero, then yes. The point of the unconditioned rule is that after conditioning with something like SHA3-256, statistical tests are essentially useless: you can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, that can't hurt your ability to distinguish a working device from a failed device based on the physics.
Unless, of course, one of the failure modes causes those bits to become one when they should have been zero, in which case, well, don't discard them—test them!
* There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use one to massage a nonuniform distribution into a more uniform one, you already need a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)
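To make the circularity concrete, here is a toy seeded extractor of the sort the literature studies (multiplication by a random 0/1 matrix over GF(2)); note that the seed matrix itself has to be uniformly random, which is exactly the problem:

    import numpy as np

    def toy_extract(bits, seed_matrix):
        """Multiply a 0/1 input vector by a 0/1 seed matrix over GF(2)."""
        bits = np.asarray(bits, dtype=np.int64) & 1
        return (seed_matrix.astype(np.int64) @ bits) & 1

    # The seed matrix must itself be uniformly random, i.e. you already need
    # exactly the kind of randomness you were hoping to produce.
    rng = np.random.default_rng()                # stand-in for a uniform seed source
    seed = rng.integers(0, 2, size=(256, 8192))  # maps 8192 input bits to 256 output bits
    # output_bits = toy_extract(raw_bits, seed)  # raw_bits: 8192 nonuniform 0/1 samples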
– Squeamish Ossifrage, answered Feb 19 at 17:27
If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
– Paul Uszak, Feb 19 at 17:44
And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
– Paul Uszak, Feb 19 at 17:53
Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
– Squeamish Ossifrage, Feb 19 at 18:21
Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.
An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.
– Swashbuckler, answered Feb 15 at 16:20
I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
– Future Security, Feb 15 at 21:00
I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
– Future Security, Feb 19 at 19:45