Hardware RNG Bias

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












5












$begingroup$


I have generator based on sampled noise (reverse-biased transistor - actually ChaosKey but that's not that important).



The noise source is fed into ADC and sampled with 16bit resolution (2 bytes). Problem is, the signal level is not high enough to cover whole range of ADC so what i am getting insted of 0-65535 i am getting something like 0-4096.



If I plot histogram of this data i see normal distribution and spectrum of the signal is more or less flat (white noise - looks fine on the graph)



But when i write this to a file or use it as bit stream, i have heavy bias. That is understandable - 4 bits from upper byte is always zero so the file is guaranteed to have lot more zeroes than ones (12.5% at least, probably even more). So the amount of entropy in this data is pretty low.



Now taking into account NIST SP 800-90B requirement for monitoring entropy sources i should monitor raw output to check if the generator didn't failed (stuck at some value etc.) i would still like to get something with more entropy and less bias.



My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?



BTW on the original bit stream with NIST Non IID test i am getting min(H_original, 8 X H_bitstring): 2.527848










share|improve this question









$endgroup$











  • $begingroup$
    What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
    $endgroup$
    – Paul Uszak
    Feb 15 at 17:39















5












$begingroup$


I have generator based on sampled noise (reverse-biased transistor - actually ChaosKey but that's not that important).



The noise source is fed into ADC and sampled with 16bit resolution (2 bytes). Problem is, the signal level is not high enough to cover whole range of ADC so what i am getting insted of 0-65535 i am getting something like 0-4096.



If I plot histogram of this data i see normal distribution and spectrum of the signal is more or less flat (white noise - looks fine on the graph)



But when i write this to a file or use it as bit stream, i have heavy bias. That is understandable - 4 bits from upper byte is always zero so the file is guaranteed to have lot more zeroes than ones (12.5% at least, probably even more). So the amount of entropy in this data is pretty low.



Now taking into account NIST SP 800-90B requirement for monitoring entropy sources i should monitor raw output to check if the generator didn't failed (stuck at some value etc.) i would still like to get something with more entropy and less bias.



My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?



BTW on the original bit stream with NIST Non IID test i am getting min(H_original, 8 X H_bitstring): 2.527848










share|improve this question









$endgroup$











  • $begingroup$
    What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
    $endgroup$
    – Paul Uszak
    Feb 15 at 17:39













5












5








5


2



$begingroup$


I have generator based on sampled noise (reverse-biased transistor - actually ChaosKey but that's not that important).



The noise source is fed into ADC and sampled with 16bit resolution (2 bytes). Problem is, the signal level is not high enough to cover whole range of ADC so what i am getting insted of 0-65535 i am getting something like 0-4096.



If I plot histogram of this data i see normal distribution and spectrum of the signal is more or less flat (white noise - looks fine on the graph)



But when i write this to a file or use it as bit stream, i have heavy bias. That is understandable - 4 bits from upper byte is always zero so the file is guaranteed to have lot more zeroes than ones (12.5% at least, probably even more). So the amount of entropy in this data is pretty low.



Now taking into account NIST SP 800-90B requirement for monitoring entropy sources i should monitor raw output to check if the generator didn't failed (stuck at some value etc.) i would still like to get something with more entropy and less bias.



My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?



BTW on the original bit stream with NIST Non IID test i am getting min(H_original, 8 X H_bitstring): 2.527848










share|improve this question









$endgroup$




I have generator based on sampled noise (reverse-biased transistor - actually ChaosKey but that's not that important).



The noise source is fed into ADC and sampled with 16bit resolution (2 bytes). Problem is, the signal level is not high enough to cover whole range of ADC so what i am getting insted of 0-65535 i am getting something like 0-4096.



If I plot histogram of this data i see normal distribution and spectrum of the signal is more or less flat (white noise - looks fine on the graph)



But when i write this to a file or use it as bit stream, i have heavy bias. That is understandable - 4 bits from upper byte is always zero so the file is guaranteed to have lot more zeroes than ones (12.5% at least, probably even more). So the amount of entropy in this data is pretty low.



Now taking into account NIST SP 800-90B requirement for monitoring entropy sources i should monitor raw output to check if the generator didn't failed (stuck at some value etc.) i would still like to get something with more entropy and less bias.



My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?



BTW on the original bit stream with NIST Non IID test i am getting min(H_original, 8 X H_bitstring): 2.527848







random-number-generator entropy






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Feb 15 at 13:24









pigsterpigster

283




283











  • $begingroup$
    What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
    $endgroup$
    – Paul Uszak
    Feb 15 at 17:39
















  • $begingroup$
    What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
    $endgroup$
    – Paul Uszak
    Feb 15 at 17:39















$begingroup$
What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
$endgroup$
– Paul Uszak
Feb 15 at 17:39




$begingroup$
What does "8 X H_bitstring): 2.527848" mean exactly? What's the 8?
$endgroup$
– Paul Uszak
Feb 15 at 17:39










3 Answers
3






active

oldest

votes


















0












$begingroup$


Can i still treat such stream as unconditioned?




This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12 bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 Functionality classes for random number generators, and the "conditioned" term doesn't even appear.



I typically just measure mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that will be an adequate quality control. Remember that at this point, you're effectively considering a simple quantized analogue signal and stochastic wave methods are fine. An RMS level calculation might be appropriate too. Also bear in mind that thermal effects will alter the standard deviation /RMS. And you have to read Are reverse biased transistors stable?
Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with the budget USB type TRNGs, and long term drift can be difficult to diagnose.




i have heavy bias




This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases so all biases /correlations will be factored in when you measure the entropy rate. Ultimately, it will be taken care of within the randomness extraction mechanism, manifesting itself as a simple reduction of output rate.



So yes, it's okay to drop the high bits. Actually, it's extremely common to bit drop and virtually all of the large fast TRNGs do it for various reasons.






share|improve this answer











$endgroup$




















    2












    $begingroup$

    If high nybble of each byte is always exactly zero, then obviously you can't lose entropy by discarding it: the random output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.



    First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of ways it could fail, informed by physics and engineering. Part of your model might include that the high nybble of each output byte is zero: if that changes, could that be evidence of a failure mode in your understanding of the physical device?



    You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with high true positive rate or statistical power to report failure, and with whatever false positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.



    This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like if the ratio of zero bits to one bits is excessively far from 1/2, but they're very limited because the designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)



    But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device can generate samples with at least $geq256$ bits min-entropy—if you have to read a kilobyte out of the device to get that, so be it.



    Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, if you're 100% sure in advance that they will be zero. As long as the input has $geq256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.




    My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?




    If you are 100% sure that they will always—in functioning and broken devices—be zero, then yes. The point of the unconditioned rule is that after conditioning like SHA3-256, statistical tests are essentially useless. You can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, then that can't hurt your ability to distinguish a working device from a failed device based on the physics.



    Unless, of course, one of the failure modes causes the bits have become one when they should have been zero, in which case, well, don't discard them—test them!




    * There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically kind of interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use a randomness extractor to massage a nonuniform distribution into more-uniform, you need to also already have a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)






    share|improve this answer









    $endgroup$












    • $begingroup$
      If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
      $endgroup$
      – Paul Uszak
      Feb 19 at 17:44










    • $begingroup$
      And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
      $endgroup$
      – Paul Uszak
      Feb 19 at 17:53






    • 2




      $begingroup$
      Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
      $endgroup$
      – Squeamish Ossifrage
      Feb 19 at 18:21


















    1












    $begingroup$

    Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.



    An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.






    share|improve this answer









    $endgroup$








    • 4




      $begingroup$
      I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
      $endgroup$
      – Future Security
      Feb 15 at 21:00










    • $begingroup$
      I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
      $endgroup$
      – Future Security
      Feb 19 at 19:45










    Your Answer





    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "281"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67334%2fhardware-rng-bias%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0












    $begingroup$


    Can i still treat such stream as unconditioned?




    This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12 bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 Functionality classes for random number generators, and the "conditioned" term doesn't even appear.



    I typically just measure mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that will be an adequate quality control. Remember that at this point, you're effectively considering a simple quantized analogue signal and stochastic wave methods are fine. An RMS level calculation might be appropriate too. Also bear in mind that thermal effects will alter the standard deviation /RMS. And you have to read Are reverse biased transistors stable?
    Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with the budget USB type TRNGs, and long term drift can be difficult to diagnose.




    i have heavy bias




    This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases so all biases /correlations will be factored in when you measure the entropy rate. Ultimately, it will be taken care of within the randomness extraction mechanism, manifesting itself as a simple reduction of output rate.



    So yes, it's okay to drop the high bits. Actually, it's extremely common to bit drop and virtually all of the large fast TRNGs do it for various reasons.






    share|improve this answer











    $endgroup$

















      0












      $begingroup$


      Can i still treat such stream as unconditioned?




      This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12 bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 Functionality classes for random number generators, and the "conditioned" term doesn't even appear.



      I typically just measure mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that will be an adequate quality control. Remember that at this point, you're effectively considering a simple quantized analogue signal and stochastic wave methods are fine. An RMS level calculation might be appropriate too. Also bear in mind that thermal effects will alter the standard deviation /RMS. And you have to read Are reverse biased transistors stable?
      Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with the budget USB type TRNGs, and long term drift can be difficult to diagnose.




      i have heavy bias




      This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases so all biases /correlations will be factored in when you measure the entropy rate. Ultimately, it will be taken care of within the randomness extraction mechanism, manifesting itself as a simple reduction of output rate.



      So yes, it's okay to drop the high bits. Actually, it's extremely common to bit drop and virtually all of the large fast TRNGs do it for various reasons.






      share|improve this answer











      $endgroup$















        0












        0








        0





        $begingroup$


        Can i still treat such stream as unconditioned?




        This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12 bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 Functionality classes for random number generators, and the "conditioned" term doesn't even appear.



        I typically just measure mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that will be an adequate quality control. Remember that at this point, you're effectively considering a simple quantized analogue signal and stochastic wave methods are fine. An RMS level calculation might be appropriate too. Also bear in mind that thermal effects will alter the standard deviation /RMS. And you have to read Are reverse biased transistors stable?
        Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with the budget USB type TRNGs, and long term drift can be difficult to diagnose.




        i have heavy bias




        This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases so all biases /correlations will be factored in when you measure the entropy rate. Ultimately, it will be taken care of within the randomness extraction mechanism, manifesting itself as a simple reduction of output rate.



        So yes, it's okay to drop the high bits. Actually, it's extremely common to bit drop and virtually all of the large fast TRNGs do it for various reasons.






        share|improve this answer











        $endgroup$




        Can i still treat such stream as unconditioned?




        This "conditioned" idea is pretty much a red herring and mostly just a semantic distinction. The entropy generator is what it is. If you drop the high byte, the low byte is the signal. With a 12 bit peak signal, the lower eight bits will supply you with a good amount of entropy. I much prefer the saner AIS 20 / AIS 31 Functionality classes for random number generators, and the "conditioned" term doesn't even appear.



        I typically just measure mean and standard deviation of the signal for monitoring. If the signal comes directly from a physical device such as an ADC, that will be an adequate quality control. Remember that at this point, you're effectively considering a simple quantized analogue signal and stochastic wave methods are fine. An RMS level calculation might be appropriate too. Also bear in mind that thermal effects will alter the standard deviation /RMS. And you have to read Are reverse biased transistors stable?
        Transistors do not work backwards for long. They break down permanently and the noise changes. It's a common problem with the budget USB type TRNGs, and long term drift can be difficult to diagnose.




        i have heavy bias




        This is pretty irrelevant too. As bias increases, correlation increases and entropy decreases so all biases /correlations will be factored in when you measure the entropy rate. Ultimately, it will be taken care of within the randomness extraction mechanism, manifesting itself as a simple reduction of output rate.



        So yes, it's okay to drop the high bits. Actually, it's extremely common to bit drop and virtually all of the large fast TRNGs do it for various reasons.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Feb 15 at 22:49

























        answered Feb 15 at 17:36









        Paul UszakPaul Uszak

        7,40611636




        7,40611636





















            2












            $begingroup$

            If high nybble of each byte is always exactly zero, then obviously you can't lose entropy by discarding it: the random output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.



            First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of ways it could fail, informed by physics and engineering. Part of your model might include that the high nybble of each output byte is zero: if that changes, could that be evidence of a failure mode in your understanding of the physical device?



            You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with high true positive rate or statistical power to report failure, and with whatever false positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.



            This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like if the ratio of zero bits to one bits is excessively far from 1/2, but they're very limited because the designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)



            But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device can generate samples with at least $geq256$ bits min-entropy—if you have to read a kilobyte out of the device to get that, so be it.



            Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, if you're 100% sure in advance that they will be zero. As long as the input has $geq256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.




            My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?




            If you are 100% sure that they will always—in functioning and broken devices—be zero, then yes. The point of the unconditioned rule is that after conditioning like SHA3-256, statistical tests are essentially useless. You can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, then that can't hurt your ability to distinguish a working device from a failed device based on the physics.



            Unless, of course, one of the failure modes causes the bits have become one when they should have been zero, in which case, well, don't discard them—test them!




            * There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically kind of interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use a randomness extractor to massage a nonuniform distribution into more-uniform, you need to also already have a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)






            share|improve this answer









            $endgroup$












            • $begingroup$
              If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:44










            • $begingroup$
              And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:53






            • 2




              $begingroup$
              Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
              $endgroup$
              – Squeamish Ossifrage
              Feb 19 at 18:21















            2












            $begingroup$

            If high nybble of each byte is always exactly zero, then obviously you can't lose entropy by discarding it: the random output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.



            First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of ways it could fail, informed by physics and engineering. Part of your model might include that the high nybble of each output byte is zero: if that changes, could that be evidence of a failure mode in your understanding of the physical device?



            You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with high true positive rate or statistical power to report failure, and with whatever false positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.



            This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like if the ratio of zero bits to one bits is excessively far from 1/2, but they're very limited because the designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)



            But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device can generate samples with at least $geq256$ bits min-entropy—if you have to read a kilobyte out of the device to get that, so be it.



            Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, if you're 100% sure in advance that they will be zero. As long as the input has $geq256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.




            My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?




            If you are 100% sure that they will always—in functioning and broken devices—be zero, then yes. The point of the unconditioned rule is that after conditioning like SHA3-256, statistical tests are essentially useless. You can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, then that can't hurt your ability to distinguish a working device from a failed device based on the physics.



            Unless, of course, one of the failure modes causes the bits have become one when they should have been zero, in which case, well, don't discard them—test them!




            * There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically kind of interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use a randomness extractor to massage a nonuniform distribution into more-uniform, you need to also already have a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)






            share|improve this answer









            $endgroup$












            • $begingroup$
              If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:44










            • $begingroup$
              And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:53






            • 2




              $begingroup$
              Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
              $endgroup$
              – Squeamish Ossifrage
              Feb 19 at 18:21













            2












            2








            2





            $begingroup$

            If high nybble of each byte is always exactly zero, then obviously you can't lose entropy by discarding it: the random output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.



            First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of ways it could fail, informed by physics and engineering. Part of your model might include that the high nybble of each output byte is zero: if that changes, could that be evidence of a failure mode in your understanding of the physical device?



            You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with high true positive rate or statistical power to report failure, and with whatever false positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.



            This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like if the ratio of zero bits to one bits is excessively far from 1/2, but they're very limited because the designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)



            But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device can generate samples with at least $geq256$ bits min-entropy—if you have to read a kilobyte out of the device to get that, so be it.



            Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, if you're 100% sure in advance that they will be zero. As long as the input has $geq256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.




            My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?




            If you are 100% sure that they will always—in functioning and broken devices—be zero, then yes. The point of the unconditioned rule is that after conditioning like SHA3-256, statistical tests are essentially useless. You can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, then that can't hurt your ability to distinguish a working device from a failed device based on the physics.



            Unless, of course, one of the failure modes causes the bits have become one when they should have been zero, in which case, well, don't discard them—test them!




            * There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically kind of interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use a randomness extractor to massage a nonuniform distribution into more-uniform, you need to also already have a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)






            share|improve this answer









            $endgroup$



            If high nybble of each byte is always exactly zero, then obviously you can't lose entropy by discarding it: the random output of any one-to-one (deterministic) function of a random input has exactly the same entropy as the input. But the story doesn't stop there.



            First, you should have a probabilistic model of the entropy source, and a collection of probabilistic models of ways it could fail, informed by physics and engineering. Part of your model might include that the high nybble of each output byte is zero: if that changes, could that be evidence of a failure mode in your understanding of the physical device?



            You should devise a test to distinguish the null hypothesis that your device is working from the collection of alternative hypotheses that your device has failed, with high true positive rate or statistical power to report failure, and with whatever false positive rate—often confusingly called ‘statistical significance level’—you are willing to tolerate. This lets you raise an alarm when the device has broken.



            This test should be informed by the specific details of your device. Generic tests like dieharder or the NIST entropy suite or what have you are designed in ignorance of your device. They can detect some particular deviations from uniform, like if the ratio of zero bits to one bits is excessively far from 1/2, but they're very limited because the designers had no idea about the particular device you have in your hands. (More details on what generic ‘entropy tests’ can do.)



            But don't worry about using the bits from the entropy source directly, or about whatever your favorite measure of nonuniformity is, as long as the probabilistic model for the device can generate samples with at least $geq256$ bits min-entropy—if you have to read a kilobyte out of the device to get that, so be it.



            Next, you should feed the output through a standard preimage-resistant hash function like SHA3-256 (‘conditioning’).* You can lop off the high-order zero bits first if you want, or not: it doesn't matter, if you're 100% sure in advance that they will be zero. As long as the input has $geq256$ bits of min-entropy, the output can essentially be treated as if it has 256 bits of min-entropy, or as if it were a uniform random string.




            My idea is simple - throw out each higher byte from each sample (the one which is mostly zero) and leave just the lover one. The question is, is this OK? Can i still treat such stream as unconditioned?




            If you are 100% sure that they will always—in functioning and broken devices—be zero, then yes. The point of the unconditioned rule is that after conditioning like SHA3-256, statistical tests are essentially useless. You can test guesses about exactly what string of bits the device produced, but that's all. In contrast, if you're simply discarding bits of output that you can predict in advance will be zero, then that can't hurt your ability to distinguish a working device from a failed device based on the physics.



            Unless, of course, one of the failure modes causes the bits have become one when they should have been zero, in which case, well, don't discard them—test them!




            * There is literature on a subject called ‘randomness extractors’, typically constructed out of universal hash families like multiplication by a uniform random matrix. These are theoretically kind of interesting objects that aim to illuminate how much min-entropy you need in a nonuniform distribution to get within a certain total variation distance of a uniform distribution. But in practice, randomness extractors per se are essentially useless, because in order to use a randomness extractor to massage a nonuniform distribution into more-uniform, you need to also already have a uniform distribution on the extractor seed. With that uniform random seed you could have just done normal cryptography. (More details on randomness extractors and practical cryptography.)







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Feb 19 at 17:27









            Squeamish OssifrageSqueamish Ossifrage

            19.3k12883




            19.3k12883











            • $begingroup$
              If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:44










            • $begingroup$
              And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:53






            • 2




              $begingroup$
              Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
              $endgroup$
              – Squeamish Ossifrage
              Feb 19 at 18:21
















            • $begingroup$
              If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:44










            • $begingroup$
              And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
              $endgroup$
              – Paul Uszak
              Feb 19 at 17:53






            • 2




              $begingroup$
              Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
              $endgroup$
              – Squeamish Ossifrage
              Feb 19 at 18:21















            $begingroup$
            If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
            $endgroup$
            – Paul Uszak
            Feb 19 at 17:44




            $begingroup$
            If randomness extractors are useless, the implication is that there are no TRNGs in this world! Yet strangely they seem to exist, are available for sale and I even have some on my desk. Hmm...
            $endgroup$
            – Paul Uszak
            Feb 19 at 17:44












            $begingroup$
            And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
            $endgroup$
            – Paul Uszak
            Feb 19 at 17:53




            $begingroup$
            And you've even referred to the ID Quantique TRNG technical papers on how they work. Which are also for sale, and pretty popular.
            $endgroup$
            – Paul Uszak
            Feb 19 at 17:53




            2




            2




            $begingroup$
            Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
            $endgroup$
            – Squeamish Ossifrage
            Feb 19 at 18:21




            $begingroup$
            Whatever ID Quantique is selling, the theorems they use in their garbled marketing brochure don't apply when the premises don't hold. It's unclear how you draw the conclusion that TRNGs don't exist—if they don't, then the physical world is obviously 100% deterministic and predictable—from a rather mundane observation that the technical definition of ‘randomness extractor’ in the literature isn't useful in practice without begging the question. Computing the product of a fixed public matrix with a sample may or may not ruin its entropy; the theorems of the RE literature are just not applicable.
            $endgroup$
            – Squeamish Ossifrage
            Feb 19 at 18:21











            1












            $begingroup$

            Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.



            An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.






            share|improve this answer









            $endgroup$








            • 4




              $begingroup$
              I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
              $endgroup$
              – Future Security
              Feb 15 at 21:00










            • $begingroup$
              I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
              $endgroup$
              – Future Security
              Feb 19 at 19:45















            1












            $begingroup$

            Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.



            An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.






            share|improve this answer









            $endgroup$








            • 4




              $begingroup$
              I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
              $endgroup$
              – Future Security
              Feb 15 at 21:00










            • $begingroup$
              I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
              $endgroup$
              – Future Security
              Feb 19 at 19:45













            1












            1








            1





            $begingroup$

            Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.



            An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.






            share|improve this answer









            $endgroup$



            Tossing the top byte sounds like a good starting point (or maybe just the top four bits), but I'd run the test on the result and see what comes up. There may be bias in the lower bits as well that's not obvious right now.



            An alternative is to use a randomness extractor. That way you should be able to massage the data to be a usable RNG even if there is bias.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Feb 15 at 16:20









            SwashbucklerSwashbuckler

            47824




            47824







            • 4




              $begingroup$
              I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
              $endgroup$
              – Future Security
              Feb 15 at 21:00










            • $begingroup$
              I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
              $endgroup$
              – Future Security
              Feb 19 at 19:45












            • 4




              $begingroup$
              I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
              $endgroup$
              – Future Security
              Feb 15 at 21:00










            • $begingroup$
              I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
              $endgroup$
              – Future Security
              Feb 19 at 19:45







            4




            4




            $begingroup$
            I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
            $endgroup$
            – Future Security
            Feb 15 at 21:00




            $begingroup$
            I wouldn't characterize a randomness extractor as an alternative. An extractor is the answer. Anything else isn't a great idea.
            $endgroup$
            – Future Security
            Feb 15 at 21:00












            $begingroup$
            I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
            $endgroup$
            – Future Security
            Feb 19 at 19:45




            $begingroup$
            I shouldn't have used the term randomness extractor. See Squeamish Ossifrage's answer. I mean something like a cryptographic-hash-based conditioning algorithm. Using "randomness extractor" this way (to include hashes and Von Neumann's method) looks like a really common misuse of terminology. The (actual) answer is to use a good secure hash function.
            $endgroup$
            – Future Security
            Feb 19 at 19:45

















            draft saved

            draft discarded
















































            Thanks for contributing an answer to Cryptography Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f67334%2fhardware-rng-bias%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown






            Popular posts from this blog

            How to check contact read email or not when send email to Individual?

            How many registers does an x86_64 CPU actually have?

            Nur Jahan