Random numbers have been one of the most useful
objects in statistics, computer science, cryptography, modeling,
simulation, and other applications though it is very dicult to
construct true randomness. Many solutions (e.g., cryptographic
pseudorandom generators) have been proposed to harness or
simulate randomness and many statistical testing techniques have
been proposed to determine whether a pseudorandom generator
produces high quality randomness. NIST SP800-22 (2010) proposes
the state of art testing suite for (pseudo) random generators
to detect deviations of a binary sequence from randomness. On
the one hand, as a counter example to NIST SP800-22 test suite,
it is easy to construct functions that are considered as GOOD
pseudorandom generators by NIST SP800-22 test suite though
the output of these functions are easily distinguishable from the
uniform distribution. Thus these functions are not pseudorandom
generators by definition. On the other hand, NIST SP800-22
does not cover some of the important laws for randomness. Two
fundamental limit theorems about random binary strings are
the central limit theorem and the law of the iterated logarithm
(LIL). Several frequency related tests in NIST SP800-22 cover
the central limit theorem while no NIST SP800-22 test covers
LIL.
This paper proposes techniques to address the above challenges
that NIST SP800-22 testing suite faces. Firstly, we propose
statistical distance based testing techniques for (pseudo) random
generators to reduce the above mentioned Type II errors in NIST
SP800-22 test suite. Secondly, we propose LIL based statistical
testing techniques, calculate the probabilities, and carry out
experimental tests on widely used pseudorandom generators
by generating around 30TB of pseudorandom sequences. The
experimental results show that for a sample size of 1000 sequences
(2TB), the statistical distance between the generated sequences
and the uniform distribution is around 0.07 (with 0 for statistically
indistinguishable and 1 for completely distinguishable)
and the root-mean-square deviation is around 0.005. Though the
statistical distance 0.07 and RMSD 0.005 are acceptable for some
applications, for a cryptographic "random oracle", the preferred
statistical distance should be smaller than 0.03 and RMSD be
smaller than 0.001 at the sample size 1000. These results justify
the importance of LIL testing techniques designed in this paper.
The experimental results in this paper are reproducible and the
raw experimental data are available at author's website.
↧