[liberationtech] A software for combining text files to obtain high quality pseudo-random sequences in practice
mok-kong shen
mok-kong.shen at t-online.de
Mon Jul 10 08:52:29 PDT 2017
Shannon did some experiments to determine the entropy in English texts.
A later work done by Cover and King [1] gave an estimate of 1.34 bits
per letter. This implies that, if the letters are coded into 5 bits, one
needs to appropriately combine 4 text files in order to obtain bit
sequences of full entropy, since 4*1.34 = 5.36 > 5. The method used in
our software is to map the letters a-z to the values 0-25, code each
value in 5 bits, and sum (mod 32) the values of the corresponding
letters of the text files.
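A minimal sketch of the combination step described above, in Python. It
is an illustration only and not the code of TEXTCOMBINE-SP: the function
names are hypothetical, and the handling of non-letters (dropped), of
upper case (folded to lower case) and of files of unequal length
(truncated to the shortest) are assumptions made here.

```python
import math


def files_needed(bits_per_symbol=5, entropy_per_letter=1.34):
    """Smallest number of texts whose summed entropy estimate reaches
    the symbol width, e.g. 4 for 5 bits at 1.34 bits per letter."""
    return math.ceil(bits_per_symbol / entropy_per_letter)


def letters_to_values(text):
    """Map a-z to 0-25; fold case and drop everything else (assumption)."""
    return [ord(c) - ord("a") for c in text.lower() if "a" <= c <= "z"]


def combine_texts(texts):
    """Sum (mod 32) the values of corresponding letters of the texts."""
    streams = [letters_to_values(t) for t in texts]
    n = min(len(s) for s in streams)  # truncate to the shortest (assumption)
    return [sum(s[i] for s in streams) % 32 for i in range(n)]


def values_to_bits(values):
    """Write each combined value as a 5-bit group."""
    return "".join(format(v, "05b") for v in values)
```

For example, combine_texts(["abc", "xyz", "zzz", "b!b b"]) sums
0+23+25+1, 1+24+25+1 and 2+25+25+1 (mod 32), giving [17, 19, 21], which
values_to_bits writes as 100011001110101.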
There are plenty of other schemes for obtaining high quality
pseudo-random sequences in practice, e.g. AES in counter mode. However,
our scheme seems to be much simpler, both in the underlying logic
(understandability) and in implementation, and is thus a viable
alternative that one could use, or might need, under certain
circumstances.
The software, TEXTCOMBINE-SP, is available at http://mok-kong-shen.de
M. K. Shen
-------------------------------------------------------------------------------
[1] T. M. Cover, R. C. King, A Convergent Gambling Estimate of the
Entropy of English, IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.
