Here we illustrate how hypothesis testing is done with Torah codes, using small simulated data sets
whose behavior can be visualized entirely in geometric terms.
The first simulations use two sets of two-dimensional points: one point set is associated with
the ELSs of the first key word and the other point set with the ELSs of the second key word.
The Null hypothesis is that the interpoint distances between points of the first set and points of the
second set are just what would be expected if the points in each set were uniformly distributed over the
unit square and the two sets were independent of each other. The Alternative hypothesis we examine first
is that one point from the first point set and one point from the second point set are closer
together than would be expected under the Null hypothesis.
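To make the Null hypothesis concrete, the sketch below draws two independent point sets uniformly from the unit square and lists every distance between a point of the first set and a point of the second. The set sizes (5 points each), the seed, and the helper names uniform_points and cross_distances are illustrative assumptions, not part of the original method.

```python
import math
import random

def uniform_points(n, rng):
    """Return n points drawn independently and uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def cross_distances(set1, set2):
    """Euclidean distance between every point of set1 and every point of set2."""
    return [math.dist(p, q) for p in set1 for q in set2]

# Under the Null hypothesis the two sets are independent and uniform.
rng = random.Random(0)
red = uniform_points(5, rng)    # hypothetical stand-in for ELS positions of key word 1
green = uniform_points(5, rng)  # hypothetical stand-in for ELS positions of key word 2
d = cross_distances(red, green)
print(len(d), min(d))
```

With 5 points in each set there are 25 cross-set distances, each between 0 and the diagonal length of the unit square.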
Red denotes points from point set 1. Green denotes points from point set 2.
The first plot is a sample showing the pattern when point set 1 is independent of point set 2
and both are uniformly distributed over the unit square.
The second plot is a sample showing the pattern when point set 2 is moved closer to point set 1.
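A sample of the kind shown in the second plot can be sketched as follows, assuming (in line with the Alternative stated above) that only one green point is moved toward one red point. The shift parameter, the random choice of which pair to move, and the function name alternative_sample are hypothetical choices made for illustration.

```python
import math
import random

def uniform_points(n, rng):
    """Return n points drawn independently and uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def alternative_sample(n1, n2, shift, rng):
    """Draw a Null sample, then move one green point toward one red point.

    shift is in [0, 1]: 0 leaves the green point where it is, 1 places it on
    the red point. Moving along the connecting segment keeps the point inside
    the unit square and scales their distance by (1 - shift).
    """
    red = uniform_points(n1, rng)
    green = uniform_points(n2, rng)
    i = rng.randrange(n1)  # index of the red point in the close pair
    j = rng.randrange(n2)  # index of the green point that gets moved
    (rx, ry), (gx, gy) = red[i], green[j]
    green[j] = (gx + shift * (rx - gx), gy + shift * (ry - gy))
    return red, green, i, j

red, green, i, j = alternative_sample(5, 5, 0.9, random.Random(1))
print(math.dist(red[i], green[j]))
```

Moving a single point, rather than the whole green set, matches the Alternative as stated; a stronger shift makes the close pair easier to detect.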
What should our test statistic be? How should we choose its critical value?
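One natural candidate, given the Alternative above, is the smallest distance between a red point and a green point, with the critical value taken as a Monte Carlo estimate of the lower alpha-quantile of that statistic under the Null. This is an assumed choice for illustration, not necessarily the statistic the analysis goes on to use; the names min_cross_distance, null_critical_value and reject_null, the level alpha = 0.05, and the 2000 simulation trials are all hypothetical.

```python
import math
import random

def uniform_points(n, rng):
    """Return n points drawn independently and uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def min_cross_distance(set1, set2):
    """Candidate test statistic: smallest distance between the two point sets."""
    return min(math.dist(p, q) for p in set1 for q in set2)

def null_critical_value(n1, n2, alpha, trials, rng):
    """Estimate the lower alpha-quantile of the statistic by simulating the Null."""
    stats = sorted(
        min_cross_distance(uniform_points(n1, rng), uniform_points(n2, rng))
        for _ in range(trials)
    )
    k = max(0, math.ceil(alpha * trials) - 1)  # empirical alpha-quantile index
    return stats[k]

def reject_null(set1, set2, critical):
    # An unusually small minimum distance favors the Alternative,
    # so the rejection region is the left tail.
    return min_cross_distance(set1, set2) <= critical

rng = random.Random(2)
critical = null_critical_value(5, 5, alpha=0.05, trials=2000, rng=rng)
red, green = uniform_points(5, rng), uniform_points(5, rng)
print(critical, reject_null(red, green, critical))
```

Because the critical value is read off simulated Null samples, the test rejects a fresh Null sample roughly 5% of the time, up to simulation error.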