
# Long ELS Phrases

Professor Rips has maintained that the study of long ELS phrases is one of the most promising research areas in Torah Codes. He has asserted that meaningful long ELS phrases "can be found [in the Torah text] with relative ease." Indeed, he has a large collection of them. We are hopeful that he will organize them and give us permission to show them on this website.

Testing the hypothesis that meaningful long ELS phrases occur more often in the Torah text than in monkey texts is not easy. The problem centers on computing maximally long ELS phrases, each of whose ELSs is a word in a given lexicon of tens of thousands of words, and on evaluating the degree to which an ELS phrase is meaningful. Some research has been done on evaluating the meaningfulness of an ELS phrase. Unfortunately, the methodology is very labor intensive, and it is too costly to run many experiments. The long ELS phrase experiment reported here uses an automated way of trying to differentiate the maximal ELS phrases coming from the Torah text from those coming from a monkey text, without any direct linguistic evaluation of meaning or semantic connectedness.

We have developed a special fast algorithm that finds all the maximal ELS phrases in a text, even with a lexicon of some tens of thousands of key words. Using this tool we can examine the maximal ELS phrases from the Torah text and see whether there is any statistic by which they differ from those of monkey texts. The statistics we use are the best ones we can think of for which there is a simple algorithm to compute them. However, what they measure is most certainly a weak derivative of meaningfulness, and they may in fact not be powerful enough to capture what they really need to measure.

We have begun this examination using two statistics of an ELS phrase:
a feature called *difficulty class* and the *Conditional Entropy* per letter. The difficulty
class of an ELS phrase is defined by Professor Rips as the phrase length (not including
spaces between words) minus three times the number of words in the phrase. The *conditional entropy per letter* is the average number of bits it would take to guess the next letter of the ELS phrase given the previous K letters. In our experiments we take K to be 4.

Our *entropy* feature is different from that used by Dr. Ingermanson. He used only the entropy of digrams and trigrams (subsequences of two or three successive letters) computed on entire *skip texts*, without reference to any lexicon. By contrast, we consider only maximal ELS phrases composed of words from a lexicon, and our entropy statistics are computed from subsequences of five successive letters rather than two or three. We estimate the distribution of subsequences of 5 letters at a time, including the space character, from a Hebrew corpus composed from a variety of sources. Our Hebrew corpus has about 60 million letters so far.

Our maximal ELS phrase is composed of equidistant letter sequences at a uniform skip that form words in our lexicon, and we put the space character between each pair of successive words. Maximal means that the ELS phrase cannot be extended either from its beginning or from its end.
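As a rough illustration of how a per-letter statistic of this kind could be computed, the following Python sketch counts five-letter windows in a corpus and averages -log2 P(letter | previous 4 letters) over a phrase. The function names, the add-one smoothing, the choice to skip the first K letters of a phrase, and the alphabet size of 23 (22 Hebrew letters plus the space character) are all assumptions made for this illustration; they are not taken from the software used in the experiment.

```python
import math
from collections import Counter


def train_counts(corpus: str, k: int = 4):
    """Count (k+1)-letter windows and their k-letter contexts in the corpus."""
    ngram_counts = Counter()
    context_counts = Counter()
    for i in range(len(corpus) - k):
        ngram_counts[corpus[i:i + k + 1]] += 1
        context_counts[corpus[i:i + k]] += 1
    return ngram_counts, context_counts


def bits_per_letter(phrase, ngram_counts, context_counts, k=4, alphabet_size=23):
    """Average of -log2 P(letter | previous k letters) over the phrase.

    Add-one smoothing (an assumption of this sketch) keeps unseen windows
    from getting probability zero. The first k letters of the phrase are
    skipped because they lack a full context. Returns NaN when the phrase
    has no letter with a full k-letter context.
    """
    total_bits = 0.0
    scored = 0
    for i in range(k, len(phrase)):
        context = phrase[i - k:i]
        window = phrase[i - k:i + 1]
        p = (ngram_counts[window] + 1) / (context_counts[context] + alphabet_size)
        total_bits -= math.log2(p)
        scored += 1
    return total_bits / scored if scored else float("nan")
```

Under this sketch, a phrase whose letter sequences resemble the corpus receives fewer bits per letter than a phrase whose five-letter windows rarely or never occur in it.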

If an ELS phrase has *M* words and *K* letters (not counting spaces), the Difficulty Class of the phrase, as defined by Professor Rips, is *K-3M*. The idea behind the measure is to provide high values for phrases that have many letters and few words, and therefore more letters per word. The measure was designed to help distinguish meaningful ELS phrases from nonsense ELS phrases that would happen by chance.
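The difficulty class is simple to compute. The sketch below assumes the phrase is given as a string with single spaces between words; the function name and the Latin placeholder strings in the usage note are hypothetical and stand in for Hebrew ELS phrases.

```python
def difficulty_class(phrase: str) -> int:
    """Return K - 3M for a space-separated phrase.

    K is the number of letters (spaces excluded) and M is the number of words.
    """
    words = phrase.split()
    num_letters = sum(len(word) for word in words)
    return num_letters - 3 * len(words)
```

For example, a two-word phrase with five and three letters has K = 8 and M = 2, so its difficulty class is 8 - 6 = 2, while a phrase made only of three-letter words has difficulty class 0.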

Conditional Entropy is a measure of the average uncertainty about the value that one random variable takes, before observing it, given the value of another random variable. The resulting quantity is averaged over all values of the second random variable. Conditional Entropy, like entropy, is measured in bits.

An entropy of *H* bits means that providing the value of the as yet unobserved random variable requires, on the average, an *H* bit message. For example, an *H* bit message specifies a choice of 1 out of *2^{H}* possibilities.

One way to explain the meaning of an *H* bit conditional entropy is by the following game played between person A and person B. A value *u* of random variable *X* is sampled according to the probability *P(X)*, and both person A and person B observe it. Then, conditioned on the value *u*, person A samples a value *v* of random variable *Y*. This sampling takes place with respect to the conditional probability *P(Y | X = u)*. Person B does not observe the value *v*. Taking the weighted average, specified by *P(X)*, over all possible values *u* that random variable *X* can take, the average uncertainty that person B has with respect to the value *v* is the conditional entropy of *Y* given *X*. This is denoted by *H(Y | X)*.

Suppose the discrete random variable *X* takes possible values *{x_{1},...,x_{M}}* and the discrete random variable *Y* takes possible values *{y_{1},...,y_{N}}*. Then the conditional entropy *H(Y | X)* is defined by

$$H(Y \mid X) = -E_{X}\big[E_{Y}[\log_2 P(Y \mid X)]\big] = -\sum_{m=1}^{M} \left[\sum_{n=1}^{N} P(y_{n} \mid x_{m}) \log_2 P(y_{n} \mid x_{m})\right] P(x_{m})$$
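The double sum in this definition translates directly into code. The following Python sketch assumes the distributions are supplied as plain lists, with `p_x[m]` standing for P(x_m) and `p_y_given_x[m][n]` for P(y_n | x_m); these names are chosen for illustration only. Terms with zero probability are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def conditional_entropy(p_x, p_y_given_x):
    """H(Y | X) in bits.

    p_x[m] is P(x_m); p_y_given_x[m][n] is P(y_n | x_m).
    """
    h = 0.0
    for px, row in zip(p_x, p_y_given_x):
        # Inner sum over n of P(y_n | x_m) * log2 P(y_n | x_m).
        inner = sum(p * math.log2(p) for p in row if p > 0)
        # Weight by P(x_m) and apply the leading minus sign.
        h -= px * inner
    return h
```

For instance, if *X* takes two equally likely values, one of which determines *Y* completely while the other leaves *Y* a fair coin flip, then `conditional_entropy([0.5, 0.5], [[1.0, 0.0], [0.5, 0.5]])` is 0.5 bits.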

If person B were to use his knowledge of the conditional probability function of random variable *Y* given *X* in the most effective way possible, it would take person B, on the average, about *2^{H(Y | X)}* guesses to correctly guess the value *v* that person A had sampled. Or, saying this another way, it would take on the average a message of length *H(Y | X)* bits for person A to tell person B the value of *Y* that he had sampled.

Conditional Entropy per Letter is a measure of the average uncertainty about what letter will be the current letter given the knowledge of the *N* previous letters.

The English language has 26 letters, if we do not distinguish between upper and lower case and omit numerals, punctuation marks and special symbols. The space character used to demarcate words is a 27th character. If each of the 27 characters were to occur with equal probability, the letter entropy would be about 4.76 bits per letter. However, the 27 characters do not have equal probability. The entropy per letter is actually about 4.03 bits per letter.

Now suppose that we have knowledge of the previous letter. What is the average uncertainty for the current letter? Here the question we are asking is what the entropy per letter is, conditioned on the knowledge of the previous letter. This is a conditional entropy question. For the English language it is about 2.8 bits. And if we are given knowledge of the previous 2 characters, the conditional entropy per letter is about 1.3 bits. This means that a person who had knowledge of the probability of the current letter given the previous 2 letters would take on the average about 2.5 guesses (log_{2}(2.5) ≈ 1.3) to correctly guess the current letter given the previous 2 letters. Or equivalently, given the previous two letters, it would take a message of about 1.3 bits, on the average, to communicate what the current letter is.
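The arithmetic behind two of these figures can be checked in a few lines of Python. This only verifies the uniform-alphabet value and the bits-to-guesses conversion; the 4.03, 2.8 and 1.3 bit figures are empirical values quoted in the text and are not recomputed here.

```python
import math

# 27 equally likely characters: entropy is log2(27) bits per letter.
uniform_bits = math.log2(27)

# A conditional entropy of 1.3 bits corresponds to about 2**1.3 guesses.
guesses = 2 ** 1.3

print(f"{uniform_bits:.2f} bits per letter")  # prints "4.75 bits per letter"
print(f"{guesses:.2f} guesses")               # prints "2.46 guesses"
```

The first value, 4.7549 before rounding, is the "about 4.76 bits" quoted above, and the second is the "about 2.5 guesses".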

Entropy is a measure of the average uncertainty about what the value of a random variable is before observing it. Entropy is measured in bits.

An entropy of *H* bits means that providing the value of the as yet unobserved random variable requires, on the average, an *H* bit message. For example, an *H* bit message specifies a choice of 1 out of *2^{H}* possibilities.

One way to explain the meaning of the *H* bit message is by the following game played between person A and person B. Person A samples at random a value *v* of the random variable *X*. Person B knows the probability of random variable *X* taking any of its values, but does not know the value *v* that person A has sampled. If person B were to use his knowledge of the probability function of random variable *X* in the most effective way possible, it would take person B, on the average, about *2^{H}* guesses to correctly guess the value *v* that person A had sampled.

If *P* denotes the probability function of a discrete random variable *X* which takes possible values *{x_{1},...,x_{N}}* and *H(X)* denotes the entropy of the random variable *X*, then the entropy of the random variable *X* is minus the expected value of log to the base 2 of *P(X)*:

$$H(X) = -E[\log_2 P(X)] = -\sum_{n=1}^{N} P(x_{n}) \log_2 P(x_{n})$$
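A minimal Python sketch of this formula follows. The function names are illustrative; `empirical_entropy` additionally assumes that probabilities are estimated by relative character frequencies in a sample text, which is one common choice and not necessarily the estimator used in the experiments.

```python
import math
from collections import Counter


def entropy(probabilities):
    """H(X) = -sum_n P(x_n) * log2 P(x_n), in bits; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(text: str) -> float:
    """Entropy of the character distribution estimated from `text`."""
    counts = Counter(text)
    total = sum(counts.values())
    return entropy(count / total for count in counts.values())
```

A fair coin, `entropy([0.5, 0.5])`, gives 1 bit, and four equally likely outcomes give 2 bits, matching the reading of an *H* bit message as a choice of 1 out of 2^H possibilities.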

We illustrate skip text first by example. Consider the following text:

**"The first principle in thinking is knowing what you know and knowing what you do not know. For when you think you know and you really do not know, you have put yourself in limitation. And the thinking you do will be constrained because of what you do not know that you need to know or it will be constrained because you do not realize that what you think you know is actually incorrect."**

We take this text, remove punctuation and spaces, and lay it out in rows of 20 successive characters each, padding the last row with the letter a. This is shown below.

`Thefirstprincipleint`
`hinkingisknowingwhat`
`youknowandknowingwha`
`tyoudonotknowForwhen`
`youthinkyouknowandyo`
`ureallydonotknowyouh`
`aveputyourselfinlimi`
`tationAndthethinking`
`youdowillbeconstrain`
`edbecauseofwhatyoudo`
`notknowthatyouneedto`
`knoworitwillbeconstr`
`ainedbecauseyoudonot`
`realizethatwhatyouth`
`inkyouknowisactually`
`incorrectaaaaaaaaaaa`

Now if we scan down the first column, we get the first skip text: Thytyuatyenkarii. The second skip text is from the second column: hioyorvaodonienn, and so on.

For a skip *s*, an *N* character text *T=(t_{1},...,t_{N})* has *s* skip texts *T_{1},T_{2},...,T_{s}* given by

$$
\begin{aligned}
T_{1} &= (t_{1}, t_{1+s}, t_{1+2s}, \ldots) \\
T_{2} &= (t_{2}, t_{2+s}, t_{2+2s}, \ldots) \\
&\;\;\vdots \\
T_{s} &= (t_{s}, t_{2s}, t_{3s}, \ldots)
\end{aligned}
$$
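The definition above amounts to taking every *s*-th character starting from each of the first *s* positions. The following Python sketch reproduces the 20-column example; the function name `skip_texts` and the `ROWS` constant are illustrative, and the padding with the letter a simply copies the layout shown earlier.

```python
ROWS = [
    "Thefirstprincipleint", "hinkingisknowingwhat", "youknowandknowingwha",
    "tyoudonotknowForwhen", "youthinkyouknowandyo", "ureallydonotknowyouh",
    "aveputyourselfinlimi", "tationAndthethinking", "youdowillbeconstrain",
    "edbecauseofwhatyoudo", "notknowthatyouneedto", "knoworitwillbeconstr",
    "ainedbecauseyoudonot", "realizethatwhatyouth", "inkyouknowisactually",
    "incorrect" + "a" * 11,  # last row padded to 20 characters
]


def skip_texts(text: str, skip: int) -> list[str]:
    """Return the `skip` skip texts T_1, ..., T_s of `text`.

    T_j consists of characters j, j+s, j+2s, ... (1-based), which is the
    0-based slice text[j-1::skip].
    """
    return [text[start::skip] for start in range(skip)]


columns = skip_texts("".join(ROWS), 20)
print(columns[0])  # Thytyuatyenkarii
print(columns[1])  # hioyorvaodonienn
```

Reading the laid-out rows column by column and slicing the flat text with a step of 20 give the same strings, which is why the skip texts can be computed without building the grid at all.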