The design of an ellipsoidally symmetric classifier is similar to that of the Gaussian classifier.
We form a discriminant function v acting on the feature vector x.
The Gaussian classifier assigns a feature vector x to class 1 when v < t,
where t is a threshold value depending on the log of the determinants of the covariance
matrices for class 1 and class 2.
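
The sketch below illustrates one common way such a threshold rule is set up. It is an assumption, not the definition used in this analysis: it takes v to be the difference of the squared Mahalanobis distances of x to the two class means, and t to be the difference of the log determinants plus a prior term. The function names and the prior arguments are hypothetical.

```python
import numpy as np

def quadratic_discriminant(x, mean1, cov1, mean2, cov2):
    """Assumed form of v: squared Mahalanobis distance to class 1
    minus squared Mahalanobis distance to class 2."""
    d1 = np.asarray(x, dtype=float) - mean1
    d2 = np.asarray(x, dtype=float) - mean2
    m1 = d1 @ np.linalg.solve(cov1, d1)
    m2 = d2 @ np.linalg.solve(cov2, d2)
    return m1 - m2

def gaussian_threshold(cov1, cov2, prior1=0.5, prior2=0.5):
    """Assumed form of t: log|cov2| - log|cov1| + 2 log(prior1 / prior2)."""
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return logdet2 - logdet1 + 2.0 * np.log(prior1 / prior2)

def gaussian_classify(x, mean1, cov1, mean2, cov2, prior1=0.5, prior2=0.5):
    """Return 1 when v < t, otherwise 2."""
    v = quadratic_discriminant(x, mean1, cov1, mean2, cov2)
    t = gaussian_threshold(cov1, cov2, prior1, prior2)
    return 1 if v < t else 2
```

Under these assumptions, v < t is equivalent to comparing the two Gaussian log densities weighted by the class priors.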
The ellipsoidally symmetric classifier uses v as a scalar feature and estimates
the class conditional probabilities P(v | class 1) and P(v | class 2) by simply
partitioning the range of the discriminant into K equal-size bins and determining the fraction
of measurements falling into each bin for class 1 features and for class 2 features.
We take the number of bins K to be 100 and the class prior probabilities P(class 1) and P(class 2) to be the fractions observed in the sample.
A discriminant feature v is assigned to class 1 when
P(v | class 1)P(class 1) > P(v | class 2)P(class 2).
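
The binning and decision rule described above can be sketched as follows. This is a minimal illustration, not the code used for the reported results: the function names are hypothetical, the bin range is assumed to span the pooled minimum and maximum of the two samples, and ties (including bins that are empty for both classes) are assumed to go to class 2.

```python
import numpy as np

def fit_histogram_classifier(v1, v2, n_bins=100):
    """Estimate P(v | class) * P(class) per bin from discriminant values.

    v1, v2: discriminant values observed for class 1 and class 2.
    """
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    # Assumption: K equal-size bins over the pooled range of both samples.
    lo = min(v1.min(), v2.min())
    hi = max(v1.max(), v2.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    # Fraction of each class's measurements falling into each bin.
    p1 = np.histogram(v1, bins=edges)[0] / v1.size
    p2 = np.histogram(v2, bins=edges)[0] / v2.size
    # Priors are the class fractions observed in the sample.
    prior1 = v1.size / (v1.size + v2.size)
    prior2 = 1.0 - prior1
    return edges, p1 * prior1, p2 * prior2

def classify(v, edges, score1, score2):
    """Return 1 where P(v|1)P(1) > P(v|2)P(2), otherwise 2."""
    v = np.asarray(v, dtype=float)
    # Map each value to its bin; values outside the range go to the end bins.
    idx = np.clip(np.searchsorted(edges, v, side="right") - 1,
                  0, len(score1) - 1)
    return np.where(score1[idx] > score2[idx], 1, 2)
```

A call such as `edges, s1, s2 = fit_histogram_classifier(v_torah, v_monkey)` followed by `classify(v_new, edges, s1, s2)` would apply the rule, where `v_torah`, `v_monkey`, and `v_new` are placeholder arrays of discriminant values.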
As with the Gaussian classifier, there is some difference between
the feature distribution of the maximal ELS phrases of the Torah text and
that of the monkey text for a skip range of 2 through 100. But instead of yielding
a more significant p-value, the ellipsoidally symmetric classifier yields a slightly less significant one. Probably, with K = 100 bins,
either the sample size of some 40,000 is too small or the bin size is too large.
The results in the larger skip range show no difference between the maximal ELS phrases of the Torah text and those of the monkey text. We conclude, therefore, that the difficulty class and conditional entropy per letter features do not carry
information that can be used to distinguish maximal ELS phrases coming from the Torah text from those coming from the monkey text.
||Number of Maximal ELS Phrases Torah Text
|| Number of Maximal ELS Phrases Monkey Text