Collinearity Analysis And Therapy Of The Indefinite Correlation-Matrix By SPEARMAN & HART (1913)

Rudolf  Sponsel, Erlangen, translated by Agnes Mehl, Fürth

0. Summary

It is shown that the analysis of collinearity can be carried out using the PESO analysis by HAIN (1994) (Pivotized Erhard Schmidt Orthonormalization) and the numerical stability analysis of SPONSEL (1994). Furthermore, it is shown how the indefinite correlation matrix of SPEARMAN & HART, which produces highly pathological multiple and partial correlation coefficients, can be "cured" using the centroid method of THURSTONE. Finally, it is shown that the multicollinearity, i.e. the multiple linear laws contained in this correlation matrix, unfortunately disappears after a successful "centroid therapy". Thus it is also shown that no reliable statement can be made about indefinite correlation matrices. Numerous, even historically important, correlation matrices are affected by this, e.g. the "Primaries" by THURSTONE (Sponsel 1994, Empirical Correlation Matrices report 1910-1993).

1. Introduction

Indefinite correlation matrices, i.e. matrices that have lost their positive definiteness, produce absurd multiple and partial correlation coefficients. This can mostly be traced back to major methodological mistakes (Sponsel 1994). Particularly severe cases are indicated by larger negative eigenvalues (rule of thumb: > |.01|). If, on the other hand, the negative eigenvalues are "small" (rule of thumb: in the region of the third decimal place), the disorder is most probably a consequence of collinearity, i.e. of a linear law represented in the matrix in combination with the rounding errors that are unavoidable in concrete numerical computation. One cannot compute responsibly and reasonably with indefinite correlation matrices. Thus the question arises whether and how such matrices can be "cured". An effective procedure is presented by KNOL & TEN BERGE (1989). We now want to demonstrate the efficiency of the centroid method of THURSTONE on the SPEARMAN & HART matrix from 1913. We also show how an analysis of collinearity can be carried out using the PESO analysis by HAIN (1994) (Pivotized Erhard Schmidt Orthonormalization) and the analysis of numerical stability (SPONSEL 1994).

2. Explanations on the criteria of the matrix analysis

Samp_Or_MD_NumS_Condit_Determ_HaInRatio_R_OutIn_K_Norm_C_Norm
Samp  =:  sample size

Or    =:  order of matrix

MD    =:  missing data information:  -1 =: unknown

NumS  =: numerical-stability valuation (& where applicable the number of negative eigenvalues). So far the following rule-of-thumb valuations exist:
+   numerically stable
+?  borderline, tendency to be rather  numerically stable
-?  borderline, tendency to be rather  numerically instable
-   numerically instable
--Z  indefinite with Z negative eigenvalues
 Even if only one negative eigenvalue is present, the matrix is indefinite and derailments of every kind are possible. The matrix has turned "psychotic", so to speak: no value can be trusted any more, everything is possible. Such a state has to be avoided at all costs, or else reversed, i.e. immediately "treated", before any further calculation is done.

Condit =: largest absolute eigenvalue / smallest absolute eigenvalue. For order < 10, a Condit > 30, and for order < 20, a Condit > 50, indicates numerical instability.
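
The Condit criterion can be reproduced in a few lines. The 3x3 correlation matrix below is a hypothetical example chosen only for illustration:

```python
import numpy as np

# Condit as used here: largest / smallest absolute eigenvalue.
# The 3x3 "correlation matrix" below is a hypothetical example.
R = np.array([[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.7],
              [0.8, 0.7, 1.0]])

eig = np.linalg.eigvalsh(R)                      # eigenvalues, ascending
condit = np.abs(eig).max() / np.abs(eig).min()   # condition number
```

For this strongly intercorrelated example the ratio already comes out high, illustrating how quickly near-collinearity drives Condit up.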

Determ =: determinant. The determinant represents the absolute value of the volume of the n-dimensional parallelotope (a multidimensional object). The smaller it is, the smaller the volume of the space. A small volume can be caused by a single short vector, equivalent to a small angle. This is the critical case. A small determinant, however, can also result quite "normally" from the "natural" course of computation without expressing numerical instability. A valuation based on the absolute value of the determinant alone is therefore not reasonable.

HaInRatio =:  HADAMARD number of the inverse. The HADAMARD number of the inverse indicates which ratio the actual determinant of the inverse bears to its theoretically maximum value for the given coefficient matrix. According to the rule of thumb of FADDEJEW & FADDEJEWA, an inverse determinant is considered small if its ratio is 1 : 50 000 or less, i.e. if the HaInRatio is < .00002.
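
HADAMARD's inequality bounds |det(A)| by the product of the Euclidean row norms, so the ratio of the two lies between 0 and 1. A sketch of my reading of the HaInRatio criterion (not SPONSEL's program), on a hypothetical matrix:

```python
import numpy as np

# HADAMARD's inequality: |det(A)| <= product of the Euclidean row norms.
# HaInRatio = that ratio, computed for the inverse of the matrix.
def hadamard_ratio(A):
    return abs(np.linalg.det(A)) / np.prod(np.linalg.norm(A, axis=1))

R = np.array([[1.0, 0.9, 0.8],      # hypothetical correlation matrix
              [0.9, 1.0, 0.7],
              [0.8, 0.7, 1.0]])
ha_in = hadamard_ratio(np.linalg.inv(R))   # < .00002 would count as "small"
```

For an orthogonal matrix the ratio is exactly 1; the closer the rows come to linear dependence, the closer it falls to 0.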

R_OutIn =: LES input-output ratio (SPONSEL 1994). The input-output ratio indicates by how much the output changes if the input is changed by one unit in the third decimal place. Theoretically the value ranges from 0 to ...
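
A simplified sketch of the idea (SPONSEL's procedure rounds the coefficients up and down; here, as a hypothetical stand-in, only the right-hand side of a linear equation system is perturbed by one unit in the third decimal place):

```python
import numpy as np

# Perturb the right-hand side of a linear equation system (LES) by one
# unit in the third decimal place and compare output change to input change.
R = np.array([[1.0, 0.9, 0.8],      # hypothetical coefficient matrix
              [0.9, 1.0, 0.7],
              [0.8, 0.7, 1.0]])
b = np.array([0.5, 0.4, 0.3])       # hypothetical right-hand side

x = np.linalg.solve(R, b)
delta = np.full(3, 0.001)           # one unit in the third decimal place
x_pert = np.linalg.solve(R, b + delta)
ratio = np.linalg.norm(x_pert - x) / np.linalg.norm(delta)
```

An ill-conditioned system amplifies such a perturbation by orders of magnitude, which is what the value 394 reported below for the SPEARMAN & HART matrix expresses.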

K_Norm =: smallest PESO norm of the correlation matrix (HAIN 1994). The smallest reduced norm ("shortest" norm, "flattest" angle) of the correlation matrix is a measure of the degree of collinearity: the smaller the K_Norm, the stronger the collinearity. The product of all reduced norms equals the absolute value of the determinant. Thus a single small K_Norm is sufficient to bring the volume close to 0 (equivalent to the function of small eigenvalues). PESO for the correlation matrix is adjusted such that for all K_Norms < .01 the number of relations (collinearities) is printed out in brackets. The square root of the K_Norm gives an upper bound for the largest correlation coefficient.

C_Norm =: smallest PESO norm of the CHOLESKY matrix (HAIN 1994). The C_Norm represents the smallest reduced norm of the CHOLESKY matrix. The importance of the CHOLESKY decomposition rests, among other things, on its isometry to the raw scores. The smallest C_Norm indicates the smallest angle occurring among the centered and standardized raw scores. The square of the smallest C_Norm is greater than or equal to the smallest K_Norm. As an empirical rule of thumb, (C_Norm^2)/(2...5) ~ K_Norm. Furthermore, r(multiple) = SQRT(1 - C_Norm^2), so multiple correlation coefficients can be determined directly from the C_Norm. The relation or collinearity expressed by the smallest CHOLESKY norm can thus also be expressed by the well-known and usual multiple correlation coefficient. A C_Norm < .31 can serve as a critical boundary at which collinearity starts to become more significant.
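
The link between a Cholesky norm and a multiple correlation can be checked directly: for the variable factored last, the Cholesky diagonal element d satisfies d^2 = 1/(R^-1)_nn, hence r(n.rest) = SQRT(1 - d^2). A sketch on a hypothetical 3x3 matrix (the C_Norm proper is the smallest such reduced norm over all pivot positions):

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5],      # hypothetical correlation matrix
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

L = np.linalg.cholesky(R)           # lower triangular, R = L @ L.T
d = L[-1, -1]                       # Cholesky norm of the last variable
r_from_cholesky = np.sqrt(1 - d**2)        # r(multiple) = SQRT(1 - C_Norm^2)
r_classical = np.sqrt(1 - 1 / np.linalg.inv(R)[-1, -1])  # textbook formula
```

Both routes give the same multiple correlation of the last variable with the rest.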

Eigenvalues =: the practically most important and most useful criterion for collinearity in correlation matrices: eigenvalues close to 0 (< .10).

Table 1
Original Correlation Matrix Spearman & Hart 1913
Original input data with  2-digit-accuracy and read with
2-digit-accuracy (for control here the analysed original matrix):
1    2    3    4    5    6    7    8    9    10   11   12   13
1   1    .77  .67  .6   .69  .57  .57  .5   .52  .48  .38 .2   .16
2   .77  1    .74  .61  .66  .59  .53  .29  .52  .16  .62 .31  .07
3   .67  .74  1    .52  .72  .45  .61  .34  .52  .14  .22 .19  .23
4   .6   .61  .52  1    .44  .76  .47  .67  .4   .29  .13 .57 -.13
5   .69  .66  .72  .44  1    .51  .65  .4   .34  .47  .23 .19  .01
6   .57  .59  .45  .76  .51  1    .41  .45  .47  .25  .03 .26  .11
7   .57  .53  .61  .47  .65  .41  1    .45  .47  .08  .26 -.05  .22
8   .5   .29  .34  .67  .4   .45  .45  1    .34  .16  .08 .05 -.05
9   .52  .52  .52  .4   .34  .47  .47  .34  1   -.07 -.01 .01 -.13
10  .48  .16  .14  .29  .47  .25  .08  .16 -.07  1    .26 .06  .19
11  .38  .62  .22  .13  .23  .03  .26  .08 -.01  .26  1 .16  .29
12  .2   .31  .19  .57  .19  .26 -.05  .05  .01  .06  .16  1    .05
13  .16  .07  .23 -.13  .01  .11  .22 -.05 -.13  .19  .29  .05  1

Table 2: Matrix Analysis Criteria

Or_ MD_NumS_ConditDeterm_         HaInRatio_ R_OutIn_ K_Norm_ C_Norm
13 -1  --1  733.3  -.0000167538   2.21 D-12   394.2   5D-3(1)  -1(-1)

Highest negative diagonal value of the inverse =  -.021075044
thus multiple r(5.rest)                        =  6.960566542 (!)
and there are 2 multiple r > 1 (!)

i.Eigenvalue  Cholesky   i.Eigenvalue  Cholesky   i.Eigenvalue  Cholesky
1.  5.63691   1         2.  1.61837   .638        3.  1.33718  .654
4.  1.09919   .7636     5.  .87991    .6319       6.  .75463   .6054
7.  .60908    .712      8.  .41475    .6371       9.  .32742   .7243
10. .2103     .5212     11. .18194   -.1899       12. 7.69D-3 -.2362
13.-.07735   -.2838

The matrix is not positive definite. The Cholesky decomposition is not successful (for detailed information Cholesky's diagonal values are presented).
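
The reported indefiniteness can be checked directly from Table 1 (a re-computation sketch; modern eigenvalue routines differ from the 1994 program, but the sign pattern is what matters):

```python
import numpy as np

# Table 1, the SPEARMAN & HART matrix with 2-digit accuracy.
R = np.array([
    [1.00, .77, .67, .60, .69, .57, .57, .50, .52, .48, .38, .20, .16],
    [ .77,1.00, .74, .61, .66, .59, .53, .29, .52, .16, .62, .31, .07],
    [ .67, .74,1.00, .52, .72, .45, .61, .34, .52, .14, .22, .19, .23],
    [ .60, .61, .52,1.00, .44, .76, .47, .67, .40, .29, .13, .57,-.13],
    [ .69, .66, .72, .44,1.00, .51, .65, .40, .34, .47, .23, .19, .01],
    [ .57, .59, .45, .76, .51,1.00, .41, .45, .47, .25, .03, .26, .11],
    [ .57, .53, .61, .47, .65, .41,1.00, .45, .47, .08, .26,-.05, .22],
    [ .50, .29, .34, .67, .40, .45, .45,1.00, .34, .16, .08, .05,-.05],
    [ .52, .52, .52, .40, .34, .47, .47, .34,1.00,-.07,-.01, .01,-.13],
    [ .48, .16, .14, .29, .47, .25, .08, .16,-.07,1.00, .26, .06, .19],
    [ .38, .62, .22, .13, .23, .03, .26, .08,-.01, .26,1.00, .16, .29],
    [ .20, .31, .19, .57, .19, .26,-.05, .05, .01, .06, .16,1.00, .05],
    [ .16, .07, .23,-.13, .01, .11, .22,-.05,-.13, .19, .29, .05,1.00]])

eig = np.linalg.eigvalsh(R)         # smallest eigenvalue reported as -.07735
det = np.linalg.det(R)              # reported as -.0000167538
try:
    np.linalg.cholesky(R)           # must fail for an indefinite matrix
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False
```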

3. Discussion according to criteria of the original matrix

The negative determinant indicates an indefinite and badly derailed matrix. The condition number (largest : smallest absolute eigenvalue) shows, at 733, a high value. The LES analysis, rounding up and down in the third decimal place, shows an input-output ratio of 394, i.e. a change of the input leads to a 394-fold change of the output. The HADAMARD number of the inverse indicates, at 2*10^-12, a very small ratio. The negative eigenvalue, -.07735, is very large in magnitude and at first sight leaves little hope for therapy. To my surprise, however, it turned out that a "centroid therapy" according to the centroid method of THURSTONE was successful despite this large negative eigenvalue. Probably the indefiniteness is not caused by pure collinearity alone; it has to be suspected that the "correlation matrix" is damaged in several ways: missing data, averaging procedures, "correction for attenuation", possibly no product-moment coefficients? The largest multiple correlation coefficient, r(5.rest) = 6.96 (!), is completely derailed, a consequence of the loss of positive definiteness.

4. Which variables are responsible for the collinearity?

The position of the eigenvalue does not allow any conclusion as to which variable constitutes the collinearity. This can easily be checked by exchanging rows and columns of the correlation matrix and observing that the eigenvalues stay the same. The PESO analysis (Pivotized Erhard Schmidt Orthonormalization) developed by Dr. HAIN (1994), however, does provide this information.
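
The invariance claimed here is easy to verify (hypothetical 3x3 example):

```python
import numpy as np

# Eigenvalues are invariant under a simultaneous permutation of rows and
# columns, so their position says nothing about which variable carries
# the collinearity.
R = np.array([[1.0, 0.8, 0.3],      # hypothetical correlation matrix
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

p = [2, 0, 1]                       # exchange the variables
R_perm = R[np.ix_(p, p)]            # permute rows and columns together

eig = np.linalg.eigvalsh(R)
eig_perm = np.linalg.eigvalsh(R_perm)   # same spectrum
```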

Table 3:  PESO-Analysis of the correlation matrix

Var.  RN Reduced Norm          ON Original Norm Ratio RN/ON
1    2.1186080323290788       2.1186080323290788  1.0000
13   1.0830630275311198       1.1414902533297001  0.9488
12   1.0688990982453775       1.2796874610809581  0.8352
10   .89566410275237469       1.3368993970637328  0.6699
11   .87263435136430641       1.3781509343086692  0.6331
8    .72829442742515084       1.6174671548075548  0.4502
6    .5198933852503035        1.8568252464596398  0.2799
5    .50466376705114477       1.9727899014746841  0.2558
9    .38749960154371265       1.6328502677286861  0.2373
7    .33804356960980048       1.8269373267907007  0.1850
3    .27300931252094519       1.9755758642348769  0.1381
4    .11747153111497607       2.0115416961068529  0.0583
2    .010887210195374477      2.1077713335900289  0.0051

products of:
1.6753770848438221D-5       844.16208542332548  1.9846627961307499D-8
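
The arithmetic of Table 3 rests on a pivotized Gram-Schmidt pass over the rows: at each step the row with the largest remaining (reduced) norm is chosen, and the product of all reduced norms equals the absolute value of the determinant. A simplified reconstruction of this PESO idea (not HAIN's original program), on a hypothetical 3x3 matrix:

```python
import numpy as np

def peso_reduced_norms(A):
    """Pivotized Gram-Schmidt over the rows; returns the reduced norms
    in pivot order. Simplified reconstruction of the PESO idea."""
    A = np.asarray(A, dtype=float)
    remaining = list(range(len(A)))
    basis, norms, order = [], [], []
    while remaining:
        # residual of each remaining row after projecting out the basis
        res = {i: A[i] - sum(q * (q @ A[i]) for q in basis) for i in remaining}
        i = max(remaining, key=lambda k: np.linalg.norm(res[k]))  # pivot
        norms.append(np.linalg.norm(res[i]))
        basis.append(res[i] / norms[-1])
        order.append(i)
        remaining.remove(i)
    return np.array(norms), order

R = np.array([[1.0, 0.8, 0.3],      # hypothetical correlation matrix
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
norms, order = peso_reduced_norms(R)
# the product of the reduced norms reproduces |det(R)|
```

Because of the pivoting the reduced norms come out in non-increasing order, so one tiny final norm is enough to collapse the "volume", exactly as with variable 2 in Table 3.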

Remark: As can be seen, the product of the reduced norms results in the absolute value of the determinant. The product of the ratios results in the HADAMARD condition number (not mentioned above). Regarding the chosen boundary in PESO (equivalent to >= a multiple correlation coefficient of .99499), PESO finds "relations" (the term HAIN uses for near-collinearities), here one of them:

Table 4:   Relation
1    -0.3146527856      8     0.1316158518
2     1.0000000000      9     0.0183634921
3    -0.4556784876      10    0.1700240126
4     0.1663107489      11   -0.5511173581
5     0.0554053195      12   -0.0898030079
6    -0.4653680932      13    0.3019404255
7    -0.0053392459
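
As a check, this relation vector can be fed back into the original matrix of Table 1; every component of the product comes out close to zero (re-computation sketch):

```python
import numpy as np

# Table 1 (2-digit accuracy) times the relation vector of Table 4.
R = np.array([
    [1.00, .77, .67, .60, .69, .57, .57, .50, .52, .48, .38, .20, .16],
    [ .77,1.00, .74, .61, .66, .59, .53, .29, .52, .16, .62, .31, .07],
    [ .67, .74,1.00, .52, .72, .45, .61, .34, .52, .14, .22, .19, .23],
    [ .60, .61, .52,1.00, .44, .76, .47, .67, .40, .29, .13, .57,-.13],
    [ .69, .66, .72, .44,1.00, .51, .65, .40, .34, .47, .23, .19, .01],
    [ .57, .59, .45, .76, .51,1.00, .41, .45, .47, .25, .03, .26, .11],
    [ .57, .53, .61, .47, .65, .41,1.00, .45, .47, .08, .26,-.05, .22],
    [ .50, .29, .34, .67, .40, .45, .45,1.00, .34, .16, .08, .05,-.05],
    [ .52, .52, .52, .40, .34, .47, .47, .34,1.00,-.07,-.01, .01,-.13],
    [ .48, .16, .14, .29, .47, .25, .08, .16,-.07,1.00, .26, .06, .19],
    [ .38, .62, .22, .13, .23, .03, .26, .08,-.01, .26,1.00, .16, .29],
    [ .20, .31, .19, .57, .19, .26,-.05, .05, .01, .06, .16,1.00, .05],
    [ .16, .07, .23,-.13, .01, .11, .22,-.05,-.13, .19, .29, .05,1.00]])

v = np.array([-0.3146527856,  1.0000000000, -0.4556784876,  0.1663107489,
               0.0554053195, -0.4653680932, -0.0053392459,  0.1316158518,
               0.0183634921,  0.1700240126, -0.5511173581, -0.0898030079,
               0.3019404255])

residual = R @ v                    # each component should be near zero
```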

Practical proof: Multiplying the original matrix by this vector results, according to the chosen boundary, in almost zero. The absolute values of the relation coefficients indicate the contribution of the respective variable to the linear dependency. One could interpret the smallest absolute values as suggestions for elimination. Eliminating variables 7, 9, 12, the following matrix results:

Table 5: Reduced Matrix (7,9,12)

1    2    3    4    5    6    8    10   11   13      Multiple correlat.
1   1    .77  .67  .6   .69  .57  .5   .48  .38  .16    1.rest    .99353
2   .77  1    .74  .61  .66  .59  .29  .16  .62  .07    2.rest    .99906
3   .67  .74  1    .52  .72  .45  .34  .14  .22  .23    3.rest    .97033
4   .6   .61  .52  1    .44  .76  .67  .29  .13 -.13    4.rest    .992
5   .69  .66  .72  .44  1    .51  .4   .47  .23  .01    5.rest    .98908
6   .57  .59  .45  .76  .51  1    .45  .25  .03  .11    6.rest    .9774
8   .5   .29  .34  .67  .4   .45  1    .16  .08 -.05    8.rest    .99137
10  .48  .16  .14  .29  .47  .25  .16  1    .26  .19    10.rest   .99288
11  .38  .62  .22  .13  .23  .03  .08  .26  1    .29    11.rest   .99609
13  .16  .07  .23 -.13  .01  .11 -.05  .19  .29  1      13.rest   .93259

Result: The elimination of 7, 9, 12 brings about a just barely positive definite matrix again, which however still includes a high collinearity. As can be seen, already 6 out of 10 multiple correlation coefficients are > .99. It is clear that this matrix has to be ill-conditioned, as the matrix analysis also shows:

Table 6: Matrix analysis criteria of the reduced matrix

Or_ MD_NumS_ConditDeterm_         HaInRatio_ R_OutIn_ K_Norm_ C_Norm
10 -1   -   4645    .000005694    1.28 D-18   330.4   1D-3(1)  .043(1)

We now try to "cure" the indefinite matrix using the centroid method of THURSTONE, i.e. to eliminate the negative eigenvalues, and to check afterwards which collinearities remain.

5. "Centroid Therapy" according to THURSTONE

The method of principal components is not applicable with negative eigenvalues, as no square roots of negative values can be obtained in the reals. Thus the centroid factor analysis of THURSTONE is carried out using the complete number of variables, in this case 13, and from the 13 factors the correlation matrix is calculated back with the main diagonal elements set to 1. Naturally this is only possible if the residuals, and of course also the main diagonal elements which are essential for a correlation matrix, have small values. This we want to check first:

Table 7:  Residual matrix after extraction of 13 centroid factors

-.0016  .0114 -.0200 -.0169  .0098 -.0123  .0093  .0171  .0092 -.0069 -.0159  .0047  .0133
 .0114 -.0020  .0341 -.0002 -.0168  .0206 -.0246 -.0054  .0014 -.0275  .0087  .0055 -.0326
-.0200  .0341 -.0035  .0087 -.0045 -.0437 -.0084 -.0210  .0138 -.0304  .0059 -.0192  .0395
-.0169 -.0002  .0087 -.0174 -.0443  .0384  .0032  .0533 -.0030  .0549 -.0327 -.0052 -.0719
 .0098 -.0168 -.0045 -.0443 -.0154  .0201  .0242  .0347 -.0133  .0233  .0198  .0309 -.0278
-.0123  .0206 -.0437  .0384  .0201 -.0038 -.0418 -.0134  .0383 -.0100 -.0363 -.0183 -.0070
 .0093 -.0246 -.0084  .0032  .0242 -.0418 -.0042 -.0264 -.0029 -.0263  .0209 -.0310  .0418
 .0171 -.0054 -.0210  .0533  .0347 -.0134 -.0264 -.0163 -.0007 -.0018  .0047 -.0280  .0180
 .0092  .0014  .0138 -.0030 -.0133  .0383 -.0029 -.0007 -.0021 -.0079 -.0013  .0099 -.0143
-.0069 -.0275 -.0304  .0549  .0233 -.0100 -.0263 -.0018 -.0079 -.0103  .0254 -.0347  .0155
-.0159  .0087  .0059 -.0327  .0198 -.0363  .0209  .0047 -.0013  .0254 -.0083  .0091  .0347
 .0047  .0055 -.0192 -.0052  .0309 -.0183 -.0310 -.0280  .0099 -.0347  .0091 -.0076  .0242
 .0133 -.0326  .0395 -.0719 -.0278 -.0070  .0418  .0180 -.0143  .0155  .0347  .0242 -.0132
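
The extraction scheme described above can be sketched in code. This is a minimal, hypothetical reconstruction of the centroid method (column-sum loadings with sign reflection, back-calculation with the main diagonal reset to 1), not THURSTONE's or SPONSEL's original program, and the small indefinite example matrix is invented:

```python
import numpy as np

def centroid_cure(R, nfact=None):
    """Extract centroid factors, then rebuild the matrix from them
    with the main diagonal set to 1. Sketch, not the original program."""
    R = np.asarray(R, dtype=float)
    n = len(R)
    nfact = n if nfact is None else nfact
    resid = R.copy()
    loadings = []
    for _ in range(nfact):
        signs = np.ones(n)
        # reflect variables until no single flip can increase the total sum
        while True:
            S = np.outer(signs, signs) * resid
            off = S.sum(axis=0) - np.diag(S)    # off-diagonal column sums
            j = int(off.argmin())
            if off[j] >= 0:
                break
            signs[j] = -signs[j]
        total = S.sum()
        if total <= 1e-12:                      # nothing left to extract
            break
        a = signs * (S.sum(axis=0) / np.sqrt(total))  # centroid loadings
        loadings.append(a)
        resid = resid - np.outer(a, a)
    R_hat = sum(np.outer(a, a) for a in loadings)
    np.fill_diagonal(R_hat, 1.0)                # reset main diagonal to 1
    return R_hat, np.array(loadings).T

# hypothetical indefinite 3x3 "correlation" matrix
R_bad = np.array([[1.0, 0.9,  0.9],
                  [0.9, 1.0, -0.5],
                  [0.9, -0.5, 1.0]])
R_cured, A = centroid_cure(R_bad)
```

With nfact equal to the order of the matrix this mirrors the procedure used here: extract the full number of centroid factors and calculate the correlation matrix back from them.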

Now we perform a standard matrix analysis and discover, to our complete surprise, that the "centroid therapy" is successful despite the large negative eigenvalue (~ -.07) of the original matrix: the matrix has at least regained its positive definiteness, although it is still ill-conditioned, but no longer as badly as before. The decisive advantage, however, is that we now gain a clear picture of the original collinearity structure.

Table 8: Centroid-cured matrix

Input data with 2-digit accuracy and read with 2-digit accuracy
(for control, the centroid-cured matrix as analysed):
1    .76  .69  .62  .68  .58  .56  .48  .51  .49  .4   .2   .15
.76  1    .71  .61  .68  .57  .55  .3   .52  .19  .61  .3   .1
.69  .71  1    .51  .72  .49  .62  .36  .51  .17  .21  .21  .19
.62  .61  .51  1    .48  .72  .47  .62  .4   .24  .16  .58 -.06
.68  .68  .72  .48  1    .49  .63  .37  .35  .45  .21  .16  .04
.58  .57  .49  .72  .49  1    .45  .46  .43  .26  .07  .28  .12
.56  .55  .62  .47  .63  .45  1    .48  .47  .11  .24 -.02  .18
.48  .3   .36  .62  .37  .46  .48  1    .34  .16  .08  .08 -.07
.51  .52  .51  .4   .35  .43  .47  .34  1   -.06 -.01  0   -.12
.49  .19  .17  .24  .45  .26  .11  .16 -.06  1    .23  .09  .17
.4   .61  .21  .16  .21  .07  .24  .08 -.01  .23  1    .15  .26
.2   .3   .21  .58  .16  .28 -.02  .08  0    .09  .15  1    .03
.15  .1   .19 -.06  .04  .12  .18 -.07 -.12  .17  .26  .03  1

Table 9: Centroid cured matrix analysis criteria
Or_ MD_NumS_ConditDeterm_         HaInRatio_ R_OutIn_ K_Norm_ C_Norm
13  -1   -    148.8   .00007525    0.0000007    52     .026(0)  .266(0)

i.Eigenvalue  Cholesky   i.Eigenvalue  Cholesky   i.Eigenvalue  Cholesky
1.  5.64638   1         2.  1.55102   .6499       3.  1.3104   .6651
4.  1.04562   .7543     5.  .87813    .634        6.  .76073   .6654
7.  .58635    .7248     8.  .41204    .7076       9.  .31787   .7784
10. .18913    .6723     11. .15321    .4617       12. .11117   .6337
13. .03795    .8032
Cholesky decomposition successful, thus the matrix is positive
(semi-)definite.

Table 10:  Multiple correlations of centroid cured matrix
r1.rest= .89232      r5.rest= .91033    r9.rest = .77994
r2.rest= .96400      r6.rest= .81196    r10.rest= .80758
r3.rest= .84951      r7.rest= .82456    r11.rest= .89483
r4.rest= .93215      r8.rest= .79282    r12.rest= .78257
r13.rest= .59575

6.  Results of the Centroid-Therapy

The analysis of the multiple correlation coefficients of the "centroid-cured" matrix indicates very clearly that the strong collinearity structures of the original indefinite matrix have disappeared. This means, without doubt, that the collinearity-structure hypotheses put up above cannot be confirmed. This underlines the importance of therapy methods and how much indefinite correlation matrices may simulate relations which do not exist at all. The importance of the centroid factor analysis of THURSTONE as a method of therapy for indefinite correlation matrices could be shown here: possibly it is not (always) as good as the method of KNOL & TEN BERGE, but it is very simple and quick.

7. Literature

• FADDEJEW, D. K., FADDEJEWA, W. N. (1973) "Numerische Methoden der Linearen Algebra", German ed., Berlin 1973.
• HAIN, Bernhard (1994) "Some Notes on Correlation Matrices", in: SPONSEL (1994).
• HART, B., SPEARMAN, C. (1912-1913) "General Ability, Its Existence and Nature", The British Journal of Psychology, V, p. 54, Table I.
• KNOL, D. L., TEN BERGE, J. M. F. (1989) "Least-Squares Approximation Of An Improper Correlation Matrix By A Proper One", Psychometrika 54, 1, 53-61.
• SPONSEL, Rudolf (1994) "Ill-Conditioned Matrices and Collinearity in Psychology" (loose-leaf collection; German and English), Erlangen: IEC-Verlag.
• THURSTONE, L. L. (1947) "Multiple Factor Analysis", Chicago.

Quoting
Sponsel, R. (DAS). Collinearity Analysis And Therapy Of The Indefinite Correlation-Matrix By SPEARMAN & HART (1913).  Internet publication for General and Integrative Psychotherapy IP-GIPT. Erlangen (Germany): https://www.sgipt.org/e/wisms/nis/speahart0.htm