Ill-Conditioned Correlation Matrices By Spearman

Rudolf Sponsel, translated by Dipl. Psych. Agnes Mehl

0. Summary
We briefly introduce the problem of ill-conditioned correlation matrices and collinearity, together with their destructive and constructive aspects. Some important criteria for the analysis of numeric stability and instability are explained. We then report on the correlation matrices by Spearman (and co-authors Hart and Holzinger): altogether 38 correlation matrices were included, 30 of them genuinely different (order 5-14, thus rather small matrices). 8 of the 30 (27%) contain misprints in the upper and lower triangular matrices. Of the 30 genuinely different matrices, 7 (23%) are indefinite with 1-3 negative eigenvalues clearly different from 0; 3 (10%) are clearly ill-conditioned, 3 (10%) are borderline cases, and 17 (57%) are numerically stable. In at least 4 (13%) of the 30 cases "corrections for attenuation" were recorded, and in at least 3 cases (10%) correlation coefficients were pooled, which in part is not indicated. This error rate does not fit the otherwise precise and systematic style of SPEARMAN. Compared to the rate of indefinite matrices in our main study (Sponsel 1994), 17.9%, Spearman's rate of 23% is clearly higher.

1. Ill-Conditioned Matrices
Systems are called ill-conditioned if small changes in the input data lead to large changes in the output data. Linear equation systems and correlation matrices are, almost by their nature, often ill-conditioned, i.e. they are mostly very unstable. In practice this means that the coefficients can no longer be trusted, even before any significance testing. In a historical analysis of 769 correlation matrices from 1910 to 1993, Sponsel (1994) found 47.5% numerically unstable correlation matrices. 17.9% of them were indefinite, thus having lost their positive definiteness and producing mathematically absurd results such as multiple and partial correlation coefficients larger than one.
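The effect can be demonstrated with a small numerical sketch (hypothetical data, not one of Spearman's matrices): perturbing a single coefficient of a near-singular correlation matrix in the third decimal place changes the solution of the associated linear equation system drastically.

```python
import numpy as np

# Hypothetical example: variable 3 is almost a linear combination of
# variables 1 and 2, so the correlation matrix is nearly singular.
R = np.array([[1.00, 0.80, 0.90],
              [0.80, 1.00, 0.98],
              [0.90, 0.98, 1.00]])
b = np.array([0.5, 0.5, 0.5])

x = np.linalg.solve(R, b)        # weights for the unperturbed matrix

# Change a single coefficient by .001, i.e. in the 3rd decimal place ...
R_pert = R.copy()
R_pert[1, 2] = R_pert[2, 1] = 0.981
x_pert = np.linalg.solve(R_pert, b)

# ... and the solution changes by orders of magnitude more than the input.
print(x)                         # approximately [ 3.5   7.5  -10. ]
print(x_pert)
print(np.max(np.abs(x_pert - x)))
```

Note that the unperturbed solution already contains weights far outside a plausible range, another symptom of the same instability.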

Example: multiple correlation coefficients of the matrix by Spearman and Hart (1913):

 r1.rest  =  .9495     r5.rest  = 6.9606     r9.rest  =  .7154     r13.rest =  .9175
 r2.rest  =  .9922     r6.rest  =  .9665     r10.rest =  .757
 r3.rest  =  .9669     r7.rest  = imaginary  r11.rest =  .9755
 r4.rest  = 1.3566     r8.rest  =  .7307     r12.rest = imaginary

For numeric stability there are a number of criteria. Among the most effective are: the smallest eigenvalue; the various condition measures (largest absolute eigenvalue : smallest absolute eigenvalue, and the HADAMARD condition, best applied to the inverse, FADDEJEW & FADDEJEWA 1973); and the reduced norms according to the Pivotized Erhard Schmidt Orthonormalization (PESO analysis by HAIN 1994).

2. The two natures of collinearity: constructive and destructive implications

Correlation matrices reach their maximum numeric instability when the matrix is singular. Then the determinant is 0 and at least one eigenvalue equals zero. Such a matrix contains at least one collinearity. Looked at differently: the matrix contains, mathematically, redundant information or a functional relation; at least one variable is redundant. From the viewpoint of the theory of science, a collinearity implies a law. Unfortunately this reflection did not move into the centre of research interest, because of the predominance of factor analysis. If at least one eigenvalue is close to 0, one gets at least one near-collinearity. In empirical and numerical practice, exactly singular or collinear correlation matrices hardly ever occur; one always finds only approximate singularity or near-collinearity. Product-moment or Pearson correlation matrices have to be positive definite (HAIN 1994). However, since numeric calculation in practice ends after a few rounding steps, it may happen that collinearity in combination with rounding errors leads to the loss of positive definiteness of a correlation matrix. This is a simple case and easy to "cure". One recognizes the "simplicity" from the negative eigenvalues being very small (rule of thumb: 3rd digit after the decimal point). The case is more difficult if major methodological mistakes were made, e.g. calculating tetrachoric instead of Pearson coefficients (thus possibly violating the normal-distribution condition, as with THURSTONE's "Primaries..." 1938), coefficients "corrected for attenuation" or treated with some other odd correction formula, or coefficients based on different sample sizes, as in meta-analyses or with the faulty missing-data solution of pairwise elimination. One recognizes the problem case from the size of the negative eigenvalue (rule of thumb: eigenvalue >= 2nd digit after the decimal point), whereby completely derailed multiple and partial correlation coefficients can occur. Such correlation matrices can no longer be interpreted.
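The loss of positive definiteness described above can be detected directly from the spectrum. A minimal sketch (the matrix and the tolerance are illustrative, not from the source):

```python
import numpy as np

def classify(R, tol=1e-8):
    """Crude screen following the eigenvalue criteria in the text:
    any clearly negative eigenvalue means the matrix is indefinite."""
    ev = np.linalg.eigvalsh(R)               # eigenvalues in ascending order
    n_neg = int((ev < -tol).sum())
    if n_neg > 0:
        return "indefinite, %d negative eigenvalue(s)" % n_neg
    return "condition number %.1f" % (np.abs(ev).max() / np.abs(ev).min())

# An impossible "correlation" pattern: r12 = r13 = .9 together with
# r23 = -.9 cannot arise from real data; one eigenvalue turns negative.
R_bad = np.array([[1.0,  0.9,  0.9],
                  [0.9,  1.0, -0.9],
                  [0.9, -0.9,  1.0]])
print(classify(R_bad))       # indefinite, 1 negative eigenvalue(s)
```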

3. Analysis and Report of 38 Spearman matrices

3.1. Explanation of the abbreviations of the evaluation

Samp_Or_MD_NumS_Condit_Determ_HaInRatio_R_OutIn_K_Norm_C_Norm
Samp  =:  sample size

Or    =:  order of matrix

Md    =:  missing data information:  -1 =: unknown

NumS  =: valuation (& possibly number of negative eigenvalues). The following rule-of-thumb valuations apply:
+   numerically stable
+?  borderline, tending rather to numerical stability
-?  borderline, tending rather to numerical instability
-   numerically instable
--Z  indefinite with Z negative eigenvalues
 Even if only one negative eigenvalue is present, the matrix is indefinite and derailments of any kind are possible. The matrix has turned "psychotic", so to speak: no value can be trusted anymore, everything is possible. Such a state has to be avoided at all costs, or reversed, i.e. immediately "treated", before any further calculation is done.

Condit =: largest absolute eigenvalue / smallest absolute eigenvalue. For order < 10 a Condit > 30, and for order < 20 a Condit > 50, indicates numeric instability.
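A sketch of the Condit measure and its order-dependent rule of thumb (the example matrix is hypothetical):

```python
import numpy as np

def condit(R):
    """Condition measure from the text: largest absolute eigenvalue
    divided by smallest absolute eigenvalue."""
    ev = np.abs(np.linalg.eigvalsh(R))
    return ev.max() / ev.min()

def looks_unstable(R):
    """Rule of thumb: Condit > 30 for order < 10, Condit > 50 for
    order < 20 (larger orders are not covered by the text)."""
    limit = 30 if R.shape[0] < 10 else 50
    return condit(R) > limit

# Hypothetical near-collinear matrix of order 3 (threshold 30 applies).
R = np.array([[1.00, 0.95, 0.90],
              [0.95, 1.00, 0.97],
              [0.90, 0.97, 1.00]])
print(condit(R), looks_unstable(R))
```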

Determ =: determinant. The determinant represents the absolute value of the volume of the n-dimensional parallelotope (a multidimensional body). The smaller it is, the smaller the volume of the spanned space. A small volume can be caused by a single short vector or, equivalently, by a small angle; this is the critical case. A small determinant, however, can also result quite "normally" from the "natural" calculating process without indicating numeric instability. A judgment based only on the absolute value of the determinant is therefore not reasonable.

HaInRatio =: HADAMARD number of the inverse. The HADAMARD number of the inverse indicates the ratio of the actual determinant of the inverse to its theoretically maximum value for the given coefficient matrix. According to the rule of thumb by FADDEJEW & FADDEJEWA, an inverse determinant is considered small if its ratio is below 1 : 50,000, i.e. if the HaInRatio is < .00002.
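By Hadamard's inequality, |det(A)| is at most the product of the Euclidean row norms of A, so the ratio of the actual determinant to this bound lies between 0 and 1. A sketch of the HaInRatio under this reading (the exact normalization used by FADDEJEW & FADDEJEWA may differ):

```python
import numpy as np

def hadamard_ratio(A):
    """|det(A)| divided by its Hadamard bound, the product of the
    Euclidean row norms; lies in [0, 1], values near 0 signal
    near-linear dependence among the rows."""
    return abs(np.linalg.det(A)) / np.prod(np.linalg.norm(A, axis=1))

def ha_in_ratio(R):
    """HaInRatio as described above: the Hadamard ratio of the inverse."""
    return hadamard_ratio(np.linalg.inv(R))

# Hypothetical, well-behaved correlation matrix.
R = np.array([[1.0, 0.3, 0.4],
              [0.3, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
print(ha_in_ratio(R))   # far above the .00002 rule-of-thumb threshold
```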

R_OutIn =: LES input-output ratio (SPONSEL 1994). The input-output ratio indicates by how much the output changes if the input is changed by one unit in the third digit after the decimal point. Theoretically the value ranges from 0 to ...

K_Norm =: smallest PESO norm of the correlation matrix (HAIN 1994). The smallest reduced norm ("shortest" norm, "flattest" angle) of the correlation matrix is a measure of the degree of collinearity: the smaller the K_Norm, the stronger the collinearity. The product of all reduced norms equals the absolute value of the determinant; thus a single small K_Norm suffices to bring the volume close to 0 (equivalent to the effect of small eigenvalues). PESO for the correlation matrix is adjusted so that for every K_Norm < 0.01 the number of relations (collinearities) is printed in brackets. The square root of the K_Norm gives an upper bound for the largest correlation coefficient.

C_Norm =: smallest PESO norm of the CHOLESKY matrix (HAIN 1994). The C_Norm represents the smallest reduced norm of the CHOLESKY matrix. The importance of the CHOLESKY decomposition rests, among other things, on its isometry to the raw scores. The smallest C_Norm indicates the smallest angle occurring among the centered, standardized raw scores. The square of the smallest C_Norm is greater than or equal to the smallest K_Norm. An empirical rule of thumb is: (C_Norm^2)/(2...5) ~ K_Norm. Furthermore: r(multiple) = SQRT(1 - C_Norm^2). Thus multiple correlation coefficients can be determined directly from the C_Norm. The relation or collinearity expressed by the smallest CHOLESKY norm can therefore also be expressed by the well-known and usual multiple correlation coefficient. A C_Norm < .31 can serve as a critical boundary at which a collinearity starts to become more significant.
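In the CHOLESKY factorization R = L·L^T, the diagonal element of L for variable j is the length of what remains of that variable after the preceding variables are projected out. Under the simplifying assumption that the smallest such diagonal element plays the role of the smallest reduced norm (the exact PESO pivoting of HAIN 1994 is not reproduced here), the relation r(multiple) = SQRT(1 - C_Norm^2) can be sketched as follows:

```python
import numpy as np

# Hypothetical positive definite correlation matrix.
R = np.array([[1.0, 0.6, 0.7],
              [0.6, 1.0, 0.8],
              [0.7, 0.8, 1.0]])

L = np.linalg.cholesky(R)          # R = L @ L.T, L lower triangular

# Simplification: take the smallest diagonal element of L as the
# smallest reduced CHOLESKY norm (no PESO pivoting).
c_norm = L.diagonal().min()

# r(multiple) = sqrt(1 - C_Norm^2): multiple correlation of the most
# "explained" variable on the remaining ones.
r_mult = np.sqrt(1.0 - c_norm**2)

print(c_norm, r_mult)              # c_norm > .31: no critical collinearity
```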

3.2   Report Analysis Spearman's Correlation Matrices

SPEARMAN, C. (GB: University College London),  HART, B.  (G)  "GENERAL ABILITY, ITS EXISTENCE AND NATURE"  The British Journal of Psychology, V, 1912-1913
Detailed Collinearity Analysis And Therapy Of The Indefinite Correlation-Matrix By SPEARMAN & HART (1913).

(G1) p.54, Table I
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   13 -1 --1   733.3  -.0000167538  2.21 D-12  394.2   5D-3(1)  -1(-1)

(G2) p.62, Table III "Coefficients of Bonser, boys and girls pooled together"
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   5  -1  +     4.68   .40507469    .4136093   .8      .479(0) .816(0)

SPEARMAN, C. (GB: University College London), HOLZINGER, K. (USA) "NOTE ON THE SAMPLING ERROR OF TETRAD DIFFERENCES",  The British Journal of Psychology 16, 1925/26, p.87 Table I (N=50)   ->  SPEARMAN, C. (A7)

SPEARMAN, C. (GB: University College London)

(1) "'GENERAL INTELLIGENCE', OBJECTIVELY DETERMINED AND MEASURED",
The American Journal of Psychology, 15, 1904, p.275.

Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  6  -1   +?   29.5   .0174353774  .0689510    3.7    .086(0) .431(0)

(2)  "THE THEORY OF TWO FACTORS" The Psychological Review 21, 1914,

(2a) p.102 Table I 'The SIMPSON-THORNDIKE Correlations ('Raw')'
above diagonal r11.8 = .34

Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  14 -1  --2   485    .000001053    2.1 D-9    326    .008(1)  -1(-1)

(2b) p.102 Table I 'The SIMPSON-THORNDIKE Correlations ('Raw')'.
below diagonal r8.11 = .54
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  14 -1  --2   1149   .000000470  6.87 D-13   4479    3D-3(1)  -1(-1)

Remark on (3a,3b):
The correlations r5 and r8 of the crossing-out tests from Table I were pooled and combined as correlation r5 in Table III. In contrast to (A5), this unusual procedure was at least mentioned. Misprints in r13.7, r7.13, r5.9, r9.5. See (SIM) Abilities according to (A4b)

(3a) p.112, Table III 'The SIMPSON-THORNDIKE Correlations After Pooling Together The Two Tests Of Cancellation' (above main diagonal r7.13=.27);  r5(p.112)=(r5+r8)/2 (p.102)).

Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   13 -1  --1  908.7  -.00000124   6.35D-11   1096.6  4D-3( 1)  -1(-1)
(3b) p.112 Table III (below main diagonal r13.7=.29; r5(p.112) = (r5+r8)/2 (p.102)).

Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   13 -1  --1   963.6  -.000001251  3.43D-11   1324.9  3D-3( 1)  -1(-1)

(S) "THE SUB-STRUCTURE OF THE MIND", The British Journal of Psychology, Vol.18, Part 3, 1928, N=40,

(S1) p.253: Table of correlations with n=40. Obtained by tossing as described. The correlations are arranged in best 'hierarchical' order.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
40  10 -1   +    31.6   .001893539   .0087655      1      .138(0)  .563(0)

(S2) p.253: Table of inter-columnar correlations obtained from the table preceding.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
10  10 0  --1   532.1  -.0001934245   7.09 D-13   5454.1  9D-3(1)  -1(-1)

"THE ABILITIES OF MAN - THEIR NATURE AND MEASUREMENT" AMS Press, New York 1970, reprint of the 2nd ed. 1932 (first 1926/27)

(A1)  p.74
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   5 -1   +     14    .1907942394  .1426199     2.6    .157(0) .532(0)

(A2a) p.141 (Data from McDONNEL, Biometrika 1901, N=3000); print error r3.6, r6.3. SP141A7O.K07: r3.6=.353.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
3000 7  -1   +    33.3   .0121928243  .0254251   .8      .08(0)  .411(0)

(A2b) p.141  SP141A7U.K07 r6.3=.363.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
3000 7  -1  +    33.4   .0121471758  .0254271     .9     .08(0)  .411(0)

(A3) p.143  SP143A8.K08  (Data from GATES, A. Journ. Educ. Research, 1924, p.341); print errors r8.6, r8.7.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
115  8  -1 --1    81.1   -.00236261   .0003480    66.4   .04(0)   -1(-1)

(A4a)  p.144  r1.4=.579 (Data from DOLL, N=477).
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
477  6  -1  -?    40.2   .0123232675  .0199045    3.5   .077(0)  .419(0)

(A4b)  p.144  r4.1=.580 (Data from DOLL, N=477)
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
477  6  -1  -?    40.3   .0123128172  .0198543    3.5   .077(0)  .419(0)

Remark referring to p. 145: There are different sources and different versions of the SIMPSON-THORNDIKE correlation matrix:

a) SIMPSON-THORNDIKE original (not available to me)
b) SPEARMAN "Theory Of Two Factors", Psych.Rev. 21, 1914, p.102 Table I.
c) SPEARMAN "The Abilities Of Man", 1.ed. 1927 (p.145) according to -> PAWLIK 1968, p.106.
d) SPEARMAN "The Abilities Of Man" 2.ed.1932, reprint of this 1970.
The correlation matrices b), c) and d) are different.
r8.11 (in b) =.54
r8.11 (in d) =.34
Thus the numeric criteria values of (c,d) and (b) differ (essentially):
In (c,d; r8.11 =.34):
Ratio max. range out-/input = 326
Condition number HEVA/LEVA  = 485
In (b, r8.11 =.54):
Ratio max. range out-/input = 4479
Condition number HEVA/LEVA  = 1149

(A5) p.147  (Data from BROWN, W. Brit.J.Psych.1910 p.309)
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
66  8  -1   +     7.7    .2088387535  .1645218     2.1    .348(0)  .738(0)

Remark on (A5)
SPEARMAN's figures (1932) for a correlation matrix by William BROWN do not correspond to BROWN's original figures. After longer reflection and analysis I found that SPEARMAN simply pooled some of BROWN's correlation coefficients without explaining this in any way. The reconstruction was further complicated by the arrangement. It is amazing, however, that the matrix is numerically stable despite the pooling.

(A6) p.147 (Data (N=757) from BONSER Brit. J. Psych. 1912, p.62).
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
757 5  -1   +    4.68   .40507469    .4136093    .8     .479(0) .816(0)

(A7) p.148 (Data (N=50) from HOLZINGER)
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
50  9  -1  +    17.56  .04321785    .0304050     .9     .239(0) .667(0)

(A8a) p.149 (Data, N=149, from MAGSON Brit. J. Psych. Mon. Suppl.9, 1926), r1.7=.45, r2.5=.50.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
149  7  -1  +     8.89   .10356768   .1976976    .6      .305(0) .719(0)

(A8b) p.149 r7.1=.48, r5.2=.28.
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
149  7  -1  +    9.23   .10814521     .1930725    .7     .29(0)  .704(0)

(A9) p.152  (Data from BALDWIN)
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   6  -1   -   136.9  .0001661343   .0023895    5.2    .025(0) .262(0)

(A10a) p.153  N=2599, r4.5=.331
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
2599 7  -1   +    9.3   .1108558579  .2207524     .5     .298(0) .712(0)

(A10b) p.153  N=2599, r5.4=.337
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
2599 7  -1   +    9.3   .1106713125  .2213055     .5     .298(0) .712(0)

(A11) p.156
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  8  -1   -?   50.9   .00030370   .0211751     1.7    .066(0) .409(0)

(A12) p.171
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  5  -1  -    157.8  .004854174   .0004100    175.1   .017(0) .196(0)

(A13) p.218 "78 Normal Children (Corrected For Attenuation)"
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
78  12 -1  -    96.9   .00008493    .0000471     21.1   .037(0) .322(0)

(A14) p.218  "22 Defective Children (Corrected For Attenuation)"
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
22  12 -1  --3   815.5 -.0000000036  2.85 D-9   426.7   5D-3(1)  -1(-1)

(A15)  Data, N=200, from COLLAR, Brit. J. Psych.

(A15a)  r4.5=.517
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
200 6  -1   +    21.2   .0366516723  .0343203    1.7    .122(0) .493(0)

(A15b)  r5.4=.255
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
200 6  -1  +?    26.8   .0348545063  .0272194   5.8     .093(0) .436(0)

(A16)  p.296, 4.K04,  N=77
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
77  4  -1   +    5.9   .322311345   .3110141     .6     .38(0)  .753(0)

(A17) p.301 N=47
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
47  9  -1   +    11.3   .1337945815  .0357932    4.4    .295(0) .695(0)

(A18) p.314
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  4  -1   +    20.8   .1618624096  .0513551    5.8    .105(0) .443(0)

(A19a) p.315  r3.6=-.10  r5.7= -.02
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   7  -1   +    6.9    .3680140648  .0591439    28.5   .382(0) .755(0)

(A19b) p.315  r6.3= .10  r7.5= -.32
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1   7  -1   +    5.5    .3679044677  .1085426    21.4   .45(0)  .795(0)

(A20) p.325
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
80   8  -1  +     7.6   .2312178632  .1175382    2.1    .383(0) .762(0)

(A21) p.346  (corrected for attenuation)
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
140 6  -1 --2   131.4   .002747316   .0003696    73     .024(0)  -1(-1)

(A22) p.347 "The following is the table for the students, after correcting for attenuation and eliminating the influence on g (by Yule's formula see p.156)":
Samp_Or_MD_NumS_ Condit_ Determ_     HaInRatio_ R_OutIn_ K_Norm_ C_Norm
-1  8  -1 --2   232.7   .00204695    .0000197   115.2   .016(0)  -1(-1)

4. Literature

• FADDEJEW & FADDEJEWA "Numerische Methoden der Linearen Algebra",  dt. Berlin 1973.
• HAIN, Bernhard (1994) "Some Notes on Correlation Matrices", in: SPONSEL (1994).
• SPONSEL, Rudolf (1994) "Ill-Conditioned Matrices and Collinearity in Psychology", (Loose-leaf-collection; German and English), Erlangen:   IEC-Verlag.
• THURSTONE, L.L. (1938) "Primary Mental Abilities", Chicago.

Quoting
Sponsel, R. (DAS). Ill-Conditioned Correlation Matrices By Spearman (with Hart, Holzinger). Internet publication for General and Integrative Psychotherapy IP-GIPT. Erlangen (Germany): https://www.sgipt.org/e/wisms/nis/spearm0.htm