The Use of the Incomplete Tetrad Method for Measuring the Similarities in Nonmetric Multidimensional Scaling

Abstract
Research background: Many methods for the direct measurement of similarity in multidimensional scaling have been developed so far (e.g. ranking, sorting, pairwise comparison and others). The choice of method affects the subjective experience of the respondents, i.e. fatigue, weariness resulting from making numerous assessments, or difficulty in expressing similarity assessments. Purpose: In the proposed method, for every four-element set (tetrad) of objects, the respondent is asked to pick out the most similar and the least similar pair. Because the number of tetrads increases very rapidly with the number of objects, the aim of the study is to indicate the possibility of measuring similarities on the basis of a reduced number of tetrads. Research methodology: To make the scaling results independent of respondents' subjective effects, the analysis was based on a given distance matrix. Perceptual maps were constructed from the tetrads by multidimensional scaling with the MINISSA program. The match between the resulting configuration of points and the configuration determined from the distance matrix was tested with a Procrustes statistic. Results: It was demonstrated that the choice of the incomplete set of tetrads has no significant effect on the results of multidimensional scaling, even when the pairs of objects cannot all be presented equally frequently in the tetrads. Novelty: An original method for calculating similarities in nonmetric multidimensional scaling.


Introduction
Multidimensional scaling (MDS) is a technique for analysing similarity (or dissimilarity) data on a set of n objects. MDS produces a geometrical representation of the objects in a low-dimensional space (usually a two- or three-dimensional perceptual map), in which qualitative or quantitative relationships between the objects correspond to geometric relationships between the points representing them. The goal of MDS is to uncover the meaning of the dimensions that allow the researcher to explain the observed similarities or dissimilarities between the objects tested.
Two main techniques are used to obtain perceptual maps: metric and nonmetric multidimensional scaling. Metric MDS uses quantitative information about the dissimilarities between objects, so its algorithms require data measured at the interval or ratio level, whereas nonmetric MDS requires only qualitative information about the dissimilarities. In nonmetric multidimensional scaling, dissimilarities are measured on an ordinal scale. In this case, the researcher is only interested in knowing which of two object pairs (i, j) and (k, l), with dissimilarities δ_ij and δ_kl, has the greater dissimilarity. Numerous scientific papers deal with the theory of these two categories of MDS methods (see e.g. Kruskal, 1964a; Kruskal, 1964b; Cox, Cox, 2001; Borg, Groenen, 2005).
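Because nonmetric MDS uses only the rank order of the dissimilarities, any strictly increasing transformation of the input leaves the analysis unchanged. The short Python sketch below illustrates this by ranking all pairs of a dissimilarity matrix; the function name pair_ranks and the average-rank treatment of ties are illustrative assumptions, not part of the article.

```python
from itertools import combinations


def pair_ranks(d):
    """Rank the pairs (i, j), i < j, of a symmetric dissimilarity matrix.

    Rank 1 goes to the smallest dissimilarity. Tied values receive the
    average of the ranks they occupy (an assumed convention).
    """
    pairs = list(combinations(range(len(d)), 2))
    values = [d[i][j] for i, j in pairs]
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1  # average 1-based rank of the block
        start = end + 1
    return dict(zip(pairs, ranks))


# Squaring is strictly increasing for non-negative values, so the ranks agree.
d = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
d_squared = [[x ** 2 for x in row] for row in d]
print(pair_ranks(d) == pair_ranks(d_squared))  # True
```

A metric analysis of d and d_squared would generally give different maps, whereas a nonmetric analysis receives identical ordinal input.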
In order to perform multidimensional scaling, it is necessary to collect the complete set of n(n − 1)/2 dissimilarities. There are two ways of obtaining input dissimilarities. When the dissimilarities are obtained directly from empirical subjective measurements of objects performed by subjects, they are called direct dissimilarities. By contrast, when they are calculated from a data matrix associated with these objects, they are called derived dissimilarities. This article focuses only on the first group of dissimilarities.
When the number of objects is large, the number of direct assessments required from respondents becomes too large and makes the task of assessing dissimilarities more difficult. This article proposes the method of tetrads as a way to make the similarity task easier while keeping the scaling solutions satisfactory. The idea of the method is based on the theory of balanced incomplete block designs (see e.g. Cochran, Cox, 1957; Rink, 1987; Morris, 2010).

The methods of collecting similarities
The most important decision to be taken at the initial stage of nonmetric multidimensional scaling is the choice of a method for measuring similarities. Many more or less popular methods of direct similarity measurement have been developed so far (see e.g. Bijmolt, 1996; Zaborski, 2001). There are three main approaches to collecting input similarities (Tsogo, Masson, Bardot, 2000). The first approach is based on similarity ratings of pairs of objects, the second uses grouping and sorting tasks to calculate similarities, and the third consists of pairwise comparisons of similarities. Some of the methods suggested in the literature are presented in Table 1.
The methods differ in the number of objects presented to the respondents simultaneously (e.g. in ranking, sorting or conditional ranking of similarities the respondents assess all objects at once, whereas in pairwise comparisons or the triad method only two or three objects are presented at a time), in the difficulty of assessing similarities (e.g. ordering the entire set, especially with a large number of objects, is more problematic than selecting the preferred object from two or three items) and in the total number of required ratings (in the case of ranking it is just one assessment, while for the triad method the number of assessments is a cubic function of the number of objects).
Table 1. Selected methods of collecting similarities
Method | Description
Sorting | The subject has to sort the objects into a number of groups, with relatively similar objects in each group
Ratings | The subject has to rate each pair of objects on an ordinal scale, where the extreme values of the scale represent maximum dissimilarity and maximum similarity
Ranking of pairs | The subject is requested to arrange all possible pairs of objects in order of decreasing similarity
Pick k out of n | The subject is asked to pick a number of objects which s/he considers most similar to a particular reference object; this is repeated several times, rotating the reference object
Conditional ranking | One object is presented to the subject as a reference object, and the remaining objects have to be ordered by their similarity to the reference object; each of the objects is in turn presented as the reference
Dyads | For each pair of pairs of objects (dyad) the subject has to select the more similar of the two pairs
Triads | For each combination of three objects the subject has to indicate which objects form the most similar pair and which form the least similar pair
Source: Zaborski (2017).
The choice of method affects the subjective experience of the respondents, i.e. fatigue, weariness resulting from making numerous assessments, or difficulty in expressing similarity assessments (see Figure 1).
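To make the differences in workload concrete, the Python sketch below counts the judgments required by each task in Table 1 for n objects. It is an illustration under stated assumptions: sorting and ranking of pairs are counted as one task each, and for pick k out of n and conditional ranking each object is assumed to serve exactly once as the reference object. The function name judgment_counts is hypothetical.

```python
from math import comb


def judgment_counts(n):
    """Judgments needed by each Table 1 task for n objects (illustrative).

    Assumptions: sorting and ranking of pairs are single tasks; pick k out
    of n and conditional ranking use each object once as the reference.
    """
    n_pairs = comb(n, 2)  # n(n - 1)/2 object pairs
    return {
        "sorting": 1,
        "ratings": n_pairs,          # one rating per pair
        "ranking_of_pairs": 1,       # one ordering of all pairs
        "pick_k_out_of_n": n,        # one choice per reference object
        "conditional_ranking": n,    # one ordering per reference object
        "dyads": comb(n_pairs, 2),   # one choice per pair of pairs
        "triads": comb(n, 3),        # n(n - 1)(n - 2)/6, cubic in n
    }


for n in (5, 9, 12):
    print(n, judgment_counts(n))
```

For 9 objects this gives 36 ratings, 84 triads and 630 dyads, which illustrates why the total number of assessments matters as much as the difficulty of each one.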

The incomplete method of tetrads
In the method of tetrads, the subject is asked to consider all possible groups of four objects and to pick out the most similar and the least similar pair in each group. In this situation, the respondent should also be asked to indicate the object that is most similar to one of the objects of the pair (i, j). For example, if (i, j) is the most similar pair, the k-th object is the most similar to the i-th object and (k, l) is the least similar pair, the tetrad is written as (k, i, j, l).
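The ordering rule in the example above can be written as a small function. This Python sketch assumes, as in the example, that the most similar pair (i, j) and the least similar pair (k, l) have no object in common; the function name tetrad_order and its argument layout are hypothetical.

```python
def tetrad_order(most_similar, least_similar, closest):
    """Arrange a tetrad as (k, i, j, l) from three judgments (sketch).

    most_similar: pair (i, j) judged most similar.
    least_similar: pair (k, l) judged least similar, assumed disjoint from (i, j).
    closest: pair (x, y), where object x of (k, l) was judged most similar
    to object y of (i, j).
    """
    x, y = closest
    if set(most_similar) & set(least_similar):
        raise ValueError("this sketch assumes the two pairs share no object")
    if x not in least_similar or y not in most_similar:
        raise ValueError("closest must link the least similar pair to the most similar pair")
    # The remaining members of each pair go to the inner and outer end.
    j = most_similar[1] if y == most_similar[0] else most_similar[0]
    l = least_similar[1] if x == least_similar[0] else least_similar[0]
    return (x, y, j, l)


print(tetrad_order(("i", "j"), ("k", "l"), ("k", "i")))  # ('k', 'i', 'j', 'l')
```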
An advantage of the tetrad method is the relative simplicity of the judgments required of the subjects. Although it may be a useful technique for data collection, the number of tetrads increases very rapidly with the number of objects. The number of tetrads amounts to:
$$L = \binom{n}{4} = \frac{n(n-1)(n-2)(n-3)}{24}. \tag{1}$$
Each pair appears in (n − 2)(n − 3)/2 tetrads, while each object occurs in (n − 1)(n − 2)(n − 3)/6 tetrads. For 7 objects there are 35 tetrads, but for 12 objects there are 495 tetrads involving 2,970 compared pairs. Beyond about n = 7, presenting the full set of tetrads becomes impractical and very laborious for the subject.
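The counts for the complete set of tetrads follow from elementary combinatorics: a tetrad containing a given pair is fixed by choosing the other two objects from the remaining n − 2, and a tetrad containing a given object by choosing the other three from the remaining n − 1. The Python sketch below (the function name tetrad_design_counts is hypothetical) reproduces the figures quoted for 7 and 12 objects.

```python
from math import comb


def tetrad_design_counts(n):
    """Counts for the complete set of tetrads of n objects."""
    tetrads = comb(n, 4)  # n(n-1)(n-2)(n-3)/24
    return {
        "tetrads": tetrads,
        "compared_pairs": 6 * tetrads,  # each tetrad contains C(4, 2) = 6 pairs
        "per_pair": comb(n - 2, 2),     # choose the other two objects
        "per_object": comb(n - 1, 3),   # choose the other three objects
    }


print(tetrad_design_counts(7)["tetrads"])  # 35
print(tetrad_design_counts(12))
```

As a consistency check, per_pair multiplied by the number of pairs n(n − 1)/2 equals compared_pairs (45 × 66 = 2,970 for n = 12).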
When the number of tetrads is considered too large to be practical, it can be reduced, according to the theory of balanced incomplete block designs, in such a way that all pairs of objects are presented in the tetrads equally frequently, but fewer than (n − 2)(n − 3)/2 times. If λ denotes the number of tetrads in which each pair of objects occurs (λ = 1, 2, …, (n − 2)(n − 3)/2), then the reduced number of tetrads L_λ must satisfy both of the following defining relations (see e.g. Cochran, Cox, 1957):
$$4L_\lambda = nr, \qquad 3r = \lambda(n-1), \tag{2}$$
where r is the number of replications of each object in the reduced set of tetrads.
According to equations (2), the number of tetrads is equal to:
$$L_\lambda = \frac{\lambda n(n-1)}{12}. \tag{3}$$
The numbers of tetrads for different values of λ and n are shown in Table 2. Because L_λ and r must be integers, it is not possible to define a reduced number of tetrads for all combinations of λ and n, so not all cells in Table 2 are filled.
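Which cells can be filled is easy to check by computer. The Python sketch below applies the necessary conditions 4L_λ = nr and 3r = λ(n − 1) for blocks of four objects and returns None when r or L_λ would not be an integer. The function name reduced_design_size is hypothetical, and passing the check does not by itself guarantee that a balanced design with these parameters exists.

```python
def reduced_design_size(n, lam):
    """Return (r, L) for a balanced set of tetrads, or None if impossible.

    Uses the necessary conditions 3r = lam (n - 1) and 4L = n r; integer
    solutions do not guarantee that such a design actually exists.
    """
    if (lam * (n - 1)) % 3 != 0:
        return None
    r = lam * (n - 1) // 3  # replications of each object
    if (n * r) % 4 != 0:
        return None
    return r, n * r // 4    # (r, number of tetrads L)


for lam in (1, 2, 3, 6):
    print(lam, reduced_design_size(9, lam))
```

For n = 9 the sketch gives r = 8 and L_λ = 18 for λ = 3, and r = 16 and L_λ = 36 for λ = 6, while λ = 1 and λ = 2 are ruled out.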
As each tetrad involves six compared pairs, the judgment on each of these paired comparisons can be entered into a matrix. A triangular similarity matrix is created by awarding each pair of objects a number of points equal to the number of pairs in the tetrad whose similarity can be assumed to be smaller than that of the given pair. The element p_ij in the i-th row and the j-th column of the similarity matrix equals the total number of points awarded to the pair consisting of the i-th and the j-th objects across all tetrads. Tables 3 and 4 present the points assigned to pairs from a set of 5 objects, labelled with consecutive letters of the alphabet, in 5 tetrads, together with the corresponding similarity matrix.
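The accumulation of points can be sketched in Python. Tables 3 and 4 are not reproduced in this section, so the scoring of the intermediate pairs is an assumption: the sketch uses only the most and least similar choices, so the most similar pair outranks the other five pairs (5 points), each remaining pair outranks only the least similar pair (1 point), and the least similar pair receives 0 points. The extra ordering information in a tetrad (k, i, j, l) could justify awarding more points to some pairs. The function name accumulate_points and the input format are hypothetical.

```python
from itertools import combinations


def accumulate_points(judgments):
    """Sum tetrad points into p[(a, b)] and count presentations m[(a, b)].

    Each judgment is (tetrad, most, least): a 4-tuple of objects plus the
    most and least similar pairs. Assumed scoring: most similar pair 5,
    least similar pair 0, every other pair 1.
    """
    p, m = {}, {}
    for tetrad, most, least in judgments:
        most, least = tuple(sorted(most)), tuple(sorted(least))
        for pair in combinations(sorted(tetrad), 2):
            if pair == most:
                points = 5
            elif pair == least:
                points = 0
            else:
                points = 1
            p[pair] = p.get(pair, 0) + points
            m[pair] = m.get(pair, 0) + 1  # times the pair was presented
    return p, m


judgments = [
    (("A", "B", "C", "D"), ("A", "B"), ("C", "D")),
    (("A", "B", "C", "E"), ("C", "E"), ("A", "B")),
]
p, m = accumulate_points(judgments)
print(p[("A", "B")], m[("A", "B")])  # 5 2
```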
In formula (4), which converts the similarities into dissimilarities, m_ij denotes the number of tetrads in the set that contain the pair (i, j). The denominator in the second component of equation (4) is the maximum possible number of points for the pair (i, j), i.e. the number obtained when the pair was judged the most similar one in every tetrad in which it appeared.
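Formula (4) itself is not reproduced in this section. Since the most similar pair of a tetrad outranks the other five pairs, the maximum number of points for pair (i, j) is 5m_ij, which suggests a transformation of the form δ_ij = 1 − p_ij / (5m_ij). The Python sketch below assumes that form; the function name tetrad_dissimilarity is hypothetical. Because MINISSA uses only the rank order of the dissimilarities, any strictly decreasing function of p_ij / (5m_ij) would lead to the same nonmetric solution.

```python
def tetrad_dissimilarity(p, m):
    """Turn accumulated points into dissimilarities (assumed formula form).

    Assumption: delta_ij = 1 - p_ij / (5 m_ij), where 5 m_ij is the maximum
    number of points pair (i, j) can receive in m_ij tetrads. Pairs that
    were never presented (m_ij = 0) are left out.
    """
    return {
        pair: 1 - p.get(pair, 0) / (5 * count)
        for pair, count in m.items()
        if count > 0
    }


p = {("A", "B"): 5, ("A", "C"): 2}
m = {("A", "B"): 2, ("A", "C"): 2, ("C", "D"): 1}
print(tetrad_dissimilarity(p, m))
```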

The assessment of the impact of tetrads choice on multidimensional scaling results
To make the scaling results independent of respondents' subjective effects (boredom, fatigue, difficulties in expressing similarity assessments), the reliability of using a reduced number of tetrads was assessed on the basis of a given distance matrix (see Table 5).
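Working from a given distance matrix means that tetrad judgments can be generated without respondents. The Python sketch below shows one way to do this under the assumption of an error-free respondent: the most similar pair is the pair with the smallest distance and the least similar pair is the one with the largest distance. The function name simulate_judgment is hypothetical, and the article does not state how ties were handled.

```python
from itertools import combinations


def simulate_judgment(dist, tetrad):
    """Simulate an error-free answer for one tetrad from a distance matrix.

    Returns (sorted tetrad, most similar pair, least similar pair). Ties are
    resolved in favour of the pair listed first in lexicographic order.
    """
    pairs = list(combinations(sorted(tetrad), 2))
    most = min(pairs, key=lambda ab: dist[ab[0]][ab[1]])
    least = max(pairs, key=lambda ab: dist[ab[0]][ab[1]])
    return tuple(sorted(tetrad)), most, least


# Four objects placed on a line at positions 0, 1, 3 and 7.
xs = [0, 1, 3, 7]
dist = [[abs(a - b) for b in xs] for a in xs]
print(simulate_judgment(dist, (3, 1, 0, 2)))  # ((0, 1, 2, 3), (0, 1), (0, 3))
```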
The matrix shows the dissimilarities in the preferences of members of the University of the Third Age in Boleslawiec (Lower Silesia province) with respect to selected forms of activities in 2013. The study involved 109 students who regularly participated in activities over a defined period of time (see Zaborski, 2016). Multidimensional scaling of the dissimilarity matrix yielded the configuration of points representing the activities shown in Figure 2.
Many sets of tetrads may be generated for each value of λ. To verify how the choice of a set of tetrads affects the preference scaling results, 6 sets of tetrads were generated on the basis of the data in Table 5: three for λ = 3 (T_1, T_2 and T_3) and three for λ = 6 (T_4, T_5 and T_6). All sets are presented in Table 6.
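Whether a generated set of tetrads is balanced can be verified by counting how often each pair occurs. The Python sketch below does this for objects numbered 0, …, n − 1; the function names pair_frequencies and is_balanced are hypothetical. It is checked here on the complete set of tetrads for 5 objects, in which every pair occurs (5 − 2)(5 − 3)/2 = 3 times.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(tetrads, n):
    """Count how often each pair of objects 0..n-1 occurs in a set of tetrads."""
    counts = Counter({pair: 0 for pair in combinations(range(n), 2)})
    for t in tetrads:
        counts.update(combinations(sorted(t), 2))
    return counts


def is_balanced(tetrads, n):
    """True if every pair occurs equally often and at least once."""
    values = set(pair_frequencies(tetrads, n).values())
    return len(values) == 1 and 0 not in values


full = list(combinations(range(5), 4))  # complete set: 5 tetrads
print(is_balanced(full, 5), is_balanced(full[1:], 5))  # True False
```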
As mentioned previously, it is not possible to determine a reduced number of tetrads for all combinations of λ and n; consequently, the pairs of objects cannot always be presented equally frequently. Each set was therefore modified by removing two, four and six randomly selected tetrads, and together with the original sets 24 sets of tetrads were obtained. For each set of tetrads a similarity matrix was calculated and transformed into a dissimilarity matrix according to formula (4), and multidimensional scaling was performed with the MINISSA program. MINISSA implements the basic model of nonmetric MDS: it takes data in the form of a full square symmetric matrix (or its lower triangle) of dissimilarities and transforms them into the distances of the solution in a way that preserves the rank order of the input data.
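The random reduction of a set of tetrads, and a check of whether every pair is still represented, can be sketched in Python as follows. The function names drop_random_tetrads and uncovered_pairs and the use of a seeded random.Random generator are illustrative assumptions; the article does not describe how the random selection was implemented.

```python
import random
from itertools import combinations


def drop_random_tetrads(tetrads, k, seed=None):
    """Return a copy of the tetrad set with k randomly chosen tetrads removed."""
    rng = random.Random(seed)
    drop = set(rng.sample(range(len(tetrads)), k))
    return [t for idx, t in enumerate(tetrads) if idx not in drop]


def uncovered_pairs(tetrads, n):
    """Pairs of objects 0..n-1 that appear in none of the tetrads."""
    covered = {pair for t in tetrads for pair in combinations(sorted(t), 2)}
    return [pair for pair in combinations(range(n), 2) if pair not in covered]


full = list(combinations(range(6), 4))  # 15 tetrads for 6 objects
reduced = drop_random_tetrads(full, 4, seed=1)
print(len(reduced), uncovered_pairs(reduced, 6))
```

Fixing the seed makes a particular reduced set reproducible, which is convenient when several reductions of the same set are compared.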
The quality of matching the resulting configurations of points to the configuration determined from the distance matrix (Table 5) was tested with a Procrustes statistic (see Borg, Groenen, 2005), in which X denotes the configuration of points determined on the basis of the tetrads and Y the configuration of points determined on the basis of the distance matrix, with R² ∈ (0; 1]. In addition, for λ = 3 each set was successively reduced by two tetrads until the value of the Procrustes statistic started to fall drastically. The values of the Procrustes statistic for all sets are presented in Table 7, where T_p−k denotes the set T_p (p = 1, 2, …, 6) reduced by k tetrads.
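The exact formula of the statistic is not reproduced in this section. As an assumption, the Python sketch below uses the symmetric (orthogonal) Procrustes R² for two-dimensional configurations: after centring, R² = (σ₁ + σ₂)² / (tr X'X · tr Y'Y), where σ₁ and σ₂ are the singular values of X'Y. It lies in [0, 1] and equals 1 when Y is a rotated, reflected, scaled and translated copy of X, but it may differ in detail from the statistic used in the study. The function name procrustes_r2 is hypothetical.

```python
from math import sqrt


def procrustes_r2(X, Y):
    """Symmetric Procrustes R^2 between two 2-D configurations (sketch).

    X, Y: lists of (x, y) points for the same objects in the same order.
    """
    def centred(P):
        mx = sum(p[0] for p in P) / len(P)
        my = sum(p[1] for p in P) / len(P)
        return [(x - mx, y - my) for x, y in P]

    Xc, Yc = centred(X), centred(Y)
    # Entries of the 2x2 cross-product matrix M = Xc' Yc.
    a = sum(x[0] * y[0] for x, y in zip(Xc, Yc))
    b = sum(x[0] * y[1] for x, y in zip(Xc, Yc))
    c = sum(x[1] * y[0] for x, y in zip(Xc, Yc))
    d = sum(x[1] * y[1] for x, y in zip(Xc, Yc))
    # For a 2x2 matrix, sigma_1 + sigma_2 = sqrt(||M||_F^2 + 2 |det M|).
    nuclear = sqrt(a * a + b * b + c * c + d * d + 2 * abs(a * d - b * c))
    ssx = sum(x * x + y * y for x, y in Xc)
    ssy = sum(x * x + y * y for x, y in Yc)
    return nuclear ** 2 / (ssx * ssy)


X = [(0, 0), (2, 0), (0, 1), (1, 3)]
Y = [(5 - 2 * y, -1 + 2 * x) for x, y in X]  # rotated 90 degrees, scaled, shifted
print(round(procrustes_r2(X, Y), 10))  # 1.0
```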
It can be seen that for all generated sets without reduction (T_p−0), for both λ = 3 and λ = 6, the results can be regarded as almost perfect. Even when the number of tetrads in the sets was reduced by 10, the results indicate a very good match with the scaling carried out for the original data set. The results obtained for the sets reduced to as few as 8 tetrads differ only slightly: the difference between the best and the worst solution in this group is less than 0.085. The low standard deviation of 0.024 (excluding the results for T_p−12 and T_p−14) indicates that the choice of a set of tetrads has no significant effect on the results of multidimensional scaling, even when the pairs of objects cannot all be presented equally frequently in the tetrads.
The analysis showed that the results clearly deteriorated only when the sets contained fewer than 8 tetrads, but in these cases not all pairs appeared in the sets.

Conclusions
The results of many studies (see e.g. Humphreys, 1982; Bijmolt, 1996; Zaborski, 2003) indicate that multidimensional scaling based on various methods of measuring similarities gives similar solutions. However, the choice of method affects the subjective experience of respondents, which may result in input data of differing quality. Therefore, the choice of the measurement method should be guided primarily by two criteria: the method should not be labour-intensive, and expressing opinions on similarities should not cause problems for respondents. The full tetrad method proposed in the article does not satisfy the first of these conditions: the number of ratings a respondent must make for n objects equals the number of four-element combinations of an n-element set. The article indicates that the number of tetrads can be reduced in such a way that each pair of objects appears equally frequently in the set of tetrads, but fewer than (n − 2)(n − 3)/2 times. In the example with 9 objects, scaling based on 8 tetrads was shown to give a good solution. With the method of triads, in which a respondent is asked to pick out the most similar and the least similar pair from three-element sets, obtaining comparable results requires over three times as many assessments (see Zaborski, 2017). It was also demonstrated that the choice of the incomplete set of tetrads has no significant effect on the results of nonmetric multidimensional scaling, even when the pairs of objects cannot all be presented equally frequently. This conclusion is particularly relevant for constructing a reduced set of tetrads when the number of objects does not allow the condition of equal pair frequencies to be fulfilled. The analysis indicated that the tetrad method can be used provided that each pair of objects appears in the set at least once.