Key Sectors, Industrial Clustering and Multivariate Outliers

Pages 57-73 | Received 01 Jul 2006, Published online: 17 Mar 2008
 

Abstract

This paper reflects on the problems that can arise in key sector analysis and industrial clustering because outliers are usually present in the multidimensional data describing the sectors of an input–output table. Multidimensional outliers are not only linked to the low number of clusters usually observed in this kind of study; they probably also invalidate the results of most work that relies on multivariate statistical techniques such as cluster and factor analysis. Indeed, comparing the key sectors of the Spanish economy obtained in Díaz et al. (2006) with those obtained once the outlier problem is taken into account shows that outliers greatly distort the results. Moreover, it is shown that identifying outliers can itself serve as a good, new procedure for selecting the most important sectors in an economy.

Notes

1. An excellent exposition of this problem and of alternative solutions is given in Barnett and Lewis (1994, Chapter 7), a classic on the subject.

2. See Kempthorne and Mendel (1990) for related comments on bivariate outliers.

3. Most of the proposed discordancy tests assume multivariate normality of the data or some other specific multivariate distribution (such as the exponential or Pareto).

4. The classic Mahalanobis (or statistical) distance has been criticized for being affected by the masking effect (see Rousseeuw and Van Driessen, 1999; Becker and Gather, 1999).

5. Rousseeuw and Yohai (1984) define the finite-sample breakdown point as the smallest fraction of contaminated data that can cause an estimator to take arbitrarily large values.

6. Affine equivariance is another desirable property of an estimator. A location estimator is affine equivariant if and only if applying an affine transformation to the data (such as stretching or rotating it) transforms the estimate in the same way (Hardin and Rocke, 2004).

7. Algorithms for approximating the MVE were proposed in Rousseeuw and Leroy (1987) and in Woodruff and Rocke (1993); for exact computation of the MVE, see Cook et al. (1992) or Agulló (1996). For a survey of applications, see Rousseeuw (1997).

8. Although the MVE method was initially preferred to the MCD method (because MCD was harder to compute), Rousseeuw and Van Driessen (1999) propose the Fast MCD method and recommend its use over the MVE method. The MCD method outperforms the MVE method from both a computational and a statistical point of view: MCD estimators are asymptotically normal, whereas MVE has a lower convergence rate.

9. For n larger than 600 there is a modified version of the algorithm; see Rousseeuw and Van Driessen (1999, pp. 217–218) for details.

10. As Rousseeuw and Van Driessen (1999, p. 212) point out, ‘Although it is quite easy to detect a single outlier by means of the Mahalanobis distances, this approach no longer suffices for multiple outliers because of the masking effect, by which multiple outliers do not necessarily have large Mahalanobis distances. It is better to use distances based on robust estimators of multivariate location and scatter’.
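
The masking effect described in this note can be illustrated with a deliberately simplified, one-dimensional analogue. The sketch below is not the MCD estimator and does not use the paper's data: it compares classical standardized distances (mean and standard deviation) with robust ones (median and scaled median absolute deviation) on made-up values. The function names, the toy data and the cutoff of 2.5 are all assumptions chosen for illustration.

```python
import statistics

def classical_scores(values):
    """Distance of each value from the mean, in standard-deviation units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd for v in values]

def robust_scores(values):
    """Distance of each value from the median, in scaled-MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust scale is undefined for this data")
    scale = 1.4826 * mad  # makes the MAD consistent with the sd under normality
    return [abs(v - med) / scale for v in values]

# Hypothetical data: eight ordinary values followed by a group of three outliers.
values = [9, 10, 10, 11, 9, 10, 11, 10, 50, 51, 52]
CUTOFF = 2.5  # illustrative threshold, not taken from the paper

classical_flags = [i for i, s in enumerate(classical_scores(values)) if s > CUTOFF]
robust_flags = [i for i, s in enumerate(robust_scores(values)) if s > CUTOFF]

print("flagged by classical distances:", classical_flags)
print("flagged by robust distances:  ", robust_flags)
```

In this toy data the three outliers inflate the mean and the standard deviation so much that the classical rule flags nothing, whereas the median-based rule flags exactly the last three values. Robust multivariate estimators such as MCD are designed to give the same protection in several dimensions.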

11. Some authors claim that an F distribution is more appropriate. See Hardin and Rocke (2005) for a complete description of the distributions of robust distances computed from normally distributed data.

12. The highest possible breakdown value for MCD is attained when the subset size is h = [(n + p + 1)/2]; see Lopuhaä and Rousseeuw (1991).

13. It is defined as the square root of the ratio between the highest and lowest eigenvalue.
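
As a small worked illustration of this definition, the sketch below computes the square root of the ratio between the highest and lowest eigenvalue for a symmetric 2 × 2 matrix, using the closed-form eigenvalues. The function name and the example matrices are assumptions for illustration; a real application would use a numerical eigenvalue routine on the full matrix.

```python
import math

def condition_number_2x2(a, b, d):
    """sqrt(lambda_max / lambda_min) for the symmetric matrix [[a, b], [b, d]]."""
    centre = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    lam_max = centre + radius
    lam_min = centre - radius
    if lam_min <= 0:
        raise ValueError("matrix must be positive definite")
    return math.sqrt(lam_max / lam_min)

# Hypothetical matrices: eigenvalues 4 and 1, then eigenvalues 3 and 1.
print(condition_number_2x2(4.0, 0.0, 1.0))
print(condition_number_2x2(2.0, 1.0, 2.0))
```

The first matrix gives 2 and the second gives the square root of 3; values far above 1 indicate that the matrix is close to singular.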

14. The multiplier comes from the relative influence graph (Lantner, 1974, 2001), associated with the distribution coefficients matrix, and can be interpreted in terms of elasticities. The cohesion concept is the number of times that a sector is used to connect any other two sectors (Rossier, 1980). The topological index is very similar to the Yan and Ames (1965) interrelation index (see Morillas, 1983, for all these influence graph-based indexes).

15. As noted above, the maximum number of outliers allowed is n − [(n + p + 1)/2] = 26.
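
The arithmetic behind this bound can be sketched as follows, reading the square brackets as the integer part. The function names are hypothetical, and the values of n and p in the example are made up; they are not the dimensions of the dataset used in the paper.

```python
def mcd_subset_size(n, p):
    """Integer part of (n + p + 1) / 2 for n observations and p variables."""
    return (n + p + 1) // 2

def max_outliers_allowed(n, p):
    """Observations left outside the subset: n - [(n + p + 1) / 2]."""
    return n - mcd_subset_size(n, p)

# Hypothetical example: 100 observations described by 5 variables.
n, p = 100, 5
print(mcd_subset_size(n, p))       # 53
print(max_outliers_allowed(n, p))  # 47
```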

16. The three highest distances are not included in the figure, because they would change the scale and it would be impossible to see any other changes in the graphs.

17. After the principal components analysis, the variable ‘production trend’, which has the lowest significance level, appears to be irrelevant.

18. This paper also provides an overview of different approaches to identifying key sectors.

19. NERFCM is an iterative algorithm. The memberships are initialized from a robust crisp clustering obtained with Partition Around Medoids (PAM) clustering (Kaufman and Rousseeuw, 1990), so that initially each membership is either 1 if the sector belongs to a cluster or 0 if it does not. The appropriate number of clusters was chosen on the basis of the silhouette width (Rousseeuw, 1987). These memberships are then updated within the NERFCM algorithm.

20. The silhouette width takes values between −1 and 1. If it equals 1, the ‘within’ dissimilarity is much smaller than the ‘between’ dissimilarity (minimum of the distances to the elements of other clusters), so the sector has been assigned to an appropriate cluster. If it equals 0, it is not clear whether the sector should be assigned to that cluster, since the ‘within’ and ‘between’ dissimilarities are equal. If the silhouette width equals −1, the sector is misclassified. The overall average silhouette width is the average of the silhouette widths of all sectors in the dataset.
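
A minimal sketch of the silhouette width follows, assuming the usual formulation in which the ‘within’ dissimilarity a is the average dissimilarity to the other members of the same cluster, the ‘between’ dissimilarity b is the smallest average dissimilarity to any other cluster, and the width is (b − a)/max(a, b). The function name, the convention of 0 for singleton clusters and the four-point example are assumptions for illustration, not the paper's data or code.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of each item, given a dissimilarity matrix and cluster labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # assumed convention for a singleton cluster
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths

# Hypothetical one-dimensional "sectors" forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(x - y) for y in points] for x in points]

good = silhouette_widths(dist, [0, 0, 1, 1])  # sensible assignment
bad = silhouette_widths(dist, [0, 1, 0, 1])   # deliberately poor assignment

print("average width, sensible clustering:", sum(good) / len(good))
print("average width, poor clustering:    ", sum(bad) / len(bad))
```

The sensible assignment gives widths close to 1 for every point, while the deliberately poor assignment gives negative widths, which is the pattern the note describes.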
