Article
Keywords:
multiple linear regression; detection of influential points
Summary:
A method for the geometrical characterization of multidimensional data sets is described, including construction of the convex hull of the data and calculation of its volume. Together with the concept of minimum convex hull volume, this technique can be used to detect influential points or outliers in multiple linear regression. The exact minimum-volume concept is approximated by ordering the data into a linear sequence such that the volume of the convex hull of the first $n$ terms of the sequence grows as slowly as possible with $n$. The performance of the method is demonstrated on four well-known data sets. The average computational complexity of the ordering is estimated as $O(N^{2+(p-1)/(p+1)})$ for large $N$, where $N$ is the number of observations and $p$ is the data dimension, i.e. the number of predictors plus 1.
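The ordering idea can be illustrated with a minimal sketch for the planar case ($p = 2$, one predictor plus the response). This is not the article's algorithm: it is a naive greedy procedure, assumed here purely for illustration, that seeds the sequence with the triple of smallest triangle area and then repeatedly appends the point that enlarges the convex hull area the least. The function names `hull_area` and `greedy_hull_ordering` are hypothetical, and the brute-force search does not attain the complexity estimate quoted above.

```python
from itertools import combinations


def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_area(points):
    """Area of the convex hull of 2-D points (monotone chain + shoelace)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # counter-clockwise vertex list
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def greedy_hull_ordering(points):
    """Order points so that the hull area of each prefix grows slowly.

    Illustrative assumption: a greedy search, not the article's method.
    Returns (order, areas), where areas[k] is the hull area of the
    first k + 3 points of the ordering.
    """
    pts = [tuple(p) for p in points]
    # Seed with the triple spanning the smallest triangle.
    seed = min(combinations(range(len(pts)), 3),
               key=lambda t: hull_area([pts[i] for i in t]))
    order = list(seed)
    remaining = [i for i in range(len(pts)) if i not in seed]
    areas = [hull_area([pts[i] for i in order])]
    while remaining:
        current = [pts[i] for i in order]
        # Append the point whose inclusion yields the smallest hull area.
        best = min(remaining, key=lambda i: hull_area(current + [pts[i]]))
        order.append(best)
        remaining.remove(best)
        areas.append(hull_area(current + [pts[best]]))
    return order, areas


if __name__ == "__main__":
    # Five points near the line y = x and one point far from it.
    data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (2, 10)]
    order, areas = greedy_hull_ordering(data)
    print(order)
    print(areas)
```

Points that appear late in the ordering and cause a sharp jump in the hull area are the candidates for influential points or outliers. In higher dimensions the area would be replaced by the hull volume, which requires a general convex hull routine.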