A Statistical Toolbox For Mining And Modeling Spatial Data
DOI:
https://doi.org/10.1515/cer-2016-0035
Keywords:
duality diagram, spatial autocorrelation, Moran’s index, Moran’s Eigenvector Maps, Laplace operator, spatial eigenfunction filtering
Abstract
Most data mining projects in spatial economics start by evaluating a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation on the basis of Moran’s and Geary’s coefficients. The adequacy of these coefficients is rarely challenged, even though many users seem likely to make mistakes and to foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. I then propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods for dimension reduction and data visualization that is also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to the definition of the corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easily implemented in existing free software and make it possible to exploit the proposed corrections in spatial data analysis practice. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they possess qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), which are valuable in exploratory spatial data analysis.
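For reference, the classical coefficients that the abstract takes as its starting point can be sketched as follows. This is only a minimal illustration of the textbook definitions (Moran 1950; Geary 1954), not of the corrected coefficients proposed in the paper. The function names, the binary contiguity matrix `w` and the toy values are assumptions made for the example.

```python
def morans_i(x, w):
    """Classical Moran's I for values x and a spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                      # centered values
    s0 = sum(sum(row) for row in w)                # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


def gearys_c(x, w):
    """Classical Geary's c for values x and a spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    # Weighted sum of squared differences between neighbouring values
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * sq_diff / sum((v - mean) ** 2 for v in x)


# Hypothetical toy data: four regions in a row, each adjacent to the next.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, 2.0, 3.0, 4.0]

moran = morans_i(x, w)   # positive: neighbouring values are similar
geary = gearys_c(x, w)   # below 1: neighbouring values are similar
print(moran, geary)
```

With a smooth gradient such as this one, Moran's I is positive and Geary's c falls below 1, both pointing to positive spatial autocorrelation.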
References
Anselin L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht, The Netherlands.
Anselin L. (1995), Local indicators of spatial association - LISA. Geographical Systems, 3: 1–13.
Anselin L. & Rey S.J. (2014), Modern Spatial Econometrics in Practice, GeoDa Press LLC, Chicago IL, USA.
Aubigny (Drouet d’) G. (1989), L’Analyse Multidimensionnelle des Données de Dissimilarité, Thèse de Doctorat d’état es Sciences Mathématiques, Université Joseph Fourier – Grenoble I, France.
Aubigny (d’) G. (2006), Dépendance spatiale et auto-corrélation, in J.-J. Droesbeke, M. Lejeune & G. Saporta (Eds.), Analyse Statistique des Données Spatiales, Editions TECHNIP, Paris, France: Chap 2: 17–45.
Aubigny (d’) G. (2009), The Analysis of Proximity Data, in Govaert G. (Ed.), Data Analysis, John Wiley & sons Inc., Hoboken, USA: Chap 4: 93–147.
Aubigny (d’) G. (2012), Analyse contextuelle et modélisations multiniveaux des Données Electorales. Coordinateur principal, Action Concertée Incitative ʻTerrains, Techniques, Theorie: travail interdisciplinaire en Sciences Humaines et Socialesʼ. Rapport de fin de projet, Grenoble, France. 148 pages.
Aubigny (d’) C. & Aubigny (d’) G. (2009), New LISA indices for spatio-temporal Data Mining, XVI-èmes Rencontres de la Société Francophone de Classification, Grenoble, 2–4 Septembre, France.
Bapat R.B. (2010), Graphs and Matrices, Springer, New York, USA.
Belkin M. & Niyogi P. (2001), Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. Advances in Neural Information Processing Systems, 585–591.
Belkin M. & Niyogi P. (2003), Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, Vol. 15, No 6: 1373–1396.
Belkin M., Sun J. & Wang Y. (2009), Constructing Laplace Operator from Point Clouds in R^d. In Proceedings of the Symposium on Discrete Algorithms, 1031–1040.
Besag J. (1974), Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society Series B 36:192–236.
Bollobas B. (1990), Modern Graph Theory, Springer, New-York, USA.
Borcard D. & Legendre P. (2002), All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153 : 51–68.
Cailliez F. & Pages A.J. (1976), Introduction à l’Analyse des Données, SMASH, Paris, France.
Chessel D. & Mercier P. (1993), Couplage de triplets statistiques et liaisons espèce environnement. In: Biométrie et environnement. J.D. Lebreton et B. Asselin (Eds.), Masson, Paris, France, 1993.
Chung F.R.K. (1997), Spectral Graph Theory, American Mathematical Society, CBMS 92, USA.
Cliff A.D. & Ord J.K. (1981), Spatial Processes: Models and Applications, Pion Limited, London, UK.
Doyle P.G. & Snell J.L. (1984), Random Walks and Electric Networks, Carus Mathematical Monographs Number 22, The Mathematical Association of America, Washington D.C, USA.
Dray S., Chessel D. & Thioulouse J. (2003), Co-inertia Analysis and the Linking of Ecological Data Tables. Ecology 84(11):3078–3089.
Dray S., Legendre P. & Peres-Neto P.R. (2006), Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling 196: 483–493.
Escoufier Y. (1987). The Duality Diagram: a means for better practical applications, in Legendre, P. & Legendre L. (Eds.), Developments in Numerical Ecology: NATO ASI Series, Series G: Ecological Sciences, Vol 14. Springer, New-York, USA: 139–156.
Geary, R.C. (1954), The Contiguity Ratio and Statistical Mapping. The Incorporated Statistician 5: 115–145.
Getis A. & Ord J.K. (1992), The analysis of spatial association by use of distance statistics. Geographical Analysis, 24: 189–206.
Gower J.C. (1966), Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53: 325–338.
Griffith D.A. (2000), A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems 2: 141–156.
Griffith D.A. (2003), Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization (Second Edition), Springer, New-York, USA.
Lebart L. (1969), Analyse statistique de la contiguité. Publications de l'Institut de Statistique de l'Université de Paris, 28, pp. 81–112.
Legendre P. & Legendre L. (2012), Numerical Ecology (Third English Edition), Elsevier, Amsterdam, The Netherlands.
Moran P.A.P. (1950), Notes on continuous stochastic phenomena. Biometrika 37:17–23.
Qiu H. & Hancock E.R. (2007), Clustering and Embedding Using Commute Times. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 29, No 11: 1873–1890.
Rosenberg S. (1997), The Laplacian on a Riemannian Manifold, Cambridge University Press, Cambridge, UK.
Saerens M., Fouss F., Yen L. & Dupont P. (2004), The Principal Components Analysis of a Graph, and its relationships to Spectral Clustering. Proc. 15th European Conference in Machine Learning Vol. 3201: 371–383.
Shi J. & Malik J. (2000), Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No 8: 888–905.
Tiefelsdorf M. (2000), Modeling Spatial Processes: The identification and Analysis of Spatial Relationships in Regression Residuals by Means of Moran’s I, Springer, New-York, USA.
Torgerson W.S. (1952), Multidimensional Scaling, 1: Theory and Methods. Psychometrika, 17: 401–417.
Whittle P. (1954), On stationary processes in the plane. Biometrika 41: 434–449.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.