A Statistical Toolbox For Mining And Modeling Spatial Data

Authors

  • Gérard d’Aubigny, University of Grenoble-Alpes, Jean-Kuntzmann Laboratory

DOI:

https://doi.org/10.1515/cer-2016-0035

Keywords:

duality diagram, spatial autocorrelation, Moran’s index, Moran’s Eigenvector Maps, Laplace operator, spatial eigenfunction filtering

Abstract

Most data mining projects in spatial economics start by evaluating a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation. This evaluation relies on Moran’s and Geary’s coefficients, whose adequacy is rarely challenged, even though many users appear to make mistakes and foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. I then propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods for dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to my definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easy to implement with existing (free) software and make it possible to exploit the proposed corrections in spatial data analysis practice. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they are robust and have limited sensitivity to the Modifiable Areal Unit Problem (MAUP), both valuable properties in exploratory spatial data analysis.
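
For readers unfamiliar with the indices discussed in the abstract, the following is a minimal sketch of the classical (textbook) Moran and Geary coefficients, together with the eigen-decomposition of a graph Laplacian of the kind used in Laplacian eigenmaps. It does not implement the corrected coefficients proposed in the paper, whose definition is not given on this page. The function names, the symmetric binary weight matrix W, and the six-unit chain of neighbours are assumptions chosen purely for illustration.

```python
import numpy as np

def morans_i(x, W):
    """Classical Moran's I for values x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                      # centred values
    return (x.size / W.sum()) * (z @ W @ z) / (z @ z)

def gearys_c(x, W):
    """Classical Geary's c for values x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2   # (x_i - x_j)^2 for all pairs
    return ((x.size - 1) / (2.0 * W.sum())) * (W * sq_diff).sum() / (z @ z)

def laplacian_eigenvectors(W, k):
    """First k non-trivial eigenpairs of the unnormalised Laplacian L = D - W.

    Assumes W is symmetric and the neighbourhood graph is connected, so that
    exactly one eigenvalue is zero (the constant eigenvector), which is skipped.
    """
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vals[1:k + 1], vecs[:, 1:k + 1]

# Hypothetical example: six spatial units arranged in a chain,
# each unit neighbouring the next, with a linear trend in x.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.arange(n, dtype=float)

I = morans_i(x, W)
c = gearys_c(x, W)
vals, vecs = laplacian_eigenvectors(W, 2)
print(I, c, vals)
```

On this smooth trend Moran's I is positive and Geary's c is well below 1, both indicating positive spatial autocorrelation. The Laplacian eigenvectors, ordered by increasing eigenvalue, give spatial patterns from the smoothest to the roughest, which is the idea behind spatial eigenfunction filtering.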

References

Anselin L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht, The Netherlands.

Anselin L. (1995), Local indicators of spatial association - LISA. Geographical Systems, 3: 1–13.

Anselin L. & Rey S.J. (2014), Modern Spatial Econometrics in Practice, GeoDa Press LLC, Chicago IL, USA.

Aubigny (Drouet d’) G. (1989), L’Analyse Multidimensionnelle des Données de Dissimilarité, Thèse de Doctorat d’État ès Sciences Mathématiques, Université Joseph Fourier – Grenoble I, France.

Aubigny (d’) G. (2006), Dépendance spatiale et auto-corrélation, in J.-J. Droesbeke, M. Lejeune & G. Saporta (Eds.), Analyse Statistique des Données Spatiales, Editions TECHNIP, Paris, France: Chap 2: 17–45.

Aubigny (d’) G. (2009), The Analysis of Proximity Data, in Govaert G. (Ed.), Data Analysis, John Wiley & Sons Inc., Hoboken, USA: Chap 4: 93–147.

Aubigny (d’) G. (2012), Analyse contextuelle et modélisations multiniveaux des Données Electorales. Coordinateur principal, Action Concertée Incitative ‘Terrains, Techniques, Théorie: travail interdisciplinaire en Sciences Humaines et Sociales’. Rapport de fin de projet, Grenoble, France. 148 pages.

Aubigny (d’) C. & Aubigny (d’) G. (2009), New LISA indices for spatio-temporal Data Mining, XVI-èmes Rencontres de la Société Francophone de Classification, Grenoble, 2–4 Septembre, France.

Bapat R.B. (2010), Graphs and Matrices, Springer, New York, USA.

Belkin M. & Niyogi P. (2001), Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. Advances in Neural Information Processing Systems, 585–591.

Belkin M. & Niyogi P. (2003), Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, Vol. 15, No 6: 1373–1396.

Belkin M., Sun J. & Wang Y. (2009), Constructing Laplace Operator from Point Clouds in R^d. In Proceedings of the Symposium on Discrete Algorithms, 1031–1040.

Besag J. (1974), Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society Series B 36:192–236.

Bollobas B. (1990), Modern Graph Theory, Springer, New York, USA.

Borcard D. & Legendre P. (2002), All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51–68.

Cailliez F. & Pages A.J. (1976), Introduction à l’Analyse des Données, SMASH, Paris, France.

Chessel D. & Mercier P. (1993), Couplage de triplets statistiques et liaisons espèce environnement. In: Biométrie et environnement. J.D. Lebreton et B. Asselin (Eds.), Masson, Paris, France.

Chung F.R.K. (1997), Spectral Graph Theory, American Mathematical Society, CBMS 92, USA.

Cliff A.D. & Ord J.K. (1981), Spatial Processes: Models and Applications, Pion Limited, London, UK.

Doyle P.G. & Snell J.L. (1984), Random Walks and Electric Networks, Carus Mathematical Monographs Number 22, The Mathematical Association of America, Washington D.C, USA.

Dray S., Chessel D. & Thioulouse J. (2003), Co-inertia Analysis and the Linking of Ecological Data Tables. Ecology 84(11):3078–3089.

Dray S., Legendre P. & Peres-Neto P.R. (2006), Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling 196: 483–493.

Escoufier Y. (1987), The Duality Diagram: a means for better practical applications, in Legendre P. & Legendre L. (Eds.), Developments in Numerical Ecology: NATO ASI Series, Series G: Ecological Sciences, Vol 14. Springer, New York, USA: 139–156.

Geary R.C. (1954), The Contiguity Ratio and Statistical Mapping. The Incorporated Statistician 5: 115–145.

Getis A. & Ord J.K. (1992), The analysis of spatial association by use of distance statistics. Geographical Analysis, 24: 189–206.

Gower J.C. (1966), Some distance properties of latent root and vector methods used in multivariate Analysis. Biometrika, 55: 325–388.

Griffith D.A. (2000), A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems 2: 141–156.

Griffith D.A. (2003), Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization (Second Edition), Springer, New York, USA.

Lebart L. (1969), Analyse statistique de la contiguïté. Publications de l'Institut de Statistique de l'Université de Paris, 28: 81–112.

Legendre P. & Legendre L. (2012), Numerical Ecology (Third English Edition), Elsevier, Amsterdam, The Netherlands.

Moran P.A.P. (1950), Notes on continuous stochastic phenomena. Biometrika 37:17–23.

Qiu H. & Hancock E.R. (2007), Clustering and Embedding Using Commute Times. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, No 11: 1873–1890.

Rosenberg S. (1997), The Laplacian on a Riemannian Manifold, Cambridge University Press, Cambridge, UK.

Saerens M., Fouss F., Yen L. & Dupont P. (2004), The Principal Components Analysis of a Graph, and its relationships to Spectral Clustering. Proc. 15th European Conference in Machine Learning Vol. 3201: 371–383.

Shi J. & Malik J. (2000), Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No 8: 888–905.

Tiefelsdorf M. (2000), Modeling Spatial Processes: The Identification and Analysis of Spatial Relationships in Regression Residuals by Means of Moran’s I, Springer, New York, USA.

Torgerson W.S. (1952), Multidimensional Scaling, 1: Theory and Methods. Psychometrika, 17: 401–417.

Whittle P. (1954), On stationary processes in the plane. Biometrika 41: 434–449.

Published

2017-03-30

How to Cite

d’Aubigny, G. (2017). A Statistical Toolbox For Mining And Modeling Spatial Data. Comparative Economic Research. Central and Eastern Europe, 19(5), 5–24. https://doi.org/10.1515/cer-2016-0035

Section

Articles