Public lecture: Efficient Analysis of High-Dimensional Big Data via Diffusion Geometry Preservation and Matrix Decomposition
Moshe Salhov
Abstract:
The Diffusion Maps (DM) framework is a kernel-based method for manifold learning and data analysis that defines diffusion similarities by imposing a Markovian process on the given dataset.
Analysis by this process uncovers the intrinsic geometric structures in the data. Recently, it has been utilized for many modern data analysis applications.
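As an illustration of the construction described above, the following is a minimal sketch of the standard DM recipe: a Gaussian kernel is row-normalized into a Markov transition matrix, and the diffusion distance between two points compares their t-step transition probabilities. It is not the speaker's algorithm. The function names, the toy points, the kernel width `epsilon`, and the diffusion time `t` are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(points, epsilon):
    """Pairwise Gaussian affinities k(x, y) = exp(-||x - y||^2 / epsilon)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / epsilon)
             for y in points] for x in points]

def markov_matrix(kernel):
    """Row-normalize the kernel so each row is a probability distribution."""
    degrees = [sum(row) for row in kernel]
    P = [[k / d for k in row] for row, d in zip(kernel, degrees)]
    return P, degrees

def mat_mul(A, B):
    """Plain matrix product of two square lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def diffusion_distance(P, degrees, i, j, t):
    """Diffusion distance between points i and j after t >= 1 steps."""
    Pt = P
    for _ in range(t - 1):
        Pt = mat_mul(Pt, P)
    # For a symmetric kernel the stationary distribution is proportional
    # to the degrees (row sums) of the kernel.
    total = sum(degrees)
    stationary = [d / total for d in degrees]
    return math.sqrt(sum((Pt[i][z] - Pt[j][z]) ** 2 / stationary[z]
                         for z in range(len(P))))

# Toy data (assumed): two well-separated clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0)]
K = gaussian_kernel(points, epsilon=1.0)
P, degrees = markov_matrix(K)
within = diffusion_distance(P, degrees, 0, 1, t=2)   # same cluster
across = diffusion_distance(P, degrees, 0, 3, t=2)   # different clusters
```

In this toy setting the distance within a cluster comes out much smaller than the distance across clusters, which is the sense in which the Markov process exposes the geometry of the data. The sketch forms the full kernel matrix explicitly, which is exactly the cost that becomes prohibitive for big data.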
In this talk we describe several methodologies that extend and optimize the DM framework to provide efficient approximations and algorithms for analyzing high-dimensional big data. Furthermore, we introduce DM analysis of data patches (i.e., local data clusters or neighborhoods) instead of individual data points. The affinities defined between patches incorporate information about the dominant tangential directions in these patches together with their geometric positions on the manifold. Finally, we propose an alternative to the non-parametric kernel-method approach of obtaining data representations via spectral decompositions of a big kernel operator or matrix in finite settings. The presentation of our approach is based on the Measure-based Gaussian Correlation (MGC) diffusion kernel and on the measure-based DM embedding obtained by its decomposition. We will show that when the underlying measure is modeled by a Gaussian mixture model (GMM), an equivalent embedding, which preserves the diffusion geometry of the data, can be computed without decomposing the full kernel.
This seminar presents part of the speaker's Ph.D. thesis of the same name, carried out under the supervision of Prof. Amir Averbuch.