Today's data-driven society is full of large-scale datasets, e.g., images from the
web, sequence data from the human genome, graphs representing friendship
networks, time-series data of stock prices, speech corpora of news broadcasts,
etc. In the context of machine learning, these datasets are often represented by
large matrices representing either a set of real-valued features for each datapoint
or pairwise similarities between datapoints. Hence, modern learning problems in
computer vision, natural language processing, computational biology, and other
areas often face the daunting task of storing and operating on matrices with
thousands to millions of entries. An attractive solution to this problem involves
working with low-rank approximations of the original matrix. Low-rank
approximation is at the core of widely used algorithms such as Principle
Component Analysis, Multidimensional Scaling, Latent Semantic Indexing, and
manifold learning. Furthermore, low-rank matrices appear in a wide variety of
applications including lossy data compression, collaborative filtering, image
processing, text analysis, matrix completion and metric learning.
In the past several years, there has been a growing body of research devoted to developing randomized methods to efficiently and accurately generate low-rank approximations to large matrices. This topic has been studied in a wide range of disciplines including numerical linear algebra, theoretical computer science and machine learning. Although many of these techniques have extensive theoretical guarantees and empirical success on a limited number of datasets, there nonetheless remains a gap between these (mostly theoretical) advances and practical implementations of large-scale machine learning problems. Within the past few years, however, there have been significant advances in making these theoretical tools practical for large-scale machine learning. For example, high- quality numerical implementations have been developed, ensemble methods have been introduced to improve performance, and these techniques have been used to solve outstanding problems in human genetics and computer vision. In this workshop, we aim to survey recent developments with an emphasis on usefulness for practical large-scale machine learning problems and to provide a forum for researchers to discuss several important questions associated with randomized low-rank approximation techniques, including: What are the state-of- the-art approximation techniques? How does the heterogeneity of data affect the randomization aspects of these algorithms? Which methods are appropriate for various machine learning tasks? How do these methods work in practice for large-scale machine learning tasks? What is the tradeoff between numerical precision and time/space efficiency in the context of machine learning performance, e.g., classification or clustering accuracy? In summary, we are interested in exploring the impact of low-rank methods for large-scale machine learning. We will study new algorithms, recent theoretical advances and large- scale empirical results, and more broadly we will motivate additional interesting scenarios for use of low-rank approximations for learning tasks. |