[theory-seminar] Paris's Thesis Defense May 30th, 3pm, Gates 104
Paris Syminelakis
psimin at stanford.edu
Fri May 24 09:02:00 PDT 2019
Hi theory folks,
I would like to invite you to my thesis defense, taking place next Thursday, May 30th, at 3pm in Gates 104. Details follow.
Hope to see you there!
-- Paris
University Ph.D. Dissertation Defense
Department of Electrical Engineering
Paris Syminelakis
Kernel Evaluation in High Dimensions:
Importance Sampling and Nearest Neighbor Search
Advisor: Prof. Moses Charikar
Date: Thursday, May 30, 2019
Time: 3:00 pm (refreshments at 2:45 pm)
Location: Gates 104
High-dimensional datasets arise in a multitude of ways: naturally, due to the need to represent complex objects (images, audio), and synthetically, due to the need to represent complex relationships between objects through vector embeddings (semantic relationships, e.g. GloVe; class labels, e.g. ImageNet). At the same time, these datasets are of massive scale, on the order of millions of points, and even executing simple computational tasks, such as evaluating a predictive model for every point in the dataset, can be time-consuming. These considerations, along with the fact that such computational primitives are building blocks for applications that are becoming ubiquitous in everyday life, raise the need for efficient and reliable algorithms.
In this thesis, we study the problem of fast kernel evaluation, which underlies all kernel-based predictive models. Given a kernel function k(x,y) that takes two vectors as input and outputs a number in [0,1], we ask how fast we can approximate the average value of the kernel function (the kernel density) between a query vector q and all points in a dataset. This problem has a long history of study, but prior to our work, all methods with worst-case guarantees ran in time either exponential in the dimension or linear in the number of points in the dataset. We develop a new set of techniques based on randomized space partitions that yield algorithms provably approximating the kernel density in sub-linear time for a wide range of kernel functions. A main theme of our work is designing low-variance importance sampling schemes through hashing. We complement our positive algorithmic results with impossibility results showing that significantly faster algorithms for this problem would either refute popular computational complexity conjectures or yield faster algorithms for nearest neighbor search.
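To make the problem concrete, here is a minimal sketch (not the thesis's algorithm) contrasting the exact O(n) kernel density computation with a simple O(m) uniform-sampling estimator. All names (gaussian_kernel, kde_exact, kde_sampled) and the choice of Gaussian kernel are illustrative assumptions; the uniform estimator is unbiased but its relative variance blows up when the true density is small, which is precisely the regime the hashing-based importance sampling schemes described above are designed to handle.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Illustrative kernel k(x, y) with values in [0, 1]."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kde_exact(q, data, sigma=1.0):
    """Exact kernel density: average of k(q, x) over all n points, O(n) time."""
    return float(np.mean([gaussian_kernel(q, x, sigma) for x in data]))

def kde_sampled(q, data, m=100, sigma=1.0, seed=None):
    """Unbiased estimate of the kernel density from m uniformly
    sampled points, O(m) time independent of n."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(data), size=m)
    return float(np.mean([gaussian_kernel(q, data[i], sigma) for i in idx]))
```

A usage sketch: with a dataset of a thousand 5-dimensional points, `kde_sampled(q, data, m=500)` typically lands close to `kde_exact(q, data)` while touching only half the points; the thesis's contribution is achieving such sub-linear behavior with provable guarantees even when the density is tiny.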