[theory-seminar] Paris's Thesis Defense May 30th, 3pm, Gates 104

Paris Syminelakis psimin at stanford.edu
Fri May 24 09:02:00 PDT 2019


Hi theory folks,


I would like to invite you to my thesis defense, taking place next Thursday, May 30th, at 3pm in Gates 104. Details follow.


Hope to see you there!

-- Paris




Stanford University Ph.D. Dissertation Defense

Department of Electrical Engineering



Paris Syminelakis



Kernel Evaluation in High Dimensions:
Importance Sampling and Nearest Neighbor Search

Advisor: Prof. Moses Charikar



Date: Thursday, May 30, 2019

Time: 3:00 pm (refreshments at 2:45 pm)

Location: Gates 104



High-dimensional datasets arise in a multitude of ways: naturally, due to the need to represent complex objects (images, audio), and synthetically, due to the need to represent complex relationships between objects through vector embeddings (semantic relationships, e.g. GloVe; class labels, e.g. ImageNet). At the same time, these datasets are of massive scale, on the order of millions of points, and even executing simple computational tasks, like evaluating a predictive model for all points in the dataset, can be time-consuming. These considerations, along with the fact that such computational primitives are building blocks for applications that are becoming ubiquitous in everyday life, raise the need for efficient and reliable algorithms.



In this thesis, we study the problem of fast kernel evaluation, which underlies all kernel-based predictive models. Given a kernel function k(x,y) that takes as input two vectors and outputs a number in [0,1], we ask how fast we can output an approximation to the average value of the kernel function (the kernel density) between a query vector q and all points in a dataset. This problem has a long history of study, but before our work, all methods with worst-case guarantees ran in time either exponential in the dimension or linear in the number of points in the dataset. We develop a new set of techniques based on randomized space partitions that yield algorithms that provably approximate the kernel density in sub-linear time for a wide range of kernel functions. One of the main themes of our work is designing low-variance importance sampling schemes through hashing. We complement our positive algorithmic results with impossibility results showing that obtaining significantly faster algorithms for this problem would either refute popular computational complexity conjectures or yield faster algorithms for nearest neighbor search.
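To make the quantity concrete, here is a minimal Python sketch of the kernel density for a Gaussian kernel: the exact O(n) evaluation the abstract describes as the linear-time baseline, plus a plain uniform-sampling Monte Carlo estimate. All names and parameters below are illustrative assumptions; the hashing-based importance sampling estimators developed in the thesis are more involved and are not reproduced here.

    import numpy as np

    def gaussian_kernel(q, X, sigma=1.0):
        # k(q, x) = exp(-||q - x||^2 / sigma^2), a value in [0, 1]
        return np.exp(-np.sum((q - X) ** 2, axis=-1) / sigma**2)

    def kde_exact(q, P, sigma=1.0):
        # Exact kernel density: the average kernel value between the
        # query q and all n points. Runs in O(n * d) time.
        return gaussian_kernel(q, P, sigma).mean()

    def kde_uniform_estimate(q, P, m, sigma=1.0, seed=None):
        # Unbiased Monte Carlo estimate from m uniformly sampled points.
        # Its variance blows up when the density is dominated by a few
        # nearby points -- the motivation for low-variance importance
        # sampling via hashing.
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(P), size=m)
        return gaussian_kernel(q, P[idx], sigma).mean()

    rng = np.random.default_rng(0)
    P = rng.standard_normal((100_000, 64))   # hypothetical dataset
    q = rng.standard_normal(64)              # query vector
    print(kde_exact(q, P), kde_uniform_estimate(q, P, m=2_000, seed=1))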


