Search Mailing List Archives
[theory-seminar] Theory Lunch -- Ofir Geri
Reyna Marie Hulett
rmhulett at stanford.edu
Tue Dec 4 13:08:03 PST 2018
Hi all,
In our final theory lunch of the quarter (!), Ofir will present "Sampling Sketches for Concave Sublinear Functions of Frequencies." As always, please join us Thursday from noon to 1 pm in Gates 463A!
----------------------------------------
Abstract:
We consider datasets that consist of elements that are key-value pairs, and our goal is to compute estimates of statistics or aggregates over the data. The contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning. A common approach is to maintain a sample and compute the statistics using the sample. One simple way to compute such a sample is to first aggregate the raw data to produce a table of keys and their frequencies and then apply a weighted sampling scheme. This aggregation however is too costly on massive distributed datasets with a large number of distinct keys.
An ideal sampling scheme, which allows for low-variance estimates, samples keys with probabilities proportional to their contributions. These probabilities depend on the function that is applied to the frequency of each key to compute its contribution. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies and provide statistical guarantees on estimation quality that are very close to that of an ideal sample computed over aggregated data. Concave sublinear functions are commonly used to mitigate the disproportionate effect of keys with high frequency, and include capping functions min{x,T} (for a constant T), the moments x^p for 0<p<1, and ln(1+x).
In the talk, we will provide the background and describe the components of our sampling sketch.
Joint work with Edith Cohen.
----------------------------------------
Best,
-Reyna
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.stanford.edu/pipermail/theory-seminar/attachments/20181204/1332ac7b/attachment-0003.html>
More information about the theory-seminar
mailing list