From rmhulett at stanford.edu Tue Dec 4 13:08:03 2018
From: rmhulett at stanford.edu (Reyna Marie Hulett)
Date: Tue, 4 Dec 2018 21:08:03 +0000
Subject: [theory-seminar] Theory Lunch -- Ofir Geri
Message-ID:
Hi all,
In our final theory lunch of the quarter (!), Ofir will present "Sampling Sketches for Concave Sublinear Functions of Frequencies." As always, please join us Thursday from noon to 1 pm in Gates 463A!
----------------------------------------
Abstract:
We consider datasets consisting of elements that are key-value pairs, and our goal is to estimate statistics or aggregates over the data. The contribution of each key is weighted by a function of its frequency (the sum of the values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning. A common approach is to maintain a sample and compute the statistics from the sample. One simple way to obtain such a sample is to first aggregate the raw data into a table of keys and their frequencies and then apply a weighted sampling scheme. This aggregation, however, is too costly on massive distributed datasets with a large number of distinct keys.
An ideal sampling scheme, which allows for low-variance estimates, samples keys with probabilities proportional to their contributions. These probabilities depend on the function that is applied to the frequency of each key to compute its contribution. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies and provide statistical guarantees on estimation quality very close to those of an ideal sample computed over aggregated data. Concave sublinear functions are commonly used to mitigate the disproportionate effect of keys with high frequency; they include the capping functions min{x, T} (for a constant T) and the moments x^p for 0 < p < 1.
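As background for the abstract above, here is a toy sketch of the aggregate-then-sample baseline it describes: build the full frequency table, draw keys with probability proportional to a concave sublinear function of their frequencies, and estimate a segment statistic by inverse-probability weighting. This is an illustration of the *ideal* sample the paper competes with, not the paper's composable sketch; the function names are illustrative.

```python
import random
from collections import defaultdict

def cap(x, T):
    """Capping function min{x, T}: a concave sublinear function of the frequency."""
    return min(x, T)

def segment_estimate(elements, f, k, predicate, seed=0):
    """Estimate the sum of f(freq(key)) over keys matching `predicate`.

    Toy aggregate-then-sample baseline (NOT the paper's sketch): aggregate
    the raw (key, value) pairs into frequencies, draw k keys with probability
    proportional to f(freq), and use a Hansen-Hurwitz inverse-probability
    estimator over the sampled keys.
    """
    freq = defaultdict(float)
    for key, value in elements:
        freq[key] += value

    keys = list(freq)
    weights = [f(freq[key]) for key in keys]
    total = sum(weights)

    rng = random.Random(seed)
    draws = rng.choices(keys, weights=weights, k=k)
    # Each draw picks `key` with probability f(freq[key]) / total, so its
    # inverse-probability contribution is exactly `total` when the predicate holds.
    return total * sum(predicate(key) for key in draws) / k
```

With `f` the identity this is classic weighted sampling; plugging in `lambda x: cap(x, T)` dampens the effect of very frequent keys, as the abstract discusses.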
From ofirgeri at stanford.edu Wed Dec 5 12:49:34 2018
From: ofirgeri at stanford.edu (Ofir Geri)
Date: Wed, 5 Dec 2018 20:49:34 +0000
Subject: [theory-seminar] Two Theory Seminars This Week: Saeed Seddighin
on 12/5 and Sam Hopkins on 12/7
In-Reply-To:
References:
Message-ID:
Reminder: Saeed Seddighin's talk is today at 3:00 PM in Gates 463A.
________________________________
From: Ofir Geri
Sent: Monday, December 3, 2018 12:35:35 PM
To: thseminar at cs.stanford.edu
Subject: Two Theory Seminars This Week: Saeed Seddighin on 12/5 and Sam Hopkins on 12/7
Hi all,
This week we will have two theory seminars:
1. Saeed Seddighin (University of Maryland) on Wednesday 12/5, 3:00 PM in Gates 463A.
2. Sam Hopkins (UC Berkeley) on Friday 12/7, 3:00 PM in Gates 392.
Please see the abstracts below. If you are interested in meeting with Sam Hopkins, please email Mary at marykw at stanford.edu.
Hope to see you there!
Ofir
Fast and Parallel Algorithms for Edit Distance and Longest Common Subsequence
Wednesday 12/5, 3:00 PM, Gates 463A
Speaker: Saeed Seddighin (University of Maryland)
Computing string similarity measures is among the most fundamental problems in computer science; two notable examples are edit distance (ED) and longest common subsequence (LCS). These problems have applications in contexts such as computational biology, text processing, compiler optimization, data analysis, and image analysis. In this talk, I'll present fast and parallel algorithms for both problems. In the first part of my talk, I will present an algorithm for approximating edit distance within a constant factor in truly subquadratic time. This question had been open for three decades, and only recently were we able to answer it positively.
In the second part of my talk, I will present MPC algorithms for both edit distance and longest common subsequence. These algorithms can be seen as extensions of the previous ideas to the MPC model. The algorithms are optimal with respect to round complexity, time complexity, and approximation factor.
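For context on the quadratic barrier the talk addresses, here is the textbook exact dynamic program for edit distance. This is the classic O(n^2) baseline, not the speaker's subquadratic approximation algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic program for edit distance.

    Background only: this exact algorithm is the quadratic baseline; the
    talk concerns approximating the answer within a constant factor in
    truly subquadratic time, which this sketch does not attempt.
    """
    m, n = len(a), len(b)
    # dist[j] holds the edit distance between a[:i] and b[:j] as i advances.
    dist = list(range(n + 1))
    for i in range(1, m + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, n + 1):
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,                          # delete a[i-1]
                dist[j - 1] + 1,                      # insert b[j-1]
                prev_diag + (a[i - 1] != b[j - 1]),   # substitute or match
            )
    return dist[n]
```

The rolling one-row array keeps memory linear in the shorter string, a standard refinement of the full DP table.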
Mean Estimation with Sub-Gaussian Rates in Polynomial Time
Friday 12/7, 3:00 PM, Gates 392
Speaker: Sam Hopkins (UC Berkeley)
We study polynomial-time algorithms for a fundamental statistics problem: estimating the mean of a random vector from i.i.d. samples. Focusing on the heavy-tailed case, we assume only that the random vector X has finite mean and covariance. In this setting, the radius of the confidence intervals achieved by the empirical mean is large compared to the case where X is Gaussian or sub-Gaussian. On the other hand, estimators based on high-dimensional medians can achieve tighter confidence intervals, at the cost of potential computational intractability.
We offer the first polynomial-time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators that assumed only the existence of finitely many moments of X either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension.
From ccanonne at cs.stanford.edu Fri Dec 7 06:22:37 2018
From: ccanonne at cs.stanford.edu (Clément Canonne)
Date: Fri, 7 Dec 2018 06:22:37 -0800
Subject: [theory-seminar] TCS+ talk: Wednesday, December 12, Julia Chuzhoy,
TTIC
Message-ID: <1fffabca-b092-9c67-f608-599406747871@cs.stanford.edu>
Hi everyone,
For the next TCS+ talk*, and last of the Fall season, Julia Chuzhoy will
be speaking about an "Almost Polynomial Hardness of Node-Disjoint Paths
in Grids."
Come next Wednesday (12th) at 10am (actually, come at 9:55 for
breakfast) to see it!
Best,
-- Clément
* for the people in the back: this is an online, interactive talk we can
all watch from Gates while sipping coffee and asking questions to the
speaker.
-------------------------------
Speaker: Julia Chuzhoy (TTIC)
Title: Almost Polynomial Hardness of Node-Disjoint Paths in Grids
Abstract: In the classical Node-Disjoint Paths (NDP) problem, we are
given an n-vertex graph G and a collection of pairs of its vertices,
called demand pairs. The goal is to route as many of the demand pairs as
possible, where routing a pair means selecting a path connecting its two
vertices, so that all selected paths are vertex-disjoint.
The best current algorithm for NDP achieves an
$O(\sqrt{n})$-approximation, while, until recently, the best negative
result was a roughly $\Omega(\sqrt{\log n})$-hardness of approximation.
Recently, an improved $2^{\Omega(\sqrt{\log n})}$-hardness of
approximation for NDP was shown, even if the underlying graph is a
subgraph of a grid graph, and all source vertices lie on the boundary of
the grid. Unfortunately, this result does not extend to grid graphs.
The approximability of NDP in grids has remained a tantalizing open
question, with the best upper bound of $\tilde{O}(n^{1/4})$, and the
best lower bound of APX-hardness. In this talk we come close to
resolving this question, by showing an almost polynomial hardness of
approximation for NDP in grid graphs.
Our hardness proof performs a reduction from the 3COL(5) problem to NDP,
using a new graph partitioning problem as a proxy. Unlike the more
standard approach of employing Karp reductions to prove hardness of
approximation, our proof is a Cook-type reduction, where, given an
input instance of 3COL(5), we produce a large number of instances of
NDP, and apply an approximation algorithm for NDP to each of them. The
construction of each new instance of NDP crucially depends on the
solutions to the previous instances that were found by the approximation
algorithm.
Joint work with David H.K. Kim and Rachit Nimavat.
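To make the NDP problem in the abstract concrete, here is a small greedy heuristic on a grid: route each demand pair in turn by BFS, then delete the vertices it used. This is purely illustrative of the problem statement; the greedy rule carries no approximation guarantee, and the talk is about hardness, not algorithms.

```python
from collections import deque

def grid_neighbors(v, side):
    """Yield the up/down/left/right neighbors of v inside a side x side grid."""
    r, c = v
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < side and 0 <= nc < side:
            yield (nr, nc)

def greedy_ndp(side, demand_pairs):
    """Greedily route demand pairs on a grid with vertex-disjoint paths.

    Illustrates the NDP *problem* only: BFS each pair in the remaining
    graph, commit to the path found, and remove its vertices. Returns a
    list of (source, target, path) triples for the pairs it routed.
    """
    used = set()
    routed = []
    for s, t in demand_pairs:
        if s in used or t in used:
            continue
        # BFS from s to t, avoiding vertices already on committed paths.
        parent = {s: None}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if v == t:
                break
            for w in grid_neighbors(v, side):
                if w not in parent and w not in used:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            continue  # pair cannot be routed disjointly; skip it
        path = []
        v = t
        while v is not None:
            path.append(v)
            v = parent[v]
        used.update(path)
        routed.append((s, t, path[::-1]))
    return routed
```

Greedy routing can be far from optimal (an early path may block many later pairs), which is consistent with the large approximability gap the talk narrows.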
From ofirgeri at stanford.edu Fri Dec 7 10:46:44 2018
From: ofirgeri at stanford.edu (Ofir Geri)
Date: Fri, 7 Dec 2018 18:46:44 +0000
Subject: [theory-seminar] Two Theory Seminars This Week: Saeed Seddighin
on 12/5 and Sam Hopkins on 12/7
In-Reply-To:
References:
Message-ID:
Reminder: Sam Hopkins' talk is today at 3:00 PM in Gates 392 (note the non-standard location).
From ccanonne at stanford.edu Tue Dec 11 15:57:05 2018
From: ccanonne at stanford.edu (Clement Louis Arthur Canonne)
Date: Tue, 11 Dec 2018 23:57:05 +0000
Subject: [theory-seminar] TCS+ talk: Wednesday, December 12,
Julia Chuzhoy, TTIC
Message-ID: <93c9e0ae-aa8a-4636-8517-001b8f682d10@email.android.com>
Reminder: this is tomorrow, usual room (4th floor)!
-- Clément
_______________________________________________
theory-seminar mailing list
theory-seminar at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/theory-seminar
From ccanonne at stanford.edu Wed Dec 12 10:20:09 2018
From: ccanonne at stanford.edu (Clement Louis Arthur Canonne)
Date: Wed, 12 Dec 2018 18:20:09 +0000
Subject: [theory-seminar] TCS+ talk: Wednesday, December 12,
Julia Chuzhoy, TTIC
In-Reply-To: <93c9e0ae-aa8a-4636-8517-001b8f682d10@email.android.com>
References: <93c9e0ae-aa8a-4636-8517-001b8f682d10@email.android.com>
Message-ID: <107b02bc-a332-4fc1-b6e2-36710f357837@email.android.com>
In case you are late but interested, we moved to Gates 496 (given the numbers).
-- Clément
From marykw at stanford.edu Wed Dec 12 20:20:12 2018
From: marykw at stanford.edu (Mary Wootters)
Date: Thu, 13 Dec 2018 06:20:12 +0200
Subject: [theory-seminar] Advances in Asymptotic Probability at Stanford,
Dec. 13-17
Message-ID:
Hi all,
Some of you might be interested in this conference at Stanford starting
December 13: https://sites.google.com/view/amirdembo60/
Best,
Mary
From ccanonne at cs.stanford.edu Fri Dec 14 10:09:10 2018
From: ccanonne at cs.stanford.edu (Clément Canonne)
Date: Fri, 14 Dec 2018 10:09:10 -0800
Subject: [theory-seminar] Workshops on Privacy and Geometric Polynomials this
Spring at Simons
In-Reply-To:
References:
Message-ID:
Hi everyone,
For those interested, the Simons Institute will be hosting quite a few
interesting workshops on differential privacy and the geometry of
polynomials this Spring, as part of its eponymous semester-long
programs. If you're interested in some, consider registering -- it's
free, and it's always good to have a fancy name tag.
Best,
-- Clément
-------- Forwarded Message --------
Subject: Upcoming events at the Simons Institute, December 2018
Date: Fri, 14 Dec 2018 18:00:33 +0000
From: Simons Institute for the Theory of Computing
Reply-To: Simons Institute for the Theory of Computing
To: ccanonne at cs.stanford.edu
Upcoming Events | December 2018
Workshops & Symposia
Workshops and boot camps take place in the Calvin Lab auditorium.
Registration for these events is required. Space may be limited, and
you are advised to register early. Registration for each event opens
approximately ten weeks in advance; please see the event page for
details.
Geometry of Polynomials Boot Camp
Jan. 22–Jan. 25, 2019
Data Privacy: Foundations and Applications Boot Camp
Jan. 28–Feb. 1, 2019
Beyond Randomized Rounding and the Probabilistic Method
Feb. 11–Feb. 15, 2019
From Foundations to Applications
Mar. 4–Mar. 8, 2019
Deterministic Counting, Probability, and Zeros of Partition Functions
Mar. 18–Mar. 22, 2019
Privacy and the Science of Data Analysis
Apr. 8–Apr. 12, 2019
Hyperbolic Polynomials and Hyperbolic Programming
Apr. 30–May 3, 2019
Beyond Differential Privacy
May 6–May 10, 2019
Open Lectures & Other Events
These events are aimed at a broad scientific audience. Registration
is not required.
Theoretically Speaking Series
Manuela Veloso
(Carnegie Mellon University)
Goldman Theater, David Brower Center
Feb. 6, 2019, 6:00 pm – 7:30 pm
Theoretically Speaking Series
The Brain as a Prediction Machine
Anil Ananthaswamy
(Simons Institute), Celeste Kidd
(UC Berkeley), Christos Papadimitriou
(Columbia University) and Michael Pollan
(UC Berkeley)
Goldman Theater, David Brower Center
Mar. 7, 2019, 6:00 pm – 7:30 pm
Theoretically Speaking Series
Silvio Micali
(Massachusetts Institute of Technology)
Goldman Theater, David Brower Center
Mar. 27, 2019, 6:00 pm – 7:30 pm
Current & Future Research Programs
Spring 2019
Data Privacy: Foundations and Applications
Geometry of Polynomials
Summer 2019
Foundations of Deep Learning
Summer Cluster: Fairness
Summer Cluster: Error-Correcting Codes and High-Dimensional Expansion
Fall 2019
Online and Matching-Based Market Design
Proofs, Consensus, and Decentralizing Society
Spring 2020
Lattices: Algorithms, Complexity and Cryptography
The Quantum Wave in Computing
Copyright © 2018 Simons Institute for the Theory of Computing. All rights reserved.