[theory-seminar] [info_theory_forum] Course announcement: EE378C (Information-theoretic Lower Bounds in Data Science)

Yanjun Han yjhan at stanford.edu
Tue Mar 30 18:43:48 PDT 2021


Update: an enrollment cap of 30 was mistakenly in place, which prevented new enrollments in the system. We have lifted the cap, and enrollment should work now.

On Mar 26, 2021, at 3:26 PM, Yanjun Han <yjhan at stanford.edu> wrote:

EE378C: Information-theoretic Lower Bounds in Data Science

"Grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference," Reinhold Niebuhr (1892–1971)

We are teaching a new class this spring on the science and the art of the impossible. In this course, we will provide a diverse but unified set of ideas and tools for establishing impossibility results (the science), and use a rich set of interdisciplinary examples to show how to choose among and apply these ideas (the art). Establishing impossibility results is a common task across data science; for example: what is the smallest error one could achieve in a classification problem? How many iterations do we need to approach the optimal solution in optimization? How much computation, communication, or memory do we need to learn the distribution of the data? What is the optimal data querying/collection scheme in online and reinforcement learning? What is the best representation/coding of a message over a given communication channel?
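As a taste of the first question above, the smallest achievable error in a binary classification problem with equal priors has a closed form in terms of the total variation distance between the two class-conditional distributions. Here is a minimal sketch; the Gaussian class-conditionals are our hypothetical choice for illustration, not part of the course materials:

from scipy.stats import norm

# Bayes-optimal error for classifying a sample X as coming from N(0, 1)
# versus N(mu, 1) with equal priors. The optimal test thresholds at mu / 2,
# and the resulting error is (1 - TV(P0, P1)) / 2, where for these two
# Gaussians TV = 2 * Phi(mu / 2) - 1. No classifier can do better.
for mu in (0.5, 1.0, 2.0):
    tv = 2.0 * norm.cdf(mu / 2.0) - 1.0
    print(f"mu = {mu}: smallest achievable error = {(1.0 - tv) / 2.0:.4f}")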

This course aims to explore the use of information-theoretic lower bounds in data science. From the theory side, we will provide an extensive set of ideas and tools for establishing such bounds, providing a unified exposition of seemingly disparate ideas from different fields and problems. From the applied side, we will illustrate the efficacy of these ideas in a wide range of problems - spanning machine learning, statistics, information theory, theoretical computer science, optimization, online learning and bandits, operations research, and more - and provide guidelines for choosing and using the appropriate set of tools for a given problem.

Tools and examples, in both breadth and depth, are the core elements of this course, and they will be considered in a multitude of data-scientific contexts.

Course website: https://web.stanford.edu/class/ee378c/

Content:

Two big ideas for establishing lower bounds, namely reduction and testing, together with several special topics.

Reduction: statistical decision theory, deficiency, Le Cam’s distance, asymptotic equivalence, limit of experiments, local asymptotic normality, Hajek-Le Cam classical asymptotics, parametric submodel, statistical-computational tradeoff

Testing: f-divergence, joint range, Le Cam’s two-point method (see the numerical sketch after this list), simple against composite hypotheses, Ingster-Suslina method, fuzzy hypothesis testing, orthogonal polynomials, moment matching, testing multiple hypotheses, Fano, Assouad, global Fano, packing and covering bounds

Special topics: communication/privacy constrained estimation/testing, adaptation lower bounds, sequential experimental design, min-max and max-min formulation, compression-based arguments, geometric arguments, strong converses, multi-terminal information theory

Examples/applications: communication complexity, nonparametric density estimation and regression, Poissonization, asymptotic efficiency, planted clique, sparse PCA, bandits and online learning, (generalized) uniformity and identity testing, functional estimation, Gaussian mixture model, method of moments, theory of aggregation, optimality of VC dimension, oracle complexity of (stochastic) optimization, isotonic and convex regression, log-concave density estimation, sparse linear model, dynamic pricing, learning with limited rounds of adaptivity, optimality of Johnson-Lindenstrauss, universal source coding, density estimation under TV/Hellinger/KL divergence
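To make one of the testing tools above concrete, below is a minimal numerical sketch of Le Cam’s two-point method for mean estimation in a Gaussian location model. This is our illustrative example, not part of the course materials; the helper name two_point_lower_bound and the grid of separations are hypothetical choices.

import numpy as np
from scipy.stats import norm

def two_point_lower_bound(n, sigma=1.0):
    # Two-point bound for estimating theta from n i.i.d. N(theta, sigma^2)
    # samples under absolute-error loss: for any theta0, theta1 with
    # |theta1 - theta0| = delta,
    #     inf_est sup_theta E|est - theta| >= (delta / 2) * (1 - TV(P0^n, P1^n)),
    # and for Gaussians the n-sample total variation has the closed form
    #     TV = 2 * Phi(sqrt(n) * delta / (2 * sigma)) - 1.
    # We maximize the bound over a grid of separations delta.
    deltas = np.linspace(1e-4, 10.0 * sigma / np.sqrt(n), 1000)
    tv = 2.0 * norm.cdf(np.sqrt(n) * deltas / (2.0 * sigma)) - 1.0
    return ((deltas / 2.0) * (1.0 - tv)).max()

for n in (10, 100, 1000):
    lb = two_point_lower_bound(n)
    print(f"n = {n:4d}: minimax risk >= {lb:.4f} (~ {lb * np.sqrt(n):.2f} / sqrt(n))")

Maximizing delta * (1 - Phi(sqrt(n) * delta / (2 * sigma))) over delta yields a constant near 0.34, recovering the familiar sigma / sqrt(n) minimax rate for this problem up to a constant factor.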

Prerequisites:

EE 278, CS 229T, STATS 300A, an equivalent course, or instructor’s permission. This course will be mostly self-contained, but it requires mathematical maturity at a graduate level. Background in statistics, information theory, machine learning, and/or optimization is recommended.

Logistics:

The course will cover a wide range of ideas, tools, and examples for proving impossibility results. Grading will be based primarily on 3-4 homework sets and a final literature review. A reading list is available on the course website, but students are welcome to propose their own papers or research projects.

Instructors: Yanjun Han, Ayfer Ozgur, and Tsachy Weissman

Time: Mon, Wed 10:00 AM - 11:20 AM
_______________________________________________
information_theory_forum mailing list
information_theory_forum at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/information_theory_forum
