[theory-seminar] "Adaptivity and Confounding in Multi-Armed Bandit Experiments" – Daniel Russo (Thu, 3-Mar @ 4:00pm)

Tavor Baharav tavorb at
Thu Mar 3 11:39:19 PST 2022

Reminder: this talk (hosted jointly with the Stanford RL Forum) will be
today at 4pm via Zoom (link here).
Please join us for snacks at 3:30pm in the Grove outside Packard.

On Mon, Feb 28, 2022 at 11:53 AM Tavor Baharav <tavorb at> wrote:

> Adaptivity and Confounding in Multi-Armed Bandit Experiments
> Daniel Russo – Professor, Columbia Business School
> Thu, 3-Mar / 4:00pm / Zoom only
> *Please join us for coffee and snacks at 3:30pm in the Grove outside
> Packard (near Bytes' outdoor seating). The talk will be held on Zoom.*
> Abstract
> Multi-armed bandit algorithms minimize experimentation costs required to
> converge on optimal behavior. They do so by rapidly adapting
> experimentation effort away from poorly performing actions as feedback is
> observed. But this desirable feature makes them sensitive to confounding,
> which is the primary concern underlying classical randomized controlled
> trials. We highlight, for instance, that popular bandit algorithms cannot
> address the problem of identifying the best action when day-of-week effects
> may influence reward observations. In response, this paper proposes
> deconfounded Thompson sampling, which makes simple, but critical,
> modifications to the way Thompson sampling is usually applied. Theoretical
> guarantees suggest the algorithm strikes a delicate balance between
> adaptivity and robustness to confounding. It attains asymptotic lower
> bounds on the number of samples required to confidently identify the best
> action – suggesting optimal adaptivity – but also satisfies strong
> performance guarantees in the presence of day-of-week effects and delayed
> observations – suggesting unusual robustness. At the core of the paper is a
> new model of contextual bandit experiments in which issues of delayed
> learning and distribution shift arise organically.
> Bio
> Daniel Russo is an Associate Professor in the Decision, Risk, and
> Operations division of Columbia Business School. His research focuses on
> problems at the intersection of sequential decision-making and statistical
> machine learning. He completed his PhD at Stanford under the supervision of
> Ben Van Roy.
> *This talk is hosted by the ISL Colloquium. To receive talk announcements,
> subscribe to the mailing list isl-colloq at.*