[theory-seminar] "Adaptivity and Confounding in Multi-Armed Bandit Experiments" – Daniel Russo (Thu, 3-Mar @ 4:00pm)

Tavor Baharav tavorb at stanford.edu
Thu Mar 3 11:39:19 PST 2022


Reminder: this talk (hosted jointly with the Stanford RL Forum) will be
today at 4pm via Zoom:
<https://stanford.zoom.us/meeting/register/tJckfuCurzkvEtKKOBvDCrPv3McapgP6HygJ>
Please join us for snacks at 3:30pm in the Grove outside Packard.

On Mon, Feb 28, 2022 at 11:53 AM Tavor Baharav <tavorb at stanford.edu> wrote:

> Adaptivity and Confounding in Multi-Armed Bandit Experiments
> Daniel Russo – Professor, Columbia Business School
>
> Thu, 3-Mar / 4:00pm / Zoom:
> https://stanford.zoom.us/meeting/register/tJckfuCurzkvEtKKOBvDCrPv3McapgP6HygJ
> (Zoom only)
>
> *Please join us for coffee and snacks at 3:30pm in the Grove outside
> Packard (near Bytes' outdoor seating). The talk will be held on
> Zoom: <https://stanford.zoom.us/meeting/register/tJckfuCurzkvEtKKOBvDCrPv3McapgP6HygJ>*
> Abstract
>
> Multi-armed bandit algorithms minimize experimentation costs required to
> converge on optimal behavior. They do so by rapidly adapting
> experimentation effort away from poorly performing actions as feedback is
> observed. But this desirable feature makes them sensitive to confounding,
> which is the primary concern underlying classical randomized controlled
> trials. We highlight, for instance, that popular bandit algorithms cannot
> address the problem of identifying the best action when day-of-week effects
> may influence reward observations. In response, this paper proposes
> deconfounded Thompson sampling, which makes simple, but critical,
> modifications to the way Thompson sampling is usually applied. Theoretical
> guarantees suggest the algorithm strikes a delicate balance between
> adaptivity and robustness to confounding. It attains asymptotic lower
> bounds on the number of samples required to confidently identify the best
> action – suggesting optimal adaptivity – but also satisfies strong
> performance guarantees in the presence of day-of-week effects and delayed
> observations – suggesting unusual robustness. At the core of the paper is a
> new model of contextual bandit experiments in which issues of delayed
> learning and distribution shift arise organically.
> Bio
>
> Daniel Russo is an Associate Professor in the Decision, Risk, and
> Operations division of Columbia Business School. His research focuses on
> problems at the intersection of sequential decision-making and statistical
> machine learning. He completed his PhD at Stanford under the supervision of
> Ben Van Roy.
>
> *This talk is hosted by the ISL Colloquium
> <https://isl.stanford.edu/talks/>. To receive talk announcements, subscribe
> to the mailing list isl-colloq at lists.stanford.edu
> <https://mailman.stanford.edu/mailman/listinfo/isl-colloq>.*
> ------------------------------
>
> Mailing list: https://mailman.stanford.edu/mailman/listinfo/isl-colloq
> This talk: http://isl.stanford.edu/talks/talks/2022q1/dan-russo/
>