Search Mailing List Archives
[theory-seminar] Theory Lunch 08/05: William Kuszmaul (MIT)
Prasanna Ramakrishnan
pras1712 at stanford.edu
Thu Aug 5 11:41:51 PDT 2021
Hey everyone! Gentle reminder that theory lunch is starting in about 18
minutes, with the talk starting half an hour later. Hope to see you all
there!
On Mon, Aug 2, 2021 at 11:00 AM David Wajc <wajc at stanford.edu> wrote:
> Hi all,
>
> This week's theory lunch will take place Thursday at noon (PDT), at this
> zoom room
> <https://stanford.zoom.us/j/98932206471?pwd=YXdubytLVGNTbXhGeXFxNmJaVnhrUT09>. As
> usual, we'll start with some socializing, followed by a talk starting at
> 12:30pm. Pras will be MCing. Thanks Pras!
> Bill will tell us about: *Linear Probing Revisited: Tombstones Mark the
> Death of Primary Clustering*
>
> *Abstract:* First introduced in 1954, linear probing is one of the oldest
> data structures in computer science, and due to its unrivaled data
> locality, it continues to be one of the fastest hash tables in practice. It
> is widely believed and taught, however, that linear probing should never be
> used at high load factors; this is because primary-clustering effects cause
> insertions at load factor $1 - 1/x$ to take expected time $\Theta(x^2)$
> (rather than the ideal $\Theta(x)$). The dangers of primary clustering,
> first discovered by Knuth in 1963, have been taught to generations of
> computer scientists, and have influenced the design of some of many widely
> used hash tables.
>
> We show that primary clustering is not the foregone conclusion that it is
> reputed to be. We demonstrate that small design decisions in how deletions
> are implemented have dramatic effects on the asymptotic performance of
> insertions, so that, even if a hash table operates continuously at a load
> factor $1 - \Theta(1/x)$, the expected amortized cost per operation is
> $\tilde{O}(x)$. This is because tombstones created by deletions actually
> cause an anti-clustering effect that combats primary clustering.
>
> We also present a new variant of linear probing (which we call graveyard
> hashing) that completely eliminates primary clustering on any sequence of
> operations: if, when an operation is performed, the current load factor is
> $1 - 1/x$ for some $x$, then the expected cost of the operation is $O(x)$.
> One corollary is that, in the external-memory model with a data blocks of
> size $B$, graveyard hashing offers the following remarkable guarantee: at
> any load factor $1-1/x$ satisfying $x=o(B)$, graveyard hashing achieves
> $1+o(1)$ expected block transfers per operation. Past external-memory hash
> tables have only been able to offer a $1+o(1)$ guarantee when the block
> size $B$ is at least $\Omega(x^2)$.
>
> Based on joint work with Michael A. Bender, Bradley C. Kuszmaul
>
> Cheers, David
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.stanford.edu/pipermail/theory-seminar/attachments/20210805/0b299eee/attachment-0001.html>
More information about the theory-seminar
mailing list