Recipes for Multi-Armed Bandits

Etymology

  • Historically based on slot machines, which used to be called bandits
  • Different slot machines have different payout distributions, but you don't know what those distributions initially
  • How do you pick the most profitable slot machine?

$\epsilon$-First Method (A/B Testing)

from IPython.core.display import HTML

HTML("""<script>
    MathJax.Hub.Config({
        displayAlign: 'left'
    });
</script>""")

$\epsilon$-Greedy Method

$$ A \larr \begin{cases} \argmax_a Q(a) & \text{with probability } 1-\epsilon\\ \text{a random action} & \text{with probability } \epsilon \end{cases} \\ R \larr do(A) \\ N(A) \larr N(A) + 1 \\ Q(A) \larr Q(A) + \frac{1}{N(A)}[R-Q(A)] $$

UCB Method

Thompson Sampling