Basic Info
<aside>
✨
An RL technique used to find the optimal policy in an MDP (Markov Decision Process).
</aside>
How does Q-Learning work? (cricket example)
<aside>
✨
- Iteratively updates the Q-value for each state-action pair using the Bellman equation
- until the Q-function converges to the optimal Q-function ($q_*$)
</aside>
➡️ This approach is called value iteration.
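For reference, the standard Q-learning update behind this value iteration can be written as follows (a general formulation, not specific to this example; $\alpha$ is the learning rate and $\gamma$ the discount factor):

$$
q^{new}(s, a) = (1 - \alpha)\, q(s, a) + \alpha \Big( r + \gamma \max_{a'} q(s', a') \Big)
$$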
Example
- lizard → wants to eat as many crickets as possible while avoiding the bird
- the lizard has a few actions available (move left, right, up, or down)
- rewards (encoded as a small mapping in the sketch after this list)
    - for a tile with 1 cricket → +1
    - for a tile with 0 crickets → -1
    - for a tile with 5 crickets → +10
    - for a tile with the bird → -10

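A minimal sketch of how these rewards could be encoded, assuming tiles are keyed by label (the names here are illustrative, not from the original notes):

```python
# Hypothetical reward mapping for the lizard / cricket grid.
# Tile labels mirror the ones used in the Q-table below.
REWARDS = {
    "1 cricket": +1,    # tile with 1 cricket
    "empty": -1,        # tile with 0 crickets
    "5 crickets": +10,  # tile with 5 crickets
    "bird": -10,        # tile with the bird (episode ends here)
}
```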
➡️ INITIALLY
- the Q-values are all initially 0
- since the lizard knows nothing at the start
- throughout the process, value iteration keeps updating the Q-values.
➡️ Q-Table
- INITIAL TABLE

| States | Left | Right | Up | Down |
| --- | --- | --- | --- | --- |
| 1 cricket | 0 | 0 | 0 | 0 |
| Empty 1 | 0 | 0 | 0 | 0 |
| Empty 2 | 0 | 0 | 0 | 0 |
| Empty 3 | 0 | 0 | 0 | 0 |
| Bird | 0 | 0 | 0 | 0 |
| Empty 4 | 0 | 0 | 0 | 0 |
| Empty 5 | 0 | 0 | 0 | 0 |
| Empty 6 | 0 | 0 | 0 | 0 |
| 5 crickets | 0 | 0 | 0 | 0 |
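A minimal sketch of initialising this all-zero Q-table in Python (assuming NumPy; the state and action names simply mirror the table above):

```python
import numpy as np

# Illustrative setup: 9 tiles (states) x 4 actions (Left, Right, Up, Down).
states = ["1 cricket", "Empty 1", "Empty 2", "Empty 3", "Bird",
          "Empty 4", "Empty 5", "Empty 6", "5 crickets"]
actions = ["Left", "Right", "Up", "Down"]

# All Q-values start at 0 because the lizard knows nothing yet.
q_table = np.zeros((len(states), len(actions)))
print(q_table)
```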
- FINAL TABLE

| State | Left | Right | Up | Down |
| --- | --- | --- | --- | --- |
| 0 (R) | - | 0.3 | - | 0.2 |
| 1 | 0.1 | 0.2 | - | 0.1 |
| 2 | 0.1 | - | - | 0.4 |
| 3 | - | - | 0.1 | - |
| 4 (Bird) | - | - | - | - |
| 5 | - | - | 0.1 | 0.3 |
| 6 | - | 0.1 | 0.2 | - |
| 7 (1🦗) | 0.2 | 0.3 | - | 0.4 |
| 8 (5🦗) | - | - | - | - |
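For completeness, a rough sketch of the training loop that would gradually fill in a table like the one above. The environment interface (`env.reset()` / `env.step()`) and the hyper-parameter values are assumptions for illustration, not part of the original notes:

```python
import numpy as np

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration (illustrative values)

def train(env, q_table, episodes=1000):
    """Tabular Q-learning; `env` is assumed to expose reset()/step() like a Gym-style environment."""
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if np.random.rand() < epsilon:
                action = np.random.randint(q_table.shape[1])
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done = env.step(action)
            # Bellman update: blend the old estimate with the bootstrapped target.
            target = reward + gamma * np.max(q_table[next_state])
            q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * target
            state = next_state
    return q_table
```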
Fin
<aside>
✨
Notes by: Mehul (mehul.xyz)
</aside>