<aside> ✨
Q-Learning is an RL technique used to find the optimal policy in an MDP (Markov Decision Process).

</aside>
<aside> ✨
➡️ This approach is called value iteration. The example grid uses rewards of +1, −1, +10, and −10; initially, every value in the table is 0.

</aside>
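As a rough illustration of value iteration, here is a minimal sketch on a hypothetical 3×3 grid of 9 states. The reward placements (bird at state 4 worth −10, 1 cricket at state 7 worth +1, 5 crickets at state 8 worth +10) and the discount factor are assumptions for illustration, inferred from the table labels below, not an exact copy of the notes' grid:

```python
import numpy as np

# Hypothetical 3x3 grid, states numbered 0..8 row by row.
# Assumed rewards: bird = -10 (terminal), 1 cricket = +1, 5 crickets = +10 (terminal).
rewards = np.zeros(9)
rewards[4], rewards[7], rewards[8] = -10.0, 1.0, 10.0
terminal = {4, 8}
gamma = 0.9  # discount factor (assumed)

def neighbors(s):
    """States reachable from s via Left/Right/Up/Down on the 3x3 grid."""
    row, col = divmod(s, 3)
    for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nr, nc = row + dr, col + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            yield nr * 3 + nc

V = np.zeros(9)  # initially, every value is 0
for _ in range(100):
    V_new = V.copy()
    for s in range(9):
        if s in terminal:
            continue
        # Bellman optimality backup: best one-step lookahead,
        # reward collected on entering the next state.
        V_new[s] = max(rewards[n] + gamma * V[n] for n in neighbors(s))
    if np.allclose(V_new, V):  # stop once the values no longer change
        break
    V = V_new
```

Under these assumed rewards, the values converge toward the +10 cell while routing around the bird (e.g. the state just left of the 5-cricket cell ends up worth 10).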
➡️ Q-Table
INITIAL TABLE

| State | Left | Right | Up | Down |
| --- | --- | --- | --- | --- |
| 1 cricket | 0 | 0 | 0 | 0 |
| Empty 1 | 0 | 0 | 0 | 0 |
| Empty 2 | 0 | 0 | 0 | 0 |
| Empty 3 | 0 | 0 | 0 | 0 |
| Bird | 0 | 0 | 0 | 0 |
| Empty 4 | 0 | 0 | 0 | 0 |
| Empty 5 | 0 | 0 | 0 | 0 |
| Empty 6 | 0 | 0 | 0 | 0 |
| 5 crickets | 0 | 0 | 0 | 0 |
FINAL TABLE

| State | Left | Right | Up | Down |
| --- | --- | --- | --- | --- |
| 0 (R) | - | 0.3 | - | 0.2 |
| 1 | 0.1 | 0.2 | - | 0.1 |
| 2 | 0.1 | - | - | 0.4 |
| 3 | - | - | 0.1 | - |
| 4 (Bird) | - | - | - | - |
| 5 | - | - | 0.1 | 0.3 |
| 6 | - | 0.1 | 0.2 | - |
| 7 (1🦗) | 0.2 | 0.3 | - | 0.4 |
| 8 (5🦗) | - | - | - | - |
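A table like the final one above is produced by repeatedly applying the Q-learning update rule while the agent explores. Here is a minimal tabular sketch on the same hypothetical 3×3 grid (states 0–8; bird at 4 and cricket placements taken from the table labels, but the reward values, start state, and hyperparameters are made-up assumptions):

```python
import random
import numpy as np

random.seed(0)  # fixed seed so the run is reproducible

# Assumed rewards: bird = -10 (terminal), 1 cricket = +1, 5 crickets = +10 (terminal).
rewards = np.zeros(9)
rewards[4], rewards[7], rewards[8] = -10.0, 1.0, 10.0
terminal = {4, 8}
moves = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}  # Left, Right, Up, Down
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration (assumed)

def step(s, a):
    """Apply action a in state s; bumping a wall leaves the agent in place."""
    row, col = divmod(s, 3)
    dr, dc = moves[a]
    nr, nc = row + dr, col + dc
    if not (0 <= nr < 3 and 0 <= nc < 3):
        return s, 0.0
    nxt = nr * 3 + nc
    return nxt, rewards[nxt]

Q = np.zeros((9, 4))  # the "initial table": all zeros
for episode in range(1000):
    s = 0  # assumed start: top-left corner (state 0)
    for _ in range(50):  # cap episode length
        # epsilon-greedy: mostly exploit the table, sometimes explore
        a = random.randrange(4) if random.random() < epsilon else int(np.argmax(Q[s]))
        nxt, r = step(s, a)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = r if nxt in terminal else r + gamma * np.max(Q[nxt])
        Q[s, a] += alpha * (target - Q[s, a])
        s = nxt
        if s in terminal:
            break
```

After training, reading off `argmax` per row of `Q` gives the learned policy; the row for the state next to the 5-cricket cell points Right, and the terminal rows stay all zeros, mirroring the dashes in the final table.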
<aside> ✨
Notes by: Mehul (mehul.xyz)
</aside>