<aside> ✨

To choose its actions, the agent follows a strategy that decides which move to make at each step.

</aside>

Epsilon Greedy Strategy

<aside> ✨

$\epsilon$ = exploration rate

</aside>
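
A minimal sketch of how an epsilon-greedy choice could look, assuming the Q-values for the current state are stored in a plain list. The function name `choose_action` and the parameter `q_row` are hypothetical names used only for illustration: a random number $r$ is drawn, and the agent explores when $r \lt \epsilon$ and exploits otherwise.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index for one state with an epsilon-greedy rule.

    q_row   -- list of Q-values, one per action (hypothetical layout)
    epsilon -- exploration rate, expected to lie in [0, 1]
    """
    r = rng.random()  # uniform random number in [0, 1)
    if r < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest Q-value
    # (ties resolve to the lowest index).
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With `epsilon=1.0` every call explores, and with `epsilon=0.0` every call exploits.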

Choosing strategies

Progression

<aside> ✨

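$r \lt \epsilon = 1$: with $\epsilon$ starting at $1$, a random number $r \in [0, 1)$ is always below $\epsilon$, so the agent starts out exploring.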

</aside>

Eg progression

Updating the Q-Value

Max steps

<aside> ✨

We can also specify a maximum number of steps that the agent can take before the episode terminates automatically. With the way the game is set up right now, termination only occurs if the lizard reaches the state with five crickets or the state with the bird.

For example, we could add a condition: if the lizard has not reached either of these two terminal states after $100$ steps, the episode ends after the $100^{th}$ step.

</aside>
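
A rough sketch of the step cap, assuming the environment is exposed as a function `step(state)` that returns `(next_state, reward, done)`. That interface and the name `run_episode` are assumptions made for this example, not something defined in these notes.

```python
def run_episode(step, start_state, max_steps=100):
    """Run one episode, ending at a terminal state or after max_steps.

    step -- hypothetical callable: state -> (next_state, reward, done)
    Returns (total_reward, steps_taken, reached_terminal_state).
    """
    state = start_state
    total_reward = 0
    for steps_taken in range(1, max_steps + 1):
        state, reward, done = step(state)
        total_reward += reward
        if done:
            # A terminal state was reached (five crickets or the bird
            # in the lizard game).
            return total_reward, steps_taken, True
    # The step limit was hit without reaching a terminal state.
    return total_reward, max_steps, False
```

The third return value lets the caller tell a real terminal state apart from an episode that was cut off by the step limit.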

The learning rate

<aside> ✨

Symbol: $\alpha$

<aside> ✨

The learning rate is a number between 0 and 1. It can be thought of as how quickly the agent abandons the previous Q-value in the Q-table for a given state-action pair in favor of the new Q-value.

</aside>

</aside>
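
To make the role of $\alpha$ concrete, here is a small sketch of the standard Q-learning update written as a weighted average of the old Q-value and the newly learned value. The discount rate `gamma` and the argument names are assumptions brought in from the usual formulation; they are not defined in this section.

```python
def update_q(old_q, reward, max_next_q, alpha, gamma=0.99):
    """Blend the old Q-value with the newly learned value.

    alpha      -- learning rate in [0, 1]
    gamma      -- discount rate (assumed here, not defined in the notes)
    max_next_q -- highest Q-value available from the next state
    """
    learned_value = reward + gamma * max_next_q
    # alpha = 0 keeps the old value; alpha = 1 replaces it entirely.
    return (1 - alpha) * old_q + alpha * learned_value
```

With `alpha=0` the agent never changes its estimate, and with `alpha=1` it discards the old Q-value completely; values in between keep part of the old estimate.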