Maze Explorer

A maze explorer project using reinforcement learning.

Algorithm

Q-Learning - off-policy

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
        Choose a from s using policy derived from Q (e.g. ε-greedy)
        Take action a, observe r, s'
        Q(s, a) <- Q(s, a) + α * [r + γ * max_a' Q(s', a') - Q(s, a)]
        s <- s'
    until s is terminal
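The loop above can be sketched as tabular Q-learning on a toy one-dimensional corridor maze. The environment, reward scheme, and hyperparameters (α, γ, ε) below are illustrative assumptions, not taken from this project.

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def q_learning(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = eps_greedy(Q, s)
            r, s2 = step(s, a)
            # Off-policy target: max over the next state's actions
            target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
```

Because the target takes the max over a', Q-learning evaluates the greedy policy even while behaving ε-greedily, which is what makes it off-policy.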

Sarsa - on-policy

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Choose a from s using policy derived from Q (e.g. ε-greedy)
    Repeat (for each step of episode):
        Take action a, observe r, s'
        Choose a' from s' using policy derived from Q (e.g. ε-greedy)
        Q(s, a) <- Q(s, a) + α * [r + γ * Q(s', a') - Q(s, a)]
        s <- s'
        a <- a'
    until s is terminal
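For contrast, here is an on-policy Sarsa sketch on the same kind of toy corridor maze (environment and hyperparameters are again illustrative assumptions). The key difference is that the bootstrap target uses Q(s', a') for the action a' that the agent will actually take next.

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def sarsa(episodes=300, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = eps_greedy(Q, s)        # choose a once before the step loop
        while s != GOAL:
            r, s2 = step(s, a)
            a2 = eps_greedy(Q, s2)  # a' is the action actually taken next
            # On-policy target: Q(s', a'), not a max over actions
            Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa()
```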

Sarsa(λ) - on-policy

Initialize Q(s, a) arbitrarily, for all s ∈ S, a ∈ A(s)
Repeat (for each episode):
    E(s, a) = 0, for all s ∈ S, a ∈ A(s)
    Initialize S, A
    Repeat (for each step of episode):
        Take action A, observe R, S'
        Choose A' from S' using policy derived from Q (e.g. ε-greedy)
        δ <- R + γ * Q(S', A') - Q(S, A)
        E(S, A) <- E(S, A) + 1
        For all s ∈ S, a ∈ A(s):
            Q(s, a) <- Q(s, a) + α * δ * E(s, a)
            E(s, a) <- γ * λ * E(s, a)
        S <- S'
        A <- A'
    until S is terminal
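Sarsa(λ) adds accumulating eligibility traces so that one TD error δ updates every recently visited state-action pair at once. A sketch on the same toy corridor maze, with λ and the other constants as illustrative assumptions:

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.9, 0.8, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def sarsa_lambda(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        E = {sa: 0.0 for sa in Q}     # traces reset at each episode
        s = 0
        a = eps_greedy(Q, s)
        while s != GOAL:
            r, s2 = step(s, a)
            a2 = eps_greedy(Q, s2)
            delta = r + GAMMA * Q[(s2, a2)] - Q[(s, a)]
            E[(s, a)] += 1.0          # accumulating trace
            for sa in Q:              # sweep all state-action pairs
                Q[sa] += ALPHA * delta * E[sa]
                E[sa] *= GAMMA * LAMBDA
            s, a = s2, a2
    return Q

Q = sarsa_lambda()
```

With λ = 0 the trace dies immediately and this reduces to one-step Sarsa; with λ close to 1 credit for the goal reward propagates much further back along the episode in a single update.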
Author

Haoxiang Zhang

Posted on

11-11-2023

Updated on

08-24-2024

Licensed under