Maze Explorer

A maze explorer project using reinforcement learning.

Algorithm

Q-Learning - off-policy

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
        Choose a from s using policy derived from Q (e.g. ε-greedy)
        Take action a, observe r, s'
        Q(s, a) <- Q(s, a) + α * [r + γ * max_a' Q(s', a') - Q(s, a)]
        s <- s'
    until s is terminal
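The loop above can be sketched as tabular Q-learning on a toy one-dimensional corridor maze. The environment, reward scheme, and hyperparameters (α, γ, ε) below are illustrative assumptions, not taken from this project.

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def q_learning(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = eps_greedy(Q, s)
            r, s2 = step(s, a)
            # Off-policy target: max over the next state's actions
            target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
```

Because the target takes the max over a', Q-learning evaluates the greedy policy even while behaving ε-greedily, which is what makes it off-policy.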

Sarsa - on-policy

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
    Initialize s
    Choose a from s using policy derived from Q (e.g. ε-greedy)
    Repeat (for each step of episode):
        Take action a, observe r, s'
        Choose a' from s' using policy derived from Q (e.g. ε-greedy)
        Q(s, a) <- Q(s, a) + α * [r + γ * Q(s', a') - Q(s, a)]
        s <- s'
        a <- a'
    until s is terminal
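For contrast, here is an on-policy Sarsa sketch on the same kind of toy corridor maze (environment and hyperparameters are again illustrative assumptions). The key difference is that the bootstrap target uses Q(s', a') for the action a' that the agent will actually take next.

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def sarsa(episodes=300, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = eps_greedy(Q, s)        # choose a once before the step loop
        while s != GOAL:
            r, s2 = step(s, a)
            a2 = eps_greedy(Q, s2)  # a' is the action actually taken next
            # On-policy target: Q(s', a'), not a max over actions
            Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa()
```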

Sarsa(λ) - on-policy

Initialize Q(s, a) arbitrarily, for all s ∈ S, a ∈ A(s)
Repeat (for each episode):
    E(s, a) = 0, for all s ∈ S, a ∈ A(s)
    Initialize S, A
    Repeat (for each step of episode):
        Take action A, observe R, S'
        Choose A' from S' using policy derived from Q (e.g. ε-greedy)
        δ <- R + γ * Q(S', A') - Q(S, A)
        E(S, A) <- E(S, A) + 1
        For all s ∈ S, a ∈ A(s):
            Q(s, a) <- Q(s, a) + α * δ * E(s, a)
            E(s, a) <- γ * λ * E(s, a)
        S <- S'
        A <- A'
    until S is terminal
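Sarsa(λ) adds accumulating eligibility traces so that one TD error δ updates every recently visited state-action pair at once. A sketch on the same toy corridor maze, with λ and the other constants as illustrative assumptions:

```python
import random

N_STATES, GOAL = 6, 5       # states 0..5; state 5 is the terminal goal
ACTIONS = [-1, +1]          # move left / move right
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.9, 0.8, 0.1

def step(s, a):
    """Apply action a in state s; return (reward, next state)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0, s2)

def eps_greedy(Q, s):
    """ε-greedy choice; ties among greedy actions broken at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

def sarsa_lambda(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        E = {sa: 0.0 for sa in Q}     # traces reset at each episode
        s = 0
        a = eps_greedy(Q, s)
        while s != GOAL:
            r, s2 = step(s, a)
            a2 = eps_greedy(Q, s2)
            delta = r + GAMMA * Q[(s2, a2)] - Q[(s, a)]
            E[(s, a)] += 1.0          # accumulating trace
            for sa in Q:              # sweep all state-action pairs
                Q[sa] += ALPHA * delta * E[sa]
                E[sa] *= GAMMA * LAMBDA
            s, a = s2, a2
    return Q

Q = sarsa_lambda()
```

With λ = 0 the trace dies immediately and this reduces to one-step Sarsa; with λ close to 1 credit for the goal reward propagates much further back along the episode in a single update.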
Author

Haoxiang Zhang

Posted on

11-11-2023

Updated on

08-24-2024

Licensed under