Initialize Q(s, a) arbitrarily for all s ∈ S, a ∈ A(s)
Repeat (for each episode):
    E(s, a) = 0, for all s ∈ S, a ∈ A(s)
    Initialize S, A
    Repeat (for each step of episode):
        Take action A, observe R, S'
        Choose A' from S' using policy derived from Q (e.g. ε-greedy)
        δ <- R + γQ(S', A') - Q(S, A)
        E(S, A) <- E(S, A) + 1
        For all s ∈ S, a ∈ A(s):
            Q(s, a) <- Q(s, a) + αδE(s, a)
            E(s, a) <- γλE(s, a)
        S <- S'
        A <- A'
    until S is terminal
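The pseudocode above can be sketched in Python roughly as follows. This is a minimal tabular version with accumulating traces, not a reference implementation. The `chain_step` environment (a five-state corridor with a reward of 1 at the right end), the helper `epsilon_greedy`, and all hyperparameter values are assumptions made for illustration. Q is initialized to zero here, which is one valid choice of "arbitrarily" and keeps the terminal state's values at zero.

```python
import random


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(Q[state])
    return rng.choice([a for a in range(n_actions) if Q[state][a] == best])


def sarsa_lambda(step, n_states, n_actions, start_state, episodes=200,
                 alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces.

    `step(state, action)` is a hypothetical environment function that
    returns (next_state, reward, done).
    """
    rng = random.Random(seed)
    # Zero initialization; terminal-state rows are never updated and stay 0.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Reset all traces at the start of each episode.
        E = [[0.0] * n_actions for _ in range(n_states)]
        s = start_state
        a = epsilon_greedy(Q, s, n_actions, epsilon, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, n_actions, epsilon, rng)
            # TD error for the transition (S, A, R, S', A').
            delta = r + gamma * Q[s2][a2] - Q[s][a]
            # Accumulating trace: bump the pair that was just visited.
            E[s][a] += 1.0
            # Update every state-action pair in proportion to its trace, then decay.
            for i in range(n_states):
                for j in range(n_actions):
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam
            s, a = s2, a2
    return Q


def chain_step(state, action):
    """Hypothetical 5-state corridor: action 0 moves left, action 1 moves right.

    State 4 is terminal; entering it yields reward 1, every other move yields 0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


Q = sarsa_lambda(chain_step, n_states=5, n_actions=2, start_state=0)
for state, values in enumerate(Q):
    print(state, [round(v, 3) for v in values])
```

The inner double loop mirrors the "For all s, a" step literally, so each time step costs O(|S||A|). For larger problems a common alternative is to track only the pairs whose trace is non-zero, but that is a design choice beyond what the pseudocode specifies.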