Currently, multi-step TD has an incorrect parameter (JuliaReinforcementLearning/ReinforcementLearning.jl#648).
The affected snippet is ReinforcementLearningAnIntroduction.jl/notebooks/Chapter09_Random_Walk.jl, lines 193 to 216 at e83f540 (the surrounding definitions, e.g. `NS`, `ACTIONS`, and `n_groups`, appear earlier in the notebook):

```julia
function run_once(n, α)
    env = StateTransformedEnv(
        RandomWalk1D(N=NS, actions=ACTIONS),
        state_mapping=GroupMapping(n=NS)
    )
    agent = Agent(
        policy=VBasedPolicy(
            learner=TDLearner(
                approximator=TabularVApproximator(;
                    n_state=n_groups+2,
                    opt=Descent(α)
                ),
                method=:SRS,
                n=n
            ),
            mapping=(env, V) -> rand(action_space(env))
        ),
        trajectory=VectorSARTTrajectory()
    )
    hook = RecordRMS()
    run(agent, env, StopAfterEpisode(10), hook)
    mean(hook.rms)
end
```
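For reference, the n-step TD target in Sutton and Barto's notation is

$$G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n}),$$

so `n = 1` bootstraps after a single reward and recovers TD(0).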
As an example, `n` is used here as the number of time steps; however, it currently corresponds to the number of time steps plus one. `run_once(1, α)` is therefore not TD(0), which has a time-step parameter of 1, but a 2-step TD method. Depending on how the upstream issue is resolved, an update might be needed here.
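Until upstream is resolved, one stopgap is to shift the argument by one when calling the snippet above. A minimal sketch, assuming the current (pre-fix) behavior where `TDLearner` performs (n + 1)-step updates when given `n`; the wrapper `run_once_nstep` is hypothetical, not part of the notebook:

```julia
# Hypothetical wrapper, assuming TDLearner currently performs
# (n + 1)-step updates when given `n`. Shifting the argument by one
# makes it match the textbook meaning of n-step TD, so n_steps = 1
# corresponds to TD(0).
function run_once_nstep(n_steps, α)
    @assert n_steps ≥ 1 "n-step TD requires n_steps ≥ 1"
    run_once(n_steps - 1, α)
end
```

With this shift, `run_once_nstep(1, α)` would give the mean RMS for true TD(0); it would need to be removed again once the upstream off-by-one is fixed.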