I am running into limitations of the current design of the run loop. Let's assume I am using a custom policy that internally stores the history of past observations and actions. The question is: what is the intended way for the policy to retrieve this information? In the current implementation, it seems the push!(hook, stage, policy, env) interface is meant for custom code, whereas push!(policy, stage, env[, action]) is meant to be used internally by RLCore. Is this correct? However, the hooks do not receive the action.
Here are some ideas to address this issue:
- I think all calls to push!(agent::Agent, stage::AbstractStage, env[, action]) should not only store information in the Trajectory (which is currently the case) but also forward to the policy by calling push!(agent.policy, stage, env[, action]). This would allow custom policies to add custom logic.
- Just as push!(policy, ::PostActStage, env, action) has an action argument, push!(hook, ::PostActStage, policy, env) could also be given one. This would make it possible to use the chosen action within a custom hook as well.
- Another hook should be added between plan! and act! to evaluate functions that need both the current env state and the action that is about to be executed. In the PreActStage the action is not yet known, and in the PostActStage the env is already in the next state, so this is currently not possible. The inner loop could look something like this:
push!(policy, PreActStage(), env)
optimise!(policy, PreActStage())
push!(hook, PreActStage(), policy, env)
action = RLBase.plan!(policy, env)
push!(policy, PostPlanStage(), env, action) # new
optimise!(policy, PostPlanStage()) # new
push!(hook, PostPlanStage(), policy, env, action) # new
act!(env, action)
push!(policy, PostActStage(), env, action)
optimise!(policy, PostActStage())
push!(hook, PostActStage(), policy, env, action) # action arg new
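To make the three ideas concrete, here is a minimal standalone sketch. It does not use ReinforcementLearning.jl at all: the stage types, CounterEnv, HistoryPolicy, StateActionHook, the Agent wrapper, and run_steps! are all hypothetical stand-ins that only mimic the names from the loop above, and optimise! is left out for brevity. It is meant to show the intended data flow, not how RLCore is or should be implemented.

```julia
# Standalone mock: none of these types come from ReinforcementLearning.jl.
abstract type AbstractStage end
struct PreActStage <: AbstractStage end
struct PostPlanStage <: AbstractStage end   # the proposed new stage
struct PostActStage <: AbstractStage end

# Toy environment: the state is a counter, and an action is added to it.
mutable struct CounterEnv
    state::Int
end
state(env::CounterEnv) = env.state
act!(env::CounterEnv, action::Int) = (env.state += action; nothing)

# Custom policy that keeps its own history of (state, action) pairs.
struct HistoryPolicy
    history::Vector{Tuple{Int,Int}}
end
HistoryPolicy() = HistoryPolicy(Tuple{Int,Int}[])
plan!(::HistoryPolicy, env::CounterEnv) = isodd(state(env)) ? 2 : 1

# Default: ignore every stage. Only PostPlanStage records anything,
# because there the env is still in the state the action was planned for.
Base.push!(::HistoryPolicy, ::AbstractStage, env) = nothing
Base.push!(::HistoryPolicy, ::AbstractStage, env, action) = nothing
function Base.push!(p::HistoryPolicy, ::PostPlanStage, env, action)
    push!(p.history, (state(env), action))
    return nothing
end

# Idea 1: the agent stores data and then forwards the call to its policy.
struct Agent{P}
    policy::P
    trajectory::Vector{Any}
end
Agent(policy) = Agent(policy, Any[])
function Base.push!(agent::Agent, stage::AbstractStage, env)
    push!(agent.trajectory, (typeof(stage), state(env)))
    push!(agent.policy, stage, env)
    return nothing
end
function Base.push!(agent::Agent, stage::AbstractStage, env, action)
    push!(agent.trajectory, (typeof(stage), state(env), action))
    push!(agent.policy, stage, env, action)
    return nothing
end

# Idea 2: hooks receive the action too. This one records (next state, action).
struct StateActionHook
    seen::Vector{Tuple{Int,Int}}
end
StateActionHook() = StateActionHook(Tuple{Int,Int}[])
Base.push!(::StateActionHook, ::AbstractStage, policy, env) = nothing
Base.push!(::StateActionHook, ::AbstractStage, policy, env, action) = nothing
function Base.push!(h::StateActionHook, ::PostActStage, policy, env, action)
    push!(h.seen, (state(env), action))
    return nothing
end

# Idea 3: inner loop with the extra stage between plan! and act!.
function run_steps!(agent, env, hook, n)
    for _ in 1:n
        push!(agent, PreActStage(), env)
        push!(hook, PreActStage(), agent.policy, env)
        action = plan!(agent.policy, env)
        push!(agent, PostPlanStage(), env, action)
        push!(hook, PostPlanStage(), agent.policy, env, action)
        act!(env, action)
        push!(agent, PostActStage(), env, action)
        push!(hook, PostActStage(), agent.policy, env, action)
    end
    return nothing
end

env = CounterEnv(0)
agent = Agent(HistoryPolicy())
hook = StateActionHook()
run_steps!(agent, env, hook, 3)
```

In this mock, the policy's history pairs each action with the state it was planned in, while the hook at PostActStage only ever sees the state after the action was applied. That difference is exactly why a stage between plan! and act! seems necessary.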
I can open a pull request if that's an approach you want to follow.