This package can be used to learn a value function for general problems in which state information and a simple reward signal can be provided. For example, given a set of time-series data from continuous navigation, one can provide a 5-dimensional state input (2-dimensional robot position, 2-dimensional goal position, and time of day in hours) and learn a value function predicting the value of any given state at a given time of day. That is, the learner can come to avoid certain areas at certain times of day due to traffic and similar factors.
Services
agent_env (value_learner/agent_env) This service expects a variable-length array of type Float64 called state and a Float64 called reward, and responds with the value as a Float64. The request also includes a bool called isStateTerminal, which is used in episodic tasks to flag the terminal state of an episode when set to true; otherwise leave it false.
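Below is a minimal Python (rospy) client sketch for this service. It assumes the srv type value_learner/agent_env generates a Python class named agent_env and that the service resolves to agent_env in the caller's namespace; only the request fields (state, reward, isStateTerminal) and the response field (value) are taken from the description above.

    #!/usr/bin/env python
    # Minimal client sketch for the agent_env service (rospy).
    # Assumption: the srv type value_learner/agent_env generates a Python
    # class named agent_env; adjust the import if the actual class differs.
    import rospy
    from value_learner.srv import agent_env

    rospy.init_node("agent_env_client")
    rospy.wait_for_service("agent_env")
    call_agent_env = rospy.ServiceProxy("agent_env", agent_env)

    # 5-D state from the navigation example:
    # robot (x, y), goal (x, y), time of day in hours
    state = [1.2, 3.4, 5.0, 5.0, 14.0]
    resp = call_agent_env(state=state, reward=-1.0, isStateTerminal=False)
    rospy.loginfo("Value estimate: %f", resp.value)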
Parameters
~/value_learner_agent/discount_factor (float64, default: 0.99) - The discount factor of the underlying MDP.
- The width of the radial basis functions.
- Number of radial basis functions to use in LSTD. Not necessary for RKHS-SARSA.
- Dimension of the state input.
- Type of feature map to use. Always set to raw for RKHS-SARSA.
- If set to 1, the node performs value function updates. If set to 0, it simply evaluates the given states.
- The path and file name of a parameter file for LSTD. New parameters will also be written to this file.
- The path of the directory containing the parameters for RKHS-SARSA.
- lambda - The trace-decay coefficient for RKHS-SARSA, controlling how far reward is propagated back to earlier states.
- The step size parameter for the RKHS-SARSA gradient steps.
- The sensitivity parameter for the RKHS-SARSA algorithm. The higher this number, the fewer RBFs will be added.
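For illustration, the sketch below shows how the documented discount factor would be read as a private parameter with its stated default, using the standard rospy parameter API; the node name is hypothetical, and the other parameters above follow the same pattern.

    # Sketch: reading the documented private discount factor parameter at
    # startup with its stated default (standard rospy API).
    import rospy

    rospy.init_node("value_learner_agent_example")  # hypothetical node name
    gamma = rospy.get_param("~value_learner_agent/discount_factor", 0.99)
    rospy.loginfo("Using discount factor %.2f", gamma)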
learn_RKHS_SARSA.launch
This launches value_learner in learning mode using the RKHS-SARSA algorithm. It offers the agent_env service, which awaits a state and reward and returns a value for the given state. The value response is not required while learning takes place, but it can be useful for monitoring progress. RKHS-SARSA is well suited to problems that require online learning.
learn_LSTD.launch
This launches value_learner in learning mode using the LSTD algorithm. It offers the agent_env service, which awaits a state and reward and returns a value for the given state. The value response is not required while learning takes place, but it can be useful for monitoring progress. LSTD only performs batch updates; however, given a batch of data it is guaranteed to compute the corresponding value function exactly (within the chosen feature space).
predict_RKHS_SARSA.launch
This launches value_learner in predict mode, given a set of parameters learned using the RKHS-SARSA algorithm. It offers the agent_env service, which awaits a state and reward and returns a value for the given state.
predict_LSTD.launch
This launches value_learner in predict mode, given a set of parameters learned using the LSTD algorithm. It offers the agent_env service, which awaits a state and reward and returns a value for the given state.
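In predict mode only evaluation takes place, so a client can sweep over states to inspect the learned value function. Below is a minimal sketch (reusing the assumed agent_env srv class from the service example above) that queries the same navigation state at different times of day.

    # Sketch: sweeping the learned value function in predict mode.
    # Reuses the assumed agent_env srv class from the service example above.
    import rospy
    from value_learner.srv import agent_env

    rospy.init_node("value_probe")
    rospy.wait_for_service("agent_env")
    call_agent_env = rospy.ServiceProxy("agent_env", agent_env)

    robot_xy, goal_xy = (1.2, 3.4), (5.0, 5.0)
    for hour in range(24):
        state = [robot_xy[0], robot_xy[1], goal_xy[0], goal_xy[1], float(hour)]
        # In predict mode no update is performed; reward is a placeholder.
        resp = call_agent_env(state=state, reward=0.0, isStateTerminal=False)
        rospy.loginfo("hour %02d -> value %.3f", hour, resp.value)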
steal_nav_data.py
Python script that obtains the necessary data from either a simulated or a real navigation run. It fills in the state and reward information accordingly and makes service calls to agent_env.
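The overall pattern of such a script is sketched below; this is not the actual steal_nav_data.py, just an illustration under the same assumptions as the earlier examples, where episodes is assumed to hold recorded (state, reward) pairs and the last step of each episode is flagged as terminal.

    # Sketch of the data-feeding pattern described above (not the actual
    # steal_nav_data.py). Assumes `episodes` is a list of episodes, each a
    # list of (state, reward) pairs, and the assumed agent_env srv class.
    import rospy
    from value_learner.srv import agent_env

    def feed(episodes):
        rospy.wait_for_service("agent_env")
        call_agent_env = rospy.ServiceProxy("agent_env", agent_env)
        for episode in episodes:
            for i, (state, reward) in enumerate(episode):
                terminal = (i == len(episode) - 1)  # flag the episode's last step
                call_agent_env(state=state, reward=reward, isStateTerminal=terminal)

    if __name__ == "__main__":
        rospy.init_node("nav_data_feeder")
        feed(episodes=[])  # episodes would come from a simulated or real run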