Reinforcement Learning Tutorial
Description: This tutorial explains how to use the rl-texplore-ros-pkg to perform reinforcement learning (RL) experiments. It will explain how to compile the code, how to run experiments using rl_msgs, how to run experiments using rl_experiment, and how to add your own agents and environments. This package was developed by Todd Hester and Peter Stone at the University of Texas at Austin.
Keywords: Reinforcement Learning, Learning, AI, TEXPLORE, MDP
Tutorial Level: BEGINNER
Installing the package
Installing the rl-texplore-ros-pkg
Install the rl-texplore-ros-pkg to get all the reinforcement learning packages. Our packages have their own GitHub project. Clone the repository:
git clone <URL of the rl-texplore-ros-pkg GitHub repository>
Installing ROS
Install the Desktop Install version of ROS. You don't need the simulators, navigation, and perception packages that come with the full install. Install ROS by following the directions at: http://www.ros.org/wiki/ROS/Installation
Add the ROS environment setup to the end of your .bashrc:
echo "source ~/ros/setup.sh" >> ~/.bashrc
Append the path to our packages as well (this example does it from the shell, or you can use any editor); adjust the path to wherever you checked out or cloned the repository:
cat <<'EOF' >> ~/.bashrc
export ROS_PACKAGE_PATH=${ROS_PACKAGE_PATH}:~/svn/rl-texplore-ros-pkg
EOF
Open a new terminal. If ROS is installed correctly, things like rospack and roscd should work properly. Try the following:
rospack find rl_agent
roscd rl_env
Compiling the code
Next, we will want to compile the code. There are five packages (rl_agent, rl_common, rl_env, rl_msgs, and rl_experiment) which you can compile separately. Some of them depend on others, and rosmake resolves these dependencies automatically. For example, rl_agent depends on rl_common and rl_msgs, so if you compile rl_agent, rosmake will compile rl_common and rl_msgs first.
To make this easy, we're going to compile rl_experiment, which should cause rosmake to compile all the packages. In the future though, you can just compile whatever package you're modifying or using. Compile rl_experiment with the following command:
rosmake rl_experiment
This should compile cleanly without errors.
Running experiments
There are two ways to run experiments. You can start an agent and environment separately and have them communicate using the ROS messages defined in the rl_msgs package; these messages are similar to the ones used in RL-Glue (Tanner and White 2009), but simplified and defined in the ROS message format. Alternatively, you can run experiments using the rl_experiment package, which creates an agent and environment object and calls their methods directly.
Running an experiment with rl_msgs
For the agent and environment to communicate using ROS messages, a roscore must be started. So first, we'll start that by running:
roscore
Now that roscore is running, we can start an agent from the rl_agent package. We're going to start a real-time TEXPLORE agent (Hester and Stone 2010, Hester et al 2012) running at 20 Hz, but you can start other agents; details are on the page documenting the rl_agent package. For now, the agent must be started before the environment, because the agent must be up and running when the environment sends its initial environment description message. To start the real-time TEXPLORE agent, type the following into a new tab/terminal:
rosrun rl_agent agent --agent texplore --planner parallel-uct --actrate 20
We've selected the TEXPLORE algorithm (Hester and Stone 2010) with the '--agent texplore' option, the parallel real-time planner (Hester et al 2012) using UCT (Kocsis and Szepesvari 2006) with the '--planner parallel-uct' option, and an action rate of 20 Hz with the '--actrate 20' option.
Now that the agent has started, let's start the environment. We'll start a stochastic two room gridworld in another tab/terminal:
rosrun rl_env env --env tworooms --stochastic
Here, '--env tworooms' selects the two room gridworld domain and '--stochastic' tells it to run the stochastic version.
Now your experiment should be running. If you go back to the agent tab, you'll see it print out the sum of rewards at the end of each episode.
Viewing and plotting results
ROS has a lot of nice tools for viewing messages, plotting data in real time, and recording data. To find out more, take a look at the ROS tutorials for tools such as rostopic, rxplot, and rosbag.
Here's an example of a few of these tools. The agent and environment are communicating through a series of messages from the rl_msgs package, indicating states, rewards, and actions. Using the rostopic command, we can inspect these messages. The following command will list all the topics currently being published:
rostopic list
We can look more in depth at the values of one of the messages:
rostopic echo /rl_env/rl_state_reward
This command should show us all the RLStateReward messages being sent from the environment to the agent. These contain the agent's current state, the reward received on this step, and whether this transition led to a terminal state.
Next, we can plot some of the values from these messages in real time. The rl_experiment_info message, published by the agent, contains the sum of rewards for each episode. We can plot this value to examine the reward per episode the agent is receiving. To plot the episode_reward field from this message:
rxplot /rl_agent/rl_experiment_info/episode_reward
This should (after some delay) bring up a live plot of the rewards per episode of the agent.
Finally, we can record any of the messages we want. Then later we can examine the messages, plot them, or even play them back in place of the code that would usually send them. As an example, let's save all the messages sent by the agent:
rosbag record /rl_agent/rl_action /rl_agent/rl_experiment_info
After running this command, it should tell you the name of the bag file these messages were recorded to. Now let's kill the agent and play back this file to send actions instead. After killing the agent, run:
rosbag play <bagfile>
It's hard to tell here (it may be useful to turn on debug output in the environment with the --prints option), but the environment is being controlled by the action messages played back from the file. Since these are just the actions the agent recorded earlier, and not necessarily the right ones for the environment's current state, they're unlikely to perform very well. Still, it's an interesting thing to play around with.
Running an experiment with rl_experiment
With the rl_experiment package, we can also run experiments without passing ROS messages back and forth. Instead, the rl_experiment package instantiates an agent and an environment and calls their methods directly to run the experiment. At the end of each episode, it prints the sum of rewards for the episode to cerr. The rl_experiment package documentation has more details on how to use it.
As an example, we will run an experiment using R-Max (Brafman and Tennenholtz 2001) on the Taxi domain (Dietterich 1998). Note that we no longer need roscore running since we are not passing ROS messages any more.
rosrun rl_experiment experiment --agent rmax --env taxi --stochastic
Here, we must pass options defining both the agent and the environment. We've selected R-Max with the '--agent rmax' option and Taxi with the '--env taxi' option. In addition, we've chosen to have the Taxi domain use stochastic transitions with the '--stochastic' option.
By default, this should run a single trial of 100 episodes. You'll see the sum of rewards for each episode printed to the screen.
Adding code
This repository is intended to be easy to use and easy to extend, modify, and augment. Since the agents and environments communicate through ROS messages, you can create your own agent or environment from scratch in the language of your choosing as long as it sends and receives the appropriate ROS messages. You can look at the documentation for the rl_agent and rl_env packages to see how they send and receive rl_msgs properly.
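For concreteness, here is a minimal sketch (not part of the package) of what a stand-alone agent node might look like in C++, using roscpp directly. The topic names match the ones used earlier in this tutorial; the rl_msgs field names (terminal, action) and the number of actions are assumptions, so check the rl_msgs definitions and the RLEnvDescription message before relying on anything like this.

// my_agent_node.cpp -- a hypothetical stand-alone agent written directly
// against rl_msgs. Topic names match the ones used earlier in this tutorial;
// the message field names (terminal, action) are assumptions -- check the
// rl_msgs definitions before relying on them.
#include <cstdlib>
#include <ros/ros.h>
#include <rl_msgs/RLStateReward.h>
#include <rl_msgs/RLAction.h>

ros::Publisher action_pub;
const int NUM_ACTIONS = 4;  // a real agent would read this from RLEnvDescription

// Each time the environment publishes a state and reward, reply with an action.
void stateRewardCallback(const rl_msgs::RLStateReward::ConstPtr& sr) {
  if (sr->terminal) return;           // no action to take in a terminal state
  rl_msgs::RLAction act;
  act.action = rand() % NUM_ACTIONS;  // this sketch simply acts at random
  action_pub.publish(act);
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "my_rl_agent");
  ros::NodeHandle node;
  action_pub = node.advertise<rl_msgs::RLAction>("rl_agent/rl_action", 1);
  ros::Subscriber sub =
      node.subscribe("rl_env/rl_state_reward", 1, stateRewardCallback);
  ros::spin();  // handle incoming messages until the node is shut down
  return 0;
}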
In addition to writing an agent or environment entirely from scratch, you can of course write one within our framework, or even base it on existing code. Details on how to do this are presented below.
Adding an Agent
An agent within our framework must satisfy the Agent interface defined in the rl_common package. This requires the agent to respond to methods asking it for the first_action, next_action, and last_action. One possible way to start is to modify an existing agent, such as the Q-Learning agent. For example, you could change the action the agent selects in the first_action and next_action methods in QLearner.cc. To use the new agent within the rl_agent package, you can modify the agent.cpp file in the package to create an instance of your new agent in the processEnvDescription method. Compile the rl_agent package with
rosmake rl_agent
Now you should be ready to run experiments with your new agent.
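For reference, a bare-bones agent might look roughly like the sketch below. The exact method names and signatures are those of the Agent interface in rl_common (core.hh); the argument types shown here are assumptions based on the description above, so match them against the real interface when you write your own agent.

// MyAgent.hh -- an illustrative skeleton only. The authoritative method
// names and signatures are those of the Agent interface in rl_common
// (core.hh); the argument types shown here are assumptions.
#include <vector>

class MyAgent /* : public Agent */ {
public:
  // Called once at the start of each episode with the first sensation.
  int first_action(const std::vector<float>& s) {
    return 0;  // choose and return an action index
  }
  // Called on every subsequent step with the reward and the new sensation.
  int next_action(float reward, const std::vector<float>& s) {
    return 0;  // update the agent from the reward, then pick the next action
  }
  // Called when the episode ends, with the final reward.
  void last_action(float reward) {
    // perform any final learning update for the episode
  }
};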
Adding an Environment
Similar to the agent, new environments must satisfy the Environment interface defined in the rl_common package. The environment must respond to requests for sensations and for whether the current state is terminal, apply actions when called, and provide some information on the ranges of features and rewards in the domain. A possible way to create a new environment is to make a copy of an existing one, such as the Fuel World domain (Hester and Stone 2010). To use this new domain inside the rl_env package, you can modify the env.cpp file in the package to create the appropriate instance of the domain in the initEnvironment method. Once these changes are done, compile the code:
rosmake rl_env
Now you should be able to run experiments with your new environment.
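As with the agent, the sketch below shows roughly what such an environment might provide. The actual Environment interface in rl_common defines the real method names and signatures; treat the ones here as placeholders.

// MyEnv.hh -- an illustrative skeleton only. The authoritative method names
// and signatures are those of the Environment interface in rl_common.
#include <vector>

class MyEnv /* : public Environment */ {
  std::vector<float> state_;  // current feature vector
public:
  MyEnv() : state_(2, 0.0) {}
  // Return the current sensation (state feature vector).
  const std::vector<float>& sensation() const { return state_; }
  // Apply the given action, update the state, and return the reward.
  float apply(int action) {
    // update state_ according to the action and the domain's dynamics...
    return -1.0;  // e.g. -1 per step until the goal is reached
  }
  // Report whether the current state is terminal.
  bool terminal() const { return false; }
  // Reset the environment for the start of a new episode.
  void reset() { state_.assign(2, 0.0); }
  // A real environment also reports feature and reward ranges to the agent.
};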
Setting up your robot as an Environment
You may want to set up your robot as its own environment. Similar to the general environments in the env.cpp file in the rl_env package, you'll want to listen for RLAction messages and publish RLEnvDescription and RLStateReward messages. For example, we set up experiments where a learning agent controlled the pedals of our autonomous vehicle to regulate the vehicle's velocity. We created an environment node that listened to messages published by the pedal controllers and to odometry updates. It combined these observations into a state vector and reward and sent an RLStateReward message to the agent. Upon receiving an RLAction message from the agent, it sent the appropriate control messages to the pedal controllers and then took in new observations for the next RLStateReward message.
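The rough structure of such a node is sketched below. The /odom topic is a stand-in for whatever your robot publishes, and the rl_msgs field names (state, reward, terminal, action) are assumptions based on the message descriptions earlier in this tutorial.

// robot_env_node.cpp -- rough structure of a robot-as-environment node.
// The odom topic stands in for whatever your robot publishes; the rl_msgs
// field names (state, reward, terminal, action) are assumptions. A real
// node would also publish an RLEnvDescription message at startup.
#include <ros/ros.h>
#include <nav_msgs/Odometry.h>
#include <rl_msgs/RLStateReward.h>
#include <rl_msgs/RLAction.h>

ros::Publisher state_reward_pub;  // publishes the state and reward to the agent
float current_velocity = 0.0;     // latest observation from the robot

// Cache the newest observation from the robot's sensors.
void odomCallback(const nav_msgs::Odometry::ConstPtr& msg) {
  current_velocity = msg->twist.twist.linear.x;
}

// When the agent chooses an action, actuate the robot and report back.
void actionCallback(const rl_msgs::RLAction::ConstPtr& act) {
  // 1. Translate act->action into control messages for your robot here.
  // 2. Build the next state/reward message from the latest observations.
  rl_msgs::RLStateReward sr;
  sr.state.push_back(current_velocity);
  sr.reward = 0.0;      // compute a task-specific reward here
  sr.terminal = false;
  state_reward_pub.publish(sr);
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "robot_env");
  ros::NodeHandle node;
  state_reward_pub =
      node.advertise<rl_msgs::RLStateReward>("rl_env/rl_state_reward", 1);
  ros::Subscriber odom_sub = node.subscribe("odom", 1, odomCallback);
  ros::Subscriber action_sub =
      node.subscribe("rl_agent/rl_action", 1, actionCallback);
  ros::spin();
  return 0;
}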
Adding your own model learning method to the model-based agent
One of the most useful parts of the rl_agent package is that it includes a general model-based agent. This agent can be used with any MDP model or planner that fits the interfaces defined in rl_common. Another contribution of this package is the real-time planning architecture (Hester et al 2012), which places model learning, planning, and action selection in parallel threads so that action selection can occur in real time at the desired rate, regardless of how long model learning or planning may take. It uses UCT (Kocsis and Szepesvari 2006) for planning, performing as many roll-outs as it can between each action. This architecture can be used with any model learning method that fits our MDPModel interface.
A new MDP model must satisfy the MDPModel interface. Essentially, it must be able to update the model with a single <s, a, s', r> experience or a set of experiences, and return a next-state and reward prediction for a queried state-action pair. It must also be able to return a copy of itself. Once you've created a new MDP model, you only need to add it to the ModelBasedAgent: include your header file in the appropriate place, and instantiate an instance of your new model in the initModel method inside the ModelBasedAgent. You may also want to add your model as a command-line option in the agent.cpp file and define an index for your type of model in the core.hh file in the rl_common package.
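A skeleton of such a model might look like the following. The method names and the experience struct shown here are illustrative assumptions; the authoritative interface is the MDPModel interface in rl_common/core.hh.

// MyModel.hh -- an illustrative MDPModel skeleton. The real interface
// (method names, the experience struct, return types) is defined in
// rl_common/core.hh and may differ from what is shown here.
#include <vector>

// Hypothetical <s, a, s', r> experience tuple.
struct Experience {
  std::vector<float> s, next;
  int act;
  float reward;
  bool terminal;
};

class MyModel /* : public MDPModel */ {
public:
  // Update the model from a single experience...
  bool update(const Experience& e) { /* incorporate the transition */ return true; }
  // ...or from a batch of experiences at once.
  bool update(const std::vector<Experience>& batch) {
    for (size_t i = 0; i < batch.size(); ++i) update(batch[i]);
    return true;
  }
  // Predict the next state and reward for a queried state-action pair.
  void predict(const std::vector<float>& s, int act,
               std::vector<float>* next, float* reward) {
    // fill in *next and *reward from the learned model
  }
  // Return a copy of this model (the parallel architecture requires one).
  MyModel* getCopy() { return new MyModel(*this); }
};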