
Implementing a Control Layer

Description: C++ Walkthrough

Tutorial Level:

Next Tutorial: Implementing an Action Layer

The following examples show how generic Control Layers for MDPs and POMDPs can be implemented. First, for an asynchronous (event-driven) MDP:

   #include <cstdlib>
   #include <string>
   
   #include <ros/ros.h>
   
   #include <markov_decision_making/ControllerEventMDP.h>
   
   using namespace std;
   using namespace ros;
   using namespace markov_decision_making;
   
   int main (int argc, char** argv)
   {
     init (argc, argv, "mdp_control_layer_example");
   
     if (argc < 4) {
       ROS_ERROR_STREAM ("Usage: mdp_control_layer_example "
                         << "<number of states> "
                         << "<number of actions> "
                         << "<path to MDP Q-table>");
       abort();
     }
   
     size_t nr_states = atoi(argv[1]);
     size_t nr_actions = atoi(argv[2]);
     string q_table_file = argv[3];
   
     // The controller only needs the domain sizes and the Q-value function file.
     ControllerEventMDP controller (nr_states,
                                    nr_actions,
                                    q_table_file);
   
     spin();
   
     return 0;
   }

The ControllerEventMDP class implements an asynchronous controller for an MDP agent. Note that it requires, as an input, the Q-value function associated with the desired policy. The MDP stochastic models themselves are not required, only the domain sizes (number of states and actions).

However, if the MDP itself is defined in a MADP-compatible file format, it can be passed as an input instead (see the MDM API for alternate constructors). In that case, the model will be parsed by MADP, and the following options can be set as parameters in the node's private namespace (see the launch-file sketch after this list):

  • is_sparse (boolean): when set to true, the internal representation of the transition and observation functions of the model uses sparse (boost::uBLAS) matrices. Typically, for large models, this results in faster parsing and less memory usage at run-time.

  • cache_flat_models (boolean): when set to true, the "flat" (non-factored) versions of the transition and observation functions are computed and cached in memory, even if the model is defined in a factored format (for example, in the ProbModelXML format).
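
For instance, a launch file along the following lines could be used to set these options. This is only a sketch: the package and executable names (my_mdm_examples, mdp_control_layer_example) are hypothetical placeholders and the node's command-line arguments are omitted; only the is_sparse and cache_flat_models parameter names come from the list above.

   <launch>
     <!-- Hypothetical package/executable names; command-line arguments omitted for brevity. -->
     <node pkg="my_mdm_examples" type="mdp_control_layer_example"
           name="mdp_control_layer_example" output="screen">
       <!-- Parse the model into sparse (boost::uBLAS) matrices. -->
       <param name="is_sparse" value="true"/>
       <!-- Also compute and cache the flat (non-factored) transition and observation functions. -->
       <param name="cache_flat_models" value="true"/>
     </node>
   </launch>

Placing the <param> tags inside the <node> tag is what puts them in the node's private namespace, as required above.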

MDP controllers will subscribe to the ~/state topic and publish the associated action to the ~/action topic. Additionally, if the MDP model is provided, the controller will publish the immediate reward after an action selection to the ~/reward topic.
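
As an illustration of this interface, the sketch below shows a minimal node standing in for the layer that would normally supply the state. It is only a stub under stated assumptions: std_msgs::UInt32 is used as a placeholder message type (check the MDM API for the actual message types used on these topics), and the topic names assume the controller node above is named mdp_control_layer_example, so that its private ~/state and ~/action topics resolve as written.

   #include <ros/ros.h>
   #include <std_msgs/UInt32.h>
   
   // NOTE: std_msgs::UInt32 is only a placeholder; consult the MDM API for the
   // actual message types exchanged on the ~/state and ~/action topics.
   
   void actionCallback (const std_msgs::UInt32::ConstPtr& msg)
   {
     ROS_INFO_STREAM ("Controller selected action " << msg->data);
   }
   
   int main (int argc, char** argv)
   {
     ros::init (argc, argv, "state_layer_stub");
     ros::NodeHandle nh;
   
     // These names assume the controller node is called "mdp_control_layer_example",
     // so that its private ~/state and ~/action topics resolve as below.
     ros::Publisher state_pub =
       nh.advertise<std_msgs::UInt32> ("mdp_control_layer_example/state", 1);
     ros::Subscriber action_sub =
       nh.subscribe ("mdp_control_layer_example/action", 1, actionCallback);
   
     ros::Rate rate (1.0);
     std_msgs::UInt32 state;
     state.data = 0; // placeholder state index
   
     while (ros::ok ()) {
       state_pub.publish (state); // each new state triggers an action selection
       ros::spinOnce ();
       rate.sleep ();
     }
     return 0;
   }

In a real system, topic remappings in a launch file would typically be used to connect these topics to the actual state estimation and actuation nodes.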

In contrast, the following example implements an asynchronous POMDP controller:

   #include <string>
   
   #include <ros/ros.h>
   
   #include <markov_decision_making/ControllerEventPOMDP.h>
   
   using namespace std;
   using namespace ros;
   using namespace markov_decision_making;
   
   int main (int argc, char** argv)
   {
     init (argc, argv, "pomdp_control_layer_example");
   
     if (argc < 3) {
       ROS_ERROR_STREAM ("Usage: pomdp_control_layer_example "
                         << "<path to problem file> "
                         << "<path to POMDP value function>");
       abort();
     }
   
     string problem_file = argv[1];
     string value_function_file = argv[2];
   
     // The POMDP controller needs the full problem file (for belief updates)
     // and the value function associated with the desired policy.
     ControllerEventPOMDP controller (problem_file,
                                      value_function_file);
   
     spin();
   
     return 0;
   }

Note that, for POMDP controllers operating according to the scheme shown in Figure 2 of MDM Concepts, the problem file must be passed to the constructor, so that the controller can handle belief updates at run-time. POMDP controllers receive observations through the ~/observation topic.

Additionally, they subscribe to ~/initial_state_distribution, which can be used to set the initial belief of the POMDP. As outputs, POMDP controllers publish the selected actions to the ~/action topic, the run-time belief state to ~/current_belief, and the immediate (expected) reward to ~/reward.
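
The POMDP interface can be exercised in the same way as the MDP one. The stub below publishes a (fixed, placeholder) observation and prints the actions selected by the controller; as before, std_msgs::UInt32 is only an assumed placeholder message type, and the topic names assume the controller node is named pomdp_control_layer_example.

   #include <ros/ros.h>
   #include <std_msgs/UInt32.h>
   
   // NOTE: std_msgs::UInt32 is only a placeholder; consult the MDM API for the
   // actual message types exchanged on the ~/observation and ~/action topics.
   
   void actionCallback (const std_msgs::UInt32::ConstPtr& msg)
   {
     ROS_INFO_STREAM ("Controller selected action " << msg->data);
   }
   
   int main (int argc, char** argv)
   {
     ros::init (argc, argv, "observation_layer_stub");
     ros::NodeHandle nh;
   
     // Private ~/observation and ~/action topics of the POMDP controller node.
     ros::Publisher obs_pub =
       nh.advertise<std_msgs::UInt32> ("pomdp_control_layer_example/observation", 1);
     ros::Subscriber action_sub =
       nh.subscribe ("pomdp_control_layer_example/action", 1, actionCallback);
   
     ros::Rate rate (1.0);
     std_msgs::UInt32 obs;
     obs.data = 0; // placeholder observation index
   
     while (ros::ok ()) {
       obs_pub.publish (obs); // each observation triggers a belief update and action selection
       ros::spinOnce ();
       rate.sleep ();
     }
     return 0;
   }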
