What shape should scheduler requests take?

  • They need to specify the 'resource' they are requesting.
  • Can be used to request a single resource, or a batched set of resources.
    • Batched sets can have items flagged as optional, i.e. they are not immediately necessary for the batched allocation of the request.
    • Allocations for batched requests should not be partial/incremental (avoid long deadlocks).

Current Resolution

  • Implementation: Request.msg

    • Resource List : handle batched resource requests with a list.
    • Priority : added an int16 priority field (rocon_msgs#43).

  • A resource is a rocon resource identifier for a rapp ('ros package name/app name').
    • Platform Info : used as a hint to the scheduler (filtering or classification).

Discussion Thread

Post-December 3

  • (Jack) I've been thinking about a different solution to the "ranged request" problem:

    • treat the Request.resources list as an ordered sequence

    • add a Request.min_resources field

    • the request will be granted when at least the first min_resources items are available. Any remaining resources will also be allocated, but only if they are available at that moment.
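
A minimal sketch of the ordered-sequence rule described above, assuming resources are opaque, unique strings and that the scheduler knows the set of currently available ones. The function name grant_ranged_request and its arguments are hypothetical and not part of Request.msg.

```python
def grant_ranged_request(resources, min_resources, available):
    """Return the resources to allocate now, or None if the request must wait.

    resources     -- ordered list of requested resource names (assumed unique)
    min_resources -- how many leading items are mandatory
    available     -- set of resource names that are free at this moment
    """
    required = resources[:min_resources]
    # The first min_resources items are all-or-nothing.
    if not all(name in available for name in required):
        return None
    # The remaining items are optional: take only those free right now.
    optional = [name for name in resources[min_resources:] if name in available]
    return required + optional


if __name__ == "__main__":
    wanted = ["robot_a", "robot_b", "robot_c"]
    print(grant_ranged_request(wanted, 2, {"robot_a", "robot_b"}))
    print(grant_ranged_request(wanted, 2, {"robot_a", "robot_c"}))
```

With min_resources equal to the list length this degenerates to the plain all-or-nothing batch, so the extra field would not change existing behaviour.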

December 3

  • (Daniel) Batched requests - groups of resources in a single request. e.g. A service wants three robots of varying kinds allocated together.

    • (Piyush) Worried about deadlocks from partial allocations.

      • (Daniel) Wouldn't want to consider partial allocations, only allocate if all can be fulfilled in that instant, or nothing at all.

      • (Daniel) Of course, still lots of problems in intelligently allocating the right combination of robots amongst all the permutations.

    • (Jack) Had an idea of doing requests of nested requests.

    • We decided in a discussion just to go with the simpler resource list for the first iteration and learn a bit by experimentation.
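
To illustrate the "all or nothing" rule and the combination problem Daniel mentions, here is a rough sketch that assigns one distinct robot to every item of a batched request by backtracking, and allocates nothing if any item cannot be satisfied. The function name and the per-item candidate lists are assumptions made for the example; the discussion does not prescribe an algorithm.

```python
def assign_all_or_nothing(candidates_per_item):
    """Pick one distinct robot for every request item, or return None.

    candidates_per_item -- list where entry i holds the names of the robots
                           that could satisfy item i of the batched request
    """
    assignment = []
    used = set()

    def backtrack(index):
        if index == len(candidates_per_item):
            return True  # every item has a robot
        for robot in candidates_per_item[index]:
            if robot in used:
                continue
            used.add(robot)
            assignment.append(robot)
            if backtrack(index + 1):
                return True
            # Undo the choice and try the next candidate.
            assignment.pop()
            used.remove(robot)
        return False

    return list(assignment) if backtrack(0) else None


if __name__ == "__main__":
    # Item 0 accepts r1 or r2, item 1 accepts only r1.
    print(assign_all_or_nothing([["r1", "r2"], ["r1"]]))
    # Two items competing for a single robot: nothing is allocated.
    print(assign_all_or_nothing([["r1"], ["r1"]]))
```

A greedy pass would hand r1 to the first item and then fail on the second; the backtracking step is what recovers the valid combination. The search is exponential in the worst case, which is acceptable only for small batches.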

Pre-December 3

  • (Daniel) Good idea to ask this straight up - what is a resource?

    • Previous assumption was that it was a 'concert client' i.e. robots, software, human interactive tablets were all considered and were specified by a combination of platform info and robot application name, e.g. ubuntu.precise.ros.turtlebot.cybernetic_pirate.teleop

    • I didn't really think about this too much at the time.
    • Human interactive clients (tablets) were moved out of the classification as their interaction with the concert is very different.
    • Starting to be clear that the resource is actually just the 'teleop'.
    • Ideally platform info for a decent abstraction layer at the resource level should always look like *.*.*.*.*, so having the platform info is just a filter to be practical when the scheduler hasn't got all the smarts to allocate exactly the way you want, or when you'd like to characterise the request - e.g. software '*.*.ros.pc.*'.
  • (Jack) You undoubtedly understand the rapp interface better than I do. From conversations with Daniel during his recent trip to Texas, I believe he described option 2. The service requests resources by their rocon_std_msgs/PlatformInfo, which may contain NAME_ANY ("*") wild cards to match some set of robots with appropriate capabilities. When the scheduler assigns a specific robot to fulfill that request, it will send a scheduler_msgs/SchedulerFeedback containing the corresponding scheduler_msgs/Request with the same requester-assigned UUID and all NAME_ANY fields resolved to the exact resource that was granted.

  • (Daniel) Need to jolt this direction a bit (I probably didn't explain in sufficient detail). We actually use rocon_std_msgs/PlatformInfo (or a string tuple representation thereof) + Rapp Name when a service is looking for resources. Suppose we only use platform info, then there are two problems that I see:

    1. Services are free to start any old rapp on the robots. Less worried about the security implications of this than the way this blurs the specifications of what a service needs (yeah, service A needs a turtlebot, but what it wants to do with a turtlebot, and what rapps need to be available on the turtlebot, is a black box).
    2. Services start needing to know robots well themselves and we lose any possibility of an abstraction layer between services and resources.
    The ideal case would be that platform info requirements would be something like *.*.*.*.mobile_teleop (where teleop is the rapp). Practically though, we might realise that it should have to be linux.*.ros.*.mobile_teleop in general for ros robots, or even linux.*.ros.pr2.play_snooker in the case of a single robot specific rapp.
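
As a sketch of how such wildcard tuples could be matched against a concrete resource, the snippet below compares dot-separated fields one by one and treats NAME_ANY ("*") as matching anything. The function name matches is hypothetical, and the string tuple form is only one possible representation of PlatformInfo plus rapp name.

```python
NAME_ANY = "*"


def matches(pattern, concrete):
    """Return True if a dotted wildcard pattern matches a concrete tuple.

    Both arguments are dot-separated strings such as
    'linux.*.ros.*.mobile_teleop'. A NAME_ANY field in the pattern matches
    any value; every other field must be equal. Field counts must agree.
    """
    pattern_fields = pattern.split(".")
    concrete_fields = concrete.split(".")
    if len(pattern_fields) != len(concrete_fields):
        return False
    return all(p == NAME_ANY or p == c
               for p, c in zip(pattern_fields, concrete_fields))


if __name__ == "__main__":
    robot = "linux.precise.ros.turtlebot.mobile_teleop"
    print(matches("*.*.*.*.mobile_teleop", robot))
    print(matches("linux.*.ros.*.mobile_teleop", robot))
    print(matches("linux.*.ros.pr2.play_snooker", robot))
```

Matching field by field rather than with a regular expression keeps "*" from spilling across a field boundary.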
  • (Jack) Without better understanding of the overall design, I can only provide the perspective of an interested external user. I'd like to have:

    • a unified ROS message for transmitting resource specifications
      • unified in what way?
      • a single ROS message containing everything needed to request, assign, start, stop, preempt any Concert resource. Something like rocon_std_msgs/Resource would contain all required PlatformInfo, rapp name, remapping and other data. Many Concert components can treat it as an opaque handle without understanding the details.

    • an object-oriented Python (and C++) interface for matching specific resources to wildcard requests
      • this should also provide a canonical conversion from the ROS message to the string representation
    A string message representation with appropriate parsing support would provide considerable flexibility. Strings could be matched using regular expressions, if desired.
    • This sounds good. Is this the kind of parsing support you are talking about (very rough and needs to move out of the app manager)?

      • Yes: the parser should be simple, but the syntax needs to be well-defined. Given the natural uncertainty about the way things may evolve, I suggest we consider something like a Uniform Resource Identifier.

      • It may prove useful to distinguish the rapp name from the platform identifier syntactically: <os> '.' <version> '.' <system> '.' <platform>  '/' <rapp>
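
A rough sketch of a parser for the syntax suggested above, with a canonical conversion back to the string form. The ResourceId type and both function names are hypothetical; the only thing taken from the discussion is the <os>.<version>.<system>.<platform>/<rapp> layout.

```python
from collections import namedtuple

# Hypothetical container for the five parts of the proposed syntax.
ResourceId = namedtuple("ResourceId", "os version system platform rapp")


def parse_resource(text):
    """Parse '<os>.<version>.<system>.<platform>/<rapp>' into a ResourceId.

    Everything after the first '/' is kept as the rapp name, so a rapp
    written as 'package/app' survives intact.
    """
    platform_part, separator, rapp = text.partition("/")
    fields = platform_part.split(".")
    if not separator or not rapp or len(fields) != 4 or not all(fields):
        raise ValueError(
            "expected <os>.<version>.<system>.<platform>/<rapp>: %r" % text)
    return ResourceId(fields[0], fields[1], fields[2], fields[3], rapp)


def format_resource(resource):
    """Return the canonical string form of a ResourceId."""
    return ".".join(resource[:4]) + "/" + resource.rapp


if __name__ == "__main__":
    parsed = parse_resource("linux.*.ros.pr2/play_snooker")
    print(parsed.platform, parsed.rapp)
    print(format_resource(parsed))
```

Separating the rapp with '/' means the platform part always has exactly four fields, so a malformed identifier can be rejected early instead of being silently mismatched.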

  • (Jack) What is the concert_msgs/LinkEdge[] remappings field?

    • (Daniel) I'm currently trying to extricate the linkgraph, linkedge information out of our scheduler implementation - these notions are strictly part of what we used six months ago when creating a static multi-robot roslaunch, i.e. a link graph. As this is only one kind of concert service, it should not end up being used in general for resource requests. If we do need remappings (either to start apps or for some other reason), then the best thing would be to use the same rocon_std_msgs/Remappings.msg we use for human interactive clients such as androids.

  • (Jack) There is a general issue with handling updates to the Request message contents: when a message arrives how should we reconcile values updated by the scheduler with those updated by the requester? The problem is that the allocation and feedback topics are full-duplex and both sides may initiate changes in parallel.

    • The current design is for state transitions to control when updates apply.
      • Requesters apply different state transitions than the scheduler.
      • If they both make changes in parallel, the state transition rules should sort out what happens. Updates to adjunct fields will only apply when accompanied by a valid state transition.
      • I have not worked out the details well enough to show that this is sufficient.
      • Are there cases where we need to update some field without a state transition?
    • Maybe the simplest approach is to define who is allowed to change each field:
      • priority is initially set by the requester, but only changed by the scheduler.

      • resource is initially set by the requester, and only updated by the scheduler when a specific resource is assigned.

      • availability is set by the scheduler, except initially for reservation requests.

      • hold_time is only set or updated by the requester.

      • remappings are unclear (to me)

        • remappings are needed if the scheduler starts apps, maybe also for more esoteric use cases (perhaps sharing).
    • (Daniel) After setting initial resource request requirements, exactly what fields do you envisage the concert service (i.e. requester) needing to change?

      • (Jack) Probably only hold_time: experience may improve the requester's initial estimate. I suppose availability could be modified for a reservation request, but that would create a race condition if the scheduler acted simultaneously, so I would disallow it. To change any other fields, the requester would need to cancel the original request and make a new one.
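
The "who is allowed to change each field" idea can be sketched as a small ownership table that is consulted when an updated Request arrives. This is only an illustration under assumptions: requests are modelled as plain dicts rather than ROS messages, the names MAY_UPDATE and apply_update are hypothetical, and remappings is left out of the table because its ownership is unresolved above.

```python
REQUESTER = "requester"
SCHEDULER = "scheduler"

# Who may modify each field after the initial request (assumed from the
# list above; fields missing from the table cannot be changed by anyone).
MAY_UPDATE = {
    "priority": {SCHEDULER},
    "resource": {SCHEDULER},
    "availability": {SCHEDULER},
    "hold_time": {REQUESTER},
}


def apply_update(current, incoming, sender):
    """Merge an incoming request into the current one.

    Only changes to fields that the sender is allowed to modify are kept;
    every other changed field silently retains its current value.
    """
    merged = dict(current)
    for field, value in incoming.items():
        changed = value != current.get(field)
        if changed and sender in MAY_UPDATE.get(field, ()):
            merged[field] = value
    return merged


if __name__ == "__main__":
    current = {"priority": 0, "hold_time": 10}
    incoming = {"priority": 5, "hold_time": 30}
    print(apply_update(current, incoming, REQUESTER))
    print(apply_update(current, incoming, SCHEDULER))
```

Because each field has a single writer after creation, the result does not depend on which side's message arrives first.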

  • (Jack) Do we need to track each request time separately?

    • (Daniel) How would you do otherwise?

    • (Jack) Right now we only have a time stamp on the aggregate SchedulerRequests message, but some requesters will make various requests at different times. I think we should add a time stamp to each individual Request message.

  • (Daniel) This discussion is good - I don't think the resource should use platform info at all to define the resource any longer. It should be the rapp only. Any platform info specifications should be a resource request hint, i.e. a filter for practical situations which can't be handled yet to help the scheduler.

  • (Jack) I think the scheduler request manager is complete enough to use for real scheduler and requester implementations. There are probably some missing pieces, but actual use is the best path forward. Before investing much time in such code, I'd like to make some API changes (please comment):

    • Combine the AllocateResources and SchedulerFeedback messages into a single SchedulerRequests type. There is no good reason for two different message types, and it would be more convenient to combine them.

    • (Daniel) Sure, let's eliminate redundancies.

    • Those changes are included in the resolution listed above.

  • (Daniel) Services can cancel requests that haven't been fulfilled for a lengthy period. Should we rely on the good behaviour of services to do this, or should the resource manager timeout old, 'hanging' requests?

    • (Jack) As long as the requester still wants that resource, I see no reason for the scheduler to cancel the request. The requester is periodically renewing it as part of the heartbeat protocol, anyway.

Wiki: rocon_concert/Reviews/simple_scheduler API proposal_API_Review/Requests (last edited 2013-12-03 16:53:51 by JackOQuin)