ROS 2 Security Working Group
- [Sid] Vuln Disclosure Policy Next Steps
- [Sid] WG interaction with the public
- [mikael] SROS2 claiming a Quality Level as per REP-2004
- [moderator] Review past action items
Attendees Mikael Arguedas, Jeremie Deray, Sid Faber, Kyle Fazzari, Joe McManus, Roger Strain, Víctor Mayoral Vilches, Ruffin White, Florian Gramss, Chris Lalancette
VDP Next Steps
See comments on REP-2006: ROS 2 Security VDP PR and past WG meeting minutes.
- What is the triage process once a report has been received?
Challenge is that this represents a large community, and it may be difficult to access code related to the vulnerability while handling an embargoed report.
It generally works well to find the maintainer when handling ROS releases. If a maintainer cannot be identified, OR (Open Robotics) has a way around this.
Non-responsiveness may result in removing the package from REP-2005.
We could consider adding a security contact to the package as part of the standard for REP-2005.
When reporting the vulnerability, the reporter should also provide steps needed to reproduce the issue. For example, a docker image or a snap.
Code quality and security is more than just responding to the vulnerabilities: there are more secure coding issues that will need to be addressed beyond this.
- Members of the security reporting alias
Take recommendations on how to maintain list membership. It was strongly recommended that another firm be represented on the distro list. However, to be on the list you have to contribute to the community.
WG interaction with the public
Follow-up from discussions about the VDP. Should someone want to interact with the WG (e.g., for robot security advice), where should they be directed? Today we have the following places:
Security WG wiki page (meeting agendas)
Google group for ROS Security Working Group
Consensus was to use the github community page as our landing site.
SROS2 claiming a Quality Level
Should we claim a REP-2004 quality level for SROS2?
SROS2 is a part of ros core. Most of the core is aiming for level 2, so it makes sense.
Consensus is to target level 2, although first we need to evaluate whether we comply today.
Action items update
- [new] 5/26/2020 [mikael]: Evaluate SROS2 for level 2 compliance with REP-2004.
- [new] 5/26/2020 [sid]: Update the community wiki with all the communications methods, update the TSC Working Groups list to point to the community wiki.
- [closed] 2019/08/21: How to handle errors if two nodes try using mismatched DDS: not a security issue, old, no recollection of the issue itself. Close.
[closed] 2020/02/25: Create a ros2 "security-wg" team to tag security features across ROS. We do now have a github user (that is using the google groups email address): https://github.com/ros-security-wg and teams on the ros-security and the ros2 org. I believe this is done?
- [update] 2020/05/12: ROSCON 2020 Security workshop: ROSCON will minimally be delayed; if it happens, plan for a security BOF rather than a workshop.
- [Mikael] Move security utilities outside rcl
- [Kyle] Update on current security landscape for Foxy
- [Víctor] ROSCon outlook and planning
- [Sid] Vuln Disclosure Policy Update
- [Ted] ros2 launch --secure demo
Attendees Mikael Arguedas, Jeremie Deray, Sid Faber, Kyle Fazzari, Ted Kern, Joe McManus, John-Paul Ore, Roger Strain, Victor Mayoral Vilches, Ruffin White
Administrivia The WG agreed to move meetings earlier in the day to accommodate Europe time zones. In the future, meeting times will be adjusted if needed to support Asia/Pacific members.
Move security outside rcl
See rcl issue 545
Goal is to be able to compile without security, particularly for resource constrained environments. Discussions included how environment variables are used/accessed, what rcl contributes to security and use cases.
The WG concluded that this is not a simple issue.
Request for WG members to review the PR above, discuss in matrix and make a recommendation on how to move forward.
Update on security landscape for Foxy
Mikael discussed again Foxy support and problems of current tests failing. Help is needed with testing on alternate operating systems, particularly in US time zones.
There is a vetted PR to address many of the issues; it just needs to be tested in an Americas time zone.
ROSCon outlook and planning
Víctor discussed last year's workshop at ROSCon. Since last year's event was well received, it could be repeated again this year. Request for assistance from the WG in helping to put together content for this year's event.
Consensus was that we need some diversity in content, we do not simply want to repeat last year's content. Open question on whether we have new items to present.
Víctor also proposed activities to raise security awareness within the ROS community, including a week of ROS bugs coordinated by the Security WG.
Vulnerability Disclosure Policy (VDP) update
Sid discussed the current state of the VDP. After discussing with OR, the VDP is now proposed as REP 2006. This largely accepts the WG's recommended document with two changes: some language was changed from a disclose.io format to follow kernel.org formatting, and the contact email was updated to firstname.lastname@example.org.
WG members can see past comments in the first (with Security WG) google docs draft
or in the second (with OR) google docs draft
ros2 launch --secure demo
Ted gave a demo of ros2 launch --secure using demo_nodes_cpp talker_launch.py. See the video for details.
Launch secure depends on the nodl file; the one used for the demo is available in pastebin.
This is a minimum viable product; composable and lifecycle nodes may be a challenge.
Ruffin clarified that some DDS implementations support chained CAs which allow for interesting security setups. Using this approach the robot computational graph could be further segmented and segregated.
- [mikael] (2 minutes, carried forward pending Foxy freeze) Make a recommendation on Move security related filesystem and env utilities outside rcl · Issue #545 · ros2/rcl
[mikael] Foxy status https://github.com/ros2/sros2/issues/180
Attendees Jacob Hassel, Alexis, Roger Strain, Victor Mayoral Vilches, Ted Kern, Mikael Arguedas, Sid Faber
Current status of items for Foxy, some concerns that sros2 will not be working in Foxy:
- The only open feature remaining to be merged is the Cyclone DDS security plugins
Red builds because the OR CI is not up to date, and due to version issues. Nightlies still run during the work day in Europe. See https://github.com/ros2/sros2/issues/180. Sync with OR to get fixes in the RMW build failures.
- OR disabled sros2 features in order to merge and opened an issue with follow-on work. Work still needs to be done to generate a policy file based on a ROS graph (audit the running system). Looking for assistance if anyone has time this week.
- SROS2 public API curated by Kyle
- Other issues are small bug fixes that can be updated following the freeze.
ros2 launch --secure
Ted is working on adding "secure" launch tag to use NoDL to generate the keystore, create certs and keys for nodes as they are launched. Eventually will also manage individual policies. Launch is complex due to remapping and substitutions; if anyone has experience, consider helping to work through the buildout.
A design doc will be posted to the forum shortly once there's a minimum viable product with an explanation of design decisions to open it for comment.
Mikael asked for US-timezone help to get SROS2 foxy-ready. He volunteered to steer whoever took on this task but cycles from someone are required.
Request: Generate keystore materials ahead of time and set them up before the robot is actually deployed. The challenge is that the substitution system is only evaluated when the launch node action is visited by the launch service.
Use cases: Víctor requested to consider situations where CAs are not available while the robot is running. He argued that DDS did already allow for multi-CA configurations. One option is to use a Permissions CA and an Identity CA, another is to use different CAs for different parts of the robot.
Following Víctor's suggestion, it was jointly agreed to start putting together a series of reference use cases to steer the development of sros2. It may also be good to revisit CA management within sros2, since it was originally written assuming proper support for a robust keystore.
- [ruffin] (2 minutes) Update on environment variables
- [sid] (5 minutes) CIS benchmarks for ROS, ROS 2
- [kyle] (10 minutes) MVP for RMW support of security logging
- [kyle] (2 minutes) public API curation for SROS2 utilities
Attendees Gianluca Caiazza, Jeremie Deray, Sid Faber, Kyle Fazzari, Jacob H, Ted Kern, Joe McManus, Victor Murray, Victor Mayoral Vilches, Roger Strain, Ruffin White
Environment variables update
Details are available in Github sros issue 199.
Variables which used to specify security_root_directory, etc., were path files to the root of the keystore or a folder with the artifacts. This has been changed with the migration of security to enclaves. Also looking to eventually use the security location for more than just a keystore--there may be other runtime security configs that should be in the directory (which is why enclaves are in a subdirectory).
Also, the variable security_override doesn't need the prefix, just the name to be forced for the root or debug enclave, etc. Instead, pass the fully qualified path name.
Canonical will begin working on a ROS security benchmark shortly. The benchmark begins with the Ubuntu 18.04 benchmark and will be relaxed/updated to suit ROS 1. After ROS 1, the intent is to work on a benchmark for ROS 2. Once the initial template is laid out, this will be shared for community feedback.
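A minimal sketch of how a keystore root, an enclave name, and an override could combine into the directory that holds the security artifacts. The variable names ROS_SECURITY_KEYSTORE and ROS_SECURITY_ENCLAVE_OVERRIDE, the "enclaves" subdirectory name, and the function resolve_enclave_dir are assumptions made for illustration; the minutes do not spell them out, so check sros2 issue 199 for the agreed names.

```python
import os
from pathlib import PurePosixPath


def resolve_enclave_dir(enclave, env=None):
    """Return the directory expected to hold an enclave's security artifacts.

    Assumed layout: <keystore root>/enclaves/<fully qualified enclave name>.
    """
    env = os.environ if env is None else env
    keystore = env.get("ROS_SECURITY_KEYSTORE")  # assumed variable name
    if not keystore:
        raise ValueError("keystore root is not set")
    # The override, when present, replaces the requested enclave name.
    name = env.get("ROS_SECURITY_ENCLAVE_OVERRIDE") or enclave  # assumed name
    if not name.startswith("/"):
        raise ValueError("enclave name must be fully qualified: %r" % name)
    return PurePosixPath(keystore) / "enclaves" / name.lstrip("/")
```

Keeping enclaves in a subdirectory leaves room in the keystore root for the other runtime security configs mentioned above.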
The CIS security benchmarks are community accepted best practice standards. Membership is required in order to use the workbench for editing.
MVP for RMW
Pull request 404, security logging plugin for rmw_connext, is up for review. This implements a minimum viable product for enabling logging. It allows enabling experimental logging for Foxy (work will continue with the design for the next release).
The DDS standard does not give much definition for the primitives necessary for security logging. This pull request does not define or change logging either; it does not add new functionality but simply pulls the events out of the existing log and puts them into the security log.
Public API for SROS 2
Work is in progress to take the SROS 2 API, which was public by default, and move parts of it private in order to curate the public API. Please add comments to the review document to define what should be public and what should not be considered public.
In addition work is in progress on a ros launch extension to automatically generate keys, keystores and securely launch nodes.
[mikael] (9 min) OR asked for feedback on https://github.com/ros2/rcl/issues/545
- [jeremie] (20 min) Security logging
[Víctor] (1 min) Week of Universal Robots bugs (https://news.aliasrobotics.com/week-of-universal-robots-bugs-exposing-insecurity/)
Attendees Joe McManus, Mikael Arguedas, Jeremie Deray, Sid Faber, Kyle Fazzari, Victor Murray, Jacob, Roger Strain, Victor Mayoral Vilches
RCL issue 545 requests that security features be moved out of rcl and into rcutils. Primary motivation was to enable ros2 on systems (e.g., embedded) that don't have a file system or environment variables.
This can be done by moving security features elsewhere, or by better use of the -DENABLE_SECURITY compile option.
Jeremie reviewed the logging plugin design document for FastRTPS.
- Pull requests for working group comment/collaboration:
The goal is to provide an ability to react to security events; the first step is to get DDS logs to syslog. This implements a new (not required) DDS plugin. There are three main streams of work:
- Implement the plugin in FastRTPS
- Bridge into ROS at the rmw_fastrtps
- Access logging with sros2 cli tool
- Standardizing the logging format since it is not currently prescriptive (standardized / specified).
- XML configuration for logging
- Managing QOS for the logging topic
- Adding a new "enable_logging" verb to ros2 security
Victor discussed Alias Robotics work in finding bugs in the widely used UR robot; bugs are not necessarily limited to ROS. This is part of an ongoing campaign to raise awareness of robotics vendors to security.
- [mikael/ruffin] (20m) Review updated proposal for node to participant mapping and SROS2 changes
WG should review https://github.com/ros2/design/pull/274
[mikael] (2m) OR asked for feedback on https://github.com/ros2/rcl/issues/545 a while back. SKIP again
- [mikael] (1m) Interest from the WG to discuss + recommend static analysis tools for quality and security?
- [kyle] (1m) The WG is now the official maintainer of the sros2 project
Attendees Mikael Arguedas, Michael C, Jeremie Deray, Sid Faber, Kyle Fazzari, Ted Kern, Joe McManus, Victor Mayoral Vilches, Ruffin White
Discussion on implementation details of context security. Highlights of the pending changes:
- Introduced context settings in policy documents and some metadata tags. Add support for a "context" security directory.
- Added context runtime argument to set location of context security options
- Also added an environment variable to override the runtime argument
- "Prefix" configuration for contexts is redundant and is not required
- How to resolve context names that do not have a fully qualified path (e.g., "foo" vs. "/path/to/foo")
- Table of user requirements and how names are resolved is included with the PR documentation
- CS4R-I went well, videos are available
- Open a discussion on Matrix about code quality tools (e.g., Coverity)
- The WG now maintains the sros2 project; discussed details of the WG's related responsibilities
- Compile feedback for REP2004-2005
- Ask OR for a “security-wg” team on the ros2 org
- Review updated proposal for node to participant mapping and SROS2 changes
OR asked for feedback on https://github.com/ros2/rcl/issues/545 a while back
- Call for participation/support/dissemination in CS4R
Attendees Mikael Arguedas, Jeremie Deray, Sid Faber, Kyle Fazzari, Joe McManus, Ricardo Gonzalez Moreno, Victor Mayoral Vilches, Ruffin White
See https://github.com/ros-infrastructure/rep/pull/218. The REP is important to the WG because it ties in to vulnerability response that’s in the draft VDP which provides a deadline to fix vulnerabilities. There's also a good way to view current quality status
General discussion covered how this might change behaviors, whether the proposed categories are too coarse, and whether this drives response times for remediating vulnerabilities.
Continue discussions offline. Make comments over the next few days in the google doc.
Proposal is to create a ros2/security-wg github team to use as a tag so all relevant security features may be tagged and monitored. Problem is that github does not properly support mentioning teams in the situations we'd like. See @mentioning other teams issue. If we do create a team, it may be that only admins can tag the team.
The WG agreed that this is an improvement over what we have today and recommend creating the team.
Brief discussion covered current status of changes in progress:
- Restructuring keystore directory to take context into account
- Adjusting lookup strategy; by default, pretty similar to previous behavior, but node name isn't used
- Thinking about launch file integration, adding context there (relative or fully qualified)
- Some concerns in how this will play out in terms of files and containers
- Updating documentation
"Cyber Security For Robotics"
Preparing cyber security for robotics workshop next week in Spain. Looking for collaborators, although short notice. Will try to record.
Remainder of the agenda was queued for our next meeting.
- Switch to one DDS participant per context
- Default SROS 2 policies
- How do we integrate security into package quality categories
- Propose general status updates
Attendees Mikael Arguedas, Jeremie Deray, Tomoya Fujita, Joe McManus, Kyle Fazzari, Ruffin White, Michael Carroll
Switch to one DDS participant per context
https://github.com/ros2/rcl/pull/515 We need to review latest commits Status update on our side [action] Ruffin, Kyle and Mikael will chat offline
Default SROS 2 policies
Concept: Enable easy use of debugging tools when security is enabled by using an agreed-upon context name. Might also be a path toward encryption by default.
[action] explore solutions all
Package quality and security
How do we integrate security into package quality categories: https://github.com/ros-infrastructure/rep/pull/218
[action] Continue to discuss (on matrix) and aggregate feedback on this document - add CI to security policy, maybe time to patch, this package supports working policy setup, etc
- Propose general status updates
- ruffsl: share link to a survey
- (Sid) Discuss draft Vulnerability Disclosure Policy (VDP)
- Default SROS2 policies
Switch to one DDS participant per context (see PR)
- Review latest commits
- Status update
Attendees Kyle Fazzari (moderator), Mikael Arguedas, Michael Carroll, Jeremie Deray, Sid Faber, Ted Kern, Victor Mayoral, Dragan Stancevic, Ruffin White
Vulnerability Disclosure Policy
This discussion consumed the entire meeting. The following decisions were agreed upon:
- Scope: Cover ROS 2 ros_base, ros_core and desktop only. Add a caveat to the scope statement with the intent that if you care about security, you need to move to ROS 2. ROS 1 is explicitly out of scope: it is known to be vulnerable, and all effort is being put into ROS 2 re-engineering rather than fixing ROS 1. Securing ROS 1 is disingenuous. Many product companies are actively moving to ROS 2.
- Timeframe: Still under discussion, but should be 90 days from initial report until public release. There are two windows to consider: the time we give ourselves to fix (internal), and the time until the vulnerability is reported (external). There is an underlying minimum based on when patch releases happen; worst case binaries are not available for 60 days. With 30 days to actually fix the code, this gives us a minimum external reporting timeframe of 90 days.
- Responsibilities: The working group agreed that Open Robotics will handle operational issues such as vulnerability coordination and the day-to-day operation of the policy. The Security Working Group will focus on engineering, driving security requirements for ROS 2. This potentially includes training activities like providing vendors with templates to create their own vulnerability disclosure policies.
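The timeframe arithmetic above (30 days to fix plus up to 60 days until binaries are available) can be sketched as a small date calculation. The function name and constants are hypothetical, and the numbers are the ones under discussion in this meeting, not a final policy.

```python
from datetime import date, timedelta

FIX_WINDOW_DAYS = 30   # internal window to fix the code (from the minutes)
RELEASE_LAG_DAYS = 60  # worst-case wait until patched binaries are released


def disclosure_dates(reported):
    """Return (fix due date, earliest public disclosure date) for a report."""
    fix_due = reported + timedelta(days=FIX_WINDOW_DAYS)
    public = fix_due + timedelta(days=RELEASE_LAG_DAYS)
    return fix_due, public


fix_due, public = disclosure_dates(date(2020, 1, 1))
print(fix_due, public)  # 2020-01-31 2020-03-31
```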
Sid has an action item to update the policy and continue taking input.
The VDP discussion raised a few unanswered issues, including:
- How deep in the stack do we go? For example, were we covering ROS 1, would we cover the XML parser that's a fork of a fork of a fork? Do we patch yamlcpp?
- How do we handle embargoed information while maintaining an open group? This may not be an issue if we stick to engineering work.
The last two items on the agenda were not discussed.
Given the current activity and number of items to discuss, the working group agreed to increase meetings to twice monthly, and increase to an hour.
- (Sid) Exporting security events from ROS 2 to security information and event management (SIEM) tool.
- Using DDS-Security, how do we ensure DDS-level information makes sense to ROS users?
- Discuss with DDS vendors
- (Mikael) Org / repo management for working group repositories. Governance of SROS2 repo.
Draft of SWG Governance https://github.com/ros-security/community/pull/1
(Kyle) NoDL pull request (https://github.com/ros2/design/pull/266)
Attendees Joe McManus, Kyle Fazzari, Dragan Stancevic, Lander Usategui, Jeremie Deray, Mikael Arguedas, Ruffin White
- Goal is to move ROS monitoring data back into a SIEM and create a pipeline to a security tool. Open questions:
- If you’re using ROS2 and exporting DDS events, then the data going to security monitoring is mangled. Have we ever demangled?
- There’s a pull request to convert RTI lite mangled names to ROS
- Can demangle using XML templates
- Do we need a different demangling tool? Depends on the use case. Certificate handshake failure is one, bot sending bad data / DDoS is one.
- Demangling needs to be more user friendly. FastRTPS has a function to demangle; each implementation has its own demangler. The demangler is not built into the rmw API.
- Be wary that the logging / DDS management infrastructure should be kept separate from ROS. ROS should not be required for the standalone logging infrastructure; demangling exclusively within ROS may be challenging / undesirable
- Some demangling pointers:
- Rmw implementations use different logging
- Logging spec: Need buy-in from vendors. Standardizing may be tricky for some, some may not be interested or it may be complex.
- It may be possible to add a logging description to the OMG DDS spec. The logging spec is very vague, we should consider driving more precision in the logging requirements.
- All the ROS DDS middleware vendors are members of the OMG
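To make the demangling discussion concrete, here is a minimal standalone sketch that does not depend on ROS, in line with the concern above about keeping logging infrastructure separate. It assumes the commonly documented ROS 2 conventions: DDS topic prefixes "rt/", "rq/" and "rr/" for topics, service requests and service replies, and type names of the form "pkg::msg::dds_::Type_". Both function names are hypothetical, and individual rmw implementations should be checked before relying on this.

```python
# Assumed mapping of DDS topic prefixes to ROS entity kinds.
_PREFIXES = {"rt": "topic", "rq": "service_request", "rr": "service_reply"}


def demangle_topic(dds_name):
    """Map a DDS topic name to (kind, ROS name); non-ROS names pass through."""
    prefix, sep, rest = dds_name.partition("/")
    kind = _PREFIXES.get(prefix)
    if not sep or kind is None:
        return "dds", dds_name  # not a ROS-mangled name, leave untouched
    name = "/" + rest
    # Service topics carry a Request/Reply suffix on top of the prefix.
    if kind == "service_request" and name.endswith("Request"):
        name = name[: -len("Request")]
    elif kind == "service_reply" and name.endswith("Reply"):
        name = name[: -len("Reply")]
    return kind, name


def demangle_type(dds_type):
    """Turn 'std_msgs::msg::dds_::String_' into 'std_msgs/msg/String'."""
    parts = dds_type.split("::")
    if len(parts) >= 3 and parts[-2] == "dds_" and parts[-1].endswith("_"):
        return "/".join(parts[:-2] + [parts[-1][:-1]])
    return dds_type  # unknown layout, leave untouched
```

A SIEM pipeline could apply functions like these to exported DDS events before indexing them, so that analysts see ROS names rather than mangled ones.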
As the security emphasis grows with tooling and examples, we should create a repo with security subprojects.
- In progress. The repo exists, but appears to be centered on some Amazon subprojects. Although it’s not complete, the plan is to meet these goals
- It may be challenging to move other ROS projects under the security project within github.
- Open Robotics doesn't have a ton of time for maintaining sros2, that should probably be our first
- Proposal primarily includes the interface, which demonstrates security plus other benefits.
Intend to incorporate labels & validate information flow for security
- This can also be valuable for QOS: whether nodes can continue to communicate depending on QOS negotiation. This should be dynamic, maintained at run time.
- Can NoDL reference an external XML document? This might help support QOS interchangeability between vendors.
- Why would a user specify non-default QOS? Because they may have a best-effort high bandwidth sensor, for example. But other sensors (cliff) may be critical and should be configured as such at run-time or at launch time.
- DDS: the XML spec overrides and can be configured at launch time, not defined programmatically.
- What QOS do you use when using a non-DDS implementation? No current solution; the NoDL PR uses existing ROS abstractions that include QoS.
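As an illustration of the QoS negotiation point above, the following sketch checks whether an offered (publisher) QoS satisfies a requested (subscriber) QoS under the DDS request/offer model, where the offer must be at least as strong as the request. Only reliability and durability are modeled; the dictionary layout and function name are hypothetical.

```python
# Higher number means a stronger guarantee.
RELIABILITY = {"best_effort": 0, "reliable": 1}
DURABILITY = {"volatile": 0, "transient_local": 1}


def qos_mismatches(offered, requested):
    """Return the policies for which the offer is weaker than the request."""
    problems = []
    if RELIABILITY[offered["reliability"]] < RELIABILITY[requested["reliability"]]:
        problems.append("reliability")
    if DURABILITY[offered["durability"]] < DURABILITY[requested["durability"]]:
        problems.append("durability")
    return problems


# A best-effort, high-bandwidth sensor cannot serve a subscriber that
# requests reliable delivery.
camera = {"reliability": "best_effort", "durability": "volatile"}
safety_monitor = {"reliability": "reliable", "durability": "volatile"}
print(qos_mismatches(camera, safety_monitor))  # ['reliability']
```

A description of each node's interface that includes QoS would let a check like this run before launch rather than surfacing as a silent lack of communication at run time.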
Discussion about default/overriding QoS values: https://github.com/ros-infrastructure/rep/pull/212#issuecomment-559412811
- Desires for security tooling in ROS 2
- Minimum features
- Vulnerability disclosure method: How should security researchers report security issues against ROS?
- Method for communication (keybase vs encrypted mail)
- Documented process
- CRD Process
Notes and action items are moved over to the ROS Security Working Group Wiki.
- Next meeting will be about the same date next month; there will not be a meeting over Thanksgiving week.
- Determine if Syslog/CEF is GPL'd or something similar
- Write disclosure method recommendation and feed back
- Recommendation for using syslog in common logging format. Common Event Format (CEF). How open of a standard is this? Could use this in DDS logging plugins as well.
- Intrusion detection is one use-case, but other use cases include policy generation and simple troubleshooting (have nodes record themselves).
- Ruffin: ROSCon talk on event data recorders
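For reference while evaluating CEF, a minimal sketch of formatting one security event as a CEF line. The header layout (seven pipe-separated fields after "CEF:0", followed by key=value extensions) follows the published CEF format as commonly documented; the vendor, product, event ID and extension values are made up for illustration.

```python
def _esc_header(value):
    # Header fields escape backslashes and pipes.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _esc_ext(value):
    # Extension values escape backslashes, equals signs and newlines.
    return str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def format_cef(vendor, product, version, event_id, name, severity, **ext):
    """Build a single CEF:0 line from header fields and key=value extensions."""
    header = "|".join(
        _esc_header(f) for f in (vendor, product, version, event_id, name, severity)
    )
    extension = " ".join("%s=%s" % (k, _esc_ext(v)) for k, v in ext.items())
    return "CEF:0|%s|%s" % (header, extension)


line = format_cef(
    "example-vendor", "dds-security-logger", "0.1",  # hypothetical identifiers
    "100", "handshake failed", 7,
    src="10.0.0.5", msg="cert=expired",
)
```

A line like this could then be handed to syslog, for example through Python's logging.handlers.SysLogHandler, which is one way a DDS logging plugin's output might reach a SIEM.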
Vulnerability Disclosure Method
- Email with GPG keys is well-used. One key per recipient should be published, and disclosures should encrypt with each.
- Keybase is easier to set up, but then we all have to monitor an app on a phone. Web app exists but isn't great.
- Web form? Easily encrypted, also provides the ability to enforce a format. Maybe hook up to trello.
- Web form
- But we should really do both (not everyone wants to fill out a form, not everyone wants to email/figure out GPG)
- What is the target? ROS, ROS2 and standard installation items should be covered. Not all packages that can be arbitrarily installed.
- Notification is one part of a larger overall disclosure policy.
- Announcements (5 min)
- Roslaunch2 sandboxing (10-15 min)
- Discussion DDS participants (10-15 min)
- Canonical will start chairing the Security WG
Joe McManus will lead the ROS 2 security WG
- Execute ROS 2 apps w/o installing ROS2 - you just need to install roslaunch2 with PIP and you can start any ROS 2 on any platform (OS X, Windows, Linux)[^]
- Your application runs in one or more Docker containers, sandboxing nodes from each other and from the host system.
- Use Docker resource limits to protect your nodes and budget the resources allocated to every one of them
- Distribute your nodes easily as Docker images
No more Jenkins & Bloom hell, build your image once, upload it to a Docker repo and let anyone use your code.
- Rely on ROS2 OSRF images and Docker/GitHub integration to setup your CD pipeline!
- Build ROS2 robot applications easily
- Large ROS2 (or ROS1) apps are currently expressed through ROS packages.
- Cumbersome - each node the app relies on needs to be a dependency in the package.xml. If compiling from source, can take a very long time.
- Some cloud CI services such as Travis have a hard timeout on builds, avoid it by pulling images instead!
- Instead, your app can become a single, self-contained Python file.
- Pull all nodes as Docker images instead
- Store in your application which version of which node you want to run by specifying an image tag (impossible to do currently with Bloom+APT)
- No need to necessarily Bloom release everything (can use custom images - just need to publish the image in some repo like Dockerhub).
- Don't wait X months for updates anymore! Update your images to a newer version of your dependencies instantaneously!
- Build advanced e2e and integration tests easily
- Build non-flaky, meaningful integration tests. If your integration test depends on e.g. robot_state_publisher and you pull it using test_depend, if it breaks, you break! Use a Docker image to pull robot_state_publisher from a ROS2 release easily. And pin the image to avoid E2E test flakiness!
- Write cross-distro tests (talker is a Crystal node, listener is a Dashing node)
- Write cross-RMW tests (talker is fastrtps, listener is RTI Connext)
- Write e2e tests simulating multiple architectures with qemu and Docker (listener is ARMHF, talker is X86)
Use Docker tools such as Pumba to simulate network latency in your e2e tests (a la Chaos Monkey) - are you sure your application is safe when the WiFi quality drops?
- Use Docker resource limits to starve your nodes - how will your robot react when it runs out of memory?
[^] Bloom release is planned, release through PIP should be possible but is not scheduled yet.
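The resource-limit idea above can be sketched as a helper that builds a docker run command line for one sandboxed node. This is not roslaunch2's API, which the minutes do not show; the function name and image tag are hypothetical, and only standard docker run flags (--rm, --memory, --cpus, --network) are used.

```python
def docker_run_args(image, command, memory=None, cpus=None, network=None):
    """Build the argv for running one node in a resource-limited container."""
    args = ["docker", "run", "--rm"]
    if memory:
        args += ["--memory", memory]      # e.g. "256m" caps container RAM
    if cpus is not None:
        args += ["--cpus", str(cpus)]     # e.g. 0.5 is half of one CPU
    if network:
        args += ["--network", network]
    return args + [image] + list(command)


argv = docker_run_args(
    "example/talker:1.0",  # hypothetical pinned image tag
    ["ros2", "run", "demo_nodes_cpp", "talker"],
    memory="256m",
    cpus=0.5,
)
```

The resulting list can be passed to subprocess.run; pinning the image tag is what gives the reproducibility described in the testing bullets.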
Discussion DDS participants
ROS 2 Access Control Policies
- Contains the design as well as the schema
- Can grant general permissions and then revoke
- Evaluated in order; can interleave allow and deny
- Supports topics, services, actions (parameters TBD)
- Independent from the DDS layer
- Uses XML over YAML for parsing/validating
- “Color” profiles; use to whitelist communications between other profiles
- Runtime errors if two nodes try using mismatched DDS?
- Composability might be an issue due to not knowing about Pub/Sub relationship
- Know graph if policies are part of ROS Launch
- Encryption might have a performance impact
- Off by default?
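A minimal sketch of ordered rule evaluation with interleaved allow and deny, as described in the policy notes above. It assumes that the last matching rule wins and that access is denied when nothing matches; that is one reading of "evaluated in order", and the actual design may resolve conflicts differently (for example, deny taking precedence). The rule tuple layout and function name are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(rules, kind, name, default=False):
    """Evaluate (effect, kind, pattern) rules in order; last match wins."""
    decision = default
    for effect, rule_kind, pattern in rules:
        if rule_kind == kind and fnmatchcase(name, pattern):
            decision = effect == "allow"
    return decision


# Grant a general permission, then revoke one topic.
rules = [
    ("allow", "publish", "/sensors/*"),
    ("deny", "publish", "/sensors/cliff"),
]
print(is_allowed(rules, "publish", "/sensors/lidar"))  # True
print(is_allowed(rules, "publish", "/sensors/cliff"))  # False
```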
ROS 2 Node Sandboxing
The objective is to extend roslaunch with a syntax allowing nodes to be sandboxed using various methods. We are looking at implementing a policy relying on Docker containers as an extension to ROS Launch.
- Might need to bootstrap ROS launch for running in a container
- Nodes might be looking for configs at specific locations
- How to handle physical devices attached to robots
Generic Node Interface Description
- Dynamic behavior is hard to cover.
- How to define an interface if nodes are spawned on the fly?
- Maybe grouping nodes together is a way to simplify the problem?
- Useful to populate ROS 1 like node API documentation pages
- Where to put the info?
- Probably need to be installed (distributed as part of the Debian package)
- One single repository for all nodes?
- Are policies needed if you have good node interface?
- Can generate policy through your build system.
- What about having the IDL as the input and consuming it from the C++ side?
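To illustrate generating a policy from a node interface description, the sketch below turns a list of published and subscribed topics into a policy fragment. The element and attribute names (profile, topics, topic, ns, node, ALLOW) are assumptions loosely modeled on the access control policy design discussed above, not a verified schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def policy_from_interface(node_name, publishes, subscribes):
    """Return an XML policy fragment granting exactly the declared topics."""
    profile = ET.Element("profile", {"ns": "/", "node": node_name})
    for access, topics in (("publish", publishes), ("subscribe", subscribes)):
        if not topics:
            continue
        group = ET.SubElement(profile, "topics", {access: "ALLOW"})
        for topic in sorted(topics):
            ET.SubElement(group, "topic").text = topic
    return ET.tostring(profile, encoding="unicode")


fragment = policy_from_interface("talker", ["chatter"], [])
```

A build-system hook could run a generator like this over each package's interface description, which is one answer to whether policies need to be written by hand once good interface descriptions exist.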