Errors when using pr2_computer_monitor

Error: CPU Temperature Error/Bad Reading

The diagnostics show an error under the "CPU Temperature" heading and report a bad reading. This is probably because the commands that cpu_monitor.py uses to check the computer aren't working.

Solution:

  • Check the pr2_computer_monitor page for the implementation details, including the commands that cpu_monitor.py runs. You should be able to run each of those commands from a terminal without an error.

NTP Offset Error

If you see an error under "NTP offset from <hostname> to <server>", check whether <server> is reachable using ntpdate -q:

ntpdate -q my_server.my_server.com

If this command returns an error, ntp_monitor.py will fail to update. Correct the server name and try again.

ntp_monitor.py also checks the offset for a computer against itself. Check that this command works:

ntpdate -q MY_HOSTNAME

If this command fails, restarting chrony may fix it:

sudo service chrony restart
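
The ntpdate checks above can be wrapped in a small shell function. This is only a sketch under a few assumptions: check_ntp_servers is a hypothetical helper name, not part of pr2_computer_monitor, and my_server.my_server.com is the placeholder server from the example above. It reports which servers answer ntpdate -q; it does not reproduce how ntp_monitor.py computes offsets.

```shell
# Hypothetical helper: query each NTP server and report which ones respond.
# Returns 0 if every server answered, 1 if at least one query failed.
check_ntp_servers() {
    failed=0
    for server in "$@"; do
        # ntpdate -q only queries the server; it does not set the clock.
        if ntpdate -q "$server" >/dev/null 2>&1; then
            echo "OK: $server"
        else
            echo "FAILED: $server"
            failed=1
        fi
    done
    return "$failed"
}

# Example: check the remote server and this machine's own hostname.
# check_ntp_servers my_server.my_server.com "$(hostname)"
```

A FAILED line for the remote server points to a wrong server name or an unreachable server; a FAILED line for the local hostname points to the local time daemon.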

CPU monitor reports "CPU Fan Off"

In the diagnostics data, you will see the fan speeds listed for each CPU, and the "MB Fan RPM" data. If any of these fans is off, the PR2 computers will overheat and lose performance. This is a serious problem for the PR2.

CPU monitor reports "Restarting temperature check thread"

The CPU monitor checks core temperatures using ipmitool. Occasionally a call to ipmitool fails and hangs the thread that checks temperatures. When this happens, the CPU monitor restarts the temperature checking thread and publishes a warning to the console:

Restarting temperature check thread in cpu_monitor. This should not happen

This problem is explained in wg-ros-pkg ticket 4171.
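
To check by hand whether ipmitool is hanging, a call can be run under a time limit. This is a rough sketch, not what cpu_monitor.py does internally: run_with_limit is a hypothetical helper, it relies on the coreutils timeout command (which exits with status 124 when the limit is reached), and the ipmitool sdr invocation in the usage comment is an assumption. See the pr2_computer_monitor page for the exact command the monitor runs.

```shell
# Hypothetical helper: run a command with a time limit (in seconds)
# and report whether it succeeded, failed, or hung.
run_with_limit() {
    limit="$1"
    shift
    status=0
    # coreutils timeout kills the command and exits 124 if the limit is hit.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case "$status" in
        0)   echo "ok" ;;
        124) echo "timed out after ${limit}s" ;;
        *)   echo "failed with status $status" ;;
    esac
    return "$status"
}

# Example (assumed subcommand; may need sudo on the robot):
# run_with_limit 10 ipmitool sdr
```

A "timed out" result is consistent with the hang described above, while a "failed" result suggests ipmitool is returning an error rather than blocking.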

Wiki: pr2_computer_monitor/Troubleshooting (last edited 2011-09-22 20:26:45 by KevinWatts)