Performance Monitoring
PLEM's monitoring system publishes performance data for the 1kHz control loop over ROS2 topics. Its wait-free design never blocks the RT loop.
Key Topics
| Topic | Rate | QoS | Description |
|---|---|---|---|
| /rt_raw | 1kHz | BestEffort | Timing, joint state, control values |
| /rt_events | On event | ReliableTransient | Faults, mode changes, safety triggers |
| /rt_monitor_stats | 10Hz | Reliable | Queue status, overflow counts |
When robot_id is set, every topic is automatically namespaced under it, for example /{robot_id}/rt_raw.
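The namespacing rule can be sketched as a small shell helper. `rt_topic` is a hypothetical name used only for illustration and is not part of PLEM; it simply prepends the robot_id when one is given.

```shell
# Hypothetical helper (not part of PLEM): build a topic name following
# the /{robot_id}/<topic> pattern described above.
#   $1 = base topic name without leading slash (e.g. rt_raw)
#   $2 = optional robot_id (e.g. arm_left)
rt_topic() {
    topic=$1
    robot_id=${2:-}
    if [ -n "$robot_id" ]; then
        printf '/%s/%s\n' "$robot_id" "$topic"
    else
        printf '/%s\n' "$topic"
    fi
}
```

For example, `ros2 topic echo "$(rt_topic rt_raw arm_left)" --field loop_jitter_us` would subscribe to /arm_left/rt_raw.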
RtSample Key Fields
| Field | Type | Description |
|---|---|---|
| loop_exec_us | float | Loop processing time [µs] |
| loop_jitter_us | float | Deviation from target cycle (1000µs) [µs] |
| deadline_miss | uint8 | Deadline miss flag (0/1) |
| actual_pos[0..5] | float | Joint position [rad] |
| actual_vel[0..5] | float | Joint velocity [rad/s] |
| actual_torque[0..5] | float | Joint torque [Nm] |
| cmd_torque[0..5] | float | Command torque [Nm] |
Full field list: ros2 topic echo /rt_raw --once
Timing Metrics
- loop_exec_us: Processing time from loop wakeup to completion. Normal: 30-60 µs
- loop_jitter_us: actual cycle - target cycle. Positive means late wakeup (kernel scheduling delay). Normal: 5-15 µs
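To make the jitter definition concrete, the sketch below applies "actual cycle - target cycle" to a list of wakeup timestamps. It only illustrates the formula and is not how PLEM computes the field internally; `jitter_from_wakeups` and its input format (one timestamp in µs per line) are assumptions made for this example.

```shell
# Hypothetical helper: reads loop wakeup timestamps [µs], one per line,
# and prints the jitter of each cycle against the 1000 µs target.
jitter_from_wakeups() {
    awk -v target_us=1000 '
        NR > 1 { print ($1 - prev) - target_us }   # actual cycle - target cycle
        { prev = $1 }                              # remember this wakeup for the next line
    '
}
```

With wakeups at 0, 1012 and 2007 µs, the first cycle took 1012 µs (jitter +12, a late wakeup) and the second took 995 µs (jitter -5, an early wakeup).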
PlotJuggler Visualization
# Install
sudo apt install ros-humble-plotjuggler-ros
# Run
ros2 run plotjuggler plotjuggler
Setup:
- Streaming → Start: ROS2 Topic Subscriber → Check /rt_raw → OK
- Drag fields from left panel to plot:
  - loop_jitter_us (timing stability)
  - loop_exec_us (computational load)
- Set time window to 10 seconds
Threshold Display: Right-click plot → Add Custom Series for warning line (50µs yellow) and error line (200µs red).
Save Layout: File → Save Layout As... → plem_rt_monitor.xml
Performance Criteria and Thresholds
Immediate action is required when a metric enters its Error range. Sustained Warning conditions can also affect system stability.
| Metric | Healthy | Warning | Error | Action |
|---|---|---|---|---|
| loop_jitter_us | < 15 µs | 50-100 µs | > 100 µs | See diagnostic patterns below |
| loop_exec_us | < 60 µs | 80-100 µs | > 100 µs | Review control parameters |
| deadline_miss | 0 | > 0 | > 10/sec | Investigate immediately |
| rt_overflow_delta | 0 | > 0 | > 100/sec | Consider increasing queue size |
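As a rough sketch, the loop_jitter_us row of the table can be applied to a stream of values with awk. `classify_jitter` is a hypothetical helper, not a PLEM tool. Two assumptions are worth noting: values between 15 and 50 µs fall outside every range in the table, so the sketch reports them as "unspecified" rather than guessing, and negative values (early wakeups) are compared as-is.

```shell
# Hypothetical helper: label each loop_jitter_us value read from stdin
# using the thresholds in the table above.
classify_jitter() {
    awk '
        $1 == "---" || NF == 0 { next }            # skip message separators and blank lines
        {
            j = $1 + 0
            if (j > 100)      level = "error"
            else if (j >= 50) level = "warning"
            else if (j < 15)  level = "healthy"
            else              level = "unspecified" # 15-50 µs is not covered by the table
            print j, level
        }
    '
}
```

It could be fed from the live topic, for example `ros2 topic echo /rt_raw --field loop_jitter_us | classify_jitter`.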
Performance Degradation Diagnosis
High Jitter Patterns
| Pattern | Symptoms | Possible Cause | Solution |
|---|---|---|---|
| Periodic spikes | 180µs spikes every 5 seconds | Kernel background tasks (RCU, kworker) | Check CPU isolation and kernel thread affinity |
| Gradual increase | 10→80µs over time | Memory pressure (heap fragmentation, swapping) | Verify memory locking settings |
| Random large spikes | Irregular 843µs, 1205µs, etc. | CPU not isolated or RT priority not set | Check RT priority and CPU isolation |
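One way to tell periodic spikes from random ones is to look at the spacing between them. The sketch below prints the number of samples between consecutive values above a threshold; `spike_gaps` is a hypothetical helper, and the default threshold of 100 µs is an assumption taken from the Error column above. At 1kHz one sample is roughly one millisecond, so spikes every 5 seconds would show up as gaps near 5000, while random spikes produce irregular gaps.

```shell
# Hypothetical helper: print the sample distance between consecutive
# jitter spikes read from stdin.
#   $1 = spike threshold in µs (assumed default: 100)
spike_gaps() {
    awk -v thr="${1:-100}" '
        $1 == "---" || NF == 0 { next }   # skip message separators and blank lines
        { n++ }                           # n = index of the current sample
        $1 + 0 > thr {
            if (last) print n - last      # samples since the previous spike
            last = n
        }
    '
}
```

Usage: `ros2 topic echo /rt_raw --field loop_jitter_us | spike_gaps 100`. If samples are dropped because the queue overflowed, the gaps underestimate the real time between spikes.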
Slow Execution Time Patterns
| Pattern | Symptoms | Possible Cause | Solution |
|---|---|---|---|
| Sustained high execution | Continuously > 80µs | Expensive control computation | Review control parameters, check optimization options |
| Sudden jump | Step change 52µs → 153µs | Mode change (entering TRAJECTORY) | Expected behavior. Verify exec_us stays < 100µs |
Diagnostic Commands
# Monitor jitter patterns
ros2 topic echo /rt_raw --field loop_jitter_us
# Average execution time (1000 samples; skips the "---" separators between messages)
ros2 topic echo /rt_raw --field loop_exec_us | \
awk '$1 != "---" {sum+=$1; count++} count==1000 {print "Average: " sum/count " µs"; exit}'
# Alert on high jitter
ros2 topic echo /rt_raw --field loop_jitter_us | \
awk '$1 > 50 {print "Warning: jitter " $1 " µs"}'
# Correlate with events
ros2 topic echo /rt_events
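The deadline_miss criterion is expressed per second, so a per-window count is more useful than the raw 0/1 flag. The following is a minimal sketch, assuming one flag value per line on stdin; `miss_rate` is a hypothetical helper, and its default window of 1000 samples approximates one second only while /rt_raw really arrives at 1kHz.

```shell
# Hypothetical helper: count deadline misses per window of samples.
#   $1 = window length in samples (assumed default: 1000, about 1 s at 1kHz)
miss_rate() {
    awk -v window="${1:-1000}" '
        $1 == "---" || NF == 0 { next }               # skip message separators and blank lines
        { misses += ($1 != 0); n++ }                  # count non-zero flags
        n == window { print misses; misses = 0; n = 0 }
    '
}
```

Usage: `ros2 topic echo /rt_raw --field deadline_miss | miss_rate`. Each printed number is the miss count for one window, which can be compared against the > 10/sec Error criterion.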
Queue Management
Data transfer from RT loop to ROS2 topics uses wait-free queues. When the queue is full, new samples are dropped.
# Monitor overflow (0 is normal)
ros2 topic echo /rt_monitor_stats --field rt_overflow_delta
| Scenario | Queue Size | Rationale |
|---|---|---|
| Development (default) | 8192 | 8-second buffer, handles debugger pauses |
| Production | 4096 | 4-second buffer, lower memory usage |
| Rosbag Recording | 16384 | 16-second buffer for disk I/O bursts |
We recommend using the default (8192) unless RAM is severely constrained.
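The buffer durations in the table follow from dividing the queue size by the sample rate. The sketch below does that arithmetic for any size; `buffer_seconds` is a hypothetical helper, and it assumes one queue slot per sample at the 1kHz loop rate.

```shell
# Hypothetical helper: how many seconds of samples a queue can hold.
#   $1 = queue size in samples
#   $2 = sample rate in Hz (assumed default: 1000)
buffer_seconds() {
    awk -v size="$1" -v rate="${2:-1000}" 'BEGIN { printf "%.3f\n", size / rate }'
}
```

For example, `buffer_seconds 8192` prints 8.192, which the table rounds to an 8-second buffer.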
Quick Reference
# RT performance
ros2 topic hz /rt_raw # Publishing rate (should be ~1kHz)
ros2 topic echo /rt_raw --field loop_jitter_us
ros2 topic echo /rt_raw --field loop_exec_us
ros2 topic echo /rt_monitor_stats
# Event monitoring
ros2 topic echo /rt_events
# Multi-robot environment
ros2 topic echo /arm_left/rt_raw --field loop_jitter_us
Next Steps:
- Set up PlotJuggler with recommended layout
- Verify baseline metrics for your application
- Use diagnostic pattern tables and commands to analyze root causes when issues occur