System Health Monitoring
CPU-Intensive Services

Unhealthy Devices with Failure Codes

SONiC Docker Transitions

Automatic IP Detection, Alerting and Rediscovery
Additionally, the IP Transition Widget lets operators track every IP change a device has undergone over a specified period and see whether any of those addresses conflicted with another IP in the monitored network. For further visibility, the ONES Rule Engine can generate an alert on any management IP change, so operators always know when an address has changed and can respond quickly to keep operations running smoothly.
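The tracking and conflict check described above can be pictured with a minimal Python sketch. This is not the ONES implementation or its API: the `IpTransition` record, the `transitions_in_window` and `find_conflicts` helpers, and all device names and addresses are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IpTransition:
    """One management IP change for a device (hypothetical record shape)."""
    device: str
    old_ip: str
    new_ip: str
    at: datetime


def transitions_in_window(log, start, end):
    """Return the transitions whose timestamp falls within [start, end]."""
    return [t for t in log if start <= t.at <= end]


def find_conflicts(transition, current_ips):
    """Return other devices already on record with the IP a device moved to.

    current_ips maps device name -> management IP currently on record.
    """
    return sorted(
        name for name, ip in current_ips.items()
        if ip == transition.new_ip and name != transition.device
    )


# Made-up sample data, not taken from any real deployment.
log = [
    IpTransition("leaf-1", "10.0.0.11", "10.0.0.21", datetime(2025, 1, 5, 9, 0)),
    IpTransition("leaf-2", "10.0.0.12", "10.0.0.13", datetime(2025, 1, 6, 14, 30)),
    IpTransition("leaf-1", "10.0.0.21", "10.0.0.13", datetime(2025, 2, 1, 8, 15)),
]
current_ips = {"leaf-1": "10.0.0.13", "leaf-2": "10.0.0.13", "spine-1": "10.0.0.1"}

# Changes recorded during January 2025.
recent = transitions_in_window(log, datetime(2025, 1, 1), datetime(2025, 1, 31))
# Devices that share the address leaf-1 most recently moved to.
conflicts = find_conflicts(log[2], current_ips)
```

With this sample data, `recent` holds the two January changes and `conflicts` names `leaf-2`, the other device recorded with the same address.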

Rule Engine: Enhanced Alerts
The ONES Rule Engine has emerged as a preferred tool for automating network monitoring. Operators configure custom rules with their own threshold levels for various parameters, and when a defined condition is met, the system automatically generates an alert, enabling real-time, proactive responses to potential issues. The new metrics listed below provide deeper insight into network performance and more precise control over it, supporting smoother operations and quicker issue resolution.
1. Docker CPU/Memory Utilization – Enables container-level monitoring of CPU and memory usage, alerting operators if utilization exceeds set limits.
2. Docker Down Status – Triggers an alert if any Docker container goes down, ensuring immediate awareness of service disruptions.
3. Unhealthy Devices – Identifies devices that experience failures and alerts operators to them.
4. IP Transition Monitoring – Tracks IP changes in devices and generates alerts when IP transitions occur.
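As a rough illustration of threshold-based rules like those above, the following Python sketch checks one metrics sample against a set of rules and collects an alert for each condition that is met. It is a simplified assumption of how such a check could work, not the ONES Rule Engine itself: the `Rule` class, the `evaluate` function, the metric names, and the thresholds are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "==": operator.eq}


@dataclass(frozen=True)
class Rule:
    """A single threshold rule (hypothetical shape, for illustration only)."""
    name: str
    metric: str
    op: str
    threshold: float


def evaluate(rules, sample):
    """Return an alert message for each rule whose condition the sample meets.

    Metrics that are missing from the sample are skipped.
    """
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue
        if OPERATORS[rule.op](value, rule.threshold):
            alerts.append(f"{rule.name}: {rule.metric}={value} {rule.op} {rule.threshold}")
    return alerts


# Made-up metric names and limits; real thresholds are operator-defined.
rules = [
    Rule("Docker CPU high", "docker_cpu_percent", ">", 80.0),
    Rule("Docker memory high", "docker_mem_percent", ">", 90.0),
    Rule("Docker down", "docker_running", "==", 0),
    Rule("IP transition", "ip_changes", ">", 0),
]
sample = {
    "docker_cpu_percent": 93.5,
    "docker_mem_percent": 41.0,
    "docker_running": 1,
    "ip_changes": 1,
}

alerts = evaluate(rules, sample)
for alert in alerts:
    print(alert)
```

Keeping each rule as plain data (metric, operator, threshold) is one way to let operators add or tune rules without changing the evaluation logic.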

ONES 3.1 takes SONiC network monitoring and troubleshooting to the next level with powerful enhancements like real-time failure detection, automated IP rediscovery, detailed system health insights, and advanced alerting.
Ready to see ONES 3.1 in action? Book a demo today and experience how it can transform your network management with smarter automation and deeper insights.
FAQs
1. How does ONES 3.1 improve SONiC infrastructure monitoring?
ONES 3.1 enhances SONiC observability by offering real-time visibility into system health, including CPU-intensive services, Docker container transitions, and device-level failures. This allows IT teams to proactively detect, investigate, and resolve issues faster than before.
2. What are the benefits of real-time alerts for Docker container failures in SONiC?
The new Docker Down Status alerts in ONES 3.1 notify operators immediately when SONiC containers fail, so service disruptions are caught and addressed before they escalate. This minimizes downtime and improves operational resilience.
3. Can ONES 3.1 detect and respond to SONiC device IP changes automatically?
Yes. ONES 3.1 introduces automatic IP rediscovery that detects management IP changes and re-registers the switch seamlessly, ensuring uninterrupted telemetry and real-time monitoring without manual intervention.
4. How does ONES 3.1 help in identifying the root cause of high CPU usage in SONiC?
ONES 3.1 provides granular visibility into the top 10 CPU-consuming services and shows memory usage per process. This helps pinpoint which process, such as syncd, redis, or dockerd, is behind a performance spike and allows quick remediation.
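The idea of ranking processes by CPU share can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ONES collects or ranks data: the `top_cpu_processes` helper and every value in the sample list are made up.

```python
import heapq


def top_cpu_processes(processes, n=10):
    """Return the n processes with the highest CPU share, highest first."""
    return heapq.nlargest(n, processes, key=lambda p: p["cpu_percent"])


# Hypothetical per-process samples; the values are made up for illustration.
processes = [
    {"name": "syncd", "cpu_percent": 42.0, "mem_mb": 310},
    {"name": "redis-server", "cpu_percent": 18.5, "mem_mb": 120},
    {"name": "dockerd", "cpu_percent": 7.2, "mem_mb": 95},
    {"name": "orchagent", "cpu_percent": 25.1, "mem_mb": 140},
    {"name": "bgpd", "cpu_percent": 3.4, "mem_mb": 60},
]

# Show the three heaviest processes together with their memory usage.
top = top_cpu_processes(processes, n=3)
for proc in top:
    print(f'{proc["name"]:<14} cpu={proc["cpu_percent"]:>5.1f}%  mem={proc["mem_mb"]} MB')
```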
5. What types of SONiC infrastructure anomalies can the ONES Rule Engine detect?
The ONES Rule Engine can detect and alert on:
- CPU/memory overuse by Docker containers
- Docker container downtime
- Hardware or service failures in devices
- Real-time management IP changes
This enables a proactive, rule-based monitoring strategy tailored to each network’s performance needs.
6. How does enhanced network observability help diagnose SONiC switch issues faster?
Advanced observability tools in ONES 3.1 help operators:
- Spot unhealthy devices and see precise failure codes instantly
- Visualize Docker container status changes over time
- Correlate CPU spikes with top resource-heavy processes
- Respond quickly using direct SSH access from widgets
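To make the idea of container status changes over time concrete, here is a small Python sketch that turns periodic status samples into a list of transitions. It assumes samples arrive as `(timestamp, container, status)` tuples in time order; that shape, the `status_transitions` function, and the sample values are assumptions for illustration and do not describe ONES internals.

```python
def status_transitions(samples):
    """Return (timestamp, container, old_status, new_status) for each change.

    samples must be (timestamp, container, status) tuples in time order.
    The first sample for a container sets its baseline and is not a change.
    """
    last_status = {}
    transitions = []
    for timestamp, container, status in samples:
        previous = last_status.get(container)
        if previous is not None and previous != status:
            transitions.append((timestamp, container, previous, status))
        last_status[container] = status
    return transitions


# Made-up polling results for two containers.
samples = [
    ("09:00", "swss", "running"),
    ("09:00", "bgp", "running"),
    ("09:05", "swss", "exited"),
    ("09:05", "bgp", "running"),
    ("09:10", "swss", "running"),
]

transitions = status_transitions(samples)
for timestamp, container, old, new in transitions:
    print(f"{timestamp} {container}: {old} -> {new}")
```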
7. Why is automatic IP rediscovery important for large-scale SONiC deployments?
Automatic IP rediscovery ensures:
- Continuous real-time telemetry even if IPs change during maintenance
- Zero manual reconfiguration for IP updates
- Faster troubleshooting for re-addressed switches
- Reduced risk of monitoring gaps in dynamic environments
8. How does the AI network assistant simplify SONiC troubleshooting?
A conversational AI assistant can:
- Answer plain-language queries about device health and logs
- Summarize Docker transitions and failure alerts in seconds
- Suggest root cause hints based on system metrics
- Minimize CLI reliance, making diagnostics faster for all skill levels
9. What are the benefits of container-level CPU and memory monitoring for SONiC?
Container-level insights help operators:
- Identify which SONiC service (like syncd or redis) is overloading resources
- Set rule-based alerts when usage crosses safe thresholds
- Optimize system performance proactively
- Prevent unexpected container crashes due to resource exhaustion
10. How does the enhanced Rule Engine improve proactive network monitoring?
The upgraded Rule Engine enables:
- Custom alert rules for Docker, hardware failures, and IP changes
- One-click rule activation for fast deployment
- Detailed summaries of active rules for audit and tuning
- Real-time anomaly detection that cuts downtime and improves SONiC resilience