ONES 3.1 Boosts SONiC Support: Key Enhancements for Smarter Infrastructure Troubleshooting

March 12, 2025

In today’s fast-moving digital world, maintaining a stable and well-monitored infrastructure is crucial. The latest release, ONES 3.1, introduces key updates, including enhanced support for SONiC (Software for Open Networking in the Cloud). These enhancements boost visibility, automate critical processes, and strengthen system health monitoring. The improved SONiC support streamlines issue detection and response, which helps optimize performance and minimize downtime. IT teams can focus on strategic tasks, knowing their infrastructure is continuously monitored.

Stay ahead of issues and ensure smooth operations with ONES 3.1.

System Health Monitoring

CPU-Intensive Services

Previously, identifying resource-heavy processes was challenging due to the lack of granular insights in system-wide CPU and memory metrics. Often, system-level data shows a spike in CPU usage without providing a quick way to pinpoint the cause. To address this challenge, ONES now provides detailed reports on the top 10 CPU-consuming services running on the host, along with their memory usage. This helps users easily identify high-impact processes like redis-server, agent, syncd, and dockerd. With this level of detail, users can diagnose performance issues more quickly, optimize system resources, and prevent potential bottlenecks, resulting in greater system efficiency.
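To make the idea concrete, here is a minimal Python sketch of a "top N by CPU" report. It is not ONES code: the `ServiceSample` type, the `top_cpu_services` helper, and the sample numbers are all hypothetical and exist only to show how per-service CPU and memory readings can be ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSample:
    """One reading for a service (hypothetical shape, not an ONES type)."""
    name: str
    cpu_percent: float
    mem_percent: float


def top_cpu_services(samples, limit=10):
    """Return up to `limit` samples, ordered by CPU usage, highest first."""
    return sorted(samples, key=lambda s: s.cpu_percent, reverse=True)[:limit]


# Illustrative values only; real readings would come from the switch.
samples = [
    ServiceSample("redis-server", 12.5, 3.1),
    ServiceSample("syncd", 41.0, 8.4),
    ServiceSample("dockerd", 6.2, 2.0),
    ServiceSample("agent", 18.3, 1.2),
]

top = top_cpu_services(samples, limit=3)
for s in top:
    print(f"{s.name:<14}{s.cpu_percent:>6.1f}% CPU {s.mem_percent:>6.1f}% MEM")
```

Showing memory next to CPU in the same row is what lets an operator tell a busy process from one that is also leaking memory.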

Unhealthy Devices with Failure Codes

ONES 3.1 introduces a new feature that highlights unhealthy devices, offering real-time failure detection for hardware (e.g., PSU, fan failures, LED alarms), software services, key processes, and containers. When a failure is detected, the device is marked as unhealthy, and the failure details are available in the UI. This streamlined view helps operators quickly identify and resolve issues, simplifying troubleshooting. Notifications also appear in the topology view and on the health summary page.
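The marking logic can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the `DeviceHealth` class, the `assess` helper, the component names, and the failure code strings are hypothetical and do not reflect the actual ONES data model or its failure codes.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceHealth:
    """Health record for one device (hypothetical shape)."""
    hostname: str
    failures: list = field(default_factory=list)  # (component, failure_code)

    @property
    def healthy(self) -> bool:
        # A device is healthy only when no check has failed.
        return not self.failures


def assess(hostname, checks):
    """Build a DeviceHealth from check results.

    `checks` maps a component name to an (ok, failure_code) pair.
    """
    health = DeviceHealth(hostname)
    for component, (ok, code) in checks.items():
        if not ok:
            health.failures.append((component, code))
    return health


# Illustrative results; the failure codes are made up for this example.
checks = {
    "psu1": (True, None),
    "fan3": (False, "FAN_FAILURE"),
    "container:syncd": (False, "CONTAINER_DOWN"),
}
device = assess("leaf-01", checks)
status = "healthy" if device.healthy else "unhealthy"
print(device.hostname, status, device.failures)
```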

SONiC Docker Transitions

Docker containers are the backbone of the SONiC operating system, and ensuring their stable operation is crucial for switch performance. Previously, tracking container state changes, such as shifts from “up” to “down,” was difficult and time-consuming. Operators often struggled to detect these changes in real time, which delayed the response to service disruptions and left some issues unnoticed. ONES 3.1 introduces a new widget that visually highlights Docker container state transitions, allowing operators to quickly spot changes and respond to disruptions. The widget provides a “Connect” button for direct SSH access to the switch, enabling swift action when needed. It also offers a timeframe selection feature, allowing operators to view container state changes over a specified period.
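As a rough illustration of what "state transitions over a timeframe" means, the Python sketch below walks a list of periodic container status samples and reports each change. The `find_transitions` function and the sample data are hypothetical; the sketch assumes samples arrive sorted by time and says nothing about how ONES collects or stores them.

```python
from datetime import datetime


def find_transitions(samples, start=None, end=None):
    """Return (timestamp, container, old_state, new_state) for each change.

    `samples` is a time-sorted list of (timestamp, container, state).
    `start` and `end` optionally restrict which transitions are reported.
    """
    last_state = {}
    transitions = []
    for ts, container, state in samples:
        prev = last_state.get(container)
        in_window = (start is None or ts >= start) and (end is None or ts <= end)
        if prev is not None and prev != state and in_window:
            transitions.append((ts, container, prev, state))
        # Track state even outside the window so later changes compare correctly.
        last_state[container] = state
    return transitions


# Illustrative samples for two SONiC containers.
samples = [
    (datetime(2025, 3, 12, 10, 0), "swss", "up"),
    (datetime(2025, 3, 12, 10, 0), "bgp", "up"),
    (datetime(2025, 3, 12, 10, 5), "swss", "down"),
    (datetime(2025, 3, 12, 10, 5), "bgp", "up"),
    (datetime(2025, 3, 12, 10, 10), "swss", "up"),
]

all_changes = find_transitions(samples)
recent = find_transitions(samples, start=datetime(2025, 3, 12, 10, 8))
for ts, name, old, new in all_changes:
    print(f"{ts:%H:%M} {name}: {old} -> {new}")
```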

Automatic IP Detection, Alerting and Rediscovery

When a monitored device’s management IP changes, it’s crucial for the monitoring software to update the IP promptly to ensure smooth operations. Previously, detecting and updating a device’s management IP was a manual, time-consuming process, often causing communication breakdowns and delayed issue identification. ONES 3.1 introduces an automatic rediscovery mechanism that instantly detects when a device’s management IP changes and re-registers the switch with the controller. This enhancement eliminates manual intervention, ensuring continuous communication, real-time monitoring, and faster issue resolution, even when devices are reconfigured.

Additionally, the IP Transition widget lets operators track every management IP change a device has undergone over a specified period and see whether a new address conflicted with another IP in the monitored network. To further enhance visibility, operators can configure an alert in the ONES Rule Engine that notifies them of any management IP change, so they are always aware of network modifications and can respond swiftly to maintain seamless operations.
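A simplified Python sketch of the detection and conflict check follows. It assumes each switch has a stable identifier that survives re-addressing, which this post does not specify, and the `detect_ip_changes` helper, device names, and addresses are hypothetical. Only the standard-library `ipaddress` module is used.

```python
import ipaddress


def detect_ip_changes(known, observed):
    """Compare stored management IPs with newly observed ones.

    `known` and `observed` map a stable device ID to an IP string.
    Returns one record per changed device, including any other devices
    that currently hold the same address (a conflict).
    """
    current = {**known, **observed}  # latest view of every device
    changes = []
    for dev, new_ip in observed.items():
        old_ip = known.get(dev)
        if old_ip is None:
            continue  # new device, not an IP change
        new_addr = ipaddress.ip_address(new_ip)
        if ipaddress.ip_address(old_ip) == new_addr:
            continue
        conflicts = [
            other for other, ip in current.items()
            if other != dev and ipaddress.ip_address(ip) == new_addr
        ]
        changes.append(
            {"device": dev, "old": old_ip, "new": new_ip, "conflicts_with": conflicts}
        )
    return changes


# Illustrative inventory and a fresh observation.
known = {"sw1": "10.0.0.1", "sw2": "10.0.0.2", "sw3": "10.0.0.3"}
observed = {"sw1": "10.0.0.11", "sw2": "10.0.0.3"}

changes = detect_ip_changes(known, observed)
for change in changes:
    print(change)
```

Each record in `changes` is the kind of event that could feed both a transition history and an alert rule.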

Rule Engine: Enhanced Alerts

The ONES Rule Engine has emerged as a preferred tool for automating network monitoring, allowing operators to configure custom rules with their own threshold levels for various parameters. When a defined condition is met, the system automatically generates an alert, enabling real-time, proactive responses to potential issues. In ONES 3.1, rules can also alert on CPU and memory overuse by Docker containers, Docker container downtime, hardware or service failures in devices, and management IP changes. These new metrics provide deeper insights and more precise control over network performance, supporting smoother operations and quicker issue resolution.
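Conceptually, a threshold rule is a metric name, a comparison, and a limit. The Python sketch below shows that idea in its simplest form. The `Rule` class, the `evaluate_rules` function, and the metric names are hypothetical and are not the ONES Rule Engine's actual rule format.

```python
import operator
from dataclasses import dataclass

# Supported comparisons for a rule condition.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Rule:
    """A single threshold rule (hypothetical shape)."""
    name: str
    metric: str
    op: str
    threshold: float


def evaluate_rules(rules, metrics):
    """Return an alert message for every rule whose condition is met."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported; nothing to evaluate
        if OPS[rule.op](value, rule.threshold):
            alerts.append(
                f"{rule.name}: {rule.metric}={value} {rule.op} {rule.threshold}"
            )
    return alerts


# Illustrative rules and one metrics snapshot.
rules = [
    Rule("High container CPU", "container_cpu_percent", ">", 80.0),
    Rule("Container down", "containers_down", ">=", 1),
]
metrics = {"container_cpu_percent": 91.5, "containers_down": 0}

alerts = evaluate_rules(rules, metrics)
for alert in alerts:
    print(alert)
```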

ONES 3.1 takes SONiC network monitoring and troubleshooting to the next level with powerful enhancements like real-time failure detection, automated IP rediscovery, detailed system health insights, and advanced alerting.

Ready to see ONES 3.1 in action? Book a demo today and experience how it can transform your network management with smarter automation and deeper insights.

FAQs

1. How does ONES 3.1 improve SONiC infrastructure monitoring?

ONES 3.1 enhances SONiC observability by offering real-time visibility into system health, including CPU-intensive services, Docker container transitions, and device-level failures. This allows IT teams to proactively detect, investigate, and resolve issues faster than before.

2. What are the benefits of real-time alerts for Docker container failures in SONiC?

The new Docker Down Status alerts in ONES 3.1 notify operators immediately when SONiC containers fail, ensuring service disruptions are caught and addressed before they escalate—minimizing downtime and improving operational resilience.

3. Can ONES 3.1 detect and respond to SONiC device IP changes automatically?

Yes. ONES 3.1 introduces automatic IP rediscovery that detects management IP changes and re-registers the switch seamlessly, ensuring uninterrupted telemetry and real-time monitoring without manual intervention.

4. How does ONES 3.1 help in identifying the root cause of high CPU usage in SONiC?

ONES 3.1 provides granular visibility into the top 10 CPU-consuming services, along with the memory usage of each process. This helps pinpoint the root cause behind a performance spike, such as syncd, redis, or dockerd, and allows quick remediation.

5. What types of SONiC infrastructure anomalies can the ONES Rule Engine detect?

The ONES Rule Engine can detect and alert on:

CPU/memory overuse by Docker containers
Docker container downtime
Hardware or service failures in devices
Real-time management IP changes

This enables a proactive, rule-based monitoring strategy tailored to each network’s performance needs.

6. How does enhanced network observability help diagnose SONiC switch issues faster?

Advanced observability tools in ONES 3.1 help operators:

Spot unhealthy devices and see precise failure codes instantly
Visualize Docker container status changes over time
Correlate CPU spikes with top resource-heavy processes
Respond quickly using direct SSH access from widgets

7. Why is automatic IP rediscovery important for large-scale SONiC deployments?

Automatic IP rediscovery ensures:

Continuous real-time telemetry even if IPs change during maintenance
Zero manual reconfiguration for IP updates
Faster troubleshooting for re-addressed switches
Reduced risk of monitoring gaps in dynamic environments

8. How does the AI network assistant simplify SONiC troubleshooting?

A conversational AI assistant can:

Answer plain-language queries about device health and logs
Summarize Docker transitions and failure alerts in seconds
Suggest root cause hints based on system metrics
Minimize CLI reliance, making diagnostics faster for all skill levels

9. What are the benefits of container-level CPU and memory monitoring for SONiC?

Container-level insights help operators:

Identify which SONiC service (like syncd or redis) is overloading resources
Set rule-based alerts when usage crosses safe thresholds
Optimize system performance proactively
Prevent unexpected container crashes due to resource exhaustion

10. How does the enhanced Rule Engine improve proactive network monitoring?

The upgraded Rule Engine enables:

Custom alert rules for Docker, hardware failures, and IP changes
One-click rule activation for fast deployment
Detailed summaries of active rules for audit and tuning
Real-time anomaly detection that cuts downtime and improves SONiC resilience

Anbarasan Ramalingam

Blog Author

Keerthi Chukka

Blog Author
