In dynamic network environments, device management IPs change over time due to various factors such as DHCP leases, network reconfigurations, or failover events. ONES, a gNMI-based telemetry solution for real-time network monitoring, must adapt to these changes to ensure continuous and accurate data collection. This dynamic adaptation enhances network reliability and minimizes disruptions. With ONES 3.1, we have adopted an advanced Automatic IP Rediscovery Mechanism to address this challenge. This eliminates the need for manual intervention by enabling ONES to instantly detect management IP updates and automatically re-register devices with the controller. It offers seamless monitoring, ensuring real-time visibility and uninterrupted telemetry data collection.
Seamless IP Transition with ONES:
Automatic IP Detection: The ONES Agent now monitors and detects any changes in the device’s management IP.
Seamless Re-Registration: Upon detection, the agent automatically updates the configuration and re-registers the device.
Real-Time Tracking with the IP Transition Widget: A dedicated widget in ONES UI logs historical IP transitions, helping operators track changes over time.
Instant Alerts via ONES Rule Engine: Notifications alert administrators when an IP transition occurs, ensuring immediate awareness and action.
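The detect-and-re-register flow described above can be sketched roughly as follows. This is a minimal illustration, not the ONES Agent's actual implementation: the `IpTransitionTracker` class, its `register` callback, and the device name are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class IpTransitionTracker:
    """Tracks one device's management IP (hypothetical helper, not an ONES API)."""
    device_id: str
    register: Callable[[str, str], None]  # stand-in for the controller registration call
    current_ip: Optional[str] = None
    history: List[Tuple[Optional[str], str]] = field(default_factory=list)

    def observe(self, ip: str) -> bool:
        """Record the latest observed IP; return True if a transition occurred."""
        if ip == self.current_ip:
            return False
        self.history.append((self.current_ip, ip))  # kept so transitions can be listed later
        self.current_ip = ip
        self.register(self.device_id, ip)  # re-register under the new address
        return True


registrations = []
tracker = IpTransitionTracker("leaf-01", lambda dev, ip: registrations.append((dev, ip)))
for observed in ["10.0.0.5", "10.0.0.5", "10.0.0.9"]:
    tracker.observe(observed)
```

Only the first observation and the change to `10.0.0.9` trigger a registration; the repeated address is ignored, which is the behaviour an operator would expect from a transition log.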
IP Transition Widget & Alerts:
ONES 3.1 enhances network visibility and responsiveness by providing real-time tracking and instant alerts for management IP transitions. This ensures that network operators can proactively monitor and respond to changes without disruption.
To further streamline operations, a dedicated UI widget has been introduced, which logs historical IP changes, allowing administrators to track, analyze, and troubleshoot patterns over time. This historical insight helps in identifying trends, diagnosing potential issues, and improving overall network stability.
In addition, automated notifications play a crucial role in keeping administrators informed. The system instantly sends alerts whenever a management IP transition occurs, ensuring that teams can take swift corrective actions to maintain seamless device connectivity and uninterrupted telemetry data collection. By eliminating manual tracking and intervention, ONES 3.1 significantly reduces operational overhead while enhancing network resilience.
By removing manual processes, ONES 3.1 enhances operational efficiency, minimizes human errors, and accelerates network recovery during IP transitions. The improved visibility provided by real-time tracking and historical logging enables administrators to monitor IP changes proactively, analyze transition patterns, and troubleshoot potential issues with ease.
Additionally, the feature strengthens network reliability and performance by ensuring that devices remain connected to ONES even when their management IPs change. Automated notifications and alerts further enhance responsiveness, allowing IT teams to react immediately and prevent downtime.
Overall, the IP Transition Feature in ONES 3.1 optimizes network resilience, streamlines workflows, and enhances the accuracy of real-time monitoring, making it an indispensable tool for modern network operations.
With every release, the ONES product stays in tune with customer feedback to refine the user interface (UI), addressing key pain points and improving the overall experience. ONES 3.1 is no exception, bringing UI improvements that make it easier for users to manage their networks. With a single-pane-of-glass approach, it offers clear insights into potential issues along with quick overviews and summaries. The sections below cover the key UI enhancements introduced in this release.
Default Rules with Troubleshooting Hints:
Network operators use the ONES rule engine to monitor their networks for potential issues by setting threshold values for metrics such as CPU utilization or fan failures, triggering alerts when conditions are met. With each release, new metrics are introduced, enhancing the product's monitoring capabilities. In Release 3.1, ONES goes a step further: based on analysis of network deployments across various regions and customer environments, it offers a set of preconfigured rules called Default Rules. These rules come with industry-standard threshold values, so operators can start monitoring their networks simply by enabling them. Additionally, each Default Rule includes a set of troubleshooting steps that appear directly in the alert payload. This helps users understand what actions to take when a specific alert is triggered. The troubleshooting steps include SONiC, FRR, and Linux shell commands, along with recommended physical checks tailored to the issue at hand, providing clear guidance on resolving potential problems swiftly.
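To make the idea of a threshold rule that carries troubleshooting hints in its alert payload concrete, here is a small sketch. The rule schema, field names, and the 90% threshold are assumptions chosen for illustration; they are not the ONES Default Rule format.

```python
from typing import Dict, List, Optional, Union

Rule = Dict[str, Union[str, float, List[str]]]

# Hypothetical rule definition; the threshold and hints are illustrative values only.
CPU_RULE: Rule = {
    "name": "high-cpu-utilization",
    "metric": "cpu_utilization_pct",
    "threshold": 90.0,
    "troubleshooting": [
        "Run `top` in the Linux shell to find the busiest process",
        "Run `show processes cpu` on the SONiC device",
        "Check fans and airflow on the chassis",
    ],
}


def evaluate(rule: Rule, device: str, value: float) -> Optional[dict]:
    """Return an alert payload when the value exceeds the rule threshold, else None."""
    if value <= rule["threshold"]:
        return None
    return {
        "rule": rule["name"],
        "device": device,
        "metric": rule["metric"],
        "value": value,
        "threshold": rule["threshold"],
        # Hints travel with the alert so the recipient sees next steps immediately.
        "troubleshooting": list(rule["troubleshooting"]),
    }


alert = evaluate(CPU_RULE, "leaf-01", 95.0)
```

Keeping the hints inside the payload means any notification channel that renders the alert also renders the suggested next steps.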
Alerting Rules: Preview and Download
With the growing number of default and custom rules, ONES 3.1 introduces a convenient Downloadable Summary feature. This allows users to easily access a complete overview of all rules configured in the system. The summary is available in CSV format and includes key fields such as rule names, threshold levels, Slack notification and ticketing system configurations, and more. This makes it simple for network operators to review and audit their rule configurations at a glance. In ONES 3.1, the Preview Feature has been introduced to provide a quick and detailed view of individual rules directly on the rules page. Below each rule, users can expand a summary that highlights the key configurations of that rule.
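A CSV rule summary of this kind could be produced along the following lines. The column names below are assumptions based on the fields mentioned above (rule name, threshold, Slack and ticketing configuration); the actual export layout in ONES may differ.

```python
import csv
import io
from typing import Dict, List

# Assumed column set; the real ONES export may use different names and more fields.
FIELDS = ["name", "metric", "threshold", "slack_enabled", "ticketing_enabled"]


def rules_to_csv(rules: List[Dict[str, object]]) -> str:
    """Serialize rule dictionaries into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rule in rules:
        # Missing keys become empty cells rather than raising an error.
        writer.writerow({key: rule.get(key, "") for key in FIELDS})
    return buffer.getvalue()


summary = rules_to_csv([
    {"name": "high-cpu", "metric": "cpu_utilization_pct", "threshold": 90,
     "slack_enabled": True, "ticketing_enabled": False},
    {"name": "fan-failure", "metric": "fan_status", "threshold": 1},
])
```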
Traffic Comparison
Often, simply observing traffic data on a single interface may not offer sufficient insights into the overall traffic flow within a device. Understanding the traffic pattern in relation to other interfaces becomes essential for a more comprehensive analysis. ONES 3.1 addresses this need by introducing an Ingress and Egress Traffic Utilization Comparison feature within the device. With this enhancement, users can select up to eight interfaces and perform a comparative analysis of either the Tx (Transmit) or Rx (Receive) utilization of the links. This side-by-side comparison enables operators to detect traffic imbalances, spot potential bottlenecks, and better understand traffic distribution across multiple interfaces, leading to more informed network management decisions.
Comparison of interfaces Rx Utilization
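The comparison described above boils down to expressing each interface's traffic rate as a percentage of its link speed and ranking the results. The sketch below assumes the rates and speeds are already available in bits per second; the function name and input shape are hypothetical, and only the eight-interface limit comes from the text.

```python
from typing import Dict

MAX_INTERFACES = 8  # the feature compares up to eight interfaces at a time


def compare_utilization(rates_bps: Dict[str, int], speeds_bps: Dict[str, int]) -> Dict[str, float]:
    """Return per-interface utilization in percent, highest first."""
    if len(rates_bps) > MAX_INTERFACES:
        raise ValueError(f"select at most {MAX_INTERFACES} interfaces")
    utilization = {
        name: 100.0 * rate / speeds_bps[name] for name, rate in rates_bps.items()
    }
    # Sorting makes the most heavily loaded links stand out first.
    return dict(sorted(utilization.items(), key=lambda item: item[1], reverse=True))


rx_rates = {"Ethernet4": 20_000_000_000, "Ethernet0": 80_000_000_000}
link_speeds = {"Ethernet0": 100_000_000_000, "Ethernet4": 100_000_000_000}
ranked = compare_utilization(rx_rates, link_speeds)
```

A large gap between the first and last entries is the kind of imbalance the side-by-side view is meant to expose.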
Optics Analytics
The Transceiver Widget in the Analytics – Interfaces page of ONES 3.1 offers a comprehensive summary of the various types of transceivers deployed across the managed network. This information is invaluable to customers, allowing them to easily track and identify the count of different transceiver types in use, as well as the various vendor models or manufacturers within their network.
Additionally, the widget includes an Export Data feature, providing more granular details. This exported data contains subsections for each transceiver type and manufacturer, along with information such as the individual transceiver’s serial number and manufacturer date details. This feature helps customers better manage and audit their network hardware inventory, ensuring more efficient network planning and maintenance.
Summary of transceiver inventory
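As an illustration of the kind of summary the widget presents, the following sketch counts transceivers by type and by manufacturer. The record fields and sample values are made up for the example and do not reflect the ONES export schema.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_transceivers(records: List[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count transceivers per type and per manufacturer."""
    by_type = Counter(record["type"] for record in records)
    by_manufacturer = Counter(record["manufacturer"] for record in records)
    return by_type, by_manufacturer


# Hypothetical inventory records; serial numbers and vendors are placeholders.
inventory = [
    {"type": "QSFP28", "manufacturer": "VendorA", "serial": "SN001"},
    {"type": "QSFP28", "manufacturer": "VendorB", "serial": "SN002"},
    {"type": "SFP+", "manufacturer": "VendorA", "serial": "SN003"},
]
type_counts, manufacturer_counts = summarize_transceivers(inventory)
```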
Protocol Enhancements
MC-LAG Visualization
ONES 3.1 introduces a new MC-LAG Filter option on the topology page, allowing users to easily view devices configured with MCLAG (Multi-Chassis Link Aggregation) in their managed networks. By visually displaying the MCLAG configuration, this feature greatly enhances the network view for operators, offering a clearer understanding of redundancy setups and improving overall network visibility and resilience management.
State Transitions
In ONES 3.1, the widgets on the Protocols page have been revamped for better clarity and efficiency. Instead of continuously displaying state data for features like port channels and VXLAN tunnels over a time frame, where minimal changes occur in stable networks, the new design is event-driven. Now, only state transitions are recorded and displayed in tabular form, offering a more concise and actionable view of network changes. Users can still use the previously available timeframe selections (1h, 2h, 4h, 12h, 24h, 1w, and 2w) to filter state transitions within a specific period. Protocol state transitions cover LACP, MCLAG, and VXLAN, among others.
VTEP State transitions
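The difference between storing every state sample and storing only transitions can be shown with a short sketch. It assumes state samples arrive as time-ordered `(timestamp, state)` pairs; the function names are hypothetical and the code is not taken from ONES.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Transition = Tuple[datetime, str, str]  # (when, old state, new state)


def state_transitions(samples: List[Tuple[datetime, str]]) -> List[Transition]:
    """Collapse time-ordered state samples into the points where the state changed."""
    transitions: List[Transition] = []
    previous = None
    for timestamp, state in samples:
        if previous is not None and state != previous:
            transitions.append((timestamp, previous, state))
        previous = state
    return transitions


def within_window(transitions: List[Transition], now: datetime, window: timedelta) -> List[Transition]:
    """Keep only transitions that happened inside the selected timeframe."""
    return [t for t in transitions if now - t[0] <= window]


now = datetime(2025, 1, 1, 12, 0)
samples = [
    (now - timedelta(hours=5), "up"),
    (now - timedelta(hours=3), "down"),
    (now - timedelta(hours=2), "down"),
    (now - timedelta(minutes=30), "up"),
]
events = state_transitions(samples)
last_hour = within_window(events, now, timedelta(hours=1))
```

Four samples collapse to two transitions, and only the most recent one falls inside a one-hour window.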
VLAN Information
A significant addition to ONES 3.1 is the introduction of VLAN data representation on the Protocols page, offering a clear view of configured VLAN information on each switch. VLANs (Virtual Local Area Networks) are a fundamental component of data communication networks, allowing segmentation of network traffic to enhance performance, improve security, and manage broadcast domains more effectively. By logically separating devices within the same physical network, VLANs provide better control over traffic flow, isolating specific segments for efficiency and security purposes.
Navigating to Monitor → Protocols → VLAN gives users a summary count of VLANs across all devices in the managed network. Clicking on a specific device and navigating to its detailed view displays the VLANs configured and their associated ports. If any VLAN is configured with an SVI (Switched Virtual Interface), those details are also displayed, offering users a complete picture of their VLAN setup.
Configured VLAN Information
Conclusion
ONES 3.1 brings a host of UI enhancements designed to improve network management and visibility. Key updates include Default Rules with troubleshooting steps, a Downloadable Summary for quick rule audits, and a Preview Feature for easier rule configuration reviews. The Traffic Comparison tool enables better analysis of interface utilization, while the Transceiver Summary offers detailed insights into network hardware. New features like the MCLAG Filter on the topology page and the event-driven State Transition Data in Protocols provide clearer views of redundancy and protocol changes. Together, these features make ONES 3.1 more intuitive and powerful for network operators.
Ready to see ONES 3.1 in action?
Because smarter troubleshooting, real-time rule insights, and powerful protocol visibility shouldn’t be optional.
In modern high-performance computing and AI-driven workloads, real-time observability is crucial for maximizing performance and preventing failures. ONES delivers a powerful telemetry solution that provides deep insights into compute environments, monitoring NICs, GPUs, CPUs, SSDs, and other critical components. With real-time tracking and seamless multi-vendor compatibility, ONES empowers businesses with proactive performance management, ensuring stability, efficiency, and optimal resource utilization.
Unified Monitoring for Compute, Network Interface Cards, and GPU Performance
NIC Insights:
To ensure seamless data transmission, ONES compute telemetry provides comprehensive visibility into network interface performance. It captures key metrics such as operational and administrative status, MTU size, port speeds, and auto-negotiation settings—helping teams assess interface health and diagnose potential issues.
Additionally, ONES monitors Forward Error Correction (FEC) modes to enhance data reliability and tracks Link Layer Discovery Protocol (LLDP) statistics, including transmitted, received, and discarded frames. This enables better network topology mapping, proactive issue resolution, and improved data integrity analysis, ensuring optimal performance in high-performance computing environments.
GPU and Compute Performance Monitoring:
In GPU-accelerated environments, performance bottlenecks can stem from either the compute infrastructure hosting GPUs or the GPUs themselves. ONES provides comprehensive visibility into both, ensuring optimal efficiency and stability.
Compute Health Monitoring:
ONES tracks critical system-wide parameters, including CPU utilization, memory usage, temperature, and platform metadata. This proactive monitoring helps maintain stable performance and prevents thermal-related issues.
GPU Performance Insights:
Using NVIDIA SMI, ONES collects key GPU metrics such as real-time core temperature, utilization, power consumption, memory allocation, bus ID, and serial number. By continuously monitoring temperature fluctuations and power draw, administrators can proactively mitigate failures, optimize workloads, and maximize GPU efficiency.
Figure 1: GPU Performance Overview – Temperature, Memory, Utilization, and Power
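For readers who want to see what such GPU metrics look like at the source, the sketch below parses one line of `nvidia-smi` CSV output. The choice of query fields is an assumption that mirrors the metrics listed above, the sample line is made up, and how ONES itself invokes NVIDIA SMI is not stated in this article.

```python
from typing import Dict, Union

# Query fields for `nvidia-smi --query-gpu=...`; this particular selection is an assumption.
QUERY_FIELDS = "temperature.gpu,utilization.gpu,power.draw,memory.used,pci.bus_id,serial"
COMMAND = ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"]


def parse_gpu_line(line: str) -> Dict[str, Union[int, float, str]]:
    """Parse one CSV line produced by COMMAND into typed metrics."""
    temp, util, power, mem, bus_id, serial = [v.strip() for v in line.split(",")]
    return {
        "temperature_c": int(temp),
        "utilization_pct": int(util),
        "power_w": float(power),
        "memory_used_mib": int(mem),
        "bus_id": bus_id,
        "serial": serial,
    }


# Made-up sample line in the shape the command prints for a single GPU.
sample = "54, 87, 212.35, 30517, 00000000:3B:00.0, 1320920012345"
gpu = parse_gpu_line(sample)
```

On a host with NVIDIA drivers installed, `COMMAND` could be passed to `subprocess.run` and each output line fed to `parse_gpu_line`; the example parses a fixed string so it runs anywhere.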
CPU and Memory Utilization
Efficient resource allocation is essential for sustaining high-performance computing. ONES continuously monitors CPU load across various intervals and tracks memory usage at both compute and GPU levels to optimize resource distribution and prevent bottlenecks. With real-time system uptime visibility, ONES enables administrators to evaluate long-term reliability, make proactive adjustments, and ensure seamless operations while mitigating unexpected failures.
Figure 2: CPU and Memory Insights – Utilization and Temperature Analysis
Figure 3: GPU Analytics – Utilization, Temperature, and Memory Usage
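On a Linux host, CPU load across intervals and system uptime are exposed through the `/proc/loadavg` and `/proc/uptime` files. The sketch below parses text in those formats; whether ONES reads these files directly is an assumption, and the sample strings are made up.

```python
from typing import Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """Extract the 1, 5, and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return {"1m": one, "5m": five, "15m": fifteen}


def parse_uptime_days(text: str) -> float:
    """Convert the first field of /proc/uptime (seconds since boot) into days."""
    return float(text.split()[0]) / 86400


# Sample contents in the documented formats of the two files.
load = parse_loadavg("0.52 0.41 0.36 2/1234 5678")
uptime_days = parse_uptime_days("172800.00 345600.00")
```

On a real host the two strings would come from `open("/proc/loadavg").read()` and `open("/proc/uptime").read()`.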
Storage and Platform Health
Figure 4: Disk Health and Utilization – Health, Used Percentage, Usage (in MB), and Temperature Metrics
Vendor-Agnostic and Scalable
ONES observability is vendor-agnostic, collecting network metrics through standard Linux interfaces, supporting multiple NIC vendors such as Intel and Mellanox. This flexibility ensures that ONES adapts to your evolving infrastructure as new hardware and network configurations are integrated.
Designed for large-scale deployments, ONES offers scalable monitoring solutions for thousands of system components without overwhelming server resources. While primarily gathering data from Linux servers, it supports multi-vendor environments, enabling seamless data collection from diverse hardware configurations. By centralizing monitoring across servers hosting GPUs, network interfaces, and individual GPUs, ONES ensures comprehensive performance tracking and efficient management.
Conclusion
ONES 3.1 delivers an efficient, flexible, and scalable solution for monitoring critical system components. With in-depth insights into network performance, GPU metrics, server conditions, CPU load, memory usage, and system uptime, it empowers administrators to optimize performance and prevent failures. Its seamless compatibility with diverse hardware vendors and network configurations makes it the ideal choice for complex, multi-vendor environments. Unlock the full potential of your infrastructure with ONES 3.1’s comprehensive observability and performance monitoring.
The latest release of Open Networking Enterprise Suite (ONES) marks a significant milestone in network observability, introducing comprehensive telemetry support for Spectrum-X switches. This update extends the robust monitoring capabilities of ONES to Cumulus Linux, providing deep visibility into network performance, health, and traffic patterns. In today’s rapidly evolving networking landscape, achieving end-to-end visibility is paramount for maintaining optimal network performance and swiftly addressing potential issues. With ONES, Aviz Networks ensures that organizations leveraging Cumulus Linux 5.9, 5.10, and 5.11 can achieve end-to-end network visibility, enabling efficient troubleshooting, enhanced security, and performance optimization.
Why End-to-End Visibility Matters for Cumulus Networks
End-to-end visibility refers to the comprehensive monitoring and analysis of data as it traverses the entire network infrastructure. This holistic perspective is essential for:
Proactive Issue Detection: Identifying and resolving potential problems before they escalate.
Performance Optimization: Ensuring data flows efficiently, minimizing latency and packet loss.
Security Enhancement: Detecting anomalies and potential security threats in real-time.
Informed Decision-Making: Providing actionable insights for network planning and scaling.
Without such visibility, network administrators often find themselves reacting to issues after they impact operations, leading to increased downtime and reduced efficiency.
As modern data centers become increasingly complex, ensuring seamless monitoring across all network components is critical. Lack of visibility can lead to:
Delayed Issue Resolution – Troubleshooting network problems becomes reactive rather than proactive.
Performance Bottlenecks – Poor visibility can result in increased latency, packet loss, and inefficiencies.
Security Risks – Without continuous monitoring, network vulnerabilities may go undetected.
To address these challenges, ONES supports agentless telemetry for Cumulus, delivering real-time insights into device health, interfaces, traffic statistics, and protocol performance.
Comprehensive Integration with Spectrum-X
Agentless Telemetry Collection
ONES supports Cumulus Linux in an agentless manner, leveraging NVUE (NVIDIA User Experience Daemon) and NGINX for telemetry data collection. NVUE exposes telemetry data through REST APIs, and NGINX acts as a web server to serve these API requests. This enables seamless integration and eliminates the need for additional agents.
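As a rough illustration of agentless collection over REST, the sketch below builds (but does not send) an HTTPS request for interface data from a switch. The port `8765` and the `/nvue_v1/` path prefix follow NVIDIA's published NVUE REST API examples as far as I am aware, but treat them, along with the host address and credentials, as assumptions to verify against your Cumulus Linux version; the ONES collector's actual requests are not described in this article.

```python
import base64
import urllib.request


def build_nvue_request(host: str, username: str, password: str,
                       resource: str = "interface") -> urllib.request.Request:
    """Build a GET request for an NVUE REST resource (the URL layout is an assumption)."""
    url = f"https://{host}:8765/nvue_v1/{resource}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")  # HTTP basic authentication
    return request


# Placeholder address from the documentation range and placeholder credentials.
req = build_nvue_request("192.0.2.10", "monitor", "example-password")
```

Sending the request with `urllib.request.urlopen(req)` would additionally need a TLS context suited to the switch's certificate, which is left out here.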
Real-World Insights
Live Dashboard View: Real-time visibility into device performance and health metrics.
RoCE Telemetry: Detailed tracking of PFC packets and queue performance, crucial for optimizing RDMA traffic.
Unified Monitoring Experience: A consistent monitoring platform for both SONiC and Cumulus Linux devices, simplifying network management.
Advanced Rule Engine for Proactive Monitoring
ONES 3.1 integrates an advanced Rule Engine that enhances network management by providing automated alerts and notifications. This feature allows administrators to:
Define Custom Rules for monitoring critical Cumulus device metrics.
Receive Real-Time Alerts via Slack, Zendesk, and other integrations.
AI/ML Topology Visualization
ONES provides comprehensive topology visualization with full support for Cumulus devices. Users can:
Monitor AI/ML Fabric for performance optimization.
Visualize and manage network connections in data center environments.
Benefits of Deploying ONES with Cumulus Devices
Implementing ONES within a Cumulus-powered network infrastructure offers several advantages:
Unified Monitoring Platform: Organisations can now monitor both SONiC and Cumulus devices through a single pane of glass, streamlining operations and reducing complexity.
Enhanced Troubleshooting Capabilities: Detailed telemetry data accelerates the identification and resolution of network issues, minimizing downtime and improving service reliability.
Scalability: ONES is designed to handle the demands of large-scale networks, ensuring that as your infrastructure grows, your monitoring capabilities scale accordingly.
Security and Compliance: Comprehensive monitoring aids in maintaining security postures and ensuring compliance with industry standards by providing visibility into all network activities.
Enhanced Security by detecting anomalies and ensuring compliance.
Optimized Performance through RoCE visibility and advanced traffic analysis.
Conclusion
ONES sets a new standard for network observability, delivering end-to-end visibility for Spectrum-X platforms. With agentless telemetry, extensive metrics coverage, and unified monitoring, it empowers organizations to optimize network performance, security, and operational efficiency.
FAQs
1. What is end-to-end observability in Spectrum-X networks and why is it important?
End-to-end observability refers to the ability to monitor data flow and network health from source to destination across the entire infrastructure. In Spectrum-X environments, this ensures reduced latency, faster troubleshooting, and better performance tuning—especially vital for AI/ML workloads and RDMA (RoCE) traffic.
2. How does ONES enable agentless telemetry for Cumulus Linux-based Spectrum-X switches?
ONES collects telemetry using NVUE (NVIDIA User Experience Daemon) via REST APIs and serves it through NGINX, eliminating the need for extra agents. This streamlines deployment while ensuring real-time visibility into Cumulus devices running versions 5.9, 5.10, and 5.11.
3. Can ONES monitor both SONiC and Cumulus devices from a single dashboard?
Yes. ONES 3.1 offers unified observability across SONiC and Cumulus Linux devices through a single interface—simplifying network monitoring in hybrid, multi-vendor environments and enabling consistent rule-based alerts and insights.
4. How does ONES support RoCE traffic visibility for optimizing GPU clusters?
ONES provides detailed metrics on Priority Flow Control (PFC) and queue-level performance, enabling visibility into RoCE packet flows. This is critical for achieving lossless communication in GPU-driven AI clusters and fine-tuning fabric behavior.
5. What are the key benefits of integrating ONES with NVIDIA Spectrum-X for enterprise networks?
Unified network monitoring across vendors
Real-time alerts with an advanced Rule Engine
Visual topology for AI/ML fabrics
Better compliance through complete traffic visibility
Scalability to support growing data center demands
In today’s interconnected world, a Network Operations (NetOps) Support Framework is crucial for organizations to maintain a robust and reliable network infrastructure. It provides the foundation to manage and optimize network performance, ensure seamless connectivity, and address related issues. In this post, we give an overview of NetOps Support Frameworks, their key components, and their significance in maintaining efficient operations. We also discuss SLAs and their benefits within a NetOps Support Framework.
Components of NetOps Support Frameworks
Let’s quickly glance through a few critical components.
1. Network Monitoring and Management
This component covers:
Real-time monitoring of network devices and traffic
Performance analysis and reporting
Configuration management and compliance
Network inventory and asset management
The next-generation management tools offer extensions for supporting advanced functions that include:
Network Orchestration
Streaming Telemetry
Network Orchestration and Telemetry Streaming work together to enable the automation, control, and visibility of network operations while leveraging real-time telemetry data for enhanced network management and analysis. Let’s understand these functions in detail.
Network Orchestration
This function represents the overall system responsible for orchestrating and automating network operations, including configuration management, service provisioning, and network policies. It includes a core component, the Orchestration Engine, that receives high-level commands and policies and translates them into actionable tasks for network devices. A network device is any physical or virtual element of the network infrastructure, such as a router, switch, firewall, or load balancer.
Telemetry Streaming
This function represents the process of collecting, aggregating, and forwarding real-time network telemetry data to various telemetry consumers for analysis and decision-making purposes. Here, Telemetry Collector acts as an intermediary component responsible for collecting telemetry data from network devices, leveraging protocols like gRPC, NETCONF, or SNMP. Telemetry Consumers refer to the applications, systems, or analytics platforms that consume and analyze network telemetry data. These consumers can include network monitoring tools, data analytics platforms, and machine learning systems.
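The collector-to-consumer relationship described above can be sketched as a simple fan-out. The class and method names are hypothetical, and the transport (gRPC, NETCONF, or SNMP) is deliberately left out so the example stays self-contained.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class TelemetryCollector:
    """Receives telemetry records and forwards each one to every registered consumer."""

    def __init__(self) -> None:
        self._consumers: List[Callable[[Record], None]] = []

    def subscribe(self, consumer: Callable[[Record], None]) -> None:
        """Register a consumer such as a monitoring tool or analytics pipeline."""
        self._consumers.append(consumer)

    def ingest(self, device: str, metric: str, value: float) -> None:
        """Normalize one sample into a record and fan it out to all consumers."""
        record: Record = {"device": device, "metric": metric, "value": value}
        for consumer in self._consumers:
            consumer(record)


dashboard: List[Record] = []
high_values: List[Record] = []

collector = TelemetryCollector()
collector.subscribe(dashboard.append)
# A second consumer that only keeps samples above an illustrative limit of 90.
collector.subscribe(lambda r: high_values.append(r) if r["value"] > 90 else None)

collector.ingest("spine-01", "cpu_utilization_pct", 42.0)
collector.ingest("spine-01", "cpu_utilization_pct", 97.5)
```

Each consumer sees the same stream and decides independently what to do with it, which is what lets monitoring tools and analytics platforms share one collector.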
2. Fault Management and Troubleshooting
This component includes:
Rapid detection and isolation of network issues
Root cause analysis and remediation
Incident management and escalation processes
3. Change Management and Configuration
Control and coordination of network changes
Version control and documentation
Change approval processes and tracking
4. Performance Optimization
Capacity planning and bandwidth management
Quality of Service (QoS) implementation
Traffic engineering and optimization
Proactive network optimization strategies
5. Security and Compliance
Network security monitoring and threat detection
Firewall management and access control
Compliance with industry regulations (for example PCI-DSS, GDPR)
Vulnerability assessment and patch management
Supporting Multi-Vendor NOS and Switch Hardware
In today’s diverse networking landscape, organizations often rely on a mix of network operating systems (NOS) and vendors to meet their specific requirements. However, managing and supporting multi-vendor NOS environments poses unique challenges that can be streamlined with specialized NetOps Support Frameworks. Multi-vendor NOS integration in NetOps Support Frameworks requires an understanding of interoperability challenges and the need for standardized management frameworks. Seamless multi-vendor NOS support depends on vendor-agnostic network monitoring and management, which is needed for:
1. Consolidated monitoring of dashboards for heterogeneous network devices
2. Integration with various NOS APIs for unified device management
3. Leveraging standardized protocols (for example SNMP, NETCONF, RESTful APIs) for device communication
4. Managing and troubleshooting cross-vendor faults:
   a. Correlation of alerts and events from different NOS vendors
   b. Centralized incident management and ticketing system
   c. Collaboration with vendor support teams for issue resolution
5. Change management and configuration:
   a. Standardized configuration templates for different NOS vendors
   b. Integration with configuration management databases (CMDB)
   c. Change tracking and rollback mechanisms for multi-vendor environments
6. Performance optimization and traffic engineering:
   a. Bandwidth allocation and optimization across diverse NOS platforms
   b. QoS implementation for consistent performance across vendors
   c. Traffic engineering strategies for load balancing and optimization
Importance of Service Level Agreements (SLAs)
In network infrastructure support, SLAs define the agreed-upon expectations/responsibilities between service providers, like Aviz Networks, and their customers. These SLAs outline key performance indicators such as service availability, response times, and other parameters.
Therefore, these play a vital role in ensuring that the network meets desired service levels and provides a satisfactory user experience. Let’s deep dive into more details:
KPIs: SLAs outline multiple KPIs such as network availability, packet loss, latency, throughput, and response times. By benchmarking these metrics, SLAs provide a quantifiable means of evaluating the performance of both the network infrastructure and the service provider.
Network Availability: SLAs specify the expected level of network availability, typically expressed as a percentage of uptime over a given period. This metric indicates how often the network should be operational and accessible to users. It also ensures the accountability of a network service provider for maintaining a reliable and continuously available network infrastructure.
Response and Resolution Times: SLAs often include response and resolution time commitments for network incidents or service requests. The response time defines how quickly the service provider should acknowledge and respond to reported issues. The resolution time sets expectations about the time required to restore the network service to its normal functioning state.
Downtime and Maintenance Windows: Another benefit of such agreements is the provision for scheduled maintenance windows during which network services may be unavailable temporarily. By establishing a clear schedule and notifying customers in advance, SLAs help manage expectations and minimize service disruptions.
Escalation Procedures: SLAs outline escalation procedures to follow in case of critical incidents or service disruptions. This ensures that prompt actions are taken to address the issue and involve higher-level support or management, if necessary.
Remedies and Compensation: SLAs include provisions for remedies in the form of service credits, discounts, or other types of compensation to mitigate the impact of service disruptions/failures caused by the service providers.
Reporting and Review: Lastly, these agreements usually include reporting mechanisms to track and communicate network performance against the agreed-upon metrics. Regular performance reports and service reviews enable both parties to assess the network’s performance, identify areas for improvement, and ensure transparency and accountability.
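To make the Network Availability item above concrete, an availability percentage translates directly into a downtime budget for the measurement period. The sketch below works through that arithmetic; the 30-day period and the sample targets are illustrative choices, not values from any particular SLA.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability actually achieved, given the observed downtime in the period."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# A 30-day period has 43,200 minutes, so 99.9% leaves roughly 43.2 minutes of downtime.
budget_three_nines = round(allowed_downtime_minutes(99.9), 2)
budget_four_nines = round(allowed_downtime_minutes(99.99), 2)
achieved = round(measured_availability_pct(downtime_minutes=90), 3)
```

Ninety minutes of downtime in a 30-day period works out to about 99.79% availability, which would miss a 99.9% target; this is the comparison that service credits are typically tied to.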
Benefits of SLAs in NetOps Support
Improved Operational Efficiency:
a. Streamlined management processes for diverse NOS platforms
b. Reduced complexity and overhead associated with managing multiple vendors
c. Centralized visibility and control over the entire network infrastructure
Enhanced Network Resilience and Performance:
a. Rapid fault detection and resolution across different NOS environments
b. Optimal utilization of network resources through unified performance optimization strategies
c. Consistent security measures and compliance enforcement across vendors
Customer Satisfaction and Business Continuity:
a. Adherence to SLAs for ensuring service reliability and customer satisfaction
b. Minimized downtime and faster incident resolution through SLA-driven support processes
c. Risk mitigation associated with multi-vendor environments
ONES from Aviz Networks is a network observability/visibility, orchestration, and assurance solution for network switches running SONiC and vendor-proprietary NOS (Network Operating System).
ONES provides a one-stop solution, right from providing better visibility into your data center networks to extending 24×7 support function for SONiC. It also hosts a powerful analytics engine that provides Proactive, Predictive, and Prescriptive Analysis of common network anomalies and disruptions.
The key capabilities of ONES include:
Purpose-built solution for SONiC deployments
Supports multiple NOS for comprehensive visibility
Orchestration and deep telemetry for observability
24×7 enterprise-grade support options for SONiC
ONES – Value and Beyond
MONITOR
Monitor your entire multi-NOS fabric
Manage inventory of your network devices running any Network OS on Broadcom, Marvell, Nvidia, and other leading ASICs
View topology of the entire fabric across multiple hardware platforms and network operating systems
Monitor traffic, system health, bandwidth utilization, and more between and across devices
ORCHESTRATE
Configure your SONiC fabric with ease
Create and configure CLOS topology for ToR, Leaf, Spine, and Super-spine layers
Apply and validate configurations pre- and post-deployment
Compare running configs against applied configs at any point
Upgrade devices with a single click via ZTP or custom NOS images
SUPPORTABILITY
NetOps Simplified
Proactively track switch CPU/memory consumption, bandwidth, link failures, traffic errors, and more
Instantly connect to individual devices for maintenance and quick troubleshooting
Collaborate across your teams and with our SONiC experts to solve issues more efficiently
Traditional Network Orchestration tools have evolved from just delivering and monitoring network functions for proprietary NOS to designing and building network fabrics in an automated and intent-based approach.
ONES takes the Orchestration journey to the next level—adding capabilities from SONiC NOS across a fleet of multi-vendor and multi-ASIC switches, bringing together capabilities of streaming telemetry, API programmability, network control, intent-based fabric configuration, and SLA assurance for supportability.
Predictive failure/health analytics and capacity planning enable Orchestration tools (like ONES) to provide a seamless adoption journey for SONiC by leveraging historical trends of resource utilization, traffic patterns, logs/events, and derived application/workload performance.
Supportability, a crucial feature of Network Orchestration tools, goes beyond just notifying and alerting. It also enables integration with IT tools and engines to check for anomalies or correlate events using real-time or historical data, offers single-touch management, and in turn simplifies switch and fabric onboarding at scale.
With the rapid adoption of open-source SONiC, ONES has emerged as a one-stop solution for network infrastructure teams. It seamlessly enables orchestration, deep telemetry, and assurance for multi-vendor deployments. Most importantly, the 24×7 SRE support enables them to introduce SONiC in their networks with utmost confidence.
Author: Arakkal Kunju Mohammed Yasser, Director of Engineering, Site Reliability Engineering
FAQs
1. What are the key components of an effective NetOps Support Framework?
An effective NetOps Support Framework typically includes:
Network Monitoring and Management: Real-time traffic monitoring, performance analysis, configuration compliance, and asset inventory.
Fault Management and Troubleshooting: Rapid issue detection, root cause analysis, and escalation workflows.
Change Management: Coordinated control of network changes, version tracking, and change approval systems.
Performance Optimization: Bandwidth management, QoS implementation, and proactive traffic engineering.
Security and Compliance: Threat detection, access control, patch management, and regulatory compliance (e.g., PCI-DSS, GDPR).
These components collectively support resilient, secure, and high-performing network operations.
2. How does ONES simplify NetOps for multi-vendor and multi-NOS environments?
ONES provides a vendor-agnostic platform that unifies visibility, orchestration, and assurance across various switch vendors and network operating systems (e.g., SONiC, Cumulus Linux, Arista EOS, Cisco NX-OS). It enables:
A single-pane-of-glass view across all devices.
Streamlined inventory management and real-time telemetry monitoring.
Support for multi-ASIC environments (Broadcom, Marvell, NVIDIA).
Deep telemetry, configuration drift detection, and simplified switch onboarding.
This allows enterprises to operate mixed environments with confidence and ease.
3. What role do SLAs play in NetOps support and infrastructure resilience?
SLAs (Service Level Agreements) define expectations between providers like Aviz and enterprise customers. They cover metrics like:
Network availability
Packet loss, latency, and throughput
Response and resolution times for incidents
Downtime windows and escalation paths
SLAs ensure accountability, drive operational efficiency, and deliver business continuity by guaranteeing faster incident resolution and minimizing risks in multi-vendor SONiC environments.
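To make the availability metric concrete, here is a small worked calculation that converts an availability target into a downtime budget and back. It is only an illustrative sketch: the function names and the sample targets are assumptions made for this example and are not taken from any Aviz SLA.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime permitted within `period` for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period * (1 - availability_pct / 100)


def measured_availability(downtime: timedelta, period: timedelta) -> float:
    """Availability actually achieved, as a percentage of the period."""
    return 100 * (1 - downtime / period)
```

For instance, a 99.9% target over a 30-day month leaves a budget of roughly 43 minutes of downtime, while a single one-hour outage in that month corresponds to about 99.86% measured availability.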
4. How does ONES integrate telemetry and orchestration for SONiC-based networks?
ONES uses streaming telemetry and intent-based orchestration to manage SONiC-based fabrics. It:
Collects near real-time data for health, traffic, and configuration metrics.
Supports Day 1 and Day 2 operations with automated config validation and topology orchestration (e.g., CLOS).
Integrates with APIs and analytics tools to enable proactive troubleshooting, configuration management, and real-time insights.
This fusion allows SREs to operate with deep observability and automation, improving network efficiency.
5. What are the benefits of predictive analytics in SONiC network operations?
ONES uses proactive, predictive, and prescriptive analytics to detect and prevent network anomalies before they impact operations. It helps teams:
Predict failures based on trends in CPU usage, memory, traffic patterns, and logs.
Plan for capacity upgrades and network scaling.
Reduce downtime with early warnings and automation workflows.
Make data-driven decisions for optimization and future-proofing network infrastructure.
Predictive analytics empowers NetOps teams to shift from reactive to proactive network management.
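As a rough illustration of trend-based prediction, the sketch below fits a straight line to evenly spaced utilization samples and estimates how many sampling intervals remain before a threshold is reached. This is a generic least-squares example, not the analytics engine inside ONES; the function name and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def intervals_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many sampling intervals remain before the fitted trend
    reaches `threshold`.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    value at the latest sample is already at or above the threshold.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept for x = 0, 1, ..., n - 1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    current = intercept + slope * (n - 1)  # fitted value at the latest sample
    if current >= threshold:
        return 0.0
    return (threshold - current) / slope
```

With memory utilization samples of 50, 52, 54, 56, and 58 percent and a 90 percent threshold, the fitted trend grows by 2 points per interval, so the estimate is 16 intervals. Real telemetry is noisier, and a production system would likely account for seasonality and confidence bounds.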
6. How do SLAs improve NetOps efficiency and service reliability?
Service Level Agreements (SLAs) define network performance expectations between providers and customers. They ensure:
Guaranteed network uptime and availability
Defined response and resolution times for network incidents
Proactive performance monitoring with KPIs like latency, packet loss, and throughput
Escalation procedures and service credits in case of SLA breaches
By setting clear performance benchmarks, SLAs improve operational efficiency, customer satisfaction, and business continuity.
7. How does ONES simplify SONiC adoption in network environments?
ONES facilitates SONiC adoption by:
Providing a single platform for monitoring SONiC and other NOS-based networks
Automating configuration and deployment of SONiC switches
Enabling real-time telemetry and deep observability
Offering 24×7 SRE support to resolve SONiC-related issues efficiently
It reduces complexity, speeds up deployment, and enhances the operational stability of SONiC-powered networks.
8. How does predictive analytics help in preventing network failures?
ONES leverages AI-driven predictive failure analytics to:
Detect early warning signs based on historical trends and real-time data
Forecast potential capacity bottlenecks and performance degradation
Automate preventive actions before failures occur
Enhance traffic engineering to optimize resource utilization
This ensures a proactive rather than reactive approach to network maintenance.
9. How does ONES integrate with existing IT and security tools?
ONES supports integration with:
Configuration Management Databases (CMDBs) for centralized configuration tracking
Incident management and ticketing systems for automated troubleshooting
Security frameworks to enforce compliance with regulations like PCI-DSS and GDPR
RESTful APIs and telemetry consumers for seamless interoperability
This makes ONES an enterprise-friendly solution that fits into existing NetOps ecosystems.
10. What makes ONES different from traditional network orchestration tools?
Traditional network orchestration tools focus mainly on basic configuration and monitoring. ONES goes beyond by offering:
Intent-based fabric configuration for automated network design
Streaming telemetry for real-time insights
AI-powered analytics to predict and prevent failures
Multi-vendor and multi-ASIC support for heterogeneous network environments
SLA-driven assurance for better reliability and performance
We at Aviz Networks are dedicated to continuously optimizing processes, strengthening your SONiC network, and bringing the best version of every service to your fingertips.
Moving a step ahead in our journey, we are thrilled to announce the release of ONES 1.3. This latest version is armed with advanced features, specially designed to enhance your orchestration and Day 2 operations experience. Here’s an article by our Fabric Manager Team that deep dives into the remarkable additions and benefits of ONES 1.3.
Redefining Network Redundancy with EVPN Multihoming
EVPN Multihoming (EVPN MH) stands out as our star feature in ONES 1.3. Its cutting-edge technology takes over from the traditional Multi-Chassis Link Aggregation Group (MC-LAG) setup—offering a new paradigm in terms of server redundancy, high availability, load balancing, and scalability. EVPN MH brings in a level of resilience that ensures your SONiC network remains operational even in the face of failures. With its increased bandwidth and scalability, it’s a game changer for organizations aiming to optimize their network infrastructure.
Maximizing Reliability with L2/L3 VXLAN EVPN
While EVPN Multihoming is a critical addition, ONES 1.3 also builds upon the solid foundation laid by its predecessor. Features like L2 VXLAN (Asymmetric IRB) and L3 VXLAN (Symmetric IRB), initially introduced in ONES 1.2, power up your SONiC network fabric with enhanced reliability through mechanisms like MC-LAG and EVPN MH. These features seamlessly integrate into your network architecture to create a robust fabric that ensures your data center operations run smoothly and efficiently.
Deployment Verification Status in ONES Configurations Page
Safeguarding Your Configuration with Backup and Restore
Besides focusing on advanced networking, ONES 1.3 also addresses the critical need for configuration management. Its new Config Backup and Restore feature empowers users to take proactive measures in securing their network configurations. They now get the option to create backups of their device configurations within the fabric. This provides a safety net, allowing every user to confidently experiment with configurations while having the flexibility to restore to the previous state in case anything goes awry.
Restore Configuration in ONES
Unlock Automation Potential with ONES API Support
Our latest version brings automation to the forefront by introducing comprehensive API support. All orchestration and management functionalities are seamlessly integrated into the ONES user interface through APIs. This not only enhances the user experience but also allows you to harness the power of automation in your data center operations. A wide range of APIs, including uploadDay1Config, getDay1ConfigStatus, rebootRequest, upgradeNOSImage, and more, are at your disposal—enabling you to streamline and expedite various tasks.
Here’s a glimpse of the APIs at your fingertips:
uploadDay1Config: Performs intent-driven Day 1 fabric orchestration for various data center topologies. The method initiates Day 1 orchestration based on the topology and intent supplied in a template file. Through this REST API, network operators can upload an entire YAML-based intent file and orchestrate the whole fabric, underlay and overlay, to match that intent.
getDay1ConfigStatus: Retrieves Intent Status after provisioning over SONiC-enabled fabric devices
configsListToRestore: Views available backups for a specific device
status: Retrieves deployment status logs
Day 2 Operations API: Streamline NetOps
The ONES 1.3 release also revolutionizes Day 2 operations with the introduction of the innovative replaceConfig API. This API provides a contextual diff between the current running configuration of a device and a given golden configuration file. With this insight, users can meticulously examine the differences and proceed to apply the desired changes to the device. Most importantly, the system gracefully rolls back to the base configuration of the device and ensures operational stability in case there is any issue during the application process.
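The compare-then-apply flow described above can be pictured with a short sketch. It does not call the replaceConfig API or reproduce its internals: the contextual diff comes from Python's standard difflib module, and the `apply` callable is a hypothetical stand-in for whatever actually pushes a configuration to a device.

```python
import difflib
from typing import Callable, List


def contextual_diff(running: str, golden: str) -> List[str]:
    """Unified diff (with two lines of context) from running to golden config."""
    return list(difflib.unified_diff(
        running.splitlines(), golden.splitlines(),
        fromfile="running", tofile="golden", lineterm="", n=2,
    ))


def replace_with_rollback(running: str, golden: str,
                          apply: Callable[[str], None]) -> str:
    """Try to apply the golden config; if that fails, re-apply the running one.

    Returns the configuration that is in effect afterwards.
    """
    try:
        apply(golden)
        return golden
    except Exception:
        apply(running)  # roll back to the configuration the device started with
        return running
```

An operator would typically review the output of `contextual_diff` first and only then trigger the apply step. The rollback here simply re-applies the saved running configuration, which is one plausible reading of "rolls back to the base configuration".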
Auto-Discovery
The auto-discovery capability of ONES discovers SONiC devices over a secure channel and automatically collects network state data using streaming telemetry. It provides drill-down insights into inventory, platform and system health, control and data plane utilization, and compliance.
We are proud to share that ONES 1.3 is a testament to the evolution of our SONiC network management and orchestration. With groundbreaking features like EVPN Multihoming, Config Backup and Restore, extensive API support, and the Day 2 Operations API, network administrators and IT teams can seamlessly streamline their operations, enhance reliability, and embrace the power of automation. This release reaffirms Aviz Networks’ commitment to delivering cutting-edge solutions that empower organizations to achieve operational excellence in their data centers. Together, let’s witness the future of network orchestration and management first-hand.
For more information about ONES 1.3 and other Aviz Networks products, please visit https://aviznetworks.com/
To explore ONES 1.3 in action and get hands-on experience with the hardware of your choice, schedule your demo at https://aviznetworks.com/one-center
Author: Tarun Kumar Polanki, Sr Solution Engineer
FAQs
1. What is EVPN Multihoming, and how does it improve SONiC network reliability?
EVPN Multihoming in ONES 1.3 replaces traditional MC-LAG by offering superior server redundancy, high availability, load balancing, and scalability. It ensures that the SONiC network remains resilient and operational even during failures, providing increased bandwidth and fault tolerance.
2. How does ONES 1.3 simplify Day 2 operations with new APIs?
ONES 1.3 introduces the replaceConfig API for Day 2 operations, which compares the current device configuration with a golden config. If any issue arises during changes, the system gracefully rolls back, ensuring operational stability and seamless NetOps.
3. Can ONES 1.3 support configuration backup and restore for SONiC fabrics?
Yes, ONES 1.3 includes a Config Backup and Restore feature, allowing users to securely back up device configurations and restore them when needed. This ensures safer experimentation and faster recovery during troubleshooting.
4. How does the ONES orchestration API automate SONiC deployment?
ONES 1.3 provides APIs like uploadDay1Config, getDay1ConfigStatus, and upgradeNOSImage, enabling intent-driven orchestration. These APIs streamline Day 1 operations by automatically configuring fabric topologies and managing NOS image upgrades.
5. What are the key features introduced in ONES 1.3 that enhance SONiC orchestration and management?
ONES 1.3 introduces EVPN Multihoming, expanded VXLAN support, Config Backup & Restore, full API integration for orchestration and Day 2 operations, and auto-discovery of SONiC devices. These features elevate automation, visibility, and operational reliability.
6. What is the benefit of L2/L3 VXLAN EVPN in ONES 1.3 for SONiC networks?
ONES 1.3 supports both L2 VXLAN (Asymmetric IRB) and L3 VXLAN (Symmetric IRB) modes, enhancing the reliability and flexibility of your SONiC fabric. These features integrate seamlessly with EVPN MH to create a scalable, high-performance network fabric ideal for modern data center needs.
7. How does ONES 1.3 enhance intent-driven orchestration for Day 1 SONiC deployments?
Through the uploadDay1Config API, ONES 1.3 enables intent-based orchestration by allowing users to submit YAML-based topology templates. This initiates automated SONiC fabric configuration, streamlining Day 1 operations across underlay and overlay environments.
8. Can ONES 1.3 monitor deployment status and logs during orchestration?
Yes, ONES 1.3 includes the getDay1ConfigStatus and status APIs, which provide real-time orchestration progress tracking and deployment logs. These capabilities help administrators monitor and validate configurations as they’re applied across the SONiC fabric.
9. What type of upgrades can be performed using ONES 1.3?
ONES 1.3 supports advanced upgrade automation with APIs like upgradeNOSImage and enableZTPUpgrade. These allow users to upgrade SONiC images on specific devices or trigger Zero Touch Provisioning (ZTP)-based upgrades, improving operational speed and reducing manual intervention.
10. How does ONES 1.3 support rollback and error handling during config changes?
The replaceConfig API in ONES 1.3 allows for a contextual comparison between the current and desired configuration. If an issue arises during the application of changes, ONES automatically rolls back to the previous configuration, ensuring network stability.
The field of network monitoring and visibility has experienced a remarkable evolution, driven by the increasing complexity of computer networks and advancements in data handling and processing. This article explores the journey of network monitoring and visibility, from its early days of collecting basic metrics to its current state of providing intelligent insights and proactive network management for the SONiC fabric. Let’s first delve into how network monitoring and visibility have become indispensable aspects of modern-day networking, enabling organizations to gain valuable insights and make informed decisions.
Traditional Network Monitoring: Early network monitoring used Simple Network Management Protocol (SNMP) to gather basic metrics like bandwidth, packet loss, and latency. It aids in fault detection, performance tracking, proactive issue identification, troubleshooting, and compliance assurance. Despite scalability limitations, it’s still popular in large on-prem legacy networks.
Flow-based Monitoring: NetFlow and sFlow introduced flow-based monitoring, analyzing network traffic patterns by collecting comprehensive communication session data. It provides comprehensive network traffic insights, identifying usage patterns, bottlenecks, and anomalies for efficient network management.
Performance Monitoring and Analysis: Advanced performance monitoring tools evolved to provide real-time analysis, historical data, customizable dashboards, and insights into network traffic, application performance, and user behavior. They were designed to optimize network efficiency, identify potential issues, improve troubleshooting, and enhance user experience.
Alerting and Event Correlation: Alerting and event correlation mechanisms were then created by grouping related events with the intent to streamline network management, reduce response times, prevent system overloads, and enhance security by detecting anomalies quickly.
Network Observation and Topology Mapping: Network Observation tools started providing graphical representations of network components, connections, and traffic flows. These tools helped enhance network understanding, simplify troubleshooting, improve planning, and boost operational efficiency through clear infrastructure representation.
Application-Aware Monitoring: Monitoring tools eventually started to include application-specific metrics and insights, such as deep packet inspection and performance tracking, enhancing user experience and aligning network monitoring with business goals for optimal application performance.
Security and Threat Monitoring: Network monitoring soon started to include security measures like intrusion detection, and threat detection tools, facilitating early detection of breaches through real-time surveillance and anomaly detection to ensure regulatory compliance.
Unified Network Monitoring: With the rise of cloud computing and mobile devices, unified network monitoring emerged, providing comprehensive visibility via a single dashboard for monitoring network performance and security across different environments.
Packet-based Monitoring: Packet-based monitoring became prevalent, capturing and analyzing data packets to gain detailed insights into network traffic, especially from security and application performance perspectives.
Intelligent Insights and Predictive Analytics: Finally, artificial intelligence and machine learning now enable real-time network data analysis, which supports proactive troubleshooting, optimizes network performance, predicts potential issues, and aids strategic decision-making.
At Aviz, we are at the forefront of the Open Networking revolution, enabling SONiC (Software for Open Networking in the Cloud), the open-source network operating system, for enterprises, so that they can not only leverage the flexibility of open source to innovate but also optimize the cost of their network infrastructure investments. We realize that network observability is critical for the enterprise to effectively manage and secure its network infrastructure. Hence, we have taken a comprehensive and inclusive approach to delivering the ultimate network monitoring and visibility solution: one that not only covers all the traditional aspects of network observability but is also forward-looking, addressing the needs of modern network infrastructures.
Our SONiC fabric visibility solution, Open Networking Enterprise Suite (ONES) offers a multi-vendor, multi-NOS (Network OS) platform that enables efficient management and security of the modern-day network infrastructure. By using ONES, enterprises of all sizes can benefit from the deep visibility it delivers, in particular for deployments involving SONiC on any hardware with any underlying ASIC.
ONES brings a range of essential features and capabilities that support extensive and effective visibility (figure 1). ONES telemetry agents collect and stream network telemetry data in near real-time to ensure administrators have the latest information for proactive monitoring and troubleshooting. User-friendly network topology visualization provides actionable insights for the entire network in a single unified view.
ONES dashboards are designed to provide deep insights into devices, the software running on them, and peripherals such as the transceivers that connect them. Version tracking for software, firmware, patches, and updates helps you stay compliant with security requirements and licensing policies (figure 2).
Figure 2: Aviz ONES Compliance Analytics
Continuous tracking of system health metrics, with customizable alerting thresholds, ensures smooth operations and proactive management of possible hardware failures (figure 3).
Figure 3: Aviz ONES System Health Tracking
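The idea of customizable alerting thresholds can be sketched in a few lines. The metric names, threshold values, and alert format below are hypothetical and chosen only for illustration; they do not reflect how ONES stores or evaluates its rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


def evaluate(metrics: Dict[str, float], thresholds: Dict[str, Threshold]) -> List[str]:
    """Return one alert string per metric that crosses its configured threshold."""
    alerts = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is None:
            continue  # no threshold configured for this metric
        if value >= limit.critical:
            alerts.append(f"CRITICAL {name}={value}")
        elif value >= limit.warning:
            alerts.append(f"WARNING {name}={value}")
    return alerts
```

Keeping thresholds in a plain mapping means each deployment can tune warning and critical levels per metric without touching the evaluation logic.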
Real-time data analysis for bandwidth utilization and traffic errors provides meaningful insights for performance optimization and capacity planning (figure 4).
Figure 4: Aviz ONES Traffic Monitoring
These are just a few examples of the comprehensive visibility ONES brings to SONiC fabric monitoring. More information on Aviz ONES can be found on our website, and we are always happy to schedule a demo for anyone interested in learning about ONES.
Conclusion
As SONiC deployments continue to gain momentum, the need for extensive monitoring and visibility along with proactive network management is getting more and more crucial for network operators. At Aviz, we strive to set the standards for SONiC fabric visibility, and provide the most comprehensive solution with deep insights regardless of the underlying hardware SONiC is running on. Our goal is to deliver a seamless experience for enterprises that are transitioning to the open-source NOS that not only lowers their network infrastructure TCO, but also delivers the flexibility to collaborate and innovate for the next-generation networks.
FAQs
1. What is SONiC fabric visibility and why is it essential?
SONiC fabric visibility refers to the ability to monitor, analyze, and manage all aspects of a network running on the SONiC (Software for Open Networking in the Cloud) operating system. It enables real-time insights into performance, health, and security, making it vital for enterprises to ensure operational efficiency, cost control, and proactive issue resolution in open networking environments.
2. In what ways does ONES go beyond traditional network monitoring tools?
ONES surpasses traditional tools by offering a platform built specifically for SONiC and open networking. It incorporates modern features such as AI-powered predictive analytics, event correlation, application-aware monitoring, and detailed packet-level analysis, all while supporting multi-vendor and multi-ASIC environments, which legacy systems often lack.
3. Is ONES limited to SONiC-certified hardware, or can it run across different platforms?
ONES is hardware-agnostic. It is designed to monitor SONiC deployments across any vendor hardware and any underlying ASIC, making it suitable for enterprises looking for flexibility without vendor lock-in. This enables seamless adoption of open-source NOS while maximizing infrastructure reuse and cost efficiency.
4. How does Aviz ONES improve SONiC network monitoring and observability?
Aviz ONES (Open Networking Enterprise Suite) provides deep observability into SONiC networks through real-time telemetry, system health monitoring, traffic analytics, compliance tracking, and network topology visualization. It unifies all visibility layers into a single dashboard, enabling efficient monitoring across diverse hardware and NOS combinations.
5. How does ONES support compliance, security, and proactive troubleshooting?
ONES supports compliance and security by continuously tracking software versions, firmware, patch levels, and hardware health. It also includes customizable alerts, real-time anomaly detection, and traffic error monitoring to proactively identify issues and prevent outages, all while meeting enterprise-grade compliance standards.
6. How does ONES enable real-time telemetry collection in SONiC environments?
ONES deploys telemetry agents across devices running SONiC and other NOS platforms. These agents collect and stream real-time metrics related to device health, traffic flow, and system status, allowing administrators to maintain continuous awareness of network performance and proactively respond to anomalies or failures.
7. What kinds of visual insights does ONES provide for SONiC network topology?
ONES offers graphical topology visualization that maps every device and connection across the network fabric. This visual representation helps operators quickly understand infrastructure layout, pinpoint faults, and optimize planning. It simplifies complex, multi-vendor environments into a unified, actionable interface.
8. Can ONES help reduce network downtime and mean time to repair (MTTR)?
Yes. By offering customizable alert thresholds, real-time health monitoring, and intelligent fault detection, ONES enables teams to identify potential hardware failures or traffic anomalies before they cause downtime. This proactive approach dramatically reduces MTTR and improves overall service availability.
9. How does Aviz ONES support bandwidth monitoring and capacity planning?
ONES analyzes real-time and historical traffic data to provide deep insights into bandwidth utilization and traffic errors. This enables network teams to optimize performance, identify congestion points, and make informed decisions about scaling and infrastructure investments for future growth.
10. What makes ONES a future-ready solution for SONiC observability?
ONES isn’t just about current visibility—it integrates AI and predictive analytics to anticipate issues before they occur. Combined with support for multi-vendor platforms, compliance tracking, and extensibility, ONES prepares enterprises for next-gen network demands while supporting their open networking goals today.
Networking data centers are a critical infrastructure for modern businesses, serving as the backbone for various services and applications. These data centers often comprise a multitude of interconnected nodes to handle the routing and processing of network traffic.
In a data center fabric, maintenance mode needs to be activated on spine and leaf devices quite often. In this mode, the traffic flowing through the device is drained or rerouted to other devices, so that the device can be removed from the production environment, or a node taken offline temporarily, for maintenance activities such as replacing line cards or fixing an issue on the device. This ensures the services running on the node are seamlessly transferred to other available nodes without causing any downtime for users. In general, this operation is called cost-out and cost-in. The ‘cost-out’ operation drains traffic from the device, and the ‘cost-in’ operation restores traffic to it after maintenance.
This blog explains these operations in detail. Further, it explores how you can seamlessly achieve these with SONiC as NOS and using ONES-Orchestration as a tool.
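To picture the cost-out and cost-in operations described above, the toy model below treats the fabric as a graph with link costs. It assumes that draining is done by raising the cost of every link attached to a device so that shortest-path routing prefers other devices; that mechanism, the metric value, and the four-node topology are illustrative assumptions and not a description of how SONiC or ONES implements the feature.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link cost}
DRAIN_COST = 65535  # assumed "very expensive" metric; real limits depend on the protocol


def build_fabric() -> Graph:
    """Hypothetical two-leaf, two-spine fabric in which spine1 is preferred."""
    links = [("leaf1", "spine1", 10), ("spine1", "leaf2", 10),
             ("leaf1", "spine2", 20), ("spine2", "leaf2", 20)]
    graph: Graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def cost_out(graph: Graph, node: str) -> Dict[Tuple[str, str], int]:
    """Raise the cost of every link touching `node`; return the original costs."""
    saved = {}
    for peer in list(graph[node]):
        saved[(node, peer)] = graph[node][peer]
        saved[(peer, node)] = graph[peer][node]
        graph[node][peer] = graph[peer][node] = DRAIN_COST
    return saved


def cost_in(graph: Graph, saved: Dict[Tuple[str, str], int]) -> None:
    """Restore the link costs recorded by cost_out."""
    for (a, b), cost in saved.items():
        graph[a][b] = cost


def best_path(graph: Graph, src: str, dst: str) -> List[str]:
    """Lowest-cost path from src to dst using Dijkstra's algorithm."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

In this model, traffic between the leaves initially crosses spine1. After `cost_out(graph, "spine1")` the best path moves to spine2, and `cost_in` with the saved costs brings it back to spine1. The drained device stays reachable, just unattractive, which is what allows it to be serviced without dropping its links first.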
Understanding Why Cost-Out and Cost-In Operations Are Necessary in Data Centers
Explore how draining plays a key role in maintaining data centers:
Planned Maintenance: Data Centers require maintenance activities like hardware upgrades, software updates, or network reconfigurations. Draining allows administrators to take a device offline, transfer its workload, and perform maintenance tasks without affecting service availability.
Capacity Planning: At times, data centers need to redistribute resources to optimize performance. Draining allows administrators to free up resources on specific devices by moving services and workloads to other devices with extra capacity. This helps in balancing the workload and avoiding resource bottlenecks.
Faulty Device or Component Replacement: When a device or one of its components (such as a hard drive, power supply, or network interface) fails, it must be repaired or replaced. Draining allows a controlled migration of services from the faulty device to functional ones. After the repair or replacement, the device is reintegrated into the production environment.
Load Balancing: Data centers often employ load-balancing techniques to evenly distribute incoming requests and workloads across multiple devices. If a device becomes overloaded or experiences performance issues, administrators may decide to drain it to redistribute the workload to other devices. This helps prevent service degradation or potential failures caused by an overloaded device.
Scheduled Downtime: Sometimes, data centers undergo planned downtime for various reasons such as infrastructure upgrades, reconfigurations, or security patches. Draining allows a smooth transition of services to other active devices, ensuring users experience minimal or no interruption in service during the scheduled downtime.
In short, draining is essential in a data center to facilitate planned maintenance, optimize capacity, replace faulty components, balance workloads, and ensure seamless service continuity during scheduled downtime. It enables administrators to maintain the data center infrastructure efficiently while minimizing disruptions for users.
How NetOps Achieve Cost-Out and Cost-In
NetOps tools generally rely on automation and programmability, using APIs that are either offered by the NOS itself or provided by custom interim solutions that NetOps teams maintain themselves. These APIs allow NetOps tools to interact with network devices using a standard, user-friendly interface. This enables network administrators to automate tasks, configure devices, and extract information from the devices and interfaces within the network, which simplifies the management of complex networks. It also makes network changes faster: instead of relying on manual command-line interface (CLI) scripting, network administrators can create, modify, and delete configurations across multiple devices simultaneously. This not only saves time but also reduces the risk of human error.
Another advantage of using NetOps tools with an API is their increased agility. Similar to faster development and deployments in the DevOps cycle, NetOps tools enable rapid updates and fixes within the network. Further, network administrators can leverage data analytics to make informed decisions about which task to automate and prioritize. Here’s this procedure in a nutshell.
To summarize, the NetOps cost-out and cost-in processes, when performed through APIs, offer numerous benefits for network operations. They enable faster provisioning and deployment, continuous improvement, proactive remediation, and easier troubleshooting. By automating tasks and leveraging AI and ML capabilities, NetOps tools help network administrators manage complex IT infrastructures more efficiently.
A Brief Overview of ONES and ONES-Orchestration
ONES is a networking solution to simplify and streamline NetOps for multi-vendor network automation and orchestration solutions.
Together, the ONES (Open Networking Enterprise Suite) and ONES-Orchestration tools provide a comprehensive solution for managing and maintaining network infrastructure. They offer seamless integration with existing network equipment along with increased flexibility, scalability, network performance, reliability, and cost savings.
ONES-Orchestration Integration with SONiC Fabric
ONES-Orchestration is a platform for building network automation and orchestration solutions that work with the SONiC Fabric. It provides a set of tools, libraries, and APIs for deploying and managing multi-vendor network infrastructure. Some of the key features include:
Multi-Vendor Support: ONES Orchestration supports a wide range of networking equipment from different vendors, allowing organizations to build automated and orchestrated network solutions that can work with their existing infrastructure.
Network Automation: ONES Orchestration provides tools and libraries for automating network operations such as configuration management, network provisioning, and change management.
Orchestration: ONES Orchestration offers a platform for orchestrating network services such as routing, firewalling, and load balancing, across multiple devices and vendors.
Integration with Popular Tools: ONES Orchestration can be integrated with popular network management and monitoring tools such as Ansible, Grafana, and Nornir—allowing organizations to leverage their existing toolsets and workflows.
ONES-Orchestration API for Cost-Out and Cost-In (Day 2 Use Case)
ONES Orchestration runs as a Docker container on x86 systems. It provides an API for soft provisioning, that is, generating the difference between the intended and the current configuration without applying it. The very same API can then be used to push and apply the configuration to the devices.
At the controller, this API will identify the delta and push the delta configuration back to the device in most cases without impacting forwarding, unless the changes themselves are disruptive. ONES Orchestration can be easily integrated into the existing NetOps workflow by giving intended configuration as input. Here’s the high-level diagram of the proposed ONES Orchestration tool and API interface.
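The delta idea can be illustrated with a small Python sketch. It is not the ONES Orchestration API: the function names and the flat, per-interface dictionaries are assumptions made for illustration. The first function computes what must be added, changed, or removed to move from the running to the intended configuration (the "soft provisioning" step), and the second applies that delta.

```python
def config_delta(running, intended):
    """Return the entries to add, change, and remove to move running -> intended."""
    added = {k: v for k, v in intended.items() if k not in running}
    changed = {k: v for k, v in intended.items() if k in running and running[k] != v}
    removed = sorted(k for k in running if k not in intended)
    return {"add": added, "change": changed, "remove": removed}


def apply_delta(running, delta):
    """Apply a delta to a copy of the running config and return the result."""
    result = {k: v for k, v in running.items() if k not in delta["remove"]}
    result.update(delta["add"])
    result.update(delta["change"])
    return result


# Hypothetical interface settings, used only to exercise the functions.
running = {
    "Ethernet0": {"mtu": 9100, "admin_status": "up"},
    "Ethernet4": {"mtu": 9100, "admin_status": "up"},
}
intended = {
    "Ethernet0": {"mtu": 9100, "admin_status": "down"},
    "Ethernet8": {"mtu": 9100, "admin_status": "up"},
}

delta = config_delta(running, intended)  # dry run: nothing is applied yet
new_running = apply_delta(running, delta)
```

Because only the entries in the delta are touched, unchanged interfaces are never rewritten, which is the reason a delta push can usually avoid disturbing forwarding.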
The Ultimate Solution for Streamlining Your NetOps (Day-2 Use Case)
The ONES Orchestration is a reliable solution for streamlining NetOps by offering seamless integration with existing network infrastructure, increased flexibility, and scalability. These technologies can help businesses tackle the challenges faced by modern network operations teams, and build robust networks to meet the demand of today’s digital world.
Implementation and Optimization
To streamline your NetOps with SONiC Fabric and ONES Solutions, here are some next steps to consider:
Research compatible switching hardware and choose the switches that meet your organization’s requirements
Download and install SONiC on the chosen switches, following the provided installation packages and documentation
Install ONES Solution on your server or virtual machine and integrate it with SONiC Fabric network
Design and implement network automation and orchestration solutions using ONES Orchestration tools and libraries
That’s it! Get set to unlock the full potential of ONES Orchestration, the benefits of streamlined NetOps, and limitless possibilities for your organization.
FAQs
1. What is ONES Orchestration and how does it enhance SONiC fabric deployment?
ONES Orchestration is a powerful network automation and orchestration platform designed to work seamlessly with SONiC-based fabrics. It simplifies Day-2 operations like cost-out and cost-in through APIs, enabling automated provisioning, configuration management, and service orchestration across multi-vendor network environments, which dramatically reduces deployment time and operational overhead.
2. How does the cost-out and cost-in process work in SONiC-based data centers?
In SONiC environments, cost-out operations drain traffic from a device to prepare it for maintenance or upgrades, while cost-in reintroduces the device post-maintenance. Using ONES Orchestration, this process is automated via APIs, ensuring seamless traffic rerouting, zero-downtime service transitions, and error-free configuration updates across the network fabric.
3. Can ONES Orchestration integrate with existing NetOps tools like Ansible and Grafana?
Yes, ONES Orchestration is designed to integrate effortlessly with popular NetOps tools like Ansible, Grafana, and Nornir. This compatibility allows organizations to enhance their current automation workflows, visualize network health metrics, and orchestrate complex changes without overhauling their existing toolchains.
4. What are the key benefits of using ONES for multi-vendor network environments?
ONES offers robust multi-vendor support, real-time telemetry, and centralized visibility across different NOS platforms. It eliminates vendor lock-in, improves operational efficiency, and accelerates deployment cycles. With ONES Orchestration, NetOps teams gain scalable and flexible tools for managing complex infrastructures, reducing downtime and configuration errors.
5. How can enterprises get started with SONiC and ONES Orchestration for Day-2 NetOps?
To get started, organizations should select SONiC-compatible switches, install SONiC firmware, and deploy the ONES suite on a server or VM. Once integrated with the fabric, ONES Orchestration APIs can be used to automate Day-2 tasks such as device draining, configuration deltas, and service provisioning—laying the foundation for agile and efficient NetOps.
6. How does ONES Orchestration perform configuration changes without disrupting traffic?
ONES Orchestration uses a delta-based API approach that identifies only the required configuration changes and pushes them to devices. In most cases, this soft provisioning occurs without impacting traffic forwarding, unless the changes are inherently disruptive—enabling zero-downtime updates in production environments.
7. What is the role of ONES in automating Day-2 network operations?
Day-2 operations such as maintenance, upgrades, and configuration adjustments are simplified by ONES through automated cost-out and cost-in workflows, real-time telemetry, and programmable APIs. This reduces manual intervention, minimizes error risk, and enhances the agility of NetOps teams managing dynamic network fabrics.
8. Why is draining (cost-out) critical in modern data centers?
Draining allows traffic to be safely rerouted from devices undergoing maintenance, upgrades, or experiencing faults—without impacting user-facing services. It supports load balancing, resource optimization, and fault isolation. ONES automates this process, making it repeatable, reliable, and vendor-agnostic.
9. What makes ONES Orchestration suitable for large-scale or hybrid infrastructure?
ONES Orchestration supports multi-vendor, multi-ASIC environments, integrates with tools like Ansible and Grafana, and enables centralized control via API-driven automation. This makes it ideal for large-scale, heterogeneous networks that demand scalability, flexibility, and reliability in automation workflows.
10. Can ONES Orchestration help reduce human error in network operations?
Absolutely. By automating critical tasks like configuration management, provisioning, and service orchestration through standardized APIs, ONES minimizes manual CLI interactions—significantly reducing the likelihood of configuration mistakes, which are common causes of outages in traditional network management.
We recently announced the general availability of Open Networking Enterprise Suite (ONES), the industry’s first supportability stack designed to empower network operators to migrate to SONiC. Since its inception, hyperscalers have used open-source SONiC to manage and control their networks. Enterprises globally are now looking to replicate the hyperscaler success, but they face unique challenges around SONiC supportability as they transition to the open-source NOS. With Aviz’s ONES, and a growing multi-vendor SONiC ecosystem, enterprises can now easily transform their networks like hyperscalers.
Enabling SONiC Adoption for New or Existing Networks
ONES ushers network operators into a new era of open networking by allowing:
Hardware Agnostic Interoperability: ONES works on any switch using any ASIC running SONiC, providing the software stack network operators need to make the move to white box switches in data centers and edge networks, while also delivering visibility for traditional NOSes such as Cumulus Linux, EOS, or NX-OS.
Deep Visibility and Control: Continuous uptime and system health begin with knowing your infrastructure and being in tune with your network fabric. ONES provides deep visibility down to the last detail on every component displaying dozens of metrics so you have the insights and power to drive peak performance while maintaining optimal system health.
24/7 Multi-Vendor SONiC Support: Aviz utilizes the ONES application and a wealth of SONiC expertise to provide an immediate response to find and fix issues before they result in downtime. The Aviz Support for SONiC is backed by vendor SLAs, so enterprises can rely on a single entity for every platform deployed.
These new capabilities enable the enterprise to move swiftly in its migration to SONiC, making ONES the most comprehensive and inclusive solution that delivers end-to-end visibility for multi-vendor, multi-NOS networks.
Hardware Agnostic Interoperability
Most enterprises today are diversifying their hardware vendor portfolio in light of ongoing chip shortages or simply for optimizing their infrastructure cost. Quality of Service, user experience, and uptime across a diverse set of platforms are of grave concern when it comes to leveraging open-source software. We are proud to bring hardware-agnostic capabilities to SONiC deployment and operations. ONES delivers near real-time visibility across any SONiC (community version or distribution). Our telemetry agents stream and normalize data regardless of the underlying hardware and the version of SONiC it is running, and provide a unified view of the entire fabric.
Figure 1: Dashboard of hardware/software components for every device in a multi-vendor network fabric
Deep Visibility and Control
No one wants to blindly adopt new technologies, especially open-source. ONES brings deep visibility with over 200 telemetry metrics collected via our agents in near-real time and provides dozens of operational and monitoring widgets that allow operators to gain insights into every aspect of their fabric, be it CPU/Memory utilization, SONiC microservices, or traffic errors. The biggest challenge in working with multi-vendor deployments is telemetry normalization before the data collected can be put to use for creating a unified view. We have worked with our early customers through proof of concepts and early deployment of ONES for almost a year now to bring the right level of visibility for large-scale multi-vendor SONiC operations.
Figure 2: Topology view of every device and network connection in the fabric
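To make the normalization challenge concrete, here is a minimal Python sketch of mapping vendor-specific telemetry into one schema. Everything in it is an assumption for illustration: the field names, the unit conversion, and the `normalize` function are hypothetical and do not reflect the actual ONES agents or any NOS's real metric names.

```python
# Hypothetical per-NOS field names mapped onto one unified schema.
FIELD_MAP = {
    "sonic": {"cpu_util": "cpu_percent", "mem_used_kb": "memory_used_kb"},
    "eos": {"cpuUsage": "cpu_percent", "memUsedBytes": "memory_used_kb"},
}

# Hypothetical unit fix-ups: this made-up EOS field reports bytes, the schema wants KB.
UNIT_DIVISOR = {("eos", "memUsedBytes"): 1024}


def normalize(nos, sample):
    """Translate one raw telemetry sample into the unified schema."""
    mapping = FIELD_MAP[nos]
    unified = {}
    for name, value in sample.items():
        if name not in mapping:
            continue  # drop fields the unified schema does not know about
        divisor = UNIT_DIVISOR.get((nos, name))
        unified[mapping[name]] = value / divisor if divisor else value
    return unified


sonic_sample = normalize("sonic", {"cpu_util": 12.5, "mem_used_kb": 2000})
eos_sample = normalize("eos", {"cpuUsage": 12.5, "memUsedBytes": 2048000, "extra": 1})
```

Once both samples share the same keys and units, a single dashboard widget can plot them side by side without knowing which NOS produced them.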
24/7 Multi-Vendor SONiC Support
In our experience, the key issue for enterprises in adopting SONiC has been a lack of truly unified enterprise-grade support across various platforms. ONES brings not only visibility but also the SONiC expertise of a team that has been actively involved with the community for years. We have not just helped organizations deploy and test SONiC over the years, but also worked tirelessly to establish partnerships and SLAs with all major switch and ASIC vendors to enable a unified channel of SONiC support. This is what completes ONES as the supportability stack for SONiC. The strength of ONES lies in the network effect: every enterprise benefits from our collective experience of deploying SONiC for multiple use cases and resolving issues identified across multiple platforms.
The SONiC Momentum Continues with ONES
The general availability of ONES, our current customers, our partners, and our ongoing deployments and pilots are not only proof of our momentum but also validation that we are changing the pace of SONiC adoption on white box switches in the enterprise. SONiC is no longer just a buzzword; it has quickly become one of the most sought-after technologies in the networking industry, and we are excited to contribute to creating value for the SONiC ecosystem. To learn more about ONES, watch Enabling SONiC Adoption for New or Existing Networks hosted by SDxCentral.
FAQs
1. What is ONES and how does it support SONiC adoption in enterprises?
ONES (Open Networking Enterprise Suite) is a supportability stack from Aviz that simplifies and accelerates enterprise adoption of open-source SONiC. It offers critical capabilities such as deep network visibility, multi-vendor hardware support, and 24/7 expert support, addressing the key challenges enterprises face when transitioning from traditional NOS platforms to SONiC.
2. Does ONES work across different switch vendors and ASICs?
Yes. ONES is built for hardware-agnostic interoperability, meaning it works on any switch and any ASIC that supports SONiC, whether it’s the community version or a commercial SONiC distribution. It enables a unified operational view across multi-vendor network environments, helping enterprises manage their infrastructure regardless of the underlying hardware diversity.
3. What kind of telemetry and visibility does ONES provide?
ONES delivers deep operational visibility by collecting over 200 telemetry metrics in near real-time. It provides insights into CPU/memory usage, SONiC microservices health, hardware components, and traffic errors, allowing network teams to monitor and maintain optimal system performance. The solution also normalizes telemetry across vendors, which is essential for multi-vendor SONiC environments.
4. How does ONES provide support for SONiC deployments?
ONES includes 24/7 enterprise-grade SONiC support, provided by Aviz’s expert team, who have years of hands-on experience with SONiC deployments. This support is backed by SLAs with major switch and ASIC vendors, giving enterprises a single point of contact for issue resolution across multiple platforms. This addresses the most cited barrier to SONiC adoption: the lack of unified support.
5. Why is ONES considered essential for enterprise-grade SONiC operations?
While SONiC offers openness and cost advantages, it can be challenging to deploy at scale due to hardware diversity, telemetry inconsistencies, and lack of integrated support. ONES addresses all these issues by providing deep observability, multi-vendor interoperability, and round-the-clock support, making it a comprehensive solution for enterprises looking to run SONiC reliably in production environments.
6. How does ONES help enterprises transition from traditional NOS like EOS or NX-OS to SONiC?
ONES allows seamless migration by offering deep visibility into both SONiC and traditional NOS environments like Cumulus Linux, EOS, or NX-OS. This dual visibility enables enterprises to gradually transition, compare performance and health metrics, and maintain uptime while adopting SONiC without risk or disruption.
7. Can ONES be used in both data center and edge network deployments?
Yes. ONES is designed for flexibility across network topologies, from large-scale data centers to edge environments. Its hardware-agnostic telemetry agents and unified visibility make it an ideal solution for white-box switches deployed in distributed network fabrics.
8. What makes ONES a truly hardware-agnostic solution?
ONES collects and normalizes telemetry data across any switch, any ASIC, and any version of SONiC—whether community or commercial. This ensures that operators get a unified dashboard and consistent performance monitoring, regardless of the underlying hardware vendor or configuration.
9. How does ONES ensure system health and uptime in SONiC networks?
By continuously collecting over 200 telemetry metrics and offering operational widgets, ONES gives administrators real-time insights into system health, CPU/memory usage, microservice performance, and hardware status. This granular visibility allows issues to be proactively identified and resolved, ensuring maximum uptime and optimal performance.
10. What kind of enterprises should consider adopting ONES for their SONiC deployments?
Any enterprise exploring cost-effective, open-source networking with high availability across multi-vendor environments will benefit from ONES. It’s especially valuable for organizations with diverse hardware portfolios, looking to achieve vendor independence, centralized support, and hyperscaler-grade operations using SONiC.
In today’s fast-paced networking landscape, data is a critical asset. Unexpected failures can lead to downtime, operational disruptions, and misconfigurations. When a network device crashes, engineers need a reliable backup to restore it quickly. Without structured backup and restore mechanisms, organizations risk prolonged outages and inefficiencies. This overview underscores the importance of regular backups and explains how ONES Fabric Manager Backup & Restore streamlines the process, ensuring seamless recovery in multi-vendor environments.
The Importance of Backup & Restore in Network Resilience
Backup and restore processes ensure rapid recovery from failures by preserving critical network configurations. Key components include:
Configuration Snapshots: Capture and store complete network configurations, including interfaces, QoS, ACLs, and VxLAN, enabling full restoration when needed.
Multi-vendor Compatibility: Ensures seamless backup and restoration across different vendor devices.
In RMA scenarios, replacing faulty hardware is only the first step—the real challenge lies in restoring the original configurations. Without a recent backup, administrators must manually reconfigure the failed switch, resulting in extended downtime, increased risk of errors, operational disruptions, and higher recovery costs due to additional troubleshooting and resource allocation.
ONES Backup & Restore: The Lifeline for Uninterrupted Networks
ONES Fabric Manager Backup & Restore ensures seamless recovery by securely storing configurations in a persistent Docker volume, enabling quick restoration, and eliminating manual reconfiguration. With pre-replacement snapshots for ZTP or upgrades, it offers a reliable rollback option. Designed for multi-vendor compatibility, it minimizes downtime, reduces risks, and streamlines RMA processes for efficient, error-free network management.
Figure 1: Backup Taken After Configuration
Streamlined Backup & Recovery Process
ONES Fabric Manager Backup & Restore captures essential configuration files (config_db.json, frr.conf, fmcli_db.cfg) by enabling both manual and automatic snapshot creation during key operations like reboot, ZTP, or image upgrades. Each snapshot is tagged with a timestamp or custom label for easy identification and restoration. In the event of a failure, users can quickly revert to a known-good configuration—minimizing downtime and eliminating the need for complex manual recovery steps.
Figure 2: Backup Management Page
Figure 3: Restore Management Page
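The snapshot-and-restore flow can be sketched in Python as follows. This is an illustration under stated assumptions, not how ONES Fabric Manager is implemented: the function names, the directory layout, and the timestamp format are hypothetical, and temporary directories stand in for the device and for the persistent backup volume. Only the three file names come from the description above.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# File names taken from the text; where they live on a device is not assumed here.
CONFIG_FILES = ("config_db.json", "frr.conf", "fmcli_db.cfg")


def take_snapshot(device_dir, backup_root, label=None):
    """Copy the known config files into a folder named by label or UTC timestamp."""
    name = label or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_root) / name
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a snapshot
    for filename in CONFIG_FILES:
        source = Path(device_dir) / filename
        if source.exists():
            shutil.copy2(source, target / filename)
    return target


def restore_snapshot(snapshot_dir, device_dir):
    """Copy every file in a snapshot back over the device's configuration."""
    for source in Path(snapshot_dir).iterdir():
        shutil.copy2(source, Path(device_dir) / source.name)


with tempfile.TemporaryDirectory() as tmp:
    device = Path(tmp) / "device"
    backups = Path(tmp) / "backups"
    device.mkdir()
    (device / "config_db.json").write_text('{"DEVICE_METADATA": {}}')

    snap = take_snapshot(device, backups, label="pre-upgrade")
    (device / "config_db.json").write_text("corrupted")  # simulate a bad change
    restore_snapshot(snap, device)
    restored = (device / "config_db.json").read_text()
```

Refusing to reuse an existing snapshot name keeps each known-good configuration immutable, so a rollback target cannot be silently overwritten by a later backup.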
Multi-Vendor Support for Diverse Environments
Designed for flexibility, ONES Fabric Manager Backup & Restore works seamlessly across various network devices. Its consistent and reliable backup and recovery capabilities make it an ideal solution for dynamic, multi-vendor infrastructures, ensuring uninterrupted network performance regardless of vendor diversity.
Book a demo today — because every second of network downtime costs more than you think.
1. How does ONES Backup & Restore help reduce SONiC RMA downtime?
ONES Fabric Manager automates the backup of SONiC configurations and enables one-click restore, eliminating the need for manual reconfiguration during RMA. This drastically reduces downtime and speeds up recovery.
2. Can I use ONES Backup & Restore across multi-vendor network environments?
Yes, ONES supports multi-vendor compatibility, allowing seamless backup and restoration across SONiC and non-SONiC devices—making it ideal for hybrid data center infrastructures.
3. What configurations does ONES Backup capture for SONiC switches?
ONES captures critical configuration files like config_db.json, frr.conf, and fmcli_db.cfg, ensuring full restoration of routing, ACLs, QoS, interfaces, and more.
4. Does ONES support automatic snapshots before upgrades or ZTP?
Yes, ONES allows both manual and automated snapshot creation before key operations like Zero Touch Provisioning (ZTP), image upgrades, and reboots, enabling quick rollback if needed.
5. Why is backup and restore crucial for SONiC-based network resilience?
Without a structured backup system, RMA recovery becomes error-prone and time-consuming. ONES Backup & Restore ensures operational continuity by enabling reliable, fast, and error-free recovery after hardware failures.