In dynamic network environments, device management IPs change over time due to various factors such as DHCP leases, network reconfigurations, or failover events. ONES, a gNMI-based telemetry solution for real-time network monitoring, must adapt to these changes to ensure continuous and accurate data collection. This dynamic adaptation enhances network reliability and minimizes disruptions. With ONES 3.1, we have adopted an advanced Automatic IP Rediscovery Mechanism to address this challenge. This eliminates the need for manual intervention by enabling ONES to instantly detect management IP updates and automatically re-register devices with the controller. It offers seamless monitoring, ensuring real-time visibility and uninterrupted telemetry data collection.
Seamless IP Transition with ONES:
Automatic IP Detection: The ONES Agent now monitors and detects any changes in the device’s management IP.
Seamless Re-Registration: Upon detection, the agent automatically updates the configuration and re-registers the device.
Real-Time Tracking with the IP Transition Widget: A dedicated widget in ONES UI logs historical IP transitions, helping operators track changes over time.
Instant Alerts via ONES Rule Engine: Notifications alert administrators when an IP transition occurs, ensuring immediate awareness and action.
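The detect, re-register, and log flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ONES Agent's actual code: the `MgmtIpWatcher` class, the `register` callback, and the device name are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class IpTransition:
    """One management-IP change, as it might appear in a history log."""
    device: str
    old_ip: Optional[str]
    new_ip: str
    at: datetime


@dataclass
class MgmtIpWatcher:
    """Hypothetical agent-side watcher; not the actual ONES Agent."""
    device: str
    register: Callable[[str, str], None]  # called as register(device, new_ip)
    current_ip: Optional[str] = None
    history: List[IpTransition] = field(default_factory=list)

    def observe(self, ip: str) -> bool:
        """Feed the latest observed management IP; return True if it changed."""
        if ip == self.current_ip:
            return False
        self.history.append(
            IpTransition(self.device, self.current_ip, ip, datetime.now(timezone.utc))
        )
        self.current_ip = ip
        self.register(self.device, ip)  # re-register with the controller
        return True


# Illustrative run: the second observation repeats the first, so only two
# registrations happen.
registered = []
watcher = MgmtIpWatcher("leaf-1", register=lambda dev, ip: registered.append((dev, ip)))
for seen in ["10.0.0.5", "10.0.0.5", "10.0.0.9"]:
    watcher.observe(seen)
```

Keeping the history as a list of old/new pairs is one simple way to back a view like the IP Transition widget, and the same event could also feed an alerting rule.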
IP Transition Widget & Alerts:
ONES 3.1 enhances network visibility and responsiveness by providing real-time tracking and instant alerts for management IP transitions. This ensures that network operators can proactively monitor and respond to changes without disruption.
To further streamline operations, a dedicated UI widget has been introduced, which logs historical IP changes, allowing administrators to track, analyze, and troubleshoot patterns over time. This historical insight helps in identifying trends, diagnosing potential issues, and improving overall network stability.
In addition, automated notifications play a crucial role in keeping administrators informed. The system instantly sends alerts whenever a management IP transition occurs, ensuring that teams can take swift corrective actions to maintain seamless device connectivity and uninterrupted telemetry data collection. By eliminating manual tracking and intervention, ONES 3.1 significantly reduces operational overhead while enhancing network resilience.
By removing manual processes, ONES 3.1 enhances operational efficiency, minimizes human errors, and accelerates network recovery during IP transitions. The improved visibility provided by real-time tracking and historical logging enables administrators to monitor IP changes proactively, analyze transition patterns, and troubleshoot potential issues with ease.
Additionally, the feature strengthens network reliability and performance by ensuring that devices remain connected to ONES even when their management IPs change. Automated notifications and alerts further improve responsiveness, allowing IT teams to react immediately and prevent downtime.
Overall, the IP Transition Feature in ONES 3.1 optimizes network resilience, streamlines workflows, and enhances the accuracy of real-time monitoring, making it an indispensable tool for modern network operations.
In every release, the ONES team draws on customer feedback to refine the user interface (UI), addressing key pain points and improving the overall experience. ONES 3.1 is no exception, bringing a fresh wave of UI improvements that make it easier for users to manage their networks. With a single-pane-of-glass approach, it offers clear insights into potential issues along with quick overviews and summaries. In the sections below, we’ll delve into the key UI enhancements introduced in this release.
Default Rules with Troubleshooting Hints:
Network operators use the ONES rule engine to monitor their networks for potential issues by setting threshold values for metrics such as CPU utilization, or by watching for conditions such as fan failures, and triggering alerts when those conditions are met. With each release, new metrics are introduced, enhancing the product’s monitoring capabilities. In Release 3.1, ONES goes a step further: drawing on an analysis of network deployments across various regions and customer environments, it offers a set of preconfigured rules called Default Rules. These rules come with industry-standard threshold values, so operators can start monitoring their networks simply by enabling them. Additionally, each Default Rule includes a set of troubleshooting steps that appear directly in the alert payload. This helps users understand what actions to take when a specific alert is triggered. The troubleshooting steps include SONiC, FRR, and Linux shell commands, along with recommended physical checks tailored to the issue at hand, providing clear guidance on resolving potential problems swiftly.
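To make the idea of a threshold rule that carries its own troubleshooting hints concrete, here is a minimal Python sketch. The `Rule` structure, the metric name, the 90% threshold, and the payload layout are assumptions for illustration and are not the ONES rule schema or its shipped defaults.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical rule shape: a metric, a threshold, and hints."""
    name: str
    metric: str
    threshold: float
    hints: List[str]
    enabled: bool = False  # a default rule does nothing until it is enabled


def evaluate(rule: Rule, sample: Dict[str, float]) -> Optional[dict]:
    """Return an alert payload if the sample breaches the rule, else None."""
    if not rule.enabled or rule.metric not in sample:
        return None
    value = sample[rule.metric]
    if value <= rule.threshold:
        return None
    return {
        "rule": rule.name,
        "metric": rule.metric,
        "value": value,
        "threshold": rule.threshold,
        "troubleshooting": list(rule.hints),  # hints travel with the alert
    }


cpu_rule = Rule(
    name="High CPU utilization",
    metric="cpu_util_percent",  # illustrative metric name
    threshold=90.0,             # illustrative value, not a ONES default
    hints=[
        "Run `top` in the Linux shell to find the busiest process",
        "Run `show processes cpu` on the SONiC device",
    ],
    enabled=True,
)
alert = evaluate(cpu_rule, {"cpu_util_percent": 96.5})
```

Embedding the hints in the payload means whatever receives the alert, such as a chat channel or a ticket, shows the next steps without a lookup.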
Alerting Rules: Preview and Download
With the growing number of default and custom rules, ONES 3.1 introduces a convenient Downloadable Summary feature. This allows users to easily access a complete overview of all rules configured in the system. The summary is available in CSV format and includes key fields such as rule names, threshold levels, Slack notification and ticketing system configurations, and more. This makes it simple for network operators to review and audit their rule configurations at a glance. In ONES 3.1, the Preview Feature has been introduced to provide a quick and detailed view of individual rules directly on the rules page. Below each rule, users can expand a summary that highlights the key configurations of that rule.
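As a rough illustration of what a CSV rule summary involves, the sketch below serializes a few rules with Python's standard `csv` module. The column names and the sample rules are assumptions; the actual columns in the ONES export may differ.

```python
import csv
import io
from typing import Dict, List

# Assumed column set, loosely based on the fields named in the text above.
FIELDS = ["rule_name", "threshold", "slack_enabled", "ticketing_enabled"]


def rules_to_csv(rules: List[Dict[str, object]]) -> str:
    """Serialize rule dictionaries to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rule in rules:
        # Missing fields are written as empty cells rather than failing.
        writer.writerow({key: rule.get(key, "") for key in FIELDS})
    return buf.getvalue()


summary = rules_to_csv([
    {"rule_name": "High CPU utilization", "threshold": 90,
     "slack_enabled": True, "ticketing_enabled": False},
    {"rule_name": "Fan failure", "threshold": 1,
     "slack_enabled": True, "ticketing_enabled": True},
])
```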
Traffic Comparison
Often, simply observing traffic data on a single interface may not offer sufficient insights into the overall traffic flow within a device. Understanding the traffic pattern in relation to other interfaces becomes essential for a more comprehensive analysis. ONES 3.1 addresses this need by introducing an Ingress and Egress Traffic Utilization Comparison feature within the device. With this enhancement, users can select up to eight interfaces and perform a comparative analysis of either the Tx (Transmit) or Rx (Receive) utilization of the links. This side-by-side comparison enables operators to detect traffic imbalances, spot potential bottlenecks, and better understand traffic distribution across multiple interfaces, leading to more informed network management decisions.
Comparison of interfaces Rx Utilization
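For readers who want to see the arithmetic behind such a comparison, the following Python sketch derives utilization from byte counters and ranks interfaces. It assumes utilization is computed as bits transferred over an interval divided by link capacity; the interface names and counter values are made up, and counter wrap is not handled.

```python
from typing import Dict, List, Tuple

MAX_INTERFACES = 8  # the comparison view accepts up to eight interfaces


def utilization_percent(bytes_start: int, bytes_end: int,
                        interval_s: float, speed_bps: int) -> float:
    """Link utilization over an interval, as a percentage of link speed."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    bits = (bytes_end - bytes_start) * 8
    return 100.0 * bits / (interval_s * speed_bps)


def compare(samples: Dict[str, Tuple[int, int, int]],
            interval_s: float) -> List[Tuple[str, float]]:
    """Rank interfaces by utilization, busiest first.

    samples maps interface name -> (bytes_start, bytes_end, speed_bps).
    """
    if len(samples) > MAX_INTERFACES:
        raise ValueError("select at most eight interfaces")
    ranked = [
        (name, utilization_percent(start, end, interval_s, speed))
        for name, (start, end, speed) in samples.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Two 100G links sampled 10 seconds apart (illustrative Rx byte counters).
rx = compare(
    {
        "Ethernet0": (0, 7_500_000_000, 100_000_000_000),
        "Ethernet4": (0, 1_250_000_000, 100_000_000_000),
    },
    interval_s=10.0,
)
```

In this example Ethernet0 carries six times the Rx load of Ethernet4, which is the kind of imbalance a side-by-side view is meant to surface.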
Optics Analytics
The Transceiver Widget in the Analytics – Interfaces page of ONES 3.1 offers a comprehensive summary of the various types of transceivers deployed across the managed network. This information is invaluable to customers, allowing them to easily track and identify the count of different transceiver types in use, as well as the various vendor models or manufacturers within their network.
Additionally, the widget includes an Export Data feature, providing more granular details. This exported data contains subsections for each transceiver type and manufacturer, along with information such as the individual transceiver’s serial number and manufacturer date details. This feature helps customers better manage and audit their network hardware inventory, ensuring more efficient network planning and maintenance.
Summary of transceiver inventory
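The kind of roll-up the widget presents can be approximated with a simple count over inventory records. The sketch below is illustrative only: the record fields, transceiver types, vendor names, and serial numbers are placeholders, not data from ONES.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical inventory records; field names and values are placeholders.
transceivers: List[Dict[str, str]] = [
    {"type": "QSFP28", "vendor": "VendorA", "serial": "SN0001"},
    {"type": "QSFP28", "vendor": "VendorB", "serial": "SN0002"},
    {"type": "SFP+", "vendor": "VendorA", "serial": "SN0003"},
]


def summarize(records: List[Dict[str, str]], key: str) -> Counter:
    """Count records by one attribute, such as type or vendor."""
    return Counter(record[key] for record in records)


by_type = summarize(transceivers, "type")
by_vendor = summarize(transceivers, "vendor")
```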
Protocol Enhancements
MC-LAG Visualization
ONES 3.1 introduces a new MC-LAG Filter option on the topology page, allowing users to easily view devices configured with MC-LAG (Multi-Chassis Link Aggregation) in their managed networks. By visually displaying the MC-LAG configuration, this feature gives operators a clearer understanding of redundancy setups and improves overall network visibility and resilience management.
State Transitions
In ONES 3.1, the widgets on the Protocols page have been revamped for better clarity and efficiency. Previously, state data for features like port channels and VXLAN tunnels was displayed continuously over a time frame, even though minimal changes occur in stable networks. The new design is event-driven: only state transitions are recorded and displayed in tabular form, offering a more concise and actionable view of network changes. Users can still use the previously available timeframe selections (1h, 2h, 4h, 12h, 24h, 1w, and 2w) to filter state transitions within a specific period. State transitions are tracked for protocols including LACP, MC-LAG, and VXLAN.
VTEP State transitions
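The difference between continuous state data and an event-driven view can be shown with a short Python sketch that reduces periodic samples to transitions and then filters them by time window. The sample format, object names, and timestamps are assumptions for illustration and do not reflect how ONES stores this data.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, str, str]           # (timestamp_s, object name, state)
Transition = Tuple[int, str, str, str]  # (timestamp_s, object name, old, new)


def to_transitions(samples: Iterable[Sample]) -> List[Transition]:
    """Keep only the points where an object's state actually changed."""
    last: Dict[str, str] = {}
    events: List[Transition] = []
    for ts, name, state in sorted(samples):
        previous = last.get(name)
        if previous is not None and previous != state:
            events.append((ts, name, previous, state))
        last[name] = state
    return events


def within(events: List[Transition], now: int, window_s: int) -> List[Transition]:
    """Filter transitions to a trailing time window, such as 1h or 24h."""
    return [event for event in events if now - window_s <= event[0] <= now]


samples = [
    (0, "PortChannel1", "up"), (60, "PortChannel1", "up"),
    (120, "PortChannel1", "down"), (180, "PortChannel1", "down"),
    (240, "PortChannel1", "up"),
    (0, "vtep-10.1.1.2", "up"), (240, "vtep-10.1.1.2", "up"),
]
events = to_transitions(samples)
```

Seven samples collapse to two rows here, and the stable VTEP produces none, which is why a transition table stays short in a healthy network.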
VLAN Information
A significant addition to ONES 3.1 is the introduction of VLAN data representation on the Protocols page, offering a clear view of configured VLAN information on each switch. VLANs (Virtual Local Area Networks) are a fundamental component of data communication networks, allowing segmentation of network traffic to enhance performance, improve security, and manage broadcast domains more effectively. By logically separating devices within the same physical network, VLANs provide better control over traffic flow, isolating specific segments for efficiency and security purposes.
Navigating to Monitor → Protocols → VLAN gives users a summary count of VLANs across all devices in the managed network. Clicking on a specific device and navigating to its detailed view displays the VLANs configured and their associated ports. If any VLAN is configured with an SVI (Switched Virtual Interface), those details are also displayed, offering users a complete picture of their VLAN setup.
Configured VLAN Information
Conclusion
ONES 3.1 brings a host of UI enhancements designed to improve network management and visibility. Key updates include Default Rules with troubleshooting steps, a Downloadable Summary for quick rule audits, and a Preview Feature for easier rule configuration reviews. The Traffic Comparison tool enables better analysis of interface utilization, while the Transceiver Summary offers detailed insights into network hardware. New features like the MCLAG Filter on the topology page and the event-driven State Transition Data in Protocols provide clearer views of redundancy and protocol changes. Together, these features make ONES 3.1 more intuitive and powerful for network operators.
Ready to see ONES 3.1 in action?
Because smarter troubleshooting, real-time rule insights, and powerful protocol visibility shouldn’t be optional.
In modern high-performance computing and AI-driven workloads, real-time observability is crucial for maximizing performance and preventing failures. ONES delivers a powerful telemetry solution that provides deep insights into compute environments, monitoring NICs, GPUs, CPUs, SSDs, and other critical components. With real-time tracking and seamless multi-vendor compatibility, ONES empowers businesses with proactive performance management, ensuring stability, efficiency, and optimal resource utilization.
Unified Monitoring for Compute, Network Interface Cards, and GPU Performance
NIC Insights:
To ensure seamless data transmission, ONES compute telemetry provides comprehensive visibility into network interface performance. It captures key metrics such as operational and administrative status, MTU size, port speeds, and auto-negotiation settings—helping teams assess interface health and diagnose potential issues.
Additionally, ONES monitors Forward Error Correction (FEC) modes to enhance data reliability and tracks Link Layer Discovery Protocol (LLDP) statistics, including transmitted, received, and discarded frames. This enables better network topology mapping, proactive issue resolution, and improved data integrity analysis, ensuring optimal performance in high-performance computing environments.
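On a Linux host, several of the interface attributes mentioned above are exposed as small text files under `/sys/class/net/<interface>/`. The sketch below reads three of them. It is an assumption that a collector would use sysfs this way; the text does not say how ONES gathers these values, and the `nic_snapshot` helper is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def read_attr(iface_dir: Path, name: str) -> Optional[str]:
    """Read one sysfs attribute; return None if it is missing or unreadable."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # For example, reading `speed` can fail on some interfaces.
        return None


def nic_snapshot(iface: str, sysfs_root: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Collect operational state, MTU, and speed (Mb/s) for one interface."""
    iface_dir = Path(sysfs_root) / iface
    return {attr: read_attr(iface_dir, attr) for attr in ("operstate", "mtu", "speed")}
```

Calling `nic_snapshot("eth0")` on a Linux server returns a dictionary of strings, with `None` for any attribute the kernel does not report. The `sysfs_root` parameter exists only so the function can be pointed at a test directory.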
GPU and Compute Performance Monitoring:
In GPU-accelerated environments, performance bottlenecks can stem from either the compute infrastructure hosting GPUs or the GPUs themselves. ONES provides comprehensive visibility into both, ensuring optimal efficiency and stability.
Compute Health Monitoring:
ONES tracks critical system-wide parameters, including CPU utilization, memory usage, temperature, and platform metadata. This proactive monitoring helps maintain stable performance and prevents thermal-related issues.
GPU Performance Insights:
Using NVIDIA SMI, ONES collects key GPU metrics such as real-time core temperature, utilization, power consumption, memory allocation, bus ID, and serial number. By continuously monitoring temperature fluctuations and power draw, administrators can proactively mitigate failures, optimize workloads, and maximize GPU efficiency.
Figure 1: GPU Performance Overview – Temperature, Memory, Utilization, and Power
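The metrics listed above map onto fields that the `nvidia-smi` command can print in CSV form. The Python sketch below shows one way to query and parse them. The choice of fields and the parsing approach are assumptions for illustration rather than a description of how ONES invokes the tool, and the sample line is made up.

```python
import subprocess
from typing import Dict, List

# Query properties accepted by `nvidia-smi --query-gpu`.
FIELDS = [
    "temperature.gpu", "utilization.gpu", "power.draw",
    "memory.used", "memory.total", "pci.bus_id", "serial",
]


def query_gpus() -> str:
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse(output: str) -> List[Dict[str, str]]:
    """Turn each CSV line into a dict keyed by field name."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        values = [value.strip() for value in line.split(",")]
        if len(values) != len(FIELDS):
            raise ValueError(f"unexpected column count: {line!r}")
        rows.append(dict(zip(FIELDS, values)))
    return rows


# Made-up sample line in the csv,noheader,nounits layout.
sample = "45, 12, 68.50, 1024, 81920, 00000000:3B:00.0, 1320123456789\n"
gpus = parse(sample)
```

On a GPU host, `parse(query_gpus())` would return one dictionary per GPU. Values are kept as strings because some fields can be reported as not available on certain devices.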
CPU and Memory Utilization
Efficient resource allocation is essential for sustaining high-performance computing. ONES continuously monitors CPU load across various intervals and tracks memory usage at both compute and GPU levels to optimize resource distribution and prevent bottlenecks. With real-time system uptime visibility, ONES enables administrators to evaluate long-term reliability, make proactive adjustments, and ensure seamless operations while mitigating unexpected failures.
Figure 2: CPU and Memory Insights – Utilization and Temperature Analysis
Figure 3: GPU Analytics – Utilization, Temperature, and Memory Usage
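On Linux, load averages over 1, 5, and 15 minutes and system uptime are available from `/proc/loadavg` and `/proc/uptime`. The sketch below parses both formats. Whether ONES reads these particular files is an assumption; the sample strings are illustrative.

```python
from typing import Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """Parse /proc/loadavg, whose first three fields are the load averages."""
    one, five, fifteen = (float(part) for part in text.split()[:3])
    return {"1m": one, "5m": five, "15m": fifteen}


def parse_uptime(text: str) -> float:
    """Parse /proc/uptime; the first field is seconds since boot."""
    return float(text.split()[0])


# Illustrative file contents in the formats the kernel uses.
load = parse_loadavg("0.42 0.36 0.30 1/523 12345\n")
uptime_s = parse_uptime("350735.47 234388.90\n")
uptime_days = uptime_s / 86400
```

Comparing the 1-minute value against the 15-minute value gives a quick sense of whether load is rising or falling.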
Storage and Platform Health
Figure 4: Disk Health and Utilization – Health, Used Percentage, Usage (in MB), and Temperature Metrics
Vendor-Agnostic and Scalable
ONES observability is vendor-agnostic, collecting network metrics through standard Linux interfaces, supporting multiple NIC vendors such as Intel and Mellanox. This flexibility ensures that ONES adapts to your evolving infrastructure as new hardware and network configurations are integrated.
Designed for large-scale deployments, ONES offers scalable monitoring solutions for thousands of system components without overwhelming server resources. While primarily gathering data from Linux servers, it supports multi-vendor environments, enabling seamless data collection from diverse hardware configurations. By centralizing monitoring across servers hosting GPUs, network interfaces, and individual GPUs, ONES ensures comprehensive performance tracking and efficient management.
Conclusion
ONES 3.1 delivers an efficient, flexible, and scalable solution for monitoring critical system components. With in-depth insights into network performance, GPU metrics, server conditions, CPU load, memory usage, and system uptime, it empowers administrators to optimize performance and prevent failures. Its seamless compatibility with diverse hardware vendors and network configurations makes it the ideal choice for complex, multi-vendor environments. Unlock the full potential of your infrastructure with ONES 3.1’s comprehensive observability and performance monitoring.
The latest release of Open Networking Enterprise Suite (ONES) marks a significant milestone in network observability, introducing comprehensive telemetry support for Spectrum-X switches. This update extends the robust monitoring capabilities of ONES to Cumulus Linux, providing deep visibility into network performance, health, and traffic patterns.
In today’s rapidly evolving networking landscape, end-to-end visibility is paramount for maintaining optimal network performance and swiftly addressing potential issues. With ONES, Aviz Networks gives organizations running Cumulus Linux 5.9, 5.10, and 5.11 that visibility, enabling efficient troubleshooting, enhanced security, and performance optimization.
Why End-to-End Visibility Matters for Cumulus Networks
End-to-end visibility refers to the comprehensive monitoring and analysis of data as it traverses the entire network infrastructure. This holistic perspective is essential for:
Proactive Issue Detection: Identifying and resolving potential problems before they escalate.
Performance Optimization: Ensuring data flows efficiently, minimizing latency and packet loss.
Security Enhancement: Detecting anomalies and potential security threats in real-time.
Informed Decision-Making: Providing actionable insights for network planning and scaling.
Without such visibility, network administrators often find themselves reacting to issues after they impact operations, leading to increased downtime and reduced efficiency.
As modern data centers become increasingly complex, ensuring seamless monitoring across all network components is critical. Lack of visibility can lead to:
Delayed Issue Resolution – Troubleshooting network problems becomes reactive rather than proactive.
Performance Bottlenecks – Poor visibility can result in increased latency, packet loss, and inefficiencies.
Security Risks – Without continuous monitoring, network vulnerabilities may go undetected.
To address these challenges, ONES supports agentless telemetry for Cumulus, delivering real-time insights into device health, interfaces, traffic statistics, and protocol performance.
Comprehensive Integration with Spectrum-X
Agentless Telemetry Collection
ONES supports Cumulus Linux in an agentless manner, leveraging NVUE (NVIDIA User Experience Daemon) and NGINX for telemetry data collection. NVUE exposes telemetry data through REST APIs, and NGINX acts as a web server to serve these API requests. This enables seamless integration and eliminates the need for additional agents.
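To illustrate what agentless collection over a REST API looks like from the collector's side, here is a small Python sketch that builds an authenticated HTTPS request for an NVUE resource. The port (8765) and the `/nvue_v1/` path prefix are what NVIDIA's Cumulus Linux 5.x documentation describes, as far as we recall, and should be verified against your release. The host address, credentials, and the `nvue_request` helper are hypothetical, and this is not ONES's own collector code.

```python
import base64
import urllib.request


def nvue_request(host: str, resource: str, user: str, password: str,
                 port: int = 8765) -> urllib.request.Request:
    """Build (but do not send) a GET request for an NVUE REST resource."""
    url = f"https://{host}:{port}/nvue_v1/{resource.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic authentication
            "Accept": "application/json",
        },
    )


# Placeholder switch address (documentation range) and credentials.
req = nvue_request("192.0.2.10", "interface", "monitor", "example-password")
```

Sending the request with `urllib.request.urlopen(req, context=...)` requires an SSL context that trusts the switch's certificate. The request is only constructed here so the example has no network side effects.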
Real-World Insights
Live Dashboard View: Real-time visibility into device performance and health metrics.
RoCE Telemetry: Detailed tracking of PFC packets and queue performance, crucial for optimizing RDMA traffic.
Unified Monitoring Experience: A consistent monitoring platform for both SONiC and Cumulus Linux devices, simplifying network management.
Advanced Rule Engine for Proactive Monitoring
ONES 3.1 integrates an advanced Rule Engine that enhances network management by providing automated alerts and notifications. This feature allows administrators to:
Define Custom Rules for monitoring critical Cumulus device metrics.
Receive Real-Time Alerts via Slack, Zendesk, and other integrations.
AI/ML Topology Visualization
ONES provides comprehensive topology visualization with full support for Cumulus devices. Users can:
Monitor AI/ML Fabric for performance optimization.
Visualize and manage network connections in data center environments.
Benefits of Deploying ONES with Cumulus Devices
Implementing ONES within a Cumulus-powered network infrastructure offers several advantages:
Unified Monitoring Platform: Organizations can now monitor both SONiC and Cumulus devices through a single pane of glass, streamlining operations and reducing complexity.
Enhanced Troubleshooting Capabilities: Detailed telemetry data accelerates the identification and resolution of network issues, minimizing downtime and improving service reliability.
Scalability: ONES is designed to handle the demands of large-scale networks, ensuring that as your infrastructure grows, your monitoring capabilities scale accordingly.
Security and Compliance: Comprehensive monitoring aids in maintaining security postures and ensuring compliance with industry standards by providing visibility into all network activities.
Enhanced Security: Detects anomalies and helps ensure compliance.
Optimized Performance: Provides RoCE visibility and advanced traffic analysis.
Conclusion
ONES sets a new standard for network observability, delivering end-to-end visibility for Spectrum-X platforms. With agentless telemetry, extensive metrics coverage, and unified monitoring, it empowers organizations to optimize network performance, security, and operational efficiency.
FAQs
1. What is end-to-end observability in Spectrum-X networks and why is it important?
End-to-end observability refers to the ability to monitor data flow and network health from source to destination across the entire infrastructure. In Spectrum-X environments, this ensures reduced latency, faster troubleshooting, and better performance tuning—especially vital for AI/ML workloads and RDMA (RoCE) traffic.
2. How does ONES enable agentless telemetry for Cumulus Linux-based Spectrum-X switches?
ONES collects telemetry through the REST APIs exposed by NVUE (NVIDIA User Experience Daemon), with NGINX serving those API requests, eliminating the need for extra agents. This streamlines deployment while ensuring real-time visibility into Cumulus devices running versions 5.9, 5.10, and 5.11.
3. Can ONES monitor both SONiC and Cumulus devices from a single dashboard?
Yes. ONES 3.1 offers unified observability across SONiC and Cumulus Linux devices through a single interface—simplifying network monitoring in hybrid, multi-vendor environments and enabling consistent rule-based alerts and insights.
4. How does ONES support RoCE traffic visibility for optimizing GPU clusters?
ONES provides detailed metrics on Priority Flow Control (PFC) and queue-level performance, enabling visibility into RoCE packet flows. This is critical for achieving lossless communication in GPU-driven AI clusters and fine-tuning fabric behavior.
5. What are the key benefits of integrating ONES with NVIDIA Spectrum-X for enterprise networks?
Unified network monitoring across vendors
Real-time alerts with an advanced Rule Engine
Visual topology for AI/ML fabrics
Better compliance through complete traffic visibility
Scalability to support growing data center demands
SONiC (Software for Open Networking in the Cloud) Network Operating System (NOS) has seen a huge surge in its popularity and adoption in the last few years. Originally developed by Microsoft and subsequently open-sourced, it offers a versatile, modular, and hardware-agnostic platform that decouples the switching hardware from the software running on it. This allows the much-needed flexibility in networking hardware choices for the enterprise. The openness of SONiC and the customer demands have prompted a majority of hardware vendors to support SONiC on their switches. While SONiC has become a leading choice for data center networking, and major hardware vendors are supporting it on their platforms, this disaggregation is leading to a fair bit of chaos as organizations deploy hardware sourced from multiple vendors with different flavors of SONiC running on them.
SONiC Market Revenue by Customer Type (by 650 Group)
In this blog, our goal is to bring clarity to what it means to deploy, operate, and support SONiC and its various flavors on different platforms, understand the SONiC support ecosystem, and navigate the ecosystem to utilize the options best suited to your organization.
NetOps & SONiC Support Landscape
Aviz ONES Covers SONiC NOS Support and SONiC NetOps Support for a Single Price
Operating and supporting SONiC on multiple hardware platforms involves a combination of strategies due to its open-source nature and the disaggregated model it promotes. Here’s what organizations should know before making decisions for NetOps tools and Support options available for SONiC.
Proprietary Single Vendor NetOps: relies on a network operations model where all solutions come exclusively from one vendor. While this ensures tight integration and often simplifies network management, it leads to vendor lock-in. Many vendors have integrated SONiC with their existing NetOps tools, but organizations may face constraints with adaptability, potential high costs for changes, and dependency on the vendor’s roadmap.
Proprietary Single Vendor Support: refers to a support model in which all assistance for the hardware and software comes exclusively from the original vendor. Much as they have integrated existing NetOps tools with SONiC, many vendors support SONiC on their platforms. While this may ensure deep knowledge of the hardware and software SONiC is running on, it can limit flexibility, and organizations may be bound by the vendor’s support hours and policies.
Disaggregated Multi-vendor NetOps: represents a shift from traditional, monolithic network operations to a more flexible model where network components, both hardware and software, can be sourced from various vendors. This approach relies on unified multi-vendor software solutions for optimized network operations and cost savings. Vendors who previously created unified tools have also started integrating SONiC, but such tools lack the depth in their SONiC integrations. So far, only Aviz has taken the approach of developing a purpose-built solution for multi-vendor SONiC NetOps.
Disaggregated Multi-vendor Support: refers to a support model in which the assistance comes from a third-party vendor who has deep knowledge of the software and hardware components across all the vendors used in a network infrastructure. Providing support across multiple vendors introduces complexity in coordination and communication, which is why it is critical that the third-party vendor has back channels established into all the original vendors, with tight SLAs. Again, so far, Aviz is the only vendor that has established well-defined SLAs with the vast majority of Platform and ASIC vendors to create those back channels.
When executed right, a Disaggregated Multi-vendor NetOps and Support structure for SONiC can yield huge cost-savings, minimize vendor lock-in risks, and allow organizations to adapt more flexibly to the technological advancements in the networking domain.
Unpacking & Navigating Multi-vendor SONiC Deployments
SONiC presents a transformative approach to networking, offering a modular, adaptable, cost-effective, and forward-looking solution to meet the evolving needs of next generation network infrastructures. When it comes to evaluating options for SONiC NetOps and Support, we recommend:
Understanding Hardware Compatibility for SONiC: ensure that all hardware platforms you’re selecting are SONiC compatible, and make sure you are evaluating the platforms in context of the feature sets you plan to utilize.
Narrowing Down the Builds for SONiC: some vendors offer custom SONiC builds optimized for their hardware.
Identifying a Normalized NetOps Stack for SONiC: for efficient network operations, you will need to identify tools that not only understand SONiC’s underlying architecture, but also the various nuances that get introduced by various flavors of SONiC. Only tools that are purpose-built for SONiC can handle such nuances, since they are designed to normalize the data before delivering functionalities.
Identifying a Neutral SONiC Support Partner: vendors who provide SONiC compatible hardware also offer support, but that is typically limited to the support for SONiC on their own hardware and their custom SONiC builds. Third-party providers on the other hand generally focus on supporting the community distro. The key is in identifying vendors who not only have good experience with SONiC on multiple platforms, but also deep relationships with individual Platform and ASIC vendors that you select for your infrastructure.
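The normalization point above can be illustrated with a small Python sketch: two SONiC flavors report the same facts under different field names, and a mapping layer converts both into one common schema before any tool logic runs. The flavor labels and field names here are invented for illustration and do not describe any real SONiC build or the ONES data model.

```python
from typing import Any, Dict

# Hypothetical per-flavor field names mapped to one common schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "flavor_a": {"oper_status": "oper_state", "speed": "speed_mbps"},
    "flavor_b": {"operStatus": "oper_state", "portSpeed": "speed_mbps"},
}


def normalize(flavor: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename flavor-specific fields to the common schema; drop unknown ones."""
    mapping = FIELD_MAPS[flavor]
    return {common: raw[source] for source, common in mapping.items() if source in raw}


a = normalize("flavor_a", {"oper_status": "up", "speed": 100000})
b = normalize("flavor_b", {"operStatus": "up", "portSpeed": 100000})
```

After normalization both records are identical, so dashboards and rules can be written once against the common schema.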
Aviz ONES & Multi-vendor SONiC Support
Aviz Networks is a leading provider of open-source SONiC solutions, dedicated to revolutionizing the networking landscape with our unique approach. We firmly believe in SONiC’s potential as the future of networking and focus on delivering purpose-built tools and solutions specifically designed for its ecosystem.
With over 100 man-years of experience across various hardware platforms, Aviz Networks has developed its flagship platform, Open Networking Enterprise Suite (ONES). ONES is designed to work seamlessly with any SONiC on any platform, regardless of the underlying ASIC.
As a testament to our commitment to open networking, we have established strong relationships with the majority of platform, ASIC, and OS vendors. Businesses worldwide rely on our highly skilled SONiC support team, along with ONES, to effectively deploy, operate, and support SONiC across diverse hardware environments.
Experience the Power of ONES and the ONE Center!
Aviz Networks invites you to experience firsthand the transformative power of ONES and our Multi-vendor SONiC Support. Visit the state-of-the-art Open Networking Experience Center (ONE Center), either online or in person, and discover the possibilities of SONiC across a wide range of hardware. Take advantage of a free, hands-on experience and explore how SONiC can optimize your network operations. Before your SONiC deployment, test across well-known hardware, platform, ASIC, and OS environments, including Cisco SONiC, NVIDIA SONiC, Celestica SONiC, Marvell SONiC, Wistron SONiC, Edgecore Community SONiC, Arista SONiC, Supermicro SONiC, Enterprise SONiC, and DELL SONiC.
Schedule your demo today and unleash the potential of your network!
Benefits of Aviz Solutions:
Open-source based: Leverages the flexibility and scalability of SONiC
Platform-agnostic: Works with any SONiC on any platform, with any underlying ASIC
Comprehensive solutions: Offers a complete suite of tools for deployment, operation, and support
Expert support: Provides access to a highly qualified SONiC support team
Free trial: Experience the capabilities of ONES and the ONE Center firsthand
Contact Aviz Networks today to learn more about our innovative solutions and unlock the potential of SONiC for your business.
FAQs
1. What is the difference between SONiC Network Operating System (NOS) support and SONiC NetOps support?
SONiC NOS support focuses on maintaining and troubleshooting the network operating system itself — the software layer running on switches. SONiC NetOps support, on the other hand, covers broader operational tasks like monitoring, managing, automating, and optimizing multi-vendor SONiC environments to ensure end-to-end network health and performance.
2. Why is multi-vendor NetOps support important for SONiC deployments?
Multi-vendor NetOps support is critical because SONiC deployments often involve switches and hardware from multiple vendors. A neutral NetOps solution ensures unified visibility, normalized operations across different SONiC builds, and faster troubleshooting, and it eliminates vendor lock-in risks, leading to greater flexibility and operational efficiency.
3. What are the risks of relying only on single-vendor proprietary SONiC support?
Single-vendor proprietary support can limit flexibility, increase costs over time, and tie organizations to that vendor’s roadmap and timelines. It may also lack comprehensive visibility or operational tooling when managing diverse hardware ecosystems, which is crucial in open networking environments powered by SONiC.
4. How does Aviz ONES simplify SONiC NetOps and support across multiple platforms?
Aviz ONES provides a platform-agnostic, purpose-built solution for SONiC operations. It offers normalized NetOps tooling, seamless integration across different hardware, and access to expert SONiC support teams. ONES ensures a consistent operational experience regardless of the underlying switch vendor or ASIC type.
5. How can enterprises evaluate and test SONiC across different hardware platforms before deployment?
Enterprises can leverage Aviz Networks’ Open Networking Experience Center (ONE Center) to test SONiC on various hardware platforms like Cisco, NVIDIA, Supermicro, Marvell, Edgecore, and more. This hands-on testing enables organizations to validate compatibility, performance, and operational workflows before committing to large-scale deployments.
6. What challenges do organizations face when deploying SONiC across multiple hardware vendors?
Organizations often face challenges like hardware-software compatibility issues, different SONiC build variations, lack of unified monitoring tools, fragmented support models, and coordination complexities between vendors. These can lead to operational inefficiencies if not addressed with a consolidated NetOps and support strategy.
7. Why is hardware compatibility critical when planning a SONiC-based network deployment?
Hardware compatibility ensures that the selected switches fully support SONiC features needed for production environments. Without proper compatibility evaluation, organizations risk feature gaps, unstable operations, and complicated troubleshooting during Day 2 operations, impacting network reliability and uptime.
8. How does Aviz Networks ensure seamless Day 2 operations for SONiC deployments?
Aviz Networks offers deep multi-vendor SONiC expertise, automated monitoring and troubleshooting via ONES, and SLA-backed support models. Their solutions handle platform-specific nuances, automate recovery processes, and provide proactive network insights, ensuring smooth Day 2 operations across complex SONiC deployments.
9. What makes Aviz ONES different from traditional network monitoring tools for SONiC?
Unlike traditional tools that treat SONiC as a generic device, Aviz ONES is purpose-built for SONiC. It normalizes data from various SONiC flavors, deeply understands SONiC’s architecture, supports multi-vendor environments, and integrates operational automation tailored to SONiC-specific workflows and protocols.
10. Can SONiC be a reliable alternative to proprietary NOS for enterprise data centers?
Yes, SONiC has proven to be a highly reliable, scalable, and flexible alternative to proprietary NOS solutions. With successful deployments across hyperscalers and Fortune 500 companies, combined with strong open-source community support and multi-vendor backing, SONiC is a future-proof choice for modern enterprise data centers.
Optimizing Data Center Networks: The Role of IP Clos Architecture and BGP Protocol
In contemporary data center networks, the IP Clos architecture (an IP fabric built on the multistage Clos topology, typically realized as leaf and spine tiers) is widely embraced for its ability to deliver a non-blocking high-bandwidth network fabric, low latency, and scalable connectivity between servers and switches, while ensuring fault tolerance. Central to the success of the IP Clos architecture is the use of the Border Gateway Protocol (BGP) as the routing protocol.
BGP stands out due to its features in traffic engineering, scalability, and adaptable routing, making it well-suited for the demands of modern data center environments. BGP is the protocol responsible for orchestrating internet routing, optimizing path selection through Autonomous Systems (AS), peering mechanisms, and configurable attributes.
The IP Clos network architecture is not a recent development, having been in existence for a decade and extensively implemented in large-scale data centers. It has been deployed using proprietary Network Operating Systems (NOS) and switches from specific vendors, showcasing its enduring relevance and effectiveness in meeting the evolving needs of data center networking.
Key Features of SONiC That Make It Stand Out from Traditional NOS
Our Journey with SONiC
“When I joined Aviz Networks, I was excited about its vision and approach to enabling IP Clos architecture through open source, any vendor, and any switch combination. Being an engineer, I am all praises for open source for three reasons. First, it’s free; second is community support. And the third reason is my ability to innovate and modify it to my needs without going through any red tape associated with a vendor-locked OS,” said Khurram Khani, VP of Customer Success, Aviz Networks.
SONiC is the first true open-source NOS (Network Operating System) that employs a cutting-edge, microservices-based architecture far more capable than traditional network operating systems. SONiC has amassed a large ecosystem of developers, not only in the community but also within a majority of switch and ASIC vendors, who embrace the technology due to sheer customer demand.
The Power of Open Source: Optimizing Production Network SLAs
With the great power of open source comes greater responsibility on the shoulders of network operations teams. SONiC, although an open-source NOS (network operating system), is held to the same Service Level Agreements (SLAs) that guarantee a certain level of performance, availability, uptime, and latency. One of our initial goals was to ensure that our customers are protected, so we put the right processes and automation in place to deliver well-rounded SLAs.
How has Aviz Networks addressed concerns about SONiC’s ability to support Day 2 operations?
We met several customers who shared their doubts about SONiC being open source. The most common one: will it be resilient, scalable, and as well supported as the decades-old proprietary vendor switches? Aviz Networks, with its expertise in networking, puts these concerns to rest.
At Aviz, we have helped Fortune 500 companies at each step of their SONiC deployment journey, from vendor selection, use-case validation, pre-staging, and staging to production Day 2 support. In fact, SONiC has already been successfully deployed at nearly all hyperscalers, including Microsoft Azure, Alibaba, Tencent, and Baidu, while Google and Meta have joined the board.
A Guide to Efficient Day 2 Network Operations: Monitoring and Maintenance
Day 2 network operation refers to ongoing monitoring, planned and unplanned maintenance, and operation efficiency. This covers all the activities after the network goes into production and is live.
Here’s an example from one of the Fortune 500 companies we worked with, a game developer in the US. This company’s data center was designed around IP Clos and BGP. As part of its ongoing BGP network maintenance, the company shared its expectations and requirements for SONiC Day 2 BGP operations tasks.
The customer asked us to certify SONiC’s BGP Day 2 Operations capabilities around:
BGP Node Maintenance – The customer expected to gracefully take a node out of service without any impact on the existing network and data traffic, and to reconverge the network quickly.
BGP Link Maintenance – The next requirement was to take a link out of service without any impact on the existing network and data traffic, and to reconverge the network quickly.
Network SLA – The final requirement was an assured, guaranteed SLA for network reconvergence.
Day 2 BGP operation: Benefits of Using Community List and Route-Map for Node and Link Drain
BGP (Border Gateway Protocol) is often used for node and link drain as it provides a mechanism for the controlled removal of routes from a network. This helps to manage rerouting in a controlled manner when a node or link is being drained.
BGP can be used to gradually decrease the amount of traffic flowing through that node or link, while ensuring that the remaining traffic is still able to reach its destination. This is accomplished by updating BGP routing tables to reflect the new topology of the network.
Aviz Networks Ensures Smooth BGP Node Removal on SONiC BGP switch
Aviz Networks has built industry-leading automation around SONiC BGP node drain process validation. The BGP nodes are gracefully taken out of the network without any disruption to traffic. Aviz FTAS Automation also ensures that the network converges within SLA and has zero traffic loss.
Our team performs the following automated steps during BGP node drain validation on SONiC switches
BGP Community list
The BGP Community list is used to tag routes. It includes a community value that can be used to identify routes that will be redirected.
The “no-advertise” community is used in scenarios where a BGP node receives a route that it should not advertise to other BGP peers. At Aviz Networks, we use the “no-advertise” community during validation when performing graceful removal of a BGP node or link.
BGP Route-map
Route-map is used to match routes that will be redirected. This route-map should match the community value that was added to the routes.
Example of Node/ Link Drain Config on SONiC Router
route-map drain-community permit 10
on-match next
set community no-advertise
set ipv6 next-hop prefer-global
exit
router bgp <AS #>
address-family ipv4 unicast
neighbor v4server route-map drain-community in
address-family ipv6 unicast
neighbor v6server route-map drain-community in
end
Finally, the physical connectivity of the node can be removed, either by shutting down the router or link or by unplugging the physical link.
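The drain configuration above lends itself to templating when the same steps are repeated across many nodes. The following is a minimal Python sketch, not part of FTAS or SONiC, that renders the same command sequence for a given AS number. The function name render_drain_config and its parameters are hypothetical; the peer-group names v4server and v6server and the route-map name come from the example above.

```python
def render_drain_config(asn: int,
                        v4_peer: str = "v4server",
                        v6_peer: str = "v6server",
                        route_map: str = "drain-community") -> str:
    """Render the node/link drain commands from the example as one string.

    Hypothetical helper: it only formats text and does not talk to a switch.
    """
    lines = [
        # Route-map that tags every matched route with no-advertise
        f"route-map {route_map} permit 10",
        "on-match next",
        "set community no-advertise",
        "set ipv6 next-hop prefer-global",
        "exit",
        # Apply the route-map inbound on both address families
        f"router bgp {asn}",
        "address-family ipv4 unicast",
        f"neighbor {v4_peer} route-map {route_map} in",
        "address-family ipv6 unicast",
        f"neighbor {v6_peer} route-map {route_map} in",
        "end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # 65001 is an arbitrary private ASN used only for illustration
    print(render_drain_config(65001))
```

Keeping the rendering separate from delivery means the generated text can be reviewed or diffed before it is pushed to a device by whatever tooling is in use.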
Aviz FTAS ensures BGP Day 2 Operations meet Reconvergence SLA with Zero Traffic Loss
Aviz FTAS certifies BGP Day 2 operation and reconvergence times. This involves measuring BGP reconvergence time, verifying zero traffic loss, identifying any deviations or breaches, and taking corrective action if required.
Convergence Validation using Traffic Gear
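One common way to turn traffic-generator counters into a convergence figure is to divide the number of lost packets by the constant transmit rate. The Python sketch below illustrates that arithmetic and an SLA check. It is an assumption about how such a check could be written, not a description of how FTAS measures convergence, and the names TrafficResult, convergence_time_ms, and meets_sla are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrafficResult:
    """Counters from one test stream (hypothetical structure)."""
    tx_packets: int
    rx_packets: int
    tx_rate_pps: float  # constant transmit rate in packets per second


def convergence_time_ms(result: TrafficResult) -> float:
    """Estimate the outage duration from packet loss at a constant rate."""
    lost = result.tx_packets - result.rx_packets
    if lost < 0 or result.tx_rate_pps <= 0:
        raise ValueError("inconsistent counters or non-positive rate")
    return lost / result.tx_rate_pps * 1000.0


def meets_sla(result: TrafficResult, sla_ms: float) -> bool:
    """True when the estimated convergence time is within the SLA."""
    return convergence_time_ms(result) <= sla_ms


if __name__ == "__main__":
    sample = TrafficResult(tx_packets=1_000_000, rx_packets=999_500,
                           tx_rate_pps=100_000)
    print(convergence_time_ms(sample), meets_sla(sample, sla_ms=10.0))
```

For a graceful drain the expected result is zero lost packets, which this estimate reports as 0 ms.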
How to Perform Node Drain Using AS Path Prepend in BGP
AS Path is a BGP attribute that records the sequence of Autonomous System (AS) numbers a BGP route has traversed on its way to the target SONiC BGP router. When a BGP router sends an update to an external peer, it prepends its own AS number to the existing AS Path. Deliberately adding extra copies of an AS number to lengthen the path is called AS Path prepending. When a router sees its own AS number in a received route, it discards that route to prevent loops. If a destination has two paths that are otherwise equally preferred, the path with the shortest AS Path is chosen.
“set as-path prepend last-as <no. of times to insert>” lets users prepend the last (leftmost) ASN the given number of times. Prepending it 10 times lengthens the AS Path enough to influence the router to choose another available path.
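The effect of prepending on path choice can be illustrated with a small Python model. This is a deliberately simplified sketch: it models only loop detection and AS Path length, whereas real BGP best-path selection compares several other attributes first. The function names and AS numbers are hypothetical.

```python
from typing import Dict, List, Optional


def prepend_last_as(as_path: List[int], count: int) -> List[int]:
    """Mimic 'set as-path prepend last-as <count>': repeat the leftmost ASN."""
    if not as_path:
        return []
    return [as_path[0]] * count + list(as_path)


def has_loop(as_path: List[int], local_as: int) -> bool:
    """A route whose AS Path already contains the local AS is discarded."""
    return local_as in as_path


def best_path(candidates: Dict[str, List[int]], local_as: int) -> Optional[str]:
    """Pick the next hop with the shortest loop-free AS Path (simplified)."""
    valid = {nh: path for nh, path in candidates.items()
             if not has_loop(path, local_as)}
    if not valid:
        return None
    return min(valid, key=lambda nh: len(valid[nh]))


if __name__ == "__main__":
    paths = {"leaf1": [65001, 65100], "leaf2": [65002, 65100]}
    # Drain leaf1 by making its path 10 hops longer
    paths["leaf1"] = prepend_last_as(paths["leaf1"], 10)
    print(best_path(paths, local_as=65000))
```

After the prepend, the path through leaf1 has 12 entries against 2 for leaf2, so traffic shifts to leaf2 while leaf1 stays reachable as a fallback.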
Aviz Networks FTAS Topology for AS-Path validation
Example FTAS Configuration:
As part of the Fabric Test Automation Suite (FTAS) by Aviz Networks, we rigorously test configurations such as the one below to ensure robust functionality, combining the reliability of SONiC’s CLI with advanced testing methodologies:
route-map as-prepend permit 10
set as-path prepend last-as 10
exit
router bgp <ASN>
address-family ipv4 unicast
neighbor v4server route-map as-prepend in
address-family ipv6 unicast
neighbor v6server route-map as-prepend in
end
This sample configuration, involving route-map manipulation for AS-path prepending in BGP, is meticulously tested to guarantee the suite’s effectiveness in maintaining consistency and robustness in network operations.
How to Monitor BGP Sessions with Aviz ONES App
Periodic monitoring of BGP sessions between routers is critical to ensure that the sessions are established and maintained properly. This involves checking the status of BGP neighbors, monitoring BGP messages, and verifying that the expected routes are being exchanged.
Analyzing the BGP route advertisements received from neighboring routers helps to proactively identify anomalies, such as unexpected route flapping or neighbor resets. Monitoring the BGP routing table with the Aviz ONES App can help detect issues so that routing policies can be adjusted as needed.
BGP Monitoring using Aviz Network’s ONES App
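The kind of check described in this section can be sketched in a few lines of Python. The example assumes a summary shaped loosely like the JSON that FRR-based stacks print for a BGP summary, with a "peers" map carrying "state" and "pfxRcd" fields. Treat those field names, the sample data, and the function name unhealthy_peers as assumptions to verify against your own SONiC build. This is not the ONES App API.

```python
from typing import Dict, Tuple

# Hypothetical sample data, shaped like a per-address-family BGP summary
SAMPLE_SUMMARY = {
    "ipv4Unicast": {"peers": {
        "10.0.0.1": {"state": "Established", "pfxRcd": 12},
        "10.0.0.2": {"state": "Idle", "pfxRcd": 0},
    }},
    "ipv6Unicast": {"peers": {
        "fc00::1": {"state": "Established", "pfxRcd": 0},
    }},
}


def unhealthy_peers(summary: dict,
                    min_prefixes: int = 1) -> Dict[Tuple[str, str], str]:
    """Return peers that are not Established or receive too few prefixes."""
    problems = {}
    for afi, data in summary.items():
        for peer, info in data.get("peers", {}).items():
            state = info.get("state")
            received = info.get("pfxRcd", 0)
            if state != "Established":
                problems[(afi, peer)] = f"state={state}"
            elif received < min_prefixes:
                problems[(afi, peer)] = f"only {received} prefixes received"
    return problems


if __name__ == "__main__":
    for (afi, peer), reason in unhealthy_peers(SAMPLE_SUMMARY).items():
        print(afi, peer, reason)
```

Running such a check periodically and comparing results between runs is one simple way to spot neighbor resets and missing routes.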
Empowering Modern Network Operations: Aviz Networks’ SONiC BGP Expertise
In conclusion, Aviz Networks has proven SONiC’s BGP capabilities for effectively managing network Day 2 operations. We have demonstrated SONiC’s ability to handle complex BGP network topologies, BGP route maps, AS-path manipulation, and policy-based routing. SONiC is a reliable choice for managing and operating large-scale IP Clos BGP networks. With its widespread adoption by hyperscalers and Fortune 500 companies, it has demonstrated that it can meet the evolving demands of modern network operations: flexible routing, reliability, efficient connectivity, and scalability.
If you have inquiries regarding SONiC BGP or any other features, don’t hesitate to contact us. Our team is eager to engage with you at your convenience.
FAQs
1. Why is IP Clos architecture combined with BGP essential for modern data center networks?
IP Clos architecture paired with BGP ensures non-blocking, high-bandwidth connectivity with low latency and fault tolerance, making it ideal for scalable and resilient modern data center environments. BGP’s robust routing and traffic engineering capabilities allow for efficient path selection and redundancy across large-scale infrastructures.
2. How does SONiC enhance traditional IP Clos architecture compared to proprietary Network Operating Systems (NOS)?
SONiC, as a fully open-source, microservices-based NOS, brings vendor neutrality, flexibility, and community-driven innovation to IP Clos networks. Unlike traditional vendor-locked NOS, SONiC empowers customization, scalability, and seamless integration across multi-vendor hardware ecosystems, improving operational efficiency.
3. What strategies does Aviz Networks use to ensure smooth Day 2 BGP operations on SONiC switches?
Aviz Networks uses automation-driven approaches such as BGP node and link drain validation, SLA-backed convergence tests, route-map and community list configurations, and AS-path prepending techniques. These ensure minimal disruption, zero traffic loss, and rapid network reconvergence during Day 2 operations on SONiC-powered BGP networks.
4. How does Aviz Networks' FTAS framework validate BGP Node Drain and AS-Path Prepending on SONiC?
Aviz Networks’ Fabric Test Automation Suite (FTAS) rigorously validates BGP node drain and AS-path prepending by simulating real-world traffic scenarios. FTAS ensures graceful node removals, optimal reconvergence times, traffic rerouting without packet loss, and consistency in route advertisement behaviors under production-like conditions.
5. How can the Aviz ONES App help monitor and maintain BGP sessions in SONiC-based networks?
The Aviz ONES App enables real-time monitoring of BGP sessions, neighbor relationships, route advertisements, and network health. It helps detect anomalies like route flapping, neighbor resets, and policy misconfigurations, empowering NetOps teams to proactively maintain network stability and meet SLA expectations.
6. What are the benefits of using open-source SONiC NOS over proprietary networking solutions?
Open-source SONiC offers flexibility, cost savings, and faster innovation compared to proprietary NOS. It allows operators to customize features, integrate across any hardware, avoid vendor lock-in, and tap into a vast global community for ongoing improvements — making it ideal for modern, scalable data center designs.
7. What challenges do network teams face during Day 2 operations, and how does Aviz address them for SONiC environments?
Challenges include ensuring high availability during maintenance, minimizing traffic loss, achieving fast convergence, and maintaining SLA commitments. Aviz addresses these by offering automation tools, proactive monitoring through ONES App, SLA-certified FTAS testing, and deep Day 2 operational support for SONiC deployments.
8. How do BGP Community Lists and Route-Maps help in controlled node and link maintenance?
BGP Community Lists and Route-Maps enable graceful node or link drain by tagging and controlling route advertisements. This method ensures minimal disruption during maintenance activities, allowing traffic to reroute intelligently without impacting active data flows, ensuring consistent network availability.
9. How is BGP reconvergence validated after node or link failures in SONiC networks?
Aviz FTAS uses traffic generators and telemetry monitoring to validate BGP reconvergence. Metrics like convergence time, packet loss, and route update consistency are measured after node or link failures, ensuring that network stability is quickly restored within pre-defined SLA thresholds.
10. Why is AS Path Prepending used during BGP node drain operations in SONiC?
AS Path Prepending artificially increases the AS path length, making specific routes less preferred during BGP node drain. This strategy allows traffic to gradually shift away from the targeted node or link, ensuring a seamless rerouting process without abrupt traffic interruptions.
We’re excited to introduce ONES 2.0, a cutting-edge network operations tool that sets new standards for Visibility, Orchestration, and Support. This release represents a significant leap in our ongoing commitment to pushing the boundaries of network management capabilities and orchestration.
Brimming with groundbreaking features, a polished user interface, improved data center orchestration through incremental configuration, upgraded NetOps APIs for seamless integration, and robust functionalities, ONES 2.0 empowers network teams to effortlessly streamline operations, delve into deeper insights, and ensure peak performance.
Get ready to embark on a journey where innovation meets excellence, as ONES 2.0 empowers your network endeavors with unparalleled sophistication and efficiency. It’s not just a tool; it’s a game-changer, designed to elevate your network experience to unprecedented levels. Welcome to ONES 2.0!
Figure 1: ONES 2.0 homepage
Unlocking Data-Driven Success: 6 Key Benefits of Enhanced Telemetry
1. Data Center Interconnect Visibility
Gain visibility into Data Center Interconnect topologies for Layer-2 Leaf-Spine and Rack-to-Rack connectivity using EVPN-VXLAN and MC-LAG, including control plane health and configurations.
2. Rule Engine and Alerts
Experience a robust Rule Engine and Alerts system integrated with Slack messaging and Zendesk ticketing. Monitor platform metrics, health, traffic bandwidth, and more. Stay proactive with customizable alerts and automatic ticketing.
3. RoCE Visibility
Gain visibility into RoCE (RDMA over Converged Ethernet) metrics and links for improved performance monitoring, and visualize traffic flows on the topology page.
Figure 4: RoCE Traffic & Configuration
4. Firmware Compliance
Stay up-to-date with detailed firmware information for each switch, covering ONIE, BMC, BIOS, FPGA, and CPLD versions.
Figure 5: Firmware Compliance
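A firmware compliance check boils down to comparing each switch’s reported versions against an approved baseline. The Python sketch below shows that comparison for the components listed above. The baseline values, the sample report, and the function name firmware_drift are hypothetical and only illustrate the idea; they do not reflect how ONES stores or evaluates firmware data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical approved versions for one hardware SKU
BASELINE = {
    "ONIE": "2023.05",
    "BMC": "3.12",
    "BIOS": "1.4.2",
    "FPGA": "0.9",
    "CPLD": "7",
}


def firmware_drift(reported: Dict[str, str],
                   baseline: Dict[str, str]
                   ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each non-compliant component to (expected, actual).

    A component missing from the report is listed with actual=None.
    """
    drift = {}
    for component, expected in baseline.items():
        actual = reported.get(component)
        if actual != expected:
            drift[component] = (expected, actual)
    return drift


if __name__ == "__main__":
    switch_report = {"ONIE": "2023.05", "BMC": "3.10", "BIOS": "1.4.2",
                     "FPGA": "0.9"}
    print(firmware_drift(switch_report, BASELINE))
```

An empty result means the switch matches the baseline for every tracked component.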
5. Enhanced Supportability Functions
Syslog Management: Simplify troubleshooting by collecting logs from devices with a single click in the UI, directly from the topology page.
Console Access and Inventory Download: Seamlessly access the console of switches directly from the ONES interface, simplifying device management. Effortlessly download the inventory details of all devices for efficient asset management.
6. Network SLA: Packet Loss & Latency
ONES 2.0 incorporates exclusive back-end support for Network SLA, enabling users to monitor packet loss and latency between any two end-points, measured using ICMP or TCP.
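To make the two SLA metrics concrete, here is a minimal Python sketch of a TCP-based probe and a summary of its results. It uses only the standard library and is an illustration of the general technique, not the ONES back-end implementation. The function names tcp_probe and summarize_probes are hypothetical.

```python
import socket
import time
from statistics import mean
from typing import List, Optional


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if it failed."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None  # counted as a lost probe


def summarize_probes(rtts_ms: List[Optional[float]]) -> dict:
    """Compute packet loss and latency statistics from probe results."""
    if not rtts_ms:
        raise ValueError("no probes were sent")
    answered = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(answered)
    return {
        "loss_pct": 100.0 * lost / len(rtts_ms),
        "avg_ms": mean(answered) if answered else None,
        "max_ms": max(answered) if answered else None,
    }


if __name__ == "__main__":
    # Canned results keep the example offline; None marks a lost probe
    print(summarize_probes([10.0, None, 20.0, 30.0]))
```

In practice the list passed to summarize_probes would come from repeated tcp_probe calls between the two endpoints being monitored.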
How does NetOps Ready Orchestration enhance enterprise deployments?
Incremental Config
ONES 2.0 adds support for incremental configuration changes. You may now deploy a template to configure your fabric and progressively update VLANs/VNIs across the fabric. This agility makes your network designs more adaptable.
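Conceptually, an incremental update pushes only the difference between what the fabric currently has and what the new template asks for. The Python sketch below computes that difference for VLAN-to-VNI mappings. It is an illustration under that assumption, not a description of how ONES computes or applies changes, and the function name and sample mappings are hypothetical.

```python
from typing import Dict


def incremental_changes(current: Dict[int, int],
                        desired: Dict[int, int]) -> dict:
    """Diff two VLAN -> VNI mappings into add, remove, and update sets."""
    add = {vlan: vni for vlan, vni in desired.items() if vlan not in current}
    remove = {vlan: vni for vlan, vni in current.items()
              if vlan not in desired}
    update = {vlan: (current[vlan], vni) for vlan, vni in desired.items()
              if vlan in current and current[vlan] != vni}
    return {"add": add, "remove": remove, "update": update}


if __name__ == "__main__":
    running = {10: 10010, 20: 10020, 30: 10030}
    template = {10: 10010, 20: 20020, 40: 10040}
    print(incremental_changes(running, template))
```

Only VLAN 40 is added, VLAN 30 is removed, and VLAN 20 is remapped; VLAN 10 is left untouched, which is the point of an incremental change.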
Improved NetOps API
The NetOps API has been improved in 2.0 to accommodate several underlay configurations, including seamless underlay and overlay configuration. This not only covers a broader range of use cases but also handles more deployment scenarios, such as L2LS and L3LS, along with flexible configuration and elastic scaling of spine and leaf nodes.
New Use Cases & Features
L3 MC-LAG: While L2 MC-LAG was already supported, ONES 2.0 adds L3 MC-LAG. This provides redundancy, load balancing, scalability, and easier management to meet diverse networking challenges. Improve your network’s performance and reliability across a wide range of use cases.
Layer2 Leaf-Spine (L2/L3 Mode): Designed to meet the escalating demands of modern data centers, the Layer2 Leaf-Spine architecture offers low-latency, high-bandwidth connectivity with redundancy and efficient traffic distribution.
Rack-2-Rack Deployment: A deployment scenario tailored for fabrics that interconnect leaf devices exclusively, eliminating the need for spines. This streamlined configuration suits specific network architectures.
BGP Peering over LAG: Enable higher bandwidth, load balancing, and redundancy with this configuration use case. Multiple physical links are aggregated into a logical bundle, known as a PortChannel, optimizing network performance.
BGP Peering in MC-LAG Environments: Configuring BGP peering among MC-LAG peers over the PeerLink ensures seamless operations during uplink failures, while additional interface peering expands network capabilities for optimized traffic handling and heightened resilience.
sFlow & DHCP Relay Support: The ONES orchestration tool’s recent support for sFlow and DHCP Relay within the data center unlocks a new level of network management.
Console Log: ONES 2.0 provides administrators with a centralized console plane from which they can monitor network activity for various devices. This is useful for tracking the progress of an operation and recording logs in case of problems.
Figure 6: Console Log
Monitoring Configuration & Operational Status: In our latest release, track orchestration progress seamlessly through our intuitive GUI. Gain real-time visibility into the progress of operational validation, empowering you to monitor every stage of your network’s deployment and verification with ease.
Figure 7: ONES UI Config and Operational Status
Explore the power and versatility of ONES 2.0 for SONiC with these exciting new features and UI enhancements. Elevate your network orchestration and management to new heights!
Conclusion
In summary, ONES 2.0 represents a significant advancement in network operations and management, establishing new standards in visibility, orchestration, and support. Packed with innovative features, an enhanced user interface, and expanded capabilities, ONES 2.0 empowers network teams to streamline operations, gain profound insights, and effortlessly ensure peak performance.
Stay tuned for our upcoming blog series, which will cover the following topics:
Rule Engine, Alerts, and Notifications
RoCE Traffic Visibility in AI Fabric
Detailed Security Compliance with ONES
In-depth Analysis of NWSLA Measurement
Immerse yourself in the transformative capabilities of ONES 2.0 for SONiC, and join us on a journey toward seamless network monitoring and orchestration. Unlock the ONES 2.0 experience—schedule a demo on your preferred date, and let us show you how it’s done!
FAQs
1. What makes ONES 2.0 a game-changer for SONiC network operations?
ONES 2.0 revolutionizes SONiC network management with enhanced Visibility, Orchestration, and Supportability. It introduces incremental configuration, upgraded NetOps APIs, advanced telemetry, and rule-based alerts—allowing network teams to streamline operations, gain deeper insights, and optimize performance seamlessly.
2. How does ONES 2.0 improve network visibility and telemetry?
ONES 2.0 offers AI-Fabric control plane insights, RoCE traffic visibility, and Data Center Interconnect telemetry. It provides a real-time topology view, enriched metrics on health, traffic, and capacity, and deep insights into firmware compliance and device performance—empowering proactive network monitoring.
3. What orchestration and automation features does ONES 2.0 introduce?
ONES 2.0 enhances orchestration with YAML-based templates, incremental configuration updates, and automated L2/L3 fabric management. It also introduces seamless BGP peering, MC-LAG redundancy, and elastic scaling for leaf-spine architectures—optimizing network deployment efficiency.
4. How does ONES 2.0 enhance security and compliance in SONiC environments?
ONES 2.0 strengthens security with RBAC enforcement, mutual TLS certificates, and LDAP integration. It ensures firmware compliance by tracking ONIE, BMC, BIOS, FPGA, and CPLD versions—helping enterprises maintain a secure and standardized network infrastructure.
5. What are the key supportability features in ONES 2.0?
ONES 2.0 simplifies troubleshooting with syslog management, centralized console access, and inventory downloads. It also supports Network SLA monitoring, allowing users to measure packet loss and latency between endpoints via ICMP or TCP, ensuring optimal network health.
6. How does ONES 2.0 handle firmware compliance across multi-vendor environments?
ONES 2.0 provides comprehensive firmware visibility for each switch, tracking critical components like ONIE, BMC, BIOS, FPGA, and CPLD versions. This enables consistent firmware governance across multi-vendor deployments, helping enterprises maintain standardized and compliant infrastructure with ease.
7. What new use cases are supported in ONES 2.0 for modern data centers?
ONES 2.0 introduces support for L3 MC-LAG, BGP peering over MC-LAG and PortChannel, Layer2 Leaf-Spine topologies, and Rack-to-Rack deployments. These configurations address growing needs for scalability, redundancy, and simplified architecture, making the platform ideal for modern, elastic network fabrics.
8. How does ONES 2.0 help monitor packet loss and latency for SLA enforcement?
ONES 2.0 includes backend support for Network SLA monitoring, allowing administrators to measure packet loss and latency between endpoints using ICMP or TCP probes. These insights enable proactive SLA assurance and help fine-tune network performance to meet business-critical requirements.
9. What is the significance of console access and syslog management in ONES 2.0?
ONES 2.0 introduces centralized console access directly from the topology UI, enabling faster troubleshooting. With programmatic syslog collection, admins can gather logs from switches in one click, drastically reducing MTTR and improving operational visibility across the network.
10. How does ONES 2.0 enable real-time operational tracking and orchestration status?
ONES 2.0 offers intuitive GUI-based orchestration tracking, displaying configuration progress and operational validation in real time. This helps network teams visually monitor every stage of deployment, verify configurations instantly, and gain confidence in changes before production rollout.
ONES Rule Engine is an advanced feature that enhances your network management experience by providing a seamlessly integrated alert and notification system. It offers comprehensive monitoring metrics and allows you to create device and interface level rules with ease. With ONES Rule Engine, you can have tailored control over your network management. Upgrade your network management game and experience with ONES Rule Engine today!
10 Benefits of Using ONES Rule-Engine for Comprehensive Network Monitoring
Comprehensive Monitoring: ONES Rule-Engine takes a holistic approach to network monitoring by keeping an eye on diverse metrics such as CPU utilization, memory utilization, PSU status, fan speed, RX/TX, and more. This breadth of coverage ensures that no aspect of your network goes unnoticed, providing a comprehensive view for proactive issue resolution.
Device and Interface Level: It allows the creation of rules at both device and interface levels. This fine-grained rule management ensures that specific devices or interfaces can be targeted for rule application, allowing for a tailored approach to network optimization and issue handling.
Rule Customization: Rule-Engine understands the unique requirements of different network components. With device-level rules based on Hardware SKU, Role, and OS version, administrators can fine-tune alerts to align with the specific characteristics of their network infrastructure.
Figure 1: Rule Configuration
Device Inclusion & Exclusion: Flexibility is key in network management. The rule engine provides the capability to include or exclude devices from rules, ensuring that it caters to the specific needs of your network architecture. This feature enables a dynamic response to changes in the network environment.
Severity-Based Alerting: The Rule-Engine facilitates the creation of Critical and Warning severity alerts, allowing administrators to prioritize responses based on the urgency and impact of potential issues. This hierarchical alerting system ensures that critical problems are addressed promptly, minimizing downtime and optimizing network performance.
Alert Summary for Collaborative Issue Resolution: The system enables users to generate a comprehensive report of all alerts, facilitating effortless sharing with the team. This feature simplifies the collaborative resolution process, promoting efficient communication and knowledge transfer among team members.
Figure 2: Alert Summary
Integration with Slack for Real-Time Notifications: ONES’ Slack integration ensures that critical alerts are delivered directly to designated channels, keeping teams informed and in sync. Additionally, weekly Slack digests provide a comprehensive overview of alerts and Zendesk ticket details, streamlining communication and collaboration.
Zendesk Integration for Streamlined Ticketing: The rule engine seamlessly integrates with Zendesk, automating the creation of tickets based on alerts. This integration simplifies the ticketing process, providing a centralized platform for tracking and managing network issues.
Preventing Redundant Alerts for Efficient Alerting: During rule creation, administrators can specify the maximum number of alerts for a particular metric on a specific device, which reduces redundant notifications. This feature contributes to a streamlined and efficient alerting system, enhancing the overall effectiveness of network management within the ONES 2.0 ecosystem.
Strengthening Monitoring and Response Capabilities with Detailed Alert Information: Each alert is enriched with essential details, including Metric Name, Type (Critical or Warning), Triggered Time, and Associated Rule Information. Alerts also include a URL that redirects users to the associated visual representations for better understanding. In addition, device information such as IP address, role, region, SKU, serial number, and NOS is part of the alert details. Interface-specific alerts carry additional related information such as the interface name, speed, and transceiver details, as shown in Figure 3.
Figure 3: Alerts details on Zendesk
Figure 4: Alerts Summary on Slack
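Two of the ideas in the list above, targeting rules at specific devices and capping the number of alerts per metric and device, can be modeled in a few lines of Python. The sketch below is an assumption about how such logic might look, not the ONES Rule Engine implementation. The class names Rule and AlertLimiter, their fields, and the device dictionaries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Rule:
    """A device-level rule scoped by role, with explicit exclusions."""
    metric: str
    roles: Set[str]
    excluded_devices: Set[str] = field(default_factory=set)
    max_alerts: int = 3  # cap per (device, metric) pair

    def applies_to(self, device: dict) -> bool:
        return (device["role"] in self.roles
                and device["name"] not in self.excluded_devices)


class AlertLimiter:
    """Suppress alerts once a (device, metric) pair reaches the rule's cap."""

    def __init__(self) -> None:
        self.sent = Counter()

    def should_send(self, rule: Rule, device: dict) -> bool:
        key = (device["name"], rule.metric)
        if self.sent[key] >= rule.max_alerts:
            return False
        self.sent[key] += 1
        return True


if __name__ == "__main__":
    rule = Rule(metric="cpu_util", roles={"leaf"},
                excluded_devices={"leaf-9"}, max_alerts=2)
    leaf1 = {"name": "leaf-1", "role": "leaf"}
    limiter = AlertLimiter()
    print([limiter.should_send(rule, leaf1) for _ in range(3)])
```

The third attempt is suppressed because the cap of two alerts for that device and metric has already been reached.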
Rule Engine coverage
System Health: Rules can be created to monitor system health metrics such as a device’s CPU utilization, memory utilization, and CPU core temperatures, and to alert if those values exceed the critical or warning thresholds. The ONES UI also provides recommended thresholds for CPU and memory usage.
Alert on Component Failures: The rule engine can be used to alert if a device fan or power supply unit (PSU) becomes faulty. The ONES backend continuously tracks component health and triggers an alert in case of failure.
Capacity Monitoring: Hardware switching is essential for high-speed data transmission in today’s networks. When the switch ASIC’s hardware limits are exhausted, forwarding can fall back to software, causing system instability. The ONES Rule Engine monitors these resources as well, and rules can be created to notify when ASIC IPv4/IPv6 utilization exceeds the warning and critical levels.
Traffic Monitoring: Set the utilization levels for traffic links and the acceptable thresholds for errors and discards, and alerts will be generated for links that cross the set levels.
Transceiver Health: Transceiver operational values like voltage, temperature, and power are critical for error-free and lossless transmission. The rule engine monitors those metrics and raises alerts for transceivers that are on the verge of failing or require attention.
SONiC Services Health: In addition to all the above, alerts can be generated when any BGP neighbor goes down, and rules can monitor the syncd service and container CPU utilization.
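The coverage areas above share one pattern: a metric value is compared against a warning and a critical threshold. The Python sketch below shows that classification for a sample of metrics. The threshold values, metric names, and function names are hypothetical and chosen only for illustration; they are not the thresholds that ONES recommends.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (warning, critical) thresholds per metric
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_util_pct": (80.0, 95.0),
    "mem_util_pct": (75.0, 90.0),
    "xcvr_temp_c": (65.0, 75.0),
}


def classify(value: float, warning: float, critical: float) -> Optional[str]:
    """Return 'Critical', 'Warning', or None for a single reading."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical")
    if value >= critical:
        return "Critical"
    if value >= warning:
        return "Warning"
    return None


def evaluate(sample: Dict[str, float],
             thresholds: Dict[str, Tuple[float, float]] = THRESHOLDS
             ) -> Dict[str, str]:
    """Classify every metric in the sample that has a configured rule."""
    alerts = {}
    for metric, value in sample.items():
        if metric not in thresholds:
            continue  # no rule configured for this metric
        severity = classify(value, *thresholds[metric])
        if severity:
            alerts[metric] = severity
    return alerts


if __name__ == "__main__":
    print(evaluate({"cpu_util_pct": 97.0, "mem_util_pct": 76.0,
                    "xcvr_temp_c": 40.0}))
```

Metrics where a low value is the problem, such as transceiver receive power, would need the comparison reversed; the sketch covers only the "higher is worse" case.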
Conclusion
Embrace the power of ONES 2.0’s Rule Engine and Alerting system to elevate your network management experience. Real-time monitoring of hardware, network, components, counters, and transceiver health enhances your SONiC journey with unparalleled support and advanced alert management through Slack and Zendesk integrations.
The alerts system goes beyond Slack or Zendesk integrations and can be customized to fit any platform based on the requirements.
Stay tuned for our upcoming blog series, where we’ll dive deep into these insightful topics:
RoCE Traffic Visibility in AI Fabric
Detailed security compliance with ONES
In-depth analysis regarding the measurement of NWSLA
Take a ‘test drive’ with ONES Center before your SONiC deployments, at your ease, with our well-known vendors in hardware, platforms, ASIC, and OS. Make an informed decision by testing it out with our multi-vendor options, including Cisco SONiC, NVIDIA SONiC, Celestica SONiC, Marvell SONiC, Wistron SONiC, Edgecore Community SONiC, Arista SONiC, Supermicro SONiC, Enterprise SONiC, and DELL SONiC.
FAQs
1. What is the ONES Rule Engine and how does it enhance SONiC network monitoring?
The ONES Rule Engine is a powerful monitoring and alerting feature introduced in ONES 2.0. It enables network teams to create customized rules at both device and interface levels for key metrics like CPU, memory, RX/TX, PSU, and fan status. With advanced alerting, real-time Slack/Zendesk integrations, and precise rule targeting, it elevates network observability and enables proactive issue resolution in multi-vendor SONiC deployments.
2. Can ONES Rule Engine integrate with Slack and Zendesk for real-time alerts and ticketing?
Yes, ONES Rule Engine supports seamless integration with Slack and Zendesk. It delivers real-time alerts to Slack channels and creates automated support tickets in Zendesk. This ensures teams stay updated, improve collaboration, and streamline incident tracking and resolution processes.
3. How does ONES Rule Engine help prevent alert fatigue in large-scale network environments?
ONES Rule Engine allows administrators to set alert thresholds and configure a maximum number of alerts per metric per device, helping avoid redundant notifications. This keeps alerting efficient and focused, minimizing noise while ensuring critical issues are addressed promptly.
4. What types of alerts can ONES 2.0 generate using the Rule Engine?
ONES 2.0 can generate alerts for system health (CPU, memory), component failures (fan, PSU), traffic utilization, ASIC capacity, transceiver performance (voltage, temperature, power), and SONiC services (e.g., BGP neighbor down, container CPU usage). Each alert includes detailed metadata, such as device role, IP, SKU, NOS, and interface specs.
5. Why is ONES 2.0 Rule Engine important for multi-vendor SONiC network operations?
In multi-vendor SONiC environments, consistent monitoring is critical. ONES 2.0 Rule Engine normalizes observability across different platforms, allowing tailored alerts, unified visibility, and centralized control. This helps organizations scale and secure their network operations while ensuring consistent SLA compliance.
6. Can ONES Rule Engine be customized for device-specific conditions like hardware SKU or OS version?
Yes, ONES Rule Engine offers advanced rule customization based on hardware SKU, device role, and OS version. This allows administrators to fine-tune alert behavior for each device type or vendor, ensuring accurate, relevant monitoring across a multi-vendor SONiC infrastructure.
7. How does ONES 2.0 manage component-level failures like fan or PSU issues?
ONES 2.0 continuously monitors critical hardware components like fans and power supply units (PSUs). The Rule Engine generates alerts when failures are detected, enabling quick identification and resolution of hardware issues to maintain network reliability and avoid service disruptions.
8. How does ONES Rule Engine enhance traffic and capacity monitoring in SONiC environments?
ONES Rule Engine monitors traffic utilization, error rates, discards, and ASIC IPv4/IPv6 capacity thresholds. This allows network operators to proactively identify bottlenecks or potential forwarding failures, helping ensure high-speed data transmission and system stability.
9. How does ONES 2.0 simplify multi-platform alert visualization and resolution?
ONES 2.0 enriches each alert with detailed metadata including metric type, alert severity, device information (IP, SKU, NOS, etc.), and visual links to dashboards. Alerts are pushed to Slack for real-time awareness and Zendesk for ticketing, enabling end-to-end visibility and streamlined resolution across platforms.
In today’s fast-paced digital landscape, safeguarding your enterprise is paramount. With cyber threats constantly evolving, having a robust security strategy is non-negotiable.
Securing Your Enterprise with ONES and SONiC (Software for Open Networking in the Cloud): This Comprehensive Guide Talks About
Focus on Enterprise Product Security: Exploring essential aspects of securing enterprise products
Fortifying ONES: Detailing how we’ve strengthened ONES for enterprise SONiC customers
Pivotal Security Elements: Highlighting crucial security components like security scans, Certificate Authorities (CAs), user account management, Role-Based Access Control (RBAC), LDAP, and Mutual TLS (Transport Layer Security)
Fortifying Your Enterprise: 8 Essential Enterprise Security Practices
Regular Security Scans: Perform frequent security scans to identify vulnerabilities and weaknesses
Robust Certificate Management: Establish a reliable CA infrastructure to ensure trust in digital certificates
User Account Hygiene: Enforce strong password policies, implement MFA, and monitor user accounts for suspicious activity
RBAC Implementation: Assign roles and access permissions based on job responsibilities, and regularly review and update them
LDAP Integration: Centralize user and resource management with LDAP to improve security and network efficiency
Implement Mutual TLS: Secure communication between systems and services with mutual TLS for enhanced data protection
Streaming Telemetry and Continuous Monitoring: Start with collecting data from various sources such as logs, network traffic, and endpoint devices. Advanced analytics and machine learning are employed to identify anomalous behavior and potential security incidents
Security Patches: Must-have tools in the ongoing battle against cyber threats. They are updates released by software vendors to address known vulnerabilities and weaknesses in their products
Aviz Networks commences its journey with customers right from the pre-deployment stages. Our dedicated customer success teams collaborate closely with enterprise security and audit teams to align their strategies and processes with security objectives.
To learn more about our successful partnership with SONiC, we invite you to explore our case study: “Maximizing Success with SONiC.” Discover firsthand how Aviz Networks delivers reliable and secure solutions to empower your network infrastructure.
Let’s understand how we support multi-vendor SONiC deployments without compromising on the enterprise security requirements.
Revolutionize Your Networking with ONES: The Open Networking Enterprise Suite
ONES is a network orchestration, visibility, and assurance solution for multi-vendor and multi-NOS operated network infrastructure. It provides a one-stop solution, right from delivering deep network observability into your data center networks to extending 24×7 SONiC support. This solution also hosts a powerful analytics engine that assists users in identifying network issues and troubleshooting their networks, in case of common network anomalies and disruptions.
We focused on network security as the primary tenet while building ONES to cater to our enterprise SONiC customers and ensured the product adhered to all the best practices mentioned above. This blog highlights how the best practices are implemented in ONES.
ONES Overview
Streamlining Security Measures with Automated Scans
Customers typically perform their own security scans on software images. In addition, we have integrated and automated security scans within the CI/CD pipeline to ensure the integrity of software packages.
Aviz runs security checks, installer scans, and SAST/DAST (Static/Dynamic Application Security Testing) using tools such as Snyk and SonarQube to ensure the robustness of the ONES application and to identify vulnerabilities to malicious attacks and other potential security risks.
We adopt a CI/CD framework that integrates security into all phases of the software development lifecycle to reduce the risk of releasing code with security vulnerabilities.
CICD Framework
Ensuring Secure Communication with HTTPS CA Certs
ONES strictly enforces HTTPS over the standard port 443, coupled with certificates signed by a trusted Certificate Authority (CA). We firmly believe that HTTPS with CA certs is the sole method of safeguarding sensitive information and privacy while data is transferred between systems and services in an enterprise environment.
Setting Up User Accounts and Role-Based Access Control
ONES is designed so that every user has an independent ONES account and is never required to share credentials with others. However, we have also created a ‘super admin’ account that can be used for troubleshooting and recovery in case of individual account issues, for example, a locked account or a forgotten password.
Account Management – User accounts
In addition to user accounts, ONES provides a fine-grained RBAC to restrict access to special features. It ensures that the individuals have the appropriate level of access based on their roles and responsibilities within the organization.
For example, critical switch operations such as reboot and ZTP can be allowed for Vendor Staff.
Super admin
Enterprise Admin
Enterprise Staff
Vendor Staff
Account Management – Roles and Permissions
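A role-to-permission lookup is one simple way to picture this kind of RBAC. The sketch below reuses the four role names listed above, but the permission names and the way they are assigned are hypothetical; the source only states that reboot and ZTP can be allowed for Vendor Staff.

```python
from enum import Enum


class Role(Enum):
    SUPER_ADMIN = "Super admin"
    ENTERPRISE_ADMIN = "Enterprise Admin"
    ENTERPRISE_STAFF = "Enterprise Staff"
    VENDOR_STAFF = "Vendor Staff"


# Hypothetical permission sets, chosen only to illustrate the lookup.
PERMISSIONS = {
    Role.SUPER_ADMIN: {"view_dashboard", "manage_users", "reboot_switch", "run_ztp"},
    Role.ENTERPRISE_ADMIN: {"view_dashboard", "manage_users"},
    Role.ENTERPRISE_STAFF: {"view_dashboard"},
    Role.VENDOR_STAFF: {"view_dashboard", "reboot_switch", "run_ztp"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True only if the action is explicitly granted to the role."""
    return action in PERMISSIONS.get(role, set())


print(is_allowed(Role.VENDOR_STAFF, "reboot_switch"))
print(is_allowed(Role.ENTERPRISE_STAFF, "reboot_switch"))
```

The lookup denies by default: an action that is not listed for a role is refused, which keeps a forgotten entry from silently granting access.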
Benefits of LDAP for Centralized User Authentication
LDAP simplifies user authentication and directory services in enterprise environments. It centralizes user account information, making it easier to manage access and permissions. Integrating LDAP into your security strategy enhances user management and access control while promoting scalability and efficiency. ONES integrates with customer identity management solutions such as Active Directory, using LDAP to communicate with the directory and authenticate users.
What is Mutual TLS and How Does it Ensure Secure Communication?
ONES is designed to support Mutual TLS (Transport Layer Security), or mTLS, which is a security mechanism that ensures both parties in a communication exchange can trust each other’s identity. It’s particularly valuable for securing data transfer between systems and services in an enterprise environment. ONES utilizes gRPC infrastructure to communicate with switch agents. TLS is the primary security protocol used by gRPC to secure the communication between the client and the server. TLS provides authentication, confidentiality, and integrity of data. Authentication is achieved using digital certificates which verify the identity of the client and the server.
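The difference between ordinary TLS and mutual TLS comes down to whether the server also demands a certificate from the client. The sketch below shows that with Python's standard `ssl` module rather than gRPC, so it is an illustration of the mechanism and not how ONES or its switch agents are configured. The function names and the certificate file parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_mtls_server_context(cert_file: Optional[str] = None,
                              key_file: Optional[str] = None,
                              client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present our certificate and REQUIRE one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This line is what makes the handshake mutual: clients without a
    # certificate signed by a trusted CA are rejected.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


def build_mtls_client_context(cert_file: Optional[str] = None,
                              key_file: Optional[str] = None,
                              server_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server and present our own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


server_ctx = build_mtls_server_context()
client_ctx = build_mtls_client_context()
print(server_ctx.verify_mode, client_ctx.verify_mode, client_ctx.check_hostname)
```

In a real deployment both sides would pass actual certificate, key, and CA paths; they are left optional here so the sketch runs without any files on disk.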
Continuous Compliance Monitoring with ONES: Real-time Metrics and Alert Capabilities
ONES enables streaming telemetry and continuously collects metrics for software compliance such as software versions (NOS, Kernel, and ONIE software versions), EOL (End of Life) licenses, and security vulnerabilities. Also, ONES enables policies and alert capabilities to ensure that organizations remain compliant with regulatory requirements and security policies. It provides a real-time view of compliance status and helps in identifying and remedying compliance issues promptly.
Dashboard – Software Compliance
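A compliance check of this kind can be thought of as comparing each device record against a policy. The sketch below is a minimal illustration under assumed data: the approved version list, the `eol_date` field, and the record layout are all hypothetical and do not reflect the ONES data model.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical policy data, for illustration only.
APPROVED_NOS_VERSIONS = {"4.2.0", "4.2.1"}


def compliance_findings(device: Dict[str, Any], today: date) -> List[str]:
    """Return human-readable compliance issues for one device record."""
    findings: List[str] = []
    if device["nos_version"] not in APPROVED_NOS_VERSIONS:
        findings.append(f"NOS version {device['nos_version']} is not approved")
    if device["eol_date"] <= today:
        findings.append("device has reached end of life")
    return findings


# Made-up inventory records.
inventory = [
    {"name": "leaf1", "nos_version": "4.2.1", "eol_date": date(2030, 1, 1)},
    {"name": "leaf2", "nos_version": "3.9.0", "eol_date": date(2024, 6, 30)},
]

# The evaluation date is passed in explicitly so the result is reproducible.
report = {d["name"]: compliance_findings(d, today=date(2025, 1, 1)) for d in inventory}
print(report)
```

An empty list means the device is compliant with this toy policy; any non-empty list could feed a policy-based alert.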
What Are the Benefits of Vulnerability Patching?
Security patches are essential tools in the ongoing battle against cyber threats. They are updates released by software vendors to address known vulnerabilities and weaknesses in their products. These patches are designed to bolster the security of your systems, close potential entry points for attackers, and mitigate the risk of exploitation. ONES is built using cloud-native and microservice design principles. It therefore allows container upgrades without impacting the data path or causing application downtime. It also allows security fixes or vulnerability patches to be applied without upgrading the whole system. Moreover, ONES continuously monitors for security vulnerabilities and leverages the CI/CD pipeline to apply patches to the system in a timely manner.
How to Secure API Endpoints with ONES?
Securing the API of an enterprise product involves a combination of strategies, tools, and best practices. ONES implements user authentication using API tokens or JWT to ensure that only authorized users and applications can access the API. ONES is containerized, and all services are hosted behind an API gateway that rate-limits requests to API endpoints.
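To make the rate-limiting idea concrete, here is a minimal token-bucket sketch. The source does not say which algorithm the ONES API gateway uses, so token bucket is an assumption made for illustration, and the `TokenBucket` class with its parameters is hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock              # injectable so the behavior can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Deterministic demo with a fake clock: 1 request/second, burst of 2.
fake_time = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, clock=lambda: fake_time[0])

burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0  # one second later, one token has been refilled
after_wait = bucket.allow()
print(burst, after_wait)
```

A gateway would typically keep one bucket per API token or client so that a single noisy caller cannot exhaust the limit for everyone else.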
Conclusion: Comprehensive Approach to Enterprise Product Security
In an era of evolving cyber threats, fortifying your enterprise is not just a choice; it’s a necessity. By adopting a comprehensive approach to security, leveraging essential practices, and implementing cutting-edge technologies like ONES and SONiC, you can establish a robust defense against potential vulnerabilities.
Key Takeaways:
Regular security scans, robust certificate management, user account management, RBAC implementation, LDAP integration, and Mutual TLS are fundamental security practices that form the bedrock of a secure enterprise environment.
Implementing these practices ensures trust, integrity, and confidentiality in data transfer and access control.
At Aviz Networks, we’re dedicated to supporting you from pre-deployment to post-deployment, ensuring that strategies stay aligned with your security goals.
Security Assurance:
Prioritizing security not only shields your organization but also instills trust in your customers and partners. They can rely on you to safeguard their sensitive information and maintain the integrity of your products and services. Our products adhere to best practices during the commissioning of sandbox and production deployments.
Interested in experiencing the power of ONES firsthand? We invite you to request a ONES demo. Our team is ready to connect with you and your team, providing insights and solutions tailored to your specific security requirements.
Stay Vigilant:
Remember, security is an ongoing process. Stay vigilant, regularly update your security measures, and adapt to emerging threats to ensure the ongoing safety of your enterprise.
FAQs
1. How does ONES enhance enterprise security for multi-vendor SONiC deployments?
ONES strengthens enterprise SONiC deployments by implementing a layered security strategy that includes security scans, certificate authority (CA) integration, role-based access control (RBAC), LDAP authentication, and Mutual TLS. It ensures secure communication, user-level access controls, and real-time monitoring to detect vulnerabilities and ensure regulatory compliance.
2. What role does Mutual TLS play in securing SONiC-based infrastructures with ONES?
Mutual TLS (mTLS) in ONES ensures secure, authenticated communication between clients and servers by requiring both parties to verify each other’s identities using digital certificates. This mechanism protects sensitive enterprise data during transfers between systems and services, especially across multi-vendor environments.
3. How does ONES perform automated security scans and vulnerability patching?
ONES integrates security scans directly into the CI/CD pipeline using tools like Snyk and SonarQube. These scans detect vulnerabilities early in the development lifecycle. ONES also supports continuous patching without requiring full system upgrades, minimizing downtime while maintaining application integrity.
4. What is the importance of LDAP integration in ONES for enterprise networks?
LDAP integration allows centralized user authentication and directory services, enabling secure and scalable user management across the network. ONES connects with enterprise identity platforms like Active Directory to streamline access control, reduce administrative overhead, and strengthen security posture.
5. How does ONES help with real-time compliance and telemetry monitoring?
ONES enables streaming telemetry to collect real-time data on software versions, EOL licenses, and security vulnerabilities. It also supports custom policies and alert systems to ensure continuous compliance with security protocols, while providing visual dashboards to track software health and detect anomalies instantly.
6. How does ONES ensure secure API access in enterprise environments?
ONES secures its API endpoints using authentication mechanisms like API tokens and JWT (JSON Web Tokens). It also deploys an API gateway to control traffic, enforce rate limits, and ensure only authorized users and applications can access specific services—effectively safeguarding critical operations from unauthorized access.
7. What makes ONES a reliable choice for enterprises looking to adopt secure open-source networking?
ONES is designed with enterprise-grade security in mind. It integrates best-in-class practices such as automated scans, centralized identity management, mutual TLS, fine-grained RBAC, and continuous telemetry monitoring. It also provides 24×7 support and aligns tightly with enterprise security policies.
8. Can ONES help detect and mitigate potential security incidents proactively?
Yes, ONES leverages streaming telemetry and machine learning-powered analytics to continuously monitor network behavior. It helps detect anomalies, trigger alerts, and enables proactive threat mitigation, minimizing the risk of security breaches before they escalate.
9. How does ONES handle user account safety and recovery?
ONES enforces strong account hygiene with policies like MFA and activity monitoring. Each user has an individual account, and there’s a ‘super admin’ account available for troubleshooting and recovery tasks like resetting locked or inaccessible accounts—ensuring operational continuity.
10. What is the role of ONES in ensuring compliance with industry security standards?
ONES tracks software versions, patch levels, and EOL licenses across all devices. It provides policy-based alerts and visual compliance dashboards, helping enterprises meet regulatory standards and internal governance requirements without manual effort.
In today’s interconnected world, a Network Operations (NetOps) Support Framework is crucial for organizations to maintain a robust and reliable network infrastructure. It provides the foundation to manage and optimize network performance, ensure seamless connectivity, and address other related issues. In this post, we bring you an overview of NetOps Support Frameworks, their key components, and their significance in maintaining efficient operations. We also talk about SLAs and their benefits in a NetOps Support Framework.
Components of NetOps Support Frameworks
Let’s quickly glance through a few critical components.
1. Network Monitoring and Management
This component covers:
Real-time monitoring of network devices and traffic
Performance analysis and reporting
Configuration management and compliance
Network inventory and asset management
The next-generation management tools offer extensions for supporting advanced functions that include:
Network Orchestration
Streaming Telemetry
Network Orchestration and Telemetry Streaming work together to enable the automation, control, and visibility of network operations while leveraging real-time telemetry data for enhanced network management and analysis. Let’s understand these functions in detail.
Network Orchestration
This function represents the overall system responsible for orchestrating and automating network operations, including configuration management, service provisioning, and network policies. It includes a core component, the Orchestration Engine, that receives high-level commands and policies and translates them into actionable tasks for network devices. A network device is any physical or virtual element of the network infrastructure, such as a router, switch, firewall, or load balancer.
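The translation step performed by an orchestration engine can be sketched as expanding one high-level intent into per-device tasks. Everything in the example below is hypothetical (the `Task` type, the intent fields, the device names, and the `configure_vlan` action); it illustrates the idea and is not the ONES orchestration API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """One actionable step for a single device (hypothetical shape)."""
    device: str
    action: str
    params: Dict[str, int]


def translate_intent(intent: Dict[str, object], inventory: Dict[str, str]) -> List[Task]:
    """Expand 'apply this VLAN to every device with role X' into per-device tasks."""
    return [
        Task(device=name, action="configure_vlan", params={"vlan_id": int(intent["vlan_id"])})
        for name, role in sorted(inventory.items())
        if role == intent["role"]
    ]


# Made-up inventory: device name -> role in the fabric.
inventory = {"leaf1": "leaf", "spine1": "spine", "leaf2": "leaf"}

tasks = translate_intent({"role": "leaf", "vlan_id": 100}, inventory)
for task in tasks:
    print(task)
```

The operator states the intent once, and the engine decides which devices it applies to; adding a third leaf to the inventory would produce a third task with no change to the intent.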
Telemetry Streaming
This function represents the process of collecting, aggregating, and forwarding real-time network telemetry data to various telemetry consumers for analysis and decision-making purposes. Here, Telemetry Collector acts as an intermediary component responsible for collecting telemetry data from network devices, leveraging protocols like gRPC, NETCONF, or SNMP. Telemetry Consumers refer to the applications, systems, or analytics platforms that consume and analyze network telemetry data. These consumers can include network monitoring tools, data analytics platforms, and machine learning systems.
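The collector-and-consumers relationship described here is essentially a fan-out: each sample received from a device is forwarded to every registered consumer. The sketch below models only that forwarding step in plain Python; it does not speak gRPC, NETCONF, or SNMP, and the class, field, and function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
Consumer = Callable[[Sample], None]


class TelemetryCollector:
    """Intermediary that forwards each telemetry sample to all consumers."""

    def __init__(self) -> None:
        self._consumers: List[Consumer] = []

    def subscribe(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def ingest(self, sample: Sample) -> None:
        # In a real system the sample would arrive over a streaming protocol.
        for consumer in self._consumers:
            consumer(sample)


# Two toy consumers: a monitoring tool that stores samples,
# and an analytics function that tracks peak CPU utilization.
seen: List[Sample] = []
peak_cpu = {"value": 0.0}


def monitoring_tool(sample: Sample) -> None:
    seen.append(sample)


def analytics(sample: Sample) -> None:
    peak_cpu["value"] = max(peak_cpu["value"], float(sample["cpu_pct"]))


collector = TelemetryCollector()
collector.subscribe(monitoring_tool)
collector.subscribe(analytics)
collector.ingest({"device": "leaf1", "cpu_pct": 41.0})
collector.ingest({"device": "leaf1", "cpu_pct": 87.5})
print(len(seen), peak_cpu["value"])
```

Because consumers only register a callback, a new analytics platform can be added without touching the collection path.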
2. Fault Management and Troubleshooting
This component includes:
Rapid detection and isolation of network issues
Root cause analysis and remediation
Incident management and escalation processes
3. Change Management and Configuration
Control and coordination of network changes
Version control and documentation
Change approval processes and tracking
4. Performance Optimization
Capacity planning and bandwidth management
Quality of Service (QoS) implementation
Traffic engineering and optimization
Proactive network optimization strategies
5. Security and Compliance
Network security monitoring and threat detection
Firewall management and access control
Compliance with industry regulations (for example PCI-DSS, GDPR)
Vulnerability assessment and patch management
Supporting Multi-Vendor NOS and Switch Hardware
In today’s diverse networking landscape, organizations often rely on a mix of network operating systems (NOS) and vendors to meet their specific requirements. However, managing and supporting multi-vendor NOS environments poses unique challenges that specialized NetOps Support Frameworks can help address. Multi-vendor NOS integration in NetOps Support Frameworks requires an understanding of interoperability challenges and the need for standardized management frameworks. For seamless multi-vendor NOS support, vendor-agnostic network monitoring and management are primarily needed for:
1. Consolidated monitoring of dashboards for heterogeneous network devices
2. Integration with various NOS APIs for unified device management
3. Leveraging standardized protocols (for example SNMP, NETCONF, RESTful APIs) for device communication
4. Managing and troubleshooting cross-vendor faults:
a. Correlation of alerts and events from different NOS vendors
b. Centralized incident management and ticketing system
c. Collaboration with vendor support teams for issue resolution
5. Change management and configuration:
a. Standardized configuration templates for different NOS vendors
b. Integration with configuration management databases (CMDB)
c. Change tracking and rollback mechanisms for multi-vendor environments
6. Performance optimization and traffic engineering:
a. Bandwidth allocation and optimization across diverse NOS platforms
b. QoS implementation for consistent performance across vendors
c. Traffic engineering strategies for load balancing and optimization
Importance of Service Level Agreements (SLAs)
In network infrastructure support, SLAs define the agreed-upon expectations/responsibilities between service providers, like Aviz Networks, and their customers. These SLAs outline key performance indicators such as service availability, response times, and other parameters.
SLAs therefore play a vital role in ensuring that the network meets the desired service levels and provides a satisfactory user experience. Let’s look at the details:
KPIs: SLAs outline multiple KPIs such as network availability, packet loss, latency, throughput, and response times. By benchmarking these metrics, SLAs provide a quantifiable means of evaluating the performance of the network infrastructure as well as the service provider.
Network Availability: SLAs specify the expected level of network availability, typically expressed as a percentage of uptime over a given period. This metric indicates how often the network should be operational and accessible to users. It also ensures the accountability of a network service provider for maintaining a reliable and continuously available network infrastructure.
Response and Resolution Times: SLAs often include response and resolution time commitments for network incidents or service requests. The response time defines how quickly the service provider should acknowledge and respond to reported issues. The resolution time sets expectations about the time required to restore the network service to its normal functioning state.
Downtime and Maintenance Windows: Another benefit of such agreements is the provision for scheduled maintenance windows during which network services may be unavailable temporarily. By establishing a clear schedule and notifying customers in advance, SLAs help manage expectations and minimize service disruptions.
Escalation Procedures: SLAs outline escalation procedures to follow in case of critical incidents or service disruptions. This ensures that prompt actions are taken to address the issue and involve higher-level support or management, if necessary.
Remedies and Compensation: SLAs include provisions for remedies in the form of service credits, discounts, or other types of compensation to mitigate the impact of service disruptions/failures caused by the service providers.
Reporting and Review: Lastly, these agreements usually include reporting mechanisms to track and communicate network performance against the agreed-upon metrics. Regular performance reports and service reviews enable both parties to assess the network’s performance, identify areas for improvement, and ensure transparency and accountability.
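The availability metric above lends itself to a short worked example. The sketch below computes measured availability over a 30-day period and the downtime budget implied by a target; the 99.9% target and the 50 minutes of downtime are made-up numbers, not figures from any Aviz SLA.

```python
def availability_pct(total_minutes: float, downtime_minutes: float) -> float:
    """Uptime as a percentage of the measurement period."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(total_minutes: float, target_pct: float) -> float:
    """Downtime budget implied by an availability target."""
    return total_minutes * (1.0 - target_pct / 100.0)


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes

# Hypothetical figures: 50 minutes of downtime against a 99.9% target.
measured = availability_pct(MINUTES_IN_30_DAYS, downtime_minutes=50)
budget = allowed_downtime_minutes(MINUTES_IN_30_DAYS, target_pct=99.9)
sla_met = measured >= 99.9

print(f"measured availability: {measured:.3f}%")   # 99.884%
print(f"downtime budget: {budget:.1f} minutes")    # 43.2 minutes
print(f"SLA met: {sla_met}")                       # False
```

In this example the 50 minutes of downtime exceeds the 43.2-minute budget, so the target is missed. How scheduled maintenance windows are counted is an SLA-specific choice that this sketch does not model.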
Benefits of SLAs in NetOps Support
Improved Operational Efficiency:
a. Streamlined management processes for diverse NOS platforms
b. Reduced complexity and overhead associated with managing multiple vendors
c. Centralized visibility and control over the entire network infrastructure
Enhanced Network Resilience and Performance:
a. Rapid fault detection and resolution across different NOS environments
b. Optimal utilization of network resources through unified performance optimization strategies
c. Consistent security measures and compliance enforcement across vendors
Customer Satisfaction and Business Continuity:
a. Adherence to SLAs for ensuring service reliability and customer satisfaction
b. Minimized downtime and faster incident resolution through SLA-driven support processes
c. Risk mitigation associated with multi-vendor environments
ONES from Aviz Networks is a network observability/visibility, orchestration, and assurance solution for network switches running SONiC and vendor-proprietary NOS (Network Operating System).
ONES provides a one-stop solution, right from providing better visibility into your data center networks to extending 24×7 support function for SONiC. It also hosts a powerful analytics engine that provides Proactive, Predictive, and Prescriptive Analysis of common network anomalies and disruptions.
The key capabilities of ONES include:
Purpose-built solution for SONiC deployments
Supports multiple NOS for comprehensive visibility
Orchestration and deep telemetry for observability
24×7 enterprise-grade support options for SONiC
ONES – Value and Beyond
MONITOR
Monitor your entire multi-NOS fabric
Manage inventory of your network devices running any Network OS on Broadcom, Marvell, Nvidia, and other leading ASICs
View topology of the entire fabric across multiple hardware platforms and network operating systems
Monitor traffic, system health, bandwidth utilization, and more between and across devices
ORCHESTRATE
Configure your SONiC fabric with ease
Create and configure CLOS topology for ToR, Leaf, Spine, and Super-spine layers
Apply and validate configurations pre- and post-deployment
Compare running configs against applied configs at any point
Upgrade devices with a single click via ZTP or custom NOS images
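Comparing a running config against the applied config amounts to a text diff. The sketch below uses Python's standard `difflib` on two made-up config snippets; the `config_drift` helper and the config lines are hypothetical, and this is not how ONES performs the comparison.

```python
import difflib
from typing import List


def config_drift(applied: str, running: str) -> List[str]:
    """Return only the added (+) and removed (-) lines between two configs."""
    diff = list(difflib.unified_diff(
        applied.splitlines(), running.splitlines(),
        fromfile="applied", tofile="running", lineterm="",
    ))
    # The first two entries are the '---' / '+++' file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Made-up configuration snippets for one switch.
applied_cfg = "hostname leaf1\nmtu 9100\nvlan 100\n"
running_cfg = "hostname leaf1\nmtu 1500\nvlan 100\n"

drift = config_drift(applied_cfg, running_cfg)
print(drift)  # ['-mtu 9100', '+mtu 1500']
```

An empty result means the device is running exactly what was applied; any other result pinpoints the lines that changed.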
SUPPORTABILITY
NetOps Simplified
Proactively track switch CPU/memory consumption, bandwidth, link failures, traffic errors, and more
Instantly connect to individual devices for maintenance and quick troubleshooting
Collaborate across your teams and with our SONiC experts to solve issues more efficiently
Traditional Network Orchestration tools have evolved from just delivering and monitoring network functions for proprietary NOS to designing and building network fabrics in an automated and intent-based approach.
ONES takes the Orchestration journey to the next level—adding capabilities from SONiC NOS across a fleet of multi-vendor and multi-ASIC switches, bringing together capabilities of streaming telemetry, API programmability, network control, intent-based fabric configuration, and SLA assurance for supportability.
Predictive failure/health analytics and capacity planning enable Orchestration tools (like ONES) to provide a seamless adoption journey for SONiC by leveraging historical trends of resource utilization, traffic patterns, logs/events, and derived application/workload performance.
Supportability, a crucial feature of Network Orchestration tools, goes beyond notifying and alerting. It also enables integration with IT tools and engines to check for anomalies or correlate events using real-time or historical data, provides single-touch management, and in turn simplifies switch and fabric onboarding at scale.
With the rapid adoption of open-source SONiC, ONES has emerged as a one-stop solution for network infrastructure teams. It seamlessly enables orchestration, deep telemetry, and assurance for multi-vendor deployments. Most importantly, the 24×7 SRE support enables them to introduce SONiC in their networks with utmost confidence.
Author: Arakkal Kunju Mohammed Yasser, Director of Engineering, Site Reliability Engineering
FAQs
1. What are the key components of an effective NetOps Support Framework?
An effective NetOps Support Framework typically includes:
Network Monitoring and Management: Real-time traffic monitoring, performance analysis, configuration compliance, and asset inventory.
Fault Management and Troubleshooting: Rapid issue detection, root cause analysis, and escalation workflows.
Change Management: Coordinated control of network changes, version tracking, and change approval systems.
Performance Optimization: Bandwidth management, QoS implementation, and proactive traffic engineering.
Security and Compliance: Threat detection, access control, patch management, and regulatory compliance (e.g., PCI-DSS, GDPR).
These components collectively support resilient, secure, and high-performing network operations.
2. How does ONES simplify NetOps for multi-vendor and multi-NOS environments?
ONES provides a vendor-agnostic platform that unifies visibility, orchestration, and assurance across various switch vendors and network operating systems (e.g., SONiC, Cumulus Linux, Arista EOS, Cisco NX-OS). It enables:
A single-pane-of-glass view across all devices.
Streamlined inventory management and real-time telemetry monitoring.
Support for multi-ASIC environments (Broadcom, Marvell, NVIDIA).
Deep telemetry, configuration drift detection, and simplified switch onboarding.
This allows enterprises to operate mixed environments with confidence and ease.
3. What role do SLAs play in NetOps support and infrastructure resilience?
SLAs (Service Level Agreements) define expectations between providers like Aviz and enterprise customers. They cover metrics like:
Network availability
Packet loss, latency, and throughput
Response and resolution times for incidents
Downtime windows and escalation paths
SLAs ensure accountability, drive operational efficiency, and deliver business continuity by guaranteeing faster incident resolution and minimizing risks in multi-vendor SONiC environments.
4. How does ONES integrate telemetry and orchestration for SONiC-based networks?
ONES uses streaming telemetry and intent-based orchestration to manage SONiC-based fabrics. It:
Collects near real-time data for health, traffic, and configuration metrics.
Supports Day 1 and Day 2 operations with automated config validation and topology orchestration (e.g., CLOS).
Integrates with APIs and analytics tools to enable proactive troubleshooting, configuration management, and real-time insights.
This fusion allows SREs to operate with deep observability and automation, improving network efficiency.
5. What are the benefits of predictive analytics in SONiC network operations?
ONES uses proactive, predictive, and prescriptive analytics to detect and prevent network anomalies before they impact operations. It helps teams:
Predict failures based on trends in CPU usage, memory, traffic patterns, and logs.
Plan for capacity upgrades and network scaling.
Reduce downtime with early warnings and automation workflows.
Make data-driven decisions for optimization and future-proofing network infrastructure.
Predictive analytics empowers NetOps teams to shift from reactive to proactive network management.
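One simple way to picture trend-based prediction is to fit a straight line through recent samples of a metric and estimate when it would cross a limit. The sketch below does this with an ordinary least-squares fit. It is a deliberately basic stand-in for illustration, not the analytics ONES uses, and the function name `samples_until_threshold` is hypothetical.

```python
from typing import Optional, Sequence


def samples_until_threshold(values: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many sampling intervals remain before a metric reaches a threshold.

    Fits a least-squares line through equally spaced samples. Returns None when
    there are fewer than two samples or the trend is flat or falling.
    """
    n = len(values)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_index = (threshold - intercept) / slope
    # Distance from the most recent sample; 0 means the limit is already reached.
    return max(0.0, crossing_index - (n - 1))


# Illustrative CPU utilisation samples (percent), one per interval.
cpu = [50, 55, 60, 65, 70]
print(samples_until_threshold(cpu, 90))
```

With these sample values the trend rises five points per interval, so the estimate is four more intervals before the 90% limit is reached.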
6. How do SLAs improve NetOps efficiency and service reliability?
Service Level Agreements (SLAs) define network performance expectations between providers and customers. They ensure:
Guaranteed network uptime and availability
Defined response and resolution times for network incidents
Proactive performance monitoring with KPIs like latency, packet loss, and throughput
Escalation procedures and service credits in case of SLA breaches
By setting clear performance benchmarks, SLAs improve operational efficiency, customer satisfaction, and business continuity.
7. How does ONES simplify SONiC adoption in network environments?
ONES facilitates SONiC adoption by:
Providing a single platform for monitoring SONiC and other NOS-based networks
Automating configuration and deployment of SONiC switches
Enabling real-time telemetry and deep observability
Offering 24×7 SRE support to resolve SONiC-related issues efficiently
It reduces complexity, speeds up deployment, and enhances the operational stability of SONiC-powered networks.
8. How does predictive analytics help in preventing network failures?
ONES leverages AI-driven predictive failure analytics to:
Detect early warning signs based on historical trends and real-time data
Forecast potential capacity bottlenecks and performance degradation
Automate preventive actions before failures occur
Enhance traffic engineering to optimize resource utilization
This ensures a proactive rather than reactive approach to network maintenance.
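The idea of spotting early warning signs from historical data can be sketched with a basic statistical check: flag a new reading when it sits far outside the spread of recent history. The example below uses a z-score against the historical mean and standard deviation. This is a simple statistical stand-in, not the AI-driven method the product uses, and both the function name `is_early_warning` and the default limit of 3.0 are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_early_warning(history: Sequence[float], latest: float, z_limit: float = 3.0) -> bool:
    """Flag a reading that deviates strongly from recent history.

    Compares the latest value with the mean of the history, measured in
    population standard deviations. With fewer than two historical samples
    there is no baseline, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > z_limit


# Illustrative error counts per interval on one interface.
errors = [10, 12, 11, 9, 10, 11, 10, 12]
print(is_early_warning(errors, 30))
print(is_early_warning(errors, 11))
```

A check like this only raises the flag; deciding which preventive action to automate is a separate policy choice.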
9. How does ONES integrate with existing IT and security tools?
ONES supports integration with:
Configuration Management Databases (CMDBs) for centralized configuration tracking
Incident management and ticketing systems for automated troubleshooting
Security frameworks to enforce compliance with regulations like PCI-DSS and GDPR
RESTful APIs and telemetry consumers for seamless interoperability
This makes ONES an enterprise-friendly solution that fits into existing NetOps ecosystems.
10. What makes ONES different from traditional network orchestration tools?
Traditional network orchestration tools focus mainly on basic configuration and monitoring. ONES goes beyond by offering:
Intent-based fabric configuration for automated network design
Streaming telemetry for real-time insights
AI-powered analytics to predict and prevent failures
Multi-vendor and multi-ASIC support for heterogeneous network environments
SLA-driven assurance for better reliability and performance