Category: SONiC

Simplify Your SONiC RMA Experience with ONES Backup & Restore

Post author By Pramod Taramatta
Post date 8 April 2025

In today’s fast-paced networking landscape, data is a critical asset. Unexpected failures can lead to downtime, operational disruptions, and misconfigurations. When a network device crashes, engineers need a reliable backup to restore it quickly. Without structured backup and restore mechanisms, organizations risk prolonged outages and inefficiencies. This overview underscores the importance of regular backups and explains how ONES Fabric Manager Backup & Restore streamlines the process, ensuring seamless recovery in multi-vendor environments.

The Importance of Backup & Restore in Network Resilience

Backup and restore processes ensure rapid recovery from failures by preserving critical network configurations.

In RMA scenarios, replacing faulty hardware is only the first step—the real challenge lies in restoring the original configurations. Without a recent backup, administrators must manually reconfigure the failed switch, resulting in extended downtime, increased risk of errors, operational disruptions, and higher recovery costs due to additional troubleshooting and resource allocation.

ONES Backup & Restore: The Lifeline for Uninterrupted Networks

ONES Fabric Manager Backup & Restore ensures seamless recovery by securely storing configurations in a persistent Docker volume, enabling quick restoration, and eliminating manual reconfiguration. With pre-replacement snapshots for ZTP or upgrades, it offers a reliable rollback option. Designed for multi-vendor compatibility, it minimizes downtime, reduces risks, and streamlines RMA processes for efficient, error-free network management.

Streamlined Backup & Recovery Process

ONES Fabric Manager Backup & Restore captures essential configuration files (config_db.json, frr.conf, fmcli_db.cfg) by enabling both manual and automatic snapshot creation during key operations like reboot, ZTP, or image upgrades. Each snapshot is tagged with a timestamp or custom label for easy identification and restoration. In the event of a failure, users can quickly revert to a known-good configuration—minimizing downtime and eliminating the need for complex manual recovery steps.
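The snapshot-and-restore flow described above can be sketched in a few lines of Python. This is only an illustration, not how ONES implements it: the file names come from this article, while the directory layout and the function names `create_snapshot` and `restore_snapshot` are hypothetical.

```python
import shutil
import time
from pathlib import Path

# File names are taken from the article; where they live on a device is an assumption.
CONFIG_FILES = ("config_db.json", "frr.conf", "fmcli_db.cfg")


def create_snapshot(config_dir, backup_root, label=None):
    """Copy the known config files into a new snapshot directory.

    The snapshot is named after `label` when one is given, otherwise
    after a UTC timestamp. Returns the snapshot path.
    """
    tag = label or time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    snapshot = Path(backup_root) / tag
    snapshot.mkdir(parents=True, exist_ok=False)  # never overwrite an old snapshot
    for name in CONFIG_FILES:
        src = Path(config_dir) / name
        if src.exists():  # skip files this device does not have
            shutil.copy2(src, snapshot / name)
    return snapshot


def restore_snapshot(snapshot, config_dir):
    """Copy every file in the snapshot back into the live config directory."""
    restored = []
    for src in sorted(Path(snapshot).iterdir()):
        shutil.copy2(src, Path(config_dir) / src.name)
        restored.append(src.name)
    return restored
```

Calling `create_snapshot(cfg_dir, backup_dir, label="pre-upgrade")` before an image upgrade and `restore_snapshot(...)` afterwards mirrors the rollback idea: the label (or timestamp) is what identifies the known-good configuration.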

Multi-Vendor Support for Diverse Environments

Designed for flexibility, ONES Fabric Manager Backup & Restore works seamlessly across various network devices. Its consistent and reliable backup and recovery capabilities make it an ideal solution for dynamic, multi-vendor infrastructures, ensuring uninterrupted network performance regardless of vendor diversity.

Book a demo today — because every second of network downtime costs more than you think.

FAQs

1. How does ONES Backup & Restore help reduce SONiC RMA downtime?

ONES Fabric Manager automates the backup of SONiC configurations and enables one-click restore, eliminating the need for manual reconfiguration during RMA. This drastically reduces downtime and speeds up recovery.

2. Can I use ONES Backup & Restore across multi-vendor network environments?

Yes, ONES supports multi-vendor compatibility, allowing seamless backup and restoration across SONiC and non-SONiC devices—making it ideal for hybrid data center infrastructures.

3. What configurations does ONES Backup capture for SONiC switches?

ONES captures critical configuration files like config_db.json, frr.conf, and fmcli_db.cfg, ensuring full restoration of routing, ACLs, QoS, interfaces, and more.

4. Does ONES support automatic snapshots before upgrades or ZTP?

Yes, ONES allows both manual and automated snapshot creation before key operations like Zero Touch Provisioning (ZTP), image upgrades, and reboots, enabling quick rollback if needed.

5. Why is backup and restore crucial for SONiC-based network resilience?

Without a structured backup system, RMA recovery becomes error-prone and time-consuming. ONES Backup & Restore ensures operational continuity by enabling reliable, fast, and error-free recovery after hardware failures.

Open Networking Enterprise Suite SONiC

Cisco 8000 + SONiC with Aviz ONES Bootcamp – Why You Should Join?

Post author By Ilona Gabinsky
Post date 21 March 2025

Why You Should Join?

Ready to unlock the future of networking? If you’re managing AI workloads, optimizing data center operations, or just curious about the power of SONiC on Cisco 8000, then this is the event you don’t want to miss.

AI is changing everything – from high-performance computing to real-time analytics, and your network needs to keep up. That’s where Cisco and Aviz Networks come in. Together, we’re redefining AI-ready infrastructure with SONiC-powered networking – delivering agility, scalability, observability, and orchestration.

Join us for the Cisco 8000 + Aviz ONES SONiC Bootcamp on April 3, 2025, where we’ll break down how to run AI-optimized networks with SONiC on Cisco hardware.

Why Should You Care?

Imagine a network that’s not just fast but intelligent: one that optimizes itself for AI workloads and gives you complete control over traffic flows. That’s the power of SONiC on Cisco 8000, supercharged by Aviz ONES.

If you’re a network architect, engineer, or AI infrastructure leader, this bootcamp is designed for you. Whether you’re considering SONiC for the first time or already deploying it, we’ll show you how to make it work seamlessly on Cisco 8000.

What’s on the Agenda?

Part 1: The Cisco 8000 SONiC Evolution

(Presented by Cisco)

Part 2: AI-Driven Operations & Observability with Aviz ONES

(Presented by Aviz)

AI Networking with SONiC: A Practical Guide

Join Us – Register Now!

SONiC

5 Myths About SONiC NOS

Post author By Ilona Gabinsky
Post date 13 March 2025

The rise of SONiC (Software for Open Networking in the Cloud) has disrupted the traditional networking industry, offering enterprises a vendor-agnostic, open-source alternative to proprietary NOS solutions. But with disruption comes misconceptions, and SONiC is no exception.

Let’s debunk the top five myths about SONiC NOS and set the record straight.

Myth #1: SONiC is Only for Hyperscalers

Reality: SONiC is Ready for Enterprises, AI Workloads, and Beyond

While hyperscalers like Microsoft pioneered SONiC, it’s no longer just for the tech giants. Enterprises across healthcare, retail, finance, and AI-driven data centers are deploying SONiC to cut costs, increase flexibility, and escape vendor lock-in.

👉 2025 PlugFest validated SONiC for enterprise-grade use cases, including Layer 2 networking, AI fabric, and PoE-enabled whitebox switches—proving its viability beyond hyperscalers.

Want to hear real-world enterprise SONiC success stories?

Join us for a compelling SDxCentral panel discussion featuring insights from our customers, Techevolution and 1984, alongside our partner, EPS Global. Explore how SONiC is transforming enterprise networking with unparalleled flexibility and efficiency.

Myth #2: SONiC is Difficult to Deploy

Reality: SONiC Adoption is Faster and Easier Than Ever

There’s a misconception that SONiC requires deep coding skills or hyperscaler-level engineering resources. That was true in its early days, but today 1-click SONiC fabric deployment is available, and enterprise-ready SONiC solutions come with automation, APIs, and user-friendly management tools.

💡 One-click SONiC migration guides and vendor support services have made the transition seamless—offering enterprises the same ease of deployment as traditional NOS solutions.

Myth #3: SONiC Lacks Vendor Support

Reality: A Thriving Ecosystem Backs SONiC

Some believe that using an open-source NOS means you’re on your own. That couldn’t be further from the truth. The SONiC ecosystem includes major vendors like Cisco, NVIDIA, Celestica, Edgecore, and Wistron, offering hardware compatibility, support services, and enterprise-ready integrations.

Aviz Networks actively enables SONiC interoperability and provides full-stack support, automation, and performance optimizations, ensuring enterprises get the help they need.

Myth #4: SONiC is Not Cost-Effective

Reality: SONiC Delivers Up to 40% TCO Savings

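Proprietary NOS vendors often claim that SONiC isn’t cost-effective when factoring in hardware, support, and integration costs. But PlugFest’s TCO analysis tells a different story: up to 40% lower total cost of ownership than proprietary NOS solutions, driven by eliminating NOS licensing costs, reducing OpEx, and multi-vendor flexibility.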

💰 Bottom line? SONiC saves enterprises millions while providing scalability for AI-driven workloads.

Myth #5: SONiC is Just a Trend—It Won’t Last

Reality: SONiC is the Future of Open Networking

Some skeptics believe SONiC is just another open-source experiment that will fade away. The reality? SONiC adoption is accelerating, with major enterprises, cloud providers, and AI data centers making it their NOS of choice.

With strong backing from industry leaders, ongoing community development, and real-world deployments, SONiC is shaping the future of open networking, AI infrastructure, and multi-vendor interoperability.

The Verdict: SONiC is Ready for Prime Time

SONiC NOS has matured beyond its hyperscaler origins into a battle-tested, cost-efficient, and enterprise-ready solution. As the 2025 PlugFest demonstrated, the myths surrounding SONiC are outdated, and the reality is clear:

Enterprise adoption is growing
Deployment is easier than ever
A strong vendor ecosystem is supporting SONiC
TCO savings make it a smarter investment
Open networking is the future, and SONiC is leading the way

Frequently Asked Questions:

1. Is SONiC only suitable for hyperscalers like Microsoft?

No. While Microsoft pioneered SONiC, enterprises across industries such as healthcare, finance, retail, and AI-driven data centers are actively deploying it. The 2025 PlugFest validated SONiC for enterprise-grade use cases, including Layer 2 networking, AI fabric, and PoE-enabled whitebox switches, proving its viability beyond hyperscalers.

2. Is deploying SONiC difficult?

Not anymore. Early versions of SONiC required deep coding expertise, but today, automation and enterprise-ready solutions have simplified deployment. One-click SONiC migration guides and vendor support services ensure a seamless transition, offering the same ease of deployment as traditional NOS solutions.

3. Does SONiC lack vendor support?

No. The SONiC ecosystem is backed by major vendors, including Cisco, NVIDIA, Celestica, Edgecore, and Wistron. Aviz Networks provides full-stack SONiC support, automation, and performance optimization, ensuring enterprises have access to the help they need for successful adoption.

4. Is SONiC cost-effective compared to proprietary NOS solutions?

Yes. According to PlugFest’s TCO analysis, SONiC delivers up to 40% lower total cost of ownership (TCO) compared to proprietary NOS solutions. By eliminating NOS licensing costs, reducing OpEx, and providing multi-vendor flexibility, SONiC enables enterprises to save millions while ensuring scalability for AI-driven workloads.

5. Is SONiC just a passing trend?

No. SONiC is shaping the future of open networking. With increasing adoption by major enterprises, cloud providers, and AI data centers, SONiC is here to stay. Its strong backing from industry leaders and ongoing community development ensure long-term viability and continuous innovation.

6. How can enterprises migrate to SONiC?

Enterprises can transition to SONiC using migration guides, vendor-supported deployment tools, and automation frameworks. With one-click SONiC fabric deployment and extensive documentation, the migration process is streamlined for efficiency and minimal disruption.

7. What are the key benefits of adopting SONiC?

Vendor independence and flexibility
Lower TCO with no licensing costs
Enterprise-grade features and security
Strong industry support and ecosystem
Future-proof networking for AI and cloud workloads

8. Where can I learn more about real-world SONiC deployments?

Are you ready to break free from vendor lock-in and embrace open networking? Learn more about SONiC’s capabilities, migration paths, and real-world deployments.

FAQs

1. Is SONiC NOS only meant for hyperscalers like Microsoft or Google?

No. While SONiC originated with hyperscalers, it is now widely adopted by enterprises across industries—including healthcare, retail, finance, and AI-driven data centers—for its flexibility, open-source benefits, and vendor-agnostic architecture.

2. How difficult is it to deploy SONiC in an enterprise environment?

SONiC deployment is now simplified with one-click fabric orchestration, migration guides, intuitive GUI tools, and vendor-supported solutions—making it as easy as traditional network operating systems for enterprises to adopt.

3. Does SONiC come with vendor support and integration help?

Yes. SONiC is backed by a growing ecosystem of top vendors like NVIDIA, Cisco, Edgecore, and Celestica. Additionally, companies like Aviz Networks offer full-stack SONiC support, automation tools, and deployment services tailored for enterprise use.

4. Is SONiC more cost-effective than proprietary NOS solutions?

Absolutely. SONiC can deliver up to 40% lower total cost of ownership (TCO) by eliminating NOS licensing fees, offering multi-vendor hardware flexibility, and reducing OpEx through automation and open-source efficiencies.

5. Will SONiC last, or is it just a passing trend in networking?

SONiC is here to stay. Backed by major industry players and community contributions, SONiC has evolved into a mainstream open networking platform with real-world deployments in AI fabrics, enterprise data centers, and multi-vendor environments.

6. How does SONiC enable multi-vendor hardware interoperability?

Open-source, vendor-neutral NOS
Runs on diverse ASICs and switch hardware
Avoids vendor lock-in and supply chain constraints

7. Can SONiC support AI-driven workloads and data center fabrics?

Supports RoCE and advanced QoS
Handles lossless, high-bandwidth GPU cluster traffic
Validated for AI fabrics at PlugFest and real deployments

8. Is specialized expertise still required to operate SONiC networks?

Modern GUIs and APIs simplify operations
Automation frameworks reduce manual configs
Vendor support and migration guides ease adoption

9. What tools help evaluate SONiC performance and compatibility?

PlugFests benchmark SONiC for enterprise use cases
Automated test suites (like FTAS) verify features, resilience, and vendor interoperability
CI/CD pipelines ensure continuous validation

10. How is the SONiC community driving ongoing innovation?

Backed by major hyperscalers and network vendors
Open-source community adds new features regularly
Global PlugFests and working groups keep it enterprise-ready

Open Networking Enterprise Suite SONiC

ONES 3.1 Boosts SONiC Support: Key Enhancements for Smarter Infrastructure Troubleshooting

Post author By Anbarasan Ramalingam
Post date 12 March 2025

In today’s fast-moving digital world, maintaining a stable and well-monitored infrastructure is crucial. The latest release of ONES 3.1 introduces key updates, including enhanced support for SONiC (Software for Open Networking in the Cloud). These enhancements boost visibility, automate critical processes, and strengthen system health monitoring. The improved SONiC support streamlines issue detection and response, optimizing performance and minimizing downtime. IT teams can now focus on strategic tasks, knowing their infrastructure is continuously and intelligently monitored for peak performance.

Stay ahead of issues and ensure smooth operations with ONES 3.1.

System Health Monitoring

CPU-Intensive Services

Previously, identifying resource-heavy processes was challenging due to the lack of granular insights in system-wide CPU and memory metrics. Often, system-level data shows a spike in CPU usage without providing a quick way to pinpoint the cause. To address this challenge, ONES now provides detailed reports on the top 10 CPU-consuming services running on the host, along with their memory usage. This helps users easily identify high-impact processes like redis-server, agent, syncd, and dockerd. With this level of detail, users can diagnose performance issues more quickly, optimize system resources, and prevent potential bottlenecks, resulting in greater system efficiency.
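To make the "top 10 CPU-consuming services" idea concrete, here is a minimal sketch that ranks services from per-process samples. The sample format and the function name `top_cpu_services` are assumptions made for illustration; the article does not describe how ONES collects or aggregates these metrics.

```python
from collections import defaultdict


def top_cpu_services(samples, limit=10):
    """Rank services by total CPU usage.

    `samples` is an iterable of (service_name, cpu_percent, mem_percent)
    tuples, one per process. Processes sharing a name are summed so a
    multi-process service appears once in the ranking.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for name, cpu, mem in samples:
        totals[name][0] += cpu
        totals[name][1] += mem
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(name, cpu, mem) for name, (cpu, mem) in ranked[:limit]]
```

Reporting memory next to CPU in the same row is what lets an operator tell a busy service (high CPU, steady memory) from one that is also leaking.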

Unhealthy Devices with Failure Codes

ONES 3.1 introduces a new feature that highlights unhealthy devices, offering real-time failure detection for hardware (e.g., PSU, fan failures, LED alarms), software services, key processes, and containers. When a failure is detected, the device is marked as unhealthy, with detailed information readily available in the UI. This streamlined view helps operators quickly identify and resolve issues, simplifying troubleshooting. Notifications are also provided in the topology view and health summary page.

SONiC Docker Transitions

Docker containers are the backbone of the SONiC operating system, and ensuring their stable operation is crucial for switch performance. Previously, tracking container state changes, such as shifts from “up” to “down,” was difficult and time-consuming. Operators often struggled to detect these changes in real time, leading to delays in addressing service disruptions and unnoticed issues. ONES 3.1 introduces a new widget that visually highlights Docker container state transitions, allowing operators to quickly spot changes and respond to disruptions. The widget provides a “Connect” button for direct SSH access to the switch, enabling swift action when needed. Additionally, it offers a timeframe selection feature, allowing operators to view container state changes over a specified period.
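As a rough illustration of what "state transitions over a selected timeframe" means, the sketch below turns a stream of periodic status samples into a list of transitions. The sample format and the name `state_transitions` are hypothetical and are not taken from ONES.

```python
def state_transitions(samples, start=None, end=None):
    """Return container state changes as (timestamp, container, old, new).

    `samples` is an iterable of (timestamp, container, state) tuples sorted
    by timestamp. `start` and `end` optionally bound the reporting window;
    samples outside the window still update the last-known state.
    """
    last_state = {}
    transitions = []
    for ts, container, state in samples:
        previous = last_state.get(container)
        changed = previous is not None and previous != state
        in_window = (start is None or ts >= start) and (end is None or ts <= end)
        if changed and in_window:
            transitions.append((ts, container, previous, state))
        last_state[container] = state
    return transitions
```

Only changes are reported, so a container that stays "up" for the whole window produces no rows, which is what keeps a transition view readable on a busy switch.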

Automatic IP Detection, Alerting and Rediscovery

When a monitored device’s management IP changes, it’s crucial for the monitoring software to update the IP promptly to ensure smooth operations. Previously, detecting and updating a device’s management IP was a manual, time-consuming process, often causing communication breakdowns and delayed issue identification. ONES 3.1 introduces an automatic rediscovery mechanism that instantly detects when a device’s management IP changes and re-registers the switch with the controller. This enhancement eliminates manual intervention, ensuring continuous communication, real-time monitoring, and faster issue resolution, even when devices are reconfigured.

Additionally, the IP Transition widget lets operators track all IP changes a device has undergone over a specific period and whether any of them conflicted with another IP in the monitored network. To further enhance visibility, an alert option in the ONES Rule Engine notifies operators of any management IP changes, ensuring they are always aware of network modifications and can respond swiftly to maintain seamless operations.
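The change-tracking and conflict check described here can be sketched as follows. This is a simplified model under stated assumptions: the in-memory `inventory` and `history` dictionaries and the function `record_management_ip` are invented for the example and do not reflect the ONES controller's internals.

```python
import ipaddress


def record_management_ip(inventory, history, device, ip, timestamp):
    """Record a device's management IP and report any change.

    `inventory` maps device name -> current IP, and `history` maps device
    name -> list of (timestamp, old_ip, new_ip). Returns None when the IP
    is unchanged, otherwise an event dict that also lists other devices
    already using the new address.
    """
    new_ip = str(ipaddress.ip_address(ip))  # raises ValueError on malformed input
    old_ip = inventory.get(device)
    if old_ip == new_ip:
        return None
    conflicts = sorted(
        name for name, addr in inventory.items() if addr == new_ip and name != device
    )
    inventory[device] = new_ip
    history.setdefault(device, []).append((timestamp, old_ip, new_ip))
    return {"device": device, "old": old_ip, "new": new_ip, "conflicts_with": conflicts}
```

A non-empty `conflicts_with` list is the kind of condition an alert rule could key on, while `history[device]` holds what a per-device IP timeline would display.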

Rule Engine: Enhanced Alerts

The ONES Rule Engine has emerged as a preferred tool for automating network monitoring, allowing operators to configure custom rules based on their own threshold levels for various parameters. When a defined condition is met, the system automatically generates an alert, enabling real-time, proactive responses to potential issues. In ONES 3.1, rules can also alert on Docker container CPU and memory overuse, Docker container downtime, hardware or service failures, and management IP changes, giving operators deeper insight and more precise control over network performance.
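Threshold-based rules of this kind follow a simple pattern: compare a metric against an operator-defined limit and emit an alert when the condition holds. The sketch below shows that pattern only. The `Rule` fields, metric names, and `evaluate` function are hypothetical and are not the ONES Rule Engine's schema.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Rule:
    name: str
    metric: str
    op: str
    threshold: float


def evaluate(rules, sample):
    """Return one alert string for every rule whose condition is met.

    `sample` maps metric name -> latest value; rules that reference a
    metric missing from the sample are skipped.
    """
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and OPS[rule.op](value, rule.threshold):
            alerts.append(f"{rule.name}: {rule.metric}={value} {rule.op} {rule.threshold}")
    return alerts
```

Keeping rules as plain data rather than code is what makes features such as default templates and one-click activation straightforward: enabling a rule is just adding an entry to the list.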

ONES 3.1 takes SONiC network monitoring and troubleshooting to the next level with powerful enhancements like real-time failure detection, automated IP rediscovery, detailed system health insights, and advanced alerting.
Ready to see ONES 3.1 in action? Book a demo today and experience how it can transform your network management with smarter automation and deeper insights.

FAQs

1. How does ONES 3.1 improve SONiC infrastructure monitoring?

ONES 3.1 enhances SONiC observability by offering real-time visibility into system health, including CPU-intensive services, Docker container transitions, and device-level failures. This allows IT teams to proactively detect, investigate, and resolve issues faster than before.

2. What are the benefits of real-time alerts for Docker container failures in SONiC?

The new Docker Down Status alerts in ONES 3.1 notify operators immediately when SONiC containers fail, ensuring service disruptions are caught and addressed before they escalate—minimizing downtime and improving operational resilience.

3. Can ONES 3.1 detect and respond to SONiC device IP changes automatically?

Yes. ONES 3.1 introduces automatic IP rediscovery that detects management IP changes and re-registers the switch seamlessly, ensuring uninterrupted telemetry and real-time monitoring without manual intervention.

4. How does ONES 3.1 help in identifying the root cause of high CPU usage in SONiC?

ONES 3.1 provides granular visibility into the top 10 CPU-consuming services, along with the memory usage of each. This helps pinpoint root causes—like syncd, redis, or dockerd—behind performance spikes and allows quick remediation.

5. What types of SONiC infrastructure anomalies can the ONES Rule Engine detect?

The ONES Rule Engine can detect and alert on:

CPU/memory overuse by Docker containers
Docker container downtime
Hardware or service failures in devices
Real-time management IP changes

This enables a proactive, rule-based monitoring strategy tailored to each network’s performance needs.

6. How does enhanced network observability help diagnose SONiC switch issues faster?

Advanced observability tools in ONES 3.1 help operators:

Spot unhealthy devices and see precise failure codes instantly
Visualize Docker container status changes over time
Correlate CPU spikes with top resource-heavy processes
Respond quickly using direct SSH access from widgets

7. Why is automatic IP rediscovery important for large-scale SONiC deployments?

Automatic IP rediscovery ensures:

Continuous real-time telemetry even if IPs change during maintenance
Zero manual reconfiguration for IP updates
Faster troubleshooting for re-addressed switches
Reduced risk of monitoring gaps in dynamic environments

8. How does the AI network assistant simplify SONiC troubleshooting?

A conversational AI assistant can:

Answer plain-language queries about device health and logs
Summarize Docker transitions and failure alerts in seconds
Suggest root cause hints based on system metrics
Minimize CLI reliance, making diagnostics faster for all skill levels

9. What are the benefits of container-level CPU and memory monitoring for SONiC?

Container-level insights help operators:

Identify which SONiC service (like syncd or redis) is overloading resources
Set rule-based alerts when usage crosses safe thresholds
Optimize system performance proactively
Prevent unexpected container crashes due to resource exhaustion

10. How does the enhanced Rule Engine improve proactive network monitoring?

The upgraded Rule Engine enables:

Custom alert rules for Docker, hardware failures, and IP changes
One-click rule activation for fast deployment
Detailed summaries of active rules for audit and tuning
Real-time anomaly detection that cuts downtime and improves SONiC resilience

Open Networking Enterprise Suite SONiC

Unveiling New Capabilities in Aviz ONES: NVIDIA Spectrum™-X, Orchestration for Small Networks and Conversational SONiC Troubleshooting

Post author By Kasinath Rajendran
Post date 5 March 2025

We are excited to introduce ONES 3.1, a major milestone in our continuous innovation with the Open Networking Enterprise Suite (ONES). This latest release reinforces our vision of building “Networks for AI and AI for Networks,” raising the bar for network management, configuration, and operations. With enhanced visibility and superior support, ONES 3.1 is more than just an update; it’s a transformative leap forward. This version delivers cutting-edge features that elevate the intelligence and efficiency of network operations, reflecting our unwavering commitment to redefining the possibilities in network management, orchestration, and support.

Key Features of ONES 3.1

Spectrum™-X Observability

Building on our existing support for Cumulus NOS, ONES 3.1 now extends compatibility to NVIDIA Spectrum™-X platforms running the latest NOS. This enhancement provides comprehensive visibility into Inventory, Environment, Firmware Versions, CPU/Memory Utilization, Transceivers, Interface Counters, LACP, BGP, RoCE Metrics including PFC, RoCE Traffic, and Queue Counters.

Additionally, ONES 3.1 brings enhanced NVIDIA GPU metrics for GPU-accelerated servers, offering a centralized dashboard that showcases the Top 10 GPU Utilization, allowing for real-time tracking and analysis of the most demanding GPU workloads.

Orchestration for Small Networks

ONES Fabric Manager introduces a simplified, intent-based orchestration experience through an intuitive GUI, enabling seamless fabric orchestration with just a few clicks. New capabilities such as Config Execution and Editor Window, Configuration Comparison, and Backups before upgrades or reboots enhance the efficiency, reliability, and manageability of data center fabric operations.
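A configuration comparison is, at its core, a text diff between two versions of a config. The sketch below uses Python's standard `difflib` to show the idea; the function name `compare_configs` and the sample labels are assumptions, and this says nothing about how the ONES GUI computes or renders its comparison.

```python
import difflib


def compare_configs(before, after, before_name="backup", after_name="candidate"):
    """Return a unified diff (list of lines) between two config texts."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=before_name,
            tofile=after_name,
            lineterm="",
        )
    )
```

Lines prefixed with `-` exist only in the backup and lines prefixed with `+` exist only in the candidate, so a reviewer can confirm a change touches just the intended stanza before pushing it.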

AI Assistant: Conversational Troubleshooting (BETA)

ONES 3.1 introduces the AI Assistant, an intelligent conversational interface that enables users to interact effortlessly with network health and inventory data. It provides real-time insights, streamlines management, and responds to a wide range of queries, enhancing operational efficiency. Designed for on-premises deployment, the AI Assistant operates efficiently on a CPU, eliminating the need for a GPU or any tokens to process user queries.

IP Tracking & Alerting

ONES 3.1 introduces intelligent tracking of network device IP changes in real-time. A dedicated widget enables operators to monitor per-node IP changes and receive instant alerts in case of unexpected changes. Despite IP changes, telemetry streaming remains uninterrupted, ensuring continuous network monitoring without any impact on live status visibility.

Enhanced Support & Proactive Monitoring

ONES 3.1 brings a comprehensive set of default rule templates for critical metrics, ensuring instant anomaly detection and alerts with a simple one-click activation. This release expands monitoring capabilities with Docker CPU/Memory Utilization, Docker Down Status, and Unhealthy Device Detection. Additionally, users can now download a detailed summary of existing rules, enhancing visibility and control over network health.

Additional Enhancements

ONES 3.1 also introduces powerful new features, further strengthening its position as a leading network management solution.

These enhancements make ONES smarter, more efficient, and even more indispensable for modern networking.

Redefining Network Management with ONES 3.1

ONES 3.1 introduces a cutting-edge suite of features and an enhanced user experience, redefining network management, orchestration, and support. This release empowers users with advanced tools and intelligence, ensuring they stay ahead in an increasingly complex network environment.

To explore the full potential of ONES 3.1 and discover how it can transform your network operations, visit us at Aviz Networks. Embark on your journey toward seamless network monitoring and orchestration today.

FAQs

1. What is Aviz ONES 3.1 and how does it improve network operations?

Aviz ONES 3.1 is the latest version of the Open Networking Enterprise Suite, designed to optimize AI-driven data center networks. It introduces powerful enhancements in orchestration, observability, and support—tailored to modern networking needs such as RoCE fabrics, NVIDIA Spectrum™-X integration, and small-network scalability.

2. How does ONES 3.1 provide observability for NVIDIA Spectrum™-X platforms?

ONES 3.1 extends support to NVIDIA Spectrum™-X NOS by offering deep visibility into:

Inventory and transceiver health
Interface counters and RoCE traffic
LACP, BGP, PFC, and queue-level metrics
GPU utilization monitoring on accelerated servers

This makes it easier for operators to monitor, troubleshoot, and optimize NVIDIA-based AI fabrics.

3. What is Conversational Troubleshooting in ONES 3.1?

Conversational Troubleshooting is a new AI assistant in ONES 3.1 that lets users interact with their network via natural language. It answers real-time questions about device health, inventory, and metrics—without requiring CLI knowledge or GPU-based LLMs—making diagnostics more intuitive for NetOps teams.

4. How does ONES 3.1 simplify orchestration for small networks?

ONES 3.1 introduces an intent-based orchestration GUI that’s purpose-built for small to mid-size networks. It allows users to:

Execute and edit configurations visually
Compare changes before deployment
Create backups before upgrades or reboots

This streamlines network management without the complexity of CLI-heavy operations.

5. What proactive monitoring features are included in ONES 3.1?

ONES 3.1 enhances proactive monitoring with a library of pre-built rules for key metrics. Administrators can now monitor:

Docker CPU and memory usage
Transition status of containerized services
IP address changes across network nodes
Unhealthy devices or service disruptions in real time

Alerts and anomaly detection are now just one click away—ideal for fast-moving AI environments.

6. How does Spectrum™-X observability enhance GPU-centric network performance?

Deep observability tools help operators:

Monitor RoCE metrics like PFC and queue counters in real time
Track top GPU workloads to balance compute clusters efficiently
Visualize transceiver and interface health to prevent data flow bottlenecks
Correlate high GPU utilization with network congestion for faster troubleshooting

7. What makes an AI network assistant useful for day-to-day troubleshooting?

A conversational AI assistant can:

Answer plain-language questions about device status and logs
Pull up inventory, health metrics, and alerts instantly
Guide junior engineers through standard diagnostics
Reduce dependency on complex CLI commands for routine checks

8. How does intent-based orchestration help small network teams?

Intent-based orchestration tools:

Let admins push configurations with a few clicks instead of manual scripts
Offer side-by-side config comparisons to catch mistakes early
Automate backups before reboots or upgrades, reducing rollback headaches
Empower lean IT teams to manage fabrics with enterprise-grade consistency

9. How does proactive IP tracking improve network security and stability?

Real-time IP tracking and alerts:

Notify admins of unexpected IP changes, which may signal misconfigurations or threats
Keep telemetry streams uninterrupted despite address shifts
Support audit trails for compliance and forensics
Strengthen overall network stability by avoiding misrouted traffic

10. What new network observability metrics can admins monitor in ONES 3.1?

Admins gain expanded visibility into:

VLAN-level traffic patterns
Docker CPU and memory trends
Inbound vs outbound traffic for anomaly spotting
Top services consuming CPU/Memory
Logical MCLAG groups and their live link states
Alerts for unhealthy nodes and transitions in container status

Fabric Test Automation Suite SONiC

Single click SONiC evaluations and POCs

Post author By Naresh Kumar
Post date 9 December 2024

Learn how FTAS can do it for you!

Why Should Organizations Consider SONiC?

In today’s rapidly evolving networking landscape, organizations are seeking greater flexibility, scalability, and cost-effectiveness. SONiC (Software for Open Networking in the Cloud) has emerged as a leading open-source platform for building and managing data center networks.

SONiC empowers network operators to break free from vendor lock-in, reduce operational costs, and accelerate innovation. By providing a vendor-agnostic, open-source framework, SONiC offers unprecedented flexibility and control over network infrastructure.

What Makes Evaluating SONiC So Challenging?

While SONiC offers numerous benefits, evaluating and deploying it can be a daunting task due to several challenges:

How to Accelerate SONiC Evaluations with FTAS

How Does FTAS Keep Your Networks at Par with Quality Standards?

Aviz Networks’ Fabric Test Automation Suite (FTAS) is a powerful tool designed to ensure the quality and reliability of SONiC networks. By automating testing and validation processes, FTAS helps organizations accelerate deployment, reduce operational costs, and minimize risks.

FTAS helps maintain network quality by:

FTAS development is driven by the real-world use cases of Aviz Networks’ customers, ensuring that it meets the needs of modern data center and cloud environments.

Supported Protocols

FTAS supports a wide range of protocols essential for modern data center networks:

What are the new features in FTAS 3.1?

The latest FTAS 3.1 release brings a host of new features and enhancements to further streamline your SONiC evaluation and deployment process:

How to Use FTAS

By leveraging FTAS, you can accelerate your journey to SONiC, reduce risks, and achieve a more agile and efficient network. Start your SONiC evaluation today with FTAS.

To use FTAS, schedule a call with our team. For comprehensive information before the scheduled call, visit our FTAS product page.

FAQs

1. What makes SONiC evaluation difficult for enterprises exploring open networking?

SONiC evaluation can be complex due to:

Multi-vendor variability in hardware and features
Multiple SONiC flavors (community vs vendor-specific)
Extensive configuration options and networking feature sets
Need for deep SONiC expertise in-house
Lack of standardized testing tools across deployments

2. How does FTAS simplify SONiC evaluations and reduce time-to-deployment?

Aviz’s Fabric Test Automation Suite (FTAS) accelerates SONiC adoption by:

Automating Layer 2/3, HA, and QoS testing
Speeding up firmware and SONiC image qualification
Enabling CI/CD-based continuous validation
Supporting day-2 operations, including upgrades and troubleshooting

3. What types of testing can FTAS perform for SONiC-based networks?

FTAS offers full lifecycle validation through:

Resilience testing (reboots, link failures, container crashes)
Stress and scalability testing to evaluate real-world performance
QoS and EVPN/VXLAN validation
Fast reboot and SNMP visibility tests

4. Why is FTAS ideal for multi-vendor SONiC network environments?

FTAS is designed for heterogeneous data centers because it:

Supports standardized testing across vendors
Provides platform-specific CLI coverage
Automates interoperability checks in SONiC ecosystems
Evolves based on real customer deployments and feedback

5. What’s new in FTAS 3.1 for enterprise SONiC validation?

FTAS dramatically speeds up SONiC evaluation by automating key network tests and validations. It reduces manual efforts and delivers rapid results through:

Automated Layer 2/3 functionality tests
High availability (HA) and security protocol validation
Continuous integration with CI/CD pipelines
Reduced time-to-market for new network features

6. How does automated network testing help ensure SONiC readiness for production?

An automated network testing suite:

Validates Layer 2/3, QoS, and overlay protocols before production
Repeats tests consistently across hardware vendors
Reduces manual config mistakes that can cause downtime
Provides detailed logs for rapid troubleshooting and compliance

7. Why is continuous validation important for an AI-ready open network?

Continuous validation ensures that:

Each new SONiC update or config change won’t break critical data flows
Upgrades and patches integrate smoothly with live workloads
Performance remains predictable for AI and high-throughput applications
Operations teams catch issues before they impact end-users

8. What resilience factors should a robust network operation tool test for SONiC?

A high-quality operation tool should test resilience against:

Node reboots and container restarts
Link failures and flapping connections
Rapid scale-up of traffic loads
Recovery behavior under stress to verify high availability

9. Can automated testing tools speed up SONiC POCs for multi-vendor hardware?

Yes — automation frameworks reduce POC time by:

Running standardized tests on various switches and NICs
Validating interoperability between different vendor ASICs
Checking vendor-specific CLI compatibility
Delivering consistent reports to compare hardware options quickly

10. How does network observability tie into testing and validation workflows?

Network observability works hand-in-hand with testing by:

Capturing real-time telemetry during test runs
Highlighting bottlenecks and packet drops immediately
Feeding results back to CI/CD pipelines for automatic retesting
Enabling data-driven insights for capacity planning and tuning

Open Networking Enterprise Suite SONiC

Global Reach, Local Insight: ONES 3.0 Delivers Seamless Data Center Management

Post author By Anbarasan Ramalingam
Post date 6 November 2024

Explore the latest in AI network management with our ONES 3.0 series

ONES 3.0 introduces a range of exciting new features, with a focus on scaling data center deployments and support. In this blog post, we’ll dive into two standout features: ONES Multisite, a scalable solution for global data center deployments, and enhanced support for SONiC through tech support, ServiceNow integration, and syslog message filtering. Let’s explore how these innovations can benefit your operations.

ONES Multi-site

The ONES rule engine enables incident detection and alert generation, but this data is limited to the specific site managed by each controller. While site data center administrators can use this information to address and resolve issues, enterprise-level administrators or executives seeking an overview of all data centers’ health must access each ONES instance individually, which can be inefficient.

To address this challenge, we introduce ONES Multisite—an application that provides a geospatial overview of anomalies across geographically distributed sites, offering a comprehensive view of the entire network’s health.

ONES instances in different data centers (DCs) around the globe can register with a central multisite application. Upon successful registration, the multisite system periodically polls each site for data related to the number of managed devices (endpoints) and the number of critical alerts. This information is displayed on a map view, showing individual sites, their health status, and last contact times. ONES Multisite also allows users to log in to individual data centers for more detailed information if needed.

Fig 1 – ONES Multisite showing DCs across the globe

To provide a quick overview of the health conditions at various sites, different colors and blinking patterns are used.

Registering ONES instance with Multisite application

A simple user interface is provided for registering the ONES application with the multisite application, requiring inputs such as the site name, multisite IP, and geographical coordinates (latitude and longitude in N and E). By default, the current location coordinates of the site are auto-populated, but they can be overridden if necessary. The License page of the ONES application displays the status of registration with the multisite application.

Fig 2 – Multisite Registration Window

Once registered, the multisite application will regularly gather data from each site regarding the number of managed devices (endpoints) and the count of critical alerts.
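The polling result described above can be pictured with a minimal Python sketch. It is an illustration only: the `SiteReport` fields, the status labels, and the five-minute staleness window are assumptions made for the example, not the ONES Multisite data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SiteReport:
    """One poll result from a registered site (hypothetical shape)."""
    name: str
    managed_devices: int
    critical_alerts: int
    last_contact: datetime


def site_status(report, now, stale_after=timedelta(minutes=5)):
    """Classify a site for a map view; labels and window are assumptions."""
    if now - report.last_contact > stale_after:
        return "unreachable"
    if report.critical_alerts > 0:
        return "critical"
    return "healthy"


def summarize(reports, now):
    """Return ({site name: status}, fleet-wide totals)."""
    statuses = {r.name: site_status(r, now) for r in reports}
    totals = {
        "managed_devices": sum(r.managed_devices for r in reports),
        "critical_alerts": sum(r.critical_alerts for r in reports),
    }
    return statuses, totals
```

A central view built this way only needs the two counters and the last contact time from each site, which matches the small amount of data the multisite application is described as collecting.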

ONES Multisite streamlines the monitoring process across multiple data centers, enabling enterprise-level administrators to easily access vital information and maintain a holistic view of their network’s health. This enhanced visibility not only improves operational efficiency but also empowers teams to respond more effectively to incidents, ensuring optimal performance across all locations.

Enhanced support for SONiC using ONES 3.0

Tech support feature

The SONiC Tech Support feature provides a comprehensive method for collecting system information, logs, configuration data, core dumps, and other information essential for identifying and resolving issues. The ONES 3.0 Tech Support feature offers an easier way to download the tech support dump from any managed switch. Users simply select a switch and click the Tech Support option. The ONES controller connects to the switch, executes the tech support command, and notifies the user when the download file is ready. This allows data center administrators to retrieve tech support data without the cumbersome process of logging into each switch, executing the command, and downloading the file.

Fig 3 – ONES Tech Support page

Filtering of syslog messages

The Syslog feature empowers data center operators to easily view and download syslog messages from any of the managed switches through the ONES UI. This functionality is essential for monitoring system performance and diagnosing issues.

To enhance this feature, we’ve added the ability to filter messages by severity level, such as error, warning, or all messages. This capability enables operators to quickly identify and prioritize critical alerts, streamlining the troubleshooting process and improving overall operational efficiency. By focusing on the most relevant messages, data center teams can respond more effectively to potential issues, ensuring a more reliable and robust network environment.

Fig 4 – Syslog messages with filter applied
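As a rough illustration of severity filtering, the Python sketch below keeps only the lines at or above a chosen severity. It assumes each syslog line carries its severity as a whitespace-separated keyword (for example `ERR` or `WARNING`); the sample line layout and function names are assumptions for the example, not the ONES implementation.

```python
# Syslog severities in RFC 5424 order, most severe first (numeric 0..7).
SEVERITIES = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]
RANK = {name: i for i, name in enumerate(SEVERITIES)}


def line_severity(line):
    """Return the first severity keyword found as a token, or None."""
    for token in line.split():
        if token in RANK:
            return token
    return None


def filter_by_severity(lines, threshold="WARNING"):
    """Keep lines whose severity is at least as severe as `threshold`."""
    limit = RANK[threshold]
    kept = []
    for line in lines:
        severity = line_severity(line)
        if severity is not None and RANK[severity] <= limit:
            kept.append(line)
    return kept
```

Calling `filter_by_severity(lines, "ERR")` would return errors and anything more severe, while `"WARNING"` also lets warnings through.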

ServiceNow Integration

ServiceNow is a cloud-based platform widely used for IT Service Management, automating business processes, and Enterprise Service Management. One of its core components is the ServiceNow ticketing system, specifically the Incident Management feature. When a user encounters a disruption in any IT service, it is reported as an incident on the platform and assigned to the responsible user or group for resolution.

The ONES Rule Engine proactively monitors the data center for potential disruptive events by creating alerts for any breaches of user-configured thresholds. It tracks various factors, such as sudden surges in CPU usage, heavy traffic bursts, and component failures (e.g., PSU, FAN).

ONES 3.0 enhances this functionality by integrating ServiceNow ticketing with the ONES Rule Engine and Alerts Engine. This integration allows ONES to automatically log tickets in the ServiceNow platform whenever any ONES rule conditions are met.
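The rule-to-ticket flow can be sketched in a few lines of Python. This is a hedged illustration: the rule dictionary layout, the metric names, and the helper functions are hypothetical, and the payload simply uses common ServiceNow incident fields (`short_description`, `description`, `urgency`). It does not show how ONES itself talks to ServiceNow, and it makes no network call.

```python
def evaluate_rule(rule, sample):
    """Return True when the sampled metric breaches the rule threshold."""
    return sample[rule["metric"]] > rule["threshold"]


def build_incident(rule, device, sample):
    """Build an incident body for a breached rule (illustrative fields)."""
    value = sample[rule["metric"]]
    return {
        "short_description": f"{device}: {rule['metric']} above {rule['threshold']}",
        "description": f"Rule '{rule['name']}' fired with {rule['metric']}={value}",
        # Assumed mapping: critical rules become urgency 1, others urgency 2.
        "urgency": "1" if rule["severity"] == "critical" else "2",
    }


def process(rules, device, sample):
    """Return one incident body per rule whose condition is met."""
    return [build_incident(r, device, sample) for r in rules if evaluate_rule(r, sample)]
```

In a real integration the returned dictionaries would be submitted to the ticketing platform; here they are only constructed so the threshold logic stays easy to inspect.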

Fig 5 – Rule creation page with ServiceNow integrated

Fig 6 – ServiceNow platform with ONES tickets

In summary, ONES 3.0 brings significant advancements that cater to the evolving needs of data center management.

To unlock the full potential of ONES 3.0 and see how it can revolutionize your network operations, book your demo today.

FAQs

1. What is ONES Multisite and how does it improve global data center monitoring?

ONES Multisite provides a centralized geospatial view of data center health across global sites, allowing enterprise administrators to monitor critical alerts and device statuses from a single interface, which drastically improves visibility and incident response times.

2. How does ONES 3.0 integrate with ServiceNow for automated IT incident management?

ONES 3.0 connects its built-in Rule and Alerts Engine with ServiceNow to automatically generate tickets for anomalies like CPU surges, component failures, or bandwidth spikes—ensuring streamlined IT service workflows and faster resolution times.

3. Can ONES 3.0 simplify SONiC switch tech support data collection?

Yes, ONES 3.0 introduces a simplified “Tech Support” feature that lets users download diagnostic logs from any managed SONiC switch with one click, eliminating the need for manual CLI access across devices.

4. How does ONES 3.0 enhance syslog visibility and filtering for data center operations?

With advanced severity-level filtering (e.g., error, warning, info), ONES 3.0 helps operators quickly pinpoint critical syslog alerts from SONiC switches—accelerating root cause analysis and operational troubleshooting.

5. Why is ONES 3.0 considered essential for centralized AI data center management?

ONES 3.0 delivers single-pane visibility, ServiceNow integration, multisite scalability, and simplified support tools—making it the ideal centralized platform for managing complex, AI-powered, multi-vendor data center environments.

6. How does centralized network observability help with multi-site troubleshooting?

Centralized observability tools:

Aggregate health data from globally distributed sites
Visualize anomalies in a single geospatial dashboard
Reduce time spent switching between site-specific controllers
Enable faster root cause isolation for cross-site issues

7. Can a network operation tool integrate with existing ITSM systems?

Yes — a robust network operation tool can:

Monitor real-time data center health and threshold breaches
Auto-create tickets in platforms like ServiceNow
Sync incident status for better collaboration
Ensure IT service workflows remain connected to live network alerts

8. How does syslog severity filtering improve daily network operations?

Filtering syslogs by severity means teams can:

Focus first on critical errors that impact uptime
Suppress noisy, low-priority logs during high-severity events
Download filtered logs for quicker audits
Shorten mean time to detect (MTTD) issues in complex data centers

9. Why is an AI network assistant valuable for global data center teams?

An AI network assistant:

Correlates anomalies across multiple sites
Flags emerging patterns before they escalate
Suggests best-practice resolutions based on historical data
Reduces manual investigation, freeing engineers for strategic tasks

10. How does centralized tech support improve SONiC troubleshooting efficiency?

Centralized tech support:

Allows quick download of diagnostic bundles without CLI logins
Standardizes log collection across all SONiC switches
Provides clear data for vendor escalation
Cuts down troubleshooting time for remote sites with limited onsite staff

Open Networking Enterprise Suite SONiC

AI Fabric Orchestration: Supercharging AI Networks with SONiC NOS

Post author By Pramod Taramatta
Post date 6 November 2024

Explore the latest in AI network management with our ONES 3.0 series

As the demand for high-performance parallel processing surges in the AI era, GPU clusters have become the heart of data-intensive workloads. But it’s not just about the GPUs themselves: intercommunication between GPU servers is the backbone of their overall performance. Enter the network switch fabric, which is pivotal in overcoming communication bottlenecks and ensuring seamless data flow between GPU servers. Technologies like RoCE (RDMA over Converged Ethernet) allow massive chunks of data to move efficiently between servers, but ensuring that these critical data streams remain lossless and uncongested requires a powerful solution.

That’s where SONiC’s QoS (Quality of Service) features come into play. SONiC enables you to prioritize critical data traffic, ensuring high-priority packets are always transferred ahead of other traffic and also that your important data is not lost. Using SONiC’s robust QoS capabilities and ONES 3.0’s orchestration, you can turn your switch fabric into a lossless, priority-driven highway for GPU server communications.

Let’s explore how you can achieve this through SONiC via ONES 3.0 Fabric Manager orchestration tool.

Lossless And Prioritized Data Flow

Any packet entering the fabric with any DSCP/DOT1P marking can be mapped to any queue of the interface, and enabling PFC on that queue makes it lossless. With PFC in place, when congestion is detected in the queue, a pause frame is sent back to the sender, signaling it to temporarily halt sending traffic of that priority. This mechanism effectively prevents packet drops, ensuring lossless transmission for traffic of that particular priority.
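To make the pause behaviour concrete, here is a small Python simulation of one PFC-enabled queue. It is a simplified sketch: the `xoff` and `xon` thresholds, the tick-based timing, and the instant reaction of the sender are assumptions for illustration, and real switches also reserve headroom for packets already in flight.

```python
def simulate_pfc(arrivals, drain_per_tick, xoff, xon):
    """Simulate one lossless queue with PFC pause and resume thresholds.

    arrivals: packets the sender wants to send on each tick.
    Returns (queue depth per tick, paused flag per tick). Nothing is
    dropped, because a paused sender holds its traffic back instead.
    """
    depth, paused, backlog = 0, False, 0
    depths, pauses = [], []
    for offered in arrivals:
        backlog += offered
        if not paused:
            depth += backlog          # sender transmits everything it holds
            backlog = 0
        depth = max(0, depth - drain_per_tick)
        if depth >= xoff:
            paused = True             # switch signals pause for this priority
        elif depth <= xon:
            paused = False            # switch lets the sender resume
        depths.append(depth)
        pauses.append(paused)
    return depths, pauses
```

With a burst such as `simulate_pfc([10, 10, 10, 0, 0, 0], 4, 12, 4)`, the queue crosses the pause threshold on the second tick, the sender holds back its third burst, and transmission resumes once the queue has drained to the resume threshold.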

Beyond PFC, there’s another layer of congestion management: Explicit Congestion Notification (ECN). With ECN, we can define buffer thresholds; when they are exceeded, packets are marked as having experienced congestion and Congestion Notification Packets (CNPs) are sent back to the sender, prompting it to reduce its transmission rate and proactively avoid congestion.
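One common way such thresholds are applied is a WRED-style curve: no marking below a minimum queue depth, marking with rising probability between the minimum and maximum, and marking every packet above the maximum. The Python sketch below shows that curve; the parameter names are illustrative assumptions rather than the exact fields used by SONiC or ONES.

```python
def ecn_mark_probability(queue_depth, min_threshold, max_threshold, max_probability):
    """Return the probability that a packet is ECN-marked at this depth.

    Below min_threshold nothing is marked; above max_threshold everything
    is marked; in between the probability rises linearly up to
    max_probability.
    """
    if queue_depth <= min_threshold:
        return 0.0
    if queue_depth >= max_threshold:
        return 1.0
    span = max_threshold - min_threshold
    return max_probability * (queue_depth - min_threshold) / span
```

Lower thresholds make senders slow down earlier at the cost of some throughput, while higher thresholds tolerate deeper queues before signalling.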

At this stage, we’ve ensured that our priority traffic is lossless. Moving into the egress phase, we can further enhance performance by prioritizing this traffic over others, even under congestion. SONiC provides scheduling algorithms like Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), and Strict Priority Scheduling (STRICT). By binding priority queues to these schedulers, the system can ensure that higher-priority traffic is transmitted preferentially, either in a weighted manner (for WRR/DWRR) or with absolute priority (for STRICT).
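The weighted behaviour of DWRR can be illustrated with a short, self-contained Python model. It is a textbook-style sketch, not SONiC's scheduler code: the queue contents and the per-round byte quanta (chosen in proportion to the weights) are assumptions for the example.

```python
from collections import deque


def dwrr(queues, quanta, rounds):
    """Serve packet queues with Deficit Weighted Round Robin.

    queues: {queue id: list of packet sizes in bytes}, head of line first.
    quanta: {queue id: bytes of credit added per round}, set in
            proportion to each queue's weight.
    Returns the transmit order as (queue id, packet size) tuples.
    """
    pending = {q: deque(pkts) for q, pkts in queues.items()}
    deficit = {q: 0 for q in queues}
    sent = []
    for _ in range(rounds):
        for q in pending:
            if not pending[q]:
                deficit[q] = 0        # an empty queue does not bank credit
                continue
            deficit[q] += quanta[q]
            # Send head-of-line packets while the credit covers them.
            while pending[q] and pending[q][0] <= deficit[q]:
                size = pending[q].popleft()
                deficit[q] -= size
                sent.append((q, size))
    return sent
```

With quanta of 300 and 200 bytes and equally sized packets, the first queue sends three packets for every two sent by the second, which is the 60/40 style split that weighted scheduling aims for. STRICT scheduling would instead always drain the higher-priority queue first.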

In summary, through PFC, ECN, and advanced scheduling techniques, SONiC ensures that high-priority traffic from GPU servers is not only lossless but also prioritized during both congestion and egress phases.

Simplifying Complex QoS Configurations with ONES Orchestration

Configuring SONiC’s complex QoS features may sound daunting, but with ONES 3.0’s seamless orchestration, it’s a breeze. ONES allows you to set up essential QoS configurations like DSCP to traffic-class mapping, PFC, ECN thresholds, and even scheduler types, all with a few lines in a YAML template. Here’s a snapshot of the YAML template showing how ONES orchestrates SONiC QoS (see the QoS section of the YAML below).

Fig 1 – ONES UI AI Fabric Orchestration YAML Template

The Fabric Manager automates the creation and assignment of QoS profiles, saving administrators from manually configuring multiple aspects. Here’s how it works:

Mapping Traffic Classes and Queues

Orchestration begins by mapping traffic into appropriate classes and queues. ONES 3.0 Orchestration allows you to specify mapping values from DSCP (Layer 3) and dot1p (Layer 2) to traffic classes, traffic classes to queues, and traffic classes to priority groups (PGs). Once these mapping values are specified, profiles with standard names (DOT1P_TC_PROFILE, TC_QUEUE_PROFILE, TC_PG_PROFILE, DSCP_TC_PROFILE) are created from them and bound to the interfaces that are part of the orchestration. This configuration ensures that each type of traffic is routed to its appropriate queue and handled correctly.
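The mapping chain can be pictured as a pair of lookups: a marking resolves to a traffic class, and the traffic class resolves to a queue and a priority group. The Python sketch below uses made-up mapping values and a hypothetical `classify` helper purely for illustration; it does not reproduce the ONES YAML template or the profiles it generates.

```python
# Illustrative mapping values only; not taken from the ONES template.
DSCP_TO_TC = {26: 3, 48: 6}
DOT1P_TO_TC = {3: 3, 6: 6}
TC_TO_QUEUE = {0: 0, 3: 3, 6: 6}
TC_TO_PG = {0: 0, 3: 3, 6: 6}
DEFAULT_TC = 0  # assumed fallback for unmapped markings


def classify(dscp=None, dot1p=None):
    """Resolve a packet marking to (traffic class, queue, priority group)."""
    if dscp is not None:
        tc = DSCP_TO_TC.get(dscp, DEFAULT_TC)
    elif dot1p is not None:
        tc = DOT1P_TO_TC.get(dot1p, DEFAULT_TC)
    else:
        tc = DEFAULT_TC
    return tc, TC_TO_QUEUE[tc], TC_TO_PG[tc]
```

In this example a packet marked DSCP 26 lands in traffic class 3, queue 3, and priority group 3, while an unmapped DSCP value falls back to the default class.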

For example, we can specify the mapping values in the YAML as shown in the image above, and FM will create the corresponding profiles and bind them to the interface as below:

Priority Flow Control (PFC) and Explicit Congestion Notification (ECN)

The next critical part of QoS orchestration involves Priority Flow Control (PFC), where the ONES YAML allows users to define the queues that should be PFC-enabled. In addition, a PFC Watchdog can be configured with detection and restoration times, and the action to take on malfunction, to ensure that PFC keeps working correctly.

ECN configuration parameters can also be provided in the YAML template. ONES Fabric Manager uses them to create a profile named WRED_PROFILE and attaches it to all PFC-enabled queues on every interface that is part of the orchestration.

Here’s an example of how this would be configured on the interface for the YAML input in the above image.

This approach ensures that your network proactively manages congestion and minimizes packet drops for high-priority traffic.

Advanced Scheduling for Optimized Egress

Finally, Scheduling plays a vital role in controlling how packets are forwarded from queues. Orchestration allows administrators to choose between scheduling mechanisms such as Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), or STRICT priority scheduling, depending on their needs.

In the case of DWRR or WRR, weights can be assigned to each queue, influencing how often a queue is serviced relative to others. Once these parameters are specified in the YAML, ONES-FM creates one scheduler policy (SCHEDULER.<weight>) for each unique weight assigned to the queues and attaches these policies to the queues according to their weights on all the interfaces that are part of the orchestration.

For instance, in the YAML input shown in the image, two unique weights, 60 and 40, are assigned to queues 3 and 4 respectively. So two scheduler policies, SCHEDULER.60 and SCHEDULER.40, are created and bound to interface queues 3 and 4 respectively.
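The one-policy-per-unique-weight rule can be expressed as a short Python function. This is a sketch of the naming logic described above, with a hypothetical function name and return shape; it is not ONES-FM code.

```python
def build_scheduler_policies(queue_weights, scheduler_type="DWRR"):
    """Create one policy per unique weight and bind each queue to it.

    queue_weights: {queue id: weight}.
    Returns (policies, bindings), where policies maps a name such as
    "SCHEDULER.60" to its settings and bindings maps queue id to name.
    """
    policies = {}
    bindings = {}
    for queue, weight in sorted(queue_weights.items()):
        name = f"SCHEDULER.{weight}"
        # Queues that share a weight reuse the same policy.
        policies.setdefault(name, {"type": scheduler_type, "weight": weight})
        bindings[queue] = name
    return policies, bindings
```

For weights 60 and 40 on queues 3 and 4, this yields the two policies SCHEDULER.60 and SCHEDULER.40. Adding a third queue with weight 40 would reuse SCHEDULER.40 rather than create a new policy.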

This raises a question: what if all the queues are congested? How do the congestion notification packets traverse the network to reach the sender so that it stops or slows down the incoming traffic?

ONES-FM provides an option to designate a specific queue for ECN_CNP (congestion notification packet) traffic, using STRICT scheduling. This ensures that even when the network is heavily congested, there is always room for the congestion notification packets, preventing further blockages. The cnp_queue field under the ECN section in the image above represents this and is orchestrated as below by ONES-FM:

Flexible, Day-2 Support for QoS Management

One of the standout features of ONES-FM 3.0 is its support for Day-2 operations. As your network evolves and traffic patterns change, you can modify the QoS configurations through either the YAML template or the NetOps API. This flexibility ensures your network is always tuned to deliver the performance required by your AI workloads.

Future-Proof Your AI Infrastructure with ONES 3.0

With its intuitive YAML-based approach and support for dynamic Day-2 adjustments, ONES Fabric Manager eliminates much of the complexity associated with configuring and managing networks. ONES gives you confidence that your network infrastructure is both reliable and future-proof. In essence, ONES Fabric Manager enables seamless orchestration for AI fabrics, ensuring your network is always ready to meet the growing demands of AI-driven data centers.

FAQs

1. How does SONiC NOS enable lossless data transfer for GPU-based AI workloads?

SONiC NOS supports Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), ensuring lossless, high-priority traffic flows—critical for real-time communication between GPU clusters in AI data centers.

2. What is the role of ONES 3.0 Fabric Manager in orchestrating SONiC QoS configurations?

ONES 3.0 provides YAML-based orchestration to simplify complex SONiC QoS settings like DSCP mapping, PFC, ECN, and queue scheduling, reducing configuration time and errors.

3. Can ONES 3.0 handle multi-vendor AI fabric orchestration?

Yes, ONES 3.0 is built for vendor-agnostic orchestration, managing QoS policies across different switches and interfaces—ideal for hybrid or evolving AI network environments.

4. How does ONES handle ECN and congestion feedback during network overloads?

ONES allows you to designate a CNP (Congestion Notification Packet) queue with STRICT priority, ensuring that even during congestion, ECN messages reach the sender to throttle traffic.

5. What scheduling algorithms does SONiC support for AI traffic prioritization?

SONiC supports DWRR, WRR, and STRICT scheduling, and ONES Fabric Manager lets you assign and orchestrate these policies via YAML—optimizing egress packet forwarding and queue handling in AI fabrics.

6. How does a network operation tool simplify complex QoS configurations for AI fabrics?

A robust operation tool:

Converts complex SONiC QoS rules into easy YAML templates
Automates DSCP, PFC, ECN, and scheduling bindings
Reduces human error in configuring queues and profiles
Enables consistent policy enforcement across GPU clusters

7. Why is proactive congestion management critical in AI switch fabrics?

AI clusters generate massive, bursty data streams. Without proactive congestion control:

Packet drops can stall GPU training jobs
RDMA traffic loses its lossless benefit
Network latency spikes, degrading AI workload throughput
Feedback loops (ECN) can’t function efficiently

Proactive management keeps traffic smooth and predictable.

8. Can an AI network assistant adjust QoS settings after deployment (Day-2 operations)?

Yes — a modern AI network assistant can:

Accept YAML changes or API calls to tweak DSCP mappings, PFC thresholds, or scheduling weights
Adapt policies dynamically as AI job patterns evolve
Ensure Day-2 changes don’t disrupt live traffic
Provide rollback and audit logs for safe adjustments

9. How does network observability enhance QoS orchestration?

Network observability feeds real-time telemetry into orchestration engines so they can:

Validate if PFC and ECN are behaving correctly
Detect queues nearing congestion
Trigger alerts or auto-tuning workflows
Optimize scheduling weights to balance loads dynamically

10. What happens if all queues in the AI fabric become congested?

Without smart design, congestion can block critical feedback packets. A centralized operation tool solves this by:

Allocating a dedicated queue for ECN CNP packets
Enforcing STRICT priority scheduling for that queue
Guaranteeing that congestion signals always reach the sender
Preventing a total traffic deadlock in the switch fabric

Open Networking Enterprise Suite SONiC

Streamlining AI Fabric Management: The Imperative of a Centralized Management Platform

Post author By Krupakar Annam
Post date 5 November 2024

Introduction

Artificial Intelligence (AI), once a mere buzzword, has now firmly established itself as a cornerstone of technological advancement. Its insatiable appetite for data fuels its continuous evolution, and generative AI, a subset capable of creating new content, is a prime driving force behind this growth. As datacenters become increasingly AI-centric and drive businesses worldwide, the networking community must assess their readiness for this transformative shift.

The Rapid Pace of AI Development

The pace of AI development is staggering, with years of progress potentially compressed into mere weeks. This rapid evolution necessitates a proactive approach from the networking community to ensure their solutions remain aligned with the cutting-edge advancements in AI. The challenge is multifold, as the increasing demand for networking switches and GPUs opens up opportunities for innovation in multi-vendor ecosystems and data center environments.

Fig 1 – GPU Market size and Trend

The Demand for Open and Flexible Networking Solutions

The rapid need for networking switches and GPUs has created a demand for multi-vendor ecosystems and data center environments. This desire for freedom from vendor lock-in has led to a surge of interest in open-source network operating systems (NOS) like SONiC for networking switches. The driving force behind this demand is the consolidation of features offered by multi-vendor hardware suitable for AI Fabrics and overall cost optimization.

Evolving Data Center Network Architectures

As data center network designs evolve from server-centric to GPU-centric architectures, the necessity for new networking topology designs such as fat-tree, dragonfly, and butterfly has become paramount. GPU workloads, including training, fine-tuning, and inferencing, have distinct networking needs, with Remote Direct Memory Access (RDMA) being the most suitable technique to handle high-bandwidth data traffic flows. Lossless networking and low entropy are also essential for optimal performance.

Fig 2 – Evolution of Data Centers

The Need for Centralized Management Solutions

A single pane of glass management tool is essential to streamline operations and optimize performance in multi-vendor AI fabric data centers. Such a tool should be capable of:

Addressing the Challenges of Centralized Management with ONES

Implementing a centralized management tool in a multi-vendor AI fabric data center requires careful consideration of several key challenges:

Aviz understands this need and has implemented ONES 3.0, a centralized management platform that provides comprehensive control over networking devices, AI workload servers and data centers.

Fig 3 – Aviz Open Networking Enterprise Suite (ONES) for AI Fabrics

The Future of Networking in the AI Era

As AI continues to evolve and its applications expand, the networking community must adapt to the changing landscape. By embracing open-source solutions, adopting new network topologies, and leveraging centralized management platforms like ONES 3.0, organizations can ensure their networks are well-equipped to support the demands of AI-driven workloads. The future of networking is inextricably linked to the advancement of AI, and those who are proactive in their approach will be well-positioned to capitalize on the opportunities that lie ahead.

All these cutting-edge innovations only mark the initial stride towards Aviz Networks’ vision, and more is yet to come. With our strong team of support engineers, we are well-equipped to empower customers with a seamless SONiC journey using the ONES platform.

As AI-driven networks grow in complexity, a centralized management platform like ONES 3.0 by Aviz Networks is essential. It provides seamless control, real-time monitoring, and multi-vendor compatibility to tackle the unique demands of AI workloads. Future-proof your network with ONES 3.0—because the future of AI fabric management starts here.

Explore more about ONES 3.0 in our latest blogs.

If you wish to get in touch with me, feel free to connect on LinkedIn.

FAQs

1. Why is centralized management essential for AI Fabric networks?

Centralized management platforms like ONES 3.0 simplify multi-vendor orchestration, offer real-time GPU and network telemetry, and streamline configuration and monitoring for evolving AI data center topologies.

2. How does ONES 3.0 address AI workload challenges in multi-vendor data centers?

ONES 3.0 supports vendor-agnostic infrastructure, enabling seamless control across switches, NICs, and GPUs, while delivering lossless RDMA optimization, topology orchestration (fat-tree, dragonfly), and proactive alerting.

3. What are the key features needed in an AI-centric network management tool?

Top features include:

Real-time infrastructure visualization
Multi-topology orchestration (fat-tree, dragonfly, butterfly)
GPU and NIC telemetry
Priority Flow Control (PFC)
End-to-end anomaly detection

4. Can ONES 3.0 support GPU-centric architectures and RDMA-based networking?

Yes, ONES 3.0 is optimized for AI/ML GPU workloads and RoCE-based RDMA traffic, enabling QoS profile automation, PFC watchdogs, and deep visibility into compute and network fabric.

5. What network topologies does ONES 3.0 support for AI workloads?

ONES 3.0 supports fat-tree, dragonfly, and butterfly network topologies, enabling scalable, high-performance designs tailored to the latency and throughput needs of modern AI fabrics.

6. How does a centralized network operation tool improve day-to-day AI fabric management?

A centralized tool offers:

Single pane of glass for switches, NICs, GPUs
Consistent configuration across vendors
Automated monitoring of lossless traffic flows (e.g., RDMA)
Faster troubleshooting through correlated network observability

7. Why is network observability critical for AI-centric data centers?

AI workloads generate massive, unpredictable traffic patterns. Robust network observability ensures operators can:

Track real-time performance across GPU clusters
Detect hotspots and microbursts
Analyze traffic flows to fine-tune QoS policies
Proactively prevent data flow disruptions in RDMA environments

8. Can an AI network assistant help manage complex multi-vendor fabrics?

Yes — an AI network assistant can:

Automate repetitive configuration tasks
Analyze telemetry from different switch and server vendors
Suggest optimizations for traffic scheduling and priority flow control
Trigger alerts and recovery workflows for anomalies in the AI fabric

9. What challenges do organizations face without centralized network visibility?

Without unified network visibility, operators often deal with:

Siloed data from multiple tools
Manual correlation of switch, NIC, and GPU logs
Slower root cause analysis for RDMA packet drops
Difficulty maintaining consistency in QoS and PFC settings across sites

10. How does centralized orchestration support new AI network topologies?

Centralized orchestration enables:

Easy mapping of DSCP to traffic classes for fat-tree, dragonfly, and butterfly topologies
Unified policy enforcement for lossless fabrics
Scalability as GPU clusters expand
Future-proofing for next-gen AI workloads that require dynamic reconfiguration
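
To make the DSCP mapping point concrete, here is a minimal sketch of rendering one policy into a `DSCP_TO_TC_MAP` fragment of the kind used in SONiC's config_db.json, so the same mapping can be applied to every switch. The map name `AI_FABRIC`, the helper name, and the specific DSCP and traffic-class values are assumptions for illustration; check the table layout and values against your own SONiC release and QoS design.

```python
import json
from typing import Dict

def build_dscp_to_tc_map(overrides: Dict[int, int], default_tc: int = 0) -> Dict[str, str]:
    """Map every DSCP value (0-63) to a traffic class (0-7).

    DSCP values not listed in overrides fall back to default_tc.
    Keys and values are strings, as in a JSON config fragment.
    """
    for dscp, tc in overrides.items():
        if not 0 <= dscp <= 63:
            raise ValueError(f"DSCP out of range: {dscp}")
        if not 0 <= tc <= 7:
            raise ValueError(f"traffic class out of range: {tc}")
    return {str(d): str(overrides.get(d, default_tc)) for d in range(64)}

# Hypothetical policy: RDMA data on DSCP 26 -> TC 3,
# congestion notifications on DSCP 48 -> TC 6.
policy = {26: 3, 48: 6}

# One fragment, rendered once, that a central tool could push to each switch.
config_fragment = {"DSCP_TO_TC_MAP": {"AI_FABRIC": build_dscp_to_tc_map(policy)}}
fragment_json = json.dumps(config_fragment, indent=2)
```

Generating the fragment from a single policy dictionary is the design point: the mapping is defined once and validated before it reaches any device, rather than being typed per switch.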

SONiC

Why SONiC is Ready Not Just for Hyperscalers

Post author By Ilona Gabinsky
Post date 12 September 2024

When you think of SONiC (Software for Open Networking in the Cloud), it’s often associated with hyperscalers—the giants in tech like Google and Microsoft that demand unparalleled scalability and customization in their network infrastructure. But what if I told you that SONiC is no longer just for hyperscalers? What if I told you that enterprises—yes, Fortune 500 companies, mid-sized businesses, even finance and telecom industries—are now tapping into the power of SONiC to transform their networks?

At Aviz, we’re witnessing this shift firsthand. We recently met with our partners and customers, who shared how they’ve successfully adopted SONiC to enhance their network operations. The results are clear—SONiC’s open-source architecture is providing flexibility, scalability, and significant cost savings across industries. Discover more about this transformation by watching our panel discussion in a video interview hosted by SDxCentral, featuring our customers Techevolution and 1984, along with our partner EPS Global.

Flexibility Through Choices

One of SONiC’s strongest suits is its ability to give businesses choices—choices in hardware, in vendors, in deployment models. Unlike traditional, proprietary solutions that lock you into a particular vendor or set of hardware, SONiC opens the door to a wide array of options. Whether you prefer Cisco, Arista, or NVIDIA, SONiC supports them all, ensuring that your network infrastructure is as flexible and adaptable as your business needs it to be.

With SONiC, you aren’t tied to a single vendor. This freedom encourages innovation and fosters a competitive ecosystem, where businesses can pick and choose the best components to suit their specific needs.

Control Over Your Network

Another key factor that sets SONiC apart is the level of control it offers. Enterprises can manage SONiC at the source code level, which means they can choose the hardware they want at any time. At Aviz, we normalize metrics from across your fabric, including ASICs and operating systems, to achieve multivendor observability, giving you the control to deploy any switch you want without worrying about how it will impact your NetOps layer. But what truly makes SONiC enterprise-ready is the end-to-end support stack provided by Aviz. Our platform ensures that customers gain full operating control, whether they prefer a traditional Cisco-like CLI, REST APIs, or even their own in-house controllers. And we understand that enterprises need more than just flexibility: they need an experience that’s simple to use and supported 24/7.
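
As a loose sketch of what normalizing metrics across vendors can look like, the example below translates vendor-specific counter names into one common schema. The vendor labels, counter names, and the `normalize` helper are all hypothetical and chosen only for illustration; they are not the Aviz implementation.

```python
from typing import Dict

# Hypothetical vendor-specific counter names mapped to one common schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "vendor_a": {"rx_drops": "in_discards", "tx_drops": "out_discards"},
    "vendor_b": {"ifInDiscards": "in_discards", "ifOutDiscards": "out_discards"},
}

def normalize(vendor: str, raw: Dict[str, int]) -> Dict[str, int]:
    """Translate one device's raw counters into the common schema.

    Counters without an entry in the vendor's field map are dropped,
    so downstream dashboards only see the agreed-upon names.
    """
    try:
        mapping = FIELD_MAPS[vendor]
    except KeyError:
        raise ValueError(f"no field map for vendor: {vendor}") from None
    return {mapping[name]: value for name, value in raw.items() if name in mapping}

# Two switches from different vendors end up with identical keys.
a = normalize("vendor_a", {"rx_drops": 5, "tx_drops": 1})
b = normalize("vendor_b", {"ifInDiscards": 7, "ifOutDiscards": 0})
```

With a shared schema, swapping one switch model for another changes only the field map, not the monitoring layer that consumes the data.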

That’s why we’ve developed Aviz Easy Deploy, Monitor, and Support, a plug-and-play solution that brings SONiC to enterprise environments with ease and without the hassle of vendor lock-in.

The Cost Savings You’ve Been Looking For

Cost has always been a critical factor in any IT decision, and SONiC shines here. By moving away from expensive, proprietary solutions and embracing the open-source model, companies can dramatically reduce both capital and operational expenditures. In fact, using SONiC as a foundation, we’ve helped businesses cut costs by half compared to traditional solutions.

A Future-Proof Solution

Perhaps the most exciting part of SONiC’s evolution is that it is on the path to becoming the Linux of networking. Much like how Linux started with a small group of adopters and evolved into a mainstream operating system, SONiC is following a similar trajectory. From a select few hyperscalers to widespread adoption across industries, SONiC is proving it’s not only scalable but also future-proof.

At Aviz, we’re proud to be leading this transformation alongside our partners, customers, and the open-source community. With our support stack, we’ve perfected the recipe for deploying, managing, and scaling SONiC in enterprise environments, offering the same robust experience you’d expect from legacy OEMs.

Ready for the Enterprise

In short, SONiC is no longer just for hyperscalers. It’s ready for enterprises of all sizes, offering flexibility, control, and cost savings that simply can’t be matched by traditional solutions. And with Aviz Easy Deploy, Monitor, and Support, we’ve made SONiC accessible to virtually any organization, making it the go-to choice for network infrastructure transformation.

As more Fortune 500 companies embrace SONiC, we’re confident that the rest of the industry will soon follow suit. So if you’re looking for a scalable, cost-effective, and flexible network solution, it’s time to look at SONiC—not just as the future of networking, but as the solution that’s ready for your enterprise today.

FAQs

1. Is SONiC only suitable for hyperscalers like Microsoft and Google?

No. While SONiC was originally developed for hyperscalers, it’s now enterprise-ready. With solutions like Aviz Easy Deploy and multi-vendor support, mid-sized businesses, Fortune 500s, and telecoms are successfully using SONiC to modernize their network infrastructure.

2. What are the main benefits of using SONiC for enterprise networks?

Enterprises adopt SONiC for its vendor-agnostic flexibility, full control of source code, significant CapEx and OpEx savings, and open support ecosystem—allowing them to customize, scale, and future-proof their networks.

3. How does SONiC reduce network costs for enterprises?

By avoiding expensive proprietary software and enabling open-source switching, SONiC helps enterprises cut network costs by up to 50%. With Aviz, deployment and support are simplified, without costly vendor lock-in.

4. Can SONiC work with Cisco, Arista, or NVIDIA hardware?

Yes. SONiC is hardware-agnostic and supports popular vendors like Cisco, Arista, NVIDIA, and more, enabling enterprises to mix and match best-of-breed hardware without compatibility concerns.

5. Why is SONiC considered the 'Linux of networking'?

SONiC is becoming the de facto open standard for disaggregated networking, much like how Linux revolutionized computing. With community backing, open-source transparency, and growing adoption across industries, it’s paving the way for next-gen network architectures.
