

Single click SONiC evaluations and POCs

Learn how FTAS can do it for you!

Why Should Organizations Consider SONiC?

In today’s rapidly evolving networking landscape, organizations are seeking greater flexibility, scalability, and cost-effectiveness. SONiC (Software for Open Networking in the Cloud) has emerged as a leading open-source platform for building and managing data center networks.

SONiC empowers network operators to break free from vendor lock-in, reduce operational costs, and accelerate innovation. By providing a vendor-agnostic, open-source framework, SONiC offers unprecedented flexibility and control over network infrastructure.

What Makes Evaluating SONiC So Challenging?

While SONiC offers numerous benefits, evaluating and deploying it can be a daunting task due to several challenges:

How to Accelerate SONiC Evaluations with FTAS

How Does FTAS Keep Your Networks on Par with Quality Standards?

Aviz Networks’ Fabric Test Automation Suite (FTAS) is a powerful tool designed to ensure the quality and reliability of SONiC networks. By automating testing and validation processes, FTAS helps organizations accelerate deployment, reduce operational costs, and minimize risks.

FTAS helps maintain network quality by:

FTAS development is driven by the real-world use cases of Aviz Networks’ customers, ensuring that it meets the needs of modern data center and cloud environments.

Supported Protocols

FTAS supports a wide range of protocols essential for modern data center networks:

What are the new features in FTAS 3.1?

The latest FTAS 3.1 release brings a host of new features and enhancements to further streamline your SONiC evaluation and deployment process:

How to Use FTAS

By leveraging FTAS, you can accelerate your journey to SONiC, reduce risks, and achieve a more agile and efficient network. Start your SONiC evaluation today with FTAS.

To use FTAS, please schedule a call with our team to delve into FTAS. For comprehensive information before the scheduled call, visit our FTAS product page.


Transforming AI Fabric with ONES: Enhanced Observability for GPU Performance

Explore the latest in AI network management with our ONES 3.0 series

Future of Intelligent Networking for AI Fabric Optimization

If you’re operating a high-performance data center or managing AI/ML workloads, ONES 3.0 offers advanced features that ensure your network remains optimized and congestion-free, with lossless data transmission as a core priority.

In today’s fast-paced, AI-driven world, network infrastructure must evolve to meet the growing demands of high-performance computing, real-time data processing, and seamless communication. As organizations build increasingly complex AI models, the need for low-latency, lossless data transmission, and sophisticated scheduling of network traffic has become crucial. ONES 3.0 is designed to address these requirements by offering cutting-edge tools for managing AI fabrics with precision and scalability.

Building on the solid foundation laid by ONES 2.0, where RoCE (RDMA over Converged Ethernet) support enabled lossless communication and enhanced proactive congestion management, ONES 3.0 takes these capabilities to the next level. We’ve further improved RoCE features with the introduction of PFC Watchdog (PFCWD) for enhanced fault tolerance, Scheduler for optimized traffic handling, and WRED for intelligent queue management, ensuring that AI workloads remain highly efficient and resilient, even in the most demanding environments.

Why RoCE is Critical for Building AI Models

As the next generation of AI models requires vast amounts of data to be transferred quickly and reliably across nodes, RoCE becomes an indispensable technology. By enabling remote direct memory access (RDMA) over Ethernet, RoCE facilitates low-latency, high-throughput, and lossless data transmission—all critical elements in building and training modern AI models.

In AI workloads, scheduling data packets effectively ensures that model training is not delayed due to network congestion or packet loss. RoCE’s ability to prioritize traffic and ensure lossless data movement allows AI models to operate at optimal speeds, making it a perfect fit for today’s AI infrastructures. Whether it’s transferring large datasets between GPU clusters or ensuring smooth communication between nodes in a distributed AI system, RoCE ensures that critical data flows seamlessly without compromising performance.

Enhancing RoCE Capabilities from ONES 2.0 to ONES 3.0

In ONES 3.0, we’ve taken RoCE management even further, enhancing the ability to monitor and optimize Priority Flow Control (PFC) and ensuring lossless RDMA traffic under heavy network loads. The new PFC Watchdog (PFCWD) ensures that any misconfiguration or failure in flow control is detected and addressed in real-time, preventing traffic stalls or congestion collapse in AI-driven environments.

Additionally, ONES 3.0’s Scheduler allows for more sophisticated data packet scheduling, ensuring that AI tasks are executed with precision and efficiency. Combined with WRED (Weighted Random Early Detection), which intelligently manages queue drops to prevent buffer overflow in congested networks, ONES 3.0 provides a holistic solution for RoCE-enabled AI fabrics.

The Importance of QoS and RoCE in AI Networks

Quality of Service (QoS) and RoCE are pivotal in ensuring that AI networks can handle the rigorous demands of real-time processing and massive data exchanges without performance degradation. In environments where AI workloads must process large amounts of data between nodes, QoS ensures that critical tasks receive the required bandwidth, while RoCE ensures that this data is transmitted with minimal latency and no packet loss.

With AI workloads demanding real-time responsiveness, any network inefficiency or congestion can slow down AI model training, leading to delays and sub-optimal performance. The advanced QoS mechanisms in ONES 3.0, combined with enhanced RoCE features, provide the necessary tools to prioritize traffic, monitor congestion, and optimize the network for the low-latency, high-reliability communication that AI models depend on.

In ONES 3.0, QoS features such as DSCP mapping, WRED, and scheduling profiles allow customers to:

By leveraging QoS in combination with RoCE, ONES 3.0 creates an optimized environment for AI networks, allowing customers to confidently build and train next-generation AI models without worrying about data bottlenecks.

1. Comprehensive Interface and Performance Metrics

The UI showcases essential network performance indicators such as In/Out packet rates, errors, and discards, all displayed in real time. These metrics give customers the ability to:
By having access to real-time and historical data, customers can make data-driven decisions to optimize network performance without sacrificing the quality of their AI workloads.

2. RoCE Config Visualization

RoCE (RDMA over Converged Ethernet) is a key technology used to achieve high-throughput and low-latency communication, especially when training AI models, where data packets must flow without loss. In ONES 3.0, the RoCE tab within the UI offers full transparency into how data traffic is managed:

3. Visual Traffic Monitoring: A Data-Driven Experience

The UI doesn’t just give you raw data—it helps you visualize it. With multiple graphing options and real-time statistics, customers can easily monitor:

4. Flexible Time-Based Monitoring and Analysis

Customers have the option to track metrics over various time periods, from live updates (1 hour) to historical views (12 hours, 2 weeks, etc.). This flexibility allows customers to:
This feature is especially valuable for customers running AI workloads, where consistent performance over extended periods is vital for the accuracy and efficiency of model training.

Centralized QoS View

ONES 3.0 offers a unified interface for all QoS configurations, including DSCP to TC mappings, WRED/ECN, and scheduler profiles, making traffic management simpler for network admins.
This page provides administrators with comprehensive insights into how traffic flows through the network, allowing them to fine-tune and optimize their configurations to meet the unique demands of modern workloads.
Fig 1 – QoS Profile List

Comprehensive Topology View

ONES offers a comprehensive, interactive map of network devices and their connectivity, ideal for monitoring AI/ML and RoCE environments. It provides an actionable overview that simplifies network management.
Fig 2 – AI-ML Topology View
Key features include:
Overall, the Topology Page in ONES enhances network visibility and control, making it easier to optimize performance, troubleshoot issues, and ensure the smooth operation of AI/ML and RoCE workloads.

Proactive Monitoring and Alerts with the Enhanced ONES Rule Engine

The ONES Rule Engine has been a standout feature in previous releases, providing robust monitoring and alerting capabilities for network administrators. With the latest update, we’ve enhanced the usability and functionality, making rule creation and alert configuration even smoother and more intuitive. Whether monitoring RoCE metrics or AI-Fabric performance counters, administrators can now set up alerts with greater precision and ease. This new streamlined experience allows for better anomaly detection, helping prevent network congestion and data loss before they impact performance.

The ONES Rule Engine offers cutting-edge capabilities for proactive network management, enabling real-time anomaly detection and alerting. It provides deep visibility into AI-Fabric metrics like queue counters, PFC events, packet rates, and link failures, ensuring smooth performance for RoCE-based applications. By allowing users to set custom thresholds and conditions for congestion detection, the Rule Engine ensures that network administrators can swiftly address potential bottlenecks before they escalate.

With integrated alerting systems such as Slack and Zendesk, administrators can respond instantly to network anomalies. The ONES Rule Engine’s automation streamlines monitoring and troubleshooting, helping prevent data loss and maintain optimal network conditions, ultimately enhancing the overall network efficiency.
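
To make this concrete, a congestion-detection rule can be pictured along the lines of the sketch below. The schema is purely illustrative, since the Rule Engine is configured through its own UI and APIs; the field and metric names here are assumptions, not the actual ONES format.

```yaml
# Hypothetical sketch only; the ONES Rule Engine has its own schema, and these
# field and metric names are assumptions.
rule:
  name: pfc-pause-storm
  scope: ai-fabric-leaf-switches
  condition:
    metric: pfc_rx_pause_frames   # per-queue PFC counter (assumed metric name)
    queue: 3
    operator: ">"
    threshold: 1000
    window: 60s
  severity: critical
  notify: [slack, zendesk]        # integrated alerting channels
```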

Conclusion

In an era where AI and machine learning are driving transformative innovations, the need for a robust and efficient network infrastructure has never been more critical. ONES 3.0 ensures that AI workloads can operate seamlessly, with minimal latency and no packet loss.

Global Reach, Local Insight: ONES 3.0 Delivers Seamless Data Center Management

Explore the latest in AI network management with our ONES 3.0 series

ONES 3.0 introduces a range of exciting new features, with a focus on scaling data center deployments and support. In this blog post, we’ll dive into two standout features: ONES Multisite, a scalable solution for global data center deployments, and enhanced support for SONiC through tech support, ServiceNow integration, and syslog message filtering. Let’s explore how these innovations can benefit your operations.

ONES Multi-site

The ONES rule engine enables incident detection and alert generation, but this data is limited to the specific site managed by each controller. While site data center administrators can use this information to address and resolve issues, enterprise-level administrators or executives seeking an overview of all data centers’ health must access each ONES instance individually, which can be inefficient.

To address this challenge, we introduce ONES Multisite—an application that provides a geospatial overview of anomalies across geographically distributed sites, offering a comprehensive view of the entire network’s health.

ONES instances in different data centers (DCs) around the globe can register with a central multisite application. Upon successful registration, the multisite system periodically polls each site for data related to the number of managed devices (endpoints) and the number of critical alerts. This information is displayed on a map view, showing individual sites, their health status, and last contact times. ONES Multisite also allows users to log in to individual data centers for more detailed information if needed.

Fig 1 – ONES Multisite showing DCs across the globe
To provide a quick overview of the health conditions at various sites, different colors and blinking patterns are used.

Registering ONES instance with Multisite application

A simple user interface is provided for registering the ONES application to the multisite, requiring inputs such as the site name, multisite IP, and geographical coordinates (latitude and longitude in N and E). By default, the current location coordinates of the site are auto-populated, but they can be overridden if necessary. The License page of the ONES application displays the registration status with the multisite application.
Fig 2 – Multisite Registration Window

Once registered, the multisite application will regularly gather data from each site regarding the number of managed devices (endpoints) and the count of critical alerts.
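
To picture the data involved, the exchange can be sketched roughly as below. The structure and field names are assumptions for illustration only, not the actual ONES Multisite schema.

```yaml
# Hypothetical sketch of the information exchanged; not the actual ONES Multisite schema.
registration:
  site_name: dc-frankfurt-01
  multisite_ip: 203.0.113.10      # address of the central Multisite application
  latitude: "50.11 N"             # auto-populated from the site location, can be overridden
  longitude: "8.68 E"
health_report:                    # polled periodically from each registered site
  managed_endpoints: 48
  critical_alerts: 2
  last_contact: "2024-10-01T09:42:00Z"
```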

ONES Multisite streamlines the monitoring process across multiple data centers, enabling enterprise-level administrators to easily access vital information and maintain a holistic view of their network’s health. This enhanced visibility not only improves operational efficiency but also empowers teams to respond more effectively to incidents, ensuring optimal performance across all locations.

Enhanced support for SONiC using ONES 3.0

Tech support feature

The SONiC Tech Support feature provides a comprehensive method for collecting system information, logs, configuration data, core dumps, and other relevant information essential for identifying and resolving issues. The ONES 3.0 Tech Support feature offers an easier way to download the tech support dump from any managed switch. Users can simply select a switch and click on the Tech Support option. The ONES controller connects to the switch, executes the tech support command, and notifies the user when the download file is ready. This powerful option allows data center administrators to easily retrieve tech support data without the cumbersome process of logging into each switch, executing the command, and downloading the file.
Fig 3 – ONES Tech Support page

Filtering of syslog messages

The Syslog feature empowers data center operators to easily view and download syslog messages from any of the managed switches through the ONES UI. This functionality is essential for monitoring system performance and diagnosing issues.

To enhance this feature, we’ve added the ability to filter messages based on severity level, such as error, warning, or all messages. This capability enables operators to quickly identify and prioritize critical alerts, streamlining the troubleshooting process and improving overall operational efficiency. By focusing on the most relevant messages, data center teams can respond more effectively to potential issues, ensuring a more reliable and robust network environment.
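
As an illustration, a filter request narrowed to specific severities might look like the sketch below. The field names are assumptions, while the severity names themselves follow the standard syslog levels.

```yaml
# Hypothetical filter request; field names are assumptions, severity names are the
# standard syslog levels (emergency, alert, critical, error, warning, notice, info, debug).
syslog_filter:
  device: leaf-01
  severities: [error, warning]    # or "all" to show every message
  time_range: last_12_hours
```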

Fig 4 – Syslog messages with filter applied

ServiceNow Integration

ServiceNow is a cloud-based platform widely used for IT Service Management, automating business processes, and Enterprise Service Management. One of its core components is the ServiceNow ticketing system, specifically the Incident Management feature. When a user encounters a disruption in any IT service, it is reported as an incident on the platform and assigned to the responsible user or group for resolution.

The ONES Rule Engine proactively monitors the data center for potential disruptive events by creating alerts for any breaches of user-configured thresholds. It tracks various factors, such as sudden surges in CPU usage, heavy traffic bursts, and component failures (e.g., PSU, FAN).

ONES 3.0 enhances this functionality by integrating ServiceNow ticketing with the ONES Rule Engine and Alerts Engine. This integration allows ONES to automatically log tickets in the ServiceNow platform whenever any ONES rule conditions are met.

Fig 5 – Rule creation page with ServiceNow integrated
Fig 6 – ServiceNow platform with ONES tickets
In summary, ONES 3.0 brings significant advancements that cater to the evolving needs of data center management.

To unlock the full potential of ONES 3.0 and see how it can revolutionize your network operations, book your demo today.


AI Fabric Orchestration: Supercharging AI Networks with SONiC NOS

Explore the latest in AI network management with our ONES 3.0 series

As the demand for high-performance parallel processing surges in the AI era, GPU clusters have become the heart of data-intensive workloads. But it’s not just about the GPUs themselves—intercommunication between GPU servers is the backbone of their overall performance. Enter the network switch fabric, which is pivotal in overcoming communication bottlenecks and ensuring seamless data flow between GPU servers. Technologies like RoCE (RDMA over Converged Ethernet) allow massive chunks of data to move efficiently between servers, but ensuring that these critical data streams remain lossless and uncongested requires a powerful solution.

That’s where SONiC’s QoS (Quality of Service) features come into play. SONiC enables you to prioritize critical data traffic, ensuring that high-priority packets are transferred ahead of other traffic and that your important data is not lost. Using SONiC’s robust QoS capabilities and ONES 3.0’s orchestration, you can turn your switch fabric into a lossless, priority-driven highway for GPU server communications.

Let’s explore how you can achieve this with SONiC using the ONES 3.0 Fabric Manager orchestration tool.

Lossless And Prioritized Data Flow

Any packet entering the fabric with any DSCP/DOT1P marking can be mapped to any queue of the interface, and enabling PFC on that queue makes it lossless. With PFC in place, when congestion is detected in the queue, a pause frame is sent back to the sender, signaling it to temporarily halt sending traffic of that priority. This mechanism effectively prevents packet drops, ensuring lossless transmission for traffic of that priority.

Beyond PFC, there’s another layer of congestion management—Explicit Congestion Notification (ECN). With ECN, we can define buffer thresholds; when these are exceeded, Congestion Notification Packets (ECN-CNP) are sent to the sender, prompting it to reduce its transmission rate and proactively avoid congestion.

At this stage, we’ve ensured that our priority traffic is lossless. Moving into the egress phase, we can further enhance performance by prioritizing this traffic over others, even under congestion. SONiC provides scheduling algorithms like Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), and Strict Priority Scheduling (STRICT). By binding priority queues to these schedulers, the system can ensure that higher-priority traffic is transmitted preferentially, either in a weighted manner (for WRR/DWRR) or with absolute priority (for STRICT).

In summary, through PFC, ECN, and advanced scheduling techniques, SONiC ensures that high-priority traffic from GPU servers is not only lossless but also prioritized during both congestion and egress phases.

Simplifying Complex QoS Configurations with ONES Orchestration

Configuring SONiC’s complex QoS features may sound daunting, but with ONES 3.0’s seamless orchestration, it’s a breeze. ONES allows you to set up essential QoS configurations like DSCP to traffic-class mapping, PFC, ECN thresholds, and even scheduler types—all with a few lines in a YAML template. Here’s a snapshot of the YAML template showcasing how ONES orchestrates SONiC QoS (the QoS section of the YAML below):

Fig 1 – ONES UI AI Fabric Orchestration YAML Template
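
To give a feel for what such a template expresses, here is a rough, illustrative sketch of a QoS section covering the pieces discussed in this post. The field names and values are assumptions made for readability, not the exact ONES template schema.

```yaml
# Illustrative sketch only; field names and values are assumptions, not the exact ONES schema.
qos:
  dscp_to_tc: { "26": 3, "34": 4, "48": 6 }   # example markings for RoCE data and CNP traffic
  dot1p_to_tc: { "3": 3 }
  tc_to_queue: { "3": 3, "4": 4, "6": 7 }
  tc_to_pg: { "3": 3, "4": 4 }
  pfc:
    queues: [3, 4]                # lossless queues
    watchdog:
      detection_time_ms: 200
      restoration_time_ms: 200
      action: drop
  ecn:
    min_threshold: 1000000        # bytes (assumed units)
    max_threshold: 2000000
    drop_probability: 5           # percent
    cnp_queue: 7                  # strict-priority queue for congestion notification packets
  scheduler:
    type: DWRR
    weights: { "3": 60, "4": 40 } # per-queue weights
```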

The Fabric Manager automates the creation and assignment of QoS profiles, saving administrators from manually configuring multiple aspects. Here’s how it works:

Mapping Traffic Classes and Queues

Orchestration begins by mapping traffic into appropriate classes and queues. ONES 3.0 Orchestration allows you to specify mapping values from DSCP (Layer 3) and dot1p (Layer 2) to traffic classes, traffic classes to queues, and traffic classes to priority groups (PGs). Once these mapping values are specified, profiles are created with standard names derived from them, such as DOT1P_TC_PROFILE, TC_QUEUE_PROFILE, TC_PG_PROFILE, and DSCP_TC_PROFILE, and are bound to the interfaces that are part of the orchestration. This configuration ensures that each type of traffic is routed to its appropriate queue and handled correctly.

For example, we can specify mapping values in the YAML as shown in the image above, and FM will create the corresponding profiles and bind them to the interface as below:
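
The sketch that follows is a rough, config_db-style rendering rather than the exact ONES output; table and field names follow common SONiC conventions and may vary by version, and the values mirror the illustrative template above (config_db is natively JSON on the switch, shown here as YAML for readability).

```yaml
# Config_db-style rendering; table and field names follow common SONiC conventions.
DSCP_TO_TC_MAP:
  DSCP_TC_PROFILE: { "26": "3", "34": "4", "48": "6" }
DOT1P_TO_TC_MAP:
  DOT1P_TC_PROFILE: { "3": "3" }
TC_TO_QUEUE_MAP:
  TC_QUEUE_PROFILE: { "3": "3", "4": "4", "6": "7" }
TC_TO_PRIORITY_GROUP_MAP:
  TC_PG_PROFILE: { "3": "3", "4": "4" }
PORT_QOS_MAP:
  Ethernet0:                      # repeated for every interface in the orchestration
    dscp_to_tc_map: DSCP_TC_PROFILE
    dot1p_to_tc_map: DOT1P_TC_PROFILE
    tc_to_queue_map: TC_QUEUE_PROFILE
    tc_to_pg_map: TC_PG_PROFILE
```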

Priority Flow Control (PFC) and Explicit Congestion Notification (ECN)

The next critical part of QoS orchestration involves Priority Flow Control (PFC), where the ONES YAML allows users to define the queues that should be PFC-enabled. Moreover, a PFC Watchdog can be configured with detection and restoration times and the action to take in case of malfunction, ensuring that PFC itself keeps working as intended.

ECN configuration parameters can also be provided in the YAML template; ONES Fabric Manager uses them to create a profile named WRED_PROFILE and attaches it to all PFC-enabled queues on every interface that is part of the orchestration.

Here’s an example of how this would be configured on the interface for the YAML input in the above image.
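
As a rough sketch, again in config_db style with example detection, restoration, and threshold values (exact table and field names may vary by SONiC version):

```yaml
PORT_QOS_MAP:
  Ethernet0:
    pfc_enable: "3,4"             # lossless queues from the pfc section of the template
PFC_WD:
  Ethernet0:
    action: drop
    detection_time: "200"         # ms
    restoration_time: "200"       # ms
WRED_PROFILE:
  WRED_PROFILE:                   # profile name created by ONES-FM
    ecn: ecn_all
    wred_green_enable: "true"
    green_min_threshold: "1000000"
    green_max_threshold: "2000000"
    green_drop_probability: "5"
QUEUE:
  "Ethernet0|3": { wred_profile: WRED_PROFILE }
  "Ethernet0|4": { wred_profile: WRED_PROFILE }
```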

This approach ensures that your network proactively manages congestion and minimizes packet drops for high-priority traffic.

Advanced Scheduling for Optimized Egress

Finally, Scheduling plays a vital role in controlling how packets are forwarded from queues. Orchestration allows administrators to choose between scheduling mechanisms such as Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), or STRICT priority scheduling, depending on their needs.

In the case of DWRR or WRR, weights can be assigned to each queue, influencing how often a queue is serviced relative to others. Upon specifying these parameters in the YAML, ONES-FM creates one scheduler policy (SCHEDULER.<weight>) for each unique weight assigned to the queues and attaches these policies to the queues according to their weights on all interfaces that are part of the orchestration.

For instance, in the given YAML input there are two unique weights, 60 and 40, assigned to queues 3 and 4 respectively. So two scheduler policies, SCHEDULER.60 and SCHEDULER.40, are created and bound to interface queues 3 and 4 respectively.
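
Sketched in the same config_db style, the resulting scheduler bindings would look roughly like this (names per the convention described above; exact syntax may vary by SONiC version):

```yaml
SCHEDULER:
  SCHEDULER.60: { type: DWRR, weight: "60" }
  SCHEDULER.40: { type: DWRR, weight: "40" }
QUEUE:
  "Ethernet0|3": { scheduler: SCHEDULER.60, wred_profile: WRED_PROFILE }
  "Ethernet0|4": { scheduler: SCHEDULER.40, wred_profile: WRED_PROFILE }
```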

Now, here comes a question: what if all the queues are congested? How do the congestion notification packets even traverse the network to reach the sender and tell it to stop or slow down the incoming traffic?

ONES-FM provides an option to designate a specific queue for ECN CNP (Congestion Notification Packet) traffic, using STRICT scheduling, ensuring that even when the network is heavily congested there is always room left for the congestion notification packets, preventing further blockages. The cnp_queue field under the ECN section in the image above represents this, and it is orchestrated by ONES-FM as below:
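
A sketch of how the CNP queue could end up configured; the strict scheduler profile name here is an assumption for illustration, since the actual name created by ONES-FM is not shown in this post:

```yaml
SCHEDULER:
  SCHEDULER.STRICT: { type: STRICT }              # profile name assumed for illustration
QUEUE:
  "Ethernet0|7": { scheduler: SCHEDULER.STRICT }  # cnp_queue from the ECN section
```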

Flexible, Day-2 Support for QoS Management

One of the standout features of ONES-FM 3.0 is its support for Day-2 operations. As your network evolves and traffic patterns change, you can modify the QoS configurations through either the YAML template or the NetOps API. This flexibility ensures your network is always tuned to deliver the performance required by your AI workloads.

Future-Proof Your AI Infrastructure with ONES 3.0

With its intuitive YAML-based approach and support for dynamic Day-2 adjustments, ONES Fabric Manager eliminates much of the complexity associated with configuring and managing networks. ONES gives you confidence that your network infrastructure is both reliable and future-proof. In essence, ONES Fabric Manager enables seamless orchestration for AI fabrics, ensuring your network is always ready to meet the growing demands of AI-driven data centers.

Streamlining AI Fabric Management: The Imperative of a Centralized Management Platform

Introduction

Artificial Intelligence (AI), once a mere buzzword, has now firmly established itself as a cornerstone of technological advancement. Its insatiable appetite for data fuels its continuous evolution, and generative AI, a subset capable of creating new content, is a prime driving force behind this growth. As datacenters become increasingly AI-centric and drive businesses worldwide, the networking community must assess their readiness for this transformative shift.

The Rapid Pace of AI Development

The pace of AI development is staggering, with years of progress potentially compressed into mere weeks. This rapid evolution necessitates a proactive approach from the networking community to ensure their solutions remain aligned with the cutting-edge advancements in AI. The challenge is multifold, as the increasing demand for networking switches and GPUs opens up opportunities for innovation in multi-vendor ecosystems and data center environments.
Fig 1 – GPU Market size and Trend

The Demand for Open and Flexible Networking Solutions

The rapid need for networking switches and GPUs has created a demand for multi-vendor ecosystems and data center environments. This increased demand for freedom from vendor lock-in has led to a surge of interest in open-source network operating systems (NOS) like SONiC for networking switches. The driving force behind this demand is the consolidation of features offered by multi-vendor hardware suitable for AI fabrics and overall cost optimization.

Evolving Data Center Network Architectures

As data center network designs evolve from server-centric to GPU-centric architectures, the necessity for new networking topology designs such as fat-tree, dragonfly, and butterfly has become paramount. GPU workloads, including training, fine-tuning, and inferencing, have distinct networking needs, with Remote Direct Memory Access (RDMA) being the most suitable technique to handle high-bandwidth data traffic flows. Lossless networking and low entropy are also essential for optimal performance.
Fig 2 – Evolution of Data Centers

The Need for Centralized Management Solutions

A single pane of glass management tool is essential to streamline operations and optimize performance in multi-vendor AI fabric data centers. Such a tool should be capable of:

Addressing the Challenges of Centralized Management with ONES

Implementing a centralized management tool in a multi-vendor AI fabric data center requires careful consideration of several key challenges:
Aviz understands this need and has implemented ONES 3.0, a centralized management platform that provides comprehensive control over networking devices, AI workload servers and data centers.
Fig 3 – Aviz Open Networking Enterprise Suite (ONES) for AI Fabrics

The Future of Networking in the AI Era

As AI continues to evolve and its applications expand, the networking community must adapt to the changing landscape. By embracing open-source solutions, adopting new network topologies, and leveraging centralized management platforms like ONES 3.0, organizations can ensure their networks are well-equipped to support the demands of AI-driven workloads. The future of networking is inextricably linked to the advancement of AI, and those who are proactive in their approach will be well-positioned to capitalize on the opportunities that lie ahead.

All these cutting-edge innovations only mark the initial stride towards Aviz Networks’ vision, and more is yet to come. With our strong team of support engineers, we are well-equipped to empower customers with a seamless SONiC journey using the ONES platform.

As AI-driven networks grow in complexity, a centralized management platform like ONES 3.0 by Aviz Networks is essential. It provides seamless control, real-time monitoring, and multi-vendor compatibility to tackle the unique demands of AI workloads. Future-proof your network with ONES 3.0—because the future of AI fabric management starts here.

Explore more about ONES 3.0 in our latest blogs here.

If you wish to get in touch with me, feel free to connect on LinkedIn here.


Partner Program Announcement

Aviz Announces a new Promotion for Its Channel Partners

Aviz Networks, a leader in AI-driven networking solutions, is thrilled to announce a new Program designed to accelerate the market penetration of our innovative products. This incentive is offered exclusively to registered partners to help expand our reach and enhance customer engagement.

Overview

The Program is tailored to reward our partners for their efforts in connecting us with qualified customer prospects and advancing the adoption of our platform in the market.

Program Details

Program Administration:

Program Timeline:

We value our partners and are committed to supporting your efforts to promote the adoption of our cutting-edge AI networking solutions. Together, let’s transform the networking industry and achieve remarkable success.

Want to know more about the Partner Incentive Program?

Already a Partner? Log in to the Partner Portal.

Announcing New Features in AI Network Management and Operations

We are thrilled to announce the release of ONES 3.0, a pivotal update in our ongoing innovation journey with Open Networking Enterprise Suite (ONES). This release furthers our mission of building ‘Networks for AI and AI for Networks,’ setting a new benchmark in network management and operations. With enhanced Visibility, AI Fabric Manager, and Support, ONES 3.0 is not merely an upgrade—it’s a significant stride forward. This version introduces advanced features that significantly boost the sophistication and efficiency of network operations, embodying our commitment to continuously push the boundaries of what’s possible in network orchestration and management.

"With the launch of ONES 3.0, we are enhancing the observability and orchestration of AI-Fabric network infrastructure tailored for GPU-centric workloads. This release offers improved visibility into compute metrics, including GPUs and network interface cards, enabling comprehensive end-to-end observability across multi-site AI infrastructures. Additionally, it strengthens fabric management with the inclusion of RoCE (QoS Profiles) configuration providing single-click Day 0 deployment for AI deployments. ONES 3.0 reflects our commitment to innovation, empowering customers to efficiently manage and optimize complex networks."

Key Features of ONES 3.0

ONES Multi-site

Multi-site offers a revolutionary way to visualize network anomalies across geographical locations. This intuitive, geospatial interface provides a comprehensive view of network health by representing anomalies on a map, making it easier to identify and address issues that span multiple sites. This feature is particularly valuable for organizations with geographically dispersed networks, as it allows for a unified and detailed perspective of network performance.

AI Fabric Manager

ONES AI Fabric Manager enhances the management and optimization of AI workloads, streamlining the deployment of AI/ML tasks across your network for efficient resource utilization. It automates the creation and assignment of QoS profiles, reducing the need for manual configuration.

Orchestration framework enables mapping of DSCP at Layer 3 and IEEE 802.1p at Layer 2 to traffic classes, which can then be linked to queues and priority groups. A key feature is Priority Flow Control (PFC), allowing users to define PFC-enabled queues for lossless traffic management. Additionally, a PFC watchdog can monitor functionality and initiate recovery actions if needed. The framework also supports ECN and various scheduling options, such as DWRR, WRR, and Strict Priority Scheduling for dynamic traffic management.

With AI Fabric visibility, administrators gain real-time insights into workload performance and resource utilization, facilitating proactive management. Detailed analytics help monitor trends, identify bottlenecks, and inform future capacity planning.

GPU and NIC Visibility

ONES 3.0 introduces a standout feature that enhances network performance by providing advanced visibility into GPU server metrics. The ONES agent on the server enables real-time monitoring of key metrics across network interfaces, GPUs, CPUs, and system-wide parameters, once integrated with the ONES platform. It supports a wide range of hardware vendors and configurations, ensuring adaptability and comprehensive monitoring. This capability is particularly valuable for tracking real-time server data and accommodating complex AI/ML workloads, ensuring that your network can handle even the most demanding computational tasks efficiently.
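
To give a feel for the kind of data involved, a reported sample might resemble the sketch below. The field names are illustrative assumptions rather than the actual ONES agent format.

```yaml
# Hypothetical metrics sample; field names are assumptions, not the actual ONES agent format.
host: gpu-server-07
gpus:
  - id: 0
    utilization_pct: 92
    memory_used_mib: 70120
    temperature_c: 71
nics:
  - name: ens1f0
    rx_bps: 185000000000          # example values on a 200G NIC
    tx_bps: 172000000000
    pfc_pause_frames_rx: 12
cpu:
  utilization_pct: 34
```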

ServiceNow Integration

Experience the powerful ONES Rule Engine and Alerts system, now integrated with ServiceNow ticketing. The ONES anomaly detection engine automatically reports issues, streamlining incident management. This integration connects ONES with your existing IT service management infrastructure, enhancing change control and overall network operations. The seamless integration simplifies the maintenance and optimization of your network environment.

Support Enhancements

ONES now offers enhanced customer support through a single-pane access to the tech support page and syslog, providing comprehensive support resources and troubleshooting tools. The Tech Support feature allows for the efficient collection of system information, logs, configuration data, core dumps, and other critical data needed to diagnose and resolve issues. A new enhancement enables users to filter messages based on severity levels, such as errors, warnings, or all messages. This feature helps operators quickly identify and prioritize critical alerts, streamlining the troubleshooting process and improving overall operational efficiency.

A New Era of Network Management

ONES 3.0 features a suite of innovative functionalities and an enhanced user interface. This release revolutionizes network orchestration and management, providing the tools and capabilities needed to stay ahead in an increasingly complex network landscape.

To explore the full potential of ONES 3.0 and discover how it can transform your network operations, visit us at Aviz Networks. Embark on your journey toward seamless network monitoring and orchestration today.


Why SONiC is Ready Not Just for Hyperscalers

When you think of SONiC (Software for Open Networking in the Cloud), it’s often associated with hyperscalers—the giants in tech like Google and Microsoft that demand unparalleled scalability and customization in their network infrastructure. But what if I told you that SONiC is no longer just for hyperscalers? What if I told you that enterprises—yes, Fortune 500 companies, mid-sized businesses, even finance and telecom industries—are now tapping into the power of SONiC to transform their networks?

At Aviz, we’re witnessing this shift firsthand. We recently met with our partners and customers, who shared how they’ve successfully adopted SONiC to enhance their network operations. The results are clear—SONiC’s open-source architecture is providing flexibility, scalability, and significant cost savings across industries. Discover more about this transformation by watching our panel discussion in a video interview hosted by SDxCentral, featuring our customers Techevolution and 1984, along with our partner EPS Global.

Flexibility Through Choices

One of SONiC’s strongest suits is its ability to give businesses choices—choices in hardware, in vendors, in deployment models. Unlike traditional, proprietary solutions that lock you into a particular vendor or set of hardware, SONiC opens the door to a wide array of options. Whether you prefer Cisco, Arista, or NVIDIA, SONiC supports them all, ensuring that your network infrastructure is as flexible and adaptable as your business needs it to be.

With SONiC, you aren’t tied to a single vendor. This freedom encourages innovation and fosters a competitive ecosystem, where businesses can pick and choose the best components to suit their specific needs.

Control Over Your Network

Another key factor that sets Aviz apart is the level of control it offers. Enterprises can manage SONiC at the source code level, which means they can choose the hardware they want at any time. At Aviz, we normalize metrics from your fabric, including ASICs and operating systems, to achieve multivendor observability, giving you the control to deploy any switch you want without worrying about how it will impact your NetOps layer. But what truly makes it enterprise-ready is the end-to-end support stack provided by Aviz. Our platform ensures that customers gain full operating control, whether they prefer a traditional Cisco-like CLI, REST APIs, or even their own in-house controllers. And we understand that enterprises need more than just flexibility—they need an experience that’s simple to use and supported 24/7.

That’s why we’ve developed Aviz Easy Deploy, Monitor, and Support, a plug-and-play solution that brings SONiC to enterprise environments with ease and without the hassle of vendor lock-in.

The Cost Savings You’ve Been Looking For

Cost has always been a critical factor in any IT decision, and SONiC shines here. By moving away from expensive, proprietary solutions and embracing the open-source model, companies can dramatically reduce both capital and operational expenditures. In fact, using SONiC as a foundation, we’ve helped businesses cut costs by half compared to traditional solutions.

A Future-Proof Solution

Perhaps the most exciting part of SONiC’s evolution is that it is on the path to becoming the Linux of networking. Much like how Linux started with a small group of adopters and evolved into a mainstream operating system, SONiC is following a similar trajectory. From a select few hyperscalers to widespread adoption across industries, SONiC is proving it’s not only scalable but also future-proof.

At Aviz, we’re proud to be leading this transformation alongside our partners, customers, and the open-source community. With our support stack, we’ve perfected the recipe for deploying, managing, and scaling SONiC in enterprise environments, offering the same robust experience you’d expect from legacy OEMs.

Ready for the Enterprise

In short, SONiC is no longer just for hyperscalers. It’s ready for enterprises of all sizes, offering flexibility, control, and cost savings that simply can’t be matched by traditional solutions. And with Aviz Easy Deploy, Monitor, and Support, we’ve made SONiC accessible to virtually any organization, making it the go-to choice for network infrastructure transformation.

As more Fortune 500 companies embrace SONiC, we’re confident that the rest of the industry will soon follow suit. So if you’re looking for a scalable, cost-effective, and flexible network solution, it’s time to look at SONiC—not just as the future of networking, but as the solution that’s ready for your enterprise today.


The Power of Choice in Networking: How The AI Stack Breaks Down Barriers

A lot of people ask me, “What are the problems that you are solving for customers?” At Aviz, we understand that modern networking demands more than just connectivity; it requires agile, scalable solutions that can adapt to the evolving demands of AI-driven environments. We’re tackling the challenges of complexity, vendor lock-in, and prohibitive costs that many face in traditional network setups. Our AI Networking Stack isn’t just about keeping your network running; it’s about advancing it to think, predict, and operate more efficiently.

At Aviz, we are reshaping networks for the AI era by pioneering both ‘Networks for AI’ and ‘AI for Networks’. Our AI Networking Stack offers unparalleled choice, control, and cost savings, designed to enhance orchestration, observability, and real-time alerts in a vendor-agnostic environment. We’re not just providing solutions; we’re transforming networks with advanced LLM-based learning for critical operations, ensuring powerful, open-source solutions that drive innovation at a fraction of the cost.

We lead the journey to redefine networking with a data-centric approach that seamlessly integrates with any switch and network operating system, delivering performance that rivals the top OEM solutions—all while focusing on the core pillars of choice, control, and cost-effectiveness.

So, if you value having choices, staying in control, and achieving cost savings, read on to discover how our innovative solutions can transform your network management experience.

Now, let’s take a closer look at what sets our technology apart. Here is the detailed overview of our AI Networking Stack:

We’ve meticulously developed each layer of our AI Networking Stack to address the unique challenges our customers face in today’s dynamic network environments. From foundational hardware choices to advanced AI-driven functionalities, let’s dive into the specifics of what makes our technology stand out in the industry.

First, let’s discuss why choosing a vendor-agnostic approach is so crucial. Imagine using Linux. Does it really matter whether you run it on HPE, Dell, or Lenovo servers, or even on AWS, Azure, or GCP? That’s the kind of interoperability we bring to the networking world. Similar to what Linux did for the tech industry, we leverage the open-source SONiC operating system, enhanced by strong partnerships and robust community support. This approach offers an array of choices, enabling hardware selections from our partner ecosystem without any constraints.

At the heart of our innovative lineup is ONES (Open Networking Enterprise Suite), which empowers you with real-time visibility, seamless orchestration, advanced anomaly detection, and AI fabric functionality, including RoCE, across multiple vendors.
This means you have full control over which hardware solutions you implement, supported by our dedicated 24/7 customer service. ONES is designed to give you the freedom to manage your network without vendor lock-in, ensuring flexibility and control in your hands. Another critical aspect of our strategy to ensure cost efficiency is the Open Packet Broker (OPB).
Built on the powerful, community-driven SONiC platform, the OPB mirrors the capabilities of traditional Network Packet Brokers (NPB) but at a fraction of the cost. This solution delivers all the traditional functionalities you expect but optimizes them to offer significant cost savings without sacrificing performance or scalability.
Sitting atop our stack is the GenAI-based Network Copilot, your AI-powered assistant that simplifies all aspects of network management—from routine upgrades and audit reports to complex troubleshooting tasks. This tool is designed to enhance your operational efficiency, dramatically reducing the time and effort required for network management tasks, thereby freeing up your team to focus on strategic initiatives that drive business growth.

Our AI Networking Stack is designed to be the backbone of future network management, integrating advanced AI to navigate the complexities of modern networking with sophistication and ease. Opting for a vendor-agnostic approach provides the flexibility to choose the best technologies at the most effective prices, ensuring your network remains robust, scalable, and primed for future technological advancements.

Explore the benefits of a networking solution that brings choices, control, and cost savings without the constraints of traditional vendor lock-ins. This approach is not just about adopting new technology—it’s about advancing with a platform that understands and adapts to the evolving needs of your enterprise.
