

Join Us at NVIDIA GTC: A Must-Watch Panel on AI in Networking!

AI is transforming every industry—including networking. As AI workloads scale, the infrastructure powering them must evolve. Networks must become smarter, faster, and more efficient to support the next wave of AI-driven innovation. At Aviz Networks, we believe in Networks for AI and AI for Networks—and we’re making it happen.

That’s why I’m thrilled to invite you to an exclusive panel at NVIDIA GTC, where we’ll explore NVIDIA’s role in transforming networking across these two dimensions and how Aviz Networks complements this ecosystem with innovative products.

Panel: Network Modernization in the Age of AI

AI-driven workloads demand a new era of networking—one that redefines how we design, deploy, and optimize infrastructure for peak performance. Our discussion will cover:

Why This Matters

AI is no longer just an application—it’s the backbone of modern enterprise infrastructure. But here’s the challenge: AI workloads are hungry for bandwidth, require extreme precision, and demand real-time optimization. Aviz Networks and NVIDIA are solving this with cutting-edge AI networking innovations.

Don’t just read about AI in networking—experience it. Watch our exclusive demo and join the panel discussion at NVIDIA GTC!


Transforming AI Fabric with ONES: Enhanced Observability for GPU Performance

Explore the latest in AI network management with our ONES 3.0 series

Future of Intelligent Networking for AI Fabric Optimization

If you’re operating a high-performance data center or managing AI/ML workloads, ONES 3.0 offers advanced features that ensure your network remains optimized and congestion-free, with lossless data transmission as a core priority.

In today’s fast-paced, AI-driven world, network infrastructure must evolve to meet the growing demands of high-performance computing, real-time data processing, and seamless communication. As organizations build increasingly complex AI models, the need for low-latency, lossless data transmission, and sophisticated scheduling of network traffic has become crucial. ONES 3.0 is designed to address these requirements by offering cutting-edge tools for managing AI fabrics with precision and scalability.

Building on the solid foundation laid by ONES 2.0, where RoCE (RDMA over Converged Ethernet) support enabled lossless communication and enhanced proactive congestion management, ONES 3.0 takes these capabilities to the next level. We’ve further improved RoCE features with the introduction of PFC Watchdog (PFCWD) for enhanced fault tolerance, Scheduler for optimized traffic handling, and WRED for intelligent queue management, ensuring that AI workloads remain highly efficient and resilient, even in the most demanding environments.

Why RoCE is Critical for Building AI Models

As the next generation of AI models requires vast amounts of data to be transferred quickly and reliably across nodes, RoCE becomes an indispensable technology. By enabling remote direct memory access (RDMA) over Ethernet, RoCE facilitates low-latency, high-throughput, and lossless data transmission—all critical elements in building and training modern AI models.

In AI workloads, scheduling data packets effectively ensures that model training is not delayed due to network congestion or packet loss. RoCE’s ability to prioritize traffic and ensure lossless data movement allows AI models to operate at optimal speeds, making it a perfect fit for today’s AI infrastructures. Whether it’s transferring large datasets between GPU clusters or ensuring smooth communication between nodes in a distributed AI system, RoCE ensures that critical data flows seamlessly without compromising performance.

Enhancing RoCE Capabilities from ONES 2.0 to ONES 3.0

In ONES 3.0, we’ve taken RoCE management even further, enhancing the ability to monitor and optimize Priority Flow Control (PFC) and ensuring lossless RDMA traffic under heavy network loads. The new PFC Watchdog (PFCWD) ensures that any misconfiguration or failure in flow control is detected and addressed in real time, preventing traffic stalls or congestion collapse in AI-driven environments.

Additionally, ONES 3.0’s Scheduler allows for more sophisticated data packet scheduling, ensuring that AI tasks are executed with precision and efficiency. Combined with WRED (Weighted Random Early Detection), which intelligently manages queue drops to prevent buffer overflow in congested networks, ONES 3.0 provides a holistic solution for RoCE-enabled AI fabrics.
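The WRED behaviour described here can be pictured as a drop-probability ramp between two queue thresholds. The sketch below is an illustrative model, not ONES or SONiC source code, and the threshold and probability values are hypothetical:

```python
def wred_drop_probability(avg_queue_len: float, min_th: float, max_th: float, max_p: float) -> float:
    """WRED's linear drop-probability ramp between its two thresholds."""
    if avg_queue_len <= min_th:
        return 0.0                      # below min threshold: never drop
    if avg_queue_len >= max_th:
        return 1.0                      # above max threshold: tail-drop region
    # in between: probability rises linearly from 0 up to max_p
    return max_p * (avg_queue_len - min_th) / (max_th - min_th)

# Example: with thresholds of 1000/2000 cells and max_p of 10%, a queue
# averaging 1500 cells drops roughly 5% of arriving packets.
p = wred_drop_probability(1500, min_th=1000, max_th=2000, max_p=0.1)
```

Dropping (or ECN-marking) a small fraction of packets early is what keeps the queue from overflowing and triggering far more disruptive bulk tail drops.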

The Importance of QoS and RoCE in AI Networks

Quality of Service (QoS) and RoCE are pivotal in ensuring that AI networks can handle the rigorous demands of real-time processing and massive data exchanges without performance degradation. In environments where AI workloads must process large amounts of data between nodes, QoS ensures that critical tasks receive the required bandwidth, while RoCE ensures that this data is transmitted with minimal latency and no packet loss.

With AI workloads demanding real-time responsiveness, any network inefficiency or congestion can slow down AI model training, leading to delays and sub-optimal performance. The advanced QoS mechanisms in ONES 3.0, combined with enhanced RoCE features, provide the necessary tools to prioritize traffic, monitor congestion, and optimize the network for the low-latency, high-reliability communication that AI models depend on.

In ONES 3.0, QoS features such as DSCP mapping, WRED, and scheduling profiles allow customers to:

By leveraging QoS in combination with RoCE, ONES 3.0 creates an optimized environment for AI networks, allowing customers to confidently build and train next-generation AI models without worrying about data bottlenecks.
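To make the mapping chain these QoS features manage concrete, the sketch below resolves a packet's DSCP marking to a traffic class and then to an egress queue. The specific map values are invented examples, not a shipped ONES profile:

```python
# Hypothetical example maps; real deployments define these per fabric.
DSCP_TO_TC = {46: 5, 26: 3, 0: 0}   # e.g. EF -> TC 5, AF31 -> TC 3, best effort -> TC 0
TC_TO_QUEUE = {5: 5, 3: 3, 0: 0}    # each traffic class lands in an egress queue

def queue_for_dscp(dscp: int) -> int:
    """Follow the DSCP -> traffic class -> queue chain for one packet."""
    tc = DSCP_TO_TC.get(dscp, 0)    # unmapped DSCP values fall back to TC 0
    return TC_TO_QUEUE[tc]
```

Once traffic lands in a known queue, PFC, WRED, and the scheduler can all be applied per queue, which is what makes the classification step the foundation of the rest of the QoS pipeline.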

1. Comprehensive Interface and Performance Metrics

The UI showcases essential network performance indicators such as In/Out packet rates, errors, and discards, all displayed in real time. These metrics give customers the ability to:
By having access to real-time and historical data, customers can make data-driven decisions to optimize network performance without sacrificing the quality of their AI workloads.

2. RoCE Config Visualization

RoCE (RDMA over Converged Ethernet) is a key technology used to achieve high-throughput and low-latency communication, especially when training AI models, where data packets must flow without loss. In ONES 3.0, the RoCE tab within the UI offers full transparency into how data traffic is managed:

3. Visual Traffic Monitoring: A Data-Driven Experience

The UI doesn’t just give you raw data—it helps you visualize it. With multiple graphing options and real-time statistics, customers can easily monitor:

4. Flexible Time-Based Monitoring and Analysis

Customers have the option to track metrics over various time periods, from live updates (1 hour) to historical views (12 hours, 2 weeks, etc.). This flexibility allows customers to:
This feature is especially valuable for customers running AI workloads, where consistent performance over extended periods is vital for the accuracy and efficiency of model training.

Centralized QoS View

ONES 3.0 offers a unified interface for all QoS configurations, including DSCP to TC mappings, WRED/ECN, and scheduler profiles, making traffic management simpler for network admins.
This page provides administrators with comprehensive insights into how traffic flows through the network, allowing them to fine-tune and optimize their configurations to meet the unique demands of modern workloads.
Fig 1 – QoS Profile List

Comprehensive Topology View

ONES offers a comprehensive, interactive map of network devices and their connectivity, ideal for monitoring AI/ML and RoCE environments. It provides an actionable overview that simplifies network management.
Fig 2 – AI-ML Topology View
Key features include:

Overall, the Topology Page in ONES enhances network observability and control, making it easier to optimize performance, troubleshoot issues, and ensure the smooth operation of AI/ML and RoCE workloads.

Proactive Monitoring and Alerts with the Enhanced ONES Rule Engine

The ONES Rule Engine has been a standout feature in previous releases, providing robust monitoring and alerting capabilities for network administrators. With the latest update, we’ve enhanced the usability and functionality, making rule creation and alert configuration even smoother and more intuitive. Whether monitoring RoCE metrics or AI-Fabric performance counters, administrators can now set up alerts with greater precision and ease. This new streamlined experience allows for better anomaly detection, helping prevent network congestion and data loss before they impact performance.

The ONES Rule Engine offers cutting-edge capabilities for proactive network management, enabling real-time anomaly detection and alerting. It provides deep visibility into AI-Fabric metrics like queue counters, PFC events, packet rates, and link failures, ensuring smooth performance for RoCE-based applications. By allowing users to set custom thresholds and conditions for congestion detection, the Rule Engine ensures that network administrators can swiftly address potential bottlenecks before they escalate.

With integrated alerting systems such as Slack and Zendesk, administrators can respond instantly to network anomalies. The ONES Rule Engine’s automation streamlines monitoring and troubleshooting, helping prevent data loss and maintain optimal network conditions, ultimately enhancing the overall network efficiency.
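The threshold-and-alert flow can be sketched in a few lines. The rule shape, metric names, and thresholds below are assumptions for illustration; the real Rule Engine is configured through the ONES UI:

```python
def evaluate_rules(metrics: dict, rules: list) -> list:
    """Return an alert message for every rule whose threshold is breached."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule["metric"])
        if value is not None and value > rule["threshold"]:
            alerts.append(f"{rule['name']}: {rule['metric']}={value} exceeds {rule['threshold']}")
    return alerts

# Hypothetical rules and one polling cycle's metrics snapshot.
rules = [
    {"name": "PFC storm", "metric": "pfc_pause_frames_per_sec", "threshold": 10_000},
    {"name": "Queue drops", "metric": "queue3_dropped_packets", "threshold": 0},
]
metrics = {"pfc_pause_frames_per_sec": 25_000, "queue3_dropped_packets": 0}
alerts = evaluate_rules(metrics, rules)   # only the PFC storm rule fires here
```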

Conclusion

In an era where AI and machine learning are driving transformative innovations, the need for a robust and efficient network infrastructure has never been more critical. ONES 3.0 ensures that AI workloads can operate seamlessly, with minimal latency and no packet loss.

FAQs

1. Why is RoCE critical for AI infrastructure and model training?

RoCE (RDMA over Converged Ethernet) is essential for AI because it enables:

  • Low-latency, high-throughput data transfers between GPU nodes
  • Lossless communication, vital for real-time model training
  • Efficient memory access without CPU involvement
    This makes RoCE a foundational technology for building and scaling AI/ML workloads.

 ONES 3.0 advances RoCE integration through:

  • PFC Watchdog (PFCWD) for monitoring and recovering from flow control issues
  • Advanced scheduling tools (DWRR, WRR, STRICT) to manage packet priorities
  • WRED-based queue management to prevent buffer overflows

These features ensure network reliability, even under high AI traffic loads.

 Quality of Service (QoS) is crucial for prioritizing AI tasks. ONES 3.0 includes:

  • DSCP and dot1p mapping for accurate traffic classification
  • Priority queue configuration to handle mission-critical packets
  • Real-time congestion alerts and traffic shaping for lossless AI data transmission

Together, these ensure uninterrupted, high-performance AI workloads.

 The ONES UI provides deep visibility into:

  • DSCP and 802.1p mapping to queues and priority groups
  • WRED and PFC stats for congestion handling
  • Scheduler profiles and queue usage across switches

This empowers network admins to proactively tune RoCE traffic and avoid disruptions.

The enhanced ONES Rule Engine enables proactive, automated management through:

  • Custom alert rules for RoCE, queue drops, and link failures
  • Slack/Zendesk integration for instant anomaly notifications
  • Granular threshold settings to prevent issues before they affect AI training
    It turns ONES into an intelligent observability and incident response system.


Global Reach, Local Insight: ONES 3.0 Delivers Seamless Data Center Management


ONES 3.0 introduces a range of exciting new features, with a focus on scaling data center deployments and support. In this blog post, we’ll dive into two standout features: ONES Multisite, a scalable solution for global data center deployments, and enhanced support for SONiC through tech support, ServiceNow integration, and syslog message filtering. Let’s explore how these innovations can benefit your operations.

ONES Multisite

The ONES rule engine enables incident detection and alert generation, but this data is limited to the specific site managed by each controller. While site data center administrators can use this information to address and resolve issues, enterprise-level administrators or executives seeking an overview of all data centers’ health must access each ONES instance individually, which can be inefficient.

To address this challenge, we introduce ONES Multisite—an application that provides a geospatial overview of anomalies across geographically distributed sites, offering a comprehensive view of the entire network’s health.

ONES instances in different data centers (DCs) around the globe can register with a central multisite application. Upon successful registration, the multisite system periodically polls each site for data related to the number of managed devices (endpoints) and the number of critical alerts. This information is displayed on a map view, showing individual sites, their health status, and last contact times. ONES Multisite also allows users to log in to individual data centers for more detailed information if needed.
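The polling-and-status model reads roughly like the sketch below. The data shape and the red/green health rule are assumptions for illustration; the transport (`fetch`) stands in for the HTTP call the multisite application makes to each ONES instance:

```python
from dataclasses import dataclass

@dataclass
class SiteStatus:
    """One registered ONES site as seen by the multisite application."""
    name: str
    lat: float
    lon: float
    endpoints: int          # number of managed devices the site reported
    critical_alerts: int    # count of critical alerts at the last poll

    @property
    def health(self) -> str:
        # hypothetical colour rule: any critical alert turns the site red
        return "red" if self.critical_alerts > 0 else "green"

def poll_sites(fetch, sites):
    """Poll every registered site; `fetch(site)` is the injected API call."""
    return [fetch(s) for s in sites]
```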

Fig 1 – ONES Multisite showing DCs across the globe
To provide a quick overview of the health conditions at various sites, different colors and blinking patterns are used.

Registering ONES instance with Multisite application

A simple user interface is provided for registering the ONES application with the multisite, requiring inputs such as the site name, multisite IP, and geographical coordinates (latitude and longitude, in N and E). By default, the current location coordinates of the site are auto-populated, but they can be overridden if necessary. The License page of the ONES application displays the registration status with the multisite application.
Fig 2 – Multisite Registration Window

Once registered, the multisite application will regularly gather data from each site regarding the number of managed devices (endpoints) and the count of critical alerts.

ONES Multisite streamlines the monitoring process across multiple data centers, enabling enterprise-level administrators to easily access vital information and maintain a holistic view of their network’s health. This enhanced visibility not only improves operational efficiency but also empowers teams to respond more effectively to incidents, ensuring optimal performance across all locations.

Enhanced support for SONiC using ONES 3.0

Tech support feature

The SONiC Tech Support feature provides a comprehensive method for collecting system information, logs, configuration data, core dumps, and other relevant information essential for identifying and resolving issues. The ONES 3.0 Tech Support feature offers an easier way to download the tech support dump from any managed switch. Users simply select a switch and click the Tech Support option. The ONES controller connects to the switch, executes the tech support command, and notifies the user when the download file is ready. This allows data center administrators to easily retrieve tech support data without the cumbersome process of logging into each switch, executing the command, and downloading the file.
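Under the hood this amounts to running SONiC's `show techsupport` command and fetching the archive it produces. The sketch below assumes an injected `run_remote` transport (in practice ONES manages the connection for you) and simply locates the dump path in the command output:

```python
def collect_techsupport(run_remote) -> str:
    """Run `show techsupport` on a switch and return the dump archive path."""
    output = run_remote("show techsupport")
    # SONiC prints the path of the generated .tar.gz dump; take the last
    # line of output that looks like one.
    for line in reversed(output.splitlines()):
        if line.strip().endswith(".tar.gz"):
            return line.strip()
    raise RuntimeError("tech support dump path not found in command output")
```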

Fig 3 – ONES Tech Support page

Filtering of syslog messages

The Syslog feature empowers data center operators to easily view and download syslog messages from any of the managed switches through the ONES UI. This functionality is essential for monitoring system performance and diagnosing issues.

To enhance this feature, we’ve introduced the ability to filter messages by severity level, such as error or warning, or to view all messages. This capability enables operators to quickly identify and prioritize critical alerts, streamlining the troubleshooting process and improving overall operational efficiency. By focusing on the most relevant messages, data center teams can respond more effectively to potential issues, ensuring a more reliable and robust network environment.
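Conceptually, the filter keeps messages at or above a chosen severity. The sketch below assumes a simplified `<timestamp> <severity> <message>` line layout, which is an illustration rather than the exact SONiC syslog format:

```python
# Standard syslog severities, most severe first.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def filter_syslog(lines, max_level="warning"):
    """Keep lines whose severity is at least as severe as max_level."""
    cutoff = SEVERITIES.index(max_level)
    kept = []
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[1] in SEVERITIES and SEVERITIES.index(parts[1]) <= cutoff:
            kept.append(line)
    return kept

# Hypothetical sample messages.
logs = [
    "Oct01T10:00:00 err bgp: neighbor 10.0.0.1 down",
    "Oct01T10:00:01 info syncd: port counters refreshed",
    "Oct01T10:00:02 warning orchagent: queue 3 near threshold",
]
```

With `max_level="warning"`, the `info` line is filtered out and only the error and warning messages remain.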

Fig 4 – Syslog messages with filter applied

ServiceNow Integration

ServiceNow is a cloud-based platform widely used for IT Service Management, automating business processes, and Enterprise Service Management. One of its core components is the ServiceNow ticketing system, specifically the Incident Management feature. When a user encounters a disruption in any IT service, it is reported as an incident on the platform and assigned to the responsible user or group for resolution.

The ONES Rule Engine proactively monitors the data center for potential disruptive events by creating alerts for any breaches of user-configured thresholds. It tracks various factors, such as sudden surges in CPU usage, heavy traffic bursts, and component failures (e.g., PSU, FAN).

ONES 3.0 enhances this functionality by integrating ServiceNow ticketing with the ONES Rule Engine and Alerts Engine. This integration allows ONES to automatically log tickets in the ServiceNow platform whenever any ONES rule conditions are met.
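ServiceNow's Table API exposes incident creation as a POST to `/api/now/table/incident`. The sketch below shows roughly how an alert could be turned into such a request; the instance name, field choices, and alert shape are assumptions for illustration, and ONES performs the equivalent call automatically when a rule fires:

```python
def build_incident_request(instance: str, alert: dict):
    """Build the (url, payload) for a ServiceNow incident-creation POST."""
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    payload = {
        "short_description": f"[ONES] {alert['rule']} on {alert['device']}",
        "description": alert.get("detail", ""),
        "urgency": "1" if alert.get("severity") == "critical" else "2",
    }
    return url, payload

def report_alert(post, instance: str, alert: dict):
    """`post` is an injected HTTP client call (e.g. requests.post with auth)."""
    url, payload = build_incident_request(instance, alert)
    return post(url, json=payload)
```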

Fig 5 – Rule creation page with ServiceNow integrated
Fig 6 – ServiceNow platform with ONES tickets
In summary, ONES 3.0 brings significant advancements that cater to the evolving needs of data center management.

To unlock the full potential of ONES 3.0 and see how it can revolutionize your network operations, book your demo today.

FAQs

1. What is ONES Multisite and how does it improve global data center monitoring?

ONES Multisite provides a centralized geospatial view of data center health across global sites, allowing enterprise administrators to monitor critical alerts and device statuses from a single interface, drastically improving visibility and incident response times.

2. How does ONES 3.0 integrate with ServiceNow?

ONES 3.0 connects its built-in Rule and Alerts Engine with ServiceNow to automatically generate tickets for anomalies like CPU surges, component failures, or bandwidth spikes—ensuring streamlined IT service workflows and faster resolution times.

3. Does ONES 3.0 simplify tech support collection on SONiC switches?

Yes, ONES 3.0 introduces a simplified “Tech Support” feature that lets users download diagnostic logs from any managed SONiC switch with one click, eliminating the need for manual CLI access across devices.

4. How does syslog filtering in ONES 3.0 speed up troubleshooting?

With advanced severity-level filtering (e.g., error, warning, info), ONES 3.0 helps operators quickly pinpoint critical syslog alerts from SONiC switches—accelerating root cause analysis and operational troubleshooting.

5. What makes ONES 3.0 well suited to managing modern data centers?

ONES 3.0 delivers single-pane visibility, ServiceNow integration, multisite scalability, and simplified support tools—making it the ideal centralized platform for managing complex, AI-powered, multi-vendor data center environments.


AI Fabric Orchestration: Supercharging AI Networks with SONiC NOS


As the demand for high-performance parallel processing surges in the AI era, GPU clusters have become the heart of data-intensive workloads. But it’s not just about the GPUs themselves—intercommunication between GPU servers is the backbone of their overall performance. Enter the network switch fabric, which is pivotal in overcoming communication bottlenecks and ensuring seamless data flow between GPU servers. Technologies like RoCE (RDMA over Converged Ethernet) allow massive chunks of data to move efficiently between servers, but ensuring that these critical data streams remain lossless and uncongested requires a powerful solution.

That’s where SONiC’s QoS (Quality of Service) features come into play. SONiC enables you to prioritize critical data traffic, ensuring that high-priority packets are transferred ahead of other traffic and that important data is not lost. Using SONiC’s robust QoS capabilities and ONES 3.0’s orchestration, you can turn your switch fabric into a lossless, priority-driven highway for GPU server communications.

Let’s explore how you can achieve this with SONiC via the ONES 3.0 Fabric Manager orchestration tool.

Lossless And Prioritized Data Flow

Any packet entering the fabric with a DSCP/dot1p marking can be mapped to any queue on the interface, and enabling PFC on that queue makes it lossless. With PFC in place, when congestion is detected in the queue, a pause frame is sent back to the sender, signaling it to temporarily halt traffic of that priority. This mechanism effectively prevents packet drops, ensuring lossless transmission for traffic of a particular priority.
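A toy model of this pause behaviour looks like the sketch below. The XOFF/XON thresholds are hypothetical numbers; in real switches they are buffer watermarks, and the pause itself is a priority-specific Ethernet frame:

```python
class LosslessQueue:
    """Minimal model of one PFC-enabled priority queue."""

    def __init__(self, xoff: int = 100, xon: int = 60):
        self.depth = 0
        self.xoff, self.xon = xoff, xon
        self.paused = False          # True while a pause is asserted upstream

    def enqueue(self, pkts: int) -> bool:
        self.depth += pkts
        if self.depth >= self.xoff:
            self.paused = True       # signal the sender to stop: no packets dropped
        return self.paused

    def dequeue(self, pkts: int) -> bool:
        self.depth = max(0, self.depth - pkts)
        if self.paused and self.depth <= self.xon:
            self.paused = False      # queue has drained: let traffic resume
        return self.paused
```

The XON threshold sits below XOFF so the pause is not released until the queue has genuinely drained, which avoids rapid pause/resume flapping.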

Beyond PFC, there’s another layer of congestion management: Explicit Congestion Notification (ECN). With ECN, we can define buffer thresholds beyond which congestion notification (ECN-CNP) packets are sent to the sender, prompting it to reduce its transmission rate and proactively avoid congestion.

At this stage, we’ve ensured that our priority traffic is lossless. Moving into the egress phase, we can further enhance performance by prioritizing this traffic over others, even under congestion. SONiC provides scheduling algorithms like Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), and Strict Priority Scheduling (STRICT). By binding priority queues to these schedulers, the system can ensure that higher-priority traffic is transmitted preferentially, either in a weighted manner (for WRR/DWRR) or with absolute priority (for STRICT).

In summary, through PFC, ECN, and advanced scheduling techniques, SONiC ensures that high-priority traffic from GPU servers is not only lossless but also prioritized during both congestion and egress phases.
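As a rough illustration of how DWRR arbitrates among queues, the sketch below gives each queue per-round credit proportional to its weight and transmits while credit lasts. The packet sizes, weights, and quantum are made-up example values:

```python
from collections import deque

def dwrr_service_order(queues, weights, quantum=100):
    """queues: deques of packet sizes; returns the order queues are served in."""
    deficits = [0] * len(queues)
    order = []
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                continue
            deficits[i] += quantum * weights[i] // 100   # credit for this round
            while q and q[0] <= deficits[i]:             # transmit while credit lasts
                deficits[i] -= q.popleft()
                order.append(i)
    return order

# Queue 0 (weight 60) is serviced more often than queue 1 (weight 40).
order = dwrr_service_order([deque([100, 100]), deque([100])], [60, 40])
```

With STRICT scheduling there is no weighting at all: the highest-priority queue is always drained first, which is why it suits small, critical flows such as congestion notification packets.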

Simplifying Complex QoS Configurations with ONES Orchestration

Configuring SONiC’s complex QoS features may sound daunting, but with ONES 3.0’s seamless orchestration, it’s a breeze. ONES allows you to set up essential QoS configurations like DSCP to traffic-class mapping, PFC, ECN thresholds, and even scheduler types—all with a few lines in a YAML template. Here’s a snapshot of the YAML template showcasing how ONES orchestrates SONiC QoS (QoS is the section in YAML below)

Fig 1 – ONES UI AI Fabric Orchestration YAML Template

The Fabric Manager automates the creation and assignment of QoS profiles, saving administrators from manually configuring multiple aspects. Here’s how it works:

Mapping Traffic Classes and Queues

Orchestration begins by mapping traffic into appropriate classes and queues. ONES 3.0 Orchestration allows you to specify mapping values from DSCP (Layer 3) and dot1p (Layer 2) to traffic classes, traffic classes to queues, and traffic classes to priority groups (PGs). From these mapping values, profiles are created with standard names (DOT1P_TC_PROFILE, TC_QUEUE_PROFILE, TC_PG_PROFILE, DSCP_TC_PROFILE) and are bound to the interfaces that are part of the orchestration. This configuration ensures that each type of traffic is routed to its appropriate queue and handled correctly.

For example, we can specify mapping values in the YAML as in the image above, and the Fabric Manager will create the corresponding profiles and bind them to the interface as below:

Priority Flow Control (PFC) and Explicit Congestion Notification (ECN)

The next critical part of QoS orchestration involves Priority Flow Control (PFC), where the ONES YAML allows users to define the queues that should be PFC-enabled. Moreover, a PFC Watchdog can be configured, with detection and restoration times and the action to take in case of malfunction, to ensure that PFC keeps functioning correctly.

ECN configuration parameters can be provided in the YAML template; from these, the ONES Fabric Manager creates a WRED_PROFILE and attaches it to all PFC-enabled queues on every interface that is part of the orchestration.

Here’s an example of how this would be configured on the interface for the YAML input in the above image.

This approach ensures that your network proactively manages congestion and minimizes packet drops for high-priority traffic.

Advanced Scheduling for Optimized Egress

Finally, Scheduling plays a vital role in controlling how packets are forwarded from queues. Orchestration allows administrators to choose between scheduling mechanisms such as Deficit Weighted Round Robin (DWRR), Weighted Round Robin (WRR), or STRICT priority scheduling, depending on their needs.

In the case of DWRR or WRR, weights can be assigned to each queue, influencing how often a queue is serviced relative to others. Upon specifying these parameters in the YAML, ONES-FM creates one scheduler policy (SCHEDULER.<weight>) per unique weight assigned to the queues and attaches these policies to the queues according to their weights, for all interfaces that are part of the orchestration.

For instance, in the YAML input shown in the image below, two unique weights, 60 and 40, are assigned to queues 3 and 4 respectively. So, two scheduler policies, SCHEDULER.60 and SCHEDULER.40, are created and bound to interface queues 3 and 4 respectively.
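That naming behaviour can be sketched as a small helper. The input shape (a `{queue: weight}` mapping mirroring the YAML) is an assumption for illustration:

```python
def build_scheduler_bindings(queue_weights: dict) -> dict:
    """Create one SCHEDULER.<weight> policy per unique weight and bind it
    to every queue carrying that weight."""
    policies = {w: f"SCHEDULER.{w}" for w in set(queue_weights.values())}
    return {queue: policies[w] for queue, w in queue_weights.items()}
```

Note that deriving policies from the set of unique weights means two queues sharing a weight would share one scheduler policy rather than getting duplicates.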

Now a question arises: what if all the queues are congested? How do the congestion notification packets traverse the network to reach the sender and tell it to stop or slow down the incoming traffic?

ONES-FM provides an option to designate a specific queue for ECN-CNP (Explicit Congestion Notification) packets, using STRICT scheduling. This ensures that even when the network is heavily congested there is always room left for the congestion notification packets, preventing further blockages. The cnp_queue field under the ECN section in the image above represents this, and it is orchestrated by ONES-FM as below:

Flexible, Day-2 Support for QoS Management

One of the standout features of ONES-FM 3.0 is its support for Day-2 operations. As your network evolves and traffic patterns change, you can modify the QoS configurations through either the YAML template or the NetOps API. This flexibility ensures your network is always tuned to deliver the performance required by your AI workloads.

Future-Proof Your AI Infrastructure with ONES 3.0

With its intuitive YAML-based approach and support for dynamic Day-2 adjustments, ONES Fabric Manager eliminates much of the complexity associated with configuring and managing networks, giving you confidence that your network infrastructure is both reliable and future-proof. In essence, ONES Fabric Manager enables seamless orchestration for AI fabrics, ensuring your network is always ready to meet the growing demands of AI-driven data centers.

FAQs

1. How does SONiC NOS enable lossless data transfer for GPU-based AI workloads?

SONiC NOS supports Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), ensuring lossless, high-priority traffic flows—critical for real-time communication between GPU clusters in AI data centers.

2. How does ONES 3.0 simplify SONiC QoS configuration?

ONES 3.0 provides YAML-based orchestration to simplify complex SONiC QoS settings like DSCP mapping, PFC, ECN, and queue scheduling, reducing configuration time and errors.

3. Can ONES 3.0 manage QoS across multi-vendor hardware?

Yes, ONES 3.0 is built for vendor-agnostic orchestration, managing QoS policies across different switches and interfaces—ideal for hybrid or evolving AI network environments.

4. How do congestion notification packets get through when all queues are congested?

ONES allows you to designate a CNP (Congestion Notification Packet) queue with STRICT priority, ensuring that even during congestion, ECN messages reach the sender to throttle traffic.

5. Which scheduling algorithms does SONiC support for egress traffic?

SONiC supports DWRR, WRR, and STRICT scheduling, and ONES Fabric Manager lets you assign and orchestrate these policies via YAML—optimizing egress packet forwarding and queue handling in AI fabrics.


Streamlining AI Fabric Management: The Imperative of a Centralized Management Platform

Introduction

Artificial Intelligence (AI), once a mere buzzword, has now firmly established itself as a cornerstone of technological advancement. Its insatiable appetite for data fuels its continuous evolution, and generative AI, a subset capable of creating new content, is a prime driving force behind this growth. As datacenters become increasingly AI-centric and drive businesses worldwide, the networking community must assess their readiness for this transformative shift.

The Rapid Pace of AI Development

The pace of AI development is staggering, with years of progress potentially compressed into mere weeks. This rapid evolution necessitates a proactive approach from the networking community to ensure their solutions remain aligned with the cutting-edge advancements in AI. The challenge is multifold, as the increasing demand for networking switches and GPUs opens up opportunities for innovation in multi-vendor ecosystems and data center environments.

Fig 1 – GPU Market size and Trend

The Demand for Open and Flexible Networking Solutions

The rapid need for networking switches and GPUs has created a demand for multi-vendor ecosystems and data center environments. This increased demand for freedom from vendor lock-in has led to a surge of interest in open-source network operating systems (NOS) like SONiC for networking switches. The driving force behind this demand is the consolidation of features offered by multi-vendor hardware suitable for AI fabrics, along with overall cost optimization.

Evolving Data Center Network Architectures

As data center network designs evolve from server-centric to GPU-centric architectures, the necessity for new networking topology designs such as fat-tree, dragonfly, and butterfly has become paramount. GPU workloads, including training, fine-tuning, and inferencing, have distinct networking needs, with Remote Direct Memory Access (RDMA) being the most suitable technique to handle high-bandwidth data traffic flows. Lossless networking and low entropy are also essential for optimal performance.
Fig 2 – Evolution of Data Centers

The Need for Centralized Management Solutions

A single pane of glass management tool is essential to streamline operations and optimize performance in multi-vendor AI fabric data centers. Such a tool should be capable of:

Addressing the Challenges of Centralized Management with ONES

Implementing a centralized management tool in a multi-vendor AI fabric data center requires careful consideration of several key challenges:
Aviz understands this need and has implemented ONES 3.0, a centralized management platform that provides comprehensive control over networking devices, AI workload servers and data centers.
Fig 3 – Aviz Open Networking Enterprise Suite (ONES) for AI Fabrics

The Future of Networking in the AI Era

As AI continues to evolve and its applications expand, the networking community must adapt to the changing landscape. By embracing open-source solutions, adopting new network topologies, and leveraging centralized management platforms like ONES 3.0, organizations can ensure their networks are well-equipped to support the demands of AI-driven workloads. The future of networking is inextricably linked to the advancement of AI, and those who are proactive in their approach will be well-positioned to capitalize on the opportunities that lie ahead.

All these cutting-edge innovations only mark the initial stride towards Aviz Networks’ vision, and more is yet to come. With our strong team of support engineers, we are well-equipped to empower customers with a seamless SONiC journey using the ONES platform.

As AI-driven networks grow in complexity, a centralized management platform like ONES 3.0 by Aviz Networks is essential. It provides seamless control, real-time monitoring, and multi-vendor compatibility to tackle the unique demands of AI workloads. Future-proof your network with ONES 3.0—because the future of AI fabric management starts here.

Explore more about ONES 3.0 in our latest blogs here

If you wish to get in touch with me, feel free to connect on LinkedIn here

FAQs

1. Why is centralized management essential for AI Fabric networks?

Centralized management platforms like ONES 3.0 simplify multi-vendor orchestration, offer real-time GPU and network telemetry, and streamline configuration and monitoring for evolving AI data center topologies.

2. How does ONES 3.0 enable multi-vendor AI fabric management?

ONES 3.0 supports vendor-agnostic infrastructure, enabling seamless control across switches, NICs, and GPUs, while delivering lossless RDMA optimization, topology orchestration (fat-tree, dragonfly), and proactive alerting.

3. What are the top features of ONES 3.0 for AI fabrics?

Top features include:

  • Real-time infrastructure visualization
  • Multi-topology orchestration (fat-tree, dragonfly, butterfly)
  • GPU and NIC telemetry
  • Priority Flow Control (PFC)
  • End-to-end anomaly detection

4. Is ONES 3.0 suitable for GPU-based AI/ML workloads?

Yes, ONES 3.0 is optimized for AI/ML GPU workloads and RoCE-based RDMA traffic, enabling QoS profile automation, PFC watchdogs, and deep visibility into compute and network fabric.

5. Which network topologies does ONES 3.0 support?

ONES 3.0 supports fat-tree, dragonfly, and butterfly network topologies, enabling scalable, high-performance designs tailored to the latency and throughput needs of modern AI fabrics.

Categories
Open Networking Enterprise Suite

Announcing New Features in AI Network Management and Operations

We are thrilled to announce the release of ONES 3.0, a pivotal update in our ongoing innovation journey with Open Networking Enterprise Suite (ONES). This release furthers our mission of building ‘Networks for AI and AI for Networks,’ setting a new benchmark in network management and operations. With enhanced Visibility, AI Fabric Manager, and Support, ONES 3.0 is not merely an upgrade—it’s a significant stride forward. This version introduces advanced features that significantly boost the sophistication and efficiency of network operations, embodying our commitment to continuously push the boundaries of what’s possible in network orchestration and management.

"With the launch of ONES 3.0, we are enhancing the observability and orchestration of AI-Fabric network infrastructure tailored for GPU-centric workloads. This release offers improved visibility into compute metrics, including GPUs and network interface cards, enabling comprehensive end-to-end observability across multi-site AI infrastructures. Additionally, it strengthens fabric management with the inclusion of RoCE (QoS Profiles) configuration providing single-click Day 0 deployment for AI deployments. ONES 3.0 reflects our commitment to innovation, empowering customers to efficiently manage and optimize complex networks."

Key Features of ONES 3.0

ONES Multi-site

Multi-site offers a revolutionary way to visualize network anomalies across geographical locations. This intuitive, geospatial interface provides a comprehensive view of network health by representing anomalies on a map, making it easier to identify and address issues that span multiple sites. This feature is particularly valuable for organizations with geographically dispersed networks, as it allows for a unified and detailed perspective of network performance.

AI Fabric Manager

ONES AI Fabric Manager enhances the management and optimization of AI workloads, streamlining the deployment of AI/ML tasks across your network for efficient resource utilization. It automates the creation and assignment of QoS profiles, reducing the need for manual configuration.

Orchestration framework enables mapping of DSCP at Layer 3 and IEEE 802.1p at Layer 2 to traffic classes, which can then be linked to queues and priority groups. A key feature is Priority Flow Control (PFC), allowing users to define PFC-enabled queues for lossless traffic management. Additionally, a PFC watchdog can monitor functionality and initiate recovery actions if needed. The framework also supports ECN and various scheduling options, such as DWRR, WRR, and Strict Priority Scheduling for dynamic traffic management.
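
To make the mappings concrete, here is an illustrative sketch of the kind of QoS profile such an orchestration framework produces. The table and field names follow SONiC's config_db conventions (DSCP_TO_TC_MAP, TC_TO_QUEUE_MAP, PORT_QOS_MAP with pfc_enable); the helper function itself is hypothetical and is not the ONES API.

```python
# Illustrative only: builds a SONiC-style config_db QoS fragment.
# ONES generates profiles like this automatically; this sketch just
# shows the DSCP -> traffic class -> queue -> PFC chain.

def build_roce_qos_profile(lossless_tcs=(3, 4)):
    """Map DSCP values to traffic classes, classes to queues,
    and enable PFC on the lossless (RoCE) queues."""
    dscp_to_tc = {str(dscp): str(dscp // 8) for dscp in range(64)}
    tc_to_queue = {str(tc): str(tc) for tc in range(8)}
    return {
        "DSCP_TO_TC_MAP": {"AZURE": dscp_to_tc},      # Layer 3 classification
        "TC_TO_QUEUE_MAP": {"AZURE": tc_to_queue},    # class -> egress queue
        "PORT_QOS_MAP": {
            "Ethernet0": {
                "dscp_to_tc_map": "AZURE",
                "tc_to_queue_map": "AZURE",
                # PFC enabled only on the lossless queues
                "pfc_enable": ",".join(str(tc) for tc in lossless_tcs),
            }
        },
    }

profile = build_roce_qos_profile()
```

A PFC watchdog and ECN thresholds would then be attached to the same queues; scheduling (DWRR, WRR, or strict priority) is configured per queue on top of this mapping.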

With AI Fabric visibility, administrators gain real-time insights into workload performance and resource utilization, facilitating proactive management. Detailed analytics help monitor trends, identify bottlenecks, and inform future capacity planning.

GPU and NIC Visibility

ONES 3.0 introduces a standout feature that enhances network performance by providing advanced visibility into GPU server metrics. Once integrated with the ONES platform, the ONES agent on the server enables real-time monitoring of key metrics across network interfaces, GPUs, CPUs, and system-wide parameters. It supports a wide range of hardware vendors and configurations, ensuring adaptability and comprehensive monitoring. This capability is particularly valuable for tracking real-time server data and accommodating complex AI/ML workloads, ensuring that your network can handle even the most demanding computational tasks efficiently.

ServiceNow Integration

Experience the powerful ONES Rule Engine and Alerts system, now integrated with ServiceNow ticketing. The ONES anomaly detection engine automatically reports issues, streamlining incident management. This integration connects ONES with your existing IT service management infrastructure, enhancing change control and overall network operations. The seamless integration simplifies the maintenance and optimization of your network environment.

Support Enhancements

ONES now offers enhanced customer support through a single-pane access to the tech support page and syslog, providing comprehensive support resources and troubleshooting tools. The Tech Support feature allows for the efficient collection of system information, logs, configuration data, core dumps, and other critical data needed to diagnose and resolve issues. A new enhancement enables users to filter messages based on severity levels, such as errors, warnings, or all messages. This feature helps operators quickly identify and prioritize critical alerts, streamlining the troubleshooting process and improving overall operational efficiency.

A New Era of Network Management

ONES 3.0 features a suite of innovative functionalities and an enhanced user interface. This release revolutionizes network orchestration and management, providing the tools and capabilities needed to stay ahead in an increasingly complex network landscape.

To explore the full potential of ONES 3.0 and discover how it can transform your network operations, visit us at Aviz Networks. Embark on your journey toward seamless network monitoring and orchestration today.

FAQs

1. What’s new in ONES 3.0 for AI network management?

ONES 3.0 introduces multi-site anomaly visualization, GPU and NIC telemetry, AI Fabric QoS orchestration, ServiceNow integration, and enhanced support tools—designed to streamline operations in AI-centric, multi-vendor networks.

2. How does ONES 3.0 improve visibility into GPU servers?

With agent-based telemetry, ONES 3.0 provides real-time GPU, CPU, NIC, and system-level metrics, enabling proactive monitoring of AI workloads across diverse hardware and vendors.

3. Does ONES 3.0 support multi-site monitoring?

Yes, ONES 3.0 includes geospatial multi-site anomaly detection, allowing operators to monitor network health and troubleshoot issues across multiple geographic data centers in a single pane.

4. How does ONES 3.0 manage QoS for lossless AI fabrics?

ONES 3.0 automates DSCP-to-queue mapping, Priority Flow Control (PFC), ECN, and scheduling policies like WRR, DWRR, and strict priority—ideal for RoCE-based, lossless AI fabric environments.

5. Does ONES 3.0 integrate with ITSM tools?

ONES 3.0 features native ServiceNow ticketing integration, enabling automatic issue creation via its rule engine—bridging network telemetry with ITSM workflows for faster incident resolution.

Categories
Open Networking Enterprise Suite SONiC

The Power of Choice in Networking: How The AI Stack Breaks Down Barriers

A lot of people ask me, “What are the problems that you are solving for customers?” At Aviz, we understand that modern networking demands more than just connectivity; it requires agile, scalable solutions that can adapt to the evolving demands of AI-driven environments. We’re tackling the challenges of complexity, vendor lock-in, and prohibitive costs that many face in traditional network setups. Our AI Networking Stack isn’t just about keeping your network running; it’s about advancing it to think, predict, and operate more efficiently.

At Aviz, we are reshaping networks for the AI era by pioneering both ‘Networks for AI’ and ‘AI for Networks’. Our AI Networking Stack offers unparalleled choice, control, and cost savings, designed to enhance orchestration, observability, and real-time alerts in a vendor-agnostic environment. We’re not just providing solutions; we’re transforming networks with advanced LLM-based learning for critical operations, ensuring powerful, open-source solutions that drive innovation at a fraction of the cost.

We lead the journey to redefine networking with a data-centric approach that seamlessly integrates with any switch and network operating system, delivering performance that rivals the top OEM solutions—all while focusing on the core pillars of choice, control, and cost-effectiveness.

So, if you value having choices, staying in control, and achieving cost savings, read on to discover how our innovative solutions can transform your network management experience.

Now, let’s take a closer look at what sets our technology apart. Here is the detailed overview of our AI Networking Stack:

We’ve meticulously developed each layer of our AI Networking Stack to address the unique challenges our customers face in today’s dynamic network environments. From foundational hardware choices to advanced AI-driven functionalities, let’s dive into the specifics of what makes our technology stand out in the industry.

First, let’s discuss why choosing a vendor-agnostic approach is so crucial. Imagine using Linux: does it really matter whether you run it on HPE, Dell, or Lenovo servers, or in AWS, Azure, or GCP? That’s the kind of interoperability we bring to the networking world. Just as Linux did for the tech industry, we leverage the open-source SONiC operating system, enhanced by strong partnerships and robust community support. This approach offers an array of choices, enabling hardware selection from our partner ecosystem without any constraints.

At the heart of our innovative lineup is ONES (Open Networking Enterprise Suite), which empowers you with real-time visibility, seamless orchestration, advanced anomaly detection, and AI fabric functionality, including RoCE, across multiple vendors.
This means you have full control over which hardware solutions you implement, supported by our dedicated 24/7 customer service. ONES is designed to give you the freedom to manage your network without vendor lock-in, ensuring flexibility and control in your hands. Another critical aspect of our strategy to ensure cost efficiency is the Open Packet Broker (OPB).
Built on the powerful, community-driven SONiC platform, the OPB mirrors the capabilities of traditional Network Packet Brokers (NPB) but at a fraction of the cost. This solution delivers all the traditional functionalities you expect but optimizes them to offer significant cost savings without sacrificing performance or scalability.
Sitting atop our stack is the GenAI-based Network Copilot, your AI-powered assistant that simplifies all aspects of network management—from routine upgrades and audit reports to complex troubleshooting tasks. This tool is designed to enhance your operational efficiency, dramatically reducing the time and effort required for network management tasks, thereby freeing up your team to focus on strategic initiatives that drive business growth.

Our AI Networking Stack is designed to be the backbone of future network management, integrating advanced AI to navigate the complexities of modern networking with sophistication and ease. Opting for a vendor-agnostic approach provides the flexibility to choose the best technologies at the most effective prices, ensuring your network remains robust, scalable, and primed for future technological advancements.

Explore the benefits of a networking solution that brings choices, control, and cost savings without the constraints of traditional vendor lock-ins. This approach is not just about adopting new technology—it’s about advancing with a platform that understands and adapts to the evolving needs of your enterprise.

FAQs

1. What is an AI Networking Stack and why is it important for modern enterprises?

The AI Networking Stack refers to a multi-layered, AI-powered network management architecture that provides real-time orchestration, anomaly detection, and observability across multivendor environments. Aviz AI Stack combines ONES, Open Packet Broker (OPB), and Network Copilot™ to deliver flexibility, vendor-agnostic integration, and cost-efficiency—ideal for businesses modernizing their networks for AI-era workloads.

2. How does Aviz’s approach differ from traditional OEM networking?

Unlike traditional OEM networks that tie users into proprietary hardware and software, Aviz’s open-source SONiC-based stack supports interoperability across Dell, HPE, Cisco, Arista, and others. This vendor-agnostic approach lets enterprises avoid lock-in, reduce costs, and adapt faster to evolving infrastructure needs.

3. What components make up the Aviz AI Networking Stack?

Aviz’s stack includes:

  • ONES for orchestration, visibility, and AI fabric monitoring
  • Open Packet Broker (OPB) for cost-efficient traffic visibility
  • Network Copilot™ for AI-powered network insights and automation

These tools work together to deliver full-stack AI networking without dependency on any specific NOS or hardware.

4. What does Network Copilot™ do?

Network Copilot™ leverages LLM-based AI to provide intelligent chat-based troubleshooting, performance diagnostics, upgrade checks, and real-time analytics. It streamlines operations and replaces repetitive tasks with intelligent automation—making it ideal for both network engineers and business leaders.

5. What role does SONiC play in the stack?

SONiC (Software for Open Networking in the Cloud) enables hardware-agnostic deployment, similar to how Linux enabled OS flexibility. Aviz builds on SONiC to offer full-stack solutions (ONES, OPB, Copilot) that allow organizations to choose hardware freely, while retaining full control over orchestration, observability, and AI integration.

Categories
Open Networking Enterprise Suite SONiC

ONE Data Lake & AWS S3 – Enhancing data Management and Analytics – Part 2

In February, we introduced the ONE Data Lake as part of our ONES 2.1 release, highlighting its integration capabilities with Splunk and AWS. In this blog post, we’ll delve into how the Data Lake integrates specifically with the S3 bucket of AWS.

A data lake functions as a centralized repository designed to store vast amounts of structured, semi-structured, and unstructured data on a large scale. These repositories are typically constructed using scalable, distributed, cloud-based storage systems such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. A key advantage of a data lake is its ability to manage large volumes of data from various sources, providing a unified storage solution that facilitates data exploration, analytics, and informed decision-making.

Aviz ONE-Data Lake acts as a platform that enables the migration of on-premises network data to cloud storage. It includes metrics that capture operational data across the network’s control plane, data plane, system, platform, and traffic. As an enhanced version of the Aviz Open Networking Enterprise Suite (ONES), ONE-Data Lake stores the metrics previously used in ONES in the cloud.

Why AWS S3?

Amazon S3 (Simple Storage Service) is often used as a core component of a data lake architecture, where it stores structured, semi-structured, and unstructured data. This enables comprehensive data analytics and exploration across diverse data sources. S3 is widely used for several reasons:
S3 integrates seamlessly with a wide range of AWS services and third-party tools, significantly enhancing data processing, analytics, and machine learning workflows. This seamless integration allows for efficient data ingestion, real-time analytics, advanced data processing, and robust machine learning model training and deployment, creating a powerful and cohesive ecosystem for comprehensive data management and analysis.
S3 is engineered for complete durability, ensuring that your data is exceptionally safe and consistently accessible. This level of durability is achieved through advanced data replication across multiple geographically dispersed locations, providing robust protection against data loss and guaranteeing high availability.
S3 offers comprehensive security and compliance capabilities, providing a robust framework for safeguarding data and ensuring regulatory adherence. This includes advanced data encryption, both at rest and in transit, ensuring that sensitive information remains protected throughout its lifecycle. Additionally, S3 provides granular access management tools, such as AWS Identity and Access Management (IAM), bucket policies, and access control lists (ACLs), allowing fine-tuned control over who can access and modify data. These features, combined with compliance certifications for various industry standards (such as GDPR, HIPAA, and SOC), make S3 a secure and reliable choice for data storage in highly regulated environments.
S3’s capability to handle virtually unlimited amounts of data makes it an unparalleled choice for building and maintaining expansive data lakes that require storing massive volumes of information. This scalability empowers organizations to seamlessly scale their storage needs without upfront investments in infrastructure, accommodating growing data demands effortlessly. This capability is crucial for enterprises seeking to centralize and manage diverse data types, enabling advanced analytics, machine learning, and other data-driven initiatives with agility and reliability.
S3 provides flexible pricing models and a variety of storage classes to optimize costs based on data access patterns. Users can take advantage of storage classes like S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, and S3 Glacier to manage expenses efficiently.
S3 offers robust data management capabilities, including versioning, lifecycle policies, and replication, which streamline data governance and archival processes. These features ensure data integrity, compliance, and resilience across various use cases, from regulatory compliance to disaster recovery planning. This capability empowers businesses to unlock the full potential of their data assets, supporting diverse applications such as predictive analytics, business intelligence, and real-time reporting with ease and efficiency.
S3’s robust features, including cross-region replication and lifecycle policies, establish it as an exceptional solution for disaster recovery strategies, ensuring data redundancy and resilience. Furthermore, S3’s lifecycle policies enable automated management of data throughout its lifecycle, facilitating seamless transitions between storage tiers and automated deletion of outdated or unnecessary data. Together, these features make S3 a reliable backup solution that enhances data durability and availability, providing organizations with peace of mind knowing their critical data is securely stored and accessible even in unforeseen circumstances.
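
As a concrete illustration of the lifecycle policies mentioned above, the sketch below builds a minimal lifecycle configuration in the schema that boto3's `put_bucket_lifecycle_configuration` accepts. The bucket name, prefix, and retention periods are hypothetical examples, not ONES defaults.

```python
# Hypothetical lifecycle policy for telemetry stored in S3: tier data to
# cheaper storage classes over time, then expire it after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "tier-ones-metrics",
            "Status": "Enabled",
            "Filter": {"Prefix": "ones-metrics/"},     # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # archival
            ],
            "Expiration": {"Days": 365},  # delete after one year
        }
    ]
}

# With boto3, this would be applied as:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="ones-datalake", LifecycleConfiguration=lifecycle)
```
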

Integrating S3 with ONES:

To integrate the S3 service with ONES, supply the AWS account details (ARN role, region, bucket name, and, optionally, an external ID) in the cloud instance configuration page. By providing these details accurately, you can effectively configure and integrate the S3 service with ONES, facilitating smooth metric collection and analysis.
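
The sketch below illustrates the settings involved. The field names and validation helper are hypothetical, not ONES code; they simply mirror the details the configuration page asks for.

```python
# Illustrative only: the settings an S3 cloud instance requires and a
# minimal check that the mandatory fields are present.
REQUIRED_FIELDS = ("role_arn", "region", "bucket")

def validate_s3_settings(settings: dict) -> list:
    """Return the list of missing required fields (empty means valid)."""
    return [f for f in REQUIRED_FIELDS if not settings.get(f)]

settings = {
    "role_arn": "arn:aws:iam::123456789012:role/ones-datalake",  # hypothetical
    "region": "us-east-1",
    "bucket": "ones-metrics",
    "external_id": None,  # optional
}
missing = validate_s3_settings(settings)
```
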
Figure 1: Cloud Instance configuration page in ONES
Figure 2: Instance created and ready for data streaming
The cloud instance created within ONES offers several management options to enhance user experience and sustainability. Users can update the integration settings, pause and resume metric uploads to the cloud, and delete the created integration when needed. These features make it easy for users to maintain and manage their cloud endpoint integrations effectively.
Figure 3 : Updating the integration details
Figure 4: Option to pause and resume the metric streaming to cloud
Figure 5: Option to delete the integration created
The end user has the flexibility to select which metrics from their network monitored by ONES should be uploaded to the designated cloud service. This ONES 2.1 release supports various metrics, including Traffic Statistics, ASIC Capacity, Device Health, and Inventory. Administrators can choose and deselect metrics from the available list within these categories according to their preferences.
Figure 6 : Multiple options available for metric update on cloud
The metric update is not limited to any particular hardware or network operating system (NOS). ONE-Data Lake’s data collection capability extends across various network operating systems, including Cisco NX-OS, Arista EOS, SONiC, and non-SONiC platforms. Data streaming occurs via the gNMI process on SONiC-supported devices and through SNMP on other vendors’ operating systems.
Figure 7: ONES inventory showing multiple vendor devices streaming

S3 Analytical capabilities:

Analyzing data stored in an S3 bucket can be accomplished through various methods, each leveraging different AWS services and tools. Here are some key methods:

AWS Athena:

Description: A serverless interactive query service that allows you to run SQL queries directly against data stored in S3.

Use Case: Ad-hoc querying, data exploration, and reporting.

Example: Querying log files, CSVs, JSON, or Parquet files stored in S3 without setting up a database.
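
As a sketch of such an ad-hoc query, the snippet below builds a standard Athena SQL statement over metrics stored as JSON in S3. The table and column names are hypothetical; the commented boto3 call shows how the query would actually be submitted.

```python
# Illustrative only: count devices per NOS vendor from a hypothetical
# Athena table backed by the S3 bucket.
def vendor_count_query(table="ones_metrics"):
    return (
        f"SELECT nos_vendor, COUNT(*) AS devices "
        f"FROM {table} "
        f"GROUP BY nos_vendor ORDER BY devices DESC"
    )

query = vendor_count_query()

# With boto3 the query would be submitted as:
# import boto3
# boto3.client("athena").start_query_execution(
#     QueryString=query,
#     QueryExecutionContext={"Database": "ones_datalake"},
#     ResultConfiguration={"OutputLocation": "s3://ones-athena-results/"})
```
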

Figure 8 - Data stored in S3 bucket in JSON format

AWS Glue:

Description: A managed ETL (Extract, Transform, Load) service that helps prepare and transform data for analytics.

Use Case: Data preparation, cleaning, and transformation.

Example: Cleaning raw data stored in S3 and transforming it into a more structured format for analysis.

Figure 9 - Pie Chart in S3 representing the data from different NOS vendors

AWS SageMaker:

Description: A fully managed service for building, training, and deploying machine learning models.

Use Case: Machine learning and predictive analytics.

Example: Training machine learning models using large datasets stored in S3 and deploying them for inference.

Third-Party Tools:

Description: Numerous third-party tools integrate with S3 to provide additional analytical capabilities.

Use Case: Specialized data analysis, data science, and machine learning.

Example: Using tools like Databricks, Snowflake, or Domo to analyze and visualize data stored in S3.

Custom Applications:

Description: Developing custom applications or scripts that use AWS SDKs to interact with S3.

Use Case: Tailored data processing and analysis.

Example: Writing Python scripts using the Boto3 library to process data in S3 and generate reports.
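
A minimal sketch of that pattern follows, assuming inventory records exported to S3 in JSON-lines form with a hypothetical "nos" field. With Boto3 the lines would be read from `s3.get_object(...)["Body"]`; here a local string stands in so the report logic is self-contained.

```python
import json
from collections import Counter

def vendor_report(json_lines: str) -> dict:
    """Count devices per NOS vendor from JSON-lines inventory data."""
    counts = Counter(
        json.loads(line)["nos"]
        for line in json_lines.splitlines()
        if line.strip()
    )
    return dict(counts)

# Stand-in for data downloaded from the S3 bucket.
sample = "\n".join([
    '{"device": "leaf1", "nos": "SONiC"}',
    '{"device": "leaf2", "nos": "SONiC"}',
    '{"device": "core1", "nos": "NX-OS"}',
])
report = vendor_report(sample)  # {'SONiC': 2, 'NX-OS': 1}
```
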

Conclusion:

Aviz ONE-Data Lake serves as the cloud-native iteration of ONES, facilitating the storage of network data in cloud repositories. It operates agnostically across various cloud platforms and facilitates data streaming from major network device manufacturers like Dell, Mellanox, Arista, and Cisco. Network administrators retain flexibility to define which metrics are transferred to the cloud endpoint, ensuring customized control over the data storage process.

Unlock the ONE-Data Lake experience— schedule a demo on your preferred date, and let us show you how it’s done!

FAQs

1. What are the benefits of integrating ONE Data Lake with AWS S3 for network data storage?

Integrating Aviz ONE Data Lake with AWS S3 enables:

  • Centralized cloud storage for network telemetry
  • Unlimited scalability for growing datasets
  • Enhanced security with AWS encryption and IAM controls
  • Durable and highly available storage across regions

  • Flexible analytics through services like AWS Athena, Glue, and SageMaker

This combination helps enterprises achieve cost-effective, compliant, and powerful data management.

2. How do I set up the AWS S3 integration with ONE Data Lake?

To set up AWS S3 integration with ONE Data Lake:

  • Provide your ARN role, region, bucket name, and (optionally) external ID
  • Configure your S3 instance on the ONES cloud interface

  • Select desired network metrics (e.g., Traffic Stats, Device Health) for uploading

This ensures seamless cloud metric collection customized to your organization’s needs.

3. Which metrics can be uploaded to the cloud?

With ONES 2.1, administrators can selectively upload metrics like:

  • Traffic Statistics
  • ASIC Capacity Metrics
  • Device Health and Platform Monitoring
  • Inventory Data

The flexibility to customize and filter metrics helps optimize storage costs and streamline analytics pipelines.

4. Does ONE Data Lake support multi-vendor devices?

Yes! Aviz ONE Data Lake collects and streams telemetry across multiple NOS platforms, including:

  • Cisco NX-OS
  • Arista EOS
  • SONiC
  • Cumulus Linux and other non-SONiC devices

It uses gNMI for SONiC and SNMP for other vendors, ensuring multi-vendor support without limitations.

5. How can AWS services help analyze the data stored in S3?

  • AWS Athena enables SQL-based querying directly on raw S3 data (no database setup needed).
  • AWS Glue automates ETL workflows, prepping raw network telemetry for structured analytics.
  • AWS SageMaker builds ML models using S3-stored datasets for predictive network optimization.

Together, these services transform raw network data into actionable insights and machine learning opportunities.

Categories
Open Networking Enterprise Suite SONiC

ONE Data Lake & Splunk: Revolutionizing Network Data Analytics – Part 1

In February, we introduced the ONE Data Lake as part of our ONES 2.1 release, highlighting its integration capabilities with Splunk and AWS. In this blog post, we’ll delve into how the Data Lake integrates specifically with Splunk.

A data lake serves as a centralized storage facility capable of accommodating large quantities of structured, semi-structured, and unstructured data at significant scale. These are typically built using scalable, distributed, cloud-based storage systems such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage.

A pivotal benefit of a data lake lies in its capacity to handle substantial amounts of data from diverse origins, offering a cohesive storage solution conducive to data exploration, analytics, and informed decision-making processes.

Aviz ONE-Data Lake functions as a platform facilitating the migration of on-premises network data to cloud storage. It encompasses metrics that capture operational data across the network’s control plane, data plane, system, platform, and traffic. Serving as an upgraded iteration of Aviz Open Networking Enterprise Suite (ONES), ONE-Data Lake stores the metrics previously utilized in ONES onto the cloud.

Why Splunk?

Splunk is highly significant for organizations across diverse industries for multiple reasons:
Splunk empowers organizations to obtain immediate insights from their operational data, facilitating the monitoring of system and application health and performance. This capability aids in promptly identifying and addressing issues, thereby reducing downtime and enhancing operational efficiency.
Splunk is extensively utilized for Security Information and Event Management (SIEM) objectives, aiding organizations in overseeing their IT environments for security threats and irregularities. By correlating data from diverse sources, it can efficiently identify and address security incidents, thereby bolstering the overall cybersecurity stance.
Splunk supports regulatory adherence and oversight by empowering organizations to gather, analyze, and report on data pertinent to regulatory requirements and industry standards. This capability is especially critical for sectors like finance, healthcare, and government, where stringent compliance mandates are in place.
Splunk aids in IT operations and DevOps practices by providing visibility into IT infrastructure, application performance, and deployment processes. This allows organizations to identify areas for optimization, streamline operations, and accelerate the development and delivery of software applications
Splunk equips organizations with machine learning and predictive analytics functionalities, empowering them to uncover patterns, detect anomalies, and forecast outcomes from their data. This supports proactive resolution of issues, capacity planning, and efforts in risk management

Splunk can be utilized to assess customer interactions and feedback from various channels, enabling organizations to delve deeper into customer requirements and preferences. This information can then be utilized to tailor offerings, elevate customer satisfaction levels, and nurture brand loyalty

To sum up, Splunk is an essential tool for organizations to leverage data efficiently, promoting operational excellence, strengthening security measures, ensuring compliance, and achieving business objectives.

Integrating Splunk with ONES:

To integrate the Splunk service with ONES, supply the required Splunk connection details in the cloud instance configuration page. By ensuring these details are accurately provided, you can successfully configure and integrate the Splunk service with ONES, enabling seamless metric collection and analysis.
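
To illustrate how a metric record typically reaches Splunk, the sketch below builds a payload in the standard Splunk HTTP Event Collector (HEC) shape. The host, token, index, and field names are hypothetical, and ONES performs this streaming internally; the sketch only shows the wire format.

```python
# Illustrative only: a Splunk HEC event payload for one telemetry record.
def hec_payload(metric: dict, index="ones_metrics", sourcetype="ones:telemetry"):
    """Wrap a metric record in the envelope the HEC endpoint expects."""
    return {"index": index, "sourcetype": sourcetype, "event": metric}

payload = hec_payload({"device": "leaf1", "if_in_octets": 123456})

# With the requests library this would be posted as:
# import json, requests
# requests.post("https://splunk.example.com:8088/services/collector/event",
#               headers={"Authorization": "Splunk <hec-token>"},
#               data=json.dumps(payload))
```
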
Figure 1: Cloud Instance configuration page in ONES
Figure 2: Instance created and ready for data streaming
The cloud instance created within ONES offers several management options to enhance user experience and sustainability. Users can update the integration settings, pause and resume metric uploads to the cloud, and delete the created integration when needed. These features make it easy for users to maintain and manage their cloud endpoint integrations effectively.
Figure 3 : Updating the integration details
Figure 4: Option to pause and resume the metric streaming to cloud
Figure 5: Option to delete the integration created
The end user has the flexibility to select which metrics from their network monitored by ONES should be uploaded to the designated cloud service. This ONES 2.1 release supports various metrics, including Traffic Statistics, ASIC Capacity, Device Health, and Inventory. Administrators can choose and deselect metrics from the available list within these categories according to their preferences.
Figure 6 : Multiple options available for metric update on cloud
The metric update is not limited to any particular hardware or network operating system (NOS). ONE-Data Lake’s data collection capability extends across various network operating systems, including Cisco NX-OS, Arista EOS, SONiC, and non-SONiC platforms. Data streaming occurs via the gNMI process on SONiC-supported devices and through SNMP on other vendors’ operating systems.
Figure 7: ONES inventory showing multiple vendor devices streaming

Splunk Analytical capabilities:

Events within Splunk generally contain timestamped data alongside related metadata and content. Each event is parsed and indexed separately, enabling users to efficiently search, analyze, and visualize data. Splunk automatically extracts fields from events during indexing, streamlining filtering and correlation based on specific criteria.
Figure 8 - Inventory details from NX-OS is captured as events in Splunk
This entails visually depicting data using charts or graphs, helping users comprehend patterns, trends, and relationships within the data more readily than analyzing raw data alone. These graphical representations encompass diverse types such as bar charts, line charts, pie charts, and scatter plots, each tailored to specific data types and analytical objectives.
Figure 9 - Pie Chart in Splunk representing the data from different NOS vendors

Conclusion:

Aviz ONE-Data Lake functions as the cloud-based version of ONES, enabling the storage of network data in cloud repositories. It operates independently of any particular cloud platform and supports data streaming from leading network device manufacturers such as Dell, Mellanox, Arista, and Cisco. Network administrators have the freedom to specify the metrics they want to transfer to the cloud endpoint, granting customized control over the data storage procedure.

Schedule your demo today because with ONE Data Lake integrated with Splunk, you’re not just managing data — you’re revolutionizing network analytics for unparalleled insights and efficiency.

FAQs

1. What is Aviz ONE Data Lake and how does it enhance network data analytics?

 Aviz ONE Data Lake is a cloud platform that collects and stores telemetry from multi-vendor networks. It centralizes operational, traffic, and device health data, giving teams a single source for deep analytics, proactive decisions, and smarter network management.

2. What are the benefits of integrating ONE Data Lake with Splunk?

Connecting ONE Data Lake with Splunk gives teams:

  • Real-time analytics
  • Powerful dashboards
  • Anomaly detection

The integration helps detect issues faster, optimize resources, strengthen security, and improve operational visibility across all network layers.

3. What types of network data can be streamed to the cloud?

You can stream:

  • Traffic statistics
  • ASIC utilization
  • Device health
  • Network inventory

These stream from SONiC and non-SONiC devices such as Cisco, Arista, and Dell, using the gNMI and SNMP protocols.

4. Is ONE Data Lake tied to a specific vendor or network operating system?

No. ONE Data Lake is vendor-neutral and supports multi-vendor environments, including SONiC, Cisco NX-OS, Arista EOS, and more, so you get unified observability across your entire network.

5. How do users manage the Data Lake integration?

Through ONES, users can:

  • Update integration settings
  • Pause or resume uploads
  • Select which metrics to send
  • Delete integrations when needed

It's simple, flexible, and designed for dynamic network environments.

Categories
Open Networking Enterprise Suite

From Hype to Reality: Navigating the Challenges of AI in Network Telemetry

AI is riding the crest of a technological wave, crowned the “Peak of Inflated Expectations” by Gartner’s 2023 Hype Cycle. Platforms like ChatGPT have become more than just buzzwords; they’re blazing a trail into a new era of technological possibilities. This isn’t a fleeting fad; it’s a fuel injection for innovation, poised to transform the landscape across industries including the Networking domain.

Think beyond chatbots and clever tweets. AI's true potential lies in its ability to learn, adapt, and create. It can craft personalized experiences, generate realistic synthetic data, and even write code, all while pushing the boundaries of what we thought possible. This isn't just about hype; it's about harnessing the power of creativity to revolutionize the way we live, work, and play. So buckle up, because the AI revolution is just getting started. In this blog, let me share some insights into how AI can transform network telemetry and enhance the experience.

Gartner Hype Cycle - AI

Understanding Network Telemetry and applying AI

What is Network Telemetry?

Network Telemetry is the process of collecting, inspecting, normalizing, and interpreting data to generate information that helps the end user visualize the network state and make decisions.

Beyond simply collecting data, network telemetry transforms it into actionable intelligence. Through meticulous analysis and normalization, it illuminates the network’s current state, enabling informed decisions and proactive interventions. Think of it as the network’s nervous system, providing a constant pulse of information for precise navigation.
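As a concrete illustration of that pipeline, the sketch below normalizes raw counter samples from two collector styles into one common record shape. The SNMP names come from the standard IF-MIB and the gNMI leaf from OpenConfig, but the output record fields are invented for the example:

```python
# Toy normalization step: raw samples arrive in collector-specific
# shapes and are mapped to one common record (output fields invented).

def normalize(sample):
    if "ifHCInOctets" in sample:        # SNMP IF-MIB style counter
        return {"port": sample["ifIndex"],
                "rx_bytes": sample["ifHCInOctets"]}
    if "in-octets" in sample:           # gNMI/OpenConfig style leaf
        return {"port": sample["interface"],
                "rx_bytes": sample["in-octets"]}
    raise ValueError("unknown sample format")

raw = [
    {"ifIndex": 1, "ifHCInOctets": 5_000},
    {"interface": "Ethernet0", "in-octets": 7_500},
]
normalized = [normalize(s) for s in raw]
```

Once every sample shares one shape, the interpretation stage (and any AI model behind it) only has to reason about a single schema.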

Harnessing the Power of AI for Network Telemetry

The convergence of AI and network telemetry represents a significant evolutionary leap in network management. By integrating AI’s analytical prowess with established telemetry infrastructure, we can unlock transformative benefits that enhance network security, optimize resource allocation, and streamline troubleshooting.

Elevating Network Intelligence:

Beyond Hype, Embracing a Paradigm Shift:

The integration of AI into network telemetry isn’t just a technological trend; it’s a strategic imperative. By embracing this transformative technology, organizations can build a future-proof network infrastructure characterized by enhanced security, proactive efficiency, and informed decision-making. This is not a revolution, but an evolution, a seamless integration of AI’s capabilities to empower existing systems and propel network management to new heights.

Reframing the Challenges: Building Robust AI for Network Telemetry

While the promises of AI in network telemetry are vast, navigating its implementation requires careful consideration of several key challenges:

Data-Driven Foundations:

Trust and Transparency:

AI TRISM: Transforming Network Telemetry with Trust, Reliability, and Safety

Applying the AI TRISM framework to network telemetry unlocks a new era of trust, reliability, and safety in our connected world. Trust is bolstered by transparent models that explain how anomalies are detected and prioritized, allowing network administrators to understand and make informed decisions. Reliability soars through AI-powered anomaly detection, automatically pinpointing issues before they snowball into outages, while synthetic data generation ensures robust training even with limited real-world telemetry. Safety takes center stage as AI models learn to differentiate between harmless fluctuations and genuine threats, protecting critical infrastructure from cyberattacks and malicious actors.

Imagine a network humming with the silent symphony of AI. Anomalous blips in traffic flow are instantly flagged, not by rigid thresholds, but by AI models continuously learning the network’s healthy rhythm. Security threats are swiftly identified and neutralized, not through brute force, but by AI’s uncanny ability to discern friend from foe. This is the future of network telemetry, powered by AI TRISM – a future where trust, reliability, and safety weave a protective web around our increasingly interconnected lives.
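Learning the network's healthy rhythm instead of fixing rigid thresholds can be as simple as a rolling statistical baseline. The sketch below flags a sample whose z-score against recent history exceeds a bound; real systems use far richer models, and the traffic numbers are purely illustrative:

```python
import statistics

def is_anomalous(history, sample, z_limit=3.0):
    """Flag a sample deviating more than z_limit standard
    deviations from the mean of the recent history window."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return sample != mean       # flat baseline: any change stands out
    return abs(sample - mean) / stdev > z_limit

# Illustrative traffic baseline (Mbps) with a sudden spike.
baseline = [100, 102, 98, 101, 99, 103, 97, 100]
print(is_anomalous(baseline, 104))   # within the normal rhythm
print(is_anomalous(baseline, 180))   # spike stands out
```

The threshold here adapts automatically as the history window shifts, which is the simplest form of the "continuously learning" behavior described above.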

We, at Aviz, are harnessing the power of AI to make significant improvements in the networking landscape. Expect even more advancements to come from us soon.

Contact us today because with our cutting-edge AI solutions, you’re not just navigating the hype — you’re transforming your network telemetry into a powerhouse of innovation, efficiency, and security.

FAQs

1. How can AI transform traditional network telemetry and observability?

AI enhances traditional network telemetry by enabling real-time anomaly detection, automated root cause analysis, predictive traffic forecasting, and proactive infrastructure optimization. It transforms telemetry from passive data collection into dynamic, actionable intelligence that improves security, resilience, and operational efficiency.

2. What are the key benefits of applying AI to network telemetry?

Key benefits include advanced anomaly detection, faster troubleshooting through automated diagnostics, improved capacity planning with predictive analytics, proactive threat identification, and real-time, human-readable insights delivered via AI-powered chatbots and conversational interfaces for network teams.

3. What challenges arise when implementing AI in network telemetry?

Challenges include ensuring high-quality, diverse telemetry data, selecting and adapting the right AI models, building explainable and transparent systems, preventing AI “hallucinations” or false outputs, and continuously training models to align with evolving network topologies and threat landscapes.

4. How does the AI TRISM framework improve network telemetry?

AI TRISM (Trust, Reliability, and Safety Management) improves network telemetry by enforcing transparency, reliable anomaly detection, and safe behavior prediction. It ensures AI models are explainable, resilient against adversarial inputs, and capable of differentiating real threats from harmless fluctuations.

5. Why does explainability matter in AI-driven network telemetry?

Explainability ensures that network teams can trust and understand AI-driven insights, making it easier to justify actions, detect false positives, and continuously improve model performance. Transparent AI builds operational confidence and fosters responsible decision-making in critical network environments.
