In today’s interconnected world, a Network Operations (NetOps) Support Framework is crucial for organizations that need to maintain a robust and reliable network infrastructure. It provides the foundation for managing and optimizing network performance, ensuring seamless connectivity, and resolving related issues. This post gives an overview of NetOps Support Frameworks, their key components, and their significance in maintaining efficient operations. It also covers SLAs and their benefits within a NetOps Support Framework.
Components of NetOps Support Frameworks
Let’s quickly glance through a few critical components.
1. Network Monitoring and Management
This component covers:
- Real-time monitoring of network devices and traffic
- Performance analysis and reporting
- Configuration management and compliance
- Network inventory and asset management
Next-generation management tools extend these capabilities with advanced functions, including:
- Network Orchestration
- Telemetry Streaming
Network Orchestration and Telemetry Streaming work together to enable the automation, control, and visibility of network operations while leveraging real-time telemetry data for enhanced network management and analysis. Let’s understand these functions in detail.
Network Orchestration
This function represents the overall system responsible for orchestrating and automating network operations, including configuration management, service provisioning, and network policies.
Its core component, the Orchestration Engine, receives high-level commands and policies and translates them into actionable tasks for network devices. A network device is any physical or virtual element of the network infrastructure, such as a router, switch, firewall, or load balancer.
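To make the translation step concrete, here is a minimal Python sketch of how a high-level policy might be expanded into per-device tasks. The `Device`, `Task`, and `translate` names, the policy fields, and the inventory are all hypothetical and chosen only for illustration; a real orchestration engine would also handle validation, ordering, and rollback.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    role: str    # e.g. "leaf" or "spine" (hypothetical roles)
    vendor: str  # hypothetical vendor label


@dataclass
class Task:
    device: str
    action: str
    params: dict


def translate(policy: dict, inventory: list[Device]) -> list[Task]:
    """Expand one high-level policy into one task per matching device."""
    targets = [d for d in inventory if d.role == policy["target_role"]]
    return [
        Task(device=d.name, action=policy["action"], params=dict(policy["params"]))
        for d in targets
    ]


# Hypothetical inventory and policy, for illustration only.
inventory = [
    Device("leaf-1", "leaf", "vendor-a"),
    Device("leaf-2", "leaf", "vendor-b"),
    Device("spine-1", "spine", "vendor-a"),
]
policy = {"action": "create_vlan", "target_role": "leaf", "params": {"vlan_id": 100}}

for task in translate(policy, inventory):
    print(task.device, task.action, task.params)
```

In this sketch the policy names a role rather than individual devices, so the same intent applies unchanged as leaf switches are added to or removed from the inventory.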
Telemetry Streaming
This function represents the process of collecting, aggregating, and forwarding real-time network telemetry data to various telemetry consumers for analysis and decision-making purposes.
Here, the Telemetry Collector acts as an intermediary that collects telemetry data from network devices using protocols such as gRPC, NETCONF, or SNMP. Telemetry Consumers are the applications, systems, or analytics platforms that consume and analyze this data. They can include network monitoring tools, data analytics platforms, and machine learning systems.
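The collector-to-consumer flow can be sketched in a few lines of Python. This is an in-memory illustration only: the `TelemetryCollector` class, the sample fields, and the 90% threshold are assumptions, and the transport to the devices (gRPC, NETCONF, or SNMP) is deliberately left out.

```python
from typing import Callable

Sample = dict
Consumer = Callable[[Sample], None]


class TelemetryCollector:
    """Receives samples from devices and forwards each one to every consumer."""

    def __init__(self) -> None:
        self._consumers: list[Consumer] = []

    def subscribe(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def ingest(self, sample: Sample) -> None:
        # Fan out: every subscribed consumer sees every sample.
        for consumer in self._consumers:
            consumer(sample)


received_by_dashboard: list[Sample] = []
high_utilization: list[Sample] = []


def dashboard(sample: Sample) -> None:
    # A monitoring tool that keeps every sample.
    received_by_dashboard.append(sample)


def alerting(sample: Sample) -> None:
    # An analytics consumer that keeps only samples above a hypothetical threshold.
    if sample["metric"] == "if_utilization_pct" and sample["value"] > 90:
        high_utilization.append(sample)


collector = TelemetryCollector()
collector.subscribe(dashboard)
collector.subscribe(alerting)

# Hypothetical samples, standing in for data streamed from devices.
collector.ingest({"device": "leaf-1", "metric": "if_utilization_pct", "value": 42.0})
collector.ingest({"device": "leaf-2", "metric": "if_utilization_pct", "value": 97.5})

print(len(received_by_dashboard), len(high_utilization))  # prints: 2 1
```

Keeping the collector unaware of what each consumer does with the data is what lets monitoring, analytics, and machine learning systems be added independently.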
2. Fault Management and Troubleshooting
This component includes:
- Rapid detection and isolation of network issues
- Root cause analysis and remediation
- Incident management and escalation processes
3. Change Management and Configuration
- Control and coordination of network changes
- Version control and documentation
- Change approval processes and tracking
4. Performance Optimization
- Capacity planning and bandwidth management
- Quality of Service (QoS) implementation
- Traffic engineering and optimization
- Proactive network optimization strategies
5. Security and Compliance
- Network security monitoring and threat detection
- Firewall management and access control
- Compliance with industry regulations (for example PCI-DSS, GDPR)
- Vulnerability assessment and patch management
Supporting Multi-Vendor NOS and Switch Hardware
In today’s diverse networking landscape, organizations often rely on a mix of network operating systems (NOS) and vendors to meet their specific requirements. However, managing and supporting multi-vendor NOS environments poses unique challenges, which specialized NetOps Support Frameworks can help address. Integrating multiple NOS vendors into a NetOps Support Framework requires an understanding of interoperability challenges and of the need for standardized management frameworks. Seamless multi-vendor NOS support primarily calls for vendor-agnostic network monitoring and management, which covers:
1. Consolidated monitoring dashboards for heterogeneous network devices
2. Integration with various NOS APIs for unified device management
3. Leveraging standardized protocols (for example, SNMP, NETCONF, and RESTful APIs) for device communication
4. Managing and troubleshooting cross-vendor faults:
a. Correlation of alerts and events from different NOS vendors
b. Centralized incident management and ticketing system
c. Collaboration with vendor support teams for issue resolution
5. Change management and configuration:
a. Standardized configuration templates for different NOS vendors
b. Integration with configuration management databases (CMDB)
c. Change tracking and rollback mechanisms for multi-vendor environments
6. Performance optimization and traffic engineering:
a. Bandwidth allocation and optimization across diverse NOS platforms
b. QoS implementation for consistent performance across vendors
c. Traffic engineering strategies for load balancing and optimization
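As an illustration of item 4a above, the sketch below normalizes events from two vendors into a common shape and then groups events that occur close together in time into a single incident. The vendor labels, field names, and the 30-second window are hypothetical, and time-proximity grouping is only one simple heuristic. Production correlation would typically also use topology and event type.

```python
# Hypothetical per-vendor field names; real NOS event formats differ.
FIELD_MAPS = {
    "vendor-a": {"device": "hostname", "resource": "ifName", "ts": "timestamp"},
    "vendor-b": {"device": "node", "resource": "port", "ts": "time"},
}


def normalize(vendor: str, raw: dict) -> dict:
    """Map a vendor-specific event onto common field names."""
    event = {key: raw[source] for key, source in FIELD_MAPS[vendor].items()}
    event["vendor"] = vendor
    return event


def correlate(events: list[dict], window_s: int = 30) -> list[list[dict]]:
    """Group events so each is within window_s seconds of the previous one."""
    incidents: list[list[dict]] = []
    for event in sorted(events, key=lambda e: e["ts"]):
        if incidents and event["ts"] - incidents[-1][-1]["ts"] <= window_s:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


# Hypothetical raw events: both ends of one link fail, then an unrelated port later.
raw_events = [
    ("vendor-a", {"hostname": "leaf-1", "ifName": "Ethernet0", "timestamp": 1000}),
    ("vendor-b", {"node": "spine-1", "port": "eth1/1", "time": 1004}),
    ("vendor-a", {"hostname": "leaf-3", "ifName": "Ethernet8", "timestamp": 5000}),
]

incidents = correlate([normalize(vendor, raw) for vendor, raw in raw_events])
for number, incident in enumerate(incidents, start=1):
    print(number, [(e["vendor"], e["device"], e["resource"]) for e in incident])
```

Here the two alerts raised four seconds apart by different vendors end up in one incident, which is the kind of grouping a centralized ticketing system can then act on once.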
Importance of Service Level Agreements (SLAs)
In network infrastructure support, SLAs define the agreed-upon expectations and responsibilities between service providers, such as Aviz Networks, and their customers. These SLAs outline key performance indicators such as service availability, response times, and other parameters.
SLAs therefore play a vital role in ensuring that the network meets the desired service levels and provides a satisfactory user experience. Let’s take a closer look:
- KPIs: SLAs outline multiple KPIs such as network availability, packet loss, latency, throughput, and response times. By setting benchmarks for these metrics, SLAs provide a quantifiable means of evaluating the performance of both the network infrastructure and the service provider.
- Network Availability: SLAs specify the expected level of network availability, typically expressed as a percentage of uptime over a given period. This metric indicates how often the network should be operational and accessible to users. It also holds the network service provider accountable for maintaining a reliable and continuously available network infrastructure.
- Response and Resolution Times: SLAs often include response and resolution time commitments for network incidents or service requests. The response time defines how quickly the service provider should acknowledge and respond to reported issues. The resolution time sets expectations about the time required to restore the network service to its normal functioning state.
- Downtime and Maintenance Windows: Another benefit of such agreements is that they provide for scheduled maintenance windows during which network services may be temporarily unavailable. By establishing a clear schedule and notifying customers in advance, SLAs help manage expectations and minimize service disruptions.
- Escalation Procedures: SLAs outline the escalation procedures to follow in case of critical incidents or service disruptions. This ensures that prompt action is taken to address the issue and that higher-level support or management is involved if necessary.
- Remedies and Compensation: SLAs include provisions for remedies in the form of service credits, discounts, or other types of compensation to mitigate the impact of service disruptions or failures caused by the service provider.
- Reporting and Review: Lastly, these agreements usually include reporting mechanisms to track and communicate network performance against the agreed-upon metrics. Regular performance reports and service reviews enable both parties to assess the network’s performance, identify areas for improvement, and ensure transparency and accountability.
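The availability and response-time commitments described above come down to simple arithmetic. The following Python sketch shows it under assumed numbers: the 99.9% target, the 30-day period, the 15-minute response target, and the 4-hour resolution target are all hypothetical and not taken from any actual SLA.

```python
from datetime import datetime, timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Downtime budget implied by an availability target over a period."""
    return period * (1 - availability_pct / 100)


def measured_availability(downtime: timedelta, period: timedelta) -> float:
    """Availability actually achieved, as a percentage of the period."""
    return 100 * (1 - downtime / period)


def within_target(start: datetime, end: datetime, target: timedelta) -> bool:
    """True if the elapsed time between two events meets the committed target."""
    return end - start <= target


period = timedelta(days=30)
budget = allowed_downtime(99.9, period)  # hypothetical 99.9% target
achieved = measured_availability(timedelta(minutes=90), period)

# Hypothetical incident timeline.
reported = datetime(2024, 1, 10, 9, 0)
acknowledged = datetime(2024, 1, 10, 9, 12)
resolved = datetime(2024, 1, 10, 13, 30)

print(f"Downtime budget: {budget.total_seconds() / 60:.1f} minutes")
print(f"Achieved availability: {achieved:.3f}%")
print("Response target met:", within_target(reported, acknowledged, timedelta(minutes=15)))
print("Resolution target met:", within_target(reported, resolved, timedelta(hours=4)))
```

With these assumed figures, a 99.9% target over 30 days leaves a budget of about 43.2 minutes of downtime, so 90 minutes of outage (roughly 99.79% availability) would miss it. The 12-minute acknowledgement meets the response target, while the 4.5-hour fix misses the resolution target.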
Benefits of SLAs in NetOps Support
- Improved Operational Efficiency:
a. Streamlined management processes for diverse NOS platforms
b. Reduced complexity and overhead associated with managing multiple vendors
c. Centralized visibility and control over the entire network infrastructure
- Enhanced Network Resilience and Performance:
a. Rapid fault detection and resolution across different NOS environments
b. Optimal utilization of network resources through unified performance optimization strategies
c. Consistent security measures and compliance enforcement across vendors
- Customer Satisfaction and Business Continuity:
a. Adherence to SLAs for ensuring service reliability and customer satisfaction
b. Minimized downtime and faster incident resolution through SLA-driven support processes
c. Mitigation of the risks associated with multi-vendor environments
Introducing ONES (Open Networking Enterprise Suite)
ONES from Aviz Networks is a network observability/visibility, orchestration, and assurance solution for network switches running SONiC and vendor-proprietary NOS (Network Operating System).
ONES is a one-stop solution that ranges from providing better visibility into your data center networks to offering 24×7 support for SONiC. It also hosts a powerful analytics engine that provides proactive, predictive, and prescriptive analysis of common network anomalies and disruptions.
The key capabilities of ONES include:
- Purpose-built solution for SONiC deployments
- Supports multiple NOS for comprehensive visibility
- Orchestration and deep telemetry for observability
- 24×7 enterprise-grade support options for SONiC
ONES – Value and Beyond
Monitor your entire multi-NOS fabric
- Manage inventory of your network devices running any Network OS on Broadcom, Marvell, Nvidia, and other leading ASICs
- View topology of the entire fabric across multiple hardware platforms and network operating systems
- Monitor traffic, system health, bandwidth utilization, and more between and across devices
Configure your SONiC fabric with ease
- Create and configure CLOS topology for ToR, Leaf, Spine, and Super-spine layers
- Apply and validate configurations pre- and post-deployment
- Compare running configs against applied configs at any point
- Upgrade devices with a single click via ZTP or custom NOS images
NetOps Simplified
- Proactively track switch CPU/memory consumption, bandwidth, link failures, traffic errors, and more
- Instantly connect to individual devices for maintenance and quick troubleshooting
- Collaborate across your teams and with our SONiC experts to solve issues more efficiently
Traditional Network Orchestration tools have evolved from just delivering and monitoring network functions for proprietary NOS to designing and building network fabrics using an automated, intent-based approach.
ONES takes the Orchestration journey to the next level by adding capabilities from the SONiC NOS across a fleet of multi-vendor and multi-ASIC switches, and by bringing together streaming telemetry, API programmability, network control, intent-based fabric configuration, and SLA assurance for supportability.
Predictive failure/health analytics and capacity planning enable Orchestration tools (like ONES) to provide a seamless adoption journey for SONiC by leveraging historical trends of resource utilization, traffic patterns, logs/events, and derived application/workload performance.
Supportability, a crucial feature of Network Orchestration tools, goes beyond just notifying and alerting. It also enables integration with IT tools and engines to detect anomalies and correlate events using real-time or historical data, supports single-touch management, and in turn simplifies switch and fabric onboarding at scale.
With the rapid adoption of open-source SONiC, ONES has emerged as a one-stop solution for network infrastructure teams. It seamlessly enables orchestration, deep telemetry, and assurance for multi-vendor deployments. Most importantly, the 24×7 SRE support enables them to introduce SONiC in their networks with utmost confidence.
Author:
Arakkal Kunju Mohammed Yasser, Director of Engineering, Site Reliability Engineering