Optimizing Data Center Networks: The Role of IP Clos Architecture and BGP Protocol
In contemporary data center networks, the IP Clos architecture (a multistage leaf-spine topology named after Charles Clos and routed at Layer 3) is widely embraced for its ability to deliver a non-blocking, high-bandwidth network fabric with low latency, scalable connectivity between servers and switches, and fault tolerance. Central to the success of the IP Clos architecture is the use of the Border Gateway Protocol (BGP) as the routing protocol.
BGP stands out due to its features in traffic engineering, scalability, and adaptable routing, making it well-suited for the demands of modern data center environments. BGP is the protocol responsible for orchestrating internet routing, optimizing path selection through Autonomous Systems (AS), peering mechanisms, and configurable attributes.
The IP Clos network architecture is not a recent development. It has existed for a decade and has been implemented extensively in large-scale data centers, typically with proprietary Network Operating Systems (NOS) and switches from specific vendors, which shows its enduring relevance and effectiveness in meeting the evolving needs of data center networking.
Key Features of SONiC That Make It Stand Out from Traditional NOS
Our Journey with SONiC
“When I joined Aviz Networks, I was excited about its vision and approach to enabling IP Clos architecture through open source, any vendor, and any switch combination. Being an engineer, I am all praise for open source for three reasons. First, it’s free; second is community support. And the third reason is my ability to innovate and modify it to my needs without going through any red tape associated with a vendor-locked OS,” said Khurram Khani, VP of Customer Success, Aviz Networks.
SONiC is the first true open source NOS (Network Operating System), built on a microservices-based architecture that is far more capable than traditional network operating systems. SONiC has amassed a large ecosystem of developers, not only in the community but also within a majority of switch and ASIC vendors, who embrace the technology because of sheer customer demand.
The Power of Open Source: Optimizing Production Network SLAs
With the great power of open source comes greater responsibility on the shoulders of network operations teams. Although SONiC is an open-source NOS, it is held to the same Service Level Agreements (SLAs) as any production network: guaranteed levels of performance, availability, uptime, and latency. One of our initial goals was to ensure that our customers are protected, so we put the right processes and automation in place and deliver on well-rounded SLAs.
How has Aviz Networks addressed concerns about SONiC’s ability to support Day 2 operations?
We met several customers who shared their concerns about SONiC being open source. The most common one: will it be as resilient, scalable, and well supported as the decades-old proprietary vendor switches? Aviz Networks, with its expertise in networking, puts these concerns to rest.
At Aviz, we have helped Fortune 500 companies at each step of their SONiC deployment journey, from vendor selection, use-case validation, pre-staging, and staging through to production Day 2 support. SONiC has already been successfully deployed at nearly all hyperscalers, including Microsoft Azure, Alibaba, Tencent, and Baidu, and Google and Meta have joined the board.
A Guide to Efficient Day 2 Network Operations: Monitoring and Maintenance
Day 2 network operations refers to ongoing monitoring, planned and unplanned maintenance, and operational efficiency. This covers all the activities after the network goes into production and is live.
Here’s an example from one of the Fortune 500 companies we worked with, a game developer in the US. This company’s data center was designed around IP Clos and BGP. As part of its ongoing BGP network maintenance, the company shared its expectations and requirements for SONiC Day 2 BGP operations tasks.
The customer asked us to certify SONiC’s BGP Day 2 Operations capabilities around:
- BGP Node Maintenance: The customer expected to gracefully take a node out of service without any impact on the existing network and data traffic, and to have the network reconverge quickly.
- BGP Link Maintenance: The next requirement was to take a link out of service without any impact on the existing network and data traffic, and to have the network reconverge quickly.
- Network SLA: The final requirement was an assured, guaranteed SLA for network reconvergence.
Day 2 BGP operation: Benefits of Using Community List and Route-Map for Node and Link Drain
BGP (Border Gateway Protocol) is often used for node and link drain as it provides a mechanism for the controlled removal of routes from a network. This helps to manage rerouting in a controlled manner when a node or link is being drained.
BGP can be used to gradually decrease the amount of traffic flowing through that node or link, while ensuring that the remaining traffic is still able to reach its destination. This is accomplished by updating BGP routing tables to reflect the new topology of the network.
Aviz Networks Ensures Smooth BGP Node Removal on SONiC BGP switch
Aviz Networks has built industry-leading automation around SONiC BGP node drain process validation. The BGP nodes are gracefully taken out of the network without any disruption to traffic. Aviz FTAS Automation also ensures that the network converges within SLA and has zero traffic loss.
Our team performs the following automated steps during BGP node drain validation on SONiC switches:
- BGP Community list
The BGP Community list is used to tag routes. It includes a community value that can be used to identify routes that will be redirected.
The “no-advertise” community marks a route that the receiving BGP node must not advertise to any other BGP peer. At Aviz Networks, we use the “no-advertise” community during validation of the graceful removal of a BGP node or link.
- BGP Route-map
The route-map is used to match routes that will be redirected. This route-map should match the community value that was added to the routes.
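Example of Node/Link Drain Config on SONiC Router
! Tag every route that hits this entry with the no-advertise community
route-map drain-community permit 10
 on-match next
 set community no-advertise
 set ipv6 next-hop prefer-global
exit
! Apply the route-map inbound on the v4server and v6server neighbors
router bgp <AS #>
 address-family ipv4 unicast
  neighbor v4server route-map drain-community in
 exit-address-family
 address-family ipv6 unicast
  neighbor v6server route-map drain-community in
 exit-address-family
end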
Finally, the physical connectivity of the node can be removed. This is accomplished by shutting down the router or the link, or by disconnecting the physical link.
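To make the drain mechanics concrete, here is a minimal Python sketch of what tagging routes with “no-advertise” does to a node's outbound advertisements. It is a toy model, not SONiC or FRR code: the Route class and both helper functions are hypothetical names introduced for illustration, and the prefixes are documentation examples.

```python
# Illustrative model only: not SONiC or FRR code.
from dataclasses import dataclass, field

NO_ADVERTISE = "no-advertise"  # well-known BGP community


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def apply_drain_route_map(route: Route) -> Route:
    """Mimic 'set community no-advertise' on an inbound route.

    Without the 'additive' keyword the set action replaces any
    communities already present, so the result carries only this one.
    """
    return Route(route.prefix, {NO_ADVERTISE})


def routes_to_advertise(rib: list) -> list:
    """Return prefixes the node may still advertise to its peers."""
    return [r.prefix for r in rib if NO_ADVERTISE not in r.communities]


# Routes learned from the server-facing neighbors (hypothetical prefixes).
learned = [Route("10.1.0.0/24"), Route("2001:db8:1::/64")]

before = routes_to_advertise(learned)
after = routes_to_advertise([apply_drain_route_map(r) for r in learned])

print("advertised before drain:", before)
print("advertised after drain: ", after)
```

Once the node stops advertising these prefixes, its upstream peers withdraw the paths through it and traffic shifts to the remaining nodes, which is why the physical connectivity can then be removed safely.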
Aviz FTAS ensures BGP Day 2 Operations meet Reconvergence SLA with Zero Traffic Loss
Aviz FTAS certifies BGP Day 2 operations and reconvergence times. This involves measuring BGP reconvergence time, verifying zero traffic loss, identifying any deviations or SLA breaches, and taking corrective action if required.
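One common way to quantify reconvergence is to send test traffic at a constant rate during the maintenance event and derive the outage duration from the number of packets lost. The sketch below assumes that approach; it is not a description of how FTAS measures internally, and the function names, the packet counts, and the 5 ms SLA threshold are all hypothetical.

```python
def convergence_ms(tx_packets: int, rx_packets: int, rate_pps: float) -> float:
    """Estimate outage duration (ms) from packets lost at a constant send rate."""
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    lost = max(tx_packets - rx_packets, 0)
    return lost * 1000.0 / rate_pps


def check_sla(tx_packets: int, rx_packets: int, rate_pps: float, sla_ms: float) -> dict:
    """Summarize loss and compare the estimated convergence time to an SLA."""
    lost = max(tx_packets - rx_packets, 0)
    elapsed = convergence_ms(tx_packets, rx_packets, rate_pps)
    return {
        "lost_packets": lost,
        "convergence_ms": elapsed,
        "zero_loss": lost == 0,
        "within_sla": elapsed <= sla_ms,
    }


# Hypothetical run: 1,000,000 packets sent at 100,000 pps, 250 lost.
result = check_sla(1_000_000, 999_750, 100_000, sla_ms=5.0)
print(result)
```

A graceful drain that works as intended should report zero lost packets, in which case the estimated convergence time is 0 ms and both checks pass.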
How to Perform Node Drain Using AS Path Prepend in BGP
AS Path is a BGP attribute that records the sequence of Autonomous System (AS) numbers a route has traversed on its way to the receiving SONiC BGP router. When a BGP router advertises a route to an external BGP peer, it prepends its own AS number to the existing AS Path. AS Path prepending is the deliberate insertion of extra AS numbers to make a path look longer. When a router sees its own AS number in the AS Path of a received route, it discards that route to prevent loops. If a destination has two paths that are otherwise equally preferred, the path with the shorter AS Path is chosen.
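The two AS Path rules above, loop rejection and shortest-path preference, can be sketched in a few lines of Python. This is a simplified illustration: real BGP best-path selection evaluates several other attributes before and after AS Path length, and the neighbor names and AS numbers here are hypothetical.

```python
def has_loop(as_path: list, local_as: int) -> bool:
    """A route whose AS Path already contains our own ASN is rejected."""
    return local_as in as_path


def pick_shortest(paths: dict, local_as: int):
    """Return the neighbor offering the shortest loop-free AS Path, or None.

    Ties go to the first neighbor listed; real BGP applies further
    tie-breakers that are omitted from this sketch.
    """
    candidates = {n: p for n, p in paths.items() if not has_loop(p, local_as)}
    if not candidates:
        return None
    return min(candidates, key=lambda n: len(candidates[n]))


LOCAL_AS = 65000
paths = {
    "spine1": [65101, 65201],
    "spine2": [65102, 65000, 65201],  # contains our own ASN: discarded
    "spine3": [65103, 65104, 65201],
}

best = pick_shortest(paths, LOCAL_AS)
print("best path via:", best)
```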
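“set as-path prepend last-as <no. of times to insert>” prepends the last AS number (the leftmost ASN already in the AS Path) the given number of times. Prepending last-as 10 times makes every route learned through the drained node look 10 ASNs longer, which influences routers to choose another available path.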
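As a rough illustration of why prepending steers traffic away, the Python sketch below models the last-as behavior and compares path lengths before and after. The function name and AS numbers are hypothetical, and the 1 to 10 bound on the count is an assumption about the accepted range rather than something stated above.

```python
def prepend_last_as(as_path: list, count: int) -> list:
    """Prepend the leftmost ASN of as_path to itself `count` times."""
    if not 1 <= count <= 10:  # assumed valid range for the count
        raise ValueError("count must be between 1 and 10")
    if not as_path:
        return []  # nothing to repeat on an empty path
    return [as_path[0]] * count + as_path


via_drained = [65101, 65201]         # path through the node being drained
alternate = [65102, 65103, 65201]    # longer path through another node

# Before prepending, the drained node offers the shorter path.
print(len(via_drained) < len(alternate))

prepended = prepend_last_as(via_drained, 3)
print(prepended)

# After prepending, the alternate path is shorter and wins on AS Path length.
print(len(alternate) < len(prepended))
```

Using the maximum count, as in the configuration in this section, leaves a wide margin so that the drained path loses the AS Path comparison even against alternates that are several hops longer.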
Example FTAS Configuration:
As part of the Fabric Test Automation Suite (FTAS) by Aviz Networks, we rigorously test configurations such as the one below to ensure robust functionality, combining the reliability of SONiC’s CLI with advanced testing methodologies:
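! Prepend the last (leftmost) ASN 10 times to every matched route
route-map as-prepend permit 10
 set as-path prepend last-as 10
exit
! Apply the route-map inbound on the v4server and v6server neighbors
router bgp <ASN>
 address-family ipv4 unicast
  neighbor v4server route-map as-prepend in
 exit-address-family
 address-family ipv6 unicast
  neighbor v6server route-map as-prepend in
 exit-address-family
end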
This sample configuration, involving route-map manipulation for AS-path prepending in BGP, is meticulously tested to guarantee the suite’s effectiveness in maintaining consistency and robustness in network operations.
How to Monitor BGP Sessions with Aviz ONES App
Periodic monitoring of BGP sessions between routers is critical to ensure that the sessions are established and maintained properly. This involves checking the status of BGP neighbors, monitoring BGP messages, and verifying that the expected routes are being exchanged.
Analyzing the BGP route advertisements received from neighboring routers helps to proactively identify anomalies, such as unexpected route flapping or neighbor resets. With the Aviz ONES App, monitoring the BGP routing table can help detect issues so that the necessary adjustments can be made to the routing policies.
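The kind of periodic check described here can be sketched as a comparison of two snapshots of BGP neighbor state. The example below is a generic Python illustration, not the ONES App API or any SONiC output format: the snapshot layout and the field names (state, resets, prefixes_received) are hypothetical.

```python
def find_session_issues(previous: dict, current: dict) -> list:
    """Compare two neighbor snapshots and describe anything suspicious."""
    issues = []
    for peer, now in current.items():
        if now["state"] != "Established":
            issues.append(f"{peer}: session is {now['state']}")
        before = previous.get(peer)
        if before is None:
            continue  # new peer, nothing to compare against
        if now["resets"] > before["resets"]:
            delta = now["resets"] - before["resets"]
            issues.append(f"{peer}: session reset {delta} time(s)")
        if now["prefixes_received"] < before["prefixes_received"]:
            issues.append(
                f"{peer}: received prefixes dropped from "
                f"{before['prefixes_received']} to {now['prefixes_received']}"
            )
    # Peers that were present earlier but are missing from the new snapshot.
    for peer in sorted(previous.keys() - current.keys()):
        issues.append(f"{peer}: missing from latest snapshot")
    return issues


previous = {
    "10.0.0.1": {"state": "Established", "resets": 0, "prefixes_received": 12},
    "10.0.0.2": {"state": "Established", "resets": 1, "prefixes_received": 12},
}
current = {
    "10.0.0.1": {"state": "Established", "resets": 0, "prefixes_received": 12},
    "10.0.0.2": {"state": "Active", "resets": 2, "prefixes_received": 0},
}

issues = find_session_issues(previous, current)
for line in issues:
    print(line)
```

Running a check like this on a fixed interval turns neighbor resets and sudden drops in received prefixes into explicit alerts rather than something discovered after traffic is affected.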
Empowering Modern Network Operations: Aviz Networks’ SONiC BGP Expertise
In conclusion, Aviz Networks has proven that SONiC's BGP capabilities can effectively manage network Day 2 operations. We have validated SONiC’s ability to handle complex BGP network topologies, BGP route-maps, AS Path manipulation, and policy-based routing. SONiC is a reliable choice for managing and operating large-scale IP Clos BGP networks. Through its widespread adoption by hyperscalers and Fortune 500 companies, it has demonstrated that it can meet the evolving demands of modern network operations: flexible routing, reliability, efficient connectivity, and scalability.
If you have inquiries regarding SONiC BGP or any other features, don’t hesitate to contact us. Our team is eager to engage with you at your convenience.