

ONE Data Lake & AWS S3 – Enhancing Data Management and Analytics – Part 2

In February, we introduced the ONE Data Lake as part of our ONES 2.1 release, highlighting its integration capabilities with Splunk and AWS. In this blog post, we’ll delve into how the Data Lake integrates specifically with AWS S3 buckets.

A data lake functions as a centralized repository designed to store vast amounts of structured, semi-structured, and unstructured data on a large scale. These repositories are typically constructed using scalable, distributed, cloud-based storage systems such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. A key advantage of a data lake is its ability to manage large volumes of data from various sources, providing a unified storage solution that facilitates data exploration, analytics, and informed decision-making.

Aviz ONE-Data Lake acts as a platform that enables the migration of on-premises network data to cloud storage. It includes metrics that capture operational data across the network’s control plane, data plane, system, platform, and traffic. As an enhanced version of the Aviz Open Networking Enterprise Suite (ONES), ONE-Data Lake stores the metrics previously used in ONES in the cloud.

Why AWS S3?

Amazon S3 (Simple Storage Service) is often used as a core component of a data lake architecture, where it stores structured, semi-structured, and unstructured data. This enables comprehensive data analytics and exploration across diverse data sources. S3 is widely used for several reasons:
Broad integration: S3 integrates seamlessly with a wide range of AWS services and third-party tools, supporting efficient data ingestion, real-time analytics, advanced data processing, and machine learning model training and deployment within a single ecosystem.
Durability and availability: S3 is designed for 99.999999999% (eleven nines) of object durability, achieved by replicating data across multiple geographically dispersed facilities, protecting against data loss and keeping data highly available.
Security and compliance: S3 encrypts data both at rest and in transit and offers granular access management through AWS Identity and Access Management (IAM), bucket policies, and access control lists (ACLs). Combined with compliance support for standards such as GDPR, HIPAA, and SOC, this makes S3 a reliable choice for highly regulated environments.
Scalability: S3 can store virtually unlimited amounts of data, so organizations can grow a data lake without upfront infrastructure investment and centralize diverse data types for analytics, machine learning, and other data-driven initiatives.
Cost optimization: S3 provides flexible pricing and a variety of storage classes, including S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, and S3 Glacier, so costs can be tuned to data access patterns.
Data management: Versioning, lifecycle policies, and replication streamline data governance and archival, supporting use cases from regulatory compliance to disaster recovery as well as applications such as predictive analytics, business intelligence, and real-time reporting.
Disaster recovery: Cross-region replication keeps redundant copies of data in separate regions, while lifecycle policies automate transitions between storage tiers and deletion of outdated data, making S3 a dependable backup target even in unforeseen circumstances. A minimal sketch of enabling versioning and a lifecycle rule follows this list.
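As a minimal illustration of the data-management and disaster-recovery features above, the sketch below uses Boto3 to enable versioning and a lifecycle rule on a hypothetical bucket receiving ONES metrics. The bucket name, prefix, and retention periods are assumptions for illustration, not values prescribed by ONES.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-ones-metrics"  # hypothetical bucket receiving ONES metrics

# Keep a history of overwritten objects.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Age raw metrics into cheaper tiers and expire them after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-raw-metrics",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```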

Integrating S3 with ONES:

To integrate the S3 service with ONES, create a cloud instance in ONES and supply the S3 endpoint details on the cloud configuration page, as shown in the figures below. By accurately providing these details, you can configure and integrate the S3 service with ONES, enabling smooth metric collection and analysis.
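Before configuring the integration, the destination bucket must already exist and be writable with the credentials supplied to ONES. Below is a minimal Boto3 sketch of preparing such a bucket; the bucket name and region are placeholders, and this step is performed on the AWS side, not in ONES itself.

```python
import boto3

region = "us-west-2"             # hypothetical region
bucket = "example-ones-metrics"  # hypothetical, globally unique bucket name

s3 = boto3.client("s3", region_name=region)

# Create the destination bucket and confirm the credentials can write to it.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": region},
)
s3.put_object(Bucket=bucket, Key="ones-connectivity-check.txt", Body=b"ok")
```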
Figure 1: Cloud Instance configuration page in ONES
Figure 2: Instance created and ready for data streaming
The cloud instance created within ONES offers several management options to enhance user experience and sustainability. Users can update the integration settings, pause and resume metric uploads to the cloud, and delete the created integration when needed. These features make it easy for users to maintain and manage their cloud endpoint integrations effectively.
Figure 3: Updating the integration details
Figure 4: Option to pause and resume the metric streaming to cloud
Figure 5: Option to delete the integration created
The end user has the flexibility to select which metrics from their ONES-monitored network should be uploaded to the designated cloud service. The ONES 2.1 release supports various metric categories, including Traffic Statistics, ASIC Capacity, Device Health, and Inventory. Administrators can select or deselect metrics from the available list within these categories according to their preferences.
Figure 6: Multiple options available for metric upload to the cloud
The metric upload is not limited to any particular hardware or network operating system (NOS). ONE-Data Lake’s data collection capability extends across various network operating systems, including Cisco NX-OS, Arista EOS, SONiC, and non-SONiC platforms. Data streaming occurs via gNMI on SONiC-supported devices and via SNMP on operating systems from other vendors.
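ONES performs this collection internally, but for illustration only, the snippet below shows roughly what a one-shot SNMP poll of a non-SONiC device looks like using the classic synchronous high-level API of the pysnmp library (API details vary across pysnmp versions); the device address and community string are hypothetical.

```python
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

# One-shot SNMP v2c poll of a system OID (hypothetical address and community).
error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),
        UdpTransportTarget(("192.0.2.10", 161)),
        ContextData(),
        ObjectType(ObjectIdentity("SNMPv2-MIB", "sysDescr", 0)),
    )
)

if error_indication:
    print(error_indication)
else:
    for name, value in var_binds:
        print(f"{name.prettyPrint()} = {value.prettyPrint()}")
```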
Figure 7: ONES inventory showing multiple vendor devices streaming

S3 Analytical Capabilities:

Analyzing data stored in an S3 bucket can be accomplished through various methods, each leveraging different AWS services and tools. Here are some key methods:

Amazon Athena:

Description: A serverless interactive query service that allows you to run SQL queries directly against data stored in S3.

Use Case: Ad-hoc querying, data exploration, and reporting.

Example: Querying log files, CSVs, JSON, or Parquet files stored in S3 without setting up a database.
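As a sketch of what an ad-hoc Athena query over the exported metrics might look like when submitted through Boto3: the database, table, column names, and bucket paths below are hypothetical and would first need to be defined as an external table in the Glue Data Catalog.

```python
import boto3

athena = boto3.client("athena", region_name="us-west-2")

# Hypothetical database/table over the exported metrics; query results are
# written to a separate S3 prefix.
response = athena.start_query_execution(
    QueryString=(
        "SELECT device, AVG(cpu_utilization) AS avg_cpu "
        "FROM device_health GROUP BY device"
    ),
    QueryExecutionContext={"Database": "ones_datalake"},
    ResultConfiguration={"OutputLocation": "s3://example-ones-metrics/athena-results/"},
)
print(response["QueryExecutionId"])
```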

Figure 8: Data stored in the S3 bucket in JSON format

AWS Glue:

Description: A managed ETL (Extract, Transform, Load) service that helps prepare and transform data for analytics.

Use Case: Data preparation, cleaning, and transformation.

Example: Cleaning raw data stored in S3 and transforming it into a more structured format for analysis.
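A Glue ETL job that converts the raw JSON into a curated format is authored separately (for example, as a PySpark script in the Glue console); once it exists, it can be triggered programmatically. The sketch below assumes a hypothetical job name and S3 prefixes.

```python
import boto3

glue = boto3.client("glue", region_name="us-west-2")

# Hypothetical Glue ETL job that reads raw JSON from S3 and writes Parquet back.
run = glue.start_job_run(
    JobName="ones-raw-json-to-parquet",
    Arguments={
        "--raw_path": "s3://example-ones-metrics/raw/",
        "--curated_path": "s3://example-ones-metrics/curated/",
    },
)
print(run["JobRunId"])
```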

Figure 9: Pie chart representing the data from different NOS vendors stored in S3

Amazon SageMaker:

Description: A fully managed service for building, training, and deploying machine learning models.

Use Case: Machine learning and predictive analytics.

Example: Training machine learning models using large datasets stored in S3 and deploying them for inference.
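For illustration, the sketch below trains the built-in XGBoost algorithm on a CSV previously exported to S3 using the SageMaker Python SDK; the execution role, bucket paths, and hyperparameters are placeholders, not values tied to ONES.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Built-in XGBoost container; training data is a CSV exported to S3.
image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://example-ones-metrics/models/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)
estimator.fit(
    {"train": TrainingInput("s3://example-ones-metrics/curated/train.csv",
                            content_type="text/csv")}
)
```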

Third-Party Tools:

Description: Numerous third-party tools integrate with S3 to provide additional analytical capabilities.

Use Case: Specialized data analysis, data science, and machine learning.

Example: Using tools like Databricks, Snowflake, or Domo to analyze and visualize data stored in S3.
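As one such example, a Spark-based platform like Databricks can read the JSON metric objects straight from the bucket once S3 credentials are configured for the cluster; the path and the "device" field below are hypothetical.

```python
from pyspark.sql import SparkSession

# In Databricks (or any Spark cluster with S3 access configured), JSON metric
# objects can be read directly from the bucket (hypothetical path).
spark = SparkSession.builder.appName("ones-s3-analysis").getOrCreate()

traffic = spark.read.json("s3a://example-ones-metrics/raw/traffic/")
traffic.groupBy("device").count().show()
```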

Custom Applications:

Description: Developing custom applications or scripts that use AWS SDKs to interact with S3.

Use Case: Tailored data processing and analysis.

Example: Writing Python scripts using the Boto3 library to process data in S3 and generate reports.
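A minimal sketch of such a script, assuming hypothetical bucket and prefix names and JSON objects that each contain an array of records with a "device" field:

```python
import json
from collections import Counter

import boto3

s3 = boto3.client("s3")
bucket = "example-ones-metrics"   # hypothetical bucket
prefix = "raw/device-health/"     # hypothetical prefix

# Walk the metric objects and tally records per device.
counts = Counter()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        for record in json.loads(body):  # assumes each object is a JSON array
            counts[record.get("device", "unknown")] += 1

print(counts.most_common(10))
```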

Conclusion:

Aviz ONE-Data Lake serves as the cloud-native iteration of ONES, facilitating the storage of network data in cloud repositories. It operates agnostically across cloud platforms and supports data streaming from major network device manufacturers such as Dell, Mellanox, Arista, and Cisco. Network administrators retain the flexibility to define which metrics are transferred to the cloud endpoint, ensuring customized control over the data storage process.

Unlock the ONE-Data Lake experience: schedule a demo on your preferred date, and let us show you how it’s done!
