Introduction to Data Lakes in Microsoft Azure

14 May 2021

By: Zoluxiones

What is a Data Lake?

A data lake is a centralized storage repository that holds large volumes of data from multiple sources in raw, granular form. Its main benefit is that it brings those sources together in a single repository, where the data can be turned into information for decision-making. Once content is in the data lake, it can be normalized and enriched, for example through metadata extraction, format conversion, augmentation, entity extraction, cross-linking, aggregation, denormalization, or indexing.
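
To make normalization and enrichment concrete, here is a minimal sketch in plain Python. The sample CSV, the field names, and the `normalize_and_enrich` helper are all assumptions made for illustration; a real data lake would run this kind of step with a processing engine over files in storage rather than an in-memory string.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical raw CSV export, stored exactly as the source system produced it.
RAW_CSV = "order_id,customer,amount\n1001, Ana ,25.50\n1002,LUIS,40\n"


def normalize_and_enrich(raw_text, source_name):
    """Convert raw CSV rows to typed records and attach lineage metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        records.append({
            # Normalization: cast types, trim whitespace, unify casing.
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
            # Enrichment: metadata describing where the row came from.
            "_source": source_name,
            "_ingested_at": ingested_at,
        })
    return records


records = normalize_and_enrich(RAW_CSV, "sales_export")
# Format conversion: CSV in, JSON Lines out.
json_lines = "\n".join(json.dumps(r) for r in records)
print(json_lines)
```

The raw text is left untouched and the cleaned records are produced as a separate output, which mirrors the idea of keeping the original data in the lake and deriving refined versions from it.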

What is a Microsoft Azure Data Lake?

It is a repository provisioned in the Azure ecosystem that stores information from multiple sources in a simple, unprocessed format. That raw data then serves as the input to a transformation, and the transformed output is loaded into a data lake or data warehouse (Logical Data Warehouse), where it is analyzed and mined.

What is Azure Data Factory?

Azure Data Factory is a cloud-based ETL and data integration service for creating data-driven workflows that orchestrate data movement and transform data at scale. With Azure Data Factory, you can create and schedule these workflows (called pipelines) to ingest data from different data stores. In essence, it handles data intake into the Azure ecosystem. When extracting from an on-premises server, it may be necessary to first install and configure a self-hosted Integration Runtime, which acts as a secure bridge for the extraction.
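
As a rough illustration of what a pipeline looks like, the sketch below describes one as a Python dictionary loosely modeled on the JSON layout Data Factory uses for pipeline definitions. The pipeline and activity names are hypothetical, and `execution_order` is an illustrative helper rather than part of any Azure SDK. In the real service, Data Factory itself resolves these dependencies and runs the activities.

```python
# Hypothetical pipeline definition, loosely following the JSON layout that
# Data Factory uses for pipelines. All names here are made up for illustration.
pipeline = {
    "name": "IngestSalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "TransformSales",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    {"activity": "CopySalesToLake",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "CopySalesToLake",
                "type": "Copy",
                "dependsOn": [],
            },
        ]
    },
}


def execution_order(pipeline_def):
    """Return activity names so each one follows the activities it depends on."""
    pending = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in pipeline_def["properties"]["activities"]
    }
    ordered = []
    while pending:
        # An activity is ready once everything it depends on has been ordered.
        ready = sorted(n for n, deps in pending.items() if deps <= set(ordered))
        if not ready:
            raise ValueError("circular or unknown dependency between activities")
        for name in ready:
            ordered.append(name)
            del pending[name]
    return ordered


print(execution_order(pipeline))
```

Even though the transform activity is listed first, the copy activity is ordered ahead of it because the transform depends on the copy succeeding. That dependency-driven ordering is the sense in which a pipeline orchestrates data movement.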

What is Azure Blob Storage?

Azure Blob Storage is a massively scalable object storage service for any kind of unstructured data (images, videos, audio, documents, etc.), offered in a simple and cost-effective way.
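
One way a raw file might land in Blob Storage is sketched below with the `azure-storage-blob` Python package. The date-partitioned naming scheme, the `raw_blob_name` and `upload_raw_file` helpers, and the sample values are assumptions for illustration, not a convention prescribed by Azure. The upload function needs a real storage account connection string to run.

```python
from datetime import date


def raw_blob_name(source, ingested_on, filename):
    """Build a date-partitioned blob name such as raw/sales/2021/05/14/orders.csv."""
    return f"raw/{source}/{ingested_on:%Y/%m/%d}/{filename}"


def upload_raw_file(connection_string, container, local_path, blob_name):
    """Upload one local file unchanged; requires `pip install azure-storage-blob`."""
    # Imported lazily so the naming helper works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container=container, blob=blob_name)
    with open(local_path, "rb") as data:
        # overwrite=True replaces an existing blob with the same name.
        blob.upload_blob(data, overwrite=True)


print(raw_blob_name("sales", date(2021, 5, 14), "orders.csv"))
```

Blob names are flat strings, so the slashes only simulate folders, but a consistent prefix by source and date makes it easier to find and process raw files later.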

By: Anthony Campodónico
