
Microsoft Fabric Dataflows Gen2 is a self-service, cloud-based data preparation technology that lets you transform and shape your data with Power Query. In this blog post, we’ll explore what Dataflows Gen2 is, how to create one, and walk through a practical example.
What is Microsoft Fabric Dataflows Gen2?
Dataflows Gen2 is part of Data Factory in Microsoft Fabric and is designed to streamline data preparation. It lets you extract data from a wide range of sources, transform it, and load it (ETL) into a structured destination, such as a Fabric lakehouse or warehouse, for downstream analytics, reporting, or machine learning. Key features include:
- Self-Service: Business users and data engineers can build data preparation logic without extensive coding knowledge.
- Cloud-Based: Dataflows run in the Fabric service and integrate with other Fabric items, such as lakehouses, warehouses, and data pipelines.
- Data Transformation: Shape data in the visual Power Query editor.
- Data Profiling: Inspect data quality and value distributions with built-in tools such as column quality, column distribution, and column profile.
- Data Orchestration: Run dataflows as activities inside Fabric data pipelines.
How to Create a Dataflow Solution
Let’s walk through the steps to create your first Dataflow Gen2 solution:
Prerequisites
Before you start, ensure you have the following:
- Microsoft Fabric Tenant Account: Sign up for a Microsoft Fabric account if you haven’t already.
- Microsoft Fabric-Enabled Workspace: Create a workspace that is assigned to a Fabric capacity (a trial capacity works).
Creating a Dataflow
- Navigate to Your Workspace: Open your Microsoft Fabric workspace.
- Create a New Dataflow Gen2:
  - Click “New” and select “Dataflow Gen2.”
  - This opens the dataflow editor.
Getting Data
In our example, we’ll retrieve data from an OData service. Follow these steps:
- In the dataflow editor, click “Get data” and select “More.”
- Choose “Other > OData” as the data source.
- Enter the URL of the OData service (for example, the public Northwind sample service).
- Select the relevant tables (e.g., “Orders” and “Customers”), then select “Create.”
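To see what the OData connector retrieves, the same feed can be read with a few lines of standard-library Python. This is a minimal sketch, not how Dataflows Gen2 works internally: the base URL is an assumption (the public Northwind OData v4 sample service), and the helper names `parse_odata_page`, `http_get`, and `fetch_entity_set` are hypothetical. It reads the `value` array of each OData v4 JSON page and follows server-driven paging through the `@odata.nextLink` property.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumption: the public Northwind OData v4 sample service.
NORTHWIND_URL = "https://services.odata.org/V4/Northwind/Northwind.svc"

Row = Dict[str, Any]


def parse_odata_page(payload: str) -> Tuple[List[Row], Optional[str]]:
    """Return the rows in one OData v4 JSON page and its next-page link, if any."""
    doc = json.loads(payload)
    return doc.get("value", []), doc.get("@odata.nextLink")


def http_get(url: str) -> str:
    """Fetch a URL, asking the service for a JSON response."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def fetch_entity_set(
    name: str,
    base_url: str = NORTHWIND_URL,
    get: Callable[[str], str] = http_get,
) -> Iterator[Row]:
    """Yield every row of an entity set (e.g. "Orders"), following paging links."""
    url: Optional[str] = f"{base_url}/{name}"
    while url:
        rows, next_link = parse_odata_page(get(url))
        yield from rows
        # Simplification: a relative nextLink is resolved against the request URL.
        url = urljoin(url, next_link) if next_link else None
```

With network access, `list(fetch_entity_set("Orders"))` would collect every order row; the `get` parameter exists so the paging logic can be exercised without a live service.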
Applying Transformations
Now that you’ve loaded data into your dataflow, let’s apply transformations:
- Enable Data Profiling Tools:
  - Navigate to Home > Options > Global Options.
  - Make sure the data profiling tools are turned on.
- Use the Power Query Editor:
  - Within the “Orders” table, calculate the total number of orders per customer:
    - Select the “CustomerID” column, then choose “Group By” on the Transform tab.
    - Choose “Count rows” as the aggregation.
  - Combine data from the “Customers” table with the count of orders per customer:
    - Select the “Customers” query, then use the “Merge queries” transformation on the Home tab.
    - Configure the merge by selecting “CustomerID” as the matching column in both tables.
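The two transformations can be sketched in plain Python to show what they compute. This is an illustration under stated assumptions, not the M code the editor generates: Group By with a “Count rows” aggregation becomes a counter keyed on `CustomerID`, and the merge is modeled as a left outer join, the default join kind in the Merge dialog. In the editor, the merge adds a nested table column that you then expand; the sketch flattens that into a single `Count` field. The function names and sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def count_orders_per_customer(orders: List[Row]) -> List[Row]:
    """Group By CustomerID with a "Count rows" aggregation: one row per customer."""
    counts = Counter(order["CustomerID"] for order in orders)
    return [{"CustomerID": cid, "Count": n} for cid, n in counts.items()]


def merge_left_outer(customers: List[Row], order_counts: List[Row]) -> List[Row]:
    """Left outer join on CustomerID.

    Every customer row is kept; customers without orders get Count = None,
    standing in for null in Power Query.
    """
    count_by_id: Dict[Any, Optional[int]] = {r["CustomerID"]: r["Count"] for r in order_counts}
    return [{**c, "Count": count_by_id.get(c["CustomerID"])} for c in customers]


# Hypothetical sample rows.
orders = [
    {"OrderID": 1, "CustomerID": "A"},
    {"OrderID": 2, "CustomerID": "A"},
    {"OrderID": 3, "CustomerID": "B"},
]
customers = [{"CustomerID": "A"}, {"CustomerID": "B"}, {"CustomerID": "C"}]

print(merge_left_outer(customers, count_orders_per_customer(orders)))
# [{'CustomerID': 'A', 'Count': 2}, {'CustomerID': 'B', 'Count': 1}, {'CustomerID': 'C', 'Count': None}]
```

Customer “C” has no orders, so the left outer join keeps it with an empty count; an inner join would drop it instead.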
Publishing Your Dataflow
- Once transformations are complete, publish your dataflow.
- You can now incorporate this dataflow into your data pipelines.
Example: Including Dataflow in a Pipeline
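Suppose you orchestrate your work with a data pipeline in Microsoft Fabric (Data Factory in Fabric). To include your dataflow:
- Create a Pipeline:
  - In your Fabric workspace, create a new data pipeline.
- Add a Dataflow Activity:
  - Within the pipeline, add a Dataflow activity.
  - In the activity settings, select the workspace and the Dataflow Gen2 you created earlier.
  - Configure any additional settings, such as dependencies on other activities, and add a schedule to the pipeline if it should run on a recurring basis.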
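Activity dependencies are easier to reason about with a small model. The following is a toy Python sketch, not Fabric’s API or execution engine, and it covers only an “on success” dependency: a downstream activity runs only if every activity it depends on succeeded, and is skipped otherwise. Fabric pipelines offer other dependency conditions as well, which this sketch leaves out. The `run_pipeline` helper and the activity names are hypothetical.

```python
from typing import Callable, Dict, List


def run_pipeline(
    activities: Dict[str, Callable[[], bool]],
    on_success_of: Dict[str, List[str]],
) -> Dict[str, str]:
    """Run activities in insertion order (assumed to be a valid dependency order).

    Each callable returns True on success. An activity runs only if all of its
    upstream activities succeeded; otherwise it is marked "Skipped".
    """
    status: Dict[str, str] = {}
    for name, action in activities.items():
        upstream = on_success_of.get(name, [])
        if all(status.get(up) == "Succeeded" for up in upstream):
            status[name] = "Succeeded" if action() else "Failed"
        else:
            status[name] = "Skipped"
    return status
```

For example, `run_pipeline({"RefreshDataflow": lambda: False, "Notify": lambda: True}, {"Notify": ["RefreshDataflow"]})` marks the refresh as failed and skips the notification step.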
Microsoft Fabric Dataflows Gen2 simplifies data preparation and makes it accessible to a broader audience. Whether you’re transforming data for reporting, analytics, or machine learning, Dataflows Gen2 pairs the visual Power Query experience with Fabric destinations and pipeline orchestration.