Microsoft Fabric Dataflows Gen2: A Comprehensive Guide

Microsoft Fabric Dataflows Gen2 is a powerful self-service, cloud-based data preparation technology that allows you to efficiently transform and shape your data. In this blog post, we’ll explore what Dataflows Gen2 is, how to create and use it, and provide practical examples with code snippets.

What is Microsoft Fabric Dataflows Gen2?

Dataflows Gen2 is an integral part of Microsoft Fabric, designed to streamline data preparation tasks. It enables users to extract, transform, and load (ETL) data from various sources into a structured format suitable for downstream analytics, reporting, or machine learning. Here are some key features:

  • Self-Service: Dataflows Gen2 empowers business users and data engineers to create data pipelines without extensive coding knowledge.
  • Cloud-Based: Hosted in the cloud, it seamlessly integrates with other Azure services.
  • Data Transformation: Perform data transformations using a visual interface.
  • Data Profiling: Understand your data better with built-in profiling tools.
  • Data Orchestration: Easily incorporate dataflows into your data pipelines.

How to Create a Dataflow Solution

Let’s walk through the steps to create your first Dataflow Gen2 solution:

Prerequisites

Before you start, ensure you have the following:

  1. Microsoft Fabric Tenant Account: Sign up for a Microsoft Fabric account if you haven’t already.
  2. Microsoft Fabric-Enabled Workspace: Create a workspace within your Microsoft Fabric environment.

Creating a Dataflow

  1. Navigate to Your Workspace: Access your Microsoft Fabric workspace.
  2. Create a New Dataflow Gen2:
    • Click “New” and select “Dataflow Gen2.”
    • This opens the dataflow editor.

Getting Data

In our example, we’ll retrieve data from an OData service. Follow these steps:

  1. In the dataflow editor, click “Get data” and select “More.”
  2. Choose “Other > OData” as the data source.
  3. Enter the URL for the OData service (e.g., the Northwind sample service).
  4. Select the relevant tables (e.g., “Orders” and “Customers”) and create your dataflow.
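Under the hood, the OData connector issues ordinary HTTP requests. The sketch below shows, in Python, roughly what querying the Orders table of the public Northwind sample service looks like; the service URL and field names come from the public sample, while the helper function and the trimmed response payload are ours for illustration:

```python
import json
from urllib.parse import urlencode

# Public Northwind sample OData service (read-only demo data).
NORTHWIND = "https://services.odata.org/V4/Northwind/Northwind.svc"

def build_odata_url(base: str, entity: str, **options: str) -> str:
    """Build an OData query URL, e.g. .../Orders?$select=...&$top=..."""
    query = urlencode({f"${key}": value for key, value in options.items()})
    return f"{base}/{entity}?{query}" if query else f"{base}/{entity}"

url = build_odata_url(NORTHWIND, "Orders", select="OrderID,CustomerID", top="5")

# An OData v4 response wraps rows in a "value" array; a trimmed sample:
sample_response = json.loads(
    '{"value": [{"OrderID": 10248, "CustomerID": "VINET"},'
    ' {"OrderID": 10249, "CustomerID": "TOMSP"}]}'
)
orders = sample_response["value"]
```

Fetching `url` with any HTTP client would return a payload shaped like `sample_response`; the dataflow editor handles this plumbing for you.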

Applying Transformations

Now that you’ve loaded data into your dataflow, let’s apply transformations:

  1. Enable Data Profiling Tools:
    • Navigate to Home > Options > Global Options.
    • Ensure the Data Profiling tools are enabled.
  2. Use Power Query Editor:
    • Within the “Orders” table, calculate the total number of orders per customer:
      • Select the “CustomerID” column and choose “Group By” under the Transform tab.
      • Perform a count of rows as the aggregation.
    • Combine data from the “Customers” table with the count of orders per customer:
      • Use the “Merge queries” transformation.
      • Configure the merge operation by matching the “CustomerID” column.

Publishing Your Dataflow

  1. Once transformations are complete, publish your dataflow.
  2. You can now incorporate this dataflow into your data pipelines.

Example: Including Dataflow in a Pipeline

Suppose you have a data pipeline built with Fabric's Data Factory experience. To include your dataflow:

  1. Create a Pipeline:
    • Define a new data pipeline in your Fabric workspace.
  2. Add a Dataflow Activity:
    • Within the pipeline, add a Dataflow activity.
    • Select your previously created Dataflow Gen2.
    • Configure any additional settings (e.g., scheduling, dependencies).
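Dataflow refreshes can also be triggered programmatically. The sketch below is hypothetical: it assumes the Power BI dataflows refresh REST endpoint (`POST .../groups/{groupId}/dataflows/{dataflowId}/refreshes`) applies to your dataflow, and the workspace and dataflow IDs are placeholders. Only the URL construction runs here; the actual call (commented out) would need a valid Azure AD bearer token and appropriate permissions:

```python
# Hypothetical sketch: assumes the Power BI dataflows refresh REST endpoint;
# the IDs below are placeholders, not real GUIDs.

def build_refresh_url(group_id: str, dataflow_id: str) -> str:
    """Build the URL used to request a refresh of a dataflow."""
    return (
        "https://api.powerbi.com/v1.0/myorg"
        f"/groups/{group_id}/dataflows/{dataflow_id}/refreshes"
    )

url = build_refresh_url("my-workspace-guid", "my-dataflow-guid")

# With a valid bearer token, the refresh request would look like:
# import requests
# requests.post(url, headers={"Authorization": f"Bearer {token}"})
```

Verify the endpoint and required scopes against your tenant's documentation before relying on this in a production pipeline.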

Microsoft Fabric Dataflows Gen2 simplifies data preparation, making it accessible to a broader audience. Whether you’re transforming data for reporting, analytics, or machine learning, Dataflows Gen2 provides a user-friendly experience with powerful capabilities.

About Atul Divekar
Seasoned IT professional with more than a decade of extensive experience in IT service management. An Executive MBA graduate from IIMK and a certified PMP, I excel in infrastructure management, service delivery management, business operations, leadership, and people management. My track record showcases a proficiency in handling challenging engagements and successfully turning them around. I'm passionate about driving operational excellence and leveraging technology to enhance business outcomes. Let's connect to explore opportunities for collaborative success!