Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Power Query Cookbook Use effective and powerful queries in Power BI Desktop and Dataflows to prepare and transform your data (Janicijevic, Andrea) (z-lib.org).pdf
Скачиваний:
143
Добавлен:
14.08.2022
Размер:
25.9 Mб
Скачать

298 Adding Value to Your Data

The example you saw is just one of many that you can perform using custom functions. The idea of this feature is to create something that can be reused with multiple queries and allow you to simplify, accelerate, and easily maintain your data transformation steps.

Clustering values

Data often comes from original data sources in many forms and you can end up having multiple variations of the same value where you need to have a unique value. In this case, you need a way to group and correct these values quickly, without creating complex rules or doing it manually. In this recipe, we will see how to leverage the clustering values feature, which enables you to automatically group data based on similarities thanks to an underlying algorithm.

Getting ready

For this recipe, you need to have access to the Power BI portal, for which a Power BI Pro license is needed. You also need to have access to a workspace.

How to do it…

After you log in to the Power BI portal, perform the following steps:

1. Browse to your workspace, click on New, and click on Dataflow.

Figure 8.52 – Creating a dataflow

Clustering values 299

2.Click on Add new tables in order to connect to a data source and access the Power Query online UI.

Figure 8.53 – Add new tables

3.Select the Web API connector and enter the following URL to connect to a CSV file loaded in the Power Query Cookbook GitHub repository, https://github.com/

PacktPublishing/Power-Query-Cookbook/blob/main/Chapter08/ SalesData.csv, and then click on Next to see the data preview.

Figure 8.54 – Web API connector

300Adding Value to Your Data

4.You will now see a data preview with the File origin, Delimiter, and Data type detection options automatically detected. Click on Transform data to access the Power Query UI.

Figure 8.55 – Data preview

5.Now have a look at the Country column and see how countries' names are not spelled in the same way. Imagine you want to have homogeneous names and to correct capital letters and other spelling issues. In this case, we will use the Cluster values feature to create these clusters. Browse to Add column and click on the

Cluster values button.

Figure 8.56 – Cluster values button

Clustering values 301

Select Country for Column and define a new column called Country_ corrected, which will contain the corrected values. Define a similarity threshold of 0.7 (the default is 0.8), which indicates how two similar values should be to be clustered together (for example, United States and United States of America). Ignore case and Group by combining text parts will be flagged by default. In this case, leave the default options flagged and also flag Show similarity scores, and then click on OK.

Figure 8.57 – Cluster values window

302Adding Value to Your Data

6.After you have clicked OK, you will see two newly added columns, one with the clustered values, Country_corrected, and one with score similarities,

Country_Country_corrected_Similarity. You can observe how values were grouped according to similarities identified by the algorithm.

Figure 8.58 – Clusters column

With the Cluster values feature, you have the possibility to quickly correct values within a column by using a fuzzy matching algorithm built into Power Query online. You can think of different scenarios to correct your data and get the most out of this feature.

9

Performance Tuning with Power BI

Dataflows

We already had the chance to see in detail how Power Query works in its Desktop version, where you can perform data preparation and transformations and save all of it in a Power BI Desktop file with the .pbix extension. But what if you would like to reuse Power Query transformations done by others but cannot retrieve the .pbix file? Or what if you want to store them somewhere accessible to multiple users? You can do that, thanks to the Power Query online version that is integrated with the Power BI Dataflows feature, accessible via the Power BI Portal.

In this chapter, we will see how to create, configure, and consume dataflows by exploring the following recipes:

Using Power BI dataflows

Centralizing ELT with dataflows

Building dataflows with Power BI Premium capabilities

Understanding dataflow best practices