Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo) (z-lib.org).pdf
Скачиваний:
11
Добавлен:
14.08.2022
Размер:
18.87 Mб
Скачать

FIGURE 5-11 The grand total shows an incorrect number if you only set bidirectional filtering on.

The reason the grand total is different for the NumOfBuyingCustomersCorrectCrossFilter measure is that the Sales table is not filtered at the grand total. Consequently, the engine has no filter to propagate to Customer. If, instead, you use the full bidirectional pattern with the Sales table as a filter, then the filter is always applied and shows only the customers who appear somewhere in Sales. Because of this, the CROSSFILTER version performs better when no filter needs to be applied and, from a performance point of view, is the preferred one. The difference between the two calculations becomes evident only if there are multiple versions of the customer in the current selection.

By their very nature, SCDs change slowly. Therefore, multiple versions of the same customer are not generally hit by a given selection. Still, this might happen if the selection is large enough. For example, many years of data are likely to contain several versions of the same customer.

It is very useful to learn how to spot these subtle differences between computing the number of customers and the number of versions. Understanding these small details will aid greatly in your data-modeling career and help you identify when a number or total is incorrect.

Loading slowly changing dimensions

This chapter outlines the use of the Power BI Desktop Query Editor to load an SCD. SCDs might not always be present in your original data model, but there

may be times when you need to introduce them in a specific model that you are working on. For example, in the demo database we are using in this chapter, the original model does not contain an SCD. However, you will need to load an SCD to track the original and historical sales manager, a piece of information that is not present in the original data warehouse.

To convey the challenges in handling SCDs, we must revive an important topic that we introduced in Chapter 1, “Introduction to data modeling”: granularity. The presence of an SCD changes the granularity of both the dimension and the fact table.

Without SCDs, the granularity of the facts in the demo database is only at the customer level. When you introduce an SCD, the granularity increases to each customer version. Different versions of the same customer must be linked with different sales, depending on when the sale occurred.

Changing the granularity involves several actions and details to build the correct model. You will also need to change the query of both the dimension and the fact table so their granularity matches. You cannot update the granularity of one table without updating the granularity of the other one, too. Otherwise, the relationship will not work correctly.

Let us start by analyzing the scenario. The database has a Customer table that is not an SCD. It also has a CountryManagers table that contains the sales manager from each country or region with his or her start and end year. The sales manager for a country or region is not always the same for each year. However, because the sales manager for a country or region does not always change annually, we do not want to overly increase granularity to the level of customer/year because this would create unnecessary duplicates of some customers. In this scenario, our ideal granularity falls somewhere between customer (which is too low to account for changing managers) and customer/year (which is too high to account for the years in which the manager remained the same). This granularity depends on how many times the manager of the customer’s country or region changed.

Let us start by finding the correct granularity. To perform this step, you will first build the worst-case granularity. Then you will determine what the correct granularity is. Figure 5-12 shows the original table, which contains the sales managers for various countries or regions.

FIGURE 5-12 The CountryManagers table contains the columns FromYear and ToYear to indicate when the sales manager for each country or region was on duty.

To find the right granularity, you will change this model to a simpler one that contains the country or region, the sales manager, and the year by replacing the FromYear and ToYear columns with a single column that indicates the year only. By doing so, you will increase the number of rows. Many of the rows will show the same sales manager for several years. (We will cover removing those extra rows in a moment.)

First, add a new column in the table that contains the list of years that are included between FromYear and ToYear, using the List.Numbers function, as shown in Figure 5-13.

FIGURE 5-13 The Year column lists the years between FromYear and ToYear.

Figure 5-13 shows both the column, visible only as List in the user interface, and the column’s content, which you can see in the Query Editor by clicking the cell. You can see that Paul was the manager in the United Kingdom from 2007 to 2010; thus, the list contains the three years 2007, 2008, and 2009.

Now that you have produced the list of years, you can expand the list by generating one row for each element of the list. You can also remove the FromYear and ToYear columns, which are now useless. This obtains the result shown in Figure 5-14.

FIGURE 5-14 In this table, United Kingdom now appears three times with the same manager.

This table now contains the worst-case granularity for the country or region, with one version for each year. Many rows will show the same value for the same country, differing only in the year. However, this table is still useful, because you will use it as a lookup when changing the granularity of the fact table. Because the table contains the historical country or region sales manager, save it under the name Historical Country Managers.

The second table you need is one that contains the actual country or region sales manager. This is somewhat easy to build, if we start from the Historical Country Managers table. You simply need to group the historical country or region sales managers by CountryRegion and Manager, which results in the distinct pairs of CountryRegion and Manager. During the grouping, you use MAX to aggregate the year to obtain the last year the sales manager was on duty for the given country or region. As shown in Figure 5-15, United Kingdom is now represented with a single row.

FIGURE 5-15 After the grouping, the cardinality is now correct.

This table contains the distinct pairs of CountryRegion and Manager, along with the last year when the sales manager was on duty for that country or region. To transform this table into a table that contains only the current sales manager, it is enough to filter out rows that do not contain the value for the current year in the LastYear column. (In this example, the “current year” is 2009, which is the last year for which we have data in the set.) Figure 5-16 shows you the result of this second query, which we named Actual Country Managers.

FIGURE 5-16 Actual Country Managers table contains only the last and current sales manager for each country or region.

At this point, you have the following two tables:

Actual Country Managers This contains the current sales manager for each