- •Contents at a glance
- •Contents
- •Introduction
- •Who this book is for
- •Assumptions about you
- •Organization of this book
- •Conventions
- •About the companion content
- •Acknowledgments
- •Errata and book support
- •We want to hear from you
- •Stay in touch
- •Chapter 1. Introduction to data modeling
- •Working with a single table
- •Introducing the data model
- •Introducing star schemas
- •Understanding the importance of naming objects
- •Conclusions
- •Chapter 2. Using header/detail tables
- •Introducing header/detail
- •Aggregating values from the header
- •Flattening header/detail
- •Conclusions
- •Chapter 3. Using multiple fact tables
- •Using denormalized fact tables
- •Filtering across dimensions
- •Understanding model ambiguity
- •Using orders and invoices
- •Calculating the total invoiced for the customer
- •Calculating the number of invoices that include the given order of the given customer
- •Calculating the amount of the order, if invoiced
- •Conclusions
- •Chapter 4. Working with date and time
- •Creating a date dimension
- •Understanding automatic time dimensions
- •Automatic time grouping in Excel
- •Automatic time grouping in Power BI Desktop
- •Using multiple date dimensions
- •Handling date and time
- •Time-intelligence calculations
- •Handling fiscal calendars
- •Computing with working days
- •Working days in a single country or region
- •Working with multiple countries or regions
- •Handling special periods of the year
- •Using non-overlapping periods
- •Periods relative to today
- •Using overlapping periods
- •Working with weekly calendars
- •Conclusions
- •Chapter 5. Tracking historical attributes
- •Introducing slowly changing dimensions
- •Using slowly changing dimensions
- •Loading slowly changing dimensions
- •Fixing granularity in the dimension
- •Fixing granularity in the fact table
- •Rapidly changing dimensions
- •Choosing the right modeling technique
- •Conclusions
- •Chapter 6. Using snapshots
- •Using data that you cannot aggregate over time
- •Aggregating snapshots
- •Understanding derived snapshots
- •Understanding the transition matrix
- •Conclusions
- •Chapter 7. Analyzing date and time intervals
- •Introduction to temporal data
- •Aggregating with simple intervals
- •Intervals crossing dates
- •Modeling working shifts and time shifting
- •Analyzing active events
- •Mixing different durations
- •Conclusions
- •Chapter 8. Many-to-many relationships
- •Introducing many-to-many relationships
- •Understanding the bidirectional pattern
- •Understanding non-additivity
- •Cascading many-to-many
- •Temporal many-to-many
- •Reallocating factors and percentages
- •Materializing many-to-many
- •Using the fact tables as a bridge
- •Performance considerations
- •Conclusions
- •Chapter 9. Working with different granularity
- •Introduction to granularity
- •Relationships at different granularity
- •Analyzing budget data
- •Using DAX code to move filters
- •Filtering through relationships
- •Hiding values at the wrong granularity
- •Allocating values at a higher granularity
- •Conclusions
- •Chapter 10. Segmentation data models
- •Computing multiple-column relationships
- •Computing static segmentation
- •Using dynamic segmentation
- •Understanding the power of calculated columns: ABC analysis
- •Conclusions
- •Chapter 11. Working with multiple currencies
- •Understanding different scenarios
- •Multiple source currencies, single reporting currency
- •Single source currency, multiple reporting currencies
- •Multiple source currencies, multiple reporting currencies
- •Conclusions
- •Appendix A. Data modeling 101
- •Tables
- •Data types
- •Relationships
- •Filtering and cross-filtering
- •Different types of models
- •Star schema
- •Snowflake schema
- •Models with bridge tables
- •Measures and additivity
- •Additive measures
- •Non-additive measures
- •Semi-additive measures
- •Index
- •Code Snippets
Sales[ProductName],
Sales[ColorName],
Sales[Manufacturer],
Sales[BrandName],
Sales[ProductCategoryName],
Sales[ProductSubcategoryName]
), ALL (
Purchases[ProductName],
Purchases[ColorName],
Purchases[Manufacturer],
Purchases[BrandName],
Purchases[ProductCategoryName],
Purchases[ProductSubcategoryName]
)
)
)
This calculated table performs two ALL operations on product columns from Sales and Purchases, reducing the number of columns and computing the distinct combinations of the required data. Then it uses UNION to merge them together. Finally, it uses DISTINCT to remove further duplicates, which are likely to appear because of the UNION function.
Note
The choice between using M or DAX code is entirely up to you and your personal taste. There are no relevant differences between the two solutions.
Once again, the correct solution to the model is to restore a star schema. This simple concept bears frequent repetition: Star schemas are good, but everything else might be bad. If you are facing a modeling problem, before doing anything else, ask yourself if you can rebuild your model to move toward a star schema. By doing this, you will likely go in the right direction.
Filtering across dimensions
In the previous example, you learned the basics of multiple dimension handling.
There, you had two over-denormalized dimensions and, to make the model a better one, you had to revert to a simpler star schema. In this next example, we analyze a different scenario, again using Sales and Purchases.
You want to analyze the purchases of only the products sold during a given period—or, more generally, the products that satisfy a given selection. You learned in the previous section that if you have two fact tables, the best way to model the scenario is to relate them to dimensions. That would give you the ability to use a single dimension to filter both. Thus, the starting scenario is the one shown in Figure 3-5.
FIGURE 3-5 In this model, two fact tables are related to two dimensions.
Using this model and two basic measures, you can easily build a report like the one shown in Figure 3-6, where you can see both the sales and purchases divided by brand and year.
FIGURE 3-6 In this simple star schema, sales and purchases divided by year and brand are computed easily.
A more difficult calculation is to show the number of purchases for only the products that are being sold. In other words, you want to use Sales as a filter to further refine the products so that any other filter imposed on sales (the date, for example) restricts the list of products for which you are computing purchases.
There are different approaches to handling this scenario. We will show you some of them and discuss the advantages and disadvantages of each solution.
If you have bidirectional filtering available in your tool (at the time of this writing, bidirectional filtering is available in Power BI and SQL Server Analysis Services, but not in Excel), you might be tempted to change the data model that enables bidirectional filtering on Sales versus Product so that you see only the products sold. Unfortunately, to perform this operation, you must disable the relationship between Product and Purchases, as shown in Figure 3-7. Otherwise, you would end up with an ambiguous model, and the engine would refuse to make all the relationships bidirectional.
FIGURE 3-7 To enable bidirectional filtering between the Sales and Product tables, you must disable the relationship between Product and Purchases.
Info
The DAX engine refuses to create any ambiguous model. You will learn more about ambiguous models in the next section.
If you follow the filtering options of this data model, you will quickly discover that it does not solve the problem. If you place a filter on the Date table, for example, the filter will propagate to Sales, then to Product (because bidirectional filtering is enabled), but it will stop there, without having the option of filtering Purchases. If you enable bidirectional filtering on Date, too, then the data model