Tabular Cubes are quickly becoming one of the most popular ways to develop data models to support Business Intelligence. They can be developed quickly; building a full model with a basic set of measures can take as little as half an hour. However, they still offer a great deal of power and flexibility, through their calculation language – DAX.
In order to go beyond simple aggregations like sums and averages, we have to have a strong understanding of DAX, but that understanding is well rewarded. Many of the most interesting and useful DAX design patterns make use of the “X” functions – COUNTX, SUMX, AVERAGEX and several others. In this article, I’ll give an overview of how these functions operate, as well as several useful examples showing off their functionality.
The most important step to understanding these functions is understanding contexts in DAX – what they are, how they interact, and when it is appropriate to create new contexts. There are three types of context in a DAX formula – query context, filter context, and row context. We go over the different contexts and how they are defined in our last blog, Row Context, Nested Functions, and EARLIER(), but here's an overview: The query context comes from the end user tool, for instance a Pivot Table. If a cell in a pivot corresponds to the Country “USA” and the month “December 2014” then the calculations in this context will take place inside a query context defined by those two filters. Unless an override is used, all of the tables in the model will be filtered by this query context when calculating the cell.
Filter context operates similarly to query context, in that it will be applied as a blanket rule to some portion of the calculation, but can be defined within the model, rather than by the end user. For the purposes of this discussion however, the most interesting context is row context. This is the context created by a single row of a table in the model, and is the type of context generated by our X functions.
With contexts in mind, the fundamental operation these X functions do is to create a series of row contexts, one for each row in the given table. Let’s look at the signature for the SUMX function, to get an idea of what this means:
SUMX takes only two arguments – a table, and an expression. The table could either be a named table in the data model enclosed in single quotes (e.g. ‘Sales Order Details’) or any DAX expression which returns a table. For now, we will stick with the first option, but the second will become very important in our more complex patterns later. Note that even if we use a named table – we are not necessarily calculating based on the entire table! If we are calculating a cell in a pivot, then there will be a query context, which will filter the table before it ever gets passed to SUMX. This is key to understanding the power behind the X functions.
Now, we come to the second argument, the expression. This is where the row context comes into play. SUMX will take the table (remember, it’s been filtered by the query context) then for each row in that table, it will create a new row context. Within that row context it will evaluate the expression we give it as a second argument. As a simple example, let’s look at a calculation we could do with a calculated column, and instead use an X function – a weighted average.
Suppose my table is ‘Assignments’ and my columns are Student ID, Assignment ID, Score, and Weight, where score is a number 0-100 for each assignment. We might have homework assignments with a weight of 1, quizzes with a weight of 2, and tests with a weight of 5 (if we were feeling generous towards our students, that is!) Now I want to know how well my class is doing overall. If I simply compute AVERAGE([Score]), homework, quizzes, and tests will all be treated identically – and we’re not feeling that generous! Instead, we compute:
This takes each row in the assignment table, creates a row context – in which [Score] and [Weight] are single values – multiplies them, then averages the result. And we have the weighted average of our student’s scores. Now, we could have simply created a calculated column for [Score]*[Weight] in our data-set, either in the SQL Source, or the tabular model itself, so for this application, we didn’t need to use an X function. The power however, is in the fact that a calculated column is computed at process time, but measures are computed dynamically – in other words, they will respond to changes in the query context, unlike a calculated column.
This brings us to our first advanced pattern. Using the X functions, we can emulate a dynamic calculated column, which is query context aware. Consider the following problem. Suppose we have a data model with financial information on various partners. Then suppose we want to count the number of partners with over $500,000 in revenue – but we want this filter to be query context aware! That is, if our end user filters to France, and 2014-Q2, then we want to count only those partners who had over $500,000 in revenue – in France in 2014-Q2. But we do not know which filters the end user will decide on, so our count must dynamically determine the revenue.
For the sake of the example, suppose we have a [Partner Revenue] measure, which slices by our partners, who are themselves stored in the table ‘Partners’. Now, for each row of partners, we want to calculate the measure, compare it to $500,000, then decide whether or not to include the partner in the result or not based on the result. Clearly, we will need to use an iterated row context – an X function.
For this example, we will use SUMX, with a nested IF statement. If the row meets our criteria, we will count it as a ‘1’, otherwise, a ‘0’, so that the sum will be a count of those partners that meet our criteria. Let’s look at the finished formula, then discuss each part in turn:
Remember, SUMX takes the partners table, then for reach row, it creates a row context. In this row context it evaluates the expression. Let’s look at that expression then – we start with the IF statement we described above. We then use by far the most important function in DAX – CALCULATE. This takes the current row context (created by SUMX) and transforms it into a filter context.
Now, when we calculate [Partner Revenue], there are two contexts being applied. First, there is the query context – e.g. the France, 2014-Q2 from above. This query context will filter any tables we are using to calculate revenue. Second, is the new filter context created by calculate – which is really just the row context! So we calculate, for each partner in the table, whether within the current query context they have $500,000 in revenue or not, then if they do, add a 1, otherwise add 0. Remember, this is why we can’t simply use a calculated column – we need the criteria to be applied within the query context for this application.
Of course, there is no reason we had to filter based on revenue, we could have used any other measure we had defined that sliced by partner, to create a powerful set of dynamic filtering conditions. Similarly, we could make the ‘cutoff’ value itself dynamic, changing based on the current query context. As a final example of how powerful these functions can be with only a line or two of code, I’ll leave you with a measure which uses MAXX to calculate the revenue of the top partner, as a percentage of the total revenue – answering the question “How top heavy is our revenue?” – within whatever filters the user decides to use:
DIVIDE(MAXX('Partner',[Partner Revenue]),[Partner Revenue],BLANK())
By: Reilly Miller
Posted by admin