The following post describes how to quickly extract useful information from a small data set using DAX, Tabular, and Power BI.
To get the most from your data it pays to quickly move though the discovery and basic data analysis phase of the project. This process will help you do just that. The following 7 steps are easy to follow and will help ensure you get the most from each data set.
Consider the following data about a recent trip. With only 7 data points it seems the data will be very limited. However it is very easy to extract over 15 measures from this small data set!
Step 1) Add Information to the Existing Data
Understand what is available and supplemental data in the initial data set.
Supplement the data with simple logic or use a tool like Alteryx to add new information.
Data added to the sample project:
- Country based on city
- Expense Group based on expense amount
- Month based on date
- Month Number for sorting
Step 2) Create Clarity and Consistency in the Data
Hierarchies add clarity instantly by showing the relationships in the data. Tools like Paxata can help with cleaning data quickly and developing hierarchies as you gain deeper understanding.
Clean hierarchy created in the sample project:
- "Expense Hierarchy" based on the type of expense
Step 3) Build the Building Blocks
Adding Sum and Count will be the start of almost every PowerPivot. These can be easily added using the quick function dropdown. These are the building blocks for future calculations. These will be available to be sliced by every other field (dimensional attribute)
Base measures created in the sample project:
- Sum of Expenses per Person
- Count of Expenses
Expenses per Person:=SUM([Cost/PP])
Step 4) Add Distinct Counts
The power of Distinct Count in tabular should not be overlooked. It’s hard to add too many of these because each can answer a different question.
Distinct measures created in the sample project:
- Distinct Count of Days
- Distinct Count of Expenses
- Distinct Count of Cities
- Distinct Count of Lodging
- Distinct Count of Activities
Step 5) Add in Averages
Comparing a set of averages is the next step to enable more effective analysis of the aggregated set.
Average measures created in the sample project:
- Average Expense per Day
- Average Stay Length
- Average Activities per Day
Avg Expenses per Day:=[Expenses per Person]/[Days Traveled]
Avg Stay Length:=[Days Traveled]/[Distinct Lodging Count]
Avg Activities per Day:=[Distinct Activity Count]/[Days Traveled]
Step 6) Use Basic Functions
There are several functions that are so easy to use and provide new insights
Functions used in the sample project:
- Min-- First date in City
- Max-- Last Date in City and Largest Expense
- RankX-- Rank of Expense with dense rank
Expense Name Rank:=RANKX(ALLSELECTED('Expenses'),[Expenses per Person],,,Dense)
Step 7) Create Calculations in Comparison, or in Relation, to Something Else
Adding comparison calculations will help identify outliers that can be acted upon
Comparisons used in the sample project:
- % of Total
% of Total Expenses:=[Expenses per Person]/(CALCULATE([Expenses per Person],ALLSELECTED(Expenses)))
As you compile your metrics and finish this process you might iterate back to an earlier step.
Note how expenses from major travel such as skews the data and needs to be excluded in several places.
I hope these steps give you confidence and the ability to get the most from your data.
By Evan Schmidt
Posted by Brian Pohl