Tableau Prep (2018.1.2)
Note: All the information in this post is based on the Tableau Prep (2018.1.2) that was released on June 12 ,2018. Functionality and options available might change as time progresses. Tableau Prep 2018.1.2 is now available to everyone with the Creator license or current maintenance. You can download it here.
Tableau Prep is a new product from Tableau designed to help the users combine, shape and clean their data for analysis. It’s seamlessly integrated with Tableau Analytical workflow, making it easy to go from data prep to analysis. A lot of our data needs some preparation before starting our analysis and visualization, or maybe we want to combine it with multiple data sources. These tasks can be done in tableau desktop itself, however it can become a bit messy with the changes we need to apply to data. The workbook performance may not be up to the mark because of lot of calculated fields and filters applied on data as part of data manipulation for analysis. Tableau Prep allows us to perform all these tasks in a much simpler way, to easily replicate the workflow in future and to share it with others.
Tableau prep works in a step by step basis where we connect to dataset(s) we want to prepare then use options to filter, rename, join, create and aggregate etc, and then verify the flow, and at the end output the data to Tableau extract or publish it to Tableau Server or Tableau Online. We can easily track, review and edit the steps independently in the work flow.
Tableau Prep Workspace
The Tableau Prep workspace consists of the Connections pane and three coordinated areas that help to interact and explore data.
Connections Pane: On the left side of the workspace is the connections pane which shows the databases and files you are connected to. Add connections to one or more databases and then drag the tables you want to work with into the Flow pane (To the right)
Flow pane: At the top of the workspace is Flow pane which is a visual representation of our operation steps as we prepare our data and aligns from left to right. This workflow explains what operation is performed and in what order and how data is being manipulated. Files can be dragged from connections pane to Flow pane to join, pivot, manipulate, aggregate data, and generate output files (. tde, .tds, and .hyper ) .These output files can be used in Tableau Desktop for further analysis and visualization.
Profile pane : In the center of the workspace is the Profile pane. The Profile pane shows you the structure of your data at any point in the flow. The structure of your data can be represented in different ways depending on the operation you want to perform on your data or the step that you select in the Flow pane. This is where you will do most of your data cleaning.
At the top of the Profile pane is a toolbar that shows you the cleaning operations that you can perform for each step in your flow. A drop-down menu also appears on each card in the Profile pane where you can select the different operations that you can perform on the data.
- Search, sort, and split fields
- Filter, include, or exclude values
- Find and fix null values
- Rename fields
- Clean up data entry errors using group and replace or quick cleaning operations
- Use automatic data parse to change data types
- Rearrange the order of your field columns by dragging and dropping them where you want them
Tableau Prep keeps track of any changes you make, in the order you make them, so you can always go back and review or edit those changes if needed. Use drag and drop to re-order those operations to experiment and apply changes in a different order.
Click the arrow on the upper right of the pane to expand and collapse the Changes pane for more room to work with the data in the Profile pane.
Data grid : At the bottom of the workspace is the Data grid, which shows you the row level detail in your data. The values displayed in the Data grid reflect the operations defined in the Profile pane. You can filter, keep only and exclude individual field values in this grid, and drag and drop fields to change the display order.
How to Use Tableau Prep
Use Case: For a Superstore to analyze product sales and profits over the last four years for the company using Tableau. For this Superstore data has been collected and tracked differently for each region. The data is entered differently for different regions for which lot of data cleaning is required.
Connecting to Data
Let us use the sample Superstore data files located here:
(Windows) C:\ProgramFiles\Tableau\TableauPrep<version>\help\Samples\en_US\Superstore Files
1. Sales data files for the different regions are stored in different formats, and the orders from the South are in multiple files. Since South has multiple files lets collate those files first.
2. On the Connections pane, click the Add connection button. We must create an input step to start a flow. We can include multiple input steps and multiple data files as required.
3. Since these files are .csv files, select Text files in Connections pane.
4. Navigate to the directory for superstore files and select the first file orders_south_2015.csv to add it to the flow.
Once the file is connected, the Tableau Prep workspace opens, and it is divided into two main sections. The Flow pane at the top and the Input pane at the bottom. In Tableau Flow Pane the first step is created.
Flow Pane: We can interact with Data visually and build flow. For single Tables Tableau Prep creates an input step when the file is added through connections or we can drag and drop the tables to add to the pane.
Input Pane: The Input pane contains configuration options about how the data is ingested. It also shows you the fields, data types, and sample values for your data set.
5. To add remaining 3 files, click on the Multiple files tab in input files section.
6. Select Wild Card Union Option.
To select multiple files in the same parent/child directory at once, wild card option is useful. However, the files should be with similar name and structure.
We can see that all files in the south folder are included by using wild card union option.
7. The files for the other regions are all single table files, so you can select all the files at once and add them to your flow. (Drag and drop to Flow pane / Use connections button to select from the folder as shown in step 2)
To prepare analysis -ready data, understanding the data is an important step to spot any issues. We can look over the data and make quick fixes in the input pane.
1. In the Flow pane, click the Orders_Central Input step to select it.
2. In the Input pane data fields are available and we can scroll down to spot issues.
Identified Issues Sample.
1. The order dates and ship dates are separated out into fields for month, day, and year.
2. Some of the fields have different data types than the same fields in other files.
Order date in Orders_Central file is number, however it’s Date& time in Orders_West file as shown below:
We need to clean data and fix issues. But we can’t do it in input step yet. Keep inspecting for other files as well and spot issues to make changes in later steps.
Cleaning & Shaping Data
In Tableau Prep, examining and cleaning data is an iterative process. To clean, combine and shape data we need to add steps in the flow. Different step types (clean, add, split, aggregate, merge, rename and so on) are used based on the operation required for the data.
Lets clean Orders_central
1. Select Orders_central file in the flow pane and click on the + symbol, a pop-up window opens with option as shown below. Click on Add Step.
When we add a cleaning step to the flow, the workspace changes and we can see the details of our data.
The Profile pane shows the structure of our data, summarizing the field values into bins so that we can quickly see related values and spot outliers and null values. This is where we will perform most of our cleaning tasks.
Adding new filed (Using calculated filed)
This data set is missing a field for Region. Since the other data sets have this field we will need to add it so that we can combine our data later. We will need a calculated field for that.
1. Profile pane toolbar -> Create Calculated filed
2. Name the filed as Region and enter “Central”, click Save.
Fixing Dates (Order Year, Order Month, and Order Day fields into -> format "MM/DD/YYYY".)
In Orders_Central file order date and ship date fields are separate fields for date, year and month. We want to combine them into two single fields, one for Order Date and one for Ship Date so they align with the same fields in the other data sets.
1. Profile pane toolbar -> Create Calculated filed
2. Name the calculated field Order Date. Then enter the following calculation into the Calculation editor and click Save:
MAKEDATE ([Order Year], [Order Month], [Order Day])
Order Date filed is created
Since we have separate Order Date field, we can delete Order Year, Order Month, and Order Day fields.
3. In the In the Profile pane, in the search box, type Order. Tableau Prep scrolls all the fields with the name Order included in it.
4. Ctrl+click (Command+click on Mac) to select the fields for Order Year, Order Month, and Order Day. Then right-click on the selected fields and select Remove Field from the menu to remove them.
Repeat 2-4 steps for Ship Date
To shape and clean the data we have performed may tasks. To review the changes we made, click the arrow on the left side on the profile pane to open it. We can see the list of changes we made with details.
After making sure that the files have similar fields, you want to union the files together to add the rows from each file into a single table.
1. Drag Orders_west file to Orders_East and use Union step.
2. Add Orders_Central’s cleaned extract to the Union1.
3. Drag the orders_south_2015 step to the new Union step. Drop it on Add to add it to the existing union. Now all our files are combined into a single table. In the Flow pane, select the Union step to see your results.
4. Now all our files are combined into a single table. In the Flow pane, select the Union step to see results.
We notice that Tableau Prep automatically matched up the fields that had the same names and types. We also see that the colors assigned to the steps in the flow are used in the union profiles to indicate where the field came from and appear in the colored band across the top of each field to show you if that field exists in that table.
5. We notice that a new field called Table Names was added that lists the tables where all the rows in the union come from.
6. We see a check box at the top of the Union profile to see only mismatched fields, and you see under Resulting Fields on the left that you have a few, so you want to see what's there. In the Union profile pane, select the Show only mismatched fields check box.
Running Flow & Generating Output Data
1. In the Flow pane, select Union1, click the plus + icon and select Add Output.
2. In the left pane select Save to file.
3. Click the Browse button, then in the Save Extract As dialog, enter a name for the file, for example Orders_Superstore, and click Accept.
4. In the Output type field, select the output type. Depending on the version of Tableau Desktop you use you can choose from the following options:
Tableau Data Extract (. hyper), Tableau Data Extract (.tde), Comma Separated Values (.csv).
5. Click the Run Flow button to generate your output.
The Analysis-ready extract is ready for Tableau Desktop (.tde )
Posted by Jamie Deadmond