#Prepping Data in Prep and Power Query

I’ve been dabbling with Power Query in Excel for the past few days and wanted to see if I could complete the current #PreppinData challenge with it. I’m brand new to Power Query, so the steps I outline below are what I got to work and may not be the most efficient or best way of doing this. This isn’t an evaluation of which tool is better or of what you can do in one that you can’t do in the other; I don’t have enough experience with Power Query to make that assessment. This is more the method behind my madness of how I solved a #PreppinData challenge in two different tools.

WARNING: THIS IS A LONG POST!

This week’s challenge was to rank the men’s Premier League teams with and without the 6 teams that were going to leave for the “Super League”. I created the Prep flow first to work through the logic and transformations needed. Below I walk through each step in both Prep & Power Query.

INPUT:
Once you open Tableau Prep you’re presented with the option of opening a flow or connecting to data. When you select connect to data you choose your data source type, in this example a text file, and then navigate out to the file. Once you do that you’ll see an input node in the flow.

[Image: Tableau Prep Input Node]

To connect to the data in Power Query I opened MS Excel, navigated to the Data tab, and selected Get Data > From Text/CSV. Like Prep, you navigate to the file and select Open. You’ll then be presented with this screen.

[Image: Load Text/CSV into Excel]

When you select Transform Data the Power Query window will open and look like this. This is where the transformations are created.

[Image: Power Query Window]

INITIAL CLEAN:
The first thing I did was exclude the rows with a null result. These are future games (as of the time the file was created).
Prep: I did this by right clicking on the null in the results column and selecting exclude.
Power Query: I did this by opening the filter next to results and unselecting (blank).

[Image: Remove Future Games in Prep & Power Query]

I then wanted to change the match date to an actual date field. While this isn’t a required step it is something I do out of habit whenever I have a date field in my data set and I wanted to see how Power Query handled this.
Prep: The date field came through as a timestamp, so I clicked on the data type and changed it to a date.
Power Query: The date field came through as a text field, and there is the same point & click functionality to change it to a date. I selected Date and got an error. I selected Date and Time and got errors for some rows, while other rows did not have the correct date. I then noticed Using Locale at the end of the menu; after I selected Date/Time with the English (United Kingdom) locale (the #PreppinData team is located in the UK), I was able to convert to a date time field and then to a date.

[Image: Converted to Date in Prep & Power Query]
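If you’re curious what that locale option is actually doing, here’s a rough sketch in Python with pandas (not one of the tools in this post; the sample dates and column name are made up, not from the actual file):

```python
import pandas as pd

# UK-formatted match dates (day first), as they might appear in the raw file
raw = pd.DataFrame({"Date": ["14/08/2020 15:00", "02/01/2021 12:30"]})

# dayfirst=True plays the same role as the Using Locale ->
# English (United Kingdom) option: 02/01 is parsed as 2 January
raw["Date"] = pd.to_datetime(raw["Date"], dayfirst=True).dt.date
```

Without the day-first hint, “02/01/2021” would silently parse as February 1st, which is exactly the “wrong date, no error” behavior I saw.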

The next thing I wanted to do was create a row number for each game. There isn’t a unique identifier in the data set so I like to create one.
Prep: Created a new calculated field called Row Number: {ORDERBY [Date] ASC : ROW_NUMBER()}
Power Query: This was the first time I needed to create a new column, and it was pretty easy. I went to the Add Column tab, clicked the drop down next to Index Column, and selected From 1.

We need to create a new field that determines the number of points a team has; the points are based on win, loss, and draw. In order to calculate that I split the results field to get the home and away scores. Splitting fields is pretty easy in both tools, but Power Query has the option to split on case (upper to lower and lower to upper) and digit/non-digit (digit to non and non to digit). I would love to see those as options in Prep.
Prep: Use the custom split functionality and split all fields on a "-"
Power Query: Use the Split Column option on the Home tab and split on a "-" for each occurrence
After splitting the columns I renamed them Home Score and Away Score, and in Prep I changed them to whole numbers.

[Image: Split Column in Prep & Power Query]
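The same split can be sketched in pandas (sample scores are hypothetical):

```python
import pandas as pd

# hypothetical sample of the Result column; "3-1" means home 3, away 1
games = pd.DataFrame({"Result": ["3-1", "0-0", "1-2"]})

# split on the dash, mirroring the custom split in Prep / Split Column
# in Power Query, then convert the two new columns to whole numbers
games[["Home Score", "Away Score"]] = (
    games["Result"].str.split("-", expand=True).astype(int)
)
```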


We want to know what the team’s position is with and without the big 6 teams, so we need to create a field to indicate whether the game included a big 6 team.
Prep: For some reason Prep doesn’t have the IN operator but desktop does, so I created a boolean calculated field that looks to see if the home or away team is one of the big 6 teams using nested ORs.
Power Query: I knew I needed to create a new column so I went back to the Add Column tab and selected the Conditional Column option. That brings up the screen below where I entered the big 6 teams for both home and away teams.

[Image: Conditional Column in Power Query]
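The nested ORs in Prep and the conditional column in Power Query both boil down to a membership test, which looks like this in a pandas sketch (team names here are shortened stand-ins, not necessarily how they appear in the challenge file):

```python
import pandas as pd

# the six clubs that signed up for the Super League
BIG6 = {"Arsenal", "Chelsea", "Liverpool", "Man City", "Man Utd", "Spurs"}

games = pd.DataFrame({
    "Home Team": ["Arsenal", "Leeds"],
    "Away Team": ["Leeds", "Everton"],
})

# True when either side is a big 6 club
games["Big 6 Game"] = (
    games["Home Team"].isin(BIG6) | games["Away Team"].isin(BIG6)
)
```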

I did warn you earlier that this is a long post.

RESHAPE:

Now that the initial cleaning is done I need to reshape my data so each team has their own row. I will need to aggregate the points for the teams for all of their games so I need to move my Home & Away Teams into the same column.
Prep: I added a pivot step after my initial clean step. I’m pivoting columns to rows and added my home and away teams to the pivoted fields. I then renamed the Pivot 1 Names field to Home Away.
Power Query: To reshape the data I needed to unpivot my columns (interesting that I am pivoting in one tool and unpivoting in another). I selected the Home Team & Away Team columns and clicked on the drop down next to Unpivot Columns and selected the first option Unpivot Columns.

[Image: Pivot in Prep & Unpivot in Power Query]

CREATE POINTS & GOAL DIFFERENCE:

The next thing to do is create the points and the goal difference. A win is worth 3 points, a draw is worth 1, and a loss is worth 0. There could be an easier way to do this but I did this through 2 calculated fields in Prep and a series of custom columns in Power Query.
POINTS:
Prep: I created this calculated field:
IF [Home Away] = 'Home Team'
AND [Home Score] > [Away Score]
THEN 3
ELSEIF [Home Away] = 'Away Team'
AND [Away Score] > [Home Score]
THEN 3
ELSEIF [Home Score] = [Away Score]
THEN 1
ELSE 0
END
Power Query: I created these custom columns:
Home Points:
if [Team Type] = "Home Team"
and [Home Score] > [Away Score]
then 3
else 0
Away Points:
if [Team Type] = "Away Team"
and [Away Score] > [Home Score]
then 3
else 0
Draw Points:
created a conditional column where if the home score = away score it returned 1 else 0
Total Points:
[Home Points] + [Away Points] + [Draw Points]
I will go back and change how these are done, but when I work with something new I break things into small chunks, and that is what I did here. I’d guess I can do this in a similar manner to Prep, but for my first time I wanted to do it this way.
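For reference, the single-calculation version of the points logic (the branches from the Prep calc above) can be sketched in pandas like this, with a made-up sample of the unpivoted rows:

```python
import pandas as pd

# unpivoted rows: one row per team per match (hypothetical sample)
rows = pd.DataFrame({
    "Home Away":  ["Home Team", "Away Team", "Home Team"],
    "Home Score": [2, 2, 1],
    "Away Score": [0, 0, 1],
})

def match_points(r):
    # 3 for a win, 1 for a draw, 0 for a loss - same branches as the Prep calc
    if r["Home Away"] == "Home Team" and r["Home Score"] > r["Away Score"]:
        return 3
    if r["Home Away"] == "Away Team" and r["Away Score"] > r["Home Score"]:
        return 3
    if r["Home Score"] == r["Away Score"]:
        return 1
    return 0

rows["Match Points"] = rows.apply(match_points, axis=1)
```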

GOAL DIFFERENCE:
Prep: I created this custom calculation:
IF [Home Away] = 'Home Team'
THEN [Home Score] - [Away Score]
ELSEIF [Home Away] = 'Away Team'
THEN [Away Score] - [Home Score]
END
Power Query:
if [Team Type] = "Home Team"
then [Home Score] - [Away Score]
else
if [Team Type] = "Away Team"
then [Away Score] - [Home Score]
else 0
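The goal difference calc is the same idea in both tools: flip the subtraction depending on which side of the match the row represents. A pandas sketch (hypothetical rows again):

```python
import pandas as pd

rows = pd.DataFrame({
    "Home Away":  ["Home Team", "Away Team"],
    "Home Score": [2, 2],
    "Away Score": [0, 0],
})

# goal difference from each team's point of view
home = rows["Home Away"] == "Home Team"
rows["Goal Difference"] = (rows["Home Score"] - rows["Away Score"]).where(
    home, rows["Away Score"] - rows["Home Score"]
)
```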


AGGREGATE:

We’ve gotten to the point where we can summarize the number of points & the goal difference by team. We’ll need to rank the teams by these new metrics and will need to calculate them with the big 6 teams and without the big 6 teams.
Prep: I added two aggregate steps: one is filtered to exclude the big 6 teams and one includes every team. The two aggregates are identical outside of the big 6 filter.

[Image: Aggregate Step in Prep, Non Big 6 Teams]

Power Query: At this point I was a little lost as to what to do. I thought maybe I should add pivot tables and then find a way to combine them. In an unrelated question about how to replicate a LOD calc in Power Query, Jorge Supelano mentioned the Group By feature, and that was the clue I needed.
These are the steps I took to get the summarized data:
  • Did a close and load on the cleaned data and created another query off that range.
  • Used the Group By option to summarize the data.

[Image: Power Query Group By]

  • Did a close and load to a new worksheet.
  • Created a new query off of the summarized data and filtered those rows to exclude the big 6 teams.
  • Created the same group by as above and did a close and load to a new worksheet.
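Whether it’s the aggregate step in Prep or Group By in Power Query, the operation is a group-and-sum. In pandas terms, with a made-up sample of team rows:

```python
import pandas as pd

rows = pd.DataFrame({
    "Team":            ["Leeds", "Leeds", "Everton"],
    "Match Points":    [3, 1, 0],
    "Goal Difference": [2, 0, -2],
})

# sum points and goal difference per team - the aggregate / Group By step
summary = (
    rows.groupby("Team", as_index=False)[["Match Points", "Goal Difference"]]
        .sum()
)
```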


I’m impressed if you are still reading this.

RANKINGS:

We’re getting close to the end. The next thing we need to do is create the rankings. The primary rank is on the total points and ties are broken with the goal differential. The approach in Prep & Power Query was similar.
PREP:
To get the position I used the ROW_NUMBER() function, ordering the data descending by the points and the goal difference. Remember I branched the flow into an all section & a non big 6 section, so this calculation is repeated in both branches.
{ORDERBY [Match Points] DESC, [Goal Difference] DESC : ROW_NUMBER()}
POWER QUERY:
I edited the aggregates I loaded above to add in the ranking.
The first step was to sort both the points & the goal difference in descending order. I did this by clicking on the heading and selecting Z to A on the Home tab.
The next step was to add a column with the ranking. Earlier in the flow I added an index column to give each match a unique id; I used that same logic to get the position ranking.
After both the all summarized & non big 6 summarized sets had the ranking, I closed and loaded those back into the workbook.
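Sort descending, then number the rows: that’s the whole ranking trick in both tools. A pandas sketch with hypothetical summary data:

```python
import pandas as pd

summary = pd.DataFrame({
    "Team":            ["Everton", "Leeds", "Wolves"],
    "Match Points":    [4, 4, 1],
    "Goal Difference": [2, 5, -3],
})

# sort by points then goal difference descending (ties broken by goal
# difference), then number the rows - same as ROW_NUMBER() in Prep or
# an index column in Power Query
summary = summary.sort_values(
    ["Match Points", "Goal Difference"], ascending=False
).reset_index(drop=True)
summary["Position"] = summary.index + 1
```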


FINAL TIDY:

You’ve made it to the end! Now that we’ve got the summarized data with and without the big 6 in each tool we need to combine them to see the difference in the rankings.
PREP: This is a straightforward inner join of my two branches on the team name.

[Image: Join Step in Tableau Prep]

POWER QUERY: I was again totally lost on how to combine my two sets in Power Query. When I asked about the LOD, Spencer Baucke mentioned merge in his reply to me. Once again, another clue from the Twitter data community. I didn’t want to mess up my existing queries, so I again created a new query off of the non big 6 data, and once that loaded I found the Merge Queries option on the Home menu.
I merged (or joined) to the All Summarized query on the team name by selecting the field in both sections and picking an inner join.

[Image: Merge Queries in Power Query]

Now that I have the rankings with and without the big 6, I can subtract the rankings and clean up the columns to include just what the challenge is looking for.
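The join-plus-subtraction at the end maps neatly onto a merge in pandas (positions below are invented for illustration):

```python
import pandas as pd

all_teams = pd.DataFrame({"Team": ["Leeds", "Everton"], "Position": [5, 9]})
non_big6  = pd.DataFrame({"Team": ["Leeds", "Everton"], "Position No B6": [3, 6]})

# inner join on team name - the Prep join step / Merge Queries in Power Query
combined = all_teams.merge(non_big6, on="Team", how="inner")

# how many places each team moves when the big 6 leave
combined["Position Change"] = combined["Position"] - combined["Position No B6"]
```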


FINAL THOUGHTS:

I haven’t used Excel much over the last few years. I’m either using SQL, Tableau Desktop, or Tableau Prep for my data needs. I’m glad I took the time to do an initial test of Power Query and look forward to learning more about what it can do. Thanks for reading this and hopefully you’re inspired to check out Power Query.

VOTD and A Recovering Perfectionist

After 150 Tableau Public visualizations, I received VOTD (Viz of the Day) for my SportsVizSunday visualization on the Boston Marathon wheelchair winners. The recognition was great and I was happy that so many people liked what I created, but I was upset that the thumbnail looked so bad. I finally created something I loved, I got recognition, and all people see is a messed up thumbnail. After a breathing exercise, I was able to appreciate the recognition and accept that the viz was more important than the thumbnail. If you can relate to this, you are probably one of my fellow recovering perfectionists.

For me, perfectionism is usually paired with a lack of confidence and a whole lot of self-doubt. In short, it’s held me back. Over the past few months I started working more on this and establishing methods to mitigate the damage. I haven’t been 100% successful: I’ve passed on a great opportunity, beaten myself up for a mistake, and questioned whether I am in the right career. But I’ve had some successes too, including letting the crappy thumbnail go.

I’m optimistic that I’ll have the confidence to take a risk in the future and that I’ll get to a place where perfectionism has less and less of a role in my life. I’m looking for progress, not perfection.

Tableau Alerts +/- a Range

A couple of weeks ago I wanted to set up a server alert when a daily count was either over or under a specific range. This was the first time I had tested out server alerts and was disappointed to find out that the alert threshold was a hard coded value. Since then, I have tested out a few ideas and I think for what I am looking to do the steps below are the best solution I have come up with.

I generated an Excel sheet with random values by day using the RANDBETWEEN() function. I altered a couple of records to make sure I have values that would exceed or be below my range. I brought the sheet into Tableau and then created a few calculated fields for my range.

  • Window Average – WINDOW_AVG(SUM([Widgets]))
  • Upper – WINDOW_STDEVP(SUM([Widgets])) + [Window Average]
  • Lower – [Window Average] - WINDOW_STDEVP(SUM([Widgets]))

I then created a simple line chart with indicators for when the daily value was +/- the upper and lower values. I also added the above calculations as reference lines.

It is now easy for me to see where I have issues with my widget numbers. But I want an alert to go out to a group of people when the widgets are above or below that range. After testing a few different options, the solution I ended up with was to create an additional chart that has the widget counts for just those days that were outside of the range.

To get the above chart I created a boolean calculated field that checks whether the widget count was above the high value or below the low value and applied that as a filter: Over or Under – SUM([Widgets]) > [Upper] OR SUM([Widgets]) < [Lower].

I then created a bar chart with the indicators for just the days that are outside my range and added that to a dashboard. I can then create an alert off the second chart where the value is greater than or equal to zero.
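The band-and-filter logic is easy to sanity-check outside of Tableau. Here’s a small Python sketch of the same idea (widget counts are invented; WINDOW_STDEVP corresponds to the population standard deviation):

```python
import statistics

# hypothetical daily widget counts, with two obvious outliers
widgets = [100, 102, 98, 101, 250, 99, 30]

avg = statistics.mean(widgets)
sd = statistics.pstdev(widgets)  # population std dev, like WINDOW_STDEVP
upper, lower = avg + sd, avg - sd

# days outside the band - the rows the alert chart would contain
outliers = [w for w in widgets if w > upper or w < lower]
```

If `outliers` is non-empty the alert fires; that is exactly what the second chart plus the "greater than or equal to zero" threshold accomplishes on Server.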

It isn’t exactly what I want but I think this is the best solution that I tried. I’d love to hear from anyone else who has tackled this to see what solutions you’ve come up with.

Thanks for reading!

Praise for Tableau Prep Up and Running

Earlier this year Tableau Zen Master and Head Coach at The Data School Carl Allchin wrote Tableau Prep Up & Running: Self-Service Data Preparation for Better Analysis. The book is a fabulous resource for a wide range of people in data roles (or those who want to get into a data role). While it is focused on Tableau’s Prep Builder tool, the data concepts are transferable to other data prep solutions. The book is a perfect mix of theory and “how tos”.

The book covers the data prep concepts with detailed examples and theories. These concepts include

  • understanding your data – I’m stealing KYD (know your data)
  • planning your preparation – a must read section
  • data types and how to work with them
  • profiling data
  • shaping data
  • cleaning
  • joining (Carl does an amazing job covering the different join types in an easy to follow manner)
  • aggregating
  • scaffolding

Chapter 35, Breaking Down Complex Data Preparation Challenges, is probably my favorite section. You are going to encounter a lot of complex situations in the data prep world, and knowing how to deal with them is a big key to success. Carl walks through the most complex #PreppinData challenge, and through his explanation of how to break down this challenge you’ll gain insight into how to logically break a problem into smaller portions.

I highly recommend purchasing this book and be sure to check out #PreppinData.

HOW TO PREP TO SQL – SUB-QUERIES

In this installment of the How to Prep to SQL series I’m going to review how you would translate a Prep flow into SQL using a sub-query. A sub-query is a nested query within a SQL statement. All of the examples below use Tableau Prep version 2020.2.3 and MS SQL Server and utilize the Tableau Superstore data set.

In this example I want to return all of the orders from Superstore where the discount is greater than the average discount for all the orders.

There are two different ways I would do this in Prep.

  • Connect to the Superstore data set
  • Insert a clean step and keep only the order ID & discount fields
  • Add an aggregate step and calculate the average discount
  • Join the aggregate and the first clean step together where the discount from the clean step is greater than the average discount

The final flow will look like this

The second approach in Prep is to use a LOD (level of detail calculation) in the first step. The calculation is {FIXED : AVG([Discount])}. Then create another calculation to evaluate whether the discount is greater than the average discount, [Discount] > [Avg Discount], and then right click on True and select keep only.

The final flow will look like this:

To do this in SQL, first, I am going to write a select statement to return the order id & discount fields from my orders table in the Superstore database:

SELECT
O.[Order ID],
O.[Discount]
FROM Superstore.dbo.Orders$ O

To get the average discount I am going to use a sub-query within the select statement. The sub-query is a select statement that calculates the average discount from the orders table:

SELECT
O.[Order ID],
O.[Discount],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT] -- this is the sub-query
FROM Superstore.dbo.Orders$ O

To limit the results to just the orders that are over the average discount, we need to add a where clause. We’ll do this by saying the discount is greater than the sub-query that calculates the average discount. (In the first Prep example this was done through the join; in the second Prep example this was done by creating the boolean field and then right clicking on True and selecting keep only.)

The query is:

SELECT
O.[Order ID],
O.[Discount],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT]
FROM Superstore.dbo.Orders$ O
WHERE O.[Discount] > (SELECT AVG([Discount]) FROM Superstore.dbo.Orders$)
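The Prep LOD, the SQL sub-query, and a pandas one-liner are all the same idea: compute one scalar over the whole table, then filter against it. A sketch with a hypothetical stand-in for the Orders table:

```python
import pandas as pd

# hypothetical order-level sample standing in for Superstore Orders
orders = pd.DataFrame({
    "Order ID": ["A", "B", "C", "D"],
    "Discount": [0.0, 0.2, 0.1, 0.5],
})

# the {FIXED : AVG([Discount])} LOD / scalar sub-query
avg_discount = orders["Discount"].mean()

# keep only orders strictly above the average - the WHERE clause / keep only
above_avg = orders[orders["Discount"] > avg_discount]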


This is one example of how you could use Prep to help you understand sub-queries. If you are interested in another example or have any other questions on how to learn SQL by using Prep please let me know!

HOW TO PREP TO SQL – DATES

This is the second post in a series on how you can use Tableau Prep to learn SQL. In this post I’ll go over how to create custom dates in Prep & SQL. This may also be handy for people who know SQL and aren’t as familiar with the Tableau date functions. The SQL and Prep examples are all based on the Tableau Superstore data set. The SQL is written for MS SQL Server; if you are using a different flavor of SQL the syntax may be different.

The beauty of Prep is that there is point and click functionality that makes creating date dimensions easy. Need to change a timestamp to a date? Click on the field type icon in the column and change it to a date. Need to change a date to a year number? Click on the column that has your date field, click on the …, then click on the menu that says Convert Dates and you’ll see these options. Note: if you want to maintain the original date field, duplicate the field and do the transformation on the duplicated field.

[Image: built in Convert Dates options in Tableau Prep – date & time, year number, quarter number, month number, week number, and day of the month]


If you’re wondering how to create these date dimensions in SQL, the table below will have a snippet of code. The full query is listed below the table.

| Date Type (field type returned) | Tableau | SQL |
| --- | --- | --- |
| Timestamp to Date (Date) | DATE([Order Date]) or click on the field type and select date | CAST(O.[Order Date] AS DATE) AS ORDER_DATE |
| Day of Month (Number) | DAY([Order Date]) | DAY(O.[Order Date]) AS DAY_NUM |
| Day of Year (Number) | DATEPART('dayofyear',[Order Date]) | DATENAME(dy,O.[Order Date]) AS DAY_OF_YR_NUM |
| Epoch Date (Number) | DATEDIFF('second',#1970-01-01#,[Order Date]) | DATEDIFF(s, '1970-01-01 00:00:00', O.[Order Date]) AS EPOCH_DATE |
| Day of Week Name (Text) | DATENAME('weekday',[Order Date]) | DATENAME(WEEKDAY,O.[Order Date]) |
| Day of Week (Number) | DATEPART('weekday',[Order Date]) | DATEPART(dw,O.[Order Date]) |
| Week Start (Date) | DATE(DATETRUNC('week',[Order Date])) | DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE |
| Month (Number) | MONTH([Order Date]) or use convert date menu | MONTH(O.[Order Date]) AS MONTH_NUM |
| Month Name (Text) | DATENAME('month',[Order Date]) | DATENAME(MONTH,O.[Order Date]) AS MONTH_NAME |
| Last Day of Month (Date) | DATE(DATEADD('day',-1,DATEADD('month', 1, DATETRUNC('month',[Order Date])))) | EOMONTH(O.[Order Date]) AS LAST_DAY_OF_MONTH |
| First Day of Month (Date) | DATE(DATETRUNC('month',[Order Date])) | DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1)) AS FIRST_DAY_OF_MONTH |
| Max Month in Data Set (Date) | {FIXED : MAX(DATE(DATETRUNC('month',[Order Date])))} | (SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MAX_MONTH |
| Min Month in Data Set (Date) | {FIXED : MIN(DATE(DATETRUNC('month',[Order Date])))} | (SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MIN_MONTH |
| Today's Month (Date) | DATE(DATETRUNC('month', TODAY())) | DATEADD(DAY, 1, EOMONTH(GETDATE(), -1)) |
| Year (Number) | YEAR([Order Date]) or use convert date menu | YEAR(O.[Order Date]) AS YEAR_NUM |
| Today's Year (Number) | DATEPART('year',TODAY()) | YEAR(GETDATE()) AS TODAY_YEAR |
| Current Date Time (Timestamp) | NOW() | GETDATE() AS TODAY_DATETIME |
| Quarter Number (Number) | DATEPART('quarter', [Order Date]) | DATEPART(qq,O.[Order Date]) AS QUARTER_NUM |
| Quarter Start (Date) | DATETRUNC('quarter',[Order Date]) | CAST(DATEADD(qq, DATEDIFF(qq, 0, O.[Order Date]), 0) AS DATE) AS QUARTER_START_DATE |
| Quarter End Date (Date) | DATEADD('day',-1,DATEADD('quarter',1,DATETRUNC('quarter',[Order Date]))) | CAST(DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, O.[Order Date]) +1, 0)) AS DATE) AS QUARTER_END_DATE |

This is the query in SQL:

SELECT
O.[Order ID],
CAST(O.[Order Date] AS DATE) AS ORDER_DATE,
DAY(O.[Order Date]) AS DAY_NUM,
DATENAME(dy,O.[Order Date]) AS DAY_OF_YR_NUM,
DATEDIFF(s, '1970-01-01 00:00:00', O.[Order Date]) AS EPOCH_DATE,
DATENAME(WEEKDAY,O.[Order Date]) AS WEEKDAY_NAME,
DATEPART(dw,O.[Order Date]) AS WEEKDAY_NUM,
DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE,
MONTH(O.[Order Date]) AS MONTH_NUM,
DATENAME(MONTH,O.[Order Date]) AS MONTH_NAME,
EOMONTH(O.[Order Date]) AS LAST_DAY_OF_MONTH,
DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1)) AS FIRST_DAY_OF_MONTH,
(SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) AS MAX_MONTH,
(SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) AS MIN_MONTH,
DATEADD(DAY, 1, EOMONTH(GETDATE(), -1)) AS TODAY_MONTH,
YEAR(O.[Order Date]) AS YEAR_NUM,
YEAR(GETDATE()) AS TODAY_YEAR,
GETDATE() AS TODAY_DATETIME,
CAST(GETDATE() AS DATE) AS TODAY_DATE,
DATEPART(qq,O.[Order Date]) AS QUARTER_NUM,
CAST(DATEADD(qq, DATEDIFF(qq, 0, O.[Order Date]), 0) AS DATE) AS QUARTER_START_DATE,
CAST(DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, O.[Order Date]) +1, 0)) AS DATE) AS QUARTER_END_DATE
FROM Superstore.dbo.Orders$ O
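Outside of Prep and SQL Server, most of these date parts have direct equivalents in pandas too. A quick sketch on one hypothetical order date:

```python
import pandas as pd

# one hypothetical order date
d = pd.Series(pd.to_datetime(["2020-11-15"]))

day_num = int(d.dt.day.iloc[0])                                 # DAY_NUM
month_name = d.dt.month_name().iloc[0]                          # MONTH_NAME
quarter_num = int(d.dt.quarter.iloc[0])                         # QUARTER_NUM
first_of_month = d.dt.to_period("M").dt.to_timestamp().iloc[0]  # FIRST_DAY_OF_MONTH
last_of_month = (d + pd.offsets.MonthEnd(0)).iloc[0]            # LAST_DAY_OF_MONTH
```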


Hope that you found this helpful.


HOW TO PREP TO SQL – PIVOT

A very long time ago, my first introduction to SQL was through the SQL view in MS Access. I would build a query and then look at the SQL view to see how it translated that into SQL. Using MS Access gave me a decent foundation in how to build queries and connect tables.

Fast forward about 20 years, and Tableau Prep is now my go-to tool to help people learn how to manipulate and clean data. The built in functionality in Prep makes it a breeze to get your data in the format you want, but there is no SQL view in Prep to look at and learn from. This is the first in a series of posts on how you would replicate steps from Prep in SQL.

For this example I’ve connected to the Superstore data set and built a simple flow that counts the number of orders by ship mode and segment. In Prep we would do this by adding an aggregate step (the sigma step) that counts the orders by segment and ship mode.

The next step is to pivot (or reshape the data). In this step I am moving the ship mode to the columns so this is a rows to columns pivot step.

This is how to achieve the same result in SQL. First, create a query that pulls the fields in from the Superstore data set.

SELECT (select is saying pick these fields)
S.Segment, (separate your selected fields with a comma)
S.[Ship Mode], (use [] when your field name has a space)
S.[Order ID]
FROM Superstore.dbo.Orders$ S
(from is where you set the data source. The “S” after the table name is an alias. I think it is a good practice to use aliases, especially when you start joining tables)

This gets the initial data set that we want. Now we want to add on to this query to move the Ship Mode to the columns and count the order ids.

SELECT * FROM (this is new and we add this to select all of the fields from the statement)

(SELECT (this is the query from above)
S.Segment,
S.[Ship Mode],
S.[Order ID]
FROM Superstore.dbo.Orders$ S) O

PIVOT( (this starts the reshaping of the data)
COUNT(O.[Order ID]) (we want to get a count of the order id)
FOR O.[Ship Mode] IN (this specifies the values that we want to include in our pivot. These are the values in the ship mode field. you can enter all or just a few of the values)
([First Class],
[Second Class],
[Standard Class],
[Same Day])
) AS ORDERS_SHIPMODE

Here is the query without my notes:

SELECT * FROM

(SELECT
S.Segment,
S.[Ship Mode],
S.[Order ID]
FROM SUPERSTORE.dbo.Orders$ S) O

PIVOT(
COUNT(O.[Order ID])
FOR O.[Ship Mode] IN
([First Class],
[Second Class],
[Standard Class],
[Same Day])
) AS ORDERS_SHIPMODE
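For comparison, the same count-and-reshape can be sketched in pandas with `pivot_table` (the three sample orders below are hypothetical):

```python
import pandas as pd

orders = pd.DataFrame({
    "Segment":   ["Consumer", "Consumer", "Corporate"],
    "Ship Mode": ["First Class", "Standard Class", "First Class"],
    "Order ID":  ["O1", "O2", "O3"],
})

# count orders by segment with ship mode moved to columns - the
# aggregate step plus the rows-to-columns pivot, or the SQL PIVOT above
pivoted = pd.pivot_table(
    orders, index="Segment", columns="Ship Mode",
    values="Order ID", aggfunc="count", fill_value=0,
)
```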


Please note this is based on SQL for MS SQL Server. Oracle has a PIVOT function as well but the syntax is different. If you are copying the above query be sure to remove my comments in the () in the statement.

Hope this helps you understand how to do a simple pivot in SQL.


Morality and Grocery Shopping

I haven’t stepped foot in a grocery store in almost 7 weeks. Since Massachusetts went into a stay at home advisory I’ve avoided going into stores and have relied on Instacart, Amazon Fresh, USPS, UPS, ShipIt, and a local meat farm for my food and necessities. Every time I use these services, I feel guilty and struggle with the decision. Is it morally right for me to pay someone to do something I am uncomfortable doing?

My moral compass has always been guided by fairness, and it has been a strong moral compass. I’ve made people uncomfortable when I refused to laugh at a joke that degraded someone because they were different, and I’ve made career decisions based on how teammates were treated. And now I’m not sure if I am off course. Is it morally right for me to expect someone to do something I am uncomfortable doing?

I assume that the people who have been shopping for me are in a different situation than I am. We are DINKs, and my husband and I are still employed. I recognize that we are lucky. While I don’t know for sure, I would guess that the people who have been doing my shopping are putting themselves at risk because they need to. When I’ve discussed this with friends, they’ve tried to reassure me by saying you’re supporting someone who needs work. All I can think is that I am putting someone in potential danger. Am I harming someone because I don’t want to do something?

I see multiple sides of this and have come to the realization there is no real answer to this. We’re all trying to do the best we can to make the right decisions.

Virtual Prep TUG Example

I was honored to present at the first virtual Tableau Prep user group session and was both thrilled and a bit nervous when I found out 1,200 people had registered for the session. Thrilled because I was glad to see that there were so many people interested in Prep and nervous because I hadn’t presented to that many people before. It was a great TUG and I learned something from my co-presenters Joshua, Jenny, and Rahim and am appreciative of Jack & Caroline’s efforts in making the TUG happen. The link to the recording and the packaged flow are posted on the Tableau forum.

My presentation was based on a real life example where I had to reverse the management levels in an employee hierarchy. The people hierarchy file has an employee’s immediate leader as their first management level, their immediate leader’s leader as their second, and then works up the hierarchy to the company leader. I need to reverse that so the company leader is the first management level and then work down the hierarchy so the immediate leader is in the last level.

As you can see in the Excel file, employees have a different number of leaders, so the top level leader could be in any one of the columns. When I was working through this I noticed all of the management level headers have a number in them, and my initial thought was that I could use that number as the way to reverse the levels.

After connecting to the file in Prep I added an initial clean step and created one calculated field called Dummy Field for Aggregate Joins with “A” as the calculation. (I’ll get back to why I created this towards the end of the post.)

I wanted to extract the number from the management level headers and use that to create a new reversed level. In order to extract the number I need to reshape my data with a pivot step. Moving the columns to rows puts all of my managers in one column and creates a new field called Pivot 1 Names which were my old column headers.

After the pivot I added a clean step and made 7 changes:

  1. excluded any records with a null management level
  2. duplicated the pivot names field
  3. used the built in clean function to remove all the letters from the field created in step 2
  4. removed all of the spaces from the new field
  5. changed the field type from a string to a number
  6. renamed the new field management level
  7. removed the Pivot 1 Names field

These steps created a new field with the number from the management level header. I duplicated the original pivot names field because I am in the habit of keeping the original field as a comparison point when I change the values. You do not have to do this, it is just a personal preference.

The built in clean tools to remove letters and spaces are accessed by selecting the column you want to clean and then clicking on the 3 dots. That opens a new menu where you’ll see the clean option.

The next step was to get the max management level number for each employee. Once I have the max level I will be able to subtract the management level we pulled out of the header from it to get the new management level. To get the max level I added an aggregate step, grouped by the employee, added the management level number to the aggregated fields, and changed the calculation to max. I then joined that max level back to my data to add that field in. Note that in the latest version of Prep you can now do this with a LOD (level of detail) calculation; this functionality didn’t exist when I created the flow.

Now that I have the highest management level for each employee I can subtract the management level from it and add 1 to get the reversed level. I created this calculation: ([Max Mgmt Level] - [Manager Level]) + 1. I also created a new header field with the reversed level using this calculation: "Level " + STR([Reverse Mgmt Level]) + " Manager"
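The extract-max-and-reverse pattern can be sketched in pandas as follows (a trimmed, made-up sample of the pivoted rows, using only three levels):

```python
import pandas as pd

# hypothetical pivoted rows: one row per employee per management level header
rows = pd.DataFrame({
    "Employee":      ["Albert", "Albert", "Albert"],
    "Pivot 1 Names": ["Management Level 1", "Management Level 2",
                      "Management Level 3"],
    "Manager":       ["Brittany", "Casey", "Patti"],
})

# pull the number out of the header, like the built in clean steps
rows["Manager Level"] = (
    rows["Pivot 1 Names"].str.extract(r"(\d+)", expand=False).astype(int)
)

# max level per employee (the aggregate-and-join, or a LOD), then reverse it
rows["Max Mgmt Level"] = rows.groupby("Employee")["Manager Level"].transform("max")
rows["Reverse Mgmt Level"] = (rows["Max Mgmt Level"] - rows["Manager Level"]) + 1
```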

In this snippet of data you can see that Albert Norman has Brittany Newman as his 1st management level, and his highest management level is 5. When that is reversed, Patti Reed, who is the top level, is now the level 1 manager and Brittany is the level 5 manager.

I cleaned up a few fields and then added another pivot to move the new management levels back to columns. This is a rows to columns pivot, and because I know there is only 1 value for each level I am taking the min of the manager name.

The last thing to do is to add Patti Reed back to the cleaned data. Patti is the CEO of Fake Company and does not have a management level. When we excluded the null management levels after the first pivot she was removed from the data set. I created a branch for just Patti and unioned that back to the cleaned data set.

Earlier in this post I mentioned that I created a dummy field with “A” in it. I like to add aggregates into my flows to check the record counts at different stages of the flow. I got in the habit of creating these and exporting them out because I often work with sampled data. The dummy field gives me a way to join the aggregates together and validate my record counts. If you’ve downloaded the flow you’ll see these aggregates and the step that exports the counts.

Thanks for reading and I hope this example was helpful. If you have any questions please feel free to get in touch. Happy Preppin!

January 2020 Sports Viz Sunday

The topic for the first #SportsVizSunday of 2020 is personal health data. I took some leeway with the topic and looked at my golf handicap index and scores. I normally walk the golf course and golf impacts my mental health (sometimes positive and sometimes negative). There were a few times this year where I thought about buying a boat.

For #SportsVizSunday, I wanted to look at where my index fell in relation to other women who keep a handicap and highlight the scores that count towards my current index. As with most public work I do, I like to keep it simple. I spend a lot of time during the week working on dashboards so in my free time I tend to keep it light and simple.

The 2019 season was a bit all over the place for me. I’ve struggled with my irons for the last two seasons, and that definitely impacted my score. While that aspect was off, the rest of my game was in good shape, and that helped me get my handicap index down to an 18.4.

I play most of my golf at two different courses and wanted to see what my score and differentials looked like at those two courses. I felt like I played better at Furnace Brook because I hit my fairway woods and hybrid more than I hit my irons. The data backed that up. I scored better (based on differential) at Furnace Brook than at William J Devine.

my differential at the Brook was 4 strokes lower than at the Park

In 2020 I’m going to track more of my golf stats and visualize them to see where I can get better. I know where I struggle with my game, but seeing the data makes it a bit more real.