#Prepping Data in Prep and Power Query

I’ve been dabbling with Power Query in Excel for the past few days and wanted to see if I could complete the current #PreppinData challenge with Power Query. I’m brand new to Power Query, so the steps I outline below are what I got to work and may not be the most efficient or best way of doing this. This isn’t an evaluation of which is better or what you can do in one that you can’t do in the other. I don’t have enough experience with Power Query to make that assessment. This is more the method behind my madness of how I solved a #PreppinData challenge in two different tools.

WARNING THIS IS A LONG POST!

This week’s challenge was to rank the men’s Premier League teams with and without the 6 teams that were going to leave for the “Super League”. I created the Prep flow first to work through the logic and transformations needed. Below I will walk through each step from both Prep & Power Query.

INPUT:
Once you open Tableau Prep you’re presented with the option of opening a flow or connecting to data. When you select connect to data you pick your data source type (in this example it is a text file) and then navigate out to the file. Once you do that you’ll see an input node in the flow.

[Image: Tableau Prep Input Node]

To connect to the data in Power Query I opened MS Excel, navigated to the Data tab and selected Get Data From Text/CSV. Like Prep, you navigate to the file and select open. You’ll then be presented with this screen.

[Image: Load Text/CSV into Excel]

When you select transform data the Power Query window will open and look like this. This is where the transformations are created.

[Image: Power Query Window]

INITIAL CLEAN:
The first thing I did was exclude the rows with a null result. These are future games (as of the time the file was created).
Prep: I did this by right clicking on the null in the results column and selecting exclude.
Power Query: I did this by opening the filter next to results and un-selecting blank.

[Image: Remove Future Games Prep & Power Query]

I then wanted to change the match date to an actual date field. While this isn’t a required step it is something I do out of habit whenever I have a date field in my data set and I wanted to see how Power Query handled this.
Prep: The date field came through as a timestamp so I clicked on the data type and changed to a date
Power Query: The date field came through as a text field, and there is point & click functionality to change it to a date. I selected date and got an error. I selected date and time and got an error for some rows, while others did not have the correct date. I then noticed Using Locale at the end of the menu. After I selected date/time and English (United Kingdom) as the locale (the #PreppinData team is located in the UK), I was able to convert to a date time field and then to a date.

[Image: Converted to Date Prep & Power Query]
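
As a rough illustration of why the locale matters, here is a minimal Python sketch (not part of the Prep or Power Query workflow). It assumes the file stores day-first timestamps such as 13/09/2020 16:30. The format string and the sample value are assumptions for illustration, not taken from the challenge file.

```python
from datetime import datetime


def parse_uk_datetime(text: str) -> datetime:
    """Parse a day-first (UK style) timestamp such as '13/09/2020 16:30'."""
    # Assumed layout: day/month/year hour:minute
    return datetime.strptime(text, "%d/%m/%Y %H:%M")


# Convert to a date time first, then drop the time to keep only the date.
match_date = parse_uk_datetime("13/09/2020 16:30").date()
print(match_date)  # 2020-09-13
```

A month-first parser would fail on this value because there is no month 13, and it would silently swap day and month on a value like 03/04/2021. That matches the mix of errors and wrong dates described above.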

The next thing I wanted to do was create a row number for each game. There isn’t a unique identifier in the data set so I like to create one.
Prep: Created a new calculated field called Row Number { ORDERBY [Date] ASC :ROW_NUMBER()}
Power Query: This was the first time I needed to create a new column and it was pretty easy. I went to the Add Column Menu and clicked on the drop down next to Index Column and selected from 1

We need to create a new field that determines the number of points a team has, and the points are based on win, loss, and draw. In order to calculate that I split the results field to get the home and away scores. Splitting fields is pretty easy in both tools, but Power Query has the option to split on case (upper to lower and lower to upper) and digit/non-digit (digit to non and non to digit). Would love to see those as options in Prep.
Prep: Use the custom split functionality and split all fields on a –
Power Query: Use the split column option on the Home tab and split on a – for each occurrence
After splitting the columns I renamed them home score and away score and in Prep I changed them to whole numbers.

[Image: Split Column Prep & Power Query]
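
For readers who think in code, the split step can be sketched in Python. This is only an illustration of the idea. The function name and the sample result strings are made up, and it assumes the result looks like two whole numbers separated by a plain hyphen.

```python
from typing import Tuple


def split_result(result: str) -> Tuple[int, int]:
    """Split a result such as '2 - 1' into (home score, away score)."""
    home_text, away_text = result.split("-")
    # strip() removes any spaces around the numbers before converting
    return int(home_text.strip()), int(away_text.strip())


home_score, away_score = split_result("2 - 1")
```

Converting to int here plays the same role as changing the split fields to whole numbers in Prep.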


We want to know what the team’s position is with and without the big 6 teams, so we need to create a field to indicate whether the game included a big 6 team.
Prep: For some reason Prep doesn’t have the IN operator but Desktop does, so I created a boolean calculated field that looks to see if the home or away team is one of the big 6 teams using nested ORs.
Power Query: I knew I needed to create a new column so I went back to the Add Column tab and selected the Conditional Column option. That brings up the screen below where I entered the big 6 teams for both home and away teams.

[Image: Conditional Column in Power Query]
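
The big 6 flag is a simple membership test. Here is a minimal Python sketch of the same logic. The six clubs are the English sides from the Super League announcement, but the exact spellings used in the data file are an assumption, and the function name is hypothetical.

```python
# Spellings are assumed; they would need to match the names in the data file.
BIG_SIX = {
    "Arsenal",
    "Chelsea",
    "Liverpool",
    "Manchester City",
    "Manchester United",
    "Tottenham Hotspur",
}


def involves_big_six(home_team: str, away_team: str) -> bool:
    """Return True when either side of the match is a big 6 team."""
    return home_team in BIG_SIX or away_team in BIG_SIX
```

A set lookup does the job of the nested ORs in Prep and of the repeated conditions in the Power Query conditional column.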

I did warn you earlier that this is a long post.

RESHAPE:

Now that the initial cleaning is done I need to reshape my data so each team has their own row. I will need to aggregate the points for the teams for all of their games so I need to move my Home & Away Teams into the same column.
Prep: I added a pivot step after my initial clean step. I’m pivoting columns to rows and added my home and away teams to the pivoted fields. I then renamed the Pivot 1 Names field to Home Away.
Power Query: To reshape the data I needed to unpivot my columns (interesting that I am pivoting in one tool and unpivoting in another). I selected the Home Team & Away Team columns and clicked on the drop down next to Unpivot Columns and selected the first option Unpivot Columns.

[Image: Pivot in Prep & Unpivot in Power Query]
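
The reshape turns one row per match into two rows, one for each team. A minimal Python sketch of that idea follows. It assumes each match is a dictionary with Home Team and Away Team keys. The output column names Team Type and Team are illustrative choices, not what either tool generates by default.

```python
def unpivot_teams(matches):
    """Turn each match row into two rows: one for the home team, one for the away team."""
    rows = []
    for match in matches:
        # Keep every column except the two being unpivoted
        shared = {k: v for k, v in match.items() if k not in ("Home Team", "Away Team")}
        for team_type in ("Home Team", "Away Team"):
            rows.append({**shared, "Team Type": team_type, "Team": match[team_type]})
    return rows


sample = [{"Row Number": 1, "Home Team": "Fulham", "Away Team": "Arsenal",
           "Home Score": 0, "Away Score": 3}]
long_rows = unpivot_teams(sample)
```

Both output rows keep the home and away scores, which is what makes the points and goal difference calculations possible later.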

CREATE POINTS & GOAL DIFFERENCE:

The next thing to do is create the points and the goal difference. A win is worth 3 points, a draw is worth 1, and a loss is worth 0. There could be an easier way to do this but I did this through 2 calculated fields in Prep and a series of custom columns in Power Query.
POINTS:
Prep: I created this calculated field:
IF [Home Away] = 'Home Team'
AND [Home Score] > [Away Score]
THEN 3
ELSEIF [Home Away] = 'Away Team'
AND [Away Score] > [Home Score]
THEN 3
ELSEIF [Home Score] = [Away Score]
THEN 1
ELSE 0
END
Power Query: I created these custom columns:
Home Points:
if [Team Type] = "Home Team"
and [Home Score] > [Away Score]
then 3
else 0
Away Points:
if [Team Type] = "Away Team"
and [Away Score] > [Home Score]
then 3
else 0
Draw Points:
created a conditional column where if the home score = away score it returned 1 else 0
Total Points:
[Home Points] + [Away Points] + [Draw Points]
I will go back and change how these are done but when I work with something new I break things into small chunks and that is what I did here. I’d guess that I can do this in a similar manner as I did in Prep but for my first time I wanted to do it this way.
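
To show how the separate pieces fit into one rule, here is a small Python sketch of the points logic. It is a stand-in for illustration rather than the Prep or Power Query formula, and it assumes the team type is always either Home Team or Away Team.

```python
def match_points(team_type: str, home_score: int, away_score: int) -> int:
    """Return 3 for a win, 1 for a draw and 0 for a loss, from this team's side."""
    if home_score == away_score:
        return 1
    if team_type == "Home Team" and home_score > away_score:
        return 3
    if team_type == "Away Team" and away_score > home_score:
        return 3
    return 0
```

This mirrors the single calculated field from Prep. The Power Query version arrives at the same number by adding the home, away, and draw columns together.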

GOAL DIFFERENCE:
Prep: I created this custom calculation:
IF [Home Away] = 'Home Team'
THEN [Home Score] - [Away Score]
ELSEIF [Home Away] = 'Away Team'
THEN [Away Score] - [Home Score]
END
Power Query:
if [Team Type] = "Home Team"
then [Home Score] - [Away Score]
else
if [Team Type] = "Away Team"
then [Away Score] - [Home Score]
else 0
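
The goal difference is the same subtraction seen from each team's side. A minimal Python sketch, assuming the team type is either Home Team or Away Team (any other value raises an error here, which is a choice made for this sketch only):

```python
def goal_difference(team_type: str, home_score: int, away_score: int) -> int:
    """Goals scored minus goals conceded, from this team's point of view."""
    if team_type == "Home Team":
        return home_score - away_score
    if team_type == "Away Team":
        return away_score - home_score
    raise ValueError(f"Unexpected team type: {team_type!r}")
```

For a 3-1 home win the home row gets 2 and the away row gets -2, so the two rows for a match always cancel out.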


AGGREGATE:

We’ve gotten to the point where we can summarize the number of points & the goal difference by team. We’ll need to rank the teams by these new metrics and will need to calculate them with the big 6 teams and without the big 6 teams.
Prep: I added two aggregate steps, one is filtered to exclude the big six teams and one includes every team. The two aggregates are identical outside of the big 6 filter.

[Image: Aggregate Step in Prep Non Big 6 Teams]

Power Query: At this point I was a little lost as to what to do. I thought maybe I should add pivot tables and then find a way to combine them. Unrelated to the aggregate I asked how to replicate a LOD calc in Power Query and Jorge Supelano mentioned the group by feature and that was the clue I needed.
These are the steps I took to get the summarized data:
Did a close and load on the cleaned data and created another query off that range.
Used the group by option to summarize the data

[Image: Power Query Group by]

Did a close and load to a new worksheet
Created a new query off of the summarized data and filtered those rows to exclude the big 6 teams.
Created the same group by as above and did a close and load to a new worksheet.
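
Conceptually, the aggregate step and the group by both sum the points and goal difference per team, once for all games and once with the big 6 filter applied. A minimal Python sketch of that idea follows. The column names and the Big 6 Game flag are hypothetical, and it assumes the filter drops every row from a match that involved a big 6 team.

```python
from collections import defaultdict


def summarize(rows, exclude_big_six_games=False):
    """Sum points and goal difference per team, optionally skipping big 6 matches."""
    totals = defaultdict(lambda: {"Total Points": 0, "Goal Difference": 0})
    for row in rows:
        if exclude_big_six_games and row["Big 6 Game"]:
            continue  # filtered out of the "without big 6" summary
        totals[row["Team"]]["Total Points"] += row["Total Points"]
        totals[row["Team"]]["Goal Difference"] += row["Goal Difference"]
    return dict(totals)


rows = [
    {"Team": "Leeds", "Total Points": 3, "Goal Difference": 2, "Big 6 Game": False},
    {"Team": "Leeds", "Total Points": 0, "Goal Difference": -1, "Big 6 Game": True},
    {"Team": "Fulham", "Total Points": 1, "Goal Difference": 0, "Big 6 Game": False},
]
all_games = summarize(rows)
without_big_six = summarize(rows, exclude_big_six_games=True)
```

Running the same function twice with a different filter is the code equivalent of the two identical aggregate steps in Prep.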


I’m impressed if you are still reading this.

RANKINGS:

We’re getting close to the end. The next thing we need to do is create the rankings. The primary rank is on the total points and ties are broken with the goal differential. The approach in Prep & Power Query was similar.
PREP:
To get the position I used the ROW_NUMBER() function. I ordered the data descending by the points and the goal difference. Remember I branched off the flow to have an all section & a non big 6 section, so this is repeated in both branches of the flow.
{ORDERBY [Match Points] DESC, [Goal Difference] DESC : ROW_NUMBER()}
POWER QUERY:
I edited the aggregates I loaded above to add in the ranking.
The first step was to sort both the points & difference in descending order. I did this by clicking on the heading and selecting the Z to A on the Home tab.
The next step was to add a column that has the ranking. Earlier in the flow I added an index column to give each match a unique ID, and I used that same logic to get the position ranking.
After both the all summarized & non big 6 summarized sets had the ranking I closed and loaded those back into the workbook.
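
The ranking logic (sort descending by points, break ties on goal difference, then number the rows from 1) can be sketched in a few lines of Python. The summary structure and team names here are made up for illustration.

```python
def rank_teams(summary):
    """Return {team: position}, ordered by points then goal difference, both descending."""
    ordered = sorted(
        summary.items(),
        key=lambda item: (-item[1]["Total Points"], -item[1]["Goal Difference"]),
    )
    # enumerate from 1 plays the role of the index column / ROW_NUMBER()
    return {team: position for position, (team, _) in enumerate(ordered, start=1)}


summary = {
    "Team A": {"Total Points": 10, "Goal Difference": 3},
    "Team B": {"Total Points": 12, "Goal Difference": -1},
    "Team C": {"Total Points": 10, "Goal Difference": 5},
}
positions = rank_teams(summary)
```

Team A and Team C are level on points, so the goal difference decides which one gets the higher position.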


FINAL TIDY:

You’ve made it to the end! Now that we’ve got the summarized data with and without the big 6 in each tool we need to combine them to see the difference in the rankings.
PREP: This is a straightforward inner join of my two branches on the team name in Prep.

[Image: Join Step in Tableau Prep]

POWER QUERY: I was again totally lost on how to combine my two sets together in Power Query. When I asked about the LOD, Spencer Baucke mentioned merge in his reply to me. Once again, another clue from the Twitter data community. I didn’t want to mess up my existing queries, so I again created a new query off of the non big 6 data, and when that loaded I found the Merge Queries option in the Home menu.
I merged (joined) it with the All Summarized query on the team name by selecting the field in both sections and picking an inner join.

[Image: Merge Queries in Power Query]

Now that I have the rankings with and without the big 6 I can subtract the rankings and clean up the columns to just include what the challenge is looking for.
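
The join and the subtraction can be sketched together in Python. This is an illustration only. The direction of the subtraction (position with all teams minus position without the big 6) is an assumption, and the sample positions are invented.

```python
def position_change(rank_all, rank_without_big_six):
    """Inner join the two rankings on team name and subtract the positions."""
    return {
        team: rank_all[team] - rank_without_big_six[team]
        for team in rank_all
        if team in rank_without_big_six  # inner join: keep teams found in both
    }


rank_all = {"Team A": 8, "Team B": 1, "Team C": 9}
rank_without = {"Team A": 2, "Team C": 3}
changes = position_change(rank_all, rank_without)
```

Team B only appears in the full ranking, so the inner join drops it, which is the behaviour you would expect for the big 6 teams themselves.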


FINAL THOUGHTS:

I haven’t used Excel much over the last few years. I’m either using SQL, Tableau Desktop, or Tableau Prep for my data needs. I’m glad I took the time to do an initial test of Power Query and look forward to learning more about what it can do. Thanks for reading this and hopefully you’re inspired to check out Power Query.

Tableau Alerts +/- a Range

A couple of weeks ago I wanted to set up a server alert when a daily count was either over or under a specific range. This was the first time I had tested out server alerts and was disappointed to find out that the alert threshold was a hard coded value. Since then, I have tested out a few ideas and I think for what I am looking to do the steps below are the best solution I have come up with.

I generated an Excel sheet with random values by day using the RANDBETWEEN() function. I altered a couple of records to make sure I have values that would exceed or be below my range. I brought the sheet into Tableau and then created a few calculated fields for my range.

  • Window Average – WINDOW_AVG(SUM([Widgets]))
  • Upper – WINDOW_STDEVP(SUM([Widgets])) + [Window Average]
  • Lower – [Window Average] - WINDOW_STDEVP(SUM([Widgets]))
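
For anyone who wants to check the arithmetic outside Tableau, the three calculations above amount to the average plus or minus one population standard deviation. A minimal Python sketch, using invented daily counts:

```python
from statistics import mean, pstdev


def alert_band(daily_counts):
    """Return (lower, upper) as the average minus/plus one population standard deviation."""
    window_average = mean(daily_counts)
    spread = pstdev(daily_counts)  # population standard deviation, like WINDOW_STDEVP
    return window_average - spread, window_average + spread


lower, upper = alert_band([2, 4, 4, 4, 5, 5, 7, 9])
```

For these sample values the average is 5 and the population standard deviation is 2, so the band runs from 3 to 7.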

I then created a simple line chart with indicators for when the daily value was +/- the upper and lower values. I also added the above calculations as reference lines.

It is now easy for me to see where I have issues with my widget numbers. But I want an alert to go out to a group of people when the widgets are above or below that range. After testing out a few different options, the solution I ended up with was to create an additional chart that just had the widget counts for those days that were outside of the range.

To get the above chart I created a boolean calculated field that looked to see if the widget count was above the high value or below the low value and applied that as a filter. Over or Under – SUM([Widgets]) > [Upper] OR SUM([Widgets]) < [Lower].
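
The Over or Under filter keeps only the days that fall outside the band. Here is a minimal Python sketch of that filter, with hypothetical day labels and counts. The lower and upper values are passed in rather than recalculated.

```python
def out_of_range(daily_counts, lower, upper):
    """Keep only the days whose count is above the upper or below the lower value."""
    return {
        day: count
        for day, count in daily_counts.items()
        if count > upper or count < lower
    }


counts = {"Mon": 5, "Tue": 9, "Wed": 2, "Thu": 7}
flagged = out_of_range(counts, lower=3, upper=7)
```

A value sitting exactly on the upper or lower line is not flagged, because the comparison is strictly greater than or less than, the same as in the calculated field.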

I then created a bar chart with the indicators for just the days that are outside my range and added that to a dashboard. I can then create an alert off the second chart where the value is greater than or equal to zero.

It isn’t exactly what I want but I think this is the best solution that I tried. I’d love to hear from anyone else who has tackled this to see what solutions you’ve come up with.

Thanks for reading!

January 2020 Sports Viz Sunday

The topic for the first #SportsVizSunday of 2020 is personal health data. I took some leeway with the topic and looked at my golf handicap index and scores. I normally walk the golf course and golf impacts my mental health (sometimes positive and sometimes negative). There were a few times this year where I thought about buying a boat.

For #SportsVizSunday, I wanted to look at where my index fell in relation to other women who keep a handicap and highlight the scores that count towards my current index. As with most public work I do, I like to keep it simple. I spend a lot of time during the week working on dashboards so in my free time I tend to keep it light and simple.

The 2019 season was a bit all over the place for me. I struggled with my irons for the last two seasons and that definitely impacted my score. While that aspect was off, the rest of my game was in good shape and that helped me get my handicap index down to an 18.4.

I play most of my golf at two different courses and wanted to see what my score and differentials looked like at those two courses. I felt like I played better at Furnace Brook because I hit my fairway woods and hybrid more than I hit my irons. The data backed that up. I scored better (based on differential) at Furnace Brook than at William J Devine.

my differential at the Brook was 4 strokes lower than at the Park

In 2020 I’m going to track more of my golf stats and visualize them to see where I can get better. I know where I struggle with my game, but seeing the data makes it a bit more real.

The Tableau Community

A quick sidebar with Mark Edwards and a message from Adam Mico on Twitter on the last day of the Tableau conference got me thinking about defining the “Tableau community” and what my Tableau community is. I’ve been noodling this for a few days now and this is what it means to me.

My Tableau community is:
Krish, who works at TD Bank in Toronto and attended his first Tableau conference. I was sitting at a table with some folks I “knew” from Twitter at the TUG Tips Battle Session and noticed that someone was sitting by themselves at a table. I have been that person at many events. I moved to the table, introduced myself and struck up a conversation. There are so many people like Krish who are in the Tableau community but not involved in the community projects.

My Tableau community is:
A man I had lunch with on Thursday when we arrived at the same time to a 2-seat table to eat our lunch. I spent 5 minutes with him and learned that he was new to Tableau and trying to learn as much as he could at conference. I suggested that he check out Ryan Sleeper’s blog and Playfair Data TV as resources to get him up to speed on Tableau when he got home.
A few hours later I ran into Ryan at the community area of the Data Village and ended up getting pizza with him, Sean Miller, Tom O’Hara, Will Strousse and Furkan Celik. I know Tom and Will and enjoyed getting to know Furkan, Ryan, and Sean more. Ryan even schooled the 3 Bostonians there on the time out chair at Pizzeria Regina.

My Tableau community is:
James, Simon, and Spencer who have given me exposure to the larger Tableau community by having me host #SportsVizSunday twice and by asking me to be on the half time panel of their data19 session. Giving people an opportunity to get their name out there is invaluable and I appreciate what they have done for me.

My Tableau community is:
Bridget Cogley who told me things I needed to hear and encouraged me not to settle. Don’t let the shortness of this section fool you. This was one of the most important conversations I had all week.

My Tableau community is:
All of the people I know from the Twitter community. All of the people I know from BTUG including my first TUG friends Paula Munoz and Susan Glass. All of my co-workers who use Tableau including Amar, Jesse, Josh, and Tom. All the people who have asked and answered questions on the forums that have helped me. All of the people who write blogs and do videos to share their knowledge.

The community isn’t just those with rocks, those that are ambassadors, those that are on Twitter, those involved in the community projects, and those that win community awards. The Tableau community is anyone who uses Tableau in some capacity and I can’t wait to meet more and more of those people.

The Masters

Last month’s Sports Viz Sunday was the Masters. I created 3 different vizzes using Tableau Public.

The first one I created looked at how closely contested the Masters usually is. I’ve always felt that Masters Sunday was the best TV viewing day of the year and looking at the data backed that up. The tournament has only been won by 5 strokes or more 5 times.

Overall I like how this turned out. The one thing I would change is the title. I don’t feel that it gives a good takeaway of what the viz is about.

A Brief History of Champions at The Masters(1)

The next one I did was on Tiger’s 1997 win. Tiger won by 12 strokes, the largest margin of victory (as of this post). I wanted to see round by round how much better Tiger was than the average score for the day. Tiger is known for wearing red and black on Sundays and I used the color scheme in honor of that.

This is a simple viz and the goal was to highlight how good his 2nd and 3rd rounds were in relation to the field average score. There was a Twitter discussion about showing the better score on the bottom of the viz. In golf being under par is good, and while it may seem strange to see better on the bottom I think it makes sense when you are looking at golf scores. If I was showing tournament position (first place, second place etc.) it would make sense to show the best at the top, but when showing scores in relation to par I believe putting the better score at the bottom of the viz makes more sense.

Tiger Woods 1997 Masters(2)

The 3rd viz looked at the 1956 Masters where Jackie Burke Jr started Sunday 8 strokes behind Ken Venturi and came back to win by 1 stroke. I wanted to show round by round how well Venturi played for the first 3 rounds and how steady Burke was. I’d like to do a more in depth analysis on this to show how great Burke’s final round was. There were only 2 players under par on Sunday and Bobby Jones said it was the toughest weather conditions the Masters had been played in. This is my favorite of the 3 and hopefully I’ll expand upon this with a more in depth analysis.

1956 Masters(1)

Don’t Underestimate the Power of the Bar

I was excited that the current Storytelling with Data challenge from Cole Nussbaumer Knaflic is to create a basic bar chart. She says “The #SWDchallenge this month is to create a basic bar chart. Nothing fancy. No need to stack it or do anything else crazy.” I love a good bar chart and have been known to say “don’t underestimate the power of the bar” more than once.

For this challenge I used data from the 2017 Masters to show which holes had the highest percent of scores over par.

2017 Hardest Holes at the Masters2

At first, I sorted the data in descending order so the top three were together at the top of the chart. For other data sets I think this works, but, for this I liked keeping the holes ordered by the hole number.

I debated the bar color for the top 3 for a while. I wanted to use the green to tie with the Masters theme. I decided against that because people tend to associate green with good – if I were showing the 3 easiest I would have used that. I tried orange, a maroon-ish red, dark gray, and brown but I didn’t love any of those choices. I had my husband look and he suggested that I color code them in multiple shades. Instead of shutting that down immediately I changed the scheme to show him what it would look like and asked, “Do the top 3 still stand out?” When he agreed that they didn’t, I switched it back to a two color scheme. He then suggested the purple and I think it pops.

Initially, I labeled the bars and tested out different alignments. I felt that the chart was too busy with the bars labeled. I needed to add the percent over par to the chart so I added it next to the hole name. To do this in Tableau add your measure to the row shelf and change it to discrete.

I don’t have any annotations on this chart, and if you aren’t familiar with golf, over par may not resonate with you. I am sure some folks would suggest adding text to explain over par but I opted not to because I liked the clean look and felt that my title got the point of the chart across.

To see other entries for this challenge take a look at #swdchallenge on Twitter.

Also take time to check out Cole’s website and buy the Storytelling with Data book.

 

Rank and Magnitude Dashboard

If you don’t have the Big Book of Dashboards yet you are missing out. Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave put together a fantastic resource for Business Dashboards.

I used the BBOD to help solve a request to show both the ranking and magnitude for a 12 week snapshot and also show that by different groups. Chapter 22 in the book walks through this type of scenario.

This is an example of what I created – all of the information below is fake. The chart on the left shows the ranking and controls what is highlighted in the chart on the right. The bar chart above the chart on the right appears when something is selected on the left. This was added to make it easier for the user to compare the volume across the groups. A big thanks to Steve who helped me with this addition!

MagRank

This design isn’t flashy but it serves a specific purpose and it has been well received by the end users. I will probably adjust the fonts, titles, and colors a bit more but will keep the main design as it is.

I’d love to hear your feedback or ideas on this!

I Love Parameters

I spent time with a couple of co-workers yesterday going through revisions on an existing dashboard and received valuable constructive feedback. One person said the visuals are great and you built exactly what I asked for but I find it hard to answer questions – I end up having to do the work in Excel.

The thing they struggled with the most was looking at before and after a specific date. They needed the ability to pick the date to pivot off of and the ability to select the number of days back and forward to look at.

This was a good challenge for me and after thinking it over for a while I thought that parameters were exactly what I needed.

My idea was to create a parameter for the X date (the date to look back and forward from) and then create a parameter for the days back and the days forward and use those to limit the dates in the chart and use them to calculate the before and after measures they were looking for.

My Parameters are:

  • Selected Date – date to pivot back and forward from (date)
  • Days Back – # of days to go back (integer)
  • Days Forward – # of days to go forward (integer)

My Calculated fields are:

  • Back Date – [Selected Date] - [Days Back]
  • Forward Date – [Selected Date] + [Days Forward]
  • Custom Date Range – IF [Order Date] >= [Back Date]
    AND [Order Date] <= [Forward Date]
    THEN [Order Date]
    END

I then added the Custom Date Range to the filters to exclude the nulls. To do that I added the Custom Date Range dimension and selected Range of Dates > Special > Non Null.
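
The parameter logic boils down to a date window around the selected date. Here is a minimal Python sketch of the same idea. The function name and sample dates are hypothetical, and leaving a date out of the list stands in for the null that the filter removes.

```python
from datetime import date, timedelta


def custom_date_range(order_dates, selected_date, days_back, days_forward):
    """Keep the order dates that fall inside the window around the selected date."""
    back_date = selected_date - timedelta(days=days_back)
    forward_date = selected_date + timedelta(days=days_forward)
    return [d for d in order_dates if back_date <= d <= forward_date]


orders = [date(2018, 3, 1), date(2018, 3, 10), date(2018, 3, 15), date(2018, 3, 30)]
in_window = custom_date_range(orders, date(2018, 3, 12), days_back=5, days_forward=3)
```

With 5 days back and 3 days forward from March 12 the window runs from March 7 to March 15, so only the two middle dates are kept.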

I used these fields to create a line chart to show the trend during the custom date range and added a reference line for the selected date. I also created a metric tile to show the before and after counts and changes in those counts. I added a few other charts to help them see what changed by different dimensions.

I love the flexibility with parameters!

I wanted to see if I could figure this out without Googling how to do this. I’d love to know if there is an easier way to do this or if I over engineered the solution.

 

Design Thinking

I was an early Tableau adopter at work and tried to push my end users out of their Excel data dump comfort zone. I would correct people who referred to Tableau as Excel on steroids on a regular basis. I was excited about giving my users a visual representation of their data. But I kept getting asked to add a table view of the data. I would add these views, but I wouldn’t make them as pretty as the dashboards in hopes that people would use the dashboards instead. That wasn’t the case. When I looked at Tableau Server the most viewed sheets were the boring pivot table views.

I didn’t give up and kept plowing ahead and improving my Tableau skills. I’ve been trying to learn as much as I can about colors, charts, telling stories, LOD calculations, parameters and all the other things that go along with Tableau. I started to convert more users to the visual side and away from the table views. When someone wanted to know how to export the raw data my canned response was what do you need it for. I wanted to make the dashboard helpful for them. I felt like I was making some progress.

My goal with Tableau was to make it easier for the end users to quickly see what they need and allow them to interact and customize the dashboards but I felt like I was missing something – was I really giving them what they needed? Was I giving a solution without really knowing what they needed?

To help me solve that question I attended a Design Thinking Bootcamp class at General Assembly. The class was helpful and gave me a number of ideas on how I can change my approach.

Here are a few of the concepts that stood out to me:

  • ask open ended questions to the end user
  • silence is fine – if you ask a question and the user stops to think – let them think don’t interject other ideas or options
  • pay attention to work-arounds – these are areas where the need isn’t being met
  • after you understand the need, develop your point of view around the user and their need (not the solution). Someone who can’t reach the top shelf doesn’t need a ladder, they need the item on the top shelf.
  • get the topic and ideas out on paper – make this a free flowing exercise – no judgements!
  • prototype – shouldn’t be a finalized version. It should be used to communicate the idea. be prepared to scrap it and start over – we don’t always get it right on the first go round.
  • build your story – who is the user, what is the challenge they face, how does that challenge impact them, what is the solution, how does it meet the need

It was helpful for me to be reminded that you need to fully understand the need and develop a lot of ideas around the need before jumping to a solution.

 

Makeover Monday Week 5

I’m trying to get caught up on the Makeover Monday exercises from the last couple of weeks. I just finished week 5, which was on employment in the G7 countries. The original viz showed two pie charts on employment share and net employment growth in the G7 countries from 2010 – 2016.

g7original

From reading the article, the more important data point appeared to be the share of net employment growth in the US. I decided to turn that pie chart into a bar chart because I find it easier to see the differences in values with a bar chart than in a pie chart. I also adjusted the thickness of the bar to correspond to the measure. I kept the share of total employment in my remake as a reference point – I wanted the focus to be on the net share of employment so I added the share of total employment as a table. You can view the workbook on my Tableau Public site.

g7mom

In both my MOM and work dashboards I am starting to use the same theme. I like the clean look and find it easier to read dashboards on a light background. I think the darker backgrounds can be beautiful when they are done right but I have a hard time reading them and sometimes find them distracting. I’m getting new glasses in a few weeks and maybe when I get my progressives I’ll change my mind but for now I’m sticking with the light background!