#Prepping Data in Prep and Power Query

I’ve been dabbling with Power Query in Excel for the past few days and wanted to see if I could complete the current #PreppinData challenge with Power Query. I’m brand new to Power Query so the steps I outline below are what I got to work and may not be most efficient or best way of doing this. This isn’t an evaluation of which is better or what you can do in one that you can’t do in the other. I don’t have enough experience with Power Query to make that assessment. This is more the method behind my madness of how I solved a #PreppinData challenge in two different tools.

WARNING THIS IS A LONG POST!

This week’s challenge was to rank the men’s Premier League teams with and with out the 6 teams that were going to leave for the “Super League”. I created the Prep flow first to work through the logic and transformations needed. Below I will walk through each step from both Prep & Power Query.

INPUT:
Once you open Tableau Prep you’re presented with the option of opening a flow or connecting to data. When you select connect to data you select your data source type, in this example it is a text file, and then navigate out to the file. Once you do that you’ll see an input node in the flow

this is an example of what the input node looks like in Tableau Prep
Tableau Prep Input Node

To connect to the data in Power Query I opened MS Excel, navigated to the Data tab and selected Get Data From Text/CSV. Like Prep, you navigate to the file and select open. You’ll then be presented with this screen.

this is the screen you get when importing a text file into Excel.
Load Text/CSV into Excel

When you select transform data the Power Query window will open and look like this. This is where the transformations are created.

this is an example of what the Power Query window looks like
Power Query Window

INITIAL CLEAN:
The first thing I did was exclude the rows with a null result. These are future games (as of the time the file was created).
Prep: I did this by right clicking on the null in the results column and selecting exclude.
Power Query: I did this by opening the filter next to results and un-selecting blank.

example of how to remove null from both Prep and Power Query
Remove Future Games Prep & Power Query

I then wanted to change the match date to an actual date field. While this isn’t a required step it is something I do out of habit whenever I have a date field in my data set and I wanted to see how Power Query handled this.
Prep: The date field came through as a timestamp so I clicked on the data type and changed to a date
Power Query: The date field came through as a text field and there is also the point & click functionality to change to a date. I selected date and got an error, I selected date and time and got an error for some rows and others did not have the correct date, I then noticed using locale at the end of the menu and after I selected date time and English (United Kingdom) the (#PreppinData team is located in the UK) I was able to convert to a date time field and then to a date.

this is an image of changing the data type from a string to a date in Tableau Prep & Power Query
Converted to Date Prep & Power Query

The next thing I wanted to do was create a row number for each game. There isn’t a unique identifier in the data set so I like to create one.
Prep: Created a new calculated field called Row Number { ORDERBY [Date] ASC :ROW_NUMBER()}
Power Query: This was the first time I needed to create a new column and it was pretty easy. I went to the Add Column Menu and clicked on the drop down next to Index Column and selected from 1

We need to create a new field that determines the number of points a team has and the the points are based on win,loss, and draw. In order to calculate that I split the results field to get the home and away scores. Splitting fields is pretty easy in both tools but Power Query has the option to split on case (upper to lower and lower to upper) and digit/non-digit (digit to non and non to digit). Would love to see those as options in Prep.
Prep: Use the custom split functionality and split all fields on a –
Power Query: Use the split column option on the Home tab and split on a – for each occurrence
After splitting the columns I renamed them home score and away score and in Prep I changed them to whole numbers.

this is an example of splitting a column in Tableau Prep & Power Query
Split Column Prep & Power Query


We want to know what the team’s position is with and without the big 6 teams, so we need to create a field to indicate whether the game included a big 6 team.
Prep: For some reason Prep doesn’t have the IN operator but desktop does, so I created a boolean calculated field that looks to see if the home or away team is one of the big 6 teams using nested ORs.
Power Query: I knew I needed to create a new column so I went back to the Add Column tab and selected the Conditional Column option. That brings up the screen below where I entered the big 6 teams for both home and away teams.

this is an example of the conditional column in Power Query
Conditional Column in Power Query

I did warn you earlier that this is a long post.

RESHAPE:

Now that the initial cleaning is done I need to reshape my data so each team has their own row. I will need to aggregate the points for the teams for all of their games so I need to move my Home & Away Teams into the same column.
Prep: I added a pivot step after my initial clean step. I’m pivoting columns to rows and added my home and away teams to the pivoted fields. I’ll then rename the Pivot 1 Names field to Home Away
Power Query: To reshape the data I needed to unpivot my columns (interesting that I am pivoting in one tool and unpivoting in another). I selected the Home Team & Away Team columns and clicked on the drop down next to Unpivot Columns and selected the first option Unpivot Columns.

this is an example of the pivot in Tableau Prep and the unpivot in Power Query
Pivot in Prep & Unpivot in Power Query

CREATE POINTS & GOAL DIFFERENCE:

The next thing to do is create the points and the goal difference. A win is worth 3 points, a draw is worth 1, and a loss is worth 0. There could be an easier way to do this but I did this through 2 calculated fields in Prep and a series of custom columns in Power Query.
POINTS:
Prep: I created this calculated field:
IF [Home Away] = ‘Home Team’
AND [Home Score] > [Away Score]
THEN 3
ELSEIF [Home Away] = ‘Away Team’
AND [Away Score] > [Home Score]
THEN 3
ELSEIF [Home Score] = [Away Score]
THEN 1
ELSE 0
END
Power Query: I created these custom columns:
Home Points:
if [Team Type] = “Home Team”
and [Home Score] > [Away Score]
then 3
else 0
Away Points:
if [Team Type] = “Away Team”
and [Away Score] > [Home Score]
then 3
else 0
Draw Points:
created a conditional column where if the home score = away score it returned 1 else 0
Total Points:
[Home Points] + [Away Points] + [Draw Points]
I will go back and change how these are done but when I work with something new I break things into small chunks and that is what I did here. I’d guess that I can do this in a similar manner as I did in Prep but for my first time I wanted to do it this way.

GOAL DIFFERENCE:
Prep: I created this custom calculation:
IF [Home Away] = ‘Home Team’
THEN [Home Score] – [Away Score]
ELSEIF [Home Away] = ‘Away Team’
THEN [Away Score] – [Home Score]
END
Power Query:
if [Team Type] = “Home Team”
then [Home Score] – [Away Score]
else
if [Team Type] = “Away Team”
then [Away Score] – [Home Score]
else 0


AGGREGATE:

We’ve gotten to the point where we can summarize the number of points & the goal difference by team. We’ll need to rank the teams by these new metrics and will need to calculate them with the big 6 teams and without the big 6 teams.
Prep: I added two aggregate steps, one is filtered to exclude the big six teams and one includes every team. The two aggregates are identical outside of the big 6 filter.

this is an example of the aggregate step in Tableau Prep
Aggregate Step in Prep Non Big 6 Teams

Power Query: At this point I was a little lost as to what to do. I thought maybe I should add pivot tables and then find a way to combine them. Unrelated to the aggregate I asked how to replicate a LOD calc in Power Query and Jorge Supelano mentioned the group by feature and that was the clue I needed.
These are the steps I took to get the summarized data:
Did a close and load on the cleaned data and created another query off that range.
Used the group by option to summarize the data

this is an example of the group by function in Power Query
Power Query Group by

Did a close and load to a new worksheet
Created a new query off of the summarized data and filtered those rows to exclude the big 6 teams.
Created the same group by as above and did a close and load to a new worksheet.


I’m impressed if you are still reading this.

RANKINGS:

We’re getting close to the end. The next thing we need to do is create the rankings. The primary rank is on the total points and ties are broken with the goal differential. The approach in Prep & Power Query was similar.
PREP:
To get the position I used the ROW_NUMBER() function. I ordered the data descending by the points and the goal difference. Remember I branched off the flow to have an all section & a non big 6 section so this is replaced in both steps in the flow.
{ORDERBY [Match Points] DESC, [Goal Difference] DESC : ROW_NUMBER()}
POWER QUERY:
I edited the aggregates I loaded above to add in the ranking.
The first step was to sort both the points & difference in descending order. I did this by clicking on the heading and selecting the Z to A on the Home tab.
The next step was to add a column that has the ranking, earlier in the flow I added an index column to give each match an unique id. I used that same logic to get the position ranking.
After both the all summarized & non big 6 summarized sets had the ranking I closed and loaded those back into the workbook.


FINAL TIDY:

You’ve made it to the end! Now that we’ve got the summarized data with and without the big 6 in each tool we need to combine them to see the difference in the rankings.
PREP: This is a straight forward inner join of my two branches on the team name in Prep

example of the join step in Tableau Prep
Join Step in Tableau Prep

POWER QUERY: I was again totally lost on how to combine my two sets together in Power Query. When I asked about the LOD Spencer Baucke mentioned merge in his reply to me. Once again another clue from the Twitter data community. I didn’t want to mess up my existing queries so I again created a new query off of the non big 6 data and when that I loaded I found the Merge Queries option in the Home menu.
I merged or joined the All Summarized query on the team name by selecting the field in both sections and picking and inner join.

example of the merge queries functionality in Power Query
Merge Queries in Power Query

Now that I have the rankings with and without the big 6 I can subtract the rankings and clean up the columns to just include what the challenge is looking for.


FINAL THOUGHTS:

I haven’t used Excel much over the last few years. I’m either using SQL, Tableau Desktop, or Tableau Prep for my data needs. I’m glad I took the time to do an initial test of Power Query and look forward to learning more about what it can do. Thanks for reading this and hopefully you’re inspired to check out Power Query.

Published by

Kate Brown

golfer, yogi, history major data loving person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s