In this installment of the How to Prep to SQL series I’m going to review how you would translate a Prep flow into SQL using a sub-query. A sub-query is a nested query within a SQL statement. All of the examples below use Tableau Prep version 2020.2.3 and MS SQL Server and utilize the Tableau Superstore data set.

In this example I want to return all of the orders from Superstore where the discount is greater than the average discount for all the orders.

There are two different ways I would do this in Prep.

  • Connect to the Superstore data set
  • Insert a clean step and keep only the order ID & discount fields
  • Add an aggregate step and calculate the average discount
  • Join the aggregate and the first clean step together where the discount from the clean step is greater than the average discount

The final flow will look like this

The second approach in Prep is to use a LOD (level of detail calculation) in the first step. The calculation is {FIXED : AVG([Discount])}. Then create another calculation to evaluate if the discount is greater than the average discount [Discount] > [Avg Discount] and then right click on True and select keep only.

The final flow will look like this:

To do this in SQL, first, I am going to write a select statement to return the order id & discount fields from my orders table in the Superstore database:

O.[Order ID],
FROM Superstore.dbo.Orders$ O

To get the average discount I am going to use a sub-query within the select statement. The sub-query is a select statement that will calculate the average discount from the orders table

O.[Order ID],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT] –this is the sub-query
FROM Superstore.dbo.Orders$ O

To limit the results to just the orders that are over the average discount we need to add a where clause that limits the orders to just those where the discount was above the average discount. We’ll do this by saying the discount is greater than the sub-query that calculates the average discount. (in the first prep example this was done through the join. in the second prep example this was done by creating the boolean field and then right clicking on true and selecting keep only)

The query is:

O.[Order ID],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT]
FROM Superstore.dbo.Orders$ O
WHERE O.[Discount] > (SELECT AVG([Discount]) FROM Superstore.dbo.Orders$)


This is one example of how you could use Prep to help you understand sub-queries. If you are interested in another example or have any other questions on how to learn SQL by using Prep please let me know!


This is the second post in a series on how you can use Tableau Prep to learn SQL. In this post I’ll go over how to create custom dates in Prep & SQL. This may also be handy for people who know SQL and aren’t as familiar with the Tableau date functions. The SQL and Prep examples are all based on the Tableau Superstore data set. The SQL is created for MS SQL Server, if you are using a different flavor of SQL the syntax may be different.

The beauty of Prep is that there is point and click functionality that makes creating date dimensions easy. Need to change a timestamp to a date? Click on the field type icon in the column and change it to a date. Need to change a date to a year number? Click column that has your date field and click on the … Click on the menu that says Convert Dates and you’ll see these options. Note if you want to maintain the date field duplicate the field and do the transformation on the new duplicated field.

this is an image of a convert date menu in Tableau Prep. The options are date & time, year number, quarter number, month number, week number, and day of the month
build in convert date options in Tableau Prep


If you’re wondering how to create these date dimensions in SQL, the table below will have a snippet of code. The full query is listed below the table.

Date Type (field type returned)TableauSQL
Timestamp to Date (Date)DATE([Order Date]) or click on the field type and select dateCAST(O.[Order Date] AS DATE) AS ORDER_DATE
Day of Month (Number)DAY([Order Date])DAY(O.[Order Date]) AS DAY_NUM
Day of Year (Number)DATEPART(‘dayofyear’,[Order Date])DATENAME(dy,O.[Order Date]) AS DAY_OF_YR_NUM
Epoch Date (Number)DATEDIFF(‘second’,#1970-01-01#,[Order Date])DATEDIFF(s, ‘1970-01-01 00:00:00’, O.[Order Date]) AS EPOCH_DATE
Day of Week Name (Text)DATENAME(‘weekday’,[Order Date])DATENAME(WEEKDAY,O.[Order Date])
Day of Week (Number)DATEPART(‘weekday’,[Order Date])DATEPART(dw,O.[Order Date])
Week Start (Date)DATE(DATETRUNC(‘week’,[Order Date]))DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE
Month (Number)MONTH([Order Date]) or use convert date menuMONTH(O.[Order Date]) AS MONTH_NUM
Month Name (Text)DATENAME(‘month’,[Order Date])DATENAME(MONTH,O.[Order Date]) AS MONTH_NAME
Last Day of Month (Date)DATE(DATEADD(‘day’,-1,
DATEADD(‘month’, 1, DATETRUNC(‘month’,[Order Date]))
First Day of Month (Date)DATE(DATETRUNC(‘month’,[Order Date]))DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1)) AS FIRST_DAY_OF_MONTH
Max Month in Data Set (Date){FIXED : MAX(DATE(DATETRUNC(‘month’,[Order Date])))}(SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MAX_MONTH
Min Month in Date Set (Date){FIXED : MIN(DATE(DATETRUNC(‘month’,[Order Date])))}(SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MIN_MONTH
Today’s Month (Date)DATE(DATETRUNC(‘month’, TODAY()))DATEADD(DAY, 1, EOMONTH(GETDATE(), -1))
Year (Number)YEAR([Order Date]) or use convert date menuYEAR(O.[Order Date]) AS YEAR_NUM
Current Date Time (Timestamp)NOW()GETDATE() AS TODAY_DATETIME
Quarter Number (Number)DATEPART(‘quarter’, [Order Date])DATEPART(qq,O.[Order Date]) AS QUARTER_NUM
Quarter Start (Date)DATETRUNC(‘quarter’,[Order Date])CAST(DATEADD(qq, DATEDIFF(qq, 0, O.[Order Date]), 0) AS DATE) AS QUARTER_START_DATE
Quarter End Date (Date)DATEADD(‘day’,-1,DATEADD(‘quarter’,1,DATETRUNC(‘quarter’,[Order Date])))CAST(DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, O.[Order Date]) +1, 0)) AS DATE) AS QUARTER_END_DATE

This is the query in SQL:

O.[Order ID],
DAY(O.[Order Date]) AS DAY_NUM,
DATEDIFF(s, ‘1970-01-01 00:00:00’, O.[Order Date]) AS EPOCH_DATE,
DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE,
(SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MAX_MONTH,
(SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MIN_MONTH,
YEAR(O.[Order Date]) AS YEAR_NUM,
FROM Superstore.dbo.Orders$ O


Hope that you found this helpful.





A very long time ago, my first introduction to SQL was through the SQL view in MS Access. I would build a query and then look at the SQL view to see how it translated that into SQL. Using MS Access gave me a decent foundation in how to build queries and connect tables.

Fast forward about 20 years and Tableau Prep is now my go to tool to help people learn how to manipulate and clean data. The built in functionality in Prep makes it a breeze to get your data in the format you want, but, there is no SQL view in Prep to look at and learn from. This is the first in a series of posts on how you would replicate steps from Prep in SQL.

For this example I’ve connected to the superstore data set and build a simple flow that counts the number of orders by ship mode and segment. In Prep we would do this by adding an aggregate step (the sigma step) that counts the orders by segment and ship mode.

The next step is to pivot (or reshape the data). In this step I am moving the ship mode to the columns so this is a rows to columns pivot step.

This is how to achieve the same result by using SQL, First create a query that pulls the fields in from the Superstore data set.

SELECT (select is saying pick these fields)
S.Segment, (separate your selected fields with a comma)
S.[Ship Mode], (use [] when your field name has a space)
S.[Order ID]
(from is where you set the data source. The “S” after the table name is an alias. I think it is a good practice to use alias, especially when you start joining tables)

This gets the initial data set that we want. Now we want to add on to this query to move the Ship Mode to the columns and count the order ids.

SELECT * FROM (this is new and we add this to select all of the fields from the statement)

(SELECT (this is the query from above)
S.[Ship Mode],
S.[Order ID]

PIVOT( (this starts the reshaping of the data)
COUNT(O.[Order ID]) (we want to get a count of the order id)
FOR O.[Ship Mode] IN (this specifies the values that we want to include in our pivot. These are the values in the ship mode field. you can enter all or just a few of the values)
([First Class],
[Second Class],
[Standard Class],
[Same Day])

Here is the query without my notes:


S.[Ship Mode],
S.[Order ID]

COUNT(O.[Order ID])
FOR O.[Ship Mode] IN
([First Class],
[Second Class],
[Standard Class],
[Same Day])


Please note this is based on SQL for MS SQL Server. Oracle has a PIVOT function as well but the syntax is different. If you are copying the above query be sure to remove my comments in the () in the statement.

Hope this helps you understand how to do a simple pivot in SQL.


morality and grocery shopping

I haven’t stepped foot in a grocery store in almost 7 weeks. Since Massachusetts went into a stay at home advisory I’ve avoided going into stores and have relied on Instacart, Amazon Fresh, USPS, UPS, ShipIt, and a local meat farm for my food and necessities. Every time I use these services, I feel guilty and struggle with the decision. Is it morally right for me to pay someone to do something I am uncomfortable doing?

My moral compass has always been guided by fairness, and it has been strong moral compass. I’ve made people uncomfortable when I refused to laugh at a joke that degraded someone because they were different and made career decisions based on how teammates were treated. And now I’m not sure if I am off course. Is it morally right for me to expect someone to do something I am uncomfortable doing?

I assume that the people who have been shopping for me are in a different situation than I am. We are DINKs and my husband and I are still employed. I recognize that we are lucky. While I don’t know for sure, I would guess that the people who have been doing my shopping are putting themselves in at risk because they need to. When I’ve discussed this with friends, they’ve tried to re-assure me by saying you’re supporting someone who needs work. All I can think is that I am putting someone in potential danger. Am I harming someone because I don’t want to do something?

I see multiple sides of this and have come to the realization there is no real answer to this. We’re all trying to do the best we can to make the right decisions.

Virtual Prep TUG Example

I was honored to present at the first virtual Tableau Prep user group session and was both thrilled and a bit nervous when I found out 1,200 people had registered for the session. Thrilled because I was glad to see that there were so many people interested in Prep and nervous because I hadn’t presented to that many people before. It was a great TUG and I learned something from my co-presenters Joshua, Jenny, and Rahim and am appreciative of Jack & Caroline’s efforts in making the TUG happen. The link to the recording and the packaged flow are posted on the Tableau forum.

My presentation was based on a real life example where I had to reverse the management levels in an employee hierarchy. The people hierarchy file has an employee’s immediate leader as their first management level, their immediate leader’s leader as their second and then works up the hierarchy to the company leader. I need to reverse that so the company leader is the first management level and then works down the hierarchy so the immediate leader is in the last hierarchy level.

As you can see in the Excel employees have a different number of leaders so the top level leader could be in any one of the columns. When I was working through this I noticed all of the management levels headers have a number in them and my initial thought was I could use that number as the way to reverse the levels.

After connecting to the file in Prep I added an initial clean step and created one calculated field called Dummy Field for Aggregate Joins with “A” as the calculation. (I’ll get back to why I created this towards the end of the post.)

I wanted to extract the number from the management level headers and use that to create a new reversed level. In order to extract the number I need to reshape my data with a pivot step. Moving the columns to rows puts all of my managers in one column and creates a new field called Pivot 1 Names which were my old column headers.

After the pivot I added a clean step and made 7 changes:

  1. excluded any records with a null management level
  2. duplicated the pivot names field
  3. use the built in clean function to remove all the letters from the field created in step 2
  4. removed all of the spaces from the new field
  5. changed the field type from a string to a number
  6. renamed the new field management level
  7. removed the Pivot 1 Names field

These steps created a new field with the number from the management level header. I duplicated the original pivot names field because I am in the habit of keeping the original field as a comparison point when I change the values. You do not have to do this, it is just a personal preference.

The built in clean tools to remove letters and spaces is accessed by selecting the column you want to clean and then clicking on the 3 dots. That opens a new menu and you’ll see the clean option.

The next step was to get the max management level number for each employee. When I have the max level I will be able to subtract that from the management level we pulled out of the header to get the new management level. To get the max level I added an aggregate step, grouped by the employee and added the management level number to the aggregated fields and changed the calculation to max. I then joined that max level back to my data to add that field in. Note that in the latest version of Prep you can now do this with a LOD (level of detail) calculation this functionality didn’t exist when I created the flow.

Now that I have the highest management level for each employee I can subtract that from the management level and add 1 to that to get the reversed level. I created this calculation ([Max Mgmt Level] – [Manager Level]) +1. I also created a new header field with the reversed level with this calculation “Level ” + STR([Reverse Mgmt Level]) + ” Manager”

In this snippet of data you can see that Albert Norman has Brittany Newman has his 1st management level, his highest management level is 5. When that is reversed Patti Reed who is the top level is now the level 1 manager and Brittany is the level 5 manager.

I cleaned up a few fields and then added another pivot to move the new management levels back to columns. This pivot is a rows to column pivot and because I know there is only 1 value for each level I am taking the min of the manager name.

The last thing to do is to add Patti Reed back to the cleaned data. Patti is the CEO of Fake Company and does not have a management level. When we excluded the null management levels after the first pivot she was removed from the data set. I created a branch for just Patti and unioned that back to the cleaned data set.

Earlier in this post I mentioned that I created a dummy field with “A” in it. I like to add aggregates into my flows to check the record counts at different stages of the flow. I got in the habit of creating these and exporting them out because I often work with sampled data. Creating the dummy field allows me a way to join the aggregates together and validate my record counts. If you’ve downloaded the flow you’ll see these aggregates and the step that exports the counts.

Thanks for reading and I hope this example was helpful. If you have any questions please feel free to get in touch. Happy Preppin!

January 2020 Sports Viz Sunday

The topic for the first #SportsVizSunday of 2020 is personal health data. I took some leeway with the topic and looked at my golf handicap index and scores. I normally walk the golf course and golf impacts my mental health (sometimes positive and sometimes negative). There were a few times this year where I thought about buying a boat.

For #SportsVizSunday, I wanted to look at where my index fell in relation to other women who keep a handicap and highlight the scores that count towards my current index. As with most public work I do, I like to keep it simple. I spend a lot of time during the week working on dashboards so in my free time I tend to keep it light and simple.

The 2019 season was a bit all over the place for me. I struggled with my irons for the last two seasons and that definitely impacted my score. While that aspect was off the rest of my game was in good shape and that helped me get my handicap index down to an 18.4.

I play most of my golf at two different courses and wanted to see what my score and differentials looked like at those two courses. I felt like I played better at Furnace Brook because I hit my fairway woods and hybrid more than I hit my irons. The data backed that up. I scored better (based on differential) at Furnace Brook than at William J Devine.

my differential at the Brook was 4 strokes lower than at the Park

In 2020 I’m going to track more of my golf stats and visualize them to see where I can get better. I know where I struggle with my game, but, seeing the data makes it a bit more real.

#PreppinData Week 1 2020

The first #PreppinData challenge for 2020 was to recreate Ryan Sleeper’s financial statement table using data from the fictitious Chin & Beard Suds Co.

To do this we need to:

  • get the profit totals
  • get the profit subtotals
  • update the blanks with the total & subtotals
  • indent the 2nd & 3rd level data

This is the initial view of the data. Based on the naming convention of the Item my first thought was to split off the number portion of the field and use that as a way to create a hierarchy.


I used the custom split tool to parse the field off a space and took the first field. I trimmed any extra spaces and renamed this field Item Id. These are the calculations:


I then created by hierarchy levels taking the left X number of characters from the new item id field. My thought was I would use these to get the totals & subtotals. These are the fields I created for the hierarchy:


I also created a new field with the indented item names:

Format Item Name

IF LEN([Item ID]) = 2 THEN [Item]
ELSEIF LEN([Item ID]) = 3 THEN SPACE(5) + [Item]
ELSEIF LEN([Item ID]) = 5 THEN SPACE(10) + [Item]


After I had my levels I then created two aggregates to get the totals and subtotals. The first one sums the profit by my new top level field and the second one sums the profit by the new second level field. I joined both of these aggregates back to the prior step where the top level in to totals aggregate = the item id & where the second level = the second level.


The last step is the clean up step. I this step I have 11 changes to the joined data.

  • remove the duplicated fields from the joins.
  • merge the profit field from the initial step with the profits from the aggregates
  • rename the merged profit fields to profit
  • created a calculated field to get the length of the item id field to sort my rows correctly
  • renamed the Format Item Name field to item name
  • removed any remaining unnecessary fields


This was a great challenge to kick off the 2020 #PreppinData series. I love the formatting idea from Ryan and have a few ideas of how I can implement both Ryan’s table and this PreppinData challenge in my day to day work.

If anyone is interested in getting a copy of my flow please let me know. I am more than happy to share my approach.

Happy Preppin!













The Tableau Community

A quick sidebar with Mark Edwards and a message from Adam Mico on Twitter on the last day of the Tableau conference got me thinking about defining the “Tableau community” and what my Tableau community is. I’ve been noodling this for a few days now and this is what it means to me.

My Tableau community is:
Krish who works at TD bank in Toronto and attended his first Tableau conference. I was sitting at a table with some folks I “knew” from Twitter at the TUG Tips Battle Session and noticed that someone was sitting by themselves at a table. I have been that person at many events. I moved to the table, introduced myself and struck up a conversation. There are so many people like Krish who are in the Tableau community but not involved in the community projects.

My Tableau community is:
A man I had lunch with on Thursday when we arrived at the same time to a 2-seat table to eat our lunch. I spent 5 minutes with him and learned that he was new to Tableau and trying to learn as much as he could at conference. I suggested that he check out Ryan Sleeper’s blog and Playfair Data TV as resources to get him up to speed on Tableau when he got home.
A few hours later I ran into Ryan at the community area of the Data Village and ended up getting pizza with him, Sean Miller, Tom O’Hara, Will Strousse and Furkan Celik. I know Tom and Will and enjoyed getting to know Furkan, Ryan, and Sean more. Ryan even schooled the 3 Bostonians there on the time out chair at Pizzeria Regina.

My Tableau community is:
James, Simon, and Spencer who have given me exposure to the larger Tableau community by having me host #SportsVizSunday twice and by asking me to be on the half time panel of their data19 session. Giving people an opportunity to get their name out there is invaluable and I appreciate what they have done for me.

My Tableau community is:
Bridget Cogley who told me things I needed to hear and encouraged me not to settle. Don’t let the shortness of this section fool you. This was one of the most important conversations I had all week.

My Tableau community is:
All of the people I know from the Twitter community. All of the people I know from BTUG including my first TUG friends Paula Munoz and Susan Glass. All of my co-workers who use Tableau including Amar, Jesse, Josh, and Tom. All the people who have asked and answered questions on the forums that have helped me. All of the people who write blogs and do videos to share their knowledge.

The community isn’t just those with rocks, those that are ambassadors, those that are on Twitter, those involved in the community projects, and those that win community awards. The Tableau community is anyone who uses Tableau in some capacity and I can’t wait to meet more and more of those people.

Preppin Data Week 14

The #Preppin Data week 14 challenge asked us to determine the impact of implementing a meal deal at Whyte’s Cafe. I thought this was a good challenge and per usual I broke the steps out into a number of steps. Seeing the solutions for the weekly challenges made me realize that I usually have a different approach than the solution and that I like to break things out into multiple steps and this week was no different.

Here is what my final flow looks like:


I connected to the .csv file and inserted a clean step. The steps say to replace null prices to 1.5, however, I didn’t have any nulls when I imported my data so I skipped that step. They also stated to replace the null member ids with a zero. This is the only action in my first step. To do this I right clicked on the null in the member id field and selected edit value and replaced the null with 0.

The output needed to have the total ticket price, the number of meal deals, the total prices for items not included in the meal deal, and the difference between the total price and the meal deal adjusted price. I determined these through 3 different branches.

The first aggregate I created was to determine the number of meal deals per ticket. To get this I needed to know the number of items per ticket.


After the aggregate I added a pivot step to move the type from rows to columns. I did this so I could use the columns to determine the number of meal deals per ticket.


I then inserted a clean step to create the meal deal flag and the number of meal deals per ticket. This is my approach:

  • determine if the ticket has items that are meal deal eligible
    • [Drink] > 0 AND  [Snack] > 0 AND  [Main] > 0
  • right clicked on the new meal deal eligible field and kept true
  • determine number of meal deals per ticket:
    • IF [Drink] <= [Snack] AND [Drink] <= [Main] THEN [Drink] ELSEIF [Snack] <= [Drink] AND [Snack] <= [Main] THEN [Snack] ELSEIF [Main] <= [Drink] AND [Main] <= [Snack] THEN [Main] END

The next step was to get the average cost per item and ticket. I created this to use to determine the excess cost later in the flow. Because drinks, mains, and snacks all have different prices I used the average by type to get the excess cost.


I also pivoted the types to columns and then added a clean step. In the clean step I replaced any null values to a zero.

The third aggregate totaled the cost per ticket.


I added a clean step after the aggregate to round the total cost to 2 decimals. I used this calculation ROUND([Total Ticket Price],2).

After the aggregates were ready to go I joined the 3 branches together and started the final steps. These are the calculations I did in the costs step:

  • determines meal deal cost:
    • Meal Deal Total: [Min Meal Deal Item] *5
  • determine number of items that aren’t part of the meal deal (3 different calculations):
    • [Drink_Items] – [Min Meal Deal Item]
    • [Snack_Items] – [Min Meal Deal Item]
    • [Main_Items] – [Min Meal Deal Item]
  • determine cost of excess items:
    • [Excess Drink Items] * [Drink_Price]
    • [Excess Snack Items] * [Snack_Price]
    • [Excess Main Items] * [Main_Price]
  • determine excess total:
    • ROUND([Excess Drink Cost] + [Excess Main Cost] + [Excess Snack Cost],2)

Because I like to break things up I inserted another clean step. These are the calculations I used to get the final output.

  • determine the new total price using the meal deal:
    • [Meal Deal Total] + [Excess Total]
  • determine the cost difference:
    • [Total Ticket Price] – [Total Ticket Meal Deal Adj]

My meal deal items and costs matched the solution output but my prices do not and I haven’t been able to figure that out. I have done pivots in excel to get the cost by ticket and they match what I have in my output. I decided to not get too hung up on that an use this week as a concept and not exact match.

I enjoyed this week’s challenge and I continue to enjoy seeing the different approaches to the challenge each week.


Different Ways of Getting the Same Result

There are usually a number of different ways to get to the same result in Tableau & Tableau Prep. Week 9 of #PreppinData is another example of this.

For this edition of #PreppinData we looked at Chin & Beard Suds Co’s Twitter complaints. We were given a list of complaints and asked to:

Remove Chin & Beard Suds Co Twitter handle
Split the tweets up in to individual words
Pivot the words so we get just one column of each word used in the tweets
Remove the 250 most common words in the English language (sourced from here for you:
Output a list of words used alongside the original tweet so we know the context for how the word was used.

Here’s the flow I created


The first clean step splits the text on the space character. The next step pivots all of those splits back together in one column. In this pivot I used a wildcard pivot where the field name contained Tweet –


After consolidating the splits, I did a few clean steps to get the two sources ready for joining. Anytime that I join on text I always make sure to trim all the extra spaces and make the text fields either upper or lower case. I think this is a good habit to be in for text matching. I also excluded the company Twitter handle and any null records in this step.

week9d  week9e

Now it is time to join them together. The first join I did was an outer join between the tweets and the list of the 250 words. In the step after the join I kept only the records from the 250 list that were null (I used the rank field).

The other way this join can be done is with the left unmatched only join type. When you use this join type all you need to do is remove the two fields from the 250 list.

Initially I didn’t think of the other join type and found a way to get to the final result. Going back and looking at the joins the second option is probably the better way to go. There isn’t a wrong and right way to do it just different approaches.

Happy Preppin!