HOW TO PREP TO SQL – SUB-QUERIES

In this installment of the How to Prep to SQL series I’m going to review how you would translate a Prep flow into SQL using a sub-query. A sub-query is a nested query within a SQL statement. All of the examples below use Tableau Prep version 2020.2.3 and MS SQL Server and utilize the Tableau Superstore data set.

In this example I want to return all of the orders from Superstore where the discount is greater than the average discount for all the orders.

There are two different ways I would do this in Prep.

  • Connect to the Superstore data set
  • Insert a clean step and keep only the order ID & discount fields
  • Add an aggregate step and calculate the average discount
  • Join the aggregate and the first clean step together where the discount from the clean step is greater than the average discount

The final flow will look like this

The second approach in Prep is to use a LOD (level of detail calculation) in the first step. The calculation is {FIXED : AVG([Discount])}. Then create another calculation to evaluate if the discount is greater than the average discount [Discount] > [Avg Discount] and then right click on True and select keep only.

The final flow will look like this:

To do this in SQL, first, I am going to write a select statement to return the order id & discount fields from my orders table in the Superstore database:

SELECT
O.[Order ID],
O.[Discount]
FROM Superstore.dbo.Orders$ O

To get the average discount I am going to use a sub-query within the select statement. The sub-query is a select statement that will calculate the average discount from the orders table

SELECT
O.[Order ID],
O.[Discount],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT] –this is the sub-query
FROM Superstore.dbo.Orders$ O

To limit the results to just the orders that are over the average discount we need to add a where clause that limits the orders to just those where the discount was above the average discount. We’ll do this by saying the discount is greater than the sub-query that calculates the average discount. (in the first prep example this was done through the join. in the second prep example this was done by creating the boolean field and then right clicking on true and selecting keep only)

The query is:

SELECT
O.[Order ID],
O.[Discount],
(SELECT AVG([Discount]) FROM Superstore.dbo.Orders$) AS [AVG DISCOUNT]
FROM Superstore.dbo.Orders$ O
WHERE O.[Discount] > (SELECT AVG([Discount]) FROM Superstore.dbo.Orders$)

PrepSQLSelect3

This is one example of how you could use Prep to help you understand sub-queries. If you are interested in another example or have any other questions on how to learn SQL by using Prep please let me know!

HOW TO PREP to SQL – DATES

This is the second post in a series on how you can use Tableau Prep to learn SQL. In this post I’ll go over how to create custom dates in Prep & SQL. This may also be handy for people who know SQL and aren’t as familiar with the Tableau date functions. The SQL and Prep examples are all based on the Tableau Superstore data set. The SQL is created for MS SQL Server, if you are using a different flavor of SQL the syntax may be different.

The beauty of Prep is that there is point and click functionality that makes creating date dimensions easy. Need to change a timestamp to a date? Click on the field type icon in the column and change it to a date. Need to change a date to a year number? Click column that has your date field and click on the … Click on the menu that says Convert Dates and you’ll see these options. Note if you want to maintain the date field duplicate the field and do the transformation on the new duplicated field.

this is an image of a convert date menu in Tableau Prep. The options are date & time, year number, quarter number, month number, week number, and day of the month
build in convert date options in Tableau Prep

 

If you’re wondering how to create these date dimensions in SQL, the table below will have a snippet of code. The full query is listed below the table.

Date Type (field type returned)TableauSQL
Timestamp to Date (Date)DATE([Order Date]) or click on the field type and select dateCAST(O.[Order Date] AS DATE) AS ORDER_DATE
Day of Month (Number)DAY([Order Date])DAY(O.[Order Date]) AS DAY_NUM
Day of Year (Number)DATEPART(‘dayofyear’,[Order Date])DATENAME(dy,O.[Order Date]) AS DAY_OF_YR_NUM
Epoch Date (Number)DATEDIFF(‘second’,#1970-01-01#,[Order Date])DATEDIFF(s, ‘1970-01-01 00:00:00’, O.[Order Date]) AS EPOCH_DATE
Day of Week Name (Text)DATENAME(‘weekday’,[Order Date])DATENAME(WEEKDAY,O.[Order Date])
Day of Week (Number)DATEPART(‘weekday’,[Order Date])DATEPART(dw,O.[Order Date])
Week Start (Date)DATE(DATETRUNC(‘week’,[Order Date]))DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE
Month (Number)MONTH([Order Date]) or use convert date menuMONTH(O.[Order Date]) AS MONTH_NUM
Month Name (Text)DATENAME(‘month’,[Order Date])DATENAME(MONTH,O.[Order Date]) AS MONTH_NAME
Last Day of Month (Date)DATE(DATEADD(‘day’,-1,
DATEADD(‘month’, 1, DATETRUNC(‘month’,[Order Date]))
))
EOMONTH(O.[Order Date]) AS LAST_DAY_OF_MONTH
First Day of Month (Date)DATE(DATETRUNC(‘month’,[Order Date]))DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1)) AS FIRST_DAY_OF_MONTH
Max Month in Data Set (Date){FIXED : MAX(DATE(DATETRUNC(‘month’,[Order Date])))}(SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MAX_MONTH
Min Month in Date Set (Date){FIXED : MIN(DATE(DATETRUNC(‘month’,[Order Date])))}(SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MIN_MONTH
Today’s Month (Date)DATE(DATETRUNC(‘month’, TODAY()))DATEADD(DAY, 1, EOMONTH(GETDATE(), -1))
Year (Number)YEAR([Order Date]) or use convert date menuYEAR(O.[Order Date]) AS YEAR_NUM
Today’s Year (Number)DATEPART(‘year’,TODAY())YEAR(GETDATE()) AS TODAY_YEAR
Current Date Time (Timestamp)NOW()GETDATE() AS TODAY_DATETIME
Quarter Number (Number)DATEPART(‘quarter’, [Order Date])DATEPART(qq,O.[Order Date]) AS QUARTER_NUM
Quarter Start (Date)DATETRUNC(‘quarter’,[Order Date])CAST(DATEADD(qq, DATEDIFF(qq, 0, O.[Order Date]), 0) AS DATE) AS QUARTER_START_DATE
Quarter End Date (Date)DATEADD(‘day’,-1,DATEADD(‘quarter’,1,DATETRUNC(‘quarter’,[Order Date])))CAST(DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, O.[Order Date]) +1, 0)) AS DATE) AS QUARTER_END_DATE

This is the query in SQL:

SELECT
O.[Order ID],
CAST(O.[Order Date] AS DATE) AS ORDER_DATE,
DAY(O.[Order Date]) AS DAY_NUM,
DATENAME(dy,O.[Order Date]) AS DAY_OF_YR_NUM,
DATEDIFF(s, ‘1970-01-01 00:00:00’, O.[Order Date]) AS EPOCH_DATE,
DATENAME(WEEKDAY,O.[Order Date]) WEEKDAY_NAME,
DATEPART(dw,O.[Order Date]) AS WEEKDAY_NUM,
DATEADD(dd, -(DATEPART(dw, O.[Order Date])-1), O.[Order Date]) AS WEEK_START_DATE,
MONTH(O.[Order Date]) AS MONTH_NUM,
DATENAME(MONTH,O.[Order Date]) AS MONTH_NAME,
EOMONTH(O.[Order Date]) AS LAST_DAY_OF_MONTH,
DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1)) AS FIRST_DAY_OF_MONTH,
(SELECT MAX(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MAX_MONTH,
(SELECT MIN(DATEADD(DAY, 1, EOMONTH(O.[Order Date], -1))) FROM Superstore.dbo.Orders$ O) MIN_MONTH,
DATEADD(DAY, 1, EOMONTH(GETDATE(), -1)) AS TODAY_MONTH,
YEAR(O.[Order Date]) AS YEAR_NUM,
YEAR(GETDATE()) AS TODAY_YEAR,
GETDATE() AS TODAY_DATETIME,
CAST(GETDATE() AS DATE) AS TODAY_DATE,
DATEPART(qq,O.[Order Date]) AS QUARTER_NUM,
CAST(DATEADD(qq, DATEDIFF(qq, 0, O.[Order Date]), 0) AS DATE) AS QUARTER_START_DATE,
CAST(DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, O.[Order Date]) +1, 0)) AS DATE) AS QUARTER_END_DATE,
FROM Superstore.dbo.Orders$ O

 

Hope that you found this helpful.

 

 

 

HOW TO PREP to SQL – PIVOT

A very long time ago, my first introduction to SQL was through the SQL view in MS Access. I would build a query and then look at the SQL view to see how it translated that into SQL. Using MS Access gave me a decent foundation in how to build queries and connect tables.

Fast forward about 20 years and Tableau Prep is now my go to tool to help people learn how to manipulate and clean data. The built in functionality in Prep makes it a breeze to get your data in the format you want, but, there is no SQL view in Prep to look at and learn from. This is the first in a series of posts on how you would replicate steps from Prep in SQL.

For this example I’ve connected to the superstore data set and build a simple flow that counts the number of orders by ship mode and segment. In Prep we would do this by adding an aggregate step (the sigma step) that counts the orders by segment and ship mode.

The next step is to pivot (or reshape the data). In this step I am moving the ship mode to the columns so this is a rows to columns pivot step.

This is how to achieve the same result by using SQL, First create a query that pulls the fields in from the Superstore data set.

SELECT (select is saying pick these fields)
S.Segment, (separate your selected fields with a comma)
S.[Ship Mode], (use [] when your field name has a space)
S.[Order ID]
FROM SUPERSTORE S
(from is where you set the data source. The “S” after the table name is an alias. I think it is a good practice to use alias, especially when you start joining tables)

This gets the initial data set that we want. Now we want to add on to this query to move the Ship Mode to the columns and count the order ids.

SELECT * FROM (this is new and we add this to select all of the fields from the statement)

(SELECT (this is the query from above)
S.Segment,
S.[Ship Mode],
S.[Order ID]
FROM SUPERSTORE S) O

PIVOT( (this starts the reshaping of the data)
COUNT(O.[Order ID]) (we want to get a count of the order id)
FOR O.[Ship Mode] IN (this specifies the values that we want to include in our pivot. These are the values in the ship mode field. you can enter all or just a few of the values)
([First Class],
[Second Class],
[Standard Class],
[Same Day])
) AS ORDERS_SHIPMODE

Here is the query without my notes:

SELECT * FROM

(SELECT
S.Segment,
S.[Ship Mode],
S.[Order ID]
FROM SUPERSTORE.dbo.Orders$ S) O

PIVOT(
COUNT(O.[Order ID])
FOR O.[Ship Mode] IN
([First Class],
[Second Class],
[Standard Class],
[Same Day])
) AS ORDERS_SHIPMODE

 

Please note this is based on SQL for MS SQL Server. Oracle has a PIVOT function as well but the syntax is different. If you are copying the above query be sure to remove my comments in the () in the statement.

Hope this helps you understand how to do a simple pivot in SQL.