I was honored to present at the first virtual Tableau Prep user group session and was both thrilled and a bit nervous when I found out 1,200 people had registered for the session. Thrilled because I was glad to see that there were so many people interested in Prep and nervous because I hadn’t presented to that many people before. It was a great TUG and I learned something from my co-presenters Joshua, Jenny, and Rahim and am appreciative of Jack & Caroline’s efforts in making the TUG happen. The link to the recording and the packaged flow are posted on the Tableau forum.
My presentation was based on a real life example where I had to reverse the management levels in an employee hierarchy. The people hierarchy file has an employee’s immediate leader as their first management level, their immediate leader’s leader as their second and then works up the hierarchy to the company leader. I need to reverse that so the company leader is the first management level and then works down the hierarchy so the immediate leader is in the last hierarchy level.
As you can see in the Excel employees have a different number of leaders so the top level leader could be in any one of the columns. When I was working through this I noticed all of the management levels headers have a number in them and my initial thought was I could use that number as the way to reverse the levels.
After connecting to the file in Prep I added an initial clean step and created one calculated field called Dummy Field for Aggregate Joins with “A” as the calculation. (I’ll get back to why I created this towards the end of the post.)
I wanted to extract the number from the management level headers and use that to create a new reversed level. In order to extract the number I need to reshape my data with a pivot step. Moving the columns to rows puts all of my managers in one column and creates a new field called Pivot 1 Names which were my old column headers.
After the pivot I added a clean step and made 7 changes:
- excluded any records with a null management level
- duplicated the pivot names field
- use the built in clean function to remove all the letters from the field created in step 2
- removed all of the spaces from the new field
- changed the field type from a string to a number
- renamed the new field management level
- removed the Pivot 1 Names field
These steps created a new field with the number from the management level header. I duplicated the original pivot names field because I am in the habit of keeping the original field as a comparison point when I change the values. You do not have to do this, it is just a personal preference.
The built in clean tools to remove letters and spaces is accessed by selecting the column you want to clean and then clicking on the 3 dots. That opens a new menu and you’ll see the clean option.
The next step was to get the max management level number for each employee. When I have the max level I will be able to subtract that from the management level we pulled out of the header to get the new management level. To get the max level I added an aggregate step, grouped by the employee and added the management level number to the aggregated fields and changed the calculation to max. I then joined that max level back to my data to add that field in. Note that in the latest version of Prep you can now do this with a LOD (level of detail) calculation this functionality didn’t exist when I created the flow.
Now that I have the highest management level for each employee I can subtract that from the management level and add 1 to that to get the reversed level. I created this calculation ([Max Mgmt Level] – [Manager Level]) +1. I also created a new header field with the reversed level with this calculation “Level ” + STR([Reverse Mgmt Level]) + ” Manager”
In this snippet of data you can see that Albert Norman has Brittany Newman has his 1st management level, his highest management level is 5. When that is reversed Patti Reed who is the top level is now the level 1 manager and Brittany is the level 5 manager.
I cleaned up a few fields and then added another pivot to move the new management levels back to columns. This pivot is a rows to column pivot and because I know there is only 1 value for each level I am taking the min of the manager name.
The last thing to do is to add Patti Reed back to the cleaned data. Patti is the CEO of Fake Company and does not have a management level. When we excluded the null management levels after the first pivot she was removed from the data set. I created a branch for just Patti and unioned that back to the cleaned data set.
Earlier in this post I mentioned that I created a dummy field with “A” in it. I like to add aggregates into my flows to check the record counts at different stages of the flow. I got in the habit of creating these and exporting them out because I often work with sampled data. Creating the dummy field allows me a way to join the aggregates together and validate my record counts. If you’ve downloaded the flow you’ll see these aggregates and the step that exports the counts.
Thanks for reading and I hope this example was helpful. If you have any questions please feel free to get in touch. Happy Preppin!