There are usually a number of different ways to get to the same result in Tableau & Tableau Prep. Week 9 of #PreppinData is another example of this.
For this edition of #PreppinData we looked at Chin & Beard Suds Co’s Twitter complaints. We were given a list of complaints and asked to:
Remove Chin & Beard Suds Co Twitter handle
Split the tweets up into individual words
Pivot the words so we get just one column of each word used in the tweets
Remove the 250 most common words in the English language (sourced from here for you: http://www.anglik.net/english250.htm)
Output a list of words used alongside the original tweet so we know the context for how the word was used.
Here’s the flow I created
The first clean step splits the text on the space character. The next step pivots all of those splits back together into one column. For this pivot I used a wildcard pivot on fields whose names contained “Tweet”.
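For readers who think in code, here is a rough Python sketch of what the split and pivot steps do together. It is not how Tableau Prep works internally, and the sample tweets, the `@ExampleHandle` handle, and the `Tweet`/`Word` field names are all made up for illustration.

```python
# Hypothetical sample tweets; the real challenge input is not reproduced here.
tweets = [
    "@ExampleHandle my soap arrived broken",
    "@ExampleHandle the beard oil smells odd",
]


def split_and_pivot(tweets):
    """Split each tweet on the space character, then stack every piece
    into a single Word column, keeping the original tweet for context."""
    rows = []
    for tweet in tweets:
        for word in tweet.split(" "):
            rows.append({"Tweet": tweet, "Word": word})
    return rows


rows = split_and_pivot(tweets)
for row in rows:
    print(row["Word"], "|", row["Tweet"])
```

The sketch collapses the two Prep steps into one loop. In Prep the split first creates one field per word, and the pivot then stacks those fields into one column.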
After consolidating the splits, I did a few clean steps to get the two sources ready for joining. Anytime that I join on text I always make sure to trim all the extra spaces and make the text fields either upper or lower case. I think this is a good habit to be in for text matching. I also excluded the company Twitter handle and any null records in this step.
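The same text-matching habit looks like this as a small Python sketch. The handle value is a placeholder (the real one is in the challenge data), and the function name is my own.

```python
# Placeholder handle, assumed for illustration only.
HANDLE = "@ExampleHandle"


def clean_words(words, handle=HANDLE):
    """Trim spaces, lower-case, and drop nulls, blanks, and the company handle."""
    cleaned = []
    for word in words:
        if word is None:
            continue  # exclude null records
        word = word.strip().lower()
        if not word or word == handle.lower():
            continue  # exclude blanks and the Twitter handle
        cleaned.append(word)
    return cleaned


print(clean_words(["  Soap ", None, "@ExampleHandle", "", "BROKEN"]))
```

Applying the same trim and lower-case treatment to both sides of the join is what makes “Soap ” and “soap” match.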
Now it is time to join them together. The first join I did was an outer join between the tweets and the list of the 250 words. In the step after the join I kept only the records where the fields from the 250 list were null (I used the rank field).
The other way this join can be done is with the left unmatched only join type. When you use this join type all you need to do is remove the two fields from the 250 list.
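To show that the two join options land in the same place, here is a minimal Python sketch. The word list and ranks are invented, the outer join is modeled as a left outer join from the tweet words, and both function names are mine.

```python
# Hypothetical slice of the common-word list: word -> rank (values invented).
common_rank = {"the": 1, "of": 2, "my": 3}
words = ["my", "soap", "the", "broken"]


def join_then_filter(words, common_rank):
    """Option 1: outer join on the word, then keep rows where rank is null."""
    joined = [(word, common_rank.get(word)) for word in words]
    return [word for word, rank in joined if rank is None]


def left_unmatched_only(words, common_rank):
    """Option 2: keep only the words with no match in the common-word list."""
    return [word for word in words if word not in common_rank]


print(join_then_filter(words, common_rank))
print(left_unmatched_only(words, common_rank))
```

Both functions return the same words. The second one just skips the intermediate joined table, which is why it feels like the tidier option.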
Initially I didn’t think of the other join type and found a different way to get to the final result. Going back and looking at the joins, the second option is probably the better way to go. There isn’t a right or wrong way to do it, just different approaches.
I just finished week 2 of the Preppin Data challenge and wanted to walk through my approach. One of the things I love about Tableau and Tableau Prep is that there are a number of different ways to get at the same result.
This week Carl & Jonathan gave us a file that had a big header, names that needed to be cleaned, and metrics that needed to be moved to columns. The output needed to be 6 columns and 14 rows.
After setting my connection to the file the first thing I did was check the Use Data Interpreter box. This helper removed the unnecessary header at the top of the file.
Whenever I build something in Tableau Prep I like to add a clean step after my connection to get a sense of what is in the data. When I did this I noticed that my city field had a value called “city”. I knew from looking at the initial file that this was a secondary header, so I right-clicked on the value of city and selected Exclude.
At this point I also added an aggregate to see how many rows were in my data set. I like to add these as I build out a flow to get a sense of how my record counts change as I build out different steps.
I added another clean step because I like to partition out my changes when I build something new (I’m quirky). I could have done these all in the first step. In this step I grouped the various city names by pronunciation. This took care of all but two values. I edited the group and manually added “nodonL” to London and “3d!nburgh” to Edinburgh. In this step I also created the new header field, which combined the metric and the measure, and then removed those two fields as they were no longer needed.
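Here is a small Python sketch of the manual part of that clean step. It covers only the two hand-mapped values and the combined header field, not Prep's pronunciation grouping. The field names, the “ - ” separator, and the sample row are assumptions for illustration.

```python
# The two values that pronunciation grouping missed, mapped by hand.
MANUAL_GROUPS = {"nodonL": "London", "3d!nburgh": "Edinburgh"}


def clean_row(row):
    """Fix the city name, build the combined header, and drop the
    metric and measure fields that are no longer needed."""
    city = MANUAL_GROUPS.get(row["City"], row["City"])
    header = f'{row["Metric"]} - {row["Measure"]}'  # separator is an assumption
    return {"City": city, "Header": header, "Value": row["Value"]}


# Hypothetical input row; the real file's values are not reproduced here.
sample = {"City": "nodonL", "Metric": "Metric A", "Measure": "Max", "Value": 10}
print(clean_row(sample))
```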
The next step was to move the values from the rows to columns. This is done in a pivot step. Most of the people I help with Prep think Pivot = Pivot table and are confused when they add that step. Pivot will reshape your data. My pivoted field is my new field that I created in the prior step and my field to aggregate is the value field.
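If “reshape” is hard to picture, this Python sketch does a rows-to-columns pivot on a few made-up rows. The field names and values are hypothetical, and it assumes one value per city and header, so no real aggregation is needed.

```python
# Hypothetical long-format rows: one row per city and header.
rows = [
    {"City": "London", "Header": "Metric A - Max", "Value": 10},
    {"City": "London", "Header": "Metric A - Min", "Value": 2},
    {"City": "Edinburgh", "Header": "Metric A - Max", "Value": 8},
    {"City": "Edinburgh", "Header": "Metric A - Min", "Value": 1},
]


def pivot_rows_to_columns(rows, key="City", header="Header", value="Value"):
    """Turn each distinct header into its own column, one output row per key."""
    wide = {}
    for row in rows:
        out = wide.setdefault(row[key], {key: row[key]})
        out[row[header]] = row[value]  # assumes one value per key and header
    return list(wide.values())


wide = pivot_rows_to_columns(rows)
for row in wide:
    print(row)
```

Four long rows become two wide rows. Nothing is summarized the way a pivot table would. The data just changes shape.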
At this point I also added an aggregate step to make sure I had 14 rows as the instructions called for. This is the full view of my flow.
My goal for 2019 is to produce more personal projects on my Tableau Public page than I have in the past. I’m kicking that off by looking at the 2018 golf season for my golf group. The Fairway Ladies of Franklin Park play at the William J Devine golf course in Boston MA. To get a high level summary of what our season looked like I pulled a report from GHIN (the program we use to keep our handicaps).
I designed the dashboard to be a wide layout and kept it with a simple color scheme. I’m pleased with how it turned out. I went back and forth on the score differential chart a number of times and finally settled on the bar code chart.
I’m looking forward to doing more of this in 2019!
To keep with the popular year end theme of year in review here is my list of acknowledgements to the Tableau Community in 2018. These are in no specific order.
Susan Glass (@SusanJG1 ) & Paula Munoz (@paulisDataViz ) – I “met” Susan & Paula via Twitter and then got to meet them in person at the Boston Tableau User group this year. I’ve been a sporadic BTUG attendee for years but never really met anyone at these user groups. Sometimes the user group meetings felt like riding the T – unless you already knew someone on the train you avoid eye contact with everyone else. It is great to have some real live Tableau friends now.
Tom O’Hara (@taawwmm ) – Tom is Tableau support at Comcast. The range of questions he answers on our internal Slack & Teams Tableau boards is amazing. He’s always helpful and supportive. I hosted Sports Viz Sunday in September and was thrilled that Tom supported me by entering a viz. It’s great to have work colleagues support your personal Tableau endeavors.
Josh Tapley (@josh_tapley) & Corey Jones (@CoreyJ34) – Josh & Corey run the Philadelphia Tableau User group and gave me my first opportunity to present at a TUG. I did a live demo of Tableau Prep and enjoyed presenting more than I thought I would. I was also blown away by Corey at the TUG. There were a number of St. Joes students who presented their work, and after they were done Corey acknowledged something he liked about each one of their vizzes.
Ann Jackson (@AnnUJackson) & Luke Stanke (@lukestanke ) – Ann & Luke put out my favorite podcast Hashtag Analytics https://bit.ly/2QTiAJW. Their podcasts are great conversations on data and the Tableau community. You’re missing out if you aren’t listening to these.
The SportsVizSunday Guys (@SimonBeaumont04, @JSBaucke ,@sportschord) – Thank you for asking me to host #SportsVizSunday in September! Being asked to host September’s challenge was big for me. This was the first time I’d been invited to be more involved in a data viz project. There is a big Tableau community on Twitter and at times I’ve felt a little lost because I don’t create flashy work and I don’t have a gazillion followers. When Simon asked me to host I felt great. We all like to be recognized from time to time (even the introverts like this).
Sarah Bartlett (@sarahlovesdata) – Sarah is the Tableau ambassador on Twitter. No one else in the community welcomes and supports people like Sarah does. She also promotes new folks every week with #Tableauff. She’s also got mad skills and it was awesome to see a woman in the IronViz Europe finals this year.
Chantilly Jaggernauth (@chanjagg) & Amar Donthala (@AmarendranathD)- Chantilly & Amar created a Millennials & Data (millennialsanddata.com ) program this year to prepare millennials to enter the data driven world. This is in addition to their full time jobs at Comcast. Their first cohort of 16 produced amazing work and they all passed their Tableau Desktop Specialist Certifications. I see great things in the future for Chantilly & Amar!
There are a number of other folks who have influenced me in 2018. This list is by no means inclusive of everyone, but these are folks that I wanted to highlight.
I think creating blog posts and videos is a great way to learn something, so I decided to create a series of videos on Tableau Prep. The first one, on how to create a union, has been published. Hope you enjoy it!
In last week’s Makeover Monday recap Andy reminded us that this is a makeover. The intention is to evaluate what is good and what can be improved with a viz and create a new one with those points in mind. People can use Makeover Monday for what they want, but the intention is to improve upon the selected viz.
I normally try to take that approach but I don’t often document what I like and what can be improved so for the next few weeks I am going to attempt to put my thoughts and approach together here.
The viz this week comes from HowMuch.net and looks at R&D spending across the globe.
I like that the person who created this tried a different approach to displaying the information. They want the reader to focus on the large circles in the middle for the US, China, Japan, and Germany. What I think can be improved is the amount of clutter in the viz. There is a lot going on here between the circles, the map, the flag, and the multiple colors. I think a better approach would be to simplify the viz and draw the attention to the top 5 countries. I don’t think the flag and the country shape add to the story so I would remove them.
I selected a treemap for this week’s makeover. While treemaps may not always be the best option to compare values I think in this case it works because I want to highlight the contribution of the top 5 countries and I don’t want to compare all of the countries against each other.
Overall I think this meets the goal of drawing attention to the top countries.
Last month’s Sports Viz Sunday was the Masters. I created 3 different vizzes using Tableau Public.
The first one I created looked at how closely contested the Masters usually is. I’ve always felt that Masters Sunday was the best TV viewing day of the year and looking at the data backed that up. The tournament has only been won by 5 strokes or more 5 times.
Overall I like how this turned out. The one thing I would change is the title. I don’t feel that it gives a good takeaway of what the viz is about.
The next one I did was on Tiger’s 1997 win. Tiger won by 12 strokes, the largest margin of victory (as of this post). I wanted to see round by round how much better Tiger was than the average score for the day. Tiger is known for wearing red and black on Sundays and I used the color scheme in honor of that.
This is a simple viz and the goal was to highlight how good his 2nd and 3rd rounds were in relation to the field average score. There was a Twitter discussion about showing the better score on the bottom of the viz. In golf being under par is good, and while it may seem strange to see better on the bottom, I think it makes sense when you are looking at golf scores. If I were showing tournament position (first place, second place, etc.) it would make sense to show the best at the top, but when showing scores in relation to par I believe putting the better score at the bottom of the viz makes more sense.
The 3rd viz looked at the 1956 Masters, where Jackie Burke Jr started Sunday 8 strokes behind Ken Venturi and came back to win by 1 stroke. I wanted to show round by round how well Venturi played for the first 3 rounds and how steady Burke was. I’d like to do a more in depth analysis on this to show how great Burke’s final round was. There were only 2 players under par on Sunday and Bobby Jones said it was the toughest weather conditions the Masters had been played in. This is my favorite of the 3 and hopefully I’ll expand upon this with a more in depth analysis.
I was excited that Cole Nussbaumer Knaflic’s Storytelling with Data current challenge is to create a basic bar chart. She says “The #SWDchallenge this month is to create a basic bar chart. Nothing fancy. No need to stack it or do anything else crazy.” I love a good bar chart and have been known to say “don’t underestimate the power of the bar” more than once.
For this challenge I used data from the 2017 Masters to show which holes had the highest percent of scores over par.
At first, I sorted the data in descending order so the top three were together at the top of the chart. For other data sets I think this works, but, for this I liked keeping the holes ordered by the hole number.
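As a rough illustration of the measure and the ordering choice, here is a short Python sketch. It computes the percent of scores over par per hole, flags the hardest holes, and keeps the rows in hole-number order. The scores and pars below are invented sample data, not the 2017 Masters data, and the function names are mine.

```python
# Invented sample data: hole -> list of scores, and hole -> par.
scores_by_hole = {1: [4, 5, 5, 4], 2: [5, 5, 6, 4], 3: [4, 4, 4, 5], 4: [3, 4, 4, 4]}
par_by_hole = {1: 4, 2: 5, 3: 4, 4: 3}


def percent_over_par(scores_by_hole, par_by_hole):
    """Share of scores on each hole that were worse (higher) than par."""
    return {
        hole: sum(1 for s in scores if s > par_by_hole[hole]) / len(scores)
        for hole, scores in scores_by_hole.items()
    }


def top_n_holes(pct, n=3):
    """Holes with the highest percent over par (the ones to highlight)."""
    return set(sorted(pct, key=pct.get, reverse=True)[:n])


pct = percent_over_par(scores_by_hole, par_by_hole)
highlight = top_n_holes(pct, n=2)
# Keep hole-number order and carry a flag for the highlight color.
chart_rows = [(hole, pct[hole], hole in highlight) for hole in sorted(pct)]
for hole, share, flagged in chart_rows:
    print(hole, f"{share:.0%}", "highlight" if flagged else "")
```

Sorting by hole number and carrying the highlight as a separate flag keeps the course layout readable while the color does the work of calling out the toughest holes.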
I debated the bar color for the top 3 for a while. I wanted to use the green to tie with the Masters theme. I decided against that because people tend to associate green with good – if I were showing the 3 easiest I would have used that. I tried orange, a maroon-ish red, dark gray, and brown but I didn’t love any of those choices. I had my husband look and he suggested that I color code them in multiple shades. Instead of shutting that down immediately I changed the scheme to show him what it would look like and asked, “Do the top 3 still stand out?” When he agreed that they didn’t, I switched it back to a two color scheme. He suggested the purple and I think it pops.
Initially, I labeled the bars and tested out different alignments. I felt that the chart was too busy with the bars labeled. I needed to add the percent over par to the chart so I added it next to the hole name. To do this in Tableau add your measure to the row shelf and change it to discrete.
I don’t have any annotations on this chart, and if you aren’t familiar with golf, over par may not resonate with you. I am sure some folks would suggest adding text to explain over par but I opted not to because I liked the clean look and felt that my title got the point of the chart across.
To see other entries for this challenge take a look at #swdchallenge on Twitter.
Also take time to check out Cole’s website and buy the Storytelling with Data book.
“Swing your Swing.
Not some idea of a swing.
Not a swing you saw on TV.
Not the swing you wish you had.
No, swing your swing.
Capable of greatness.
Prized only by you.
Perfect in its imperfection.
Swing your swing.
I know I did”
I see “swing your swing” as being true to yourself in your approach. In golf all that truly matters is the contact with the ball. How you get to that point is where you “swing your swing”. There are methods and instructions that make it easier to square the club at impact, however, do what works for you. Explore your swing. Figure out your tendencies good and bad. Swing your swing. Don’t become so mechanical that you lose sight of you.
Because I have a golf “issue” I immediately applied this to my data viz work. My goal in data visualization is to depict data in a manner that clearly communicates the insights in the data. Like golf, there are a number of best practices and methods on how to do this. And like golf there are different approaches to get to the same end point. Take a look at #makeovermonday or #dataforacause and you will see a number of people approach the same data set in a number of different ways. You’ll see things that run the range of simple bar charts to radial charts. This is where people “viz their viz”.
In both golf and data viz your personal style is important, however, if your style trumps your success or ability to communicate effectively you need to refine your style. It is hard to square the club consistently when you come over the top and it is hard to communicate data effectively when you have a pie chart that uses 20 different colors.
So how do you get there?
Practice, practice, practice.
Experiment – #makeovermonday is a great opportunity for this.
Learn from others – be inspired by their work but don’t seek to duplicate someone else’s style.