There are usually a number of different ways to get to the same result in Tableau & Tableau Prep. Week 9 of #PreppinData is another example of this.
For this edition of #PreppinData we looked at Chin & Beard Suds Co’s Twitter complaints. We were given a list of complaints and asked to:
Remove Chin & Beard Suds Co Twitter handle
Split the tweets up in to individual words
Pivot the words so we get just one column of each word used in the tweets
Remove the 250 most common words in the English language (sourced from here for you: http://www.anglik.net/english250.htm)
Output a list of words used alongside the original tweet so we know the context for how the word was used.
Here’s the flow I created
The first clean step splits the text on the space character. The next step pivots all of those splits back together in one column. In this pivot I used a wildcard pivot where the field name contained Tweet –
After consolidating the splits, I did a few clean steps to get the two sources ready for joining. Anytime that I join on text I always make sure to trim all the extra spaces and make the text fields either upper or lower case. I think this is a good habit to be in for text matching. I also excluded the company Twitter handle and any null records in this step.
Now it is time to join them together. The first join I did was an outer join between the tweets and the list of the 250 words. In the step after the join I kept only the records from the 250 list that were null (I used the rank field).
The other way this join can be done is with the left unmatched only join type. When you use this join type all you need to do is remove the two fields from the 250 list.
Initially I didn’t think of the other join type and found a way to get to the final result. Going back and looking at the joins the second option is probably the better way to go. There isn’t a wrong and right way to do it just different approaches.