In this article we want to show you how to use the different Alteryx join tools in order to analyse your data and obtain great insights. In this case we have data regarding customers and their purchases. In order to make interesting analysis, we have to join both files. This will allow us to estimate in a next article the market share of a company in different regions and its local performance.
Customer and purchase data are loaded, let’s get started!
Before joining both lists, we want to be sure that the customers in the customer file are unique. To check this, we use the unique tool to verify the customer ID. This makes it possible to easily remove doubles. We do not place a unique after the purchase table because one client can make multiple purchases.
To join our lists we use a “classic” join tool and we connect the customer table to the “Left input” anchor and the purchase table to the “Right input” anchor. Then, we select one or multiple fields on which we want to join the data. This will only work if the fields are present in both inputs, but they do not have to be named the same way.
When joining fields, they must have the same general type (string, date/time or numeric), otherwise the process will result in an error message. This avoids any irrelevant and potentially risky matching between data sources.
As you can see, not all the entries of both inputs are systematically matched. This is one of the huge advantages of the Alteryx Join tool. All entries are traced and none are lost silently.
- Behind the middle output we find our new list of customers and expenses.
- Entries in the left output are customers that did not match with the purchases list. These are probably customers who did not purchase anything in the observed period of time.
- The entries in the right output are purchases that did not match with any customer. We might have a data quality problem here!
If we look closer, we observe that before 2017 the customer ID was miscoded in the list of purchases (the “ID” characters in the two first positions are missing).
We can now correct the format of the customer ID and join these entries again to the customer list.
We use a union tool to combine the output of the two join tools. The union tool with the option “Auto config by name” takes two or more tables with the same fields and combines them vertically based on field names.
In this use case we have joined a file with customer information and purchases data. You have seen that the join tool is an intuitive, versatile and efficient way to blend, control and correct your data in a reproducible way.
The outcome is a new table with all information in which we are now able to, for example, aggregate all spending by postal code. By comparing these to the market potential we would obtain a view of the local performance. But this will be a topic for a next article.