Let’s talk about checking your data quality with Alteryx?

Use case: get an idea of the data you want to use

So, you have access to the data (read the blog article about the in- and output tools) and can now start your analyses. What to do before you launch yourself into creating workflows with analyses?

Always validate your data!

Know what you are working with. This way you can avoid problems later on causing incorrect results, delays, other kinds of problems…

What are some of the typical problems that you encounter when you start working? How to quickly identify and correct them with Alteryx?

1. Check data quality

Browse AlteryxUse the Browse tool to get an idea of the data quality.

A quality check can highlight many problems such as incorrect data formats, NULL values, empty fields, aberrant values, inappropriate whitespaces, outliers, …

With the browse tool, you can see the fields in your data and the quality of the data in each field. Identifying data issues immediately prevents problems later on.

profile your data

2. Change data type

selectWhen data formats such as plain text are used for data transfer, the field type is not coded with the data. Hence, Alteryx will interpret each field as a character string. It is of course recommended to select the relevant format for each field.

Why? Because you will need numbers if you want to carry out calculations, date time format to use date time tools, and a numeric ID will be sorted differently depending on whether it is coded as string or integer.

Using the select tool, you can set for each field the appropriate data type. If needed you can also change the field name to something unique and/or better understandable. For more complex fields, you can add a description that will ensure a good use and a correct interpretation of the field by you and other users.

change the data type

3. Correct NULL values

NULL values indicate the absence of known values and such cases should always be treated.

Why? Because NULL values can cause unexpected behaviour in data blending operations and can cause errors in predictive algorithms. Depending on the context, records containing NULL values should be removed (filter tool) or replaced by more appropriate values. You can use the data cleansing tool to automatically replace NULL values by 0 (for numeric format) or by an empty string (for character format).

remove null values in data

4. Remove whitespaces

The data cleansing tool proposes a simple and efficient option to remove inappropriate whitespaces.

Inappropriate whitespaces such as leading, trailing and duplicate whitespace can typically occur in data fields that were at one time typed on a keyboard or imported using symbol recognition. Inappropriate whitespaces are pernicious as they are hardly visible and can distort the result of simple operations such as joining by ID.

When you have done all this, you know what data you are working with and you have done the basic cleaning.

Need help?

You can obtain help on Alteryx through different channels. Ask help to peers on the Alteryx community, browse for solutions on the Help Center, have a look at the different tutorials on YouTube or obtain Live support by our certified consultants.

Movie: what is an optimal point of sale network?