If you have rubbish data then you will get rubbish information from it! So often I find clients who can’t get any decent analysis from their Excel data and it isn’t always because they don’t have the right tools at their disposal. It’s sometimes because their data is not accurate or consistent. Without retyping huge chunks of data what can be done to rectify this issue? The tools below help automate the process. Which ones you use will depend on what issues you find.
If the data has inconsistencies such as Ltd instead of Limited or extra text that is not required then replace can find all occurrences and replace with the new text or with nothing if the text is not required at all.
Sometimes dates have been entered in an incorrect format so they are not viewed as dates but instead are treated as text. Excel is NOT happy with the date format 25.12.2019. The dots mean that it is viewed as text rather than a date so it will not behave correctly when sorting, filtering or calculating with it. A quick solution is to replace the “.” with either a “/” or “-“ turning it back to a valid date format. Another problem occurs if dates have been entered as 25th December 2019. Excel doesn’t like the th. In this case finding st, nd, rd and th and replacing with nothing will sort out the problem.
Text with Spaces at the End
If text has spaces at the end these won’t be visible but any analysis will view text with a space as very different to text without a space. Here you must be careful. If find and replace is used you may end up removing all spaces on the spreadsheet including ones between words. Replace works on the whole spreadsheet if only one cell is selected otherwise it works on the cells selected only. One solution could be to select only the cells that need to be changed.
However, if there are spaces in the middle of the text in these cells that need to be kept then this won’t work. The function TRIM removes spaces from the beginning and the end of the text but keeps the ones in the middle.
Text Containing Unprintable Characters
If text has been exported from other packages there can be some unprintable characters that get exported too. These can be removed using the clean function.
Regular Cleansing of Data
If a repeat of these sort of steps is required on a regular basis then it’s worth considering investing some time recording a macro to do all of these. This can then be run to perform all these steps automatically.
Prevent Errors in the First Place
Data validation can be used to validate the data that is initially entered. This can restrict the information to only allow certain data types to be entered but can also restrict numbers or dates to certain ranges or text to a specific number of characters or only entries from a specific list.
Is this a Perfect Solution?
However careful you are when entering data or however many of these tools you use to make the data consistent there will always be idiosyncrasies that will need to be manually changed. Unless this is done any data analysis performed will be far from perfect. So let’s try and prevent the rubbish from going in and therefore prevent the rubbish coming out the other end!
If you found this useful see what Excel courses are available or email firstname.lastname@example.org to discuss any training requirements.