Ever worked with messy data? Whenever I get xls or csv files, I’m not naive enough to assume the data is clean. Very often there are trailing spaces or, even worse, newline characters at the end of certain cells, not to mention typos and the like. I have to go through every cell by hand and clean it up before loading the data into the database.
It’s such a pain, and I probably won’t notice what I missed until errors show up later on. Chances are I skipped a few cleanups, and now I have to re-import the whole dataset into the database.
But guess what? That’s where Google Refine comes to the rescue. It visualizes your data and lets you filter it to spot the discrepancies. Then you can correct every error of the same kind in one step. Check out the following videos to see it in action!