Google Refine

Ever worked with messy data? Whenever I get xls or csv files, I’m not naive enough to assume the data is clean. Very often there are trailing spaces or, even worse, newline characters at the end of certain cells, not to mention typos and other inconsistencies. I have to go through every data cell by hand to clean things up before loading the data into the database.

It’s such a pain, and I probably won’t notice a problem until errors show up later on. Chances are I missed a few cleanups, and then I have to re-import the whole dataset into the database.

But guess what? That’s where Google Refine comes to the rescue. It visualizes your data and lets you filter it to spot the discrepancies. Then you can correct every error of the same kind in one step. Check out the following videos to see things in action!
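To give a feel for the idea, here is a rough Python sketch of the same workflow: trim the cells, count the distinct values in a column to spot the odd ones out, then fix one kind of error everywhere in a single pass. This is only an illustration and not how Refine works internally; the sample rows and the helper names `clean_cell`, `facet` and `replace_value` are made up for this example.

```python
from collections import Counter


def clean_cell(value: str) -> str:
    """Strip leading and trailing whitespace, including newline characters."""
    return value.strip()


def facet(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)


def replace_value(rows, column, old, new):
    """Return new rows with every exact match of `old` in `column` set to `new`."""
    return [
        {**row, column: new} if row[column] == old else row
        for row in rows
    ]


# Hypothetical sample data with a trailing space/newline and a typo.
rows = [
    {"city": "Toronto"},
    {"city": "Toronto \n"},
    {"city": "Tornto"},
    {"city": "Ottawa"},
]

# Step 1: trim every cell.
cleaned = [{key: clean_cell(value) for key, value in row.items()} for row in rows]

# Step 2: count distinct values; rare ones are likely typos.
counts = facet(cleaned, "city")

# Step 3: correct the same error everywhere in one step.
fixed = replace_value(cleaned, "city", "Tornto", "Toronto")

for value, count in facet(fixed, "city").most_common():
    print(value, count)
```

The value counts are what make the typo visible: a value that appears only once next to a near-identical value that appears many times is usually a mistake.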
