Open Refine

Google Refine

Open Refine (previously called Google Refine) is a tool for finding and correcting errors in data. All datasets contain errors and inconsistencies, and they need to be cleaned before use; otherwise the errors will show up in the charts or maps you make.

Data errors can be caused by different date formats used for the same day, typing mistakes made during data entry, or extra spaces where there shouldn't be any. Spreadsheets can also have duplicate entries, or entries that should be split into two or more. These can be hard to find. Sometimes the problems are one-offs, but they can also be systematic across the whole dataset, such as a person's name or a location spelled differently each time. Finding and correcting these by hand is time-consuming and risks introducing new errors while fixing the old ones.

Open Refine highlights where errors might be and can help you fix them all at once across your whole dataset. It can also do many other things to your data, including restructuring and reformatting it and merging it with other datasets. It can even translate data into other languages, though this is a little more complicated.
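As a rough illustration of how differently typed versions of the same value can be found automatically, the Python sketch below reduces each value to a normalised key and groups values that share a key. It is loosely modelled on the "fingerprint" idea used for clustering, but it is a simplified stand-in, not Open Refine's own code. The function names `fingerprint` and `cluster` and the sample values are made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key so near-identical spellings collide."""
    # Decompose accented characters and drop anything that is not plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and turn punctuation into spaces (a simplification).
    cleaned = re.sub(r"[^\w\s]", " ", ascii_text.strip().lower())
    # Split on whitespace, drop repeated words, sort them, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample data with stray spaces, reordered words, and accents.
names = ["New York", " new york ", "York, New", "Zürich", "Zurich", "Boston"]
clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that probably mean the same thing. A person still has to decide which spelling is the right one, which is why the tool only suggests these merges.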

It's useful because

it automates tasks that would take a long time to do manually, such as finding typos and data that might be out of place.

But watch out for

how hard it is to figure out what filtering, faceting, clustering and reconciliation actually do to your data!
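To make two of these terms a little less mysterious, here is a small Python sketch of what a text facet and a filter roughly amount to: a facet counts how often each distinct value appears in a column, and a filter keeps only the rows that match a chosen value. This is an illustration under simplifying assumptions, not how Open Refine is implemented; the names `text_facet` and `filter_rows` and the sample rows are invented for the example.

```python
from collections import Counter

# Hypothetical sample rows; note the inconsistent spellings of one city.
rows = [
    {"city": "Nairobi", "year": "2019"},
    {"city": "nairobi", "year": "2019"},
    {"city": "Nairobi ", "year": "2020"},
    {"city": "Kampala", "year": "2020"},
    {"city": "Nairobi", "year": "2020"},
]


def text_facet(rows, column):
    """Count how many rows hold each distinct value in the given column."""
    return Counter(row[column] for row in rows)


def filter_rows(rows, column, value):
    """Keep only the rows whose column exactly equals the chosen value."""
    return [row for row in rows if row[column] == value]


facet = text_facet(rows, "city")
for value, count in facet.most_common():
    # repr() makes trailing spaces visible.
    print(repr(value), count)

matching = filter_rows(rows, "city", "Nairobi")
```

Neither step changes the data. The facet only summarises a column, which is what makes inconsistent spellings stand out, and the filter only narrows the rows you are looking at.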

Learning curve:
Steep
How can you use it?
Desktop installation
Languages:
English

How do you make things with it?
Open Refine is free and needs to be downloaded and installed on your computer. The software runs offline but is used through your Internet browser, just like a website. It is not a tool for creating data from scratch, as a spreadsheet is. To get started, Open Refine will ask you to load the data that you already have, whether as a spreadsheet or another sort of file. It will then show the data in your Internet browser. Clicking the down-pointing arrows in the first row of your data, as displayed by Open Refine, shows the types of analysis available. The tool keeps a record of everything you do, so you can "undo" or "redo" changes you make to your data. Open Refine does not change the data in your original spreadsheet. Instead, it creates a new dataset that has all the changes, which can be exported from Open Refine as a new spreadsheet.
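The idea of keeping a record of changes while leaving the original data alone can be sketched in a few lines of Python. This is only a conceptual model under assumed names (`Project`, `apply`, `undo`, `redo`, `export`, and the two sample operations are all hypothetical) and says nothing about how Open Refine stores its history internally.

```python
class Project:
    """Holds a copy of the source rows plus an ordered list of operations."""

    def __init__(self, original_rows):
        # Copy the rows so the caller's data is never modified.
        self._original = [dict(row) for row in original_rows]
        self._done = []    # operations currently applied, oldest first
        self._undone = []  # operations that were undone and can be redone

    def apply(self, operation):
        """Record a new operation; this discards anything available to redo."""
        self._done.append(operation)
        self._undone.clear()

    def undo(self):
        if self._done:
            self._undone.append(self._done.pop())

    def redo(self):
        if self._undone:
            self._done.append(self._undone.pop())

    def export(self):
        """Replay the recorded operations on a fresh copy of the original."""
        rows = [dict(row) for row in self._original]
        for operation in self._done:
            rows = [operation(dict(row)) for row in rows]
        return rows


def trim_city(row):
    row["city"] = row["city"].strip()
    return row


def title_city(row):
    row["city"] = row["city"].title()
    return row


source = [{"city": " nairobi "}]
project = Project(source)
project.apply(trim_city)
project.apply(title_city)
cleaned = project.export()

project.undo()
after_undo = project.export()

project.redo()
after_redo = project.export()
```

Because the exported rows are always rebuilt from the untouched original plus the list of recorded steps, undoing a step is just a matter of leaving it out of the replay.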

Privacy and Portability
Open Refine works from your computer, not the Internet (even though it uses your browser window), so you control how it is used, what data you put in it and who can access it.
How do you get data into it?
CSV, Google Spreadsheets, JSON, RDF, TSV, XLS, XLSX and XML
How do you get data out of it?
CSV, HTML, TSV and XLS
How is it licensed?
BSD