How to Clean Your Data: Best Practices for Data Hygiene

At Grantbook, we are often engaged in data cleaning tasks either as stand-alone projects or as a part of a larger grants management system (GMS) implementation. Even though data is a key component of an organization’s story, we often find that data cleaning or migration work is not given the time or resources that it needs. Accurate and well governed data is often the best way for grantmakers to track their funding, know who they're giving to, and begin to unpack their impact story.

If this is the first time your organization is taking on a data cleansing project, it can definitely be daunting to know where to start. The good news? We’ve been in your shoes and having worked on several data prep and cleansing projects over the years, and we’ve picked up some tips and tricks along the way.  

Clean data will get you from numbers to insights much more quickly.

We’ve broken this information out into five sections:

  1. What is Clean Data? An overview of what we mean by “clean data” and why it’s important
  2. Data Strategy & Governance The most important first step to ensure that your organizational strategy, roles, and processes are all aligned to ensure that your cleaned data is informative and reliable
  3. Cleaning Checklist How to go about the process of data cleaning
  4. Data Maintenance & Hygiene Best practices to ensure your data continues to stay clean on an ongoing basis
  5. Frequently Used Data Cleaning Tools Available tools (or helpers if you will) that will allow you to speed up the process

What is clean data?

Before jumping into the how-to’s, we feel it is important to establish what we mean by “clean data.” In broad strokes, we define clean data as data that is accurate, complete, consistent, and unique. 

Accuracy

Different types of data will have different requirements for keeping it accurate and up to date. For instance, information about organizations is more static than information about individuals.

Completeness

Staff can draw more reliable inferences and have a more holistic picture from datasets that are complete.

Consistency

Where possible there should be uniformity between similar records in the system. Data entry conventions for record names and fields makes for easier reporting and segmentation of the database.

Uniqueness

Too many duplicate records can skew information in reports. It can also provide inaccurate and confusing information to those who need it on the programming side. Keeping duplicates low or out of the system gives users confidence in the data being tracked.

When migrating to a new grants management system or implementing a new system, clean data is integral to the success of your implementation. Clean data will enable you to migrate the appropriate unique organizations, contacts and grants in the new system, as well as establish the correct relationships between these data points. Clean data in fields that are used to drive functional logic is also important.

Data Strategy & Governance: The Most Important First Step

We have seen clients pawn off data tasks to interns without organizational context, migrate every single field from an old system to a new one only to realize 30% were never populated, or underestimate the larger strategy and planning discussions involved in the process.

From our experiences with data in cleaning and migration contexts, we have distilled a few best practices for planning and expectation setting that should be considered before starting any data project.

  • Organize a data discovery session: Use this session as an opportunity to review your organization’s data, flag risks, and find common issues and patterns (duplicates, incorrect data, data sitting in the wrong fields, fields used incorrectly, unused fields, etc.). This is also a good setting to discuss post-cleaning governance tasks and ensure that leadership is on board with the data vision.  
  • Know the goals of the data: Are there fields that should not be carried over to the new system? What are the reports or impact dashboards you want to create from your grant data and which data points need to be collected in order to make this happen?
  • Set time expectations: Establish the time commitment required to complete data tasks at the outset of a project. If the data is cleaner, the time commitment for the data project will be lighter. If the data is messier, you will need more time and might need to bring in individuals who have more organizational context to help sort through the data.
  • Establish roles: It is important for everyone to have a role. The projects that we’ve seen have the most success always have a data champion, someone assigned to QA, decision makers, and organizational context holders. 
  • Create a task list of cleaning activities: What gets measured gets done. Having a task list ensures that each person knows what they are responsible for.

Cleaning Checklist: How to Clean your Data

  • Start Small: Begin by cleaning just a subset of your data. This will help you identify a standardized way to go about the process, establish a more accurate sense of time required and unearth potential issues earlier on. 
  • Data Extraction: Export out the raw data required from your system for each data model (i.e. Organizations, Contacts, Grants etc).
  • File Format & Encoding: Pay attention to the file format and file encoding when exporting out data. CSV files are a commonly used file format for large datasets (it will take up less storage space) and UTF-8 is considered a safe encoding language across OS and multilingual languages.
  • File Labeling: When exporting out your data, make sure to clearly label every data file. Include the data model, the date of export, and the label ‘Raw Data’ to signify that the file represents the original data before it was scrubbed.  
  • Create a copy: Always create a copy of your ‘raw data’ file and use this copied file (once relabeled to say ‘Scrubbed Data’) as the file to use when you begin scrubbing the data. The goal is to always maintain a record of your original raw data so you have a frame of reference if needed. 
  • Log all changes: Data cleaning always involves a variety of tasks (often more than anticipated) and it can become very easy to lose track of the changes made. Keeping a log (a simple spreadsheet or google doc will do the trick) is a good way to track the history of changes made and can be especially useful when you have multiple people working on cleansing the data as opposed to just one individual person.
  • Check for duplicates: Use this as an opportunity to check for duplicates (based on name, email address, phone number, etc.), and do so at the start of the process. This will reduce both the amount of data you have to scrub as well as making data validation (at the end of the process) much simpler. 
  • Standardize Formats: Review the formats being used for applicable data types (dates, phone numbers, postal codes etc) and set and apply a standard format to all. 
  • Special Characters & Accents: Depending on the file format and encoding chosen, special characters can display weirdly once the data is exported. For example, the term “Açaí” might display as “AçaÃ,”­ so make sure to do a check and clean as needed. 
  • Maintain backups: Always maintain a backup of all files (both raw and scrubbed data files). Having an ongoing backup for all versions of scrubbed data files allows you to have version history and helps avoid those grey hairs in case anything were to happen to your master files. Avoid storing data locally though (both master files and backups) as it can be a security risk.
  • Clean workspace: Data scrubbing is the type of task that requires chunks of uninterrupted time and a tidy digital and mental workspace so avoid having other documents open and other tasks/meetings on the go as you work on the data, as distractions will increase the chance of error.
  • Data Import: Once you’ve finalized a scrub of all your data, import it back into your system.
  • Data Validation: Once the scrubbed data is back in your system, validate its accuracy by conducting a quality review. You can do this by (i) reviewing a handful of records for each data model to verify accuracy (ii) reviewing a sample of records with special characters and accents to ensure proper formatting and (iiI) running reports for each data model to extract out the clean data and cross reference against the data in your master (scrubbed data) spreadsheets.

Data Maintenance & Hygiene: Keeping Your Data Clean

  • Create a list of data protocols supported by a data glossary to uphold data cleanliness.
  • Schedule trainings/workshops with staff to help get them informed and engaged on the new data policies.
  • Conduct frequent data validation (e.g. every quarter) to ensure data policies are being followed and clean data is continuing to be maintained.
  • Leverage any available existing functionality of your system in maintaining the integrity of your data e.g. turn on duplicate management (if available).

Frequently Used Data Cleaning Tools 

There are several tools that can help speed up data cleansing but it can be hard knowing which to use so listed below are some of the tools we’ve had success using in the past (and it includes some free options!): 

  • Duplicate Check for Salesforce (has free trial options)
  • Native duplicate rules functionality in Salesforce (org functionality is better than contacts)
  • OpenRefine (free) 
  • Fuzzy Lookup for Excel from Microsoft (helps identify potential duplicates based on close matches)
  • If viewing data in a spreadsheet, use the filter function on fields that should have values from a certain set. The filter dropdown will show all the values in the column and you can spot anomalies. The filter also has options to look for blank values across fields that should have values

Technology has enabled Foundations to collect more meaningful data, more quickly than ever before—which can in turn unlock even greater impact. Use these tips to make the process of data cleaning and maintenance more effective.

Are you planning to migrate your data to a new Grants Management System? Our team of experts can help you select, implement, and optimize your systems. Reach out for an introductory chat today.

Annie Xiong's headshot

Annie Xiong

Implementation Consultant

Systems Design & Fluxx

Annie has worked with SAAS products her whole career, and was excited to move into the philanthropic sector in 2017 to apply her skills and experience with grantmakers. As an Implementation Specialist at Grantbook, she helps Foundations with grant management system implementations, systems training, data cleansing, and migration tasks, using a human-centred approach that considers the processes and people behind the technology.

In her free time, she is an avid salsa and bachata dancer and is a member of two latin dance teams with the Araguacu Latin Dance Company. She is also a design consultant for dance companies and solo instructors in Toronto and is always happy to teach you the basics!

Annie has a Bachelor of Commerce degree from Queen’s University where she was heavily involved in Enactus Canada, an organization that encourages students to implement community empowerment projects, and RECODE Collaborate, a student-led, national program designed to engage students through collaboration and dialogue on the future of higher education in Canada. A highlight during her time at Queen’s, was successfully obtaining a Certificate in Responsible Leadership alongside her Commerce degree, and having the opportunity to participate in social impact conferences, take social-impact themed classes, and volunteer with local non-profit community organizations.

Rhea Vaz

Implementation Consultant

Systems Design & Fluxx

Rhea came to Grantbook with over 3 years of experience in system implementations, specifically ERP implementations, and a strong background in data analysis, data cleansing and preparation. All her previous implementations had one unique distinction—having always been a part of the end-user’s team that is adopting the new system. 

This alternate perspective is what drew Rhea to her role at Grantbook, as implementation and support specialist. Having  experienced first-hand the excitement, trepidation, and requirements evolution that accompany finding the right digital solution, she hopes to be a source of guidance and support that clients can rely upon, with the end goal of delivering a solution that allows them the freedom to focus on real systemic change.

Rhea obtained her Bachelor of Commerce degree in Finance, Operations and Accounting, at McGill University. A highlight during her time at McGill was successfully attaining a Lean Six Sigma Green Belt certification, alongside her undergraduate degree, and having the opportunity to utilize key six sigma strategies and techniques to streamline the McGill Bookstore’s inventory management system as part of the certification.