Using Excel to Identify Duplicates in Salesforce Data
Are you responsible for data integrity and data management within your Salesforce org? If so, you probably have multiple tools in your toolbox to perform different validations and checks on your Salesforce data.
Excel can be one of those helpful tools in certain situations, like identifying duplicates in your data.
Before we start, we want to be sure that all newbie Admins understand an important point about using Excel to manage Salesforce data:
Excel reads upper and lowercase letters as the same value.
This is important to know because the Salesforce record ID is often used as a primary key when working with data. If you run a Salesforce report and pull in the record ID, that ID will contain a 15-digit string that consists of letters and numbers, including a mixture of upper and lowercase letters.
If you use Excel to identify duplicate records using the 15-digit ID, you will run into the issue of Excel misidentifying a record as a duplicate. For example, Salesforce sees these two ID’s as unique values, Excel does not.
Record 1: 0011r00002M1AOO
Record 2: 0011r00002m1aOO
To prevent Excel from misidentifying duplicates based on a record ID you should use the 18-digit ID instead of the 15-digit ID.
Now that we’ve shared the golden rule with you, let’s move on to Excel.
Removing Duplicate Values
Excel has an option in the ribbon that allows you to remove duplicates.
You just received a lead list with 10,000 rows. You have already validated that none of the records exist in Salesforce, but you also need to be sure that the data does not contain duplicates.
How to Resolve Outside of Salesforce:
Click on the Remove Duplicates option in the Excel ribbon
a. Excel Web Version: Data > Remove Duplicates
b. Excel Windows Version: Data > Data Tools > Remove Duplicates
2. Clicking on the Remove Duplicates option will select all the data in your spreadsheet.
3. A dialog box will open, and all your column headings will be selected. It’s up to you to decide if you want to look for an exact match across all columns or if you want to select specific column headings to match against.
Identifying Duplicate Values
The Conditional Formatting function in Excel is something we use a lot. It allows you to visually identify data patterns and issues by changing the appearance of the cells based on the conditions that you specify.
We could write an entire article on this topic, but for now, let's focus on identifying data groups.
You just received another lead list with 10,000 rows and you’ve been asked to update the spreadsheet so that you can easily identify different values in the Acquired From column.
You then need to filter the spreadsheet on the state of Pennsylvania so that the viewer can see which Acquired From values were prevalent.
How to Resolve:
Select the Acquired From column in your spreadsheet.
On the Home tab, click the Conditional Formatting option and then click Text that Contains.
Populate the text field with one of the unique values from the Acquired From column and select the color for that value.
Repeat step 3 twice more to give each value in the Acquired From column a different color.
Filter your spreadsheet on the state of Pennsylvania
The end user will be able to:
a. Visually see which option is more prevalent
b. Drill down on the results based on the Acquired From cell colors
There are multiple ways to generate these results from Excel, but we just wanted to explain these two. Hopefully, knowing how to identify duplicates using Excel will help you shift through your Salesforce data in a faster and more efficient way.
Need help with a Salesforce challenge? Contact Us