Video: How to Remove Duplicate Rows in SQL Server - SQL Training Online
In this video, I show you a trick on removing duplicate rows in SQL Server 2012.
How to Remove Duplicate Rows in SQL Server?
I first need a table that has duplicate records to remove, so here is a script to create some duplicates in an email address table.
create table email_list
(
name varchar(100)
,email varchar(300)
)
go
insert into email_list values (‘Jan’,‘[email protected]’);
insert into email_list values (‘Bob’,‘[email protected]’);
insert into email_list values (‘Jill’,‘[email protected]’);
insert into email_list values (‘Bob’,‘[email protected]’);
insert into email_list values (‘Jill’,‘[email protected]’);
insert into email_list values (‘Bob’,‘[email protected]’);
insert into email_list values (‘Jill’,‘[email protected]’);

Now, the first method to remove the duplicates is in the following script that I adapted from [Alexander-s Post on duplicates](http://amitelman.wordpress.com/2012/09/05/duplicate-records-delete-all-but-one/).
WITH list_rownumbers AS
(
SELECT name,email,
ROW_NUMBER() OVER (ORDER BY name,email) AS 'RowNumber'
FROM email_list
)
DELETE list_rownumbers WHERE RowNumber not in
(SELECT min(RowNumber) FROM list_rownumbers GROUP BY name,email)
If you just look at the SQL With statement (CTE), you will see that it simply numbers all of the name, email combinations using the Row_Number function. Then it uses the minimum rows number per name, email group as an exclusion to remove the duplicate rows.
The second script is and adaptation of Ritesh-s Post on duplicates.
WITH list_duplicates (name, email, duplicate_count) AS
(
SELECT name,email,
ROW_NUMBER() OVER(PARTITION BY name,email ORDER BY name,email) AS duplicate_count
FROM email_list
)
DELETE
FROM list_duplicates
WHERE duplicate_count > 1
If we focus on the CTE query, we see that is uses the Partition By statement to actually reset the row_number count on each name, email group.
It makes the delete statement a little easier to follow since all we have to do is delete where the row_number (duplicate_count) is larger than 1.
And that-s it. That is how you Delete Duplicates in SQL.
Let me know what you think by commenting or sharing on twitter, facebook, google+, etc.
SQL Training Online: /
Twitter: http://www.twitter.com/sql_by_joey
Google+:
LinkedIn: http://www.linkedin.com/in/joeyblue