Back to Blog

SQL

How to Remove Duplicate Rows in SQL Server

Remove duplicate records using ROW_NUMBER, CTE, and DISTINCT. Learn multiple methods to clean data and eliminate duplicate rows in SQL Server.

3 min read

Last updated on

Video: How to Remove Duplicate Rows in SQL Server - SQL Training Online

In this video, I show you a trick on removing duplicate rows in SQL Server 2012.

How to Remove Duplicate Rows in SQL Server?

I first need a table that has duplicate records to remove, so here is a script to create some duplicates in an email address table.

create table email_list

(

name varchar(100)

,email varchar(300)

)

go

insert into email_list values (‘Jan’,‘[email protected]’);

insert into email_list values (‘Bob’,‘[email protected]’);

insert into email_list values (‘Jill’,‘[email protected]’);

insert into email_list values (‘Bob’,‘[email protected]’);

insert into email_list values (‘Jill’,‘[email protected]’);

insert into email_list values (‘Bob’,‘[email protected]’);

insert into email_list values (‘Jill’,‘[email protected]’);


![Remove Duplicate Rows in SQL Server](../../assets/blog/2012/11/Remove-Duplicate-Rows-in-SQL-Server.png)

Now, the first method to remove the duplicates is in the following script that I adapted from [Alexander-s Post on duplicates](http://amitelman.wordpress.com/2012/09/05/duplicate-records-delete-all-but-one/).

WITH list_rownumbers AS

(

SELECT name,email,

ROW_NUMBER() OVER (ORDER BY name,email) AS 'RowNumber'

FROM email_list

)

DELETE list_rownumbers WHERE RowNumber not in

(SELECT min(RowNumber) FROM list_rownumbers GROUP BY name,email)

If you just look at the SQL With statement (CTE), you will see that it simply numbers all of the name, email combinations using the Row_Number function. Then it uses the minimum rows number per name, email group as an exclusion to remove the duplicate rows.

The second script is and adaptation of Ritesh-s Post on duplicates.

WITH list_duplicates (name, email, duplicate_count) AS

(

SELECT name,email,

ROW_NUMBER() OVER(PARTITION BY name,email ORDER BY name,email) AS duplicate_count

FROM email_list

)

DELETE

FROM list_duplicates

WHERE duplicate_count > 1

If we focus on the CTE query, we see that is uses the Partition By statement to actually reset the row_number count on each name, email group.

It makes the delete statement a little easier to follow since all we have to do is delete where the row_number (duplicate_count) is larger than 1.

And that-s it. That is how you Delete Duplicates in SQL.

Let me know what you think by commenting or sharing on twitter, facebook, google+, etc.

SQL Training Online: /

Twitter: http://www.twitter.com/sql_by_joey

Google+:

LinkedIn: http://www.linkedin.com/in/joeyblue

Facebook: http://www.facebook.com/sqltrainingonline

About Joey Blue

Joey Blue teaches practical data skills that companies actually use. With 25+ years of experience solving real data problems for Fortune 500 companies, he's helped 152,000+ students learn SQL, Power BI, reporting, and modern analytics—cutting straight to what works.