Deleting Duplicate Rows in Oracle

by Anuj Parashar · June 24, 2010

Deleting duplicate rows from tables is one of the common task oracle developers come across. The data might get duplicated because of missing primary/unique key on the table or batch file getting loaded multiple times. Here I have tried to summarize different ways of deleting this duplicated data. Please note that this is not an extensive list of all available methods but the ones I was able to figure out. This should serve as a handy reference while at work.

CREATE TABLE tbl_test(
	 SER_NO NUMBER,
 	 FST_NM VARCHAR2(30),
 	 DEPTID	NUMBER,
 	 CMNT   VARCHAR2(30));

INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(3, 'ccccc', 2005, 'zzz');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');
Code language: SQL (Structured Query Language) (sql)

1. Using MIN(rowid) : The most common method of removing duplicate rows.

DELETE FROM tbl_test
      WHERE ROWID NOT IN (SELECT   MIN (ROWID)
                              FROM tbl_test
                          GROUP BY ser_no, fst_nm, deptid, cmnt);
Code language: SQL (Structured Query Language) (sql)

Comment: This will take hours & hours if the table is large (records in million). 2. Using MIN(rowid) & Join: More or less the same as first one

DELETE FROM tbl_test t
      WHERE t.ROWID NOT IN (SELECT MIN (b.ROWID)
                              FROM tbl_test b
                             WHERE b.ser_no = t.ser_no
                               AND b.fst_nm = t.fst_nm
                               AND b.deptid = t.deptid
                               AND b.cmnt   = t.cmnt);
Code language: SQL (Structured Query Language) (sql)

3. Using Subquery: This is an interesting one

DELETE FROM tbl_test
      WHERE ser_no IN (SELECT ser_no FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
        AND fst_nm IN (SELECT fst_nm FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
        AND deptid IN (SELECT deptid FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
        AND cmnt   IN (SELECT cmnt   FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
        AND ROWID NOT IN (SELECT   MIN (ROWID) 
                              FROM tbl_test
                          GROUP BY ser_no, fst_nm, deptid, cmnt
                            HAVING COUNT (*) > 1)
Code language: SQL (Structured Query Language) (sql)

Comment: A complicated way of performing the same task. Not efficient. 4. Using Nested Subqueries:

DELETE FROM tbl_test a
      WHERE (a.ser_no, a.fst_nm, a.deptid, a.cmnt) IN (SELECT b.ser_no, b.fst_nm, b.deptid, b.cmnt
                                                     FROM tbl_test b
                                                    WHERE a.ser_no = b.ser_no
                                                      AND a.fst_nm = b.fst_nm
                                                      AND a.deptid = b.deptid
                                                      AND a.cmnt   = b.cmnt
                                                      AND a.ROWID  > b.ROWID);
Code language: SQL (Structured Query Language) (sql)

Comment: Will work but for large tables, this is not efficient. 5. Using Analytic Fucntions:

DELETE FROM tbl_test
      WHERE ROWID IN (
               SELECT rid
                 FROM (SELECT ROWID rid,
                              ROW_NUMBER () OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn
                         FROM tbl_test)
                WHERE rn <> 1);
Code language: SQL (Structured Query Language) (sql)

Comments: This is by far one of the best solutions if the table is really really large. Using the invaluable power of Analytics. 6. CREATE-DROP-RENAME: This one is a more appropriate solution in terms of resource usage in the sense that if we have a really large table, then with delete option we are generating a huge amount of UNDO information.(if we want to rollback for any reason). Even worst, the rollback segment may not be large enough to hold your UNDO information and give error. CTAS comes handy in this case. Step 1.

CREATE  TABLE tbl_test1 NOLOGGING
   AS
   SELECT tbl_test .*
     FROM tbl_test tbl_test
    WHERE ROWID IN (SELECT rid 
                      FROM (SELECT ROWID rid, ROW_NUMBER() OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn 
                              FROM tbl_test)
                     WHERE rn=1);
Code language: SQL (Structured Query Language) (sql)

Step 2.

DROP TABLE tbl_test; --drop the original table with lots of duplicate
Code language: SQL (Structured Query Language) (sql)

Step 3.

RENAME tbl_test1 TO tbl_test; -- your original table without duplicates.
Code language: SQL (Structured Query Language) (sql)

In case you have some other method of deleting duplicate data, please share it.

Tags: Database database queries How-To Oracle

26 Comments

bhaskar says:
25 June, 2010, 18:07
Nice, but may be you should elaborate a little on why certain approaches are slow and inefficient while others are better suited for the job.
May be a explain plan for each of the delete statements will help further explain your point.
Reply
Ritesh Bhatnagar says:
6 July, 2011, 23:19
Hi Anuj,
Had some time today so was browsing through your Oracle posts. Nice issue and resolutions but I think the solution for removing duplicates needs to be corrected above.
1. First, your query would retain only the duplicates at the end as you are CTAS using row number 1. This mean it will return duplicates in the test1 table. The correct solution would be to put rn=1.
2. Since you have the row_number value , why are you using ROWID again to query tbl_test? This is an unnecessary join, you can directly CTAS from the inner query (as shown below). This way you are eliminating the re-query of the tbl_test table based on ROWID.
CREATE TABLE tbl_test1 NOLOGGING
AS
SELECT * FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn, tbl_test.*
FROM tbl_test)
WHERE rn=1;
And then perform step 2 and step3.
Reply
- Anuj Parashar says:
  8 July, 2011, 17:41
  Hi Ritesh,
  Thanks for pointing it out. The first one is wrong indeed, it should be rn=1. Corrected now.
  Regarding your 2nd point, if I want to avoid re-quering as you suggested, I need to provide the full column list to avoid “rn” becoming a new column for new table.
  Reply
Ritesh Bhatnagar says:
14 July, 2011, 8:32
Hi Anuj,
Point 1: I see the new corrected query now. Thanks for making the correction.
Point 2: I would still feel that providing full column names in the select query would be a simpler option than re-querying the table. If you are re-querying the table just to avoid the column list then I would point that you are anyway using the full column list in the PARTITION BY clause of the ROW_NUMBER function. It would be matter of pasting the same column in the SELECT clause. Here would be the fine tuned version of the query without “rn” being part of the tbl_test1 table –
CREATE TABLE tbl_test1 NOLOGGING
AS
SELECT ser_no, fst_nm, deptid, cmnt FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn, tbl_test.*
FROM tbl_test)
WHERE rn=1;
You can now perform step 2 and step3 as before.
Reply
ranjeet says:
4 January, 2012, 12:47
CREATE TABLE tbl_test1 NOLOGGING
AS
SELECT distinct ser_no, fst_nm, deptid, cmnt FROM tbl_test
and now perform step 2 and 3
Reply
Info says:
26 July, 2012, 13:38
All the contents of this Website is a copy paste from other website like tutorialspoint,javaworld
,coderancs,stack overflow
book like headfirst etc..
Could face copy protection law
Reply
Prajeesh says:
5 November, 2012, 17:35
Good one.. But the 5th one using analytic fn is not giving correct result.
Reply
akash says:
11 December, 2012, 9:26
may be this would work
create new table using following query
create table new as select distinct * from old;
and then delete old table
Reply
Nitin Agrawal says:
14 June, 2013, 20:06
Very good posts…I have many of your posts on Java also & those were very helpful & informative. Similarly this one also has great stuff…keep posting such information
Thanks
Reply
Jay says:
26 June, 2013, 21:13
Thanks. Nice post. Helped
Reply
Ivan says:
24 December, 2014, 12:28
CREATE-DROP-RENAME
Step 1: CREATE TABLE tbl_test1 NOLOGGING
AS
SELECT DISTINCT t .* FROM FROM tbl_test t ;
STEP 2 : DROP TABLE tbl_test;
STEP 3: RENAME tbl_test1 TO tbl_test;
Reply
Kovid Rathee says:
30 January, 2015, 10:21
The 6th option is the most efficient with tables with a large number of records. I’ve performed this on 6 million records and it worked out just fine.
Reply
Hiren says:
30 April, 2015, 23:10
Hi!
I’m new to Oracle. I’ve a problem that, I’ve eight table. Table-1 contains Customer Detail, and other contains different details related to customer in Table-1. I want to find out Customer whose details not available in all other tables. I want to mark this records as UNUSED. One field is common in all tables. I tried a lot but I’m not able to do this in single query.
I tried to do it with Union All but it takes tooooooo mush time.
How can I do this ?
Reply
angel ikaz says:
25 February, 2016, 18:59
i would suggest you to try DuplicateFilesDeleter , it can help resolve duplicate files issue.
Reply
SANKARAN DHANANJAYAN says:
2 June, 2016, 7:51
Thanks a lot … excellent tutorials on deleting duplicate records…
Reply
Helcio Furlan says:
3 June, 2016, 0:21
And this solution?
DELETE tbl_test a
WHERE a.ROWID > (SELECT MIN(b.ROWID)
FROM tbl_test b
WHERE b.col_pk = a.col_pk);
OR
DELETE tbl_test a
WHERE a.ROWID < (SELECT MAX(b.ROWID)
FROM tbl_test b
WHERE b.col_pk = a.col_pk);
where col_pk is a primary key column
Reply
Bilal\ says:
22 August, 2016, 21:54
In Microsoft Access I have never get this issue i think you should switch to access from oracle because of this and some other reason like auto backup, well user interface etc.
Regards Bilal Hussain
Reply
anonymous says:
12 January, 2017, 2:46
good article
Reply
Fawad khan says:
6 June, 2017, 1:39
HI
i would suggest you to try DuplicateFilesDeleter, it can help resolve duplicate files issue.
thx
Reply
albus says:
16 January, 2018, 18:26
I believe this query can work
create table new as select distinct * from old;
and then delete old tables
Reply
- Tammy says:
  30 January, 2020, 1:45
  AWESOME! This is the best and easiest way. It took 1 minute to load millions of records
  Reply
Deepak Kushwaha says:
9 February, 2018, 2:41
Wow Very Nice Information Thanks For Sharing
Reply
Jhon Yasin says:
2 April, 2018, 11:14
Right here is the right web site for anyone who wishes to understand this topic. You understand so much its almost hard to argue with you (not that I actually would want to…HaHa). You definitely put a fresh spin on a topic that’s been written about for decades. Wonderful stuff, just excellent!
Reply
jinal says:
28 April, 2018, 15:08
Nice Blog Post.Thanks For Sharing this information.
Reply
getintopc says:
24 September, 2018, 15:56
Nice Information. I read your blog very helpful to me. Thanks for sharing it.
Reply
Nouman Ali says:
28 April, 2020, 14:46
Wow so nice Amazing post, thanks for sharing this article.
Thanks again
Reply

Deleting Duplicate Rows in Oracle

You may also like...

26 Comments

Leave a Reply Cancel reply

Deleting Duplicate Rows in Oracle

You may also like...

Row Data Multiplication in Oracle

How To Reset MySQL Autoincrement Column

45 Useful Oracle Queries

Understanding Primary Key(PK) Constraint in Oracle

26 Comments

Leave a Reply Cancel reply