Tips: Deleting a Lot of Rows from a Database

Recently, I had to remove a large number of rows, almost 10 million, from a very active database table in a LIVE system. Forget why for now as that is the subject of another post. The basic problem was how to remove 99.8% of the rows without impacting users or removing the few rows that we actually wanted to keep. To make matters worse, the field that was determined to have all the answers didn’t have an index.

Adding an index to the field in question would only solve part of the problem. It would take resources just to apply the key to a table that large, especially one that already had several indices. There are also locking issues and let us not forget the Transaction Log usage. We are talking Gigabytes of space for any new index and for temporary Transaction Log space.

The Query

The basic problem was how to delete them without manually running lots of queries. I decided that deleting less than 10,000 rows wasn’t a burden on the system. So if I picked a safe number, 1,000, and did it 10,000 times, users could use the system and I would solve my problem.

The key was to do this repeatedly, so I used some Transact SQL from the dredges of my memory, with some syntax help from MSDN. The beauty was that I didn’t need to run it all at once. I could run a few thousand, or any number of iterations and wait for my Transaction Log storage to empty during the regular backup cycle. This also allowed me to monitor the process and make sure that the system was performing well while not stressing my storage too much.

I then embedded a transaction. That would slow things down a bit, but also make sure that I wasn’t locking the table much. The transaction may have been unnecessary, but by the time I researched it, I could be done. It also fell under the “safe-than-sorry” line of thought.

DECLARE @Count INT
SELECT @Count = 0
While (@Count < 10000)
BEGIN
SELECT @Count = @Count + 1
BEGIN TRAN
DELETE FROM dm_really_big_table
WHERE r_object_id in
(SELECT TOP 1000 r_object_id FROM dm_really_big_table
WHERE column = 'value')
COMMIT TRAN
END

This was what I used in SQL Server 2000. You can actually specify a limited Delete clause in SQL Server 2005 without the sub-query. With minor tweaks, this approach will work with Oracle as well.