We have a database with a large amount of data stored in a field of type varbinary(max). At some point we can purge that data for most rows, but not all. Our plan is to make that field nullable, and simply null out the data when it is no longer needed. Once we do that we would like to reduce the size of the DB. What is the best way to accomplish this?
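For concreteness, the change we have in mind is something like this (table and column names here are placeholders, not our real schema):

ALTER TABLE dbo.BigTable
    ALTER COLUMN LargeData VARBINARY(MAX) NULL;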
If there isn’t a good way to reclaim space with the current setup, one idea I have is to move that data field to a separate table with just two columns: the key to the main table and the data field. Then we could simply delete the rows when they are no longer needed. (And then do some sort of shrink.) However, this would be a much more difficult change to make than simply making the existing field nullable.
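Roughly, that split-out table would look something like the following (placeholder names again, and the foreign key back to the main table is just for illustration):

CREATE TABLE dbo.BigTableBlob
(
    BigTableId INT NOT NULL
        CONSTRAINT PK_BigTableBlob PRIMARY KEY
        CONSTRAINT FK_BigTableBlob_BigTable REFERENCES dbo.BigTable (Id),
    LargeData VARBINARY(MAX) NOT NULL
);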
Note: I actually don’t care so much about making the database file smaller, but I do care about the newly freed up space becoming re-usable.
Over 90% of the DB size is this one field. I’m at 3TB already.
It looks to me like just updating the columns to NULL will release pages for reuse. Here’s a Very Scottish® demo, to celebrate it being almost 5 PM EST.
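The setup script isn’t reproduced here, but a minimal sketch of the table this demo assumes looks something like the following. The table, column, and index names come from the results below; the data types, payload contents, and row count are guesses.

USE tempdb;

CREATE TABLE dbo.RobertBurns
(
    Id INT IDENTITY(1, 1) PRIMARY KEY,
    Scotch VARCHAR(50) NOT NULL,
    HaggisAddress VARBINARY(MAX) NULL
);

-- One index that avoids the VARBINARY column, one that carries it along
CREATE INDEX ix_novarbinary ON dbo.RobertBurns (Scotch);
CREATE INDEX ix_yesvarbinary ON dbo.RobertBurns (Scotch) INCLUDE (HaggisAddress);

-- Stuff in some rows with a chunky VARBINARY payload
INSERT dbo.RobertBurns (Scotch, HaggisAddress)
SELECT TOP (10000)
       'Laphroaig',
       CONVERT(VARBINARY(MAX), REPLICATE(CONVERT(VARCHAR(MAX), 'Fair fa'' your honest, sonsie face '), 100))
FROM sys.messages AS m;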
With rows inserted, let’s check on our index pages.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       MAX(a.used_pages) AS leaf_me_alone
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id
    AND p.index_id = i.index_id
JOIN sys.allocation_units AS a
    ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.object_id) = 'RobertBurns'
GROUP BY i.object_id, i.index_id, i.name
ORDER BY OBJECT_NAME(i.object_id), i.index_id;
After the insert, I get this. Actual page counts may vary for you.
table_name    index_name                       leaf_me_alone
RobertBurns   PK__RobertBu__3214EC074BE633A2   5587
RobertBurns   ix_novarbinary                   10
RobertBurns   ix_yesvarbinary                  5581
Let’s NULL out some rows!
UPDATE rb
SET rb.HaggisAddress = NULL
FROM dbo.RobertBurns AS rb
WHERE rb.Id % 15 = 0;
And check back in on our pages:
table_name    index_name                       leaf_me_alone
RobertBurns   PK__RobertBu__3214EC074BE633A2   5300
RobertBurns   ix_novarbinary                   10
RobertBurns   ix_yesvarbinary                  5273
So page count was reduced. Huzzah! The two indexes that touch our VARBINARY data lost a bunch of pages, which means those pages are back in circulation for other objects to use. Since I’m in tempdb, they’ll probably get gobbled up pretty quickly by all the garbage things that go on here.
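If you want to sanity-check that at the database level, a quick way (not part of the original demo) is to total up the unallocated pages in the current database’s data files; those are the pages available for new allocations. Note that sys.dm_db_file_space_usage reports on user databases starting with SQL Server 2012 (it has always worked for tempdb).

-- Pages not allocated to any object are free for reuse by other objects
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM sys.dm_db_file_space_usage;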
Now let’s put some data back in:
INSERT dbo.RobertBurns (Scotch, HaggisAddress)
SELECT TOP 10 rb.Scotch, rb.HaggisAddress
FROM dbo.RobertBurns AS rb;
And checking back in:
table_name    index_name                       leaf_me_alone
RobertBurns   PK__RobertBu__3214EC074BE633A2   5330
RobertBurns   ix_novarbinary                   11
RobertBurns   ix_yesvarbinary                  5305
Page counts inched up a bit.
So it looks like you don’t have to do anything too crazy, or even shrink your database, to get space reuse. I think you were conflating the behavior of dropping columns, where you do need to run DBCC CLEANTABLE to reclaim the space, with what you’re actually doing.
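For reference, DBCC CLEANTABLE is the cleanup step for space left behind by dropping a variable-length column, which isn’t your situation; against the demo table it would look roughly like this:

-- Only needed after DROP COLUMN on a variable-length column,
-- not after UPDATE ... SET column = NULL
DBCC CLEANTABLE ('tempdb', 'dbo.RobertBurns', 0);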
Hope this helps!