We have a database with a large amount of data stored in a field of type **varbinary(max)**. At some point we can purge that data for most rows, but not all. Our plan is to make that field nullable, and simply null out the data when it is no longer needed. Once we do that we would like to reduce the size of the DB. What is the best way to accomplish this? If there isn't a good way to reclaim space with the current setup, one idea I have is to move that data field to a separate table with just two columns: the key to the main table and the data field. Then we could simply delete the rows when they are no longer needed. (And then do some sort of shrink.) However, this would be a much more difficult change to make than simply making the existing field nullable. Note: I actually don't care so much about making the database file smaller, but I do care about the newly freed up space becoming re-usable. Over 90% of the DB size is this one field. I'm at 3TB already.
meme (imported from SE)
It looks to me like just updating the columns to `NULL` will release pages for reuse. Here's a Very Scottish® demo, to celebrate it being almost 5PM, EST. USE tempdb; DROP TABLE IF EXISTS dbo.RobertBurns; CREATE TABLE dbo.RobertBurns ( Id INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, Scotch VARCHAR(50), HaggisAddress VARBINARY(MAX) ); DECLARE @AddressToAVarbinaryHaggis VARBINARY(MAX); DECLARE @AddressToAHaggis NVARCHAR(MAX) = N' Good luck to you and your honest, plump face, Great chieftain of the pudding race! Above them all you take your place, gut, stomach-lining, or intestine, You''re well worth a grace as long as my arm. The overloaded serving tray there you fill, Your buttocks shaped like a distant hilltop, Your wooden skewer could be used to fix a mill if need be, While through your pores your juices drip like liquid gold. His knife see the serving-man clean, And then cut you up with great skill, Making a trench in your bright, gushing guts To form a ditch, And then, 0h! What a glorious sight! Warm, steaming, and rich! Then, spoonful after spoonful, they eagerly eat, The devil will get the last bit, on they go, Until all their well-stretched stomachs, by-and-by, are bent like drums, Then the head of the family, about to burst, murmurs “Thank the Lord". Is there a pretentious soul who, over his French ragout, Or Italian cuisine that would make a pig sick, Or French stew that would make that same pig ill with complete and utter disgust, Looks down with a sneering, scornful attitude, on such a meal? (as Haggis) Poor devil! See him over his trash! As feeble as a withered bullrush, His skinny leg no thicker than a thin rope, His fist the size of a nut, Through a river or field to travel, Completely unfit! But look at the healthy, Haggis-fed person! The trembling earth respects him as a man! Put a knife in his fist, He''ll make it work! And legs, and arms, and heads will come off, Like the tops of thistle. You Powers who look after mankind, And dish out his bill of fare, Old Scotland wants no watery, wimpy stuff That splashes about in little wooden bowls! But, if You will grant her a grateful prayer, Give her a Haggis!'; INSERT dbo.RobertBurns (Scotch, HaggisAddress ) SELECT TOP 1000 CASE WHEN x.c % 15 = 0 THEN 'Laphroaig' WHEN x.c % 5 = 0 THEN 'Lagavulin' WHEN x.c % 3 = 0 THEN 'Port Ellen' ELSE 'Ardbeg' END AS Scotch, CONVERT(VARBINARY(MAX), REPLICATE(@AddressToAHaggis, x.c % 20 + 1)) FROM ( SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS c FROM sys.messages AS m ) AS x; CREATE INDEX ix_novarbinary ON dbo.RobertBurns (Scotch, Id); CREATE INDEX ix_yesvarbinary ON dbo.RobertBurns (Scotch, Id) INCLUDE (HaggisAddress); With rows inserted, let's check on our index pages. SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name, MAX(a.used_pages) AS leaf_me_alone FROM sys.indexes AS i JOIN sys.partitions AS p ON p.object_id = i.object_id AND p.index_id = i.index_id JOIN sys.allocation_units AS a ON a.container_id = p.partition_id WHERE OBJECT_NAME(i.object_id) = 'RobertBurns' GROUP BY i.object_id, i.index_id, i.name ORDER BY OBJECT_NAME(i.object_id), i.index_id; After the insert, I get this. Actual pages may vary for you. table_name index_name leaf_me_alone RobertBurns PK__RobertBu__3214EC074BE633A2 5587 RobertBurns ix_novarbinary 10 RobertBurns ix_yesvarbinary 5581 Let's `NULL` out some rows! UPDATE rb SET rb.HaggisAddress = NULL FROM dbo.RobertBurns AS rb WHERE rb.Id % 15 = 0; And check back in on our pages: table_name index_name leaf_me_alone RobertBurns PK__RobertBu__3214EC074BE633A2 5300 RobertBurns ix_novarbinary 10 RobertBurns ix_yesvarbinary 5273 So page count was reduced. Huzzah! For the two indexes that touch our `VARBINARY` data, they lost a buncha pages. That means they're back in circulation for other objects to use. Since I'm in tempdb, they probably get gobbled up pretty quickly by all the garbage things that go on here. Now let's put some data back in: INSERT dbo.RobertBurns (Scotch, HaggisAddress ) SELECT TOP 10 rb.Scotch, rb.HaggisAddress FROM dbo.RobertBurns AS rb; And checking back in: table_name index_name leaf_me_alone RobertBurns PK__RobertBu__3214EC074BE633A2 5330 RobertBurns ix_novarbinary 11 RobertBurns ix_yesvarbinary 5305 Page counts inched up a bit. So, looks like you don't have to do anything too crazy, or even shrink your database to get space reuse. I think you were conflating the behavior of dropping columns and needing to run `DBCC CLEANTABLE` with what you're actually doing. Hope this helps!