Fast alter column NVARCHAR(4000) to NVARCHAR(260)

sql-server sql-server-2017 add tag

Nick Whaley (imported from SE)

I have a performance problem with very large memory grants handling this table with a couple of `NVARCHAR(4000)` columns. Thing is these columns are never larger than `NVARCHAR(260)`.

Using

    ALTER TABLE [table] ALTER COLUMN [col] NVARCHAR(260) NULL

results in SQL Server rewriting the entire table (and using 2x table size in log space), which is billions of rows, only to change nothing, isn't an option. Increasing the column width doesn't have this problem, but decreasing it does.

I have tried creating a constraint `CHECK (DATALENGTH([col]) <= 520)` or `CHECK (LEN([col]) <= 260)` and SQL Server still decides to re-write the entire table.

Is there any way to alter the column data type as a metadata-only operation? Without the expense of rewriting the entire table?  I'm using SQL Server 2017 (14.0.2027.2 and 14.0.3192.2).

Here is a sample DDL table to use to reproduce:

	CREATE TABLE [table](
		id INT IDENTITY(1,1) NOT NULL,
		[col] NVARCHAR(4000) NULL,
		CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED (id ASC)
	);

And then run the `ALTER`.

Top Answer

Josh Darnell (imported from SE)

> Is there any way to alter the column data type as a metadata-only operation?

I don't think so, this is how the product works right now.  There are some really great workarounds to this limitation proposed in [Joe's answer][1].

> ...results in SQL Server rewriting the entire table (and using 2x table size in log space)

I'm going to respond to the two parts of that statement separately.

## Rewriting the Table

As I mentioned before, there's not really any way to avoid this.  That seems to be the reality of the situation, even if it doesn't make complete sense from our perspective as customers.

Looking at `DBCC PAGE` before and after changing the column from 4000 to 260 shows that all of the data is duplicated on the data page (my test table had `'A'` 260 times in the row): 

[![Screenshot of data portion of dbcc page before and after][2]][2]

At this point, there are two copies of the exact same data on the page.  The "old" column is essentially deleted (the id is changed from id=2 to id=67108865), and the "new" version of the column is updated to point to the new offset of the data on the page:

[![Screenshot of column metadata portions of dbcc page before and after][3]][3]

## Using 2x Table Size in Log Space

Adding `WITH (ONLINE = ON)` to the end of the `ALTER` statement **reduces the logging activity by about half**, so this is one improvement you could make to reduce the amount of writes to disk / disk space needed.

I used this test harness to try it out:

```
USE [master];
GO
DROP DATABASE IF EXISTS [248749];
GO
CREATE DATABASE [248749] 
ON PRIMARY 
(
	NAME = N'248749', 
	FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\248749.mdf', 
	SIZE = 2048000KB, 
	FILEGROWTH = 65536KB
)
LOG ON 
(
	NAME = N'248749_log', 
	FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\248749_log.ldf', 
	SIZE = 2048000KB, 
	FILEGROWTH = 65536KB
);
GO
USE [248749];
GO

CREATE TABLE dbo.[table]
(
    id int IDENTITY(1,1) NOT NULL,
    [col] nvarchar (4000) NULL,

    CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED (id ASC)
);

INSERT INTO dbo.[table]
SELECT TOP (1000000)
	REPLICATE(N'A', 260)
FROM master.dbo.spt_values v1
	CROSS JOIN master.dbo.spt_values v2
	CROSS JOIN master.dbo.spt_values v3;
GO
```

I checked `sys.dm_io_virtual_file_stats(DB_ID(N'248749'), DEFAULT)` before and after running the `ALTER` statement, and here are the differences:

### Default (Offline) `ALTER`

 - Data file writes / bytes written: 34,809 / 2,193,801,216
 - Log file writes / bytes written: 40,953 / 1,484,910,080

## Online `ALTER`

 - Data file writes / bytes written: 36,874	/ 1,693,745,152 (22.8 % drop)
 - Log file writes / bytes written: 24,680 / 866,166,272 (41 % drop)

As you can see, there was a slight drop in the data file writes, and a major drop in the log file writes.

  [1]: https://dba.stackexchange.com/a/248754/6141
  [2]: https://i.stack.imgur.com/gVLLk.png
  [3]: https://i.stack.imgur.com/ivKkO.png

Answer #2

Joe Obbish (imported from SE)

I don't know of a way to directly accomplish what you're looking for here. Note that the query optimizer isn't smart enough at this time to factor in constraints for memory grant calculations, so the constraint wouldn't have helped anyway. A few methods that avoid rewriting the table's data:

 1. CAST the column as NVARCHAR(260) in all codes that uses it. The query optimizer will calculate the memory grant using the casted data type instead of the raw one.
 2. Rename the table and create a view that does the cast instead. This accomplishes the same thing as option 1 but may limit the amount of code you need to update.
 3. Create a non-persisted computed column with the right data type and have all of your queries select from that column instead of the original one.
 4. Rename existing column and add computed column with the original name. Then adjust all of your queries making updates or inserts to the original column to use new column name instead.

2 Answers