How to control Segmentation min/max data_id on a non-clustered ColumnStore index - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

How to control Segmentation min/max data_id on a non-clustered ColumnStore index

sql-server sql-server-2019 add tag

crokusek (imported from SE)

Given a simple row-based table without a PK but with a row-based clustered index like so:

    create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])

Then I wish to add a column store index that is segmented in the same order as the clustered index above:

    create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue( 
        PropertyId, Value
    )
    with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property 

MaxDop hint to preserve order came from: [here](https://www.sqlpassion.at/archive/2017/01/30/columnstore-segment-elimination/)

Then the following query was used to report the min/max data_id for the PropertyId column and it the full range was reported on each of the 7 segments:

    -- (Warning: This query is not joined quite right in that it may report the wrong table column 
    -- name but the min/max data itself makes it obvious which column is PropertyId for this case).
    select top 20000000000
           s.Name as SchemaName, 
           t.Name as TableName,
           c.name as ColumnName,
           c.column_id as ColumnId,
           cs.segment_id as SegmentId,
           cs.min_data_id as MinValue,
           cs.max_data_id as MaxValue
      from sys.schemas s
      join sys.tables t
        on t.schema_id = s.schema_id
      join sys.columns c
        on c.object_id = t.object_id
      join sys.partitions as p  
        on p.object_id = t.object_id
      join sys.column_store_segments cs
        on cs.hobt_id = p.hobt_id
       and cs.column_id = c.column_id
     order by s.Name, t.Name, c.Name, cs.Segment_Id

I tried making the clustered index unique which did slightly affect the reported ranges but still was not monotonically increasing.

Any ideas?

Here is a [Link](https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/?unapproved=28694&moderation-hash=b9e689998ece9e7a2614ae23a9e94982#comment-28694) that accomplished the segmentation in this manner but I don't see any difference.

Version:  Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64)

Top Answer

Paul White

This isn't directly supported for **nonclustered** columnstore indexes.

It does work for **clustered** columnstore.

Azure Synapse Analytics has language support for doing it in one step e.g.:

```
CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);
```

This syntax has not yet made it to the SQL Server box product, though it is available under an undocumented feature flag so perhaps it isn't far away. It still won't work on a nonclustered columnstore index though.

### General Workaround

The best you can do is to create the nonclustered **rowstore** index with `MAXDOP = 1`, then replace it with a nonclustered **columnstore** index with `MAXDOP = 1` and `DROP_EXISTING = ON`.

This isn't guaranteed to preserve the ordering as you want, but it is highly likely:

```
CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);
```

This will give you your best chance of achieving [rowgroup elimination][1] when filtering on `PropertyId`.

### Special Case

When the desired ordering matches the rowstore clustered index (as appears to be the case in the question), there is no need to create a rowstore nonclustered index first. The [documentation says][2]:

> Note, for nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. In this case, the resultant nonclustered columnstore index will automatically be ordered.

So, in your case, it should be enough to run only:

```
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
```

### Metadata

You can see the min and max values for each rowgroup and column using:

```
SELECT
    CSS.column_id,
    column_name = C.[name],
    rowgroup_id = CSS.segment_id,
    CSS.min_data_id,
    CSS.max_data_id,
    CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
    ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
    ON I.[object_id] = P.[object_id]
    AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
    ON IC.[object_id] = I.[object_id]
    AND IC.index_id = I.index_id
    AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
    ON C.[object_id] = P.[object_id]
    AND C.column_id = IC.column_id
WHERE
    P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
    C.column_id,
    CSS.segment_id;
```

### Demo

<>https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8af9375320c889339475dcbb98a8c618


  [1]: https://techcommunity.microsoft.com/t5/sql-server/columnstore-index-performance-rowgroup-elimination/ba-p/385034
  [2]: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance#recommendations-for-improving-query-performance

1 Answer