Given a simple row-based table without a PK but with a row-based clustered index like so:
create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])
Then I wish to add a column store index that is segmented in the same order as the clustered index above:
create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue(
PropertyId, Value
)
with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property
MaxDop hint to preserve order came from: [here](https://www.sqlpassion.at/archive/2017/01/30/columnstore-segment-elimination/)
Then the following query was used to report the min/max data_id for the PropertyId column and it the full range was reported on each of the 7 segments:
-- (Warning: This query is not joined quite right in that it may report the wrong table column
-- name but the min/max data itself makes it obvious which column is PropertyId for this case).
select top 20000000000
s.Name as SchemaName,
t.Name as TableName,
c.name as ColumnName,
c.column_id as ColumnId,
cs.segment_id as SegmentId,
cs.min_data_id as MinValue,
cs.max_data_id as MaxValue
from sys.schemas s
join sys.tables t
on t.schema_id = s.schema_id
join sys.columns c
on c.object_id = t.object_id
join sys.partitions as p
on p.object_id = t.object_id
join sys.column_store_segments cs
on cs.hobt_id = p.hobt_id
and cs.column_id = c.column_id
order by s.Name, t.Name, c.Name, cs.Segment_Id
I tried making the clustered index unique which did slightly affect the reported ranges but still was not monotonically increasing.
Any ideas?
Here is a [Link](https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/?unapproved=28694&moderation-hash=b9e689998ece9e7a2614ae23a9e94982#comment-28694) that accomplished the segmentation in this manner but I don't see any difference.
Version: Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64)
Top Answer
Paul White
This isn't directly supported for **nonclustered** columnstore indexes.
It does work for **clustered** columnstore.
Azure Synapse Analytics has language support for doing it in one step e.g.:
```
CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);
```
This syntax has not yet made it to the SQL Server box product, though it is available under an undocumented feature flag so perhaps it isn't far away. It still won't work on a nonclustered columnstore index though.
### General Workaround
The best you can do is to create the nonclustered **rowstore** index with `MAXDOP = 1`, then replace it with a nonclustered **columnstore** index with `MAXDOP = 1` and `DROP_EXISTING = ON`.
This isn't guaranteed to preserve the ordering as you want, but it is highly likely:
```
CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);
```
This will give you your best chance of achieving [rowgroup elimination][1] when filtering on `PropertyId`.
### Special Case
When the desired ordering matches the rowstore clustered index (as appears to be the case in the question), there is no need to create a rowstore nonclustered index first. The [documentation says][2]:
> Note, for nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. In this case, the resultant nonclustered columnstore index will automatically be ordered.
So, in your case, it should be enough to run only:
```
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
```
### Metadata
You can see the min and max values for each rowgroup and column using:
```
SELECT
CSS.column_id,
column_name = C.[name],
rowgroup_id = CSS.segment_id,
CSS.min_data_id,
CSS.max_data_id,
CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
ON I.[object_id] = P.[object_id]
AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
ON IC.[object_id] = I.[object_id]
AND IC.index_id = I.index_id
AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
ON C.[object_id] = P.[object_id]
AND C.column_id = IC.column_id
WHERE
P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
C.column_id,
CSS.segment_id;
```
### Demo
<>https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8af9375320c889339475dcbb98a8c618
[1]: https://techcommunity.microsoft.com/t5/sql-server/columnstore-index-performance-rowgroup-elimination/ba-p/385034
[2]: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance#recommendations-for-improving-query-performance