According to [official Microsoft BOL][1] DENSE_RANK is nondeterministic ([RANK()][2]). But according to [Ranking Functions by Itzik Ben-Gan][3] "... the RANK() and DENSE_RANK() functions are always deterministic". Who is right?
What I have found so far:
[Microsoft's Definition][4] "Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database."
So in Set theory tables
Employees
Employee Salary
Sue Right 1.00
Robin Page 1.00
Phil Factor 1.00
and
Employees2
Employee Salary
Phil Factor 1.00
Sue Right 1.00
Robin Page 1.00
are the same. But Ranking functions return different values:
CREATE TABLE [dbo].[Employees](
--[ID] [int] IDENTITY(1,1) NOT NULL,
[Employee] [varchar](150) NOT NULL,
[Salary] [smallmoney] NULL,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Employees2](
--[ID] [int] IDENTITY(1,1) NOT NULL,
[Employee] [varchar](150) NOT NULL,
[Salary] [smallmoney] NULL,
) ON [PRIMARY]
INSERT INTO [dbo].[Employees]
([Employee] ,[Salary])
VALUES
('Sue Right', 1)
, ('Robin Page', 1)
,('Phil Factor', 1 )
GO
INSERT INTO [dbo].[Employees2]
([Employee] ,[Salary])
VALUES
('Phil Factor', 1 )
,('Sue Right', 1)
,('Robin Page', 1)
GO
SELECT RANK() OVER ( ORDER BY Salary) AS [Rank]
, DENSE_RANK() OVER (ORDER BY Salary ) AS [Dense_rank]
, [Employee]
FROM
dbo.Employees
SELECT RANK() OVER ( ORDER BY Salary) AS [Rank]
, DENSE_RANK() OVER (ORDER BY Salary ) AS [Dense_rank]
, [Employee]
FROM
dbo.Employees2
SELECT NTILE(3) OVER ( ORDER BY SALARY )
, [Employee]
FROM
dbo.Employees
SELECT NTILE(3) OVER ( ORDER BY SALARY )
, [Employee]
FROM
dbo.Employees2
[1]: http://msdn.microsoft.com/en-GB/library/ms173825.aspx
[2]: http://msdn.microsoft.com/en-GB/library/ms176102.aspx
[3]: http://sqlmag.com/t-sql/ranking-functions
[4]: http://msdn.microsoft.com/en-GB/library/ms178091.aspx
Top Answer
Paul White
>*According to official Microsoft BOL DENSE_RANK is nondeterministic (RANK()). But according to Ranking Functions by Itzik Ben-Gan "... the RANK() and DENSE_RANK() functions are always deterministic". Who is right?*
They are both right, because they are using different senses of the word "deterministic".
From the SQL Server optimizer's point of view, "deterministic" has a very precise meaning; a meaning that existed before window and ranking functions were added to the product. To the optimizer, the "deterministic" property defines whether a function can be freely duplicated within its internal tree structures during optimization. This is not legal for a non-deterministic function.
*Deterministic* here means: the exact instance of the function always returns the same output for the same input, no matter how many times it is called. This is never true for windowing functions, by definition, because as a (single-row) scalar function, they do not return the same result within a row or across rows. To state it simply, using `ROW_NUMBER` as an example:
> The `ROW_NUMBER` function returns different values for different rows (by definition!), so for *optimization purposes* it is nondeterministic
This is the sense BOL is using.
Itzik is making a different point about the determinism of the result as a whole. Over an ordered input set (with suitable tie-breaking) the output is a "deterministic" sequence. That is a valid observation, but it is not the "deterministic" quality that is important during query optimization.