johnnym (imported from SE)
**Update 2014-12-18**

With the overwhelming response to the main question being "No", the more interesting responses have focused on part 2, how to solve the performance puzzle with an explicit ORDER BY.  Although I've marked an answer already, I wouldn't be surprised if there were an even better performing solution.

**Original**

This question arose because the only extremely fast solution I could find to a particular problem only works without an ORDER BY clause. Below is the full T-SQL needed to produce the problem, along with my proposed solution (I am using SQL Server 2008 R2, if that matters.)

--Create Orders table
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE #Orders
(
OrderID    INT NOT NULL IDENTITY(1,1)
, CustID     INT NOT NULL
, StoreID    INT NOT NULL
, Amount     FLOAT NOT NULL
)
CREATE CLUSTERED INDEX IX ON #Orders (StoreID, Amount DESC, CustID)

--Add 1 million rows w/ 100K Customers each of whom had 10 orders
;WITH
Cte0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
Cte1 AS (SELECT 1 AS C FROM Cte0 AS A, Cte0 AS B),--4 rows
Cte2 AS (SELECT 1 AS C FROM Cte1 AS A ,Cte1 AS B),--16 rows
Cte3 AS (SELECT 1 AS C FROM Cte2 AS A ,Cte2 AS B),--256 rows
Cte4 AS (SELECT 1 AS C FROM Cte3 AS A ,Cte3 AS B),--65536 rows
Cte5 AS (SELECT 1 AS C FROM Cte4 AS A ,Cte2 AS B),--1048576 rows
FinalCte AS (SELECT  ROW_NUMBER() OVER (ORDER BY C) AS Number FROM   Cte5)
INSERT INTO #Orders (CustID, StoreID, Amount)
SELECT CustID = Number / 10
, StoreID    = Number % 4
, Amount     = 1000 * RAND(Number)
FROM  FinalCte
WHERE Number <= 1000000

SET STATISTICS IO ON
SET STATISTICS TIME ON

--For StoreID = 1, find the top 500 customers ordered by their most expensive purchase (Amount)

--Solution A: Without ORDER BY
DECLARE @Top INT = 500
SELECT DISTINCT TOP (@Top) CustID
FROM #Orders WITH(FORCESEEK)
WHERE StoreID = 1
OPTION(OPTIMIZE FOR (@Top = 1), FAST 1);
--9 logical reads, CPU Time = 0 ms, elapsed time = 1 ms
GO
--Solution B: With ORDER BY
DECLARE @Top INT = 500
SELECT TOP (@Top) CustID
FROM #Orders
WHERE StoreID = 1
GROUP BY CustID
ORDER BY MAX(Amount) DESC
OPTION(MAXDOP 1)
--745 logical reads, CPU Time = 141 ms, elapsed time = 145 ms
--Uses Sort operator

GO

Here are the execution plans for Solution A and B, respectively:

![Sol A][1]

![Sol B][2]

Solution A gives the performance I need, but I couldn't get it to work with the same performance when adding any kind ORDER BY clause (e.g., see Solution B).  And it certainly seems like Solution A would have to deliver its results in order, since 1) the table has only one index on it, 2) a seek is forced, thus eliminating the possibility of its using an allocation order scan based on IAM pages.

So my questions are:

1. Am I right that it will guarantee the order in this case without an order by clause?

2. If not, is there another method to force a plan that is as fast as Solution A, preferably one that avoids sorts? Note that it would have to solve the exact same problem (for StoreID = 1, find the top 500 customers ordered by their most expensive purchase amount). It would also have to still use the #Orders table, but different indexing schemes would be OK.

[1]: http://i.stack.imgur.com/PhZLn.jpg
[2]: http://i.stack.imgur.com/K3kZJ.jpg
Paul White
>1. Am I right that it will guarantee the order in this case without an order by clause?

**No.** A *Flow Distinct* that preserves order (allowing ORDER BY without a sort) is not implemented in SQL Server today. It is possible to do in principle, but then many things are possible if we are allowed to change the SQL Server source code. If you can make a good case for this development work, you could [suggest it to Microsoft][1].

>2. If not, is there another method to force a plan that is as fast as Solution A, preferably one that avoids sorts?

**Yes.** (Table & query hints only required when using the pre-2014 cardinality estimator):

CREATE UNIQUE NONCLUSTERED INDEX i
ON #Orders (StoreID, CustID, Amount, OrderID);

-- Query
SELECT TOP (500)
O.CustID,
O.Amount
FROM #Orders AS O
WITH (FORCESEEK(IX (StoreID)))
WHERE O.StoreID = 1
AND NOT EXISTS
(
SELECT NULL
FROM #Orders AS O2
WITH (FORCESEEK(i (StoreID, CustID, Amount)))
WHERE
O2.StoreID = O.StoreID
AND O2.CustID = O.CustID
AND O2.Amount >= O.Amount
AND
(
O2.Amount > O.Amount
OR
(
O2.Amount = O.Amount
AND O2.OrderID > O.OrderID
)
)
)
ORDER BY
O.Amount DESC
OPTION (MAXDOP 1);

![Actual Execution Plan][2]

(500 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 4 ms.

### SQL CLR solution

The following script shows using a SQL CLR table-valued function to meet the stated requirements. I am not a C# expert, so the code may bear improvement:

USE Sandpit;
GO
-- Ensure SQLCLR is enabled
EXECUTE sys.sp_configure
@configname = 'clr enabled',
@configvalue = 1;
RECONFIGURE;
GO
-- Lazy, but effective to allow EXTERNAL_ACCESS
ALTER DATABASE Sandpit
SET TRUSTWORTHY ON;
GO
-- The CLR assembly
CREATE ASSEMBLY FlowDistinctOrder
AUTHORIZATION dbo
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO
-- The CLR TVF with order guarantee
CREATE FUNCTION dbo.FlowDistinctOrder
(
@ServerName nvarchar(128),
@DatabaseName nvarchar(128),
@MaxRows bigint
)
RETURNS TABLE
(
CustID integer NULL,
Amount float NULL
)
ORDER (Amount DESC)
AS EXTERNAL NAME FlowDistinctOrder.UserDefinedFunctions.FlowDistinctOrder;

Test table and sample data from the question:

-- Test table
CREATE TABLE dbo.Orders
(
OrderID    integer	NOT NULL IDENTITY(1,1),
CustID     integer	NOT NULL,
StoreID    integer	NOT NULL,
Amount     float	NOT NULL
);
GO
-- Sample data
WITH
Cte0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
Cte1 AS (SELECT 1 AS C FROM Cte0 AS A, Cte0 AS B),--4 rows
Cte2 AS (SELECT 1 AS C FROM Cte1 AS A ,Cte1 AS B),--16 rows
Cte3 AS (SELECT 1 AS C FROM Cte2 AS A ,Cte2 AS B),--256 rows
Cte4 AS (SELECT 1 AS C FROM Cte3 AS A ,Cte3 AS B),--65536 rows
Cte5 AS (SELECT 1 AS C FROM Cte4 AS A ,Cte2 AS B),--1048576 rows
FinalCte AS (SELECT  ROW_NUMBER() OVER (ORDER BY C) AS Number FROM   Cte5)
INSERT dbo.Orders
(CustID, StoreID, Amount)
SELECT
CustID	= Number / 10,
StoreID = Number % 4,
Amount  = 1000 * RAND(Number)
FROM FinalCte
WHERE
Number <= 1000000;
GO
-- Index
CREATE CLUSTERED INDEX IX
ON dbo.Orders
(StoreID ASC, Amount DESC, CustID ASC);

Function test:

-- Test the function
-- Run several times to ensure connection is cached
-- and CLR code fully compiled
DECLARE @Start datetime2 = SYSUTCDATETIME();

SELECT TOP (500)
FDO.CustID
FROM dbo.FlowDistinctOrder
(
@@SERVERNAME,	-- For external connection
DB_NAME(),		-- For external connection
500				-- Number of rows to return
) AS FDO
ORDER BY
FDO.Amount DESC;

SELECT DATEDIFF(MILLISECOND, @Start, SYSUTCDATETIME());

Execution plan (note the validation of the ORDER guarantee):

![CLR function execution plan][3]

On my laptop, this typically executes in 80-100ms. This is nowhere near as fast as the T-SQL rewrite above, but it should show good performance stability in the face of different data distributions.

Source code:

cs
using Microsoft.SqlServer.Server;
using System.Collections;
using System.Collections.Generic;
using System.Data.SqlClient;

public partial class UserDefinedFunctions
{
private sealed class ReverseComparer<T> : IComparer<T>
{

public ReverseComparer(IComparer<T> original)
{
this.original = original;
}

public int Compare(T left, T right)
{
return original.Compare(right, left);
}
}

[SqlFunction
(
SystemDataAccess = SystemDataAccessKind.None,
FillRowMethodName = "FillRow",
TableDefinition = "CustID integer NULL, Amount float NULL"
)
]
public static IEnumerable FlowDistinctOrder
(
[SqlFacet (MaxSize=128)]string ServerName,
[SqlFacet (MaxSize=128)]string DatabaseName,
long MaxRows
)
{
var list = new SortedDictionary<double, int>
(new ReverseComparer<double>(Comparer<double>.Default));

var csb = new SqlConnectionStringBuilder();
csb.ConnectTimeout = 10;
csb.DataSource = ServerName;
csb.Enlist = false;
csb.InitialCatalog = DatabaseName;
csb.IntegratedSecurity = true;

using (var conn = new SqlConnection(csb.ConnectionString))
{
conn.Open();
using (var cmd = conn.CreateCommand())
{
cmd.CommandText =
@"
SELECT
O.CustID,
O.Amount
FROM dbo.Orders AS O
WHERE
O.StoreID = 1
ORDER BY
O.Amount DESC";

int custid;
double amount;

{
{
custid = rdr.GetInt32(0);
amount = rdr.GetDouble(1);

if (!list.ContainsKey(amount))
{
if (list.Count == MaxRows)
{
break;
}
}
}
}
}
}
return list;
}

public static void FillRow(object obj, out int CustID, out double Amount)
{
var v = (KeyValuePair<double, int>)obj;
CustID = v.Value;
Amount = v.Key;
}
}


[1]: https://docs.microsoft.com/en-us/sql/sql-server/sql-server-get-help
[2]: https://i.stack.imgur.com/kMIm2.png
[3]: https://i.stack.imgur.com/WDkuQ.png