This is a kind of trivial task in my C# homeworld, but I don't yet make it in SQL and would prefer to solve it set-based (without cursors). A resultset should come from a query like this. SELECT SomeId, MyDate, dbo.udfLastHitRecursive(param1, param2, MyDate) as 'Qualifying' FROM T **How should it work** > I send those three params into a UDF. > The UDF internally use params to fetch related <= 90 days older rows, from a view. > The UDF traverse 'MyDate' and return 1 if it should be included in a total calculation. > If it should not, then it return 0. > Named here as "qualifying". **What the udf will do** > List the rows in date order. > Calculate the days between rows. > First row in resultset defaults to Hit = 1. > If the difference is up to 90, > - then pass to next row until the sum of gaps is 90 days (90th day must pass) > When reached, set Hit to 1 and reset gap to 0. It would also work to instead omit the row from result. |(column by udf, which not work yet) Date Calc_date MaxDiff | Qualifying 2014-01-01 11:00 2014-01-01 0 | 1 2014-01-03 10:00 2014-01-01 2 | 0 2014-01-04 09:30 2014-01-03 1 | 0 2014-04-01 10:00 2014-01-04 87 | 0 2014-05-01 11:00 2014-04-01 30 | 1 In the table above, MaxDiff column is the gap from date in previous line. The problem with my attempts so far is that I can't ignore second last row in the sample above. I paste the udf I have compiled just now. Though, is just a placeholder and won't give useful result. ;WITH cte (someid, otherkey, mydate, cost) AS ( SELECT someid, otherkey, mydate, cost FROM dbo.vGetVisits WHERE someid = @someid AND VisitCode = 3 AND otherkey = @otherkey AND CONVERT(Date,mydate) = @VisitDate UNION ALL SELECT top 1 e.someid, e.otherkey, e.mydate, e.cost FROM dbo.vGetVisits AS E WHERE CONVERT(date, e.mydate) BETWEEN DateAdd(dd,-90,CONVERT(Date,@VisitDate)) AND CONVERT(Date,@VisitDate) AND e.someid = @someid AND e.VisitCode = 3 AND e.otherkey = @otherkey AND CONVERT(Date,e.mydate) = @VisitDate order by e.mydate ) I have another query which I define separately which is more close to what I need, but blocked with the fact I can't calculate on windowed columns. I also tried one similiar which give more or less same output just with a LAG() over MyDate, surrounded with a datediff. SELECT t.Mydate, t.VisitCode, t.Cost, t.SomeId, t.otherkey, t.MaxDiff, t.DateDiff FROM ( SELECT *, MaxDiff = LAST_VALUE(Diff.Diff) OVER ( ORDER BY Diff.Mydate ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM ( SELECT *, Diff = ISNULL(DATEDIFF(DAY, LAST_VALUE(r.Mydate) OVER ( ORDER BY r.Mydate ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), r.Mydate),0), DateDiff = ISNULL(LAST_VALUE(r.Mydate) OVER ( ORDER BY r.Mydate ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), r.Mydate) FROM dbo.vGetVisits AS r WHERE r.VisitCode = 3 AND r.SomeId = @SomeID AND r.otherkey = @otherkey ) AS Diff ) AS t WHERE t.VisitCode = 3 AND t.SomeId = @SomeId AND t.otherkey = @otherkey AND t.Diff <= 90 ORDER BY t.Mydate ASC;
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[vGetVisits]') AND type in (N'U')) DROP TABLE [dbo].[vGetVisits] GO CREATE TABLE [dbo].[vGetVisits]( [id] [int] NOT NULL, [mydate] [datetime] NOT NULL, CONSTRAINT [PK_vGetVisits] PRIMARY KEY CLUSTERED ( [id] ASC ) ) GO INSERT INTO [dbo].[vGetVisits]([id], [mydate]) VALUES (1, '2014-01-01 11:00'), (2, '2014-01-03 10:00'), (3, '2014-01-04 09:30'), (4, '2014-04-01 10:00'), (5, '2014-05-01 11:00'), (6, '2014-07-01 09:00'), (7, '2014-07-31 08:00'); GO -- Clean up IF OBJECT_ID (N'dbo.udfLastHitRecursive', N'FN') IS NOT NULL DROP FUNCTION udfLastHitRecursive; GO -- Actual Function CREATE FUNCTION dbo.udfLastHitRecursive ( @MyDate datetime) RETURNS TINYINT AS BEGIN -- Your returned value 1 or 0 DECLARE @Returned_Value TINYINT; SET @Returned_Value=0; -- Prepare gaps table to be used. WITH gaps AS ( -- Select Date and MaxDiff from the original table SELECT CONVERT(Date,mydate) AS [date] , DATEDIFF(day,ISNULL(LAG(mydate, 1) OVER (ORDER BY mydate), mydate) , mydate) AS [MaxDiff] FROM dbo.vGetVisits ) SELECT @Returned_Value= (SELECT DISTINCT -- DISTINCT in case we have same date but different time CASE WHEN ( -- It is a first entry [date]=(SELECT MIN(CONVERT(Date,mydate)) FROM dbo.vGetVisits)) OR /* --Gap between last qualifying date and entered is greater than 90 Calculate Running sum upto and including required date and find a remainder of division by 91. */ ((SELECT SUM(t1.MaxDiff) FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<=t2.[date] ) t1 )%91 - /* ISNULL added to include first value that always returns NULL Calculate Running sum upto and NOT including required date and find a remainder of division by 91 */ ISNULL((SELECT SUM(t1.MaxDiff) FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<t2.[date] ) t1 )%91, 0) -- End ISNULL <0 ) /* End Running sum upto and including required date */ OR -- Gap between two nearest dates is greater than 90 ((SELECT SUM(t1.MaxDiff) FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<=t2.[date] ) t1 ) - ISNULL((SELECT SUM(t1.MaxDiff) FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<t2.[date] ) t1 ), 0) > 90) THEN 1 ELSE 0 END AS [Qualifying] FROM gaps t2 WHERE [date]=CONVERT(Date,@MyDate)) -- What is neccesary to return when entered date is not in dbo.vGetVisits? RETURN @Returned_Value END GO SELECT dbo.udfLastHitRecursive(mydate) AS [Qualifying] , [id] , mydate FROM dbo.vGetVisits ORDER BY mydate Result ![enter image description here][1] Also have a look at [How to Calculate Running Total in SQL Server][2] update: please see below the results of performance testing. Because of the different logic used in finding "90 days gap" ypercube's and my solutions if left intact may return different results to Paul White's solution. This is due to the use of [DATEDIFF][3] and [DATEADD][4] functions respectively. For example: SELECT DATEADD(DAY, 90, '2014-01-01 00:00:00.000') returns '2014-04-01 00:00:00.000' meaning that '2014-04-01 01:00:00.000' is beyond 90 days gap but SELECT DATEDIFF(DAY, '2014-01-01 00:00:00.000', '2014-04-01 01:00:00.000') Returns '90' meaning that it is still within the gap. Consider an example of a retailer. In this case selling a perishable product that has sell by date '2014-01-01' at '2014-01-01 23:59:59:999' is fine. So value DATEDIFF(DAY, ...) in this case is OK. Another example is a patient waiting to be seen. For someone who comes at '2014-01-01 00:00:00:000' and leaves at '2014-01-01 23:59:59:999' it is 0(zero) days if DATEDIFF is used even though the actual wait was almost 24 hours. Again patient that comes at '2014-01-01 23:59:59' and walks away at '2014-01-02 00:00:01' waited for a day if DATEDIFF is used. But I digress. I left DATEDIFF solutions and even performance tested those but they should really be in their own league. Also it was noted that for the big datasets it is impossible to avoid same day values. So if we have say 13 Million records spanning 2 years of data then we will end up having more than one record for some days. Those records are being filtered out at the earliest opportunity in my and ypercube's DATEDIFF solutions. Hope ypercube does not mind this. Solutions were tested on the following table CREATE TABLE [dbo].[vGetVisits]( [id] [int] NOT NULL, [mydate] [datetime] NOT NULL, ) with two different clustered indexes (mydate in this case): CREATE CLUSTERED INDEX CI_mydate on vGetVisits(mydate) GO Table was populated in the following way SET NOCOUNT ON GO INSERT INTO dbo.vGetVisits(id, mydate) VALUES (1, '01/01/1800') GO DECLARE @i bigint SET @i=2 DECLARE @MaxRows bigint SET @MaxRows=13001 WHILE @i<@MaxRows BEGIN INSERT INTO dbo.vGetVisits(id, mydate) VALUES (@i, DATEADD(day,FLOOR(RAND()*(3)),(SELECT MAX(mydate) FROM dbo.vGetVisits))) SET @i=@i+1 END For a multimillion row case INSERT was changed in a such way that 0-20 minutes entries were randomly added. All solutions were carefully wrapped up in the following code SET NOCOUNT ON GO DECLARE @StartDate DATETIME SET @StartDate = GETDATE() --- Code goes here PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE())) Actual codes tested (in no particular order): Ypercube's DATEDIFF solution (**YPC, DATEDIFF**) DECLARE @cd TABLE ( TheDate datetime PRIMARY KEY, Qualify INT NOT NULL ); DECLARE @TheDate DATETIME, @Qualify INT = 0, @PreviousCheckDate DATETIME = '1799-01-01 00:00:00' DECLARE c CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR SELECT mydate FROM (SELECT RowNum = ROW_NUMBER() OVER(PARTITION BY cast(mydate as date) ORDER BY mydate) , mydate FROM dbo.vGetVisits) Actions WHERE RowNum = 1 ORDER BY mydate; OPEN c ; FETCH NEXT FROM c INTO @TheDate ; WHILE @@FETCH_STATUS = 0 BEGIN SET @Qualify = CASE WHEN DATEDIFF(day, @PreviousCheckDate, @Thedate) > 90 THEN 1 ELSE 0 END ; IF @Qualify=1 BEGIN INSERT @cd (TheDate, Qualify) SELECT @TheDate, @Qualify ; SET @PreviousCheckDate=@TheDate END FETCH NEXT FROM c INTO @TheDate ; END CLOSE c; DEALLOCATE c; SELECT TheDate FROM @cd ORDER BY TheDate ; Ypercube's DATEADD solution (**YPC, DATEADD**) DECLARE @cd TABLE ( TheDate datetime PRIMARY KEY, Qualify INT NOT NULL ); DECLARE @TheDate DATETIME, @Next_Date DATETIME, @Interesting_Date DATETIME, @Qualify INT = 0 DECLARE c CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR SELECT [mydate] FROM [test].[dbo].[vGetVisits] ORDER BY mydate ; OPEN c ; FETCH NEXT FROM c INTO @TheDate ; SET @Interesting_Date=@TheDate INSERT @cd (TheDate, Qualify) SELECT @TheDate, @Qualify ; WHILE @@FETCH_STATUS = 0 BEGIN IF @TheDate>DATEADD(DAY, 90, @Interesting_Date) BEGIN INSERT @cd (TheDate, Qualify) SELECT @TheDate, @Qualify ; SET @Interesting_Date=@TheDate; END FETCH NEXT FROM c INTO @TheDate; END CLOSE c; DEALLOCATE c; SELECT TheDate FROM @cd ORDER BY TheDate ; Paul White's solution (**PW**) ;WITH CTE AS ( SELECT TOP (1) T.[mydate] FROM dbo.vGetVisits AS T ORDER BY T.[mydate] UNION ALL SELECT SQ1.[mydate] FROM ( SELECT T.[mydate], rn = ROW_NUMBER() OVER ( ORDER BY T.[mydate]) FROM CTE JOIN dbo.vGetVisits AS T ON T.[mydate] > DATEADD(DAY, 90, CTE.[mydate]) ) AS SQ1 WHERE SQ1.rn = 1 ) SELECT CTE.[mydate] FROM CTE OPTION (MAXRECURSION 0); My DATEADD solution (**PN, DATEADD**) DECLARE @cd TABLE ( TheDate datetime PRIMARY KEY ); DECLARE @TheDate DATETIME SET @TheDate=(SELECT MIN(mydate) as mydate FROM [dbo].[vGetVisits]) WHILE (@TheDate IS NOT NULL) BEGIN INSERT @cd (TheDate) SELECT @TheDate; SET @TheDate=( SELECT MIN(mydate) as mydate FROM [dbo].[vGetVisits] WHERE mydate>DATEADD(DAY, 90, @TheDate) ) END SELECT TheDate FROM @cd ORDER BY TheDate ; My DATEDIFF Solution (**PN, DATEDIFF**) DECLARE @MinDate DATETIME; SET @MinDate=(SELECT MIN(mydate) FROM dbo.vGetVisits); ;WITH gaps AS ( SELECT t1.[date] , t1.[MaxDiff] , SUM(t1.[MaxDiff]) OVER (ORDER BY t1.[date]) AS [Running Total] FROM ( SELECT mydate AS [date] , DATEDIFF(day,LAG(mydate, 1, mydate) OVER (ORDER BY mydate) , mydate) AS [MaxDiff] FROM (SELECT RowNum = ROW_NUMBER() OVER(PARTITION BY cast(mydate as date) ORDER BY mydate) , mydate FROM dbo.vGetVisits ) Actions WHERE RowNum = 1 ) t1 ) SELECT [date] FROM gaps t2 WHERE ( ([Running Total])%91 - ([Running Total]- [MaxDiff])%91 <0 ) OR ( [MaxDiff] > 90) OR ([date]=@MinDate) ORDER BY [date] I am using SQL Server 2012, so apologies to Mikael Eriksson, but his code won't be tested here. I would still expect his solutions with DATADIFF and DATEADD to return different values on some datasets. And the actual results are: ![enter image description here][5] [1]: https://i.stack.imgur.com/kSav8.jpg [2]: https://stackoverflow.com/questions/860966/calculate-a-running-total-in-sqlserver [3]: http://msdn.microsoft.com/en-gb/library/ms189794.aspx [4]: http://msdn.microsoft.com/en-us/library/ms186819.aspx [5]: https://i.stack.imgur.com/M2Eoo.jpg
As I read the question, the basic recursive algorithm required is: 1. Return the row with the earliest date in the set 2. Set that date as "current" 3. Find the row with the earliest date more than 90 days after the current date 4. Repeat from step 2 until no more rows are found This is relatively easy to implement with a recursive common table expression. For example, using the following sample data (based on the question): DECLARE @T AS table (TheDate datetime PRIMARY KEY); INSERT @T (TheDate) VALUES ('2014-01-01 11:00'), ('2014-01-03 10:00'), ('2014-01-04 09:30'), ('2014-04-01 10:00'), ('2014-05-01 11:00'), ('2014-07-01 09:00'), ('2014-07-31 08:00'); The recursive code is: WITH CTE AS ( -- Anchor: -- Start with the earliest date in the table SELECT TOP (1) T.TheDate FROM @T AS T ORDER BY T.TheDate UNION ALL -- Recursive part SELECT SQ1.TheDate FROM ( -- Recursively find the earliest date that is -- more than 90 days after the "current" date -- and set the new date as "current". -- ROW_NUMBER + rn = 1 is a trick to get -- TOP in the recursive part of the CTE SELECT T.TheDate, rn = ROW_NUMBER() OVER ( ORDER BY T.TheDate) FROM CTE JOIN @T AS T ON T.TheDate > DATEADD(DAY, 90, CTE.TheDate) ) AS SQ1 WHERE SQ1.rn = 1 ) SELECT CTE.TheDate FROM CTE OPTION (MAXRECURSION 0); The results are: ╔═════════════════════════╗ ║ TheDate ║ ╠═════════════════════════╣ ║ 2014-01-01 11:00:00.000 ║ ║ 2014-05-01 11:00:00.000 ║ ║ 2014-07-31 08:00:00.000 ║ ╚═════════════════════════╝ With an index having `TheDate` as a leading key, the execution plan is very efficient: ![Execution plan][1] You could choose to wrap this in a function and execute it directly against the view mentioned in the question, but my instincts are against it. Usually, performance is better when you select rows from a view into a temporary table, provide the appropriate index on the temporary table, then apply the logic above. The details depend on the details of the view, but this is my general experience. For completeness (and prompted by ypercube's answer) I should mention that my other go-to solution for this type of problem (until T-SQL gets proper ordered set functions) is a SQLCLR cursor ([see my answer here][2] for an example of the technique). This performs *much* better than a T-SQL cursor, and is convenient for those with skills in .NET languages and the ability to run SQLCLR in their production environment. It may not offer much in this scenario over the recursive solution because the majority of the cost is the sort, but it is worth mentioning. [1]: https://i.stack.imgur.com/yZulu.png [2]: https://dba.stackexchange.com/a/64223/1192
A solution that uses a cursor. *(first, some needed tables and variables)*: -- a table to hold the results DECLARE @cd TABLE ( TheDate datetime PRIMARY KEY, Qualify INT NOT NULL ); -- some variables DECLARE @TheDate DATETIME, @diff INT, @Qualify INT = 0, @PreviousCheckDate DATETIME = '1900-01-01 00:00:00' ; **The actual cursor:** -- declare the cursor DECLARE c CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR SELECT TheDate FROM T ORDER BY TheDate ; -- using the cursor to fill the @cd table OPEN c ; FETCH NEXT FROM c INTO @TheDate ; WHILE @@FETCH_STATUS = 0 BEGIN SET @diff = DATEDIFF(day, @PreviousCheckDate, @Thedate) ; SET @Qualify = CASE WHEN @diff > 90 THEN 1 ELSE 0 END ; INSERT @cd (TheDate, Qualify) SELECT @TheDate, @Qualify ; SET @PreviousCheckDate = CASE WHEN @diff > 90 THEN @TheDate ELSE @PreviousCheckDate END ; FETCH NEXT FROM c INTO @TheDate ; END CLOSE c; DEALLOCATE c; **And getting the results:** -- get the results SELECT TheDate, Qualify FROM @cd -- WHERE Qualify = 1 -- optional, to see only the qualifying rows ORDER BY TheDate ; Tested at **[SQLFiddle][1]** [1]: http://sqlfiddle.com/#!6/a68d7/12 [2]: http://sqlperformance.com/2012/07/t-sql-queries/running-totals
Since this *is* a SQL Server 2014 question I might as well add a natively compiled stored procedure version of a "cursor". Source table with some data: create table T ( TheDate datetime primary key ); go insert into T(TheDate) values ('2014-01-01 11:00'), ('2014-01-03 10:00'), ('2014-01-04 09:30'), ('2014-04-01 10:00'), ('2014-05-01 11:00'), ('2014-07-01 09:00'), ('2014-07-31 08:00'); A table type that is the parameter to the stored procedure. [Adjust the `bucket_count` appropriately][1]. create type TType as table ( ID int not null primary key nonclustered hash with (bucket_count = 16), TheDate datetime not null ) with (memory_optimized = on); And a stored procedure that loops through the table valued parameter and collects the rows in `@R`. create procedure dbo.GetDates @T dbo.TType readonly with native_compilation, schemabinding, execute as owner as begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on) declare @R dbo.TType; declare @ID int = 0; declare @RowsLeft bit = 1; declare @CurDate datetime = '1901-01-01'; declare @LastDate datetime = '1901-01-01'; while @RowsLeft = 1 begin set @ID += 1; select @CurDate = T.TheDate from @T as T where T.ID = @ID if @@rowcount = 1 begin if datediff(day, @LastDate, @CurDate) > 90 begin insert into @R(ID, TheDate) values(@ID, @CurDate); set @LastDate = @CurDate; end; end else begin set @RowsLeft = 0; end end; select R.TheDate from @R as R; end Code to fill a memory optimized table variable that is used as a parameter to the natively compiled stored procedure and call the procedure. declare @T dbo.TType; insert into @T(ID, TheDate) select row_number() over(order by T.TheDate), T.TheDate from T; exec dbo.GetDates @T; Result: TheDate ----------------------- 2014-07-31 08:00:00.000 2014-01-01 11:00:00.000 2014-05-01 11:00:00.000 Update: If you for some reason don't need to visit every row in the table you can do the equivalent of the "jump to next date" version that is implemented in the recursive CTE by Paul White. The data type does not need the ID column and you should not use a hash index. create type TType as table ( TheDate datetime not null primary key nonclustered ) with (memory_optimized = on); And the stored procedure uses a `select top(1) ..` to find the next value. create procedure dbo.GetDates @T dbo.TType readonly with native_compilation, schemabinding, execute as owner as begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on) declare @R dbo.TType; declare @RowsLeft bit = 1; declare @CurDate datetime = '1901-01-01'; while @RowsLeft = 1 begin select top(1) @CurDate = T.TheDate from @T as T where T.TheDate > dateadd(day, 90, @CurDate) order by T.TheDate; if @@rowcount = 1 begin insert into @R(TheDate) values(@CurDate); end else begin set @RowsLeft = 0; end end; select R.TheDate from @R as R; end [1]: http://msdn.microsoft.com/en-us/library/dn494956.aspx