Get streak count and streak type from win-loss-tie data

sql-server sql-server-2012 add tag

jamauss (imported from SE)

I made a [SQL Fiddle][1] for this question if that makes things easier for anyone. 

I have a fantasy sports database of sorts and what I'm trying to figure out is how to come up with "current streak" data (like 'W2' if the team has won their last 2 matchups, or 'L1' if they lost their last matchup after winning the previous matchup - or 'T1' if they tied their most recent matchup).

Here is my basic schema:

    CREATE TABLE FantasyTeams (
      team_id BIGINT NOT NULL
    )

    CREATE TABLE FantasyMatches(
    	match_id BIGINT NOT NULL,
    	home_fantasy_team_id BIGINT NOT NULL,
    	away_fantasy_team_id BIGINT NOT NULL,
    	fantasy_season_id BIGINT NOT NULL,
    	fantasy_league_id BIGINT NOT NULL,
    	fantasy_week_id BIGINT NOT NULL,
    	winning_team_id BIGINT NULL
    )

A value of `NULL` in the `winning_team_id` column indicates a tie for that match.

Here's a sample DML statement with some sample data for 6 teams and 3 weeks worth of matchups:

    INSERT INTO FantasyTeams
    SELECT 1
    UNION
    SELECT 2
    UNION
    SELECT 3
    UNION
    SELECT 4
    UNION
    SELECT 5
    UNION
    SELECT 6
    
    INSERT INTO FantasyMatches
    SELECT 1, 2, 1, 2, 4, 44, 2
    UNION
    SELECT 2, 5, 4, 2, 4, 44, 5
    UNION
    SELECT 3, 6, 3, 2, 4, 44, 3
    UNION
    SELECT 4, 2, 4, 2, 4, 45, 2
    UNION
    SELECT 5, 3, 1, 2, 4, 45, 3
    UNION
    SELECT 6, 6, 5, 2, 4, 45, 6
    UNION
    SELECT 7, 2, 6, 2, 4, 46, 2
    UNION
    SELECT 8, 3, 5, 2, 4, 46, 3
    UNION
    SELECT 9, 4, 1, 2, 4, 46, NULL
    
    GO

Here is an example of the desired output (based on the DML above) that I'm having trouble even beginning to figure out how to derive:

    | TEAM_ID | STEAK_TYPE | STREAK_COUNT |
    |---------|------------|--------------|
    |       1 |          T |            1 |
    |       2 |          W |            3 |
    |       3 |          W |            3 |
    |       4 |          T |            1 |
    |       5 |          L |            2 |
    |       6 |          L |            1 |

I've tried various methods using subqueries and CTE's but I can't put it together. I'd like to avoid using a cursor as I could have a large dataset to run this against in the future. I feel like there might be a way involving table variables that join this data to itself somehow but I'm still working on it.

Additional Info: There could be a varying number of teams (any even number between 6 and 10) and the total matchups will increase by 1 for each team every week. Any ideas on how I should do this?

  [1]: http://www.sqlfiddle.com/#!6/10a46/3/0

Top Answer

Serpiton (imported from SE)

Another way to get the result is by a recursive CTE

    WITH TeamRes As (
    SELECT FT.Team_ID
         , FM.match_id
         , Previous_Match = LAG(match_id, 1, 0) 
                            OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id)
         , Matches = Row_Number() 
                     OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id Desc)
         , Result = Case Coalesce(winning_team_id, -1)
                         When -1 Then 'T'
                         When FT.Team_ID Then 'W'
                         Else 'L'
                    End 
    FROM   FantasyMatches FM
           INNER JOIN FantasyTeams FT ON FT.Team_ID IN 
             (FM.home_fantasy_team_id, FM.away_fantasy_team_id)
    ), Streaks AS (
    SELECT Team_ID, Result, 1 As Streak, Previous_Match
    FROM   TeamRes
    WHERE  Matches = 1
    UNION ALL
    SELECT tr.Team_ID, tr.Result, Streak + 1, tr.Previous_Match
    FROM   TeamRes tr
           INNER JOIN Streaks s ON tr.Team_ID = s.Team_ID 
                               AND tr.Match_id = s.Previous_Match 
                               AND tr.Result = s.Result
    )
    Select Team_ID, Result, Max(Streak) Streak
    From   Streaks
    Group By Team_ID, Result
    Order By Team_ID


[SQLFiddle][1] demo


  [1]: http://www.sqlfiddle.com/#!6/c01c5/18

Answer #2

Paul White (imported from SE)

One intuitive approach to solve this problem is:

1. Find the most recent result for each team
2. Check the previous match and add one to the streak count if the result type matches
3. Repeat step 2 but stop as soon as the first different result is encountered

This strategy might win out over the window function solution (which performs a full scan of the data) as the table grows larger, assuming the recursive strategy is implemented efficiently. The key to success is to provide efficient indexes to locate rows quickly (using seeks) and to avoid sorts. The indexes needed are:

    -- New index #1
    CREATE UNIQUE INDEX uq1 ON dbo.FantasyMatches 
        (home_fantasy_team_id, match_id) 
    INCLUDE (winning_team_id);
    
    -- New index #2
    CREATE UNIQUE INDEX uq2 ON dbo.FantasyMatches 
        (away_fantasy_team_id, match_id) 
    INCLUDE (winning_team_id);

To assist in query optimization, I will use a temporary table to hold rows identified as forming part of a current streak. If streaks are typically short (as is true for the teams I follow, sadly) this table should be quite small:

    -- Table to hold just the rows that form streaks
    CREATE TABLE #StreakData
    (
        team_id bigint NOT NULL,
        match_id bigint NOT NULL,
        streak_type char(1) NOT NULL,
        streak_length integer NOT NULL,
    );
    
    -- Temporary table unique clustered index
    CREATE UNIQUE CLUSTERED INDEX cuq ON #StreakData (team_id, match_id);

My recursive query solution is as follows ([**SQL Fiddle here**][1]):

    -- Solution query
    WITH Streaks AS
    (
        -- Anchor: most recent match for each team
        SELECT 
            FT.team_id, 
            CA.match_id, 
            CA.streak_type, 
            streak_length = 1
        FROM dbo.FantasyTeams AS FT
        CROSS APPLY
        (
            -- Most recent match
            SELECT
                T.match_id,
                T.streak_type
            FROM 
            (
                SELECT 
                    FM.match_id, 
                    streak_type =
                        CASE 
                            WHEN FM.winning_team_id = FM.home_fantasy_team_id
                                THEN CONVERT(char(1), 'W')
                            WHEN FM.winning_team_id IS NULL
                                THEN CONVERT(char(1), 'T')
                            ELSE CONVERT(char(1), 'L')
                        END
                FROM dbo.FantasyMatches AS FM
                WHERE 
                    FT.team_id = FM.home_fantasy_team_id
                UNION ALL
                SELECT 
                    FM.match_id, 
                    streak_type =
                        CASE 
                            WHEN FM.winning_team_id = FM.away_fantasy_team_id
                                THEN CONVERT(char(1), 'W')
                            WHEN FM.winning_team_id IS NULL
                                THEN CONVERT(char(1), 'T')
                            ELSE CONVERT(char(1), 'L')
                        END
                FROM dbo.FantasyMatches AS FM
                WHERE
                    FT.team_id = FM.away_fantasy_team_id
            ) AS T
            ORDER BY 
                T.match_id DESC
                OFFSET 0 ROWS 
                FETCH FIRST 1 ROW ONLY
        ) AS CA
        UNION ALL
        -- Recursive part: prior match with the same streak type
        SELECT 
            Streaks.team_id, 
            LastMatch.match_id, 
            Streaks.streak_type, 
            Streaks.streak_length + 1
        FROM Streaks
        CROSS APPLY
        (
            -- Most recent prior match
            SELECT 
                Numbered.match_id, 
                Numbered.winning_team_id, 
                Numbered.team_id
            FROM
            (
                -- Assign a row number
                SELECT
                    PreviousMatches.match_id,
                    PreviousMatches.winning_team_id,
                    PreviousMatches.team_id, 
                    rn = ROW_NUMBER() OVER (
                        ORDER BY PreviousMatches.match_id DESC)
                FROM
                (
                    -- Prior match as home or away team
                    SELECT 
                        FM.match_id, 
                        FM.winning_team_id, 
                        team_id = FM.home_fantasy_team_id
                    FROM dbo.FantasyMatches AS FM
                    WHERE 
                        FM.home_fantasy_team_id = Streaks.team_id
                        AND FM.match_id < Streaks.match_id
                    UNION ALL
                    SELECT 
                        FM.match_id, 
                        FM.winning_team_id, 
                        team_id = FM.away_fantasy_team_id
                    FROM dbo.FantasyMatches AS FM
                    WHERE 
                        FM.away_fantasy_team_id = Streaks.team_id
                        AND FM.match_id < Streaks.match_id
                ) AS PreviousMatches
            ) AS Numbered
            -- Most recent
            WHERE 
                Numbered.rn = 1
        ) AS LastMatch
        -- Check the streak type matches
        WHERE EXISTS
        (
            SELECT 
                Streaks.streak_type
            INTERSECT
            SELECT 
                CASE 
                    WHEN LastMatch.winning_team_id IS NULL THEN 'T' 
                    WHEN LastMatch.winning_team_id = LastMatch.team_id THEN 'W' 
                    ELSE 'L' 
                END
        )
    )
    INSERT #StreakData
        (team_id, match_id, streak_type, streak_length)
    SELECT
        team_id,
        match_id,
        streak_type,
        streak_length
    FROM Streaks
    OPTION (MAXRECURSION 0);

The T-SQL text is quite long, but each section of the query corresponds closely to the broad process outline given at the start of this answer. The query is made longer by the need to use certain tricks to avoids sorts and to produce a `TOP` in the recursive part of the query (which is normally not allowed).

The execution plan is relatively small and simple by comparison with the query. I have shaded the anchor region yellow, and the recursive part green in the screenshot below:

![Recursive execution plan][2]

With the streak rows captured in a temporary table, it is easy to get the summary results you require. (Using a temporary table also avoids a sorting spill that might occur if the query below were combined with the main recursive query)

    -- Basic results
    SELECT
        SD.team_id,
        StreakType = MAX(SD.streak_type),
        StreakLength = MAX(SD.streak_length)
    FROM #StreakData AS SD
    GROUP BY 
        SD.team_id
    ORDER BY
        SD.team_id;

![Basic query execution plan][3]

The same query can be used as a basis for updating the `FantasyTeams` table:

    -- Update team summary
    WITH StreakData AS
    (
        SELECT
            SD.team_id,
            StreakType = MAX(SD.streak_type),
            StreakLength = MAX(SD.streak_length)
        FROM #StreakData AS SD
        GROUP BY 
            SD.team_id
    )
    UPDATE FT
    SET streak_type = SD.StreakType,
        streak_count = SD.StreakLength
    FROM StreakData AS SD
    JOIN dbo.FantasyTeams AS FT
        ON FT.team_id = SD.team_id;

Or, if you prefer `MERGE`:

    MERGE dbo.FantasyTeams AS FT
    USING
    (
        SELECT
            SD.team_id,
            StreakType = MAX(SD.streak_type),
            StreakLength = MAX(SD.streak_length)
        FROM #StreakData AS SD
        GROUP BY 
            SD.team_id
    ) AS StreakData
        ON StreakData.team_id = FT.team_id
    WHEN MATCHED THEN UPDATE SET
        FT.streak_type = StreakData.StreakType,
        FT.streak_count = StreakData.StreakLength;

Either approach produces an efficient execution plan (based on the known number of rows in the temporary table):

![Update execution plan][4]

Finally, because the recursive method naturally includes the `match_id` in its processing, it is easy to add a list of the `match_id`s that form each streak to the output:

    SELECT
        S.team_id,
        streak_type = MAX(S.streak_type),
        match_id_list =
            STUFF(
            (
                SELECT ',' + CONVERT(varchar(11), S2.match_id)
                FROM #StreakData AS S2
                WHERE S2.team_id = S.team_id
                ORDER BY S2.match_id DESC
                FOR XML PATH ('')
            ), 1, 1, ''),
        streak_length = MAX(S.streak_length)
    FROM #StreakData AS S
    GROUP BY 
        S.team_id
    ORDER BY
        S.team_id;

Output:

![Match list included][5]

Execution plan:

![Match list execution plan][6]

  [1]: http://sqlfiddle.com/#!6/92e8e/1
  [2]: https://i.stack.imgur.com/l7kwJ.png
  [3]: https://i.stack.imgur.com/QhP4a.png
  [4]: https://i.stack.imgur.com/JaTWR.png
  [5]: https://i.stack.imgur.com/n3gvs.png
  [6]: https://i.stack.imgur.com/ayeIN.png

Answer #3

Mikael Eriksson (imported from SE)

Since you are on SQL Server 2012 you can use a couple of the new windowing functions.

    with C1 as
    (
      select T.team_id,
             case
               when M.winning_team_id is null then 'T'
               when M.winning_team_id = T.team_id then 'W'
               else 'L'
             end as streak_type,
             M.match_id
      from FantasyMatches as M
        cross apply (values(M.home_fantasy_team_id),
                           (M.away_fantasy_team_id)) as T(team_id)
    ), C2 as
    (
      select C1.team_id,
             C1.streak_type,
             C1.match_id,
             lag(C1.streak_type, 1, C1.streak_type) 
               over(partition by C1.team_id 
                    order by C1.match_id desc) as lag_streak_type
      from C1
    ), C3 as
    (
      select C2.team_id,
             C2.streak_type,
             sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end) 
               over(partition by C2.team_id 
                    order by C2.match_id desc rows unbounded preceding) as streak_sum
      from C2
    )
    select C3.team_id,
           C3.streak_type,
           count(*) as streak_count
    from C3
    where C3.streak_sum = 0
    group by C3.team_id,
             C3.streak_type
    order by C3.team_id;

[SQL Fiddle](http://www.sqlfiddle.com/#!6/c01c5/26)


`C1` calculates the `streak_type` for each team and match.

`C2` finds the previous `streak_type` ordered by `match_id desc`.

`C3` generates a running sum `streak_sum` ordered by `match_id desc` keeping a `0` a long as the `streak_type` is the same as the last value.

Main query sums up the streaks where `streak_sum` is `0`.

3 Answers