Grouped string concatenation with XML PATH and STUFF error

sql-server sql-server-2008-r2 add tag

B McCarthy (imported from SE)

I am trying to query two tables and get results like the following:

    Section    Names
    shoes      AccountName1, AccountName2, AccountName3
    books      AccountName1

The tables are:

    CREATE TABLE dbo.TableA(ID INT, Section varchar(64), AccountId varchar(64));
    
    INSERT dbo.TableA(ID, Section, AccountId) VALUES
    (1 ,'shoes','A1'),
    (2 ,'shoes','A2'),
    (3 ,'shoes','A3'),
    (4 ,'books','A1');
    
    CREATE TABLE dbo.TableB(AccountId varchar(20), Name varchar(64));
    
    INSERT dbo.TableB(AccountId, Name) VALUES
    ('A1','AccountName1'),
    ('A2','AccountName2'),
    ('A3','AccountNAme3');

I saw a few questions answered saying to use "XML PATH" and "STUFF" to query the data to get the results I am looking for, but I think there is something missing.  I have tried the below query and get the error message:

> Column 'a.AccountId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I don't have it in the `SELECT` clause of either query, but I assume the error is because AccountId is not unique in TableA.

Here is the query I am currently trying to get working correctly.

    SELECT section, names= STUFF((
        SELECT ', ' + Name FROM TableB as b 
    WHERE AccountId = b.AccountId
    FOR XML PATH('')), 1, 1, '')
    FROM TableA AS a
    GROUP BY a.section

Top Answer

aaron bertrand (imported from SE)

Sorry, I missed a step in the relationship. Try this version (though [Martin's will work as well](https://pastebin.com/Fz601fDC)):

    SELECT DISTINCT o.section, names= STUFF((
        SELECT ', ' + b.Name 
        FROM dbo.TableA AS a
    	INNER JOIN dbo.TableB AS b
    	ON a.AccountId = b.AccountId
    	WHERE a.Section = o.Section
        FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
    FROM dbo.TableA AS o;

An approach that is *at least as good*, but sometimes better, is switching from `DISTINCT` to `GROUP BY`:

    SELECT o.section, names= STUFF((
        SELECT ', ' + b.Name 
        FROM dbo.TableA AS a
        INNER JOIN dbo.TableB AS b
        ON a.AccountId = b.AccountId
        WHERE a.Section = o.Section
        FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
    FROM dbo.TableA AS o
    GROUP BY o.section;

At a high level, the reason `DISTINCT` applies to the entire column list. Therefore for any duplicates it has to perform the aggregate work for every duplicate before applying `DISTINCT`. If you use `GROUP BY` then it can potentially remove duplicates *before* doing any of the aggregation work. This behavior can vary by plan depending on a variety of factors including indexes, plan strategy, etc. And a direct switch to `GROUP BY` may not be possible in all cases.

In any case, I ran both of these variations in [SentryOne Plan Explorer](https://sentryone.com/plan-explorer). The plans are different in a few minor, uninteresting ways, but the I/O involved with the underlying worktable is telling. Here is `DISTINCT`:

> [![enter image description here][1]][1]

And here is `GROUP BY`:

> [![enter image description here][2]][2]

When I made the tables larger (14,000+ rows mapping to 24 potential values), this difference is more pronounced. `DISTINCT`:

> [![enter image description here][3]][3]

`GROUP BY`:

> [![enter image description here][4]][4]

In SQL Server 2017, you can use `STRING_AGG`:

    SELECT a.section, STRING_AGG(b.Name, ', ')
        FROM dbo.TableA AS a
        INNER JOIN dbo.TableB AS b
        ON a.AccountId = b.AccountId
        WHERE a.Section = a.Section
    	GROUP BY a.section;

The I/O here is almost nothing:

> [![enter image description here][5]][5]

----

But, if you're not on SQL Server 2017 (or Azure SQL Database), and can't use `STRING_AGG`, **I have to give credit where credit is due**... Paul White's answer below has very little I/O and kicks the pants off of both of the `FOR XML PATH` solutions above.

> [![enter image description here][6]][6]

----
Other enhancements from these posts:

 - [Grouped Concatenation in SQL Server](https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation)
 - [Grouped Concatenation : Ordering and Removing Duplicates](https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation-2)
 - [Comparing string splitting / concatenation methods](https://sqlperformance.com/2016/01/t-sql-queries/comparing-splitting-concat)

Also see:

 - [Bad habits to kick : avoiding the schema prefix](https://sqlblog.org/2009/10/11/bad-habits-to-kick-avoiding-the-schema-prefix)


  [1]: https://i.stack.imgur.com/6JDEym.png
  [2]: https://i.stack.imgur.com/MXiHCm.png
  [3]: https://i.stack.imgur.com/NWoUxm.png
  [4]: https://i.stack.imgur.com/YcKIFm.png
  [5]: https://i.stack.imgur.com/fxKkJm.png
  [6]: https://i.stack.imgur.com/QiCjem.png

Answer #2

Paul White (imported from SE)

I thought I would try a solution using XML:

### Tables

    DECLARE @TableA AS table
    (
        ID integer PRIMARY KEY,
        Section varchar(10) NOT NULL,
        AccountID char(2) NOT NULL
    );
    
    DECLARE @TableB AS table
    (
        AccountID char(2) PRIMARY KEY,
        Name varchar(20) NOT NULL
    );

### Data
    
    INSERT @TableA
        (ID, Section, AccountID)
    VALUES
        (1, 'shoes', 'A1'),
        (2, 'shoes', 'A2'),
        (3, 'shoes', 'A3'),
        (4, 'books', 'A1');
    
    INSERT @TableB
        (AccountID, Name)
    VALUES
        ('A1', 'AccountName1'),
        ('A2', 'AccountName2'),
        ('A3', 'AccountName3');

### Join and convert to XML

    DECLARE @x xml =
    (
        SELECT
            TA.Section,
            CA.Name
        FROM @TableA AS TA
        JOIN @TableB AS TB
            ON TB.AccountID = TA.AccountID
        CROSS APPLY
        (
            VALUES(',' + TB.Name)
        ) AS CA (Name)
        ORDER BY TA.Section
        FOR XML AUTO, TYPE, ELEMENTS, ROOT ('Root')
    );

[![XML creation query][2]][2]

The XML in the variable looks like this:



    <Root>
      <TA>
        <Section>shoes</Section>
        <CA>
          <Name>,AccountName1</Name>
        </CA>
        <CA>
          <Name>,AccountName2</Name>
        </CA>
        <CA>
          <Name>,AccountName3</Name>
        </CA>
      </TA>
      <TA>
        <Section>books</Section>
        <CA>
          <Name>,AccountName1</Name>
        </CA>
      </TA>
    </Root>

### Query

The final query shreds the XML into sections, and concatenates the names in each:
    
    SELECT
        Section = 
            N.n.value('(./Section/text())[1]', 'varchar(10)'),
        Names = 
            STUFF
            (
                -- Consecutive text nodes collapse
                N.n.query('./CA/Name/text()')
                .value('./text()[1]', 'varchar(8000)'), 
                1, 1, ''
            )
    -- Shred per section
    FROM @x.nodes('Root/TA') AS N (n);

### Result

[![Output][3]][3]

### Execution plan

[![Execution plan][4]][4]

### Live demo

<>https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ed1bd17e4df970143d93166c053ad27c


  [1]: http://data.stackexchange.com/dba/query/481209
  [2]: https://i.stack.imgur.com/nmwcVm.png
  [3]: https://i.stack.imgur.com/Yxb0A.png
  [4]: https://i.stack.imgur.com/CkDyp.png

2 Answers