I have a CLR scalar UDF implemented through C# code. I've noticed that using the `String` data type for input parameters significantly improves performance compared to the `SqlString` data type. In [Stairway to SQLCLR Level 5: Development (Using .NET within SQL Server)], [Solomon Rutzky] mentions the following reasons to prefer the SQL data types for strings:
> A primary difference between native common language runtime (CLR) data
> types and SQL Server data types is that the former do not allow for
> NULL values, while the latter provide full NULL semantics.
> Streaming values in can be achieved via SqlChars for N[VAR]CHAR,
> SqlBytes for [VAR]BINARY, and SqlXml.CreateReader() for XML...
> When using SqlString (not string or even SqlChars) you can access the
> CompareInfo, CultureInfo, LCID, and SqlCompareOptions properties...
I know that my input will never be NULL, I don't need to stream the values in, and I'll never check the collation properties. Could my case be an exception where it's better to use `String` instead of `SqlString`? If I do go with that approach, is there anything in particular that I should watch out for?
If it matters, I'm using SQL Server's default collation. Here's part of my source code, with `s1` being the input parameter:
    fixed (char* chptr = s1)
    {
        char* cp = (char*)current; // current: pointer to a destination buffer, allocated elsewhere
        for (int i = 0; i < s1.Length; i++)
            cp[i] = chptr[i];
    }
Excellent question. As far as I am aware, under those conditions (i.e. guaranteed no `NULL`s and no need for the extra functionality) there shouldn't be any specific concerns. This could be a situation similar to `CURSOR`s where, if a generic rule is needed, it would be: "don't use cursors". But the actual rule is: "only use cursors when/where appropriate". The problem is educating people on the technical details of cursors so that they can make that decision, which is why those of us who know enough about such things ignore the generic rule and proceed to use them appropriately.
So, I advise folks to "always" use the `Sql*` types because it cuts down on confusion and errors. But, that is not to say that using `string` in your situation wouldn't be better. I say go for it, and if you ever run into a problem with `string`, it's easy enough to go back and change it to be `SqlString`.
With regards to collation and your statement of:
> If it matters, I'm using SQL Server's default collation.
While it generally does not matter, it's also a bit unclear what you mean here given that there is no true default collation. Chances are that you are referring to the [unfortunate default collation when installing SQL Server on an OS with a language setting of "US English"] (i.e. LCID = 1033), which is `SQL_Latin1_General_CP1_CI_AS`. But then there are still three levels at which collations can all be different (Instance / Server, Database, and Column), and you might mean only one or two of those levels.
The reason I mention all of this is that there are some non-obvious things going on here:
1. to some degree, none of those three collation levels is relevant, since the default culture of SQLCLR threads is the Language setting at the OS level (i.e. the LCID of the selected language). This affects operations using `String.Equals` with either of the two `StringComparison.CurrentCulture*` values, and operations using `String.Compare` when no culture is specified.
1. to some degree, none of those three collation levels is relevant, since the `=` operator does an ordinal comparison (i.e. it should behave the same as a `_BIN2` collation). This is also how `String.CompareOrdinal` works, as well as `String.Equals` when _not_ passing in one of the `StringComparison.CurrentCulture*` or `StringComparison.InvariantCulture*` values.
1. one instance where the SQL Server collation does matter is when concatenating a `SqlString` input parameter with a `string` via `+`. In this case the `+` operator creates a new `SqlString` to hold the value of the `string` so that it can then concatenate the two `SqlString`s. The problem is that the new `SqlString` is created with the current thread's LCID (which is the operating system's LCID), and the `+` operator compares the two `SqlString`s prior to concatenation (i.e. it verifies that they are the "same type"). But, because the `SqlString` input parameter has the LCID of the _database_ (not the instance or column) while the implicitly created `SqlString` has the LCID of the OS, the operation throws an exception stating that the "collations" do not match. Nice, eh?
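That mismatch is easy to reproduce outside of SQL Server, since `SqlString` lives in `System.Data.SqlTypes`. A minimal sketch (the LCID values are chosen arbitrarily for illustration):

```csharp
using System;
using System.Data.SqlTypes;

class CollationMismatchDemo
{
    static void Main()
    {
        // Two SqlStrings with different LCIDs (1033 = "en-US", 1036 = "fr-FR")
        var english = new SqlString("hello ", 1033);
        var french  = new SqlString("bonjour", 1036);

        try
        {
            SqlString combined = english + french; // LCIDs differ
            Console.WriteLine(combined.Value);
        }
        catch (SqlTypeException ex)
        {
            // The + operator refuses to concatenate values whose collations differ
            Console.WriteLine("Failed: " + ex.Message);
        }

        // Concatenating the underlying .NET strings sidesteps the check entirely
        Console.WriteLine(english.Value + french.Value);
    }
}
```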
This, however, _should_ be a non-issue since nobody should ever use the `SqlString` value directly when wanting the string. Everyone should instead always use the `Value` property to get the string.
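In other words, a pattern along these lines keeps all string handling on native `string` values (the function name here is made up for illustration):

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class StringFunctions
{
    [SqlFunction]
    public static SqlString AppendMarker(SqlString s1)
    {
        if (s1.IsNull)
            return SqlString.Null;

        // Pull the native string out once; everything after this line is
        // plain .NET string handling, so no implicit SqlString conversion
        // (and hence no LCID / "collation" check) can occur.
        string input = s1.Value;
        return new SqlString(input + " (processed)");
    }
}
```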
That being said, I am curious as to what test you did to determine that `string` was faster. I tested a simple UDF that accepts a single `NVARCHAR(4000)` input parameter, concatenates a short string, and then returns the new value. One version of that UDF accepts and returns `string`, and the other accepts and returns `SqlString`. Comparing their fastest total times over 1 million iterations (totals across all iterations, not per call), the `string` version was about 200 - 300 milliseconds faster roughly 50% of the time. The other 50% of the time the gain was around 100 milliseconds, and sometimes there was no gain at all.
Also, regarding your test code: is `s1` always the direct input parameter, whether it's `string` or `SqlString`? If so, then you should also test creating a string locally and setting it to `s1.Value`. Meaning:
    string s2 = s1.Value; // when s1 is SqlString instead of string
    fixed (char* chptr = s2)
    {
        char* cp = (char*)current;
        for (int i = 0; i < s2.Length; i++)
            cp[i] = chptr[i];
    }
Also, some other options to possibly test:
1. the [SqlString.GetUnicodeBytes] method (returns `byte[]`)
1. the [SqlChars.Value] property (returns `char[]`)
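For reference, a quick sketch of what those two members hand back (both are copies of the character data as UTF-16 code units):

```csharp
using System;
using System.Data.SqlTypes;

class BytesAndCharsDemo
{
    static void Main()
    {
        var s = new SqlString("Hi");

        // SqlString.GetUnicodeBytes(): the string as UTF-16LE bytes
        byte[] bytes = s.GetUnicodeBytes();
        Console.WriteLine(BitConverter.ToString(bytes)); // 48-00-69-00

        // SqlChars.Value: the string as a char[] copy
        var chars = new SqlChars(s);
        char[] buffer = chars.Value;
        Console.WriteLine(new string(buffer)); // Hi
    }
}
```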