String handling in TSQL is poor.
`PATINDEX` supports a limited set of wild cards but has no support for character classes such as `\s` to match white space or quantifiers. It isn't impossible to do the validation for this relatively simple regex with a native TSQL expression. The max length of 12 is enforced by the datatype and use of `varchar` means there are only 6 white space characters in my code page.
So the following approach should work
- First 2 characters must match `NOT LIKE CONCAT('%[^aotxy0123456789',char(9),char(10),char(11),char(12),char(13),char(32),char(160),']%')`
- From 3rd character on must match `NOT LIKE CONCAT('%[^0123456789',char(9),char(10),char(11),char(12),char(13),char(32),char(160),']%')`
This is already fairly messy (I'm not even certain that it is currently correct) and it becomes completely unviable for more complex patterns.
Regular Expression support is available by using CLR and is also called out as one of the possibilities enabled by [the Java language extension][1].
> This extends the TSQL surface area to better handle use cases
> involving regular expressions, string handling, and NLP support.
Examples of using regular expressions with these technologies are already available on the Microsoft Site ([CLR][2], [Java][3])
This answer gives a solution for the cases where these are not an option (perhaps as disabled by policy or using Azure SQL database where these are not available).
This leverages [XML Schema Regular Expressions][4].
## Create an XML Schema with a pattern constraint containing the regex
CREATE XML SCHEMA COLLECTION dbo.PatternValidatorSchema
AS '<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="test">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[axyto0-9\s]{0,2}[\s0-9]{0,10}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:schema>';
# Add a check constraint so inserts/updates are validated
ALTER TABLE dbo.DemoTable
ADD CONSTRAINT CK_ValidateValueMatchesPattern
CHECK (CAST(ISNULL('<test>' + REPLACE(REPLACE(REPLACE(Value, '&','&'), '<','<'), '>','>') + '</test>','') AS XML(dbo.PatternValidatorSchema)) IS NOT NULL)
This approach is not very flexible for general validation as any failure will throw an error so it can't be used to determine good and bad rows in a set based way, but for the purposes of aborting the insert if a single bad value is found it works fine.
The check constraint isn't really checking anything worthwhile. It is there to cause the conversion to typed XML to be invoked as any failures will cause an error to be thrown.
## Insert some rows
INSERT INTO dbo.DemoTable
VALUES (''),
('ax1234567890'),
('to123456'),
('AX1234567890');
Error
> Msg 6926, Level 16, State 1, Line 37
>
> XML Validation: Invalid simple type value: 'AX1234567890'. Location: /*:test\[1]
The final value attempted violated the expression as it is upper case so the insert was blocked. The offending value is helpfully shown in the error (though the rest of the message may cause confusion)
# Performance
Unfortunately there is a potentially significant performance penalty to pay though. When inserting 10 million valid values into the table with a `TABLOCK` hint on my machine it took about **2.6 seconds with no check constraint**.
Calling the below in a check constraint increased this to **12 seconds with the CLR function**
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
private static readonly Regex regexObj = new Regex(@"\A[axyto0-9\s]{0,2}[\s0-9]{0,10}\z", RegexOptions.Compiled);
[SqlFunction(DataAccess = DataAccessKind.None,
IsDeterministic = true,
IsPrecise = true,
SystemDataAccess = SystemDataAccessKind.None)]
public static SqlBoolean PatternValidator([SqlFacet(MaxSize = 12)]SqlString valueToCheck)
{
return new SqlBoolean(regexObj.IsMatch(valueToCheck.Value));
}
}
With the below check constraint in place the **native TSQL attempt took 17 seconds**
ALTER TABLE [dbo].[DemoTable] WITH CHECK ADD CONSTRAINT [CK_ValidateValueMatchesPattern] CHECK (
LEFT(Value,2) COLLATE Latin1_General_100_BIN2 NOT LIKE CONCAT('%[^aotxy0123456789',char(9),char(10),char(11),char(12),char(13),char(32),char(160),']%')
AND SUBSTRING(Value,3,10) COLLATE Latin1_General_100_BIN2 NOT LIKE CONCAT('%[^aotxy0123456789',char(9),char(10),char(11),char(12),char(13),char(32),char(160),']%')
)
Using the **typed XML approach took 4 minutes**.
This is a penalty in the order of 24 microseconds per row vs no constraint so it would need to be evaluated whether this is a price worth paying or not taking into account expected volume of inserts/updates.
[![enter image description here][5]][5]
Just by way of comparison after altering the check constraint to produce untyped XML it took 1 min 25 seconds to insert that quantity of rows so the typed XML does add some significant overhead on top of that.
[1]: https://cloudblogs.microsoft.com/sqlserver/2020/04/22/open-sourcing-the-java-language-extension-for-sql-server/
[2]: https://docs.microsoft.com/en-us/archive/msdn-magazine/2007/february/sql-server-regular-expressions-for-efficient-sql-querying
[3]: https://docs.microsoft.com/en-us/sql/language-extensions/tutorials/search-for-string-using-regular-expressions-in-java?view=sql-server-ver15
[4]: https://www.regular-expressions.info/xml.html
[5]: https://i.stack.imgur.com/yMxON.png