TSQL Regular Expression Workbench

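This article shows how to use regular expressions from T-SQL in SQL Server for data validation, transformation and similar tasks, and provides some practical example functions.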


/* This Workbench is about using Regular expressions with SQL Server via TSQL. It doesn't even attempt to teach how regular expressions work or how to pull them together. There are plenty of such resources on the Web. The aim is to demonstrate a few possibilities and try to persuade you to experiment with them if you don't already use Regex with SQL Server. 

We suggest that, if you are an ordinary mortal like Phil or me, without special powers, you should use an application such as RegexBuddy to form, edit and interpret Regular expressions. It makes learning them a lot easier. So that people with access only to SQL Server 2000 can use the workbench, we'll use OLE in the examples, but they are readily adapted to CLR.

As always, the source code is in the speechbubble above. */

/*

-- Contents --

  1. Introduction
  2. The OLE Functions
    1. The OLE Regex Match function
    2. The OLE Regex Replace function
    3. The OLE Regex Find (Execute) function
  3. Combining two Regexs
  4. OLE Regex Performance

Regular Expressions can be very useful to the Database programmer, particularly for data validation, data feeds and data transformations. A lot of the time, tools such as grep and awk or Funduc's S&R will be the most suitable way of using regular expressions, but just occasionally, it is handy to be able to use them in TSQL as we'll try to show.

Regular Expressions are not regular in the sense that there is any common dialect of expression that is understood by all Regex engines. On the contrary, regular expressions aren't always portable and there are many common, similar but incompatible, dialects in use, such as Perl 5.8, java.util.regex, .NET, PHP, Python, Ruby, ECMA Javascript, PCRE, Apache, vi, Shell tools, Tcl ARE, POSIX BRE, Funduc and JGsoft.

Regular Expressions were never developed to be easy to understand. They are a condensed shorthand that, on preliminary inspection, looks as if someone has repeatedly sat on the keyboard. Even when interpreted, the logic isn't always easy to follow. If you don't agree, then explain this one!

Probably the best tutorial on the web for Regular Expressions is on www.regular-expressions.info, but it is also worth reading Implementing Real-World Data Input Validation using Regular Expressions by Francis Norton for an introduction to regular expressions.

A great deal can be done using command-line applications that work with regular expressions, such as grep and awk. However, there are times when it is handy to use Regex directly from TSQL. There are two Regex engines available to SQL Server. These are:

  • the .NET Regex, which is in the System.Text.RegularExpressions namespace
  • the ECMA Regex from VBScript.RegExp, which is distributed with the IE browser and is used by Javascript and JScript.

Both of these are excellent standard implementations. Both work well in TSQL.

The .NET Regex requires the creation of CLR functions to provide regular expressions, and works only with SQL Server 2005 and later. See CLR Integration by Christoffer Hedgate.
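
To give an idea of what 'creating CLR functions' involves on the TSQL side, here is a minimal sketch of registering one. Every name in it is an assumption made for illustration: RegexAssembly, the DLL path, the class UserDefinedFunctions and the method RegexMatch stand in for an assembly you would have to compile yourself, and recent versions of SQL Server may impose additional assembly-signing requirements.

```sql
-- Hypothetical names throughout: RegexAssembly, the DLL path, the class
-- UserDefinedFunctions and the method RegexMatch are placeholders.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- Load the compiled .NET assembly into the current database
CREATE ASSEMBLY RegexAssembly
FROM 'C:\Assemblies\RegexAssembly.dll'
WITH PERMISSION_SET = SAFE;
GO
-- Expose a static method of the assembly as a TSQL scalar function
CREATE FUNCTION dbo.ClrRegexMatch
    (
      @pattern NVARCHAR(4000),
      @matchstring NVARCHAR(MAX)
    )
RETURNS BIT
AS EXTERNAL NAME RegexAssembly.UserDefinedFunctions.RegexMatch;
GO
```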

The ECMA Regex can be used via VBScript.RegExp, which is available to SQL Server 2000 as well. The regex is compatible with Javascript.

The advantage of using CLR is that the regular expressions of the .NET Framework are very good, and performance is excellent. However, the techniques are well-known, whereas some of the more powerful uses of VBScript.RegExp have hardly ever been published, so this workbench will concentrate on the latter.
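
One practical point before trying the OLE route: on SQL Server 2005 and later, the sp_OA* procedures that the functions in this workbench call are switched off by default. A minimal sketch of enabling them, assuming you are a sysadmin and that OLE Automation is acceptable under your security policy:

```sql
-- Assumption: run by a sysadmin on SQL Server 2005 or later.
-- SQL Server 2000 has no such option to set.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
```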

The OLE functions
------------------

There are various properties to consider in these functions:

The IgnoreCase property
By default, the regular expression is case sensitive. In the following functions, we have set the IgnoreCase property to True to make it case insensitive.

The Multiline property
The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True. (There is no option to make the dot match line-break characters.)

The Global property
If you want the RegExp object to return or replace all matches instead of just the first one, set the Global property to True.

Only the IgnoreCase property is relevant in the first function, but we've 'hardcoded' it to 1 as case-sensitive searches are a minority interest.

The OLE Regex Match function
-----------------------------

Let's start off with something simple, a function for testing a string against a regular expression */

IF OBJECT_ID(N'dbo.RegexMatch') IS NOT NULL
    DROP FUNCTION dbo.RegexMatch
GO
CREATE FUNCTION dbo.RegexMatch
    (
      @pattern VARCHAR(2000),
      @matchstring VARCHAR(MAX) --use VARCHAR(8000) for SQL Server 2000
    )
RETURNS INT
/* The RegexMatch returns True or False, indicating if the regular expression matches (part of) the string. (It returns null if there is an error).
When using this for validating user input, you'll normally want to check if the entire string matches the regular expression. To do so, put a caret at the start of the regex, and a dollar at the end, to anchor the regex at the start and end of the subject string.
*/
AS BEGIN
    DECLARE @objRegexExp INT,
        @objErrorObject INT,
        @strErrorMessage VARCHAR(255),
        @hr INT,
        @match BIT

    SELECT  @strErrorMessage = 'creating a regex object'
    EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'Pattern', @pattern
        --Specifying a case-insensitive match
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
        --Doing a Test
    IF @hr = 0
        EXEC @hr = sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring
    IF @hr <> 0
        BEGIN
            RETURN NULL
        END
    EXEC sp_OADestroy @objRegexExp
    RETURN @match
   END
GO
/* Now, with this routine, we can do some complex input validation */
--is there a repeating word?
SELECT  dbo.RegexMatch('\b(\w+)\s+\1\b', 'this has has been repeated') --1
SELECT  dbo.RegexMatch('\b(\w+)\s+\1\b', 'this has not been repeated') --0

--find a word near another word (in this case 'for' and 'last' 1 or 2 words apart)
SELECT  dbo.RegexMatch('\bfor(?:\W+\w+){1,2}?\W+last\b',
           'You have failed me for the last time, Admiral') --1
SELECT  dbo.RegexMatch('\bfor(?:\W+\w+){1,2}?\W+last\b',
           'You have failed me for what could be the last time, Admiral') --0

--is this likely to be a valid credit card?
SELECT  dbo.RegexMatch('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|3(?:0[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13}|(?:2131|1800)\d{11})$',
           '4953129482924435')

--is this a valid ZIP code?
SELECT  dbo.RegexMatch('^[0-9]{5,5}([- ]?[0-9]{4,4})?$', '02115-4653')

--is this a valid Postcode?
SELECT  dbo.RegexMatch('^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})$',
           'RG35 2AQ')

--is this a valid European date?
SELECT  dbo.RegexMatch('^((((31\/(0?[13578]|1[02]))|((29|30)\/(0?[1,3-9]|1[0-2])))\/(1[6-9]|[2-9]\d)?\d{2})|(29\/0?2\/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))|(0?[1-9]|1\d|2[0-8])\/((0?[1-9])|(1[0-2]))\/((1[6-9]|[2-9]\d)?\d{2})) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$',
           '12/12/2007 20:15:27')

--is this a valid currency value (dollar)?
SELECT  dbo.RegexMatch('^\$(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$', '$34,000.00')

--is this a valid currency value (Sterling)?
SELECT  dbo.RegexMatch('^\£(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$', '£34,000.00')

--a valid email address?
SELECT  dbo.RegexMatch('^(([a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+(\.[a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+)*){1,64}@(([A-Za-z0-9]+[A-Za-z0-9-_]*){1,63}\.)*(([A-Za-z0-9]+[A-Za-z0-9-_]*){3,63}\.)+([A-Za-z0-9]{2,4}\.?)+){1,255}$',
           'Phil.Factor@simple-Talk.com')
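
/* The advice about anchoring a validation regex with a caret and a dollar is easier to see with a side-by-side comparison. This is a small sketch, assuming dbo.RegexMatch has been created as above; the table variable @candidates and its contents are made up for the illustration. */

```sql
-- Assumes dbo.RegexMatch from this workbench has been created.
-- @candidates is an illustrative table variable, not part of the workbench.
DECLARE @candidates TABLE (zip VARCHAR(20))
INSERT INTO @candidates (zip) SELECT '02115'
INSERT INTO @candidates (zip) SELECT '02115-4653'
INSERT INTO @candidates (zip) SELECT 'ZIP 02115 USA'
INSERT INTO @candidates (zip) SELECT '2115'

--the same pattern, first allowed to match anywhere, then anchored to the whole string
SELECT  zip,
        [unanchored] = dbo.RegexMatch('[0-9]{5}([- ]?[0-9]{4})?', zip),
        [anchored] = dbo.RegexMatch('^[0-9]{5}([- ]?[0-9]{4})?$', zip)
FROM    @candidates
```

/* Only 'ZIP 02115 USA' should differ between the two columns: the unanchored pattern finds a ZIP code somewhere inside it, whereas the anchored one insists that the entire string is a ZIP code. */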

/*

With this function, the passing back of errors is rudimentary. If an OLE error occurs, then a null is passed back.

There are two other basic Regex functions available. With them, you can use regular expressions in all sorts of places in TSQL without having to get to direct grips with the rather awkward OLE interface.

The OLE Regex Replace function
-----------------------------

*/

IF OBJECT_ID(N'dbo.RegexReplace') IS NOT NULL
    DROP FUNCTION dbo.RegexReplace
GO
CREATE FUNCTION dbo.RegexReplace
    (
      @pattern VARCHAR(255),
      @replacement VARCHAR(255),
      @Subject VARCHAR(MAX),
      @global BIT = 1,
      @Multiline BIT = 1
    )
RETURNS VARCHAR(MAX)

/*The RegexReplace function takes three string parameters: the pattern (the regular expression), the replacement expression, and the subject string to do the manipulation to.

The replacement expression is one that can cause difficulties. You can specify an empty string '' as the @replacement text. This will cause the Replace method to return the subject string with all regex matches deleted from it (see "strip all HTML elements out of a string" below).
To re-insert the regex match as part of the replacement, include $& in the replacement text (see "find a #comment and add a TSQL --" below).
If the regexp contains capturing parentheses, you can use backreferences in the replacement text. $1 in the replacement text inserts the text matched by the first capturing group, $2 the second, etc. up to $9 (e.g. see "import delimited text into a database" below). To include a literal dollar sign in the replacement, put two consecutive dollar signs in the string you pass to the Replace method.*/
AS BEGIN
    DECLARE @objRegexExp INT,
        @objErrorObject INT,
        @strErrorMessage VARCHAR(255),
        @Substituted VARCHAR(8000),
        @hr INT,
        @Replace BIT

    SELECT  @strErrorMessage = 'creating a regex object'
    EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
    IF @hr = 0
        SELECT  @strErrorMessage = 'Setting the Regex pattern',
                @objErrorObject = @objRegexExp
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'Pattern', @pattern
    IF @hr = 0 /*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*/
        SELECT  @strErrorMessage = 'Specifying the type of match'
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'Global', @global
    IF @hr = 0
        SELECT  @strErrorMessage = 'Doing a Replacement'
    IF @hr = 0
        EXEC @hr = sp_OAMethod @objRegexExp, 'Replace', @Substituted OUT,
            @Subject, @replacement
     /*If the RegExp.Global property is False (the default), Replace will return the @Subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*/
    IF @hr <> 0
        BEGIN
            DECLARE @Source VARCHAR(255),
                @Description VARCHAR(255),
                @Helpfile VARCHAR(255),
                @HelpID INT

            EXECUTE sp_OAGetErrorInfo @objErrorObject, @Source OUTPUT,
                @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
            SELECT  @strErrorMessage = 'Error whilst '
                    + COALESCE(@strErrorMessage, 'doing something') + ', '
                    + COALESCE(@Description, '')
            RETURN @strErrorMessage
        END
    EXEC sp_OADestroy @objRegexExp
    RETURN @Substituted
   END
GO
--remove repeated words in text
SELECT  dbo.RegexReplace('\b(\w+)(?:\s+\1\b)+', '$1',
                         'Sometimes I cant help help help stuttering', 1, 1)

--find a #comment and add a TSQL --
SELECT   dbo.RegexReplace ( '#.*' , '--$&' , '
# this is a comment
first,second,third,fourth'
, 1 , 1 )

--replace a url with an HTML anchor
SELECT   dbo.RegexReplace (
        
'\b(https?|ftp|file)://([-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])' ,
        
'<a href="$2">$2</a>' ,
         
'There is  this amazing site at http://www.simple-talk.com' , 1 , 1 )

--strip all HTML elements out of a string
SELECT   dbo.RegexReplace ( '<(?:[^>''"]*|([''"]).*?\1)*>' ,
   
'' , '<a href="http://www.simple-talk.com">Simple Talk is wonderful</a><!--This is a comment --> we all love it' , 1 , 1 )

--import delimited text into a database, converting it into insert statements
SELECT   dbo.RegexReplace (
 '([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+([^\|\r\n]+)[|\r\n]+'
,
 
'Insert into MyTable (Firstcol,SecondCol, ThirdCol, Fourthcol)
select $1,$2,$3,$4
'
, '1|white gloves|2435|24565
2|Sports Shoes|285678|0987
3|Stumps|2845|987
4|bat|29862|4875'
, 1 , 1 )
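
/* The replacement tokens described in the function's header comment ($1 to $9, $& and $$) can be tried in isolation. This is a small sketch, assuming dbo.RegexReplace has been created as above; the sample strings are made up. */

```sql
-- Assumes dbo.RegexReplace from this workbench has been created.
--swap two captured groups with $2 and $1
SELECT  dbo.RegexReplace('(\w+),\s*(\w+)', '$2 $1', 'Factor, Phil', 1, 1)
--wrap every match in brackets using $& (the whole match)
SELECT  dbo.RegexReplace('\d+', '[$&]', 'rooms 12 and 14', 1, 1)
--$$ stands for one literal dollar sign, followed here by group 1
SELECT  dbo.RegexReplace('(\d+\.\d{2})', '$$$1', 'Total 34.50', 1, 1)
```

/* If the replacement syntax behaves as described, these should give 'Phil Factor', 'rooms [12] and [14]' and 'Total $34.50' respectively. */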
/*

The OLE Regex Find (Execute) function
-----------------------------

This is the most powerful function for doing complex finding and replacing of text. As it passes back detailed records of the hits, including the location and the backreferences, it allows for complex manipulations.

This is written as a table function. The Regex Routine actually passes back a collection for each 'hit'. In the relational world, you'd normally represent this in two tables, so we've returned a left outer join of the two logical tables so as to pass back all the information. This seems to cater for all the uses we can think of. We also append an error column, which should be blank!

*/

IF OBJECT_ID(N'dbo.RegexFind') IS NOT NULL
    DROP FUNCTION dbo.RegexFind
GO
CREATE FUNCTION dbo.RegexFind
    (
      @pattern VARCHAR(255),
      @matchstring VARCHAR(MAX),
      @global BIT = 1,
      @Multiline BIT = 1
    )
RETURNS @result TABLE -- columns returned by the function
    (
      Match_ID INT,
      FirstIndex INT,
      length INT,
      Value VARCHAR(2000),
      Submatch_ID INT,
      SubmatchValue VARCHAR(2000),
      Error VARCHAR(255)
    )
AS
   BEGIN
    DECLARE @objRegexExp INT,
        @objErrorObject INT,
        @objMatch INT,
        @objSubMatches INT,
        @strErrorMessage VARCHAR(255),
        @error VARCHAR(255),
        @Substituted VARCHAR(8000),
        @hr INT,
        @matchcount INT,
        @SubmatchCount INT,
        @ii INT,
        @jj INT,
        @FirstIndex INT,
        @length INT,
        @Value VARCHAR(2000),
        @SubmatchValue VARCHAR(2000),
        @objSubmatchValue INT,
        @command VARCHAR(8000),
        @Match_ID INT

    DECLARE @match TABLE
        (
          Match_ID INT IDENTITY(1, 1) NOT NULL,
          FirstIndex INT NOT NULL,
          length INT NOT NULL,
          Value VARCHAR(2000)
        )
    DECLARE @Submatch TABLE
        (
          Submatch_ID INT IDENTITY(1, 1),
          match_ID INT NOT NULL,
          SubmatchNo INT NOT NULL,
          SubmatchValue VARCHAR(2000)
        )

    SELECT  @strErrorMessage = 'creating a regex object', @error = ''
    EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
    IF @hr = 0
        SELECT  @strErrorMessage = 'Setting the Regex pattern',
                @objErrorObject = @objRegexExp
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'Pattern', @pattern
    IF @hr = 0
        SELECT  @strErrorMessage = 'Specifying a case-insensitive match'
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline
    IF @hr = 0
        EXEC @hr = sp_OASetProperty @objRegexExp, 'Global', @global
    IF @hr = 0
        SELECT  @strErrorMessage = 'Doing a match'
    IF @hr = 0
        EXEC @hr = sp_OAMethod @objRegexExp, 'execute', @objMatch OUT,
            @matchstring
    IF @hr = 0
        SELECT  @strErrorMessage = 'Getting the number of matches'
    IF @hr = 0
        EXEC @hr = sp_OAGetProperty @objMatch, 'count', @matchcount OUT
    SELECT  @ii = 0
    WHILE @hr = 0
        AND @ii < @matchcount
        BEGIN
/*The Match object has four read-only properties.
The FirstIndex property indicates the number of characters in the string to the left of the match.
The Length property of the Match object indicates the number of characters in the match.
The Value property returns the text that was matched.*/
            SELECT  @strErrorMessage = 'Getting the FirstIndex property',
                    @command = 'item(' + CAST(@ii AS VARCHAR) + ').FirstIndex'
            IF @hr = 0
                EXEC @hr = sp_OAGetProperty @objMatch, @command,
                    @FirstIndex OUT
            IF @hr = 0
                SELECT  @strErrorMessage = 'Getting the length property',
                        @command = 'item(' + CAST(@ii AS VARCHAR) + ').Length'
            IF @hr = 0
                EXEC @hr = sp_OAGetProperty @objMatch, @command, @length OUT
            IF @hr = 0
                SELECT  @strErrorMessage = 'Getting the value property',
                        @command = 'item(' + CAST(@ii AS VARCHAR) + ').Value'
            IF @hr = 0
                EXEC @hr = sp_OAGetProperty @objMatch, @command, @Value OUT
            --FirstIndex is zero-based in the Match object; store it one-based for SUBSTRING
            INSERT  INTO @match
                    (
                      FirstIndex,
                      [length],
                      [Value]
                    )
                    SELECT  @FirstIndex + 1,
                            @length,
                            @Value
            SELECT  @Match_ID = @@IDENTITY
/*The SubMatches property of the Match object is a collection of strings. It will only hold values if your regular expression has capturing groups. The collection will hold one string for each capturing group. The Count property (returned as SubmatchCount) indicates the number of strings in the collection. The Item property takes an index parameter, and returns the text matched by the capturing group.
*/
            IF @hr = 0
                SELECT  @strErrorMessage = 'Getting the SubMatches collection',
                        @command = 'item(' + CAST(@ii AS VARCHAR)
                        + ').SubMatches'
            --the collection must be fetched before its count and items are read
            IF @hr = 0
                EXEC @hr = sp_OAGetProperty @objMatch, @command,
                    @objSubMatches OUT
            IF @hr = 0
                SELECT  @strErrorMessage = 'Getting the number of submatches'
            IF @hr = 0
                EXEC @hr = sp_OAGetProperty @objSubMatches, 'count',
                    @SubmatchCount OUT
            SELECT  @jj = 0
            WHILE @hr = 0
                AND @jj < @SubmatchCount
                BEGIN
                    IF @hr = 0
                        SELECT  @strErrorMessage = 'Getting the submatch value property',
                                @command = 'item(' + CAST(@jj AS VARCHAR)
                                + ')', @SubmatchValue = NULL
                    IF @hr = 0
                        EXEC @hr = sp_OAGetProperty @objSubMatches, @command,
                            @SubmatchValue OUT
                    INSERT  INTO @Submatch
                            (
                              match_ID,
                              SubmatchNo,
                              SubmatchValue
                            )
                            SELECT  @Match_ID,
                                    @jj + 1,
                                    @SubmatchValue
                    SELECT  @jj = @jj + 1
                END
            SELECT  @ii = @ii + 1
        END
    IF @hr <> 0
        BEGIN
            DECLARE @Source VARCHAR(255),
                @Description VARCHAR(255),
                @Helpfile VARCHAR(255),
                @HelpID INT

            EXECUTE sp_OAGetErrorInfo @objErrorObject, @Source OUTPUT,
                @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
            SELECT  @error = 'Error whilst '
                    + COALESCE(@strErrorMessage, 'doing something') + ', '
                    + COALESCE(@Description, '')
        END
    EXEC sp_OADestroy @objRegexExp
    EXEC sp_OADestroy @objMatch
    EXEC sp_OADestroy @objSubMatches

    --return the matches left-outer-joined to their submatches
    INSERT  INTO @result
            (
              Match_ID,
              FirstIndex,
              [length],
              [Value],
              Submatch_ID,
              SubmatchValue,
              Error
            )
            SELECT  m.[Match_ID],
                    [FirstIndex],
                    [length],
                    [Value],
                    [SubmatchNo],
                    [SubmatchValue],
                    @error
            FROM    @match m
                    LEFT OUTER JOIN @Submatch s
                    ON m.Match_ID = s.match_ID
    IF @@ROWCOUNT = 0 AND LEN(@error) > 0
        INSERT  INTO @result (Error) SELECT @error
    RETURN
   END
GO

--showing the context where two words 'for' and 'last' are found in proximity
DECLARE  @sample  VARCHAR ( 2000 )
SELECT  @Sample = 'You have failed me for the last time, Admiral.
We have not long to wait for your last gasp'
SELECT  '...' + SUBSTRING ( @Sample , Firstindex - 8 , length + 16 )+ '...' 
    
FROM  dbo.RegexFind  ( '\bfor(?:\W+\w+){0,3}?\W+last\b' ,
           
@sample , 1 , 1 )

--finding repeated words, showing the repetition and the repeated word
SELECT  [repetition] = value, [word] = SubmatchValue
FROM    dbo.RegexFind('\b(\w+)\s+\1\b',
           'this this is is a repeated word word word', 1, 1)

--Split lines based on a regular expression
SELECT  value
FROM    dbo.RegexFind('[^\r\n]*(?:[\r\n]*)',
'
This is the second line
This is the third
and the fourth', 1, 1)
WHERE   length > 0

--break up all words in a string into separate table rows
SELECT  value
FROM    dbo.RegexFind('\b[\w]+\b',
           'Hickory dickory dock, the mouse ran up the clock', 1, 1)

--split text into keywords and values
SELECT  Match_ID,
        [keyword] = MAX(CASE WHEN Submatch_ID = 1 THEN SubmatchValue ELSE '' END),
        [value] = MAX(CASE WHEN Submatch_ID = 2 THEN SubmatchValue ELSE '' END)
FROM    dbo.RegexFind('(\w+)\s*=\s*(.*)\s*',
'firstname=Phil
Lastname=Factor
Salary=$200,000
age=unknown to us
Post=DBA', 1, 1)
GROUP BY Match_ID

--break delimited text into its fields, one row per field
SELECT  *
FROM    dbo.RegexFind('([^\|\r\n]+[\|\r\n]+)',
'1|white gloves|2435|24565
2|Sports Shoes|285678|0987
3|Stumps|2845|987
4|bat|29862|4875', 1, 1)
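
/* The effect of the Global and Multiline properties described earlier can be seen by running one pattern with different settings of the last two parameters. This is a small sketch, assuming dbo.RegexFind has been created as above; the variable @lines and its contents are made up. */

```sql
-- Assumes dbo.RegexFind from this workbench has been created.
DECLARE @lines VARCHAR(100)
SELECT  @lines = 'alpha one' + CHAR(13) + CHAR(10) + 'beta two'
--Global on, Multiline on: the caret matches at the start of each line
SELECT  value FROM dbo.RegexFind('^\w+', @lines, 1, 1)
--Global on, Multiline off: the caret matches only at the very start of the string
SELECT  value FROM dbo.RegexFind('^\w+', @lines, 1, 0)
--Global off: only the first match comes back, whatever Multiline is set to
SELECT  value FROM dbo.RegexFind('^\w+', @lines, 0, 1)
```

/* The first query should return both 'alpha' and 'beta'; the other two should return 'alpha' alone. */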

--get valid dates and convert to SQL Server format
SELECT DISTINCT CONVERT(DATETIME, value, 103)
FROM    dbo.RegexFind('\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b', '
12/2/2006 12:30 <> 13/2/2007
32/3/2007
2-4-2007
25.8.2007
1/1/2005
34/2/2104
2/5/2006', 1, 1)
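
/* One detail of RegexFind worth checking for yourself is that it adds one to the zero-based FirstIndex reported by the Match object, so that the column can be handed straight to SUBSTRING. This is a small sketch, assuming dbo.RegexFind has been created as above; the variable @s and its contents are made up. */

```sql
-- Assumes dbo.RegexFind from this workbench has been created.
DECLARE @s VARCHAR(100)
SELECT  @s = 'Order 66 shipped 3 crates'
--value and via_substring should be identical in every row
SELECT  FirstIndex,
        length,
        value,
        [via_substring] = SUBSTRING(@s, FirstIndex, length)
FROM    dbo.RegexFind('\d+', @s, 1, 1)
```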

/*

Combining two Regexs
--------------------

Once you've experimented with the regex calls we've provided, you'll realise that you can create some really cool functions and procedures that combine regexs. Here we have a procedure that does a 'google-style' search on text to find the words you specify. It returns the 'context' in that it quotes the substring where the match occurred. You can specify how close the words need to be to specify a 'hit'

*/

IF OBJECT_ID(N'dbo.FindWordsInContext') IS NOT NULL
    DROP FUNCTION dbo.FindWordsInContext
GO
CREATE FUNCTION dbo.FindWordsInContext
    (
      @words VARCHAR(255), --list of words you want searched for
      @text VARCHAR(MAX), --the text you want searched
      @proximity INT --the maximum distance in words between specified words
    )
RETURNS @proximityList TABLE
    (
      Hit INT IDENTITY(1, 1),
      context VARCHAR(2000)
    )
AS BEGIN
    DECLARE @Pattern VARCHAR(512)
    --build one regex from the word list: \b before the first word,
    --and a 'gap' of up to @proximity words before each later word
    SELECT  @Pattern = COALESCE(@Pattern + '(?:\W+\w+){0,'
                                + CAST(@proximity AS VARCHAR(5)) + '}?\W+',
                                '\b') + value
    FROM    dbo.RegexFind('\b[\w]+\b', @words, 1, 1)
    --quote each hit with a few characters of context either side
    INSERT  INTO @proximityList ( context )
            SELECT  '...' + SUBSTRING(@text, FirstIndex - 8, length + 16)
                    + '...'
            FROM    dbo.RegexFind(@Pattern + '\b', @text, 1, 1)
    RETURN
   END

GO

SELECT  * FROM  dbo.FindWordsInContext ( 'sadness farewell embark' ,
'Sunset and evening star,
And one clear call for me!
And may there be no moaning of the bar,
When I put out to sea,

But such a tide as moving seems asleep,
Too full for sound and foam,
When that which drew from out the boundless deep
Turns again home. 

Twilight and evening bell,
And after that the dark!
And may there be no sadness of farewell,
When I embark; 

For tho'' from out our bourne of Time and Place
The flood may bear me far,
I hope to see my Pilot face to face
When I have crost the bar. 
'
, 8 )
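
/* To see the regex that FindWordsInContext assembles for the search above, the string-building step can be run on its own. This is a stand-alone sketch that makes no OLE calls: the table variable @wordlist and the loop variables are illustrative stand-ins for the rows that dbo.RegexFind returns inside the function, and an explicit loop is used so that the order of the words is unambiguous. */

```sql
-- Stand-alone sketch: @wordlist, @word, @i and @n are illustrative names,
-- not part of the workbench.
DECLARE @wordlist TABLE (n INT IDENTITY(1, 1), word VARCHAR(50))
INSERT INTO @wordlist (word) SELECT 'sadness'
INSERT INTO @wordlist (word) SELECT 'farewell'
INSERT INTO @wordlist (word) SELECT 'embark'

DECLARE @proximity INT, @pattern VARCHAR(512),
        @word VARCHAR(50), @i INT, @n INT
SELECT  @proximity = 8, @i = 1
SELECT  @n = COUNT(*) FROM @wordlist
WHILE @i <= @n
    BEGIN
        SELECT  @word = word FROM @wordlist WHERE n = @i
        --the first word gets a leading \b; each later word is preceded
        --by a lazy gap of between 0 and @proximity intervening words
        SELECT  @pattern = COALESCE(@pattern + '(?:\W+\w+){0,'
                                    + CAST(@proximity AS VARCHAR(5)) + '}?\W+',
                                    '\b') + @word
        SELECT  @i = @i + 1
    END
SELECT  [pattern] = @pattern + '\b'
--  \bsadness(?:\W+\w+){0,8}?\W+farewell(?:\W+\w+){0,8}?\W+embark\b
```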


/*

OLE Regex performance
-----------------

The OLE VBScript.RegExp is fine for scanning large chunks of text, it is good for complex validation, and it makes a great testbed for regexes. However, these OLE functions are too slow for use in queries that call them once per row. The overhead of making the calls is just too high because the performance of OLE in TSQL is not great. See Zach Nichter's excellent article on the subject, 'Writing to a File Using the sp_OACreate Stored Procedure and OSQL'. Here is an example, scanning a database of nearly 50,000 names of public houses from our XML Jumpstart Workbench.
*/

SELECT  COUNT (*)  FROM publichouses.dbo.publichouses  WHERE  dbo.RegexMatch  ( '\bred\b' , name ) = 1
--5 minutes 28 secs
SELECT  COUNT (*)  FROM  publichouses.dbo.publichouses  WHERE  name  LIKE  '%red %'
--less than 50 ms
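
/* Note that the two queries above are not strictly equivalent: LIKE '%red %' requires a trailing space and does not require a word boundary in front, whereas \bred\b matches 'red' as a whole word anywhere. One way to keep the regex's precision while avoiding most of the OLE calls might be to use a cheap LIKE as a prefilter. This is a sketch we have not timed; it assumes dbo.RegexMatch and the publichouses table used above, and the case sensitivity of the LIKE depends on the column's collation. */

```sql
-- Assumes dbo.RegexMatch and the publichouses table used above.
-- CASE is used rather than AND because the order in which AND'ed
-- predicates are evaluated is not guaranteed.
SELECT  COUNT(*)
FROM    publichouses.dbo.publichouses
WHERE   CASE WHEN name LIKE '%red%'
             THEN dbo.RegexMatch('\bred\b', name)
             ELSE 0
        END = 1
```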

/*You can reduce the overhead to a quarter of what it was by using a function like this and creating the Regex object before you do the call. This means the Regex Object does not get repeatedly created and destroyed on every call.*/

IF OBJECT_ID(N'dbo.OARegexMatch') IS NOT NULL
    DROP FUNCTION dbo.OARegexMatch
GO
CREATE FUNCTION dbo.OARegexMatch /* very simple Function Wrapper around the call */
    (
      @objRegexExp INT, --handle of a RegExp object that the caller has already created
      @matchstring VARCHAR(MAX)
    )
RETURNS INT
AS BEGIN
    DECLARE @objErrorObject INT,
        @hr INT,
        @match BIT
    EXEC @hr = sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring
    IF @hr <> 0
        BEGIN
            RETURN NULL
        END
    RETURN @match
   END
GO
/* and now embed the SQL Query within the life-cycle of the Regex object */

DECLARE @objRegexExp INT,
    @objErrorObject INT,
    @strErrorMessage VARCHAR(255),
    @hr INT,
    @match BIT

SELECT  @strErrorMessage = 'creating a regex object'
EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
IF @hr = 0
    EXEC @hr = sp_OASetProperty @objRegexExp, 'pattern', '\bred\b'
    --Specifying a case-insensitive match
IF @hr = 0
    EXEC @hr = sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
    --Doing a Test
IF @hr = 0
    SELECT  COUNT(*)
    FROM    publichouses.dbo.publichouses
    WHERE   dbo.OARegexMatch(@objRegexExp, name) = 1
IF @hr <> 0
    BEGIN
        DECLARE @Source VARCHAR(255),
            @Description VARCHAR(255),
            @Helpfile VARCHAR(255),
            @HelpID INT

        EXECUTE sp_OAGetErrorInfo @objErrorObject, @Source OUTPUT,
            @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
        SELECT  @strErrorMessage = 'Error whilst '
                + COALESCE(@strErrorMessage, 'doing something') + ', '
                + COALESCE(@Description, '')
        RAISERROR (@strErrorMessage, 16, 1)
    END
EXEC sp_OADestroy @objRegexExp
--1 minute 28 secs

/* it is no consolation for those who are stuck with SQL Server 2000, but the CLR functions are a lot quicker for this sort of usage. */


We've used a range of regex patterns from a number of sources in this workbench. Like a lot of programmers, we collect up snippets we come across, almost always forgetting to record the original author. We therefore apologise in advance for not crediting the source and original author of the regex patterns. As you can guess, they often take a lot of time and effort to develop. If you spot a regex which we should have cited, please add a comment and let us all know who originally wrote it!


Original article: https://www.simple-talk.com/sql/t-sql-programming/tsql-regular-expression-workbench/
