C# regex match


C# Regex.Match Examples: Regular Expressions

This C# tutorial covers the Regex class and Regex.Match. It provides many examples for System.Text.RegularExpressions.

Regex. Patterns are everywhere.

In text, we often discover, and must process, textual patterns. A regular expression describes a text pattern. Methods built on it can find, replace, or split the text that matches.

A class, Regex, handles regular expressions. We specify patterns as string arguments. Methods (like Match and Replace) are available.

Match. This program introduces the Regex class. We use its constructor and the Match method, and then handle the returned Match object.

Namespace: All these types are found in the System.Text.RegularExpressions namespace.

Pattern: The Regex uses a pattern that indicates one or more digits. The characters "55" match this pattern.

Success: The returned Match object has a bool property called Success. If it equals true, we found a match.

Based on: .NET 4.5

C# program that uses Match, Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Regex regex = new Regex(@"\d+");
        Match match = regex.Match("Dot 55 Perls");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output

55

Static method. Here we match parts of a string (a file name in a directory path). We only accept ranges of characters and some punctuation. On Success, we access the group.

Static: We use the Regex.Match static method. It is also possible to call Match upon a Regex object.

Success: We test the result of Match with the Success property. When true, a Match occurred and we can access its Value or Groups.

Groups: Index 0 of this collection holds the entire match, so the first parenthesized group is found at index 1. This is important to remember.


C# program that uses Regex.Match

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // First we see the input string.
        string input = "/content/alternate-1.aspx";

        // Here we call Regex.Match.
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);

        // Here we check the Match instance.
        if (match.Success)
        {
            // Finally, we get the Group value and display it.
            string key = match.Groups[1].Value;
            Console.WriteLine(key);
        }
    }
}

Output

alternate-1

Pattern details

@"              This starts a verbatim string literal.
content/        The group must follow this string.
[A-Za-z0-9\-]+  One or more letters, digits, or hyphens.
(...)           A separate capturing group.
\.aspx          This must come after the group.
$               Matches the end of the string.
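Groups[0] versus Groups[1]: here is a minimal sketch that reuses the pattern above and returns both the whole match and the first capturing group. GroupDemo and Describe are hypothetical names chosen only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: returns the whole match (Groups[0]) and the
// first capturing group (Groups[1]) for the pattern used above.
static class GroupDemo
{
    public static (string whole, string key) Describe(string input)
    {
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return (null, null);
        }
        // Groups[0] always holds the entire matched text (same as match.Value).
        // Groups[1] holds the text captured by the first (...) group.
        return (match.Groups[0].Value, match.Groups[1].Value);
    }
}
```

For "/content/alternate-1.aspx", the whole match is "content/alternate-1.aspx" (the leading slash is not part of the pattern) and the key is "alternate-1".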

NextMatch. More than one match may be found. We can call the NextMatch method to search for a match that comes after the current one in the text. NextMatch can be used in a loop.

Here: We match all the digits in the input string (4 and 5). Two matches occur, so we use NextMatch to get the second one.

Return: NextMatch returns another Match object—it does not modify the current one. We assign a variable to it.

C# program that uses NextMatch

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "4 AND 5";

        // Get first match.
        Match match = Regex.Match(value, @"\d");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }

        // Get second match.
        match = match.NextMatch();
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output

4
5
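Loop: NextMatch fits naturally in a for-loop that stops when Success is false. This is a small sketch of that loop. NextMatchDemo and AllMatches are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NextMatchDemo
{
    // Collects every match of pattern in input by calling NextMatch
    // until a Match with Success == false is returned.
    public static List<string> AllMatches(string input, string pattern)
    {
        var results = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Called with "4 AND 5" and @"\d", this returns the list "4", "5", the same two values the program above prints.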

Preprocess. Sometimes we can preprocess strings before calling Match() on them. This can be faster and clearer, so experiment. I found that using ToLower to normalize characters was a good choice: once the input is lowercase, the pattern needs only lowercase ranges.


C# program that uses ToLower, Match

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // This is the input string.
        string input = "/content/alternate-1.aspx";

        // Here we lowercase our input first.
        input = input.ToLower();

        // The input is now lowercase, so the pattern only needs a-z.
        Match match = Regex.Match(input, @"content/([a-z0-9\-]+)\.aspx$");
        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}

Output

alternate-1

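Static. A Regex instance object is often faster than the static Regex.Match, which must look the pattern up in an internal cache of recently used patterns (Regex.CacheSize, 15 by default) and parse it again if it has been evicted. For performance, we should usually use an instance object. It can be shared throughout an entire project.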


Sometimes: We only need to call Match once in a program's execution. A Regex object does not help here.

Class: Here a static class stores an instance Regex that can be used project-wide. We initialize it inline.


C# program that uses static Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The input string again.
        string input = "/content/alternate-1.aspx";
        // This calls the static method specified.
        Console.WriteLine(RegexUtil.MatchKey(input));
    }
}

static class RegexUtil
{
    static readonly Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");

    /// <summary>
    /// This returns the key that is matched within the input.
    /// </summary>
    public static string MatchKey(string input)
    {
        Match match = _regex.Match(input.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}

Output

alternate-1

Numbers. A common requirement is extracting a number from a string. We can do this with Regex.Match. To get further numbers, consider Matches() or NextMatch.

Digits: We extract a group of digit characters and access the Value string representation of that number.

Parse: To parse the number, use int.Parse or int.TryParse on the Value here. This will convert it to an int.


C# program that matches numbers

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // ... Input string.
        string input = "Dot Net 100 Perls";
        // ... One or more digits.
        Match m = Regex.Match(input, @"\d+");
        // ... Write value.
        Console.WriteLine(m.Value);
    }
}

Output

100
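Parse: this sketch combines the match above with int.TryParse, so that inputs without digits, or with numbers too large for an int, do not throw. NumberDemo and FirstNumber are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class NumberDemo
{
    // Finds the first run of digits and converts it to an int.
    // Returns null when there are no digits or the value does not fit in an int.
    public static int? FirstNumber(string input)
    {
        Match m = Regex.Match(input, @"\d+");
        if (m.Success && int.TryParse(m.Value, out int number))
        {
            return number;
        }
        return null;
    }
}
```

Note that \d also matches non-ASCII Unicode digits in .NET; a pattern like [0-9]+ is stricter if only ASCII digits should count.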

Value, length, index. A Match object, returned by Regex.Match, has a Value, Length and Index. These describe the matched text (a substring of the input).

Value: This is the matched text, represented as a separate string. This is a substring of the original input.

Length: This is the length of the Value string. Here, the Length of "Axxxxy" is 6.

Index: The index where the matched text begins within the input string. The character "A" starts at index 4 here.

C# that shows value, length, index

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Match m = Regex.Match("123 Axxxxy", @"A.*y");
        if (m.Success)
        {
            Console.WriteLine("Value = " + m.Value);
            Console.WriteLine("Length = " + m.Length);
            Console.WriteLine("Index = " + m.Index);
        }
    }
}

Output

Value = Axxxxy
Length = 6
Index = 4
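Substring: Index and Length describe exactly where Value sits in the input. This short sketch checks that relationship with Substring. PositionDemo and ValueMatchesSlice are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class PositionDemo
{
    // Shows that Value is the slice of the input described by Index and Length.
    public static bool ValueMatchesSlice(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success && input.Substring(m.Index, m.Length) == m.Value;
    }
}
```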

IsMatch. This method tests for a matching pattern. It does not return a Match object or capture groups; it only reports whether the pattern occurs somewhere in the input string.

Bool: IsMatch returns a bool value. Every overload receives an input string that is searched for a match.


Internals: The static Regex.IsMatch method looks the pattern up in an internal cache and, on a miss, builds a Regex from it in the same way as an instance Regex.

And: The parsed pattern stays in that cache (Regex.CacheSize, 15 patterns by default), so repeated static calls with the same pattern usually skip the parsing step.

C# that uses Regex.IsMatch method

using System;
using System.Text.RegularExpressions;

class Program
{
    /// <summary>
    /// Test string using Regex.IsMatch static method.
    /// </summary>
    static bool IsValid(string value)
    {
        return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
    }

    static void Main()
    {
        // Test the strings with the IsValid method.
        Console.WriteLine(IsValid("TheDeveloperBlog0123"));
        Console.WriteLine(IsValid("DotNetPerls"));
        Console.WriteLine(IsValid(":-)"));
        // Console.WriteLine(IsValid(null)); // Throws an exception
    }
}

Output

True
True
False

Matches. Sometimes one match is not enough. Here we use Matches instead of Match: it returns multiple Match objects at once. These are returned in a MatchCollection.
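Here is a minimal sketch of Matches: it iterates the MatchCollection with foreach and collects each Value. MatchesDemo and Words are hypothetical names, and \w+ is just an example pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchesDemo
{
    // Regex.Matches returns a MatchCollection; each element is a Match.
    public static List<string> Words(string input)
    {
        var words = new List<string>();
        MatchCollection matches = Regex.Matches(input, @"\w+");
        foreach (Match m in matches)
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```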


Replace. Sometimes we need to replace a pattern of text with some other text. Regex.Replace helps. We can replace patterns with a string, or with a value determined by a MatchEvaluator.

Replace: We use the Replace method, with strings and MatchEvaluators, to replace text. We replace spaces, numbers and position-based parts.
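This sketch shows both forms of Regex.Replace: one with a fixed replacement string and one with a MatchEvaluator, written here as a lambda that computes each replacement from the Match. ReplaceDemo, CollapseSpaces and DoubleNumbers are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Replace with a fixed string: collapse runs of whitespace to one space.
    public static string CollapseSpaces(string input) =>
        Regex.Replace(input, @"\s+", " ");

    // Replace with a MatchEvaluator: double every number found in the input.
    // Assumes the numbers are ASCII digits that fit in an int.
    public static string DoubleNumbers(string input) =>
        Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());
}
```

For example, DoubleNumbers("2 apples and 10 pears") returns "4 apples and 20 pears".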


Spaces: Whitespace isn't actually white. But it is often not needed for future processing of data.


Split. Do you need to extract substrings that contain only certain characters (certain digits, letters)? Split() returns a string array that will contain the matching substrings.


Numbers: We can handle certain character types, such as numbers, with the Split method. This is powerful. It handles many variations.
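Numbers: one way to pull the numbers out with Regex.Split is to split on runs of non-digits (\D+). This is a sketch under that assumption. SplitDemo and Numbers are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Splits on runs of non-digit characters, keeping only the numbers.
    // Empty strings can appear at the ends when the input starts or ends
    // with a separator, so they are filtered out here.
    public static string[] Numbers(string input)
    {
        string[] parts = Regex.Split(input, @"\D+");
        return Array.FindAll(parts, p => p.Length > 0);
    }
}
```

For "10 cats, 20 dogs and 3 birds", Regex.Split yields "10", "20", "3" and a trailing empty string, which the filter removes.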


Caution: The Split method in Regex is more powerful than the one on the string type. But it may be slower in common cases.


Escape. This method can change a user input to a valid Regex pattern. It assumes no metacharacters were intended. The input string should be only literal characters.

Note: With Escape, we don't get out of jail free, but we do change the representation of certain characters in a string.


Unescape. The term "unescape" means to do the reverse of escape. It returns character representations to a non-escaped form. This method is rarely useful.
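Escape and Unescape: this sketch escapes user text so that characters like "." and "(" are treated literally, and shows that Unescape reverses Escape. EscapeDemo, ContainsLiteral and RoundTrip are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Treats userText as literal characters by escaping metacharacters
    // such as '.', '(' and '*' before using it as a pattern.
    public static bool ContainsLiteral(string input, string userText) =>
        Regex.IsMatch(input, Regex.Escape(userText));

    // Unescape reverses Escape, giving back the original text.
    public static string RoundTrip(string text) =>
        Regex.Unescape(Regex.Escape(text));
}
```

Without Escape, the pattern "1.5" would also match "1x5", because "." matches any character.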


Star. The star is also known as the Kleene closure in language theory. It is important to know the difference between the star and the plus: a star means zero or more, while a plus means one or more.
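Here is a small sketch of that difference. Because \d* may match zero digits, it succeeds with an empty match at index 0 of "abc123", while \d+ must consume at least one digit and finds "123". StarDemo, FirstStar and FirstPlus are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class StarDemo
{
    // \d* can succeed with an empty match before any digit appears.
    public static string FirstStar(string input) => Regex.Match(input, @"\d*").Value;

    // \d+ requires at least one digit.
    public static string FirstPlus(string input) => Regex.Match(input, @"\d+").Value;
}
```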


Word count. With Regex we can count words in strings. We compare this method with Microsoft Word's implementation. We come close to Word's algorithm.
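As a rough sketch, words can be counted as runs of non-whitespace characters (\S+). This is a simple approximation and is not claimed to be the exact rule Microsoft Word uses. WordCountDemo and CountWords are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class WordCountDemo
{
    // Counts runs of non-whitespace characters. Punctuation attached to a
    // word ("Perls,") stays part of that word.
    public static int CountWords(string text) => Regex.Matches(text, @"\S+").Count;
}
```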


Files. We often need to process text files. The Regex type, and its methods, are used for this. But we need to combine a file input type, like StreamReader, with the Regex code.
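This sketch combines the two: a shared Regex is applied to each line read from a TextReader. StreamReader derives from TextReader, so the same method works on a file. FileDemo and FirstNumberPerLine are hypothetical names, and \d+ is only an example pattern.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class FileDemo
{
    static readonly Regex Pattern = new Regex(@"\d+");

    // Reads lines from any TextReader (for example a StreamReader over a file)
    // and returns the first number found on each line that has one.
    public static List<string> FirstNumberPerLine(TextReader reader)
    {
        var results = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = Pattern.Match(line);
            if (m.Success)
            {
                results.Add(m.Value);
            }
        }
        return results;
    }
}
```

With a file, the call could look like using (var reader = new StreamReader("input.txt")) { ... }, where "input.txt" is a placeholder path.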


HTML. Regex can be used to process or extract parts of HTML strings. There are problems with this approach. But it works in many situations.

Title, P: We focus on title and P elements. These are common tags in HTML pages.


Remove HTML: We also remove all HTML tags. Please be cautious with this article. It does not work on many HTML pages.
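A common, rough approach replaces every lazy "<.*?>" match with an empty string. The sketch below shows it, along with the reason for caution: it is not an HTML parser. HtmlDemo and StripTags are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class HtmlDemo
{
    // Removes anything that looks like a tag: '<', then as few characters as
    // possible, then '>'. Because '.' does not match '\n' by default, tags that
    // span lines survive, and a '>' inside an attribute value breaks the match.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<.*?>", string.Empty);
}
```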


RegexOptions. With the Regex type, the RegexOptions enum is used to modify method behavior. Often I find the IgnoreCase value helpful.

IgnoreCase: Lowercase and uppercase letters are distinct in the Regex text language. IgnoreCase changes this.


Multiline: With RegexOptions.Multiline, ^ and $ match at the start and end of each line rather than only at the start and end of the whole string. This is often useful.


C# that uses RegexOptions.IgnoreCase

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        const string value = "TEST";
        // ... This ignores the case of the "TE" characters.
        if (Regex.IsMatch(value, "te..", RegexOptions.IgnoreCase))
        {
            Console.WriteLine(true);
        }
    }
}

Output

True
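Multiline: this sketch counts lines that begin with "#". With RegexOptions.Multiline, ^ matches after each newline; without it, ^ matches only at the start of the string. MultilineDemo and CountHeaders are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts positions where a line starts with '#'.
    // Without RegexOptions.Multiline, ^ only matches at index 0.
    public static int CountHeaders(string text, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(text, "^#", options).Count;
    }
}
```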

Is Regex fast? This question is a topic of great worldwide concern. Sadly Regex often results in slower code than imperative loops. But we can optimize Regex usage.

1. Compile. Passing the RegexOptions.Compiled argument to the Regex constructor can make matching execute faster. This, however, has a startup penalty, because the pattern is compiled to IL when the Regex is created.
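One way to pay that startup cost only once is to store the compiled Regex in a static readonly field and reuse it, as in this sketch. CompiledDemo and CountNumbers are hypothetical names. Whether Compiled actually helps depends on how often the pattern runs, so measure before relying on it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CompiledDemo
{
    // Built once, reused many times: the one-time cost of
    // RegexOptions.Compiled is paid when this field is initialized.
    static readonly Regex Digits = new Regex(@"\d+", RegexOptions.Compiled);

    // Counts every run of digits across all lines.
    public static int CountNumbers(IEnumerable<string> lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            count += Digits.Matches(line).Count;
        }
        return count;
    }
}
```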


2. Replace with loop. Some Regex method calls can be replaced with a loop. The loop is much faster.
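For example, the IsValid check from the IsMatch program, Regex.IsMatch(value, @"^[a-zA-Z0-9]*$"), can be written as a char loop. This is a sketch; LoopDemo is a hypothetical name. One subtle difference: $ also matches before a final newline, so the regex accepts "abc\n" while this loop rejects it.

```csharp
static class LoopDemo
{
    // Loop version of Regex.IsMatch(value, @"^[a-zA-Z0-9]*$"):
    // every char must be an ASCII letter or digit. Empty input is valid.
    public static bool IsValid(string value)
    {
        foreach (char c in value)
        {
            bool ok = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
            if (!ok)
            {
                return false;
            }
        }
        return true;
    }
}
```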


3. Use static fields. You can cache a Regex instance as a static field—an example is provided above.

Research. A regular expression can describe any "regular" language. These are the languages that a machine with a finite number of states can recognize; the language itself may still contain infinitely many strings.

Caution: Some languages, like HTML, are not regular languages. This means you cannot fully parse them with traditional regular expressions.

Automaton: A regular expression is based on finite state machines. These automata encode states and possible transitions to new states.

Operators. Regular expressions draw on compiler theory. Like a compiler, a regex engine translates a pattern written in a small language into a tiny program that processes text.

These expressions are commonly used to describe patterns. Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator.

Compilers: Principles, Techniques and Tools

A summary. Regular expressions are a concise way to process text data. This comes at a cost. For performance, we can rewrite Regex calls with low-level char methods.

Representations. Regex is a high-level representation of the same logic expressed with loops and char arrays. This logic is represented in a simple, clear way.



© 2021 - TheDeveloperBlog.com

Source: https://thedeveloperblog.com/regex-match

Regex.Match Method

Definition


Searches an input string for a substring that matches a regular expression pattern and returns the first occurrence as a single Match object.

Overloads

Match(String, String, RegexOptions, TimeSpan)

Searches the input string for the first occurrence of the specified regular expression, using the specified matching options and time-out interval.

Match(String, Int32, Int32)

Searches the input string for the first occurrence of a regular expression, beginning at the specified starting position and searching only the specified number of characters.

Match(String, String, RegexOptions)

Searches the input string for the first occurrence of the specified regular expression, using the specified matching options.

Match(String, Int32)

Searches the input string for the first occurrence of a regular expression, beginning at the specified starting position in the string.

Match(String)

Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.

Match(String, String)

Searches the specified input string for the first occurrence of the specified regular expression.

Match(String, String, RegexOptions, TimeSpan)

Searches the input string for the first occurrence of the specified regular expression, using the specified matching options and time-out interval.

Parameters

input
String

The string to search for a match.

pattern
String

The regular expression pattern to match.

options
RegexOptions

A bitwise combination of the enumeration values that provide options for matching.

Returns

Match

An object that contains information about the match.

Exceptions

Remarks

The Match(String, String, RegexOptions, TimeSpan) method returns the first substring that matches a regular expression pattern in an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

The static Match(String, String, RegexOptions, TimeSpan) method is equivalent to constructing a Regex object with the Regex(String, RegexOptions, TimeSpan) constructor and calling the instance Match(String) method.

The pattern parameter consists of regular expression language elements that symbolically describe the string to match. For more information about regular expressions, see .NET Framework Regular Expressions and Regular Expression Language - Quick Reference.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.

This method returns the first substring found in input that matches the regular expression pattern. You can retrieve subsequent matches by repeatedly calling the returned Match object's NextMatch method. You can also retrieve all matches in a single method call by calling the Regex.Matches(String, String, RegexOptions) method.

The matchTimeout parameter specifies how long a pattern matching method should try to find a match before it times out. Setting a time-out interval prevents regular expressions that rely on excessive backtracking from appearing to stop responding when they process input that contains near matches. For more information, see Best Practices for Regular Expressions and Backtracking. If no match is found in that time interval, the method throws a RegexMatchTimeoutException exception. matchTimeout overrides any default time-out value defined for the application domain in which the method executes.

Notes to Callers

We recommend that you set the matchTimeout parameter to an appropriate value, such as two seconds. If you disable time-outs by specifying InfiniteMatchTimeout, the regular expression engine offers slightly better performance. However, you should disable time-outs only under the following conditions:

  • When the input processed by a regular expression is derived from a known and trusted source or consists of static text. This excludes text that has been dynamically input by users.

  • When the regular expression pattern has been thoroughly tested to ensure that it efficiently handles matches, non-matches, and near matches.

  • When the regular expression pattern contains no language elements that are known to cause excessive backtracking when processing a near match.
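This sketch calls the Match(String, String, RegexOptions, TimeSpan) overload with a two-second interval and treats a RegexMatchTimeoutException like a failed match. TimeoutDemo and FirstMatchOrNull are hypothetical names; whether a timeout should be swallowed or reported is an application decision.

```csharp
using System;
using System.Text.RegularExpressions;

static class TimeoutDemo
{
    // Uses the static overload that accepts a time-out interval.
    // Returns null when there is no match or when matching times out.
    public static string FirstMatchOrNull(string input, string pattern)
    {
        try
        {
            Match m = Regex.Match(input, pattern, RegexOptions.None, TimeSpan.FromSeconds(2));
            return m.Success ? m.Value : null;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```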


Match(String, Int32, Int32)

Searches the input string for the first occurrence of a regular expression, beginning at the specified starting position and searching only the specified number of characters.

Parameters

input
String

The string to search for a match.

beginning
Int32

The zero-based character position in the input string that defines the leftmost position to be searched.

length
Int32

The number of characters in the substring to include in the search.

Returns

Match

An object that contains information about the match.

Exceptions

ArgumentOutOfRangeException

beginning is less than zero or greater than the length of input.

-or-

length is less than zero or greater than the length of input.

-or-

beginning + length - 1 identifies a position that is outside the range of input.

Remarks

The Match(String, Int32, Int32) method returns the first substring that matches a regular expression pattern in a portion of an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

The regular expression pattern for which the Match(String, Int32, Int32) method searches is defined by the call to one of the Regex class constructors. For more information about the elements that can form a regular expression pattern, see Regular Expression Language - Quick Reference.

The Match(String, Int32, Int32) method searches the portion of input defined by the beginning and length parameters for the regular expression pattern. beginning always defines the index of the leftmost character to include in the search, and length defines the maximum number of characters to search. Together, they define the range of the search. If the search proceeds from left to right (the default), the regular expression engine searches from the character at index beginning to the character at index beginning + length - 1. If the regular expression engine was instantiated by using the RegexOptions.RightToLeft option so that the search proceeds from right to left, the regular expression engine searches from the character at index beginning + length - 1 to the character at index beginning. This method returns the first match that it finds within this range. You can retrieve subsequent matches by repeatedly calling the returned Match object's Match.NextMatch method.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.
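The following sketch illustrates how beginning and length bound the search: in "12 345 6789", a search starting at index 3 with length 2 sees only "34", so the digit run is cut short. RangeDemo and FirstInRange are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class RangeDemo
{
    static readonly Regex Digits = new Regex(@"\d+");

    // Searches only the characters from index beginning
    // through index beginning + length - 1.
    public static string FirstInRange(string input, int beginning, int length)
    {
        Match m = Digits.Match(input, beginning, length);
        return m.Success ? m.Value : null;
    }
}
```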

The RegexMatchTimeoutException exception is thrown if the execution time of the matching operation exceeds the time-out interval specified by the Regex.Regex(String, RegexOptions, TimeSpan) constructor. If you do not set a time-out value when you call the constructor, the exception is thrown if the operation exceeds any time-out value established for the application domain in which the Regex object is created. If no time-out is defined in the Regex constructor call or in the application domain's properties, or if the time-out value is Regex.InfiniteMatchTimeout, no exception is thrown.


Match(String, String, RegexOptions)

Searches the input string for the first occurrence of the specified regular expression, using the specified matching options.

Parameters

input
String

The string to search for a match.

pattern
String

The regular expression pattern to match.

options
RegexOptions

A bitwise combination of the enumeration values that provide options for matching.

Returns

Match

An object that contains information about the match.

Exceptions

Examples

The following example defines a regular expression that matches words beginning with the letter "a". It uses the RegexOptions.IgnoreCase option to ensure that the regular expression locates words beginning with both an uppercase "a" and a lowercase "a".

The regular expression pattern \ba\w*\b is interpreted as shown in the following table.

Pattern   Description
\b        Begin the match at a word boundary.
a         Match the character "a".
\w*       Match zero, one, or more word characters.
\b        End the match at a word boundary.
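A short sketch of that pattern with RegexOptions.IgnoreCase follows. It is an illustration, not Microsoft's original example code, and WordsStartingWithA is a hypothetical name. With IgnoreCase, "An" at the start of the sentence matches; without it, the sentence below contains no word beginning with a lowercase "a".

```csharp
using System.Text.RegularExpressions;

static class WordsStartingWithA
{
    // Returns the first word that begins with "a" or "A", or null.
    public static string First(string input)
    {
        Match m = Regex.Match(input, @"\ba\w*\b", RegexOptions.IgnoreCase);
        return m.Success ? m.Value : null;
    }
}
```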

Remarks

The Match(String, String, RegexOptions) method returns the first substring that matches a regular expression pattern in an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

The static Match(String, String, RegexOptions) method is equivalent to constructing a Regex object with the Regex(String, RegexOptions) constructor and calling the instance Match(String) method.

The pattern parameter consists of regular expression language elements that symbolically describe the string to match. For more information about regular expressions, see .NET Framework Regular Expressions and Regular Expression Language - Quick Reference.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.

This method returns the first substring found in input that matches the regular expression pattern. You can retrieve subsequent matches by repeatedly calling the returned Match object's NextMatch method. You can also retrieve all matches in a single method call by calling the Regex.Matches(String, String, RegexOptions) method.

The RegexMatchTimeoutException exception is thrown if the execution time of the matching operation exceeds the time-out interval specified for the application domain in which the method is called. If no time-out is defined in the application domain's properties, or if the time-out value is Regex.InfiniteMatchTimeout, no exception is thrown.

Notes to Callers

This method times out after an interval that is equal to the default time-out value of the application domain in which it is called. If a time-out value has not been defined for the application domain, the value InfiniteMatchTimeout, which prevents the method from timing out, is used. The recommended static method for retrieving a pattern match is Match(String, String, RegexOptions, TimeSpan), which lets you set the time-out interval.


Match(String, Int32)

Searches the input string for the first occurrence of a regular expression, beginning at the specified starting position in the string.

Parameters

input
String

The string to search for a match.

startat
Int32

The zero-based character position at which to start the search.

Returns

Match

An object that contains information about the match.

Exceptions

Remarks

The Match(String, Int32) method returns the first substring that matches a regular expression pattern, starting at or after the startat character position, in an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

The regular expression pattern for which the Match(String, Int32) method searches is defined by the call to one of the Regex class constructors. For more information about the elements that can form a regular expression pattern, see Regular Expression Language - Quick Reference.

You can optionally specify a starting position in the string by using the startat parameter. When the regular expression engine parses from left to right (the default), the match and the scan move rightward, starting at the character specified in startat. When the regular expression engine parses from right to left (when the regular expression pattern is constructed with the RegexOptions.RightToLeft option), the match and scan move in the opposite direction and begin with the character at startat - 1. If you do not specify a starting position, the search begins at the default position. If the regular expression searches from left to right, the default position is at the left end of input; if it searches from right to left, the default position is at the right end of input.

If you want to restrict a match so that it begins at a particular character position in the string and the regular expression engine does not scan the remainder of the string for a match, anchor the regular expression with a \G (at the left for a left-to-right pattern, or at the right for a right-to-left pattern). This restricts the match so it must start exactly at startat.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.

This method returns the first substring found at or after the startat character position in input that matches the regular expression pattern. You can retrieve subsequent matches by repeatedly calling the returned Match object's Match.NextMatch method. You can also retrieve all matches in a single method call by calling the Regex.Matches(String, Int32) method.
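This sketch compares an unanchored pattern with one that begins with the \G anchor, which requires the match to start exactly at startat. In "ab123", starting at index 1 (the "b"), the unanchored search scans ahead and finds "123", while the \G version fails. StartAtDemo, Anchored and Unanchored are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class StartAtDemo
{
    // \G forces the match to begin exactly at startat.
    static readonly Regex AnchoredDigits = new Regex(@"\G\d+");
    static readonly Regex Digits = new Regex(@"\d+");

    public static string Anchored(string input, int startat)
    {
        Match m = AnchoredDigits.Match(input, startat);
        return m.Success ? m.Value : null;
    }

    public static string Unanchored(string input, int startat)
    {
        Match m = Digits.Match(input, startat);
        return m.Success ? m.Value : null;
    }
}
```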

The RegexMatchTimeoutException exception is thrown if the execution time of the matching operation exceeds the time-out interval specified by the Regex.Regex(String, RegexOptions, TimeSpan) constructor. If you do not set a time-out interval when you call the constructor, the exception is thrown if the operation exceeds any time-out value established for the application domain in which the Regex object is created. If no time-out is defined in the Regex constructor call or in the application domain's properties, or if the time-out value is Regex.InfiniteMatchTimeout, no exception is thrown.


Match(String)

Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.

Parameters

input
String

The string to search for a match.

Returns

Match

An object that contains information about the match.

Exceptions

Examples

The following example finds regular expression pattern matches in a string, then lists the matched groups, captures, and capture positions.

The regular expression pattern (\w+)\s+(car) matches occurrences of the word "car" along with the word that precedes it. It is interpreted as shown in the following table.

Pattern   Description
(\w+)     Match one or more word characters. This is the first capturing group.
\s+       Match one or more white-space characters.
(car)     Match the literal string "car". This is the second capturing group.
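A minimal sketch using this pattern with the instance Match(String) method and NextMatch, collecting the first capturing group (the word before "car") from each match. This is an illustration rather than Microsoft's original example; CarDemo and PrecedingWords are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CarDemo
{
    // Returns the word captured by (\w+) in front of each "car".
    public static List<string> PrecedingWords(string input)
    {
        var words = new List<string>();
        Regex regex = new Regex(@"(\w+)\s+(car)");
        for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}
```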

Remarks

The Match(String) method returns the first substring that matches a regular expression pattern in an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.

This method returns the first substring in input that matches the regular expression pattern. You can retrieve subsequent matches by repeatedly calling the returned Match object's Match.NextMatch method. You can also retrieve all matches in a single method call by calling the Regex.Matches(String) method.

The RegexMatchTimeoutException exception is thrown if the execution time of the matching operation exceeds the time-out interval specified by the Regex.Regex(String, RegexOptions, TimeSpan) constructor. If you do not set a time-out interval when you call the constructor, the exception is thrown if the operation exceeds any time-out value established for the application domain in which the Regex object is created. If no time-out is defined in the Regex constructor call or in the application domain's properties, or if the time-out value is Regex.InfiniteMatchTimeout, no exception is thrown.


Match(String, String)

Searches the specified input string for the first occurrence of the specified regular expression.

Parameters

input
String

The string to search for a match.

pattern
String

The regular expression pattern to match.

Returns

Match

An object that contains information about the match.

Exceptions

Examples

The following example calls the Match(String, String) method to find the first word that contains at least one "z" character, and then calls the Match.NextMatch method to find any additional matches.

The regular expression pattern \b\w*z+\w*\b is interpreted as shown in the following table.

Pattern   Description
\b        Begin the match at a word boundary.
\w*       Match zero, one, or more word characters.
z+        Match one or more occurrences of the "z" character.
\w*       Match zero, one, or more word characters.
\b        End the match at a word boundary.
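Here is a sketch of that approach, using the static Match(String, String) method followed by NextMatch. The input sentence is made up for illustration, and ZDemo and WordsWithZ are hypothetical names. The pattern is case-sensitive, so an uppercase "Z" would not count.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ZDemo
{
    // Returns every word that contains at least one lowercase "z".
    public static List<string> WordsWithZ(string input)
    {
        var words = new List<string>();
        for (Match m = Regex.Match(input, @"\b\w*z+\w*\b"); m.Success; m = m.NextMatch())
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```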

Remarks

The Match(String, String) method returns the first substring that matches a regular expression pattern in an input string. For information about the language elements used to build a regular expression pattern, see Regular Expression Language - Quick Reference.

The static Match(String, String) method is equivalent to constructing a Regex object with the specified regular expression pattern and calling the instance Match(String) method. In this case, the regular expression engine caches the regular expression pattern.

The pattern parameter consists of regular expression language elements that symbolically describe the string to match. For more information about regular expressions, see .NET Framework Regular Expressions and Regular Expression Language - Quick Reference.

You can determine whether the regular expression pattern has been found in the input string by checking the value of the returned Match object's Success property. If a match is found, the returned Match object's Value property contains the substring from input that matches the regular expression pattern. If no match is found, its value is String.Empty.

This method returns the first substring in input that matches the regular expression pattern. You can retrieve subsequent matches by repeatedly calling the returned Match object's Match.NextMatch method. You can also retrieve all matches in a single method call by calling the Regex.Matches(String, String) method.

The RegexMatchTimeoutException exception is thrown if the execution time of the matching operation exceeds the time-out interval specified for the application domain in which the method is called. If no time-out is defined in the application domain's properties, or if the time-out value is Regex.InfiniteMatchTimeout, no exception is thrown.

Notes to Callers

This method times out after an interval that is equal to the default time-out value of the application domain in which it is called. If a time-out value has not been defined for the application domain, the value InfiniteMatchTimeout, which prevents the method from timing out, is used. The recommended static method for retrieving a pattern match is Match(String, String, RegexOptions, TimeSpan), which lets you set the time-out interval.
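A sketch of calling the overload that takes an explicit time-out, so the behavior no longer depends on the application domain default. The helper FirstMatchOrStatus and its status strings are hypothetical; the example inputs match quickly, so they exercise only the normal paths, while the catch block shows where a RegexMatchTimeoutException would be handled.

```csharp
using System;
using System.Text.RegularExpressions;

// Uses the Match(String, String, RegexOptions, TimeSpan) overload so the
// time-out does not depend on the application domain's default.
string FirstMatchOrStatus(string input, string pattern, TimeSpan timeout)
{
    try
    {
        Match m = Regex.Match(input, pattern, RegexOptions.None, timeout);
        return m.Success ? m.Value : "<no match>";
    }
    catch (RegexMatchTimeoutException ex)
    {
        // Thrown when matching runs longer than the given interval.
        return $"<timed out after {ex.MatchTimeout.TotalMilliseconds} ms>";
    }
}

Console.WriteLine(FirstMatchOrStatus("order 123", @"\d+", TimeSpan.FromSeconds(1)));  // 123
Console.WriteLine(FirstMatchOrStatus("no digits", @"\d+", TimeSpan.FromSeconds(1)));  // <no match>
```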


Source: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match

C# Regular Expressions

last modified September 5, 2021

C# Regular Expressions tutorial shows how to parse text in C# using regular expressions.

Regular expressions

Regular expressions are used for text searching and more advanced text manipulation. Regular expressions are built into tools including grep and sed, text editors including vi and emacs, and programming languages including C#, Java, and Python.

C# has a built-in API for working with regular expressions; it is located in the System.Text.RegularExpressions namespace.

A regular expression defines a search pattern for strings. Regex represents an immutable regular expression. It contains methods to match text, replace text, or split text.

Regex examples

The following table shows a couple of regular expression strings.

Regex     Meaning
.         Matches any single character.
?         Matches the preceding element once or not at all.
+         Matches the preceding element once or more times.
*         Matches the preceding element zero or more times.
^         Matches the starting position within the string.
$         Matches the ending position within the string.
|         Alternation operator.
[abc]     Matches a, or b, or c.
[a-c]     Range; matches a, or b, or c.
[^abc]    Negation; matches everything except a, or b, or c.
\s        Matches a white space character.
\w        Matches a word character (a letter, a decimal digit, or a connector such as the underscore).
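A few of the metacharacters from the table can be tried out with Regex.IsMatch. This is a small sketch; the helper Check and the sample words are illustrative only. Each pattern is anchored with ^ and $ so it must match the whole input rather than a substring.

```csharp
using System;
using System.Text.RegularExpressions;

// Helper: true if the pattern matches the input.
bool Check(string pattern, string input) => Regex.IsMatch(input, pattern);

Console.WriteLine(Check("^colou?r$", "color"));         // True: ? allows zero u
Console.WriteLine(Check("^colou?r$", "colour"));        // True: ? allows one u
Console.WriteLine(Check("^[a-c]+$", "abcab"));          // True: only a, b, c
Console.WriteLine(Check("^[^abc]+$", "xyz"));           // True: no a, b, or c
Console.WriteLine(Check("^[^abc]+$", "xaz"));           // False: contains a
Console.WriteLine(Check(@"^\w+\s\w+$", "hello world")); // True: word, space, word
Console.WriteLine(Check("^(cat|dog)$", "bird"));        // False: neither alternative
```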

C# regex isMatch

The IsMatch method indicates whether the regular expression finds a match in the input string.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var words = new List<string>() { "Seven", "even", "Maven", "Amen", "eleven" };

var rx = new Regex(@".even", RegexOptions.Compiled);

foreach (string word in words)
{
    if (rx.IsMatch(word))
    {
        Console.WriteLine($"{word} does match");
    }
    else
    {
        Console.WriteLine($"{word} does not match");
    }
}

In the example, we have five words in a list. We check which words match the regular expression.

var words = new List<string>() { "Seven", "even", "Maven", "Amen", "eleven" };

We have a list of words.

var rx = new Regex(@".even", RegexOptions.Compiled);

We define the regular expression. The RegexOptions.Compiled option specifies that the regular expression is compiled to IL code instead of being interpreted. This yields faster execution but increases startup time. The dot (.) metacharacter stands for any single character in the text.

foreach (string word in words)
{
    if (rx.IsMatch(word))
    {
        Console.WriteLine($"{word} does match");
    }
    else
    {
        Console.WriteLine($"{word} does not match");
    }
}

We go through the list of words. The IsMatch method returns true if the word matches the regular expression.

$ dotnet run
Seven does match
even does not match
Maven does not match
Amen does not match
eleven does match

C# regex Match index

The Match.Success property returns a boolean value indicating whether the match is successful. The NextMatch method returns a new Match object with the results for the next match, starting at the position at which the last match ended.

We can find out the position of the matches in the string with the Index property of the Match object.

Program.cs

using System;
using System.Text.RegularExpressions;

var content = @"Foxes are omnivorous mammals belonging to several genera of the family Canidae. Foxes have a flattened skull, upright triangular ears, a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every continent except Antarctica. By far the most common and widespread species of fox is the red fox.";

var rx = new Regex("fox(es)?", RegexOptions.Compiled | RegexOptions.IgnoreCase);

Match match = rx.Match(content);

while (match.Success)
{
    Console.WriteLine($"{match.Value} at index {match.Index}");
    match = match.NextMatch();
}

In the example, we look for all occurrences of the fox word.

var rx = new Regex("fox(es)?", RegexOptions.Compiled | RegexOptions.IgnoreCase);

We add the (es)? expression to include the plural form of the word. The RegexOptions.IgnoreCase option searches in case-insensitive mode.

Match match = rx.Match(content);

while (match.Success)
{
    Console.WriteLine($"{match.Value} at index {match.Index}");
    match = match.NextMatch();
}

The Value property returns the matched string and the Index property returns its index in the text. We find the next occurrence of a match with the NextMatch method.

$ dotnet run
Foxes at index 0
Foxes at index 80
Foxes at index 194
fox at index 292
fox at index 307

C# regex Matches

The Matches method searches an input string for all occurrences of a regular expression and returns all the matches.

Program.cs

using System;
using System.Text.RegularExpressions;

String content = @"<p>The <code>Regex</code> is a compiled representation of a regular expression.</p>";

var rx = new Regex(@"</?[a-z]+>", RegexOptions.Compiled);
var matches = rx.Matches(content);

foreach (Match match in matches)
{
    Console.WriteLine(match);
}

The example retrieves all HTML tags from a string.

var rx = new Regex(@"</?[a-z]+>", RegexOptions.Compiled);

In the regular expression, we search for both opening and closing tags.

var matches = rx.Matches(content);

The Matches method returns a MatchCollection of the Match objects found by the search. If no matches are found, the method returns an empty collection object.

foreach (Match match in matches)
{
    Console.WriteLine(match);
}

We go through the collection and print all matched strings.

$ dotnet run
<p>
<code>
</code>
</p>

C# regex word boundaries

The \b metacharacter is an anchor which matches at a position that is called a word boundary. It allows us to search for whole words.

Program.cs

using System;
using System.Text.RegularExpressions;

var text = "This island is beautiful";

var rx = new Regex(@"\bis\b", RegexOptions.Compiled);
var matches = rx.Matches(text);

foreach (Match match in matches)
{
    Console.WriteLine($"{match.Value} at {match.Index}");
}

In the example, we look for the is word. We do not want to include the This and the island words.

var rx = new Regex(@"\bis\b", RegexOptions.Compiled);

With two \b metacharacters, we search for the is whole word.

$ dotnet run
is at 12

C# regex implicit word boundaries

The \w is a character class used for a character allowed in a word. For the \w+ regular expression, which denotes a word, the leading and trailing word boundary metacharacters are implicit; i.e. \w+ is equal to \b\w+\b.

Program.cs

using System;
using System.Text.RegularExpressions;

var content = @"Foxes are omnivorous mammals belonging to several genera of the family Canidae. Foxes have a flattened skull, upright triangular ears, a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every continent except Antarctica. By far the most common and widespread species of fox is the red fox.";

var rx = new Regex(@"\w+", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var matches = rx.Matches(content);

Console.WriteLine(matches.Count);

foreach (var match in matches)
{
    Console.WriteLine(match);
}

In the example, we search for all words in the text.

Console.WriteLine(matches.Count);

The Count property returns the number of matches.

C# regex currency symbols

The \p{Sc} regular expression can be used to look for currency symbols.

Program.cs

using System;
using System.Text.RegularExpressions;

Console.OutputEncoding = System.Text.Encoding.UTF8;

string content = @"Currency symbols: ฿ Thailand baht, ₹ Indian rupee, ₾ Georgian lari, $ Dollar, € Euro, ¥ Yen, £ Pound Sterling";

string pattern = @"\p{Sc}";

var rx = new Regex(pattern, RegexOptions.Compiled);
var matches = rx.Matches(content);

foreach (Match match in matches)
{
    Console.WriteLine($"{match.Value} is at {match.Index}");
}

In the example, we look for currency symbols.

string content = @"Currency symbols: ฿ Thailand baht, ₹ Indian rupee, ₾ Georgian lari, $ Dollar, € Euro, ¥ Yen, £ Pound Sterling";

We have a couple of currency symbols in the text.

string pattern = @"\p{Sc}";

We define the regular expression for the currency symbols.

foreach (Match match in matches)
{
    Console.WriteLine($"{match.Value} is at {match.Index}");
}

We find all the symbols and their index.

$ dotnet run
฿ is at 18
₹ is at 35
₾ is at 57
$ is at 74
€ is at 84
¥ is at 92
£ is at 99

C# regex anchors

Anchors match positions of characters inside a given text. In the next example, we check whether a string is located at the beginning of a sentence.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var sentences = new List<string>()
{
    "I am looking for Jane.",
    "Jane was walking along the river.",
    "Kate and Jane are close friends."
};

var rx = new Regex(@"^Jane", RegexOptions.Compiled);

foreach (string sentence in sentences)
{
    if (rx.IsMatch(sentence))
    {
        Console.WriteLine($"{sentence} does match");
    }
    else
    {
        Console.WriteLine($"{sentence} does not match");
    }
}

We have three sentences. The search pattern is ^Jane. The pattern checks if the "Jane" string is located at the beginning of the text. A pattern such as Jane\.$ would look for "Jane" at the end of the sentence.
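To check for "Jane" at the end of a sentence instead, the $ anchor can be used. A minimal sketch, assuming every sentence ends with a period, so the pattern is Jane\.$ (the \. matches a literal dot). The variable name endRx is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var sentences = new List<string>()
{
    "I am looking for Jane.",
    "Jane was walking along the river.",
    "Kate and Jane are close friends."
};

// \. matches a literal dot; $ anchors the match at the end of the input.
var endRx = new Regex(@"Jane\.$");

foreach (string sentence in sentences)
{
    if (endRx.IsMatch(sentence))
    {
        Console.WriteLine($"{sentence} does match");
    }
    else
    {
        Console.WriteLine($"{sentence} does not match");
    }
}
```

Only the first sentence matches, because it is the only one that ends with "Jane.".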

C# regex alternations

The alternation operator | enables us to create a regular expression with several choices.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var users = new List<string>() { "Jane", "Thomas", "Robert", "Lucy", "Beky", "John", "Peter", "Andy" };

var rx = new Regex("Jane|Beky|Robert", RegexOptions.Compiled);

foreach (string user in users)
{
    if (rx.IsMatch(user))
    {
        Console.WriteLine($"{user} does match");
    }
    else
    {
        Console.WriteLine($"{user} does not match");
    }
}

We have eight names in the list.

var rx = new Regex("Jane|Beky|Robert", RegexOptions.Compiled);

This regular expression looks for "Jane", "Beky", or "Robert" strings.
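The pattern above is unanchored, so IsMatch would also accept a longer name such as "Janet". Adding anchors needs care, because the | operator has the lowest precedence in a pattern. A short sketch; the names and the loose/strict variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternation has the lowest precedence: this reads as (^Jane)|(Robert$).
var loose = new Regex("^Jane|Robert$");
// Grouping the alternatives makes both anchors apply to each name.
var strict = new Regex("^(Jane|Robert)$");

Console.WriteLine(loose.IsMatch("Janet"));    // True: ^Jane matches its start
Console.WriteLine(strict.IsMatch("Janet"));   // False
Console.WriteLine(loose.IsMatch("Robert"));   // True
Console.WriteLine(strict.IsMatch("Robert"));  // True
```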

C# regex capturing groups

Round brackets are used to create capturing groups. This allows us to apply a quantifier to the entire group or to restrict alternation to a part of the regular expression.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var sites = new List<string>() { "webcode.me", "zetcode.com", "freebsd.org", "netbsd.org" };

var rx = new Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled);

foreach (var site in sites)
{
    Match match = rx.Match(site);

    if (match.Success)
    {
        Console.WriteLine(match.Value);
        Console.WriteLine(match.Groups[1]);
        Console.WriteLine(match.Groups[2]);
    }

    Console.WriteLine("*****************");
}

In the example, we divide the domain names into two parts by using groups.

var rx = new Regex(@"(\w+)\.(\w+)", RegexOptions.Compiled);

We define two groups with parentheses.

if (match.Success)
{
    Console.WriteLine(match.Value);
    Console.WriteLine(match.Groups[1]);
    Console.WriteLine(match.Groups[2]);
}

The Value property returns the whole matched string; it is equal to match.Groups[0].Value. The groups are accessed via the Groups property.

$ dotnet run
webcode.me
webcode
me
*****************
zetcode.com
zetcode
com
*****************
freebsd.org
freebsd
org
*****************
netbsd.org
netbsd
org
*****************

In the following example, we use groups to work with expressions.

Program.cs

using System;
using System.Text.RegularExpressions;

string[] expressions = { "16 + 11", "12 * 5", "27 / 3", "2 - 8" };

string pattern = @"(\d+)\s+([-+*/])\s+(\d+)";

// Build the Regex once, outside the loop, so it is compiled only one time.
var rx = new Regex(pattern, RegexOptions.Compiled);

foreach (var expression in expressions)
{
    var matches = rx.Matches(expression);

    foreach (Match match in matches)
    {
        int val1 = Int32.Parse(match.Groups[1].Value);
        int val2 = Int32.Parse(match.Groups[3].Value);

        var oper = match.Groups[2].Value;

        string result = oper switch
        {
            "+" => $"{match.Value} = {val1 + val2}",
            "-" => $"{match.Value} = {val1 - val2}",
            "*" => $"{match.Value} = {val1 * val2}",
            "/" => $"{match.Value} = {val1 / val2}",
            _ => "unknown operator"
        };

        Console.WriteLine(result);
    }
}

The example parses four simple mathematical expressions and computes them.

string[] expressions = { "16 + 11", "12 * 5", "27 / 3", "2 - 8" };

We have an array of four expressions.

string pattern = @"(\d+)\s+([-+*/])\s+(\d+)";

In the regex pattern, we have three groups: two groups for the values, one for the operator.

int val1 = Int32.Parse(match.Groups[1].Value); int val2 = Int32.Parse(match.Groups[3].Value);

We get the values and transform them into integers.

var oper = match.Groups[2].Value;

We get the operator.

string result = oper switch
{
    "+" => $"{match.Value} = {val1 + val2}",
    "-" => $"{match.Value} = {val1 - val2}",
    "*" => $"{match.Value} = {val1 * val2}",
    "/" => $"{match.Value} = {val1 / val2}",
    _ => "unknown operator"
};

With the switch expression, we compute the expressions.

$ dotnet run
16 + 11 = 27
12 * 5 = 60
27 / 3 = 9
2 - 8 = -6

C# regex captures

When we use quantifiers, the group can capture zero, one, or more strings in a single match. All the substrings matched by a single capturing group are available from the Group.Captures property. In such a case, the Group object contains information about the last captured substring.

Program.cs

using System;
using System.Text.RegularExpressions;

string text = "Today is a beautiful day. The sun is shining.";
string pattern = @"\b(\w+\s*)+\.";

MatchCollection matches = Regex.Matches(text, pattern);

foreach (Match match in matches)
{
    Console.WriteLine("Matched sentence: {0}", match.Value);

    for (int i = 0; i < match.Groups.Count; i++)
    {
        Console.WriteLine("\tGroup {0}: {1}", i, match.Groups[i].Value);

        int captures = 0;

        foreach (Capture capture in match.Groups[i].Captures)
        {
            Console.WriteLine("\t\tCapture {0}: {1}", captures, capture.Value);
            captures++;
        }
    }
}

In the example, we have two sentences. With a regular expression, we capture all words from a sentence.

string pattern = @"\b(\w+\s*)+\.";

We use the + quantifier for the group. The group then contains all captures: the words of the sentence.

foreach (Capture capture in match.Groups[i].Captures)
{
    Console.WriteLine("\t\tCapture {0}: {1}", captures, capture.Value);
    captures++;
}

We go through the captures of the group and print them to the console.

$ dotnet run
Matched sentence: Today is a beautiful day.
    Group 0: Today is a beautiful day.
        Capture 0: Today is a beautiful day.
    Group 1: day
        Capture 0: Today
        Capture 1: is
        Capture 2: a
        Capture 3: beautiful
        Capture 4: day
Matched sentence: The sun is shining.
    Group 0: The sun is shining.
        Capture 0: The sun is shining.
    Group 1: shining
        Capture 0: The
        Capture 1: sun
        Capture 2: is
        Capture 3: shining

This is the output. Remember that group 0 holds the whole match, so match.Groups[0].Value is equal to match.Value.

C# regex replacing strings

It is possible to replace strings with the Replace method. It returns the modified string.

Program.cs

using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

using var client = new HttpClient();
var content = await client.GetStringAsync("http://webcode.me");

var rx = new Regex(@"<[^>]*>", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var modified = rx.Replace(content, String.Empty);

Console.WriteLine(modified.Trim());

The example reads HTML data of a web page and strips its HTML tags using a regular expression.

using var client = new HttpClient(); var content = await client.GetStringAsync("http://webcode.me");

We create a GET request with HttpClient and retrieve the HTML code.

var rx = new Regex(@"<[^>]*>", RegexOptions.Compiled | RegexOptions.IgnoreCase);

This pattern defines a regular expression that matches HTML tags.

var modified = rx.Replace(content, String.Empty);

We remove all the tags with the Replace method.
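Replace also accepts substitution patterns in the replacement string, where $1, $2, and so on refer to numbered capturing groups. A short sketch; the ToDottedDate helper and the date formats are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Rewrites dates from YYYY-MM-DD to DD.MM.YYYY using numbered group references.
string ToDottedDate(string text) =>
    Regex.Replace(text, @"(\d{4})-(\d{2})-(\d{2})", "$3.$2.$1");

Console.WriteLine(ToDottedDate("Released 2021-09-05, patched 2021-10-12."));
// Released 05.09.2021, patched 12.10.2021.
```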

C# regex splitting text

Text can be split with the Split method.

data.csv

22, 1, 3, 4, 5, 17, 18, 2, 13, 4, 1, 8, 4, 3, 21, 4, 5, 1, 48, 9, 42

We read the data from this file.

Program.cs

using System;
using System.IO;
using System.Text.RegularExpressions;

string content = File.ReadAllText("data.csv");

var rx = new Regex(@",\s*", RegexOptions.Compiled);
var data = rx.Split(content);

Console.WriteLine("[{0}]", string.Join(", ", data));

int sum = 0;

Array.ForEach(data, e =>
{
    // Trim surrounding white space (such as a trailing newline) before parsing.
    string e2 = e.Trim();
    sum += Int32.Parse(e2);
});

Console.WriteLine(sum);

The example reads values from a CSV file and computes their sum. It uses a regular expression to process the data.

string content = File.ReadAllText("data.csv");

In one shot, we read all the data into a string with File.ReadAllText.

var rx = new Regex(@",\s*", RegexOptions.Compiled);

The regular expression is a comma character followed by zero or more white space characters.

var data = rx.Split(content);

The Split method splits an input string into an array of substrings.

int sum = 0;

Array.ForEach(data, e =>
{
    var e2 = e.Trim();
    sum += Int32.Parse(e2);
});

We go through the values, cut off surrounding white space with Trim, and add each parsed value to the sum.

$ dotnet run
[22, 1, 3, 4, 5, 17, 18, 2, 13, 4, 1, 8, 4, 3, 21, 4, 5, 1, 48, 9, 42]
235

This is the output.

C# case-insensitive regular expression

By setting the RegexOptions.IgnoreCase flag, we can have case-insensitive matching.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var words = new List<string>() { "dog", "Dog", "DOG", "Doggy" };

var rx = new Regex(@"\bdog\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

foreach (string word in words)
{
    if (rx.IsMatch(word))
    {
        Console.WriteLine($"{word} does match");
    }
    else
    {
        Console.WriteLine($"{word} does not match");
    }
}

The example performs case-insensitive matching of the regular expression.

var rx = new Regex(@"\bdog\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

Case-insensitive matching is enabled by passing RegexOptions.IgnoreCase as part of the second parameter to the Regex constructor.
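Case-insensitivity can also be switched on inside the pattern itself with the inline (?i) option. A brief sketch comparing it with the RegexOptions.IgnoreCase flag on the same words; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// (?i) enables case-insensitive matching from within the pattern.
var inlineRx = new Regex(@"(?i)\bdog\b");
// The same effect through the options argument.
var flagRx = new Regex(@"\bdog\b", RegexOptions.IgnoreCase);

foreach (string word in new[] { "dog", "Dog", "DOG", "Doggy" })
{
    Console.WriteLine($"{word}: {inlineRx.IsMatch(word)} {flagRx.IsMatch(word)}");
}
// dog: True True
// Dog: True True
// DOG: True True
// Doggy: False False
```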

C# regex subpatterns

Subpatterns are patterns within patterns. Subpatterns are created with () characters.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var words = new List<string>() { "book", "bookshelf", "bookworm", "bookcase", "bookish", "bookkeeper", "booklet", "bookmark" };

var rx = new Regex("^book(worm|mark|keeper)?$", RegexOptions.Compiled);

foreach (string word in words)
{
    if (rx.IsMatch(word))
    {
        Console.WriteLine($"{word} does match");
    }
    else
    {
        Console.WriteLine($"{word} does not match");
    }
}

The example creates a subpattern.

var rx = new Regex("^book(worm|mark|keeper)?$", RegexOptions.Compiled);

The regular expression uses a subpattern. It matches bookworm, bookmark, bookkeeper, and book words.

C# regex word frequency

In the next example, we count the frequency of words in a file.

$ wget https://raw.githubusercontent.com/janbodnar/data/main/the-king-james-bible.txt

We use the King James Bible.

Program.cs

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var fileName = "/home/janbodnar/Documents/the-king-james-bible.txt";
var text = File.ReadAllText(fileName);

var matches = new Regex("[a-z-A-Z']+").Matches(text);
var words = matches.Select(m => m.Value).ToList();

var res = words
    .GroupBy(m => m)
    .OrderByDescending(g => g.Count())
    .Select(x => new { word = x.Key, Count = x.Count() })
    .Take(10);

foreach (var r in res)
{
    Console.WriteLine($"{r.word}: {r.Count}");
}

In the example, we count the frequency of the words from the King James Bible.

var matches = new Regex("[a-z-A-Z']+").Matches(text); var words = matches.Select(m => m.Value).ToList();

We find all the matches with the Matches method. From the match collection, we get all the words into a list.

var res = words
    .GroupBy(m => m)
    .OrderByDescending(g => g.Count())
    .Select(x => new { word = x.Key, Count = x.Count() })
    .Take(10);

The words are grouped and ordered by frequency in descending order. We take the top ten words.

$ dotnet run
the: 62103
and: 38848
of: 34478
to: 13400
And: 12846
that: 12576
in: 12331
shall: 9760
he: 9665
unto: 8942

C# regex email example

In the following example, we create a regex pattern for checking email addresses.

Program.cs

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sample inputs.
var emails = new List<string>() { "luke@example.com", "andy@examplecom", "34234sdfa#2345", "f344@example.com" };

var pattern = @"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}";

var rx = new Regex(pattern, RegexOptions.Compiled);

foreach (string email in emails)
{
    if (rx.IsMatch(email))
    {
        Console.WriteLine($"{email} does match");
    }
    else
    {
        Console.WriteLine($"{email} does not match");
    }
}

This example provides only one possible solution.

var pattern = @"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}";

The email is divided into five parts. The first part is the local part. This is usually a name of a company, individual, or a nickname. The [a-zA-Z0-9._-]+ character set lists all possible characters we can use in the local part. They can be used one or more times.

The second part consists of the literal @ character. The third part is the domain part. It is usually the domain name of the email provider, like Yahoo or Gmail. The [a-zA-Z0-9-]+ is a character set providing all characters that can be used in the domain name. The + quantifier allows one or more of these characters.

The fourth part is the dot character. It is preceded by the escape character (\). This is because the dot character is a metacharacter and has a special meaning. By escaping it, we get a literal dot.

The final part is the top level domain: [a-zA-Z.]{2,18}. Top level domains can have from 2 to 18 characters, such as sk, net, info, travel, cleaning, travelinsurance. The maximum length can be 63 characters, but most domains are shorter than 18 characters today. The character set also includes a dot character. This is because some top level domains have two parts; for example co.uk.
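Because IsMatch succeeds when any substring fits, the email pattern as written also accepts text that merely contains an address. A sketch of anchoring it with ^ and $ so the whole input must match; the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// The unanchored pattern: IsMatch succeeds if any substring fits.
var loose = new Regex(@"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}");
// The same pattern anchored with ^ and $ must match the whole input.
var strict = new Regex(@"^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,18}$");

string input = "not an email: luke@example.com!!";  // hypothetical input
Console.WriteLine(loose.IsMatch(input));               // True
Console.WriteLine(strict.IsMatch(input));              // False
Console.WriteLine(strict.IsMatch("luke@example.com")); // True
```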

In this tutorial, we have worked with regular expressions in C#.


Source: https://zetcode.com/csharp/regex/

C# Regex.Match Examples: Regular Expressions

Use the Regex class and Regex.Match, reviewing features from System.Text.RegularExpressions.

Regex. Programs read in text and often must process it in some way. Often the easiest way to process text is with regular expressions. The Regex class in C# helps here.

Regex details. With methods like Regex.Match, we pass in a pattern, and receive matches based on that pattern. We can optionally create a Regex instance first.

Simple example. This program introduces the Regex class. Regex, and Match, are found in the System.Text.RegularExpressions namespace.

Step 1 We create a Regex. The Regex uses a pattern that indicates one or more digits.

Step 2 Here we invoke the Match method on the Regex. The characters "55" match the pattern specified in step 1.

Step 3 The returned Match object has a bool property called Success. If it equals true, we found a match.

C# program that uses Match, Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Step 1: create new Regex.
        Regex regex = new Regex(@"\d+");

        // Step 2: call Match on Regex instance.
        Match match = regex.Match("a55a");

        // Step 3: test for Success.
        if (match.Success)
        {
            Console.WriteLine("MATCH VALUE: " + match.Value);
        }
    }
}

Output

MATCH VALUE: 55
Complex example. We do not need to create a Regex instance to use Match: we can invoke the static Regex.Match. This example builds up some complexity; we access Groups after testing Success.

Part 1 This is the string we are testing. Notice how it has a file name part inside a directory name and extension.

Part 2 We use the Regex.Match static method. The second argument is the pattern we wish to match with.

Part 3 We test the result of Match with the Success property. When true, a Match occurred and we can access its Value or Groups.


Part 4 We access Groups when Success is true. Groups[0] holds the entire match, so the first capturing group is found at index 1.

C# program that uses Regex.Match

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Part 1: the input string.
        string input = "/content/alternate-1.aspx";

        // Part 2: call Regex.Match.
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);

        // Part 3: check the Match for Success.
        if (match.Success)
        {
            // Part 4: get the Group value and display it.
            string key = match.Groups[1].Value;
            Console.WriteLine(key);
        }
    }
}

Output

alternate-1

Pattern details

@"              This starts a verbatim string literal.
content/        The group must follow this string.
[A-Za-z0-9\-]+  One or more alphanumeric characters or hyphens.
(...)           A separate group.
\.aspx          This must come after the group.
$               Matches the end of the string.
Start, end matching. We can use metacharacters to match the start and end of strings. This is often done when using regular expressions. Use "^" to match the start, and "$" for the end.

IsMatch Instead of returning a Match object like Regex.Match, IsMatch just returns a bool that indicates success.

Also We can use the same start and end-matching characters in Regex.Match; it then returns only a match found at those positions.

C# program that uses IsMatch, start and end

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string test = "xxyy";

        // Match the start of a string.
        if (Regex.IsMatch(test, "^xx"))
        {
            Console.WriteLine("START MATCHES");
        }

        // Match the end of a string.
        if (Regex.IsMatch(test, "yy$"))
        {
            Console.WriteLine("END MATCHES");
        }
    }
}

Output

START MATCHES
END MATCHES

Pattern details

^   Match start of string.
xx  Match 2 x chars.
yy  Match 2 y chars.
$   Match end of string.
NextMatch. More than one match may be found. We can call NextMatch() to search for a match that comes after the current one in the text. NextMatch can be used in a loop.

Step 1 We call Regex.Match. The input contains two matches, but this call to Regex.Match returns only the first one.

Step 2 NextMatch returns another Match object; it does not modify the current one. We assign the result to the same variable.

C# program that uses NextMatch

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "4 AND 5";

        // Step 1: get first match.
        Match match = Regex.Match(value, @"\d");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }

        // Step 2: get second match.
        match = match.NextMatch();
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output

4
5

C# program that uses Replace, replaces with pattern

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Replace each run of one or more digits with a string.
        Regex regex = new Regex(@"\d+");
        string result = regex.Replace("cat 123 456", "bird");
        Console.WriteLine("RESULT: {0}", result);
    }
}

Output

RESULT: cat bird bird

Pattern details

\d+  One or more digit characters.
Greedy matching. By default, quantifiers such as * and + are greedy: they match as many characters as they can. Placing the "?" metacharacter after a quantifier makes it lazy, so it matches as few characters as possible.

Version 1 Use the lazy "?" character to match as few characters before the slash as possible.

Version 2 Use the default greedy regular expression behavior—the result Value is as long as possible.

C# program that uses non-greedy Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string test = "/bird/cat/";

        // Version 1: use lazy (or non-greedy) metacharacter.
        var result1 = Regex.Match(test, "^/.*?/");
        if (result1.Success)
        {
            Console.WriteLine("NON-GREEDY: {0}", result1.Value);
        }

        // Version 2: default Regex.
        var result2 = Regex.Match(test, "^/.*/");
        if (result2.Success)
        {
            Console.WriteLine("GREEDY: {0}", result2.Value);
        }
    }
}

Output

NON-GREEDY: /bird/
GREEDY: /bird/cat/

Pattern details

^    Match start of string.
/    Match forward slash character.
.*   Zero or more characters, as many as possible.
.*?  Zero or more characters, as few as possible.
Static. Often a Regex instance object is faster than the static Regex.Match. For performance, we should usually use an instance object. It can be shared throughout an entire project.


Sometimes We only need to call Match once in a program's execution. A Regex object does not help here.

Class Here a static class stores an instance Regex that can be used project-wide. We initialize it inline.

C# program that uses static Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The input string again.
        string input = "/content/alternate-1.aspx";

        // This calls the static method specified.
        Console.WriteLine(RegexUtil.MatchKey(input));
    }
}

static class RegexUtil
{
    static readonly Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");

    /// <summary>
    /// This returns the key that is matched within the input.
    /// </summary>
    public static string MatchKey(string input)
    {
        Match match = _regex.Match(input.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}

Output

alternate-1
Match, parse numbers. A common requirement is extracting a number from a string. We can do this with Regex.Match. To get further numbers, consider Matches() or NextMatch.

Digits We extract a group of digit characters and access the Value string representation of that number.

Parse To parse the number, use int.Parse or int.TryParse on the Value here. This will convert it to an int.


C# program that matches and parses a number

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "Dot Net 100 Perls";
        Match match = Regex.Match(input, @"\d+");
        if (match.Success && int.TryParse(match.Value, out int number))
        {
            // Show that we have the numbers.
            Console.WriteLine("NUMBERS: {0}, {1}", number, number + 1);
        }
    }
}

Output

NUMBERS: 100, 101
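To get every number rather than only the first, Matches can be combined with int.TryParse. A sketch; the ParseNumbers helper and the input text are illustrative. TryParse skips digit runs that do not fit in an int instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Extracts every run of digits and converts each one to an int.
// int.TryParse skips runs too large to fit in an Int32.
List<int> ParseNumbers(string input)
{
    var numbers = new List<int>();
    foreach (Match match in Regex.Matches(input, @"\d+"))
    {
        if (int.TryParse(match.Value, out int number))
        {
            numbers.Add(number);
        }
    }
    return numbers;
}

Console.WriteLine(string.Join(", ", ParseNumbers("Dot Net 100 Perls 200 and 3")));  // 100, 200, 3
```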
Value, length, index. A Match object, returned by Regex.Match, has a Value, Length and Index. These describe the matched text (a substring of the input).

Value This is the matched text, represented as a separate string. This is a substring of the original input.

Length This is the length of the Value string. Here, the Length of "Axxxxy" is 6.

Index The index where the matched text begins within the input string. The character "A" starts at index 4 here.

C# program that shows value, length, index

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Match m = Regex.Match("123 Axxxxy", @"A.*y");
        if (m.Success)
        {
            Console.WriteLine("Value = " + m.Value);
            Console.WriteLine("Length = " + m.Length);
            Console.WriteLine("Index = " + m.Index);
        }
    }
}

Output

Value = Axxxxy
Length = 6
Index = 4
IsMatch. This method tests for a matching pattern. It does not capture groups from this pattern. It only checks whether the pattern occurs somewhere in the input string.

Bool IsMatch returns a bool value. Both the static and the instance forms receive an input string that is searched for matches.


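Internals When we use the static Regex.IsMatch method, a Regex is built for the pattern in the same way as any instance Regex. The static methods keep recently used patterns in an internal cache (sized by Regex.CacheSize, 15 entries by default), so repeated calls with the same pattern can reuse it.

And When a pattern is evicted from this cache, its Regex is no longer referenced and will be cleaned up by the garbage collector.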

C# program that uses Regex.IsMatch method

using System;
using System.Text.RegularExpressions;

class Program
{
    /// <summary>
    /// Test string using Regex.IsMatch static method.
    /// </summary>
    static bool IsValid(string value)
    {
        return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
    }

    static void Main()
    {
        // Test the strings with the IsValid method.
        Console.WriteLine(IsValid("dotnetperls0123"));
        Console.WriteLine(IsValid("DotNetPerls"));
        Console.WriteLine(IsValid(":-)"));
        // Console.WriteLine(IsValid(null)); // Throws an exception
    }
}

Output

True
True
False
RegexOptions. With the Regex type, the RegexOptions enum is used to modify method behavior. Often I find the IgnoreCase value helpful.

IgnoreCase Lowercase and uppercase letters are distinct in the Regex text language. IgnoreCase changes this.


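Multiline We can change how the Regex type acts upon newlines with the RegexOptions enum. With RegexOptions.Multiline, the ^ and $ anchors match at the start and end of each line, not only at the start and end of the whole string.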


C# program that uses RegexOptions.IgnoreCase

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        const string value = "TEST";
        // ... This ignores the case of the "TE" characters.
        if (Regex.IsMatch(value, "te..", RegexOptions.IgnoreCase))
        {
            Console.WriteLine(true);
        }
    }
}

Output

True
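The effect of RegexOptions.Multiline on the ^ anchor can be seen by counting matches in a string with line breaks. A small sketch; the text and variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "apple pie\nbanana split\napple tart";

// Without Multiline, ^ matches only at the start of the whole string.
int single = Regex.Matches(text, "^apple").Count;
// With Multiline, ^ also matches right after each newline.
int multi = Regex.Matches(text, "^apple", RegexOptions.Multiline).Count;

Console.WriteLine(single);  // 1
Console.WriteLine(multi);   // 2
```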
Benchmark, Regex. Consider the performance of Regex.Match. If we use RegexOptions.Compiled and a cached Regex object, we can get a performance boost.


Version 1 In this version of the code, we call the static Regex.Match method and do not keep a Regex object of our own.

Version 2 Here we access a cached object and call Match() on this instance of the Regex.

Result With a static Regex field and RegexOptions.Compiled, the method completes about twice as fast (tested on .NET 5 for Linux).

Warning A compiled Regex makes a program start up more slowly and may use more memory, so only compile Regexes on hot paths.

C# program that benchmarks Match, RegexOptions.Compiled

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static int Version1()
    {
        string value = "This is a simple 5string5 for Regex.";
        return Regex.Match(value, @"5\w+5").Length;
    }

    static readonly Regex _wordRegex = new Regex(@"5\w+5", RegexOptions.Compiled);

    static int Version2()
    {
        string value = "This is a simple 5string5 for Regex.";
        return _wordRegex.Match(value).Length;
    }

    const int _max = 1000000;

    static void Main()
    {
        // Version 1: use Regex.Match.
        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (Version1() != 8)
            {
                return;
            }
        }
        s1.Stop();

        // Version 2: use Regex.Match, compiled Regex, instance Regex.
        var s2 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (Version2() != 8)
            {
                return;
            }
        }
        s2.Stop();

        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    }
}

Output

265.90 ns    Regex.Match
138.78 ns    instance Regex.Match, Compiled
Benchmark, Regex and loop. Regular expressions can be reimplemented with loops. For example, a loop can make sure that a string only contains a certain range of characters.

Info The string must only contain the characters "a" through "z", lowercase and uppercase, and the ten digits "0" through "9".

Version 1 This method uses Regex.IsMatch to tell whether the string only has the range of characters specified.

Version 2 This uses a for-loop to iterate through the character indexes in the string. It employs a switch on the char.


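Result In .NET 5 for Linux (tested in 2021), the regular expression was about 26 times slower than the loop (265.71 ns versus 10.15 ns per iteration). Regex performance has improved since then, so newer .NET versions may narrow the gap.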

C# program that benchmarks Regex versus loop

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static bool IsValid1(string path)
    {
        return Regex.IsMatch(path, @"^[a-zA-Z0-9]*$");
    }

    static bool IsValid2(string path)
    {
        for (int i = 0; i < path.Length; i++)
        {
            switch (path[i])
            {
                case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
                case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
                case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
                case 'v': case 'w': case 'x': case 'y': case 'z':
                case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
                case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
                case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
                case 'V': case 'W': case 'X': case 'Y': case 'Z':
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    continue;
                default:
                    return false;
            }
        }
        return true;
    }

    const int _max = 1000000;

    static void Main()
    {
        // Version 1: use Regex.
        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (IsValid1("hello") == false || IsValid1("$bye") == true)
            {
                return;
            }
        }
        s1.Stop();

        // Version 2: use for-loop.
        var s2 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (IsValid2("hello") == false || IsValid2("$bye") == true)
            {
                return;
            }
        }
        s2.Stop();

        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    }
}

Output

265.71 ns    Regex.IsMatch
10.15 ns     for, switch
Matches. Sometimes one match is not enough. Regex.Matches returns every match at once, as Match objects in a MatchCollection.
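A minimal sketch of Regex.Matches, assuming a hypothetical input string and a digits pattern (\d+). FindNumbers is an illustrative helper, not part of the .NET API:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Illustrative helper: returns the value of every \d+ match, in order.
    public static string[] FindNumbers(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\d+");
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }

    static void Main()
    {
        // Each Match in the MatchCollection carries its own Index and Value.
        foreach (Match m in Regex.Matches("a1 b22 c333", @"\d+"))
        {
            Console.WriteLine(m.Index + ": " + m.Value);
        }
        // Prints "1: 1", "4: 22" and "8: 333", one per line.
    }
}
```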


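Regex.Split. Do you need to extract substrings that contain only certain characters (such as digits or letters)? Regex.Split() treats the pattern as a delimiter: it returns a string array holding the substrings between the matches. To keep only the runs of digits, for instance, split on \D+ (runs of non-digits).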
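A minimal sketch of that idea, assuming a hypothetical input sentence. ExtractNumbers is an illustrative helper that splits on \D+ and skips the empty strings Regex.Split produces when the input starts or ends with a delimiter:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Illustrative helper: the pattern \D+ is the delimiter, so the pieces
    // left over are runs of digits. Empty pieces appear when the input
    // starts or ends with a non-digit, so they are skipped.
    public static List<int> ExtractNumbers(string input)
    {
        var numbers = new List<int>();
        foreach (string piece in Regex.Split(input, @"\D+"))
        {
            if (piece.Length > 0)
            {
                numbers.Add(int.Parse(piece));
            }
        }
        return numbers;
    }

    static void Main()
    {
        // Hypothetical input.
        foreach (int n in ExtractNumbers("10 cats, 2 dogs and 35 birds"))
        {
            Console.WriteLine(n);
        }
        // Prints 10, 2 and 35, one per line.
    }
}
```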


Escape. Regex.Escape converts user input into a valid Regex pattern that matches the input literally. It escapes a minimal set of metacharacters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space), so it assumes no metacharacters were intended.
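A minimal sketch showing why escaping matters, assuming a hypothetical user input "1+1=2?". ContainsLiteral is an illustrative helper:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Illustrative helper: true if 'literal' occurs in 'input', with every
    // character of 'literal' treated as plain text.
    public static bool ContainsLiteral(string input, string literal)
    {
        string pattern = Regex.Escape(literal);
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        // "1+1=2?" contains the metacharacters + and ?.
        Console.WriteLine(Regex.Escape("1+1=2?"));                 // 1\+1=2\?
        Console.WriteLine(ContainsLiteral("Is 1+1=2?", "1+1=2?")); // True
        // Unescaped, "1+1" means "one or more 1s, then 1", so it fails here.
        Console.WriteLine(Regex.IsMatch("Is 1+1=2?", "1+1=2?"));   // False
    }
}
```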


Word count. With Regex we can count words in strings. Compared with Microsoft Word's implementation, a Regex-based count comes close to Word's result.
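A minimal sketch of a Regex-based word count, assuming that a word is any run of non-whitespace characters (\S+). This is a deliberately simple rule, not Microsoft Word's algorithm, and CountWords is a hypothetical helper:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical rule: a "word" is any run of non-whitespace characters.
    public static int CountWords(string text)
    {
        return Regex.Matches(text, @"\S+").Count;
    }

    static void Main()
    {
        Console.WriteLine(CountWords("Hello, regular  expressions!\nSecond line.")); // 5
    }
}
```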


Files. We often need to process text files. The Regex type and its methods work well here, but we must combine a file input type (like StreamReader) with the Regex code.
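A minimal sketch of that combination, assuming a hypothetical log file named log.txt and a hypothetical rule (lines that start with the word ERROR). The helper takes a TextReader, so the same code accepts a StreamReader for a file or a StringReader for testing:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical rule: the line starts with the whole word "ERROR".
    static readonly Regex ErrorLine = new Regex(@"^ERROR\b");

    // Reads one line at a time and keeps the lines that match.
    public static List<string> FindErrorLines(TextReader reader)
    {
        var result = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (ErrorLine.IsMatch(line))
            {
                result.Add(line);
            }
        }
        return result;
    }

    static void Main(string[] args)
    {
        // "log.txt" is a hypothetical file name; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "log.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }
        using (StreamReader reader = new StreamReader(path))
        {
            foreach (string line in FindErrorLines(reader))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```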


Review, performance. Regex calls are usually slower than well-written, equivalent char-testing for-loops. We speed things up with RegexOptions.Compiled and cached Regex fields.

A summary. Regular expressions are a concise way to process text data. We use Regex.Match, Matches and IsMatch to check a pattern (evaluating its metacharacters) against an input string.

© 2007-2021 Sam Allen.

Source: https://www.dotnetperls.com/regex


Regex Modifiers

Regular Expressions (Regex):

In previous articles, we talked about what Regular Expressions are and how to use them in C# for matching, replacing and so on. At this point, you should already have realized how powerful Regular Expressions are and how they can help you in a lot of situations, but they get even more powerful when you know about the possible modifiers.

When working with Regular Expressions, you can use one or several modifiers to control the behavior of the matching engine. For instance, Regex matching is case-sensitive by default, meaning that "a" is not the same as "A". However, in a lot of situations you want your match to be case-insensitive, so that the character "a" is just a letter, no matter if it's in lowercase or UPPERCASE. Simply supply the RegexOptions.IgnoreCase option when creating the Regex instance and your match will be case-insensitive.

You'll find all the available modifiers in the RegexOptions enumeration. Several of them have equivalents in most programming languages that support Regular Expressions, while others are specific to the .NET framework.

Regex modifiers are usually specified as the second parameter when creating the Regex instance. Because RegexOptions is a flags enumeration, you can specify more than one option by combining them with the bitwise OR operator, the pipe (|) character, for example RegexOptions.IgnoreCase | RegexOptions.Multiline.
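A minimal sketch of combining two options, assuming a hypothetical pattern that finds lines starting with "todo:" in any letter case. TodoLine and CountTodos are illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // RegexOptions is a [Flags] enumeration, so options are combined with '|'.
    // Hypothetical pattern: lines that start with "todo:", in any letter case.
    static readonly Regex TodoLine =
        new Regex(@"^todo:", RegexOptions.IgnoreCase | RegexOptions.Multiline);

    public static int CountTodos(string text)
    {
        return TodoLine.Matches(text).Count;
    }

    static void Main()
    {
        string notes = "TODO: write tests\nsome text\ntodo: fix bug";
        Console.WriteLine(CountTodos(notes)); // 2
    }
}
```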

Now let's run through all the modifiers to give you an idea of how they work and what they can do for you.

RegexOptions.IgnoreCase

This will likely be one of your most used modifiers. As described above, it changes your Regular Expressions from case-sensitive to case-insensitive, which can make a big difference.

Consider a simple Regex designed to match only letters (a-z) and whitespace, anchored with ^ and $. We use it to create two Regex instances, one without the RegexOptions.IgnoreCase modifier and one with it, and then try to match the same test string, which consists of lowercase and UPPERCASE characters and a single space. Probably not surprisingly, only the instance with IgnoreCase matches.
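A minimal sketch of this comparison, assuming the pattern ^[a-z\s]+$ and the test string "Hello World" (both are illustrative choices). IsLettersAndSpaces is a hypothetical helper:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Assumed pattern: only letters a-z and whitespace, anchored with ^ and $.
    const string LettersAndSpaces = @"^[a-z\s]+$";

    public static bool IsLettersAndSpaces(string input, RegexOptions options)
    {
        return new Regex(LettersAndSpaces, options).IsMatch(input);
    }

    static void Main()
    {
        string testString = "Hello World"; // assumed test string: mixed case, one space

        Console.WriteLine("Without IgnoreCase: " + IsLettersAndSpaces(testString, RegexOptions.None));    // False
        Console.WriteLine("With IgnoreCase: " + IsLettersAndSpaces(testString, RegexOptions.IgnoreCase)); // True
    }
}
```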

RegexOptions.Singleline

In Regular Expressions, the dot (.) is basically a catch-all character. By default, however, it doesn't match the newline character (\n), meaning that you can use the dot to match an entire line of letters, numbers, special characters and so on, but the match will end as soon as a linebreak is encountered. If you supply the Singleline modifier, the dot will match linebreaks as well.
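A minimal sketch of the difference, assuming a hypothetical two-line input and the pattern .+ (DotPlus is an illustrative helper):

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Returns the first match of ".+" (one or more of "any character").
    public static string DotPlus(string text, RegexOptions options)
    {
        return Regex.Match(text, ".+", options).Value;
    }

    static void Main()
    {
        string text = "first line\nsecond line"; // hypothetical two-line input

        // Default: '.' does not match '\n', so the match stops after line one.
        Console.WriteLine(DotPlus(text, RegexOptions.None));

        // Singleline: '.' also matches '\n', so the whole string is matched.
        Console.WriteLine(DotPlus(text, RegexOptions.Singleline));
    }
}
```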


RegexOptions.Multiline

As we have talked about in this chapter, Regular Expressions consist of many different characters with special purposes. Another example of this is the two characters ^ and $. We actually used them in the case-sensitivity example above, to match the beginning and end of a string. By supplying the Multiline modifier, however, you can change this behavior from matching the beginning/end of the string to matching the beginning/end of each line. This is very useful when you want to deal individually with the lines matched.

Consider a test string consisting of several lines, matched in two different ways. With a singlelineRegex, we treat the entire test string as one line, even though it contains linebreaks, as discussed above. With a multilineRegex, we treat the test string as multiple lines, each resulting in a match. We can use the Regex.Matches() method to catch each line and work with it; here, we simply output it to the Console.
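A minimal sketch of these two approaches, assuming the pattern ^.*$ and a hypothetical three-line input with \n line breaks. The field names follow the text; CountSingleline and Lines are illustrative helpers:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    const string Pattern = "^.*$";

    // One match for the whole string: '.' crosses the newlines, and
    // ^ and $ still mean the start and end of the entire string.
    static readonly Regex singlelineRegex = new Regex(Pattern, RegexOptions.Singleline);

    // One match per line: ^ and $ also match at line boundaries.
    static readonly Regex multilineRegex = new Regex(Pattern, RegexOptions.Multiline);

    public static int CountSingleline(string text)
    {
        return singlelineRegex.Matches(text).Count;
    }

    public static string[] Lines(string text)
    {
        MatchCollection matches = multilineRegex.Matches(text);
        string[] lines = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            lines[i] = matches[i].Value;
        }
        return lines;
    }

    static void Main()
    {
        // Hypothetical input with '\n' line breaks and no trailing newline
        // (a trailing '\n' would add one extra, empty match in Multiline mode).
        string testString = "alpha\nbeta\ngamma";

        Console.WriteLine(CountSingleline(testString)); // 1
        foreach (string line in Lines(testString))
        {
            Console.WriteLine(line); // alpha, beta, gamma
        }
    }
}
```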

RegexOptions.Compiled

While Regular Expressions are generally pretty fast, they can slow things down if they are very complex and executed many times, e.g. in a loop. For these situations, you may want to use the RegexOptions.Compiled modifier, which makes the framework compile the Regex to IL code instead of interpreting it. This costs extra time when you create the Regex, compared to instantiating it normally, but it usually makes subsequent Regex operations (matches etc.) faster.

More modifiers

The above modifiers are the most interesting ones, but there are a few more, which we'll go through a bit faster:

  • RegexOptions.CultureInvariant: With this modifier, cultural differences in language are ignored; case-insensitive matching uses the casing rules of the invariant culture instead of the current culture. This is mostly relevant when your application combines IgnoreCase with text in languages whose casing rules differ from English.
  • RegexOptions.ECMAScript: Changes the Regex variant used from the .NET-specific version to ECMAScript-compliant behavior. It can only be combined with IgnoreCase, Multiline and Compiled, and should rarely be necessary.
  • RegexOptions.ExplicitCapture: Normally, a set of parentheses in a Regex acts as a capturing group, allowing you to access each captured value through an index. If you specify the ExplicitCapture modifier, only explicitly named or numbered groups, of the form (?<name>...), are captured and stored for later retrieval.
  • RegexOptions.IgnorePatternWhitespace: When this modifier is enabled, unescaped whitespace in the pattern (outside character classes) is ignored, and you can even include comments, prefixed with the hash (#) char.
  • RegexOptions.RightToLeft: Matching starts at the end of the input and moves left, instead of the default left to right.
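A minimal sketch of three of these modifiers, with hypothetical patterns, inputs and helper names (ReformatDate, GroupCount, LastNumber):

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // IgnorePatternWhitespace: unescaped spaces and line breaks in the pattern
    // are ignored, and '#' starts a comment that runs to the end of the line.
    static readonly Regex DatePattern = new Regex(@"
        (?<year>\d{4})  -   # four-digit year
        (?<month>\d{2}) -   # two-digit month
        (?<day>\d{2})       # two-digit day",
        RegexOptions.IgnorePatternWhitespace);

    public static string ReformatDate(string input)
    {
        Match m = DatePattern.Match(input);
        return m.Success
            ? m.Groups["day"].Value + "/" + m.Groups["month"].Value + "/" + m.Groups["year"].Value
            : null;
    }

    // ExplicitCapture: the unnamed (\w+) group does not capture,
    // so only group 0 (the whole match) and "domain" remain.
    public static int GroupCount(string input)
    {
        return Regex.Match(input, @"(\w+)@(?<domain>\w+)", RegexOptions.ExplicitCapture).Groups.Count;
    }

    // RightToLeft: scanning starts at the end of the input,
    // so the first match found is the last number in the string.
    public static string LastNumber(string input)
    {
        return Regex.Match(input, @"\d+", RegexOptions.RightToLeft).Value;
    }

    static void Main()
    {
        Console.WriteLine(ReformatDate("Released 2021-03-15.")); // 15/03/2021
        Console.WriteLine(GroupCount("user@example"));           // 2
        Console.WriteLine(LastNumber("10 20 30"));               // 30
    }
}
```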

Summary

As you can see, there are many important Regex modifiers worth knowing; they let you take full advantage of Regular Expressions and support as many use cases as possible.


Source: https://csharp.net-tutorials.com/regular-expressions-regex/regex-modifiers/

What is Regular Expression in C#?

In C#, a Regular Expression is a pattern used to parse input text and check whether it matches the given pattern. In C#, Regular Expressions are generally termed C# Regex. The .NET Framework provides a regular expression engine that allows pattern matching. Patterns may consist of character literals, operators or constructs.
C# provides a class termed Regex, found in the System.Text.RegularExpressions namespace. This class performs two things:

  • Parsing the input text for the regular expression pattern.
  • Identifying the regular expression pattern in the given text.

Example 1: The example below demonstrates the use of regex for mobile number verification. Suppose you are making a form where you need to verify the mobile number the user entered; you can use a regex for that.


Output:

9925612824 is a valid mobile number.
8238783138 is a valid mobile number.
02812451830 is not a valid mobile number.
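A minimal sketch of such a check that reproduces the output above, assuming a hypothetical rule (exactly 10 digits, first digit not 0). MobilePattern and IsValidMobile are illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical rule: exactly 10 ASCII digits, the first of which is not 0.
    // [0-9] is used instead of \d because \d also matches non-ASCII digits in .NET,
    // and \z is used instead of $ because $ also accepts a trailing "\n".
    static readonly Regex MobilePattern = new Regex(@"^[1-9][0-9]{9}\z");

    public static bool IsValidMobile(string number)
    {
        return MobilePattern.IsMatch(number);
    }

    static void Main()
    {
        string[] numbers = { "9925612824", "8238783138", "02812451830" };
        foreach (string number in numbers)
        {
            Console.WriteLine(IsValidMobile(number)
                ? number + " is a valid mobile number."
                : number + " is not a valid mobile number.");
        }
    }
}
```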

Example 2: The example below demonstrates the use of regex for e-mail ID verification. Suppose you are making a form where you need to verify the e-mail ID the user entered; you can use a regex for that.


Output:



[email protected] is a valid E-mail address.
parthmaniyargmail.com is not a valid E-mail address.
@gmail.com is not a valid E-mail address.
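A minimal sketch of such a check, assuming a hypothetical, deliberately loose rule and a hypothetical valid address (someone@example.com). EmailPattern and IsValidEmail are illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical, deliberately loose rule: one or more characters that are not
    // '@' or white space, a single '@', then a domain that contains a dot.
    // Real e-mail address syntax (RFC 5322) is far more complex than this.
    static readonly Regex EmailPattern = new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+\z");

    public static bool IsValidEmail(string address)
    {
        return EmailPattern.IsMatch(address);
    }

    static void Main()
    {
        // "someone@example.com" is a hypothetical address.
        string[] addresses = { "someone@example.com", "parthmaniyargmail.com", "@gmail.com" };
        foreach (string address in addresses)
        {
            Console.WriteLine(IsValidEmail(address)
                ? address + " is a valid E-mail address."
                : address + " is not a valid E-mail address.");
        }
    }
}
```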

 

Regex Syntax

Regular expressions are built from a few basic kinds of syntax: quantifiers, special characters, character classes, and grouping and alternatives.

Quantifiers:

Greedy   Lazy     Matches
*        *?       The preceding element zero or more times.
+        +?       The preceding element one or more times.
?        ??       The preceding element zero or one time.
{n}      {n}?     The preceding element exactly n times.
{n,}     {n,}?    The preceding element at least n times.
{n,m}    {n,m}?   The preceding element from n to m times.

A greedy quantifier matches as many times as possible; its lazy counterpart matches as few times as possible.

Example 1:


Output:

Match Value: aaaab

Example 2:


Output:

Match Value: aaab

Example 3:




Output:

Match Value: ab
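A minimal sketch of these quantifiers in C#, assuming the hypothetical input "aaaab". First is an illustrative helper that returns the first match's value:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Returns the value of the first match (an empty string if there is none).
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string input = "aaaab"; // hypothetical input

        Console.WriteLine(First(input, "a*b"));    // aaaab: zero or more 'a', then 'b'
        Console.WriteLine(First(input, "a{3}b"));  // aaab: exactly three 'a', then 'b'
        Console.WriteLine(First(input, "a?b"));    // ab: zero or one 'a', then 'b'
        Console.WriteLine(First(input, "a{1,2}")); // aa: greedy, takes as many as allowed
        Console.WriteLine(First(input, "a+?"));    // a: lazy, takes as few as possible
        Console.WriteLine(First(input, "a{2,}"));  // aaaa: at least two
    }
}
```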

Special Characters

Sub-expression   Matches
^                The beginning of the string (with RegexOptions.Multiline, the beginning of each line).
$                The end of the string, or just before a final \n (with RegexOptions.Multiline, the end of each line).
. (dot)          Any single character except \n (newline).
\d               A digit character.
\D               A non-digit character.
\w               A word character, such as a letter, a digit, or an underscore.
\W               A non-word character.
\s               A white-space character.
\S               A non-white-space character.
\n               A newline character.

Example 1:


Output:

Match Value: Shyam

Example 2:


Output:

Match Value: Parth

Example 3:


Output:

Match Value: seat

Example 4:


Output: 



Match Value: 1
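A minimal sketch of several of these special characters, assuming hypothetical inputs. First is an illustrative helper that returns the first match's value:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Returns the value of the first match (an empty string if there is none).
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string names = "Shyam and Parth"; // hypothetical input

        Console.WriteLine(First(names, @"^\w+"));              // Shyam: word at the start
        Console.WriteLine(First(names, @"\w+$"));              // Parth: word at the end
        Console.WriteLine(First("a seat by the sea", "s..t")); // seat: '.' is any character except \n
        Console.WriteLine(First("Room 101", @"\d"));           // 1: the first digit
        Console.WriteLine(First("  hi there", @"\S+"));        // hi: first run of non-white-space
    }
}
```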

Character Classes

Sub-expression   Matches
[...]            Any single character listed inside the brackets, e.g. [abc].
[a-z]            Any character in the range a to z.
[^a-z]           Any character not in the range a to z.
\                Escapes a special character so it is matched literally, e.g. \. or \$.

Example 1:


Output:

Match Value: a

Example 2:


Output:

Match Value: x

Example 3:


Output:

Match Value: m
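A minimal sketch of these character classes, assuming hypothetical inputs. First is an illustrative helper that returns the first match's value:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Returns the value of the first match (an empty string if there is none).
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        Console.WriteLine(First("Hello", "[aeiou]"));     // e: any one of the listed characters
        Console.WriteLine(First("ABC def", "[a-z]"));     // d: first lowercase letter (case-sensitive)
        Console.WriteLine(First("abc9x", "[^a-z]"));      // 9: first character outside a-z
        Console.WriteLine(First("Total: $25", @"\$\d+")); // $25: \$ matches a literal dollar sign
    }
}
```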

Grouping and Alternatives

Sub-expression     Matches
( )                Groups a subexpression and captures what it matches.
(a|b)              Either a or b; the | operator separates alternatives.
(?(exp)yes|no)     yes if exp matches at the current position; otherwise no.

Example 1:


Output:

Match Value: cdcd

Example 2:


Output:

Match Value: e
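A minimal sketch of grouping, alternation and a conditional, assuming hypothetical inputs and patterns. First is an illustrative helper that returns the first match's value:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Returns the value of the first match (an empty string if there is none).
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        Console.WriteLine(First("ab cdcd ef", "(cd)+")); // cdcd: the group repeats as a unit
        Console.WriteLine(First("hotdog", "(cat|dog)")); // dog: either alternative may match

        // Conditional: if a digit comes next, match three digits; otherwise match letters.
        const string Conditional = @"(?(\d)\d{3}|[a-z]+)";
        Console.WriteLine(First("123 abc", Conditional)); // 123
        Console.WriteLine(First("abc 123", Conditional)); // abc
    }
}
```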

 




Source: https://www.geeksforgeeks.org/what-is-regular-expression-in-c-sharp/


C# Regex Matches example

It looks like most of the posts here describe what you need. However, you might need more complex behavior, depending on what you're parsing. In your case you may not need more complex parsing; it depends on what information you're extracting.

You can use regex group names as field names in a class, so that each named group fills the field with the same name.

This mechanism uses C# reflection to set values on the class: each group name is matched against a field name in the class instance. Please note that Convert.ChangeType throws an exception when a group's text cannot be converted to the field's type.
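A minimal sketch of this mechanism, assuming a hypothetical Person class whose public field names match the group names, and a hypothetical Fill helper. It is an illustration of the idea, not the answer's own code:

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical target type: its public field names match the regex group names.
class Person
{
    public string Name;
    public int Age;
}

class Program
{
    // Matches 'input' and copies each successful named group into the public
    // field of the same name, converting the text with Convert.ChangeType.
    // Returns null when the input does not match.
    public static T Fill<T>(Regex regex, string input) where T : class, new()
    {
        Match match = regex.Match(input);
        if (!match.Success)
        {
            return null;
        }

        T target = new T();
        foreach (string name in regex.GetGroupNames())
        {
            // Numbered groups such as "0" have no matching field, so GetField returns null.
            FieldInfo field = typeof(T).GetField(name, BindingFlags.Public | BindingFlags.Instance);
            Group group = match.Groups[name];
            if (field != null && group.Success)
            {
                // Throws (e.g. FormatException) if the text cannot be converted.
                object value = Convert.ChangeType(group.Value, field.FieldType, CultureInfo.InvariantCulture);
                field.SetValue(target, value);
            }
        }
        return target;
    }

    static void Main()
    {
        var regex = new Regex(@"^(?<Name>\w+) is (?<Age>\d+)$");
        Person p = Fill<Person>(regex, "Alice is 30");
        Console.WriteLine(p.Name + ", " + p.Age); // Alice, 30
    }
}
```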

If you want to track the line and column of each match, you can add an extra Regex split for lines, but to keep the for loop intact, all match patterns must have named groups (otherwise the column index will be calculated incorrectly).


answered Nov 21 '14 at 11:36 by TarmoPikaro


Source: https://stackoverflow.com/questions/4740984/c-sharp-regex-matches-example/4741010

