INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernel32.dll

What if you are stuck dealing with older technology such as INI files while using the latest and greatest C# and .Net available? This article discusses an alternate way to read INI files and extract the data from those dusty tomes into dictionaries. Once the data resides in the dictionaries we can easily extract it using the power of the indexer on section name followed by key name within the section, such as IniFile["TargetSection"]["TargetKey"], which returns the value of that key in that section of the INI file as a string.

Note that all the code is in one easy code section at the bottom of the article, so don’t feel you have to copy each section’s code.

Overview

If you are reading this, chances are you know what INI files are and don’t need a refresher. You may have looked into using the Win32 Kernel32.dll function GetPrivateProfileSection to achieve your goals. Ack!  “Set the Wayback machine Sherman!” Thanks but no thanks.

Here is how to do this operation using Regular Expressions (kinda a way back machine themselves, but very useful) and Linq to Objects to get the values into a dictionary format so we can write this line of code to access the data within the INI file:

string myValue = IniFile["SectionName"]["KeyName"];

The Pattern

Let me explain the Regex Pattern. If you are not so inclined to understand the semantics of it skip to the next section.

string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
 (?<Section>[^\]]*)         # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
   (?!\[)                    # Stop capture groups if a [ is found; new section
   (?<Key>[^=]*?)            # Any text before the =, matched few as possible
   (?:=)                     # Get the = now
   (?<Value>[^\r\n]*)        # Get everything up to the line break
   (?:[\r\n]{0,4})           # Match but don't capture (MBDC) the \r\n
  )+                        # End Capture groups";

Our goal is to use named match groups. Each match will have its section name in the named group called “Section”, and all of its data, the key and value pairs, in the groups named “Key” and “Value” respectively. The trick to the above pattern is found in line eight, the negative lookahead (?!\[). It stops the match when a new section is hit; otherwise our keys and values would bleed into the next section.
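To see the negative lookahead on its own, here is a minimal sketch. The class and method names are mine and purely illustrative; the only piece borrowed from the pattern above is (?!\[), which fails the match at any position where the next character is an opening bracket.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Returns every line that does NOT start with '['.
    // (?!\[) looks at the next character without consuming it and
    // rejects the position when that character is an opening bracket.
    public static string[] NonSectionLines(string text)
    {
        return Regex.Matches(text, @"^(?!\[)[^\r\n]+", RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Feeding it a small INI fragment should hand back only the key=value lines, which is the same gatekeeping the full pattern relies on to stop one section from swallowing the next.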

The Data

Here is the data for your perusal.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";

We are interested in “Window Name” and “Directory”.

The Linq

Ok, if you thought the regex pattern was complicated, the Linq to Objects has some tricks up its sleeve as well. Our pattern creates a single match per section, with the accompanying key and value data in two separate named capture collections, and that presents a problem. We need to join the capture collections together, but there is no direct way to do that join in Linq because the only link between them is indirect: the index number of each item within its collection.

How do we get the two collections to be joined?

Here is the code:

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
 select new
 {
  Section = m.Groups["Section"].Value,

  kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
     join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
     select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

  } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Explanation:

  • Line 1: Our end goal object is a Dictionary where the key is the Section name and the value is a sub-dictionary with all the keys and values found in that section.
  • Line 2: The regex needs IgnorePatternWhitespace because we have commented the pattern. It needs Multiline because we are spanning multiple lines and need ^ to match at the start of each individual line and not just the beginning of the text.
  • Line 5: This is the easiest item, simply access the named capture group “Section” for the section name.
  • Line 7 (.Captures) : Each one of the keys and values is in the specialized capture collection property off of its named group.
  • Line 7 (.Cast<Capture>) : Since the capture collection is a specialized list and not a true generic list such as List<string>, we call Cast<Capture>() to turn it into an IEnumerable<Capture> so we can access the standard query operators, i.e. the extension methods which are available to IEnumerable<T>. Short answer: so we can call .Select.
  • Line 7 (.Select): Because each list does not have a direct way to associate the data, we are going to create a new object that has a property which will have that index number, along with the target data value. That will allow us join it to the other list.
  • Line 7 (Lambda) : The lambda has two parameters, the first is our actual regex Capture object represented by a. The i is the index value which we need for the join. We then call new and create a new entity with two properties, the first is actual value of the Key found of the Capture class property “Value” and the second is i the index value.
  • Line 8 (Join) : We are going to join the data together using the direct properties of our new entity, but first we need to recreate the magic found in Line 7 for our Values capture collection. It is the same logic as the previous line so I will not delve into its explanation in detail.
  • Line 8 (on cpKey.i equals cpValue.i) : This is our association for the join on the new entities: a key is paired with a value when its index i equals the other index i. This is the keystone of all we are doing.
  • Line 9 (new KeyValuePair) : Ok we are now creating each individual linq projection item of the data as a KeyValuePair object. This could be removed for a new entity, but I choose to use the KeyValuePair class.
  • Line 9 (ToDictionary) : We want to easily access these key value pairs in the future, so we are going to place the Key into a Key of a dictionary and the dictionary key’s value from the actual Value.
  • Line 11 (ToDictionary) : Here is where we take the projection of the previous lines of code and create the end goal dictionary where the key name is the section and the value is the sub dictionary created in Line 9.
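The index join is easier to see without the regex around it. Below is a small sketch on two plain string lists; the class and method names are made up for illustration. The second method uses Enumerable.Zip as a swapped-in alternative, and that assumes .NET 4.0 or later, which the code in this article does not require.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexJoinDemo
{
    // Pairs keys[i] with values[i] by projecting each item together with
    // its index and joining on that index (the trick used on the captures).
    public static Dictionary<string, string> PairByIndex(
        IEnumerable<string> keys, IEnumerable<string> values)
    {
        return (from k in keys.Select((text, i) => new { text, i })
                join v in values.Select((text, i) => new { text, i }) on k.i equals v.i
                select new { Key = k.text, Value = v.text })
               .ToDictionary(p => p.Key, p => p.Value);
    }

    // Alternative, assuming .NET 4.0+: Zip walks both sequences in step,
    // so no explicit index is needed.
    public static Dictionary<string, string> PairByZip(
        IEnumerable<string> keys, IEnumerable<string> values)
    {
        return keys.Zip(values, (k, v) => new { Key = k, Value = v })
                   .ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Both versions lean on the keys and values arriving in the same order, which holds for the capture collections because each Key capture is followed by its Value capture inside the same repeated group.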

Whew…what is the result?

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs
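One thing to keep in mind: the dictionary indexer throws a KeyNotFoundException when a section or key is missing. If that is a concern, a small helper along these lines may be useful. This is only a sketch; GetOrDefault and its fallback parameter are names I made up, not part of the code above.

```csharp
using System.Collections.Generic;

public static class IniLookup
{
    // Hypothetical helper: returns the fallback instead of throwing
    // when either the section or the key is not present.
    public static string GetOrDefault(
        Dictionary<string, Dictionary<string, string>> ini,
        string section, string key, string fallback)
    {
        Dictionary<string, string> keys;
        string value;

        if (ini.TryGetValue(section, out keys) && keys.TryGetValue(key, out value))
            return value;

        return fallback;
    }
}
```

It would be called as IniLookup.GetOrDefault( InIFile, "Logging", "Directory", "" ), leaving the caller to decide what a sensible default is.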

Summary

Thanks to the power of regular expressions and Linq we don’t have to use the old methods to extract and process the data. We can easily access the information using the newer structures. Hope this helps and that you may have learned something new from something old.

Code All in One Place

Here is all the code so you don’t have to copy it from each section above. Don’t forget the using directives for System.Text.RegularExpressions, System.Linq and System.Collections.Generic to do it all.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";
string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
     (?<Section>[^\]]*)     # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
  (?!\[)                    # Stop capture groups if a [ is found; new section
  (?<Key>[^=]*?)            # Any text before the =, matched few as possible
  (?:=)                     # Get the = now
  (?<Value>[^\r\n]*)        # Get everything up to the line break
  (?:[\r\n]{0,4})           # Match but don't capture (MBDC) the \r\n
  )+                        # End Capture groups";

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
    select new
    {
        Section = m.Groups["Section"].Value,

        kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                 join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

    } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs

Are C# .Net Regular Expressions Fast Enough for You?

It is generally accepted that there is an overhead in using regular expression parsing, and there is truth to that statement. But the premise of this article is that the difference is really negligible, and if it’s used as an excuse to not learn regex pattern processing, well, that is just plain foolish. Just like any high level programming construct which gives the developer a quicker development time, the price paid is in the extra cycles it takes to complete. But are regular expressions really as slow as they are perceived to be? Let me show you by example….

The MSDN forums are littered with the vague warning “Don’t use regex, it’s slow”. I have seen that advice given, and yes, it’s based on a truth as mentioned before, but the people giving it never add in the time it takes to subsequently process the information. They forget that in most cases regular expressions already provide the post-processing needs, such as storage and data extraction, built in.

It comes down to…Is it fast enough for you?

If one needs to shave off milliseconds from a multi-million operation, then don’t use regular expressions, or at least do tests first. But for day to day use, I believe it’s always the right answer. With that premise, let us test some code.

Premise

The usual contender for a regular expression is string.Split. Now string.Split is a fast little function and very useful, but one has to then  consider the ancillary processing and I have found a real example culled from the forums.

The Test

A user asked what could be used to parse specific text and whether regular expressions could be used. The example text, changed slightly (the value 41 is used instead of 0), looked like this:

name="rating_count" value="41"

The user was interested in extracting the value 41 as an integer and wondered which approach is better.

The Opponent

Right out of the gate there was an answer saying Regex is slower and gave an example which actually failed. I have modified it to work. The originator had tested zero and didn’t realize they were getting a default value instead of an extracted value because it was only splitting on the ‘=’ character. In my test it is fixed and placed into a static method called Highway:

public static int Highway(string text)
{

string []parts = text.Split( new char[] { ' ', '=', '\x22' }, StringSplitOptions.RemoveEmptyEntries );
int value = 0;
for(int index = 0; index < parts.Length-1; index++)
   if(parts[index].ToLower() == "value") {
      string tempValue = parts[index+1];
      int.TryParse(tempValue, out value);
      break;
    }

return value;

}

Note that \x22 is hex for the double quote character (").

The Contender

Here is what I wrote to do the same job in Regular Expressions which I called MyWay (get it MyWay or the Highway…bwhahahaha…nevermind)

public static int MyWay( string text )
{
int value = 0;

int.TryParse( Regex.Match( text, "(?:value=\x22)([^\x22]+)", RegexOptions.Compiled ).Groups[1].Value, out value );

return value;
}

Now I knew that this would be run multiple times so I told .Net to compile the expression for future uses after the first, but if this is a one off operation one should not do that.
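When the same pattern really is run many times, one common arrangement is to build the Regex once and keep it in a static field. The sketch below is a variation on MyWay under that assumption; RatingParser, ValuePattern and MyWayCached are illustrative names and this version was not part of the timing test that follows.

```csharp
using System.Text.RegularExpressions;

public static class RatingParser
{
    // Built once and reused by every call, so the cost of
    // RegexOptions.Compiled is paid a single time.
    private static readonly Regex ValuePattern =
        new Regex(@"value=\x22(?<Number>[^\x22]+)\x22", RegexOptions.Compiled);

    // Returns the integer inside value="..." or 0 when nothing usable is found.
    public static int MyWayCached(string text)
    {
        Match m = ValuePattern.Match(text);
        if (!m.Success)
            return 0;

        int value;
        if (int.TryParse(m.Groups["Number"].Value, out value))
            return value;

        return 0;
    }
}
```

The pattern is written as a verbatim string here so that \x22 is interpreted by the regex parser rather than by the C# compiler; either way it ends up matching a double quote.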

The Cage

Here is the testing arena for the two operations. I throw away the first run, which does help regex in the long run due to the compilation, but frankly a one off test without the compilation flag is not too shabby. If you try this at home don’t forget the using directive for System.Diagnostics.

string data = string.Format("name={0}rating_count{0} value={0}41{0}", "\x22");
Stopwatch st = new Stopwatch();
int index;
int totalRuns = 100000;

Highway( data ); // Do a test and throw it out

st.Start();
for (index = 0; index < totalRuns; index++)
    Highway( data );
st.Stop();

Console.WriteLine( "Non Regex:\t{0}\tAvg Per Run:\t{1}", st.Elapsed.TotalMilliseconds, st.Elapsed.TotalMilliseconds / totalRuns );

MyWay( data ); // Throw out the first

st.Reset();
st.Start();
for ( index = 0; index < totalRuns; index++ )
    MyWay( data );
st.Stop();

Console.WriteLine( "Regex:\t\t{0}\tAvg Per Run:\t{1}", st.Elapsed.TotalMilliseconds, st.Elapsed.TotalMilliseconds / totalRuns );

Results

So what happens? Well, in Release mode, 100000 runs produce results like this on a dual core machine (total milliseconds values):

Non Regex:      213.9509        Avg Per Run:    0.002139509
Regex:          226.7564        Avg Per Run:    0.002267564

So the difference was not really that great…and though the times for the non regex were usually faster overall, there wasn’t too great of a difference between the two.

So one has to ask, “Is Regex fast enough for you?”

I believe the answer is yes! Note, in fairness, poorly formed regex patterns will slow the parser down, but garbage in, garbage out; so yes, your mileage will vary.

Linq Orderby a Better IComparer in C#

Sometimes IComparer falls short when one has a need to sort on different, for lack of a better term, data columns. Before writing an IComparer implementation for a sort, try using Linq’s orderby.

In the forums the user had data, in string lines, which looked like this

3 months ending 9/30/2007
9 months ending 9/30/2007
3 months ending 9/30/2008
9 months ending 9/30/2008

The user needed the leading month counts sorted first in ascending fashion and the year of each date sorted descending. Because the data was all in a string and needed differing sorts, he was having problems sorting with a custom IComparer class.

I recommended that he use regex to parse out the items and then use Linq to sort. Here is the result. Note I merged all the data into one string where each line is a true line.

string input =
@"3 months ending 9/30/2007
9 months ending 9/30/2007
3 months ending 9/30/2008
9 months ending 9/30/2008";

string pattern = @"(?<Total>\d\d?)(?:[^\d]+)(?<Date>[\d/]+)";

var items =
    from Match m in Regex.Matches( input, pattern )
    select new
    {
        Total = int.Parse( m.Groups["Total"].Value ),  // numeric, so 12 would sort after 9
        Date = DateTime.Parse( m.Groups["Date"].Value ),
        Full = m.Groups[0].Value
    };

var values = from p in items
             orderby p.Total, p.Date.Year descending
             select p;

foreach ( var itm in values )
    Console.WriteLine( itm.Full );

/* Outputs
3 months ending 9/30/2008
3 months ending 9/30/2007
9 months ending 9/30/2008
9 months ending 9/30/2007
             */
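The same sort can be written with the method syntax, which some find easier to follow when there is more than one sort key. This is a sketch under a few assumptions of mine: the lines arrive as a string array, the dates are always in M/d/yyyy form, and PeriodSorter is just an illustrative name.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PeriodSorter
{
    // Sorts by the leading month count ascending, then by year descending.
    public static string[] Sort(string[] lines)
    {
        Regex rx = new Regex(@"^(?<Total>\d+)\D+(?<Date>[\d/]+)$");

        return lines
            .Select(line => rx.Match(line))
            .Where(m => m.Success)
            // Compare the month count as a number, not as text.
            .OrderBy(m => int.Parse(m.Groups["Total"].Value))
            // Assumption: dates are M/d/yyyy regardless of the machine culture.
            .ThenByDescending(m => DateTime.ParseExact(
                m.Groups["Date"].Value, "M/d/yyyy", CultureInfo.InvariantCulture).Year)
            .Select(m => m.Value)
            .ToArray();
    }
}
```

ThenByDescending is what plays the role of the second, differently ordered column that was awkward to express in a single IComparer.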

Regex To Linq to Dictionary in C#

This article demonstrates these concepts:

  1. Regex extraction of Key Value pairs and placing them into named capture groups.
  2. Linq extraction of the Key Value pairs extracted from the matches of Regex.
  3. Dictionary creation from Linq using the ToDictionary method.

I answered this on the MSDN forums. The user had this data in key value pairs delimited by the pipe:

abc:1|bbbb:2|xyz:45|p:120

The need was to get the keys and values into a dictionary. The following code uses named regex group matches which are used in Linq to extract the keys and their values. Once that is done, within the Linq the extension method ToDictionary is used to create the dictionary on the fly. Here is the code:

string input = "abc:1|bbbb:2|xyz:45|p:120";
string pattern = @"(?<Key>[^:]+)(?:\:)(?<Value>[^|]+)(?:\|?)";

Dictionary<string, string> KVPs
    = ( from Match m in Regex.Matches( input, pattern )
      select new
      {
          key = m.Groups["Key"].Value,
          value = m.Groups["Value"].Value
       }
       ).ToDictionary( p => p.key, p => p.value );

foreach ( KeyValuePair<string, string> kvp in KVPs )
    Console.WriteLine( "{0,6} : {1,3}", kvp.Key, kvp.Value );

/* Outputs:
   abc :   1
  bbbb :   2
   xyz :  45
     p : 120
 */
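A caveat worth knowing: ToDictionary throws an ArgumentException if the same key shows up twice in the input. If the data might contain repeats, a plain loop that lets the last value win is one way around it. The sketch below assumes that "last one wins" is the wanted behavior; PipeParser is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeParser
{
    // Parses "key:value|key:value" text. On a duplicate key the later
    // value overwrites the earlier one instead of throwing.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();

        foreach (Match m in Regex.Matches(input, @"(?<Key>[^:|]+):(?<Value>[^|]*)"))
            result[m.Groups["Key"].Value] = m.Groups["Value"].Value;

        return result;
    }
}
```

The indexer assignment is what makes the difference; Add would throw on a repeat just as ToDictionary does.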

Regex Split Pitfalls

This article goes into how to appropriately use the Regex.Split function and the pitfalls one may run into when using it. This article is based on .Net 3.5 using C#, but can be applied to any version of .Net or language.

Overview

Regex.Split is a great tool which allows one to do more than the simple string.Split, but it has some serious pitfalls for the uninitiated. Let us first review how it works in code. This example I term the Kool-aid example, for it looks very much like string.Split; it conveys that it is easy to use….

foreach (string str in Regex.Split("Linq-to-SQL", "-"))
    Console.WriteLine(str);

/* Writes

Linq
to
SQL

*/

Pretty obvious, and it appears to work like string.Split: we are splitting on the dash and it works. But one might as well use string.Split for the easy examples, for in real life one doesn’t use Regex.Split on basic patterns.
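For contrast, here is the kind of job where Regex.Split starts to earn its keep: splitting on either of two delimiters while also swallowing any whitespace around them. This is a minimal sketch; SplitDemo and SplitList are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on a comma or a semicolon and consumes the whitespace
    // on both sides of the delimiter as part of the split.
    public static string[] SplitList(string input)
    {
        return Regex.Split(input, @"\s*[,;]\s*");
    }
}
```

Doing the same with string.Split would need a follow-up Trim on every piece, which is exactly the ancillary processing the pattern takes care of.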

Pitfalls

Since a regex pattern is used to match specific text, one believes that because one has a pattern which picks up valid matches, it can transfer into Regex.Split as is…oh no. One has to be extra vigilant with the pattern because one probably wants to split on a particular item, but forgets it’s surrounded by, and may contain, whitespace and line feeds.

For example, since people work in textual items and not esoteric numeric examples, say you wanted to remove certain lines from paragraphs. Say this is the originating text:

Cinthia Blake
Olso Norway
You finished in 1st place.
Fred Alter
Chicago USA
You finished in 4th place.

The goal is to remove the finished place lines, the ones beginning with “You finished”. The natural thing to do is to create a match for the line. A pattern such as

Y[^\.]+\.

will find text that starts with a Y and match/consume until it finds a period. Run that and the data through a regex pattern testing tool and it shows that we get this result:

You finished in 1st place.
You finished in 4th place.

Great! So one loads it into code and runs it such as:

string input = @"Cinthia Blake
Olso Norway
You finished in 1st place.
Fred Alter
Chicago USA
You finished in 4th place.";

string pattern = @"Y[^\.]+\.";

foreach (string str in Regex.Split(input, pattern))
    Console.WriteLine( str );

/* Outputs:

Cinthia Blake[CR][LF]
Olso Norway[CR][LF]
[CR][LF]
[CR][LF]
Fred Alter[CR][LF]
Chicago USA[CR][LF]
[CR][LF]

*/

I have shown the whitespace of the \r\n’s as [CR][LF] to make the problem visible. Instead of a clean list of name, location, name, location…we now have name, location, blank line, blank line, name, location, blank line. Whoa! Where did those lines come from?

The user thinks that Regex.Split is not working; it’s returning extra lines….and gives up!

No. The problem is that the whitespace was not accounted for when making the pattern and using the split. Yes, it matched the lines and dutifully split on them, but it left the surrounding whitespace behind, almost as an afterthought. Frustrating to the user.

Conclusion

You have to be extra vigilant about using Regex.Split and its pattern. Be painfully aware of whitespace. Here is a pattern to handle the whitespace and achieve the result intended:

string input = @"Cinthia Blake
Olso Norway
You finished in 1st place.
Fred Alter
Chicago USA
You finished in 4th place.";

string pattern =
@"[\r\n]{0,2}        # If there is Line feeds before then match it.
Y[^\.]+\.            # original pattern
\s*                  # Maybe there are spaces after the sentence...get those
[\r\n]{0,2}          # *If* there is Linefeeds after then match it.
";

// IgnorePatternWhitespace allows us to comment the code, it does
// NOT apply to the processing of spaces within the input text.

foreach (string str in Regex.Split(input, pattern, RegexOptions.IgnorePatternWhitespace))
    Console.WriteLine( str );

/* Outputs:

Cinthia Blake[CR][LF]
Olso Norway
Fred Alter[CR][LF]
Chicago USA

*/

Now we have what we expected, two groups split on the appropriate sentences. I have shown where there is still whitespace in the result comment.

But long story short, be extra vigilant about whitespace in real world Regex.Split work. Just matching on something will not fit the bill.
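A related surprise, separate from whitespace, shows up when a pattern that was written for matching contains capturing parentheses: Regex.Split includes the captured text in the array it returns. The sketch below shows the two behaviors side by side; the class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class SplitCaptureDemo
{
    // Capturing group: the delimiter text is returned between the pieces.
    public static string[] WithCapture(string input)
    {
        return Regex.Split(input, "(-)");
    }

    // Non-capturing group: only the pieces come back.
    public static string[] WithoutCapture(string input)
    {
        return Regex.Split(input, "(?:-)");
    }
}
```

For "Linq-to-SQL" the first method returns five items (the dashes are in there too) while the second returns the three words, so a pattern carried over from Regex.Match may need its groups switched to (?: ) before it is used in a split.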

C# Regex MatchCollection Meets Linq

Here is a code snippet which accomplishes these following goals:

  • It marries a C# Regular Expression MatchCollection to a property list using Linq.
  • It uses a Regex Pattern which creates named capture groups which Linq can easily exploit in the join of two data lists.

Let me show you the code. Don’t get hung up on the pattern or what it is doing. What needs to be known is that the pattern places the data matched into named capture groups of Key and Value. The actual Key value corresponds to a property on a real class. Using reflection we will find that property on the class and link its property name to the value stored. That will allow us to change that property’s value on the class to the Value we get from the regex match.

The goal of the Linq code is to join the matches to another list, the list of properties from the class, where the commonality is the PropertyInfo.Name found in that list. Once that data is joined, a new object is created which holds the actual property object and the value of Value. That new list allows the subsequent loop to set each target property’s value to the Value of its match in the match collection.

public static T ASCIISerializeOut<T>( string targetSerialized )
     where T : new()
{

     T targetInstance = new T();

     string pattern = string.Format( @"(?<Key>[^{0}]*)(?:{0})(?<Value>[^{1}]*)(?:{1}?)",
           Seperators.cnKVPSeperator,   // "±"
           Seperators.cnSeperator );    // "¶"

     MatchCollection mcKVPs = Regex.Matches( targetSerialized,
                                             pattern,
                                             RegexOptions.Compiled );

     var kvps = from Match m in mcKVPs
                where mcKVPs != null
                where mcKVPs.Count > 0
                join prp in GetPublicProperties<T>() on m.Groups["Key"].Value equals prp.Name
                select new
                {
                    prop  = prp,
                    Value = m.Groups["Value"].Value ?? string.Empty
                };

     foreach (var item in kvps)
         item.prop.SetValue( targetInstance, item.Value, null );

     return targetInstance;

 }

 /// <summary>
 /// Return all public properties which are of string type from T class.
 /// </summary>
 public static IEnumerable<PropertyInfo> GetPublicProperties<T>()
 {
     return from p in typeof( T ).GetProperties()
            where p.PropertyType == typeof( string )
            select p;
 }

  • Line 01: The function takes in text such as "AProp±AValue¶BProp±BValue" which needs to be deserialized into a newly created class of type T. The first item in the text is the property name AProp, followed by the key/value separator ±, then the value of the property AValue, and finally the pair separator ¶. Our regex will create individual matches for each of the key value pairs.
  • Line 07: This pattern when used will get key and value pair combinations and place them in named groups of Key and Value of the match.
  • Line 11: Get all the key/value pair combinations into the match collection.
  • Line 15: Linq starts here: We define a var object kvps (key value pairs) which will loop over each match from the match collection.
  • Line 16: Make sure the collection is not null.
  • Line 17: Make sure there are one or more matches.
  • Line 18: Get all the public properties of class T and make a join to our collection data. Key should match the property Name.
  • Line 19: Each match found within the property where the names are the same will create this new object below with two properties.
  • Line 21: Save the actual property object, we need that later to load data.
  • Line 22: Get the value out of the Value group and save that as well. Note, if it is null, just use string.Empty. Thanks Null Coalescing operation (??).
  • Line 25: Now for each var object created enumerate through it and load the target values into our newly minted class object of T.
  • Line 26: Set the target item’s property to the value found from the regex matches.
  • Line 28: Return the new object with the original text data serialized in.
  • Line 35: Return an enumeration of all generic string properties of the type T.
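Since the Seperators class is not shown here, the following is a cut-down sketch of the same idea that can be run on its own. The Person class, the TinyDeserializer name and passing the separators in as parameters are all assumptions of mine for illustration. It also assumes the separators are ordinary characters with no special meaning in a regex.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical target class, not from the original post.
public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class TinyDeserializer
{
    // Assumption: kvSeparator and pairSeparator are plain characters
    // (no regex metacharacters), since they are placed in the pattern as-is.
    public static T Load<T>(string text, char kvSeparator, char pairSeparator)
        where T : new()
    {
        T target = new T();

        string pattern = string.Format(
            @"(?<Key>[^{0}{1}]+){0}(?<Value>[^{1}]*)", kvSeparator, pairSeparator);

        // Join each match to the writable string property of the same name.
        var matched =
            from Match m in Regex.Matches(text, pattern)
            join prp in typeof(T).GetProperties()
                                 .Where(p => p.PropertyType == typeof(string) && p.CanWrite)
                on m.Groups["Key"].Value equals prp.Name
            select new { prop = prp, Value = m.Groups["Value"].Value };

        foreach (var item in matched)
            item.prop.SetValue(target, item.Value, null);

        return target;
    }
}
```

A call such as TinyDeserializer.Load<Person>( "Name±Fred¶City±Chicago", '±', '¶' ) would fill in both properties, and any key in the text without a matching property simply drops out of the join.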

For completeness see my post entitled A C# ASCII Serializer Generic Method for Class Objects which has the actual downloaded project and working test example. (Post coming soon!)

.Net Regex IgnorePatternWhitespace Only Applies to the Regex Processor

The option IgnorePatternWhitespace only applies to the regex processor reading the pattern and not to how it handles the input text. The option allows one to document a pattern by placing items on different lines and allowing for comments introduced by the # character. Using the option I can document my regex as follows:


string pattern = @"
^                # Beginning of The Line
(?<Text>[^\d]+)  # Move all non numbers into the Text group.
(?<Number>\d+)   # Get all numbers into the Number Capture group.
";

Otherwise without the option the pattern must be written like this:


string pattern = @"^(?<Text>[^\d]+)(?<Number>\d+)";

Hence the readability for patterns is greatly increased.
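The flip side is that a space you actually want to match has to be written so the parser keeps it. A minimal sketch, with illustrative names, showing the difference:

```csharp
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    // The unescaped space in the pattern is thrown away by the parser,
    // so this pattern really looks for "ab".
    public static bool Unescaped(string input)
    {
        return Regex.IsMatch(input, @"a b", RegexOptions.IgnorePatternWhitespace);
    }

    // An escaped space is kept as a literal, so this looks for "a b".
    public static bool Escaped(string input)
    {
        return Regex.IsMatch(input, @"a\ b", RegexOptions.IgnorePatternWhitespace);
    }
}
```

Using \s or a character class such as [ ] in place of the escaped space works too; in every case the input text itself is left untouched by the option.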

C# Regular Expression Suggestions when working with C#

Here are some things which can make working with C# regular expression patterns much easier, both in reading and in processing.

Suggestion 1 – Avoid C# Text Escape Pollution

When working in C# it can become confusing when one has to deal with string literals and escapes even before dealing with the regular expression escapes. For example if we have an escape such as word boundary in regex (\b) we have to escape the escape in C# such as

string pattern = "\\b";

That gets confusing because we don’t want to have to deal with C#…we are working in regex and \\b does not mean what we think it does (though it gets sent to the parser appropriately). What we should do is use the C# verbatim string literal convention (@) in front of the string such as

string pattern = @"\b";

The two shown C# lines are functionally equivalent…but now we can concentrate on the regex pattern with no pollution from C# escapes.
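A quick way to convince yourself is to compare the strings directly. The sketch below uses names of my own choosing; it also shows what goes wrong if the escape is forgotten altogether, since "\b" in a regular C# string is a single backspace character.

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    public static readonly string Escaped   = "\\b";  // two characters: backslash, b
    public static readonly string Verbatim  = @"\b";  // the same two characters
    public static readonly string Backspace = "\b";   // ONE character: U+0008 backspace

    // Counts whole-word occurrences of "cat" using the verbatim form.
    public static int CountCat(string text)
    {
        return Regex.Matches(text, @"\bcat\b").Count;
    }
}
```

Escaped and Verbatim compare as equal, while Backspace has a length of one and would send the regex parser something entirely different from a word boundary.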

Suggestion 2 – Use Regex Ascii Escapes for quotes

Some people will go out of their way to use doubled double quotes (@" "" ") in a C# verbatim string to search for a double quote in a regex pattern. This is confusing; try using the regex ASCII escape \x22 instead. Below is a code sample showing the two equivalent forms:

string pattern = @""""; // I am only searching for a quote
 
// VS
 
string pattern2 = @"\x22"; // Much better
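In a slightly larger pattern the payoff is more obvious. Here is a sketch that pulls out whatever sits between pairs of double quotes; the QuoteDemo name and the Inner group are illustrative.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // Returns the text found between each pair of double quotes.
    // \x22 is read by the regex parser as the double quote character.
    public static string[] QuotedValues(string text)
    {
        return Regex.Matches(text, @"\x22(?<Inner>[^\x22]*)\x22")
                    .Cast<Match>()
                    .Select(m => m.Groups["Inner"].Value)
                    .ToArray();
    }
}
```

Written with doubled quotes the same pattern would be @"""(?<Inner>[^""]*)""", which is a lot harder to read at a glance.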
 

Note if you are using Expresso as your regex editor, it provides a handy way of finding those escapes.

Suggestion 3 – Use the IgnorePatternWhitespace option

This option confuses beginning regex-ers because they think it applies to what the pattern does….when in fact it is solely a preprocessing instruction for the regex parser! What it does is allow you to put whitespace in a pattern and have it span lines for easier reading. Here is a sample I created for a forum post where I was able to break out a long pattern, thereby commenting it and making it easier to read. Without the IgnorePatternWhitespace option, one would have to remove the comments and make it all one line:

string text =
@"5:16:04.859 PM:  07:18:12p  2.33   0.45   NH4                      9558    WORK
5:16:06.000 PM:  07:18:13p  2.29   0.31   RIN                     10554    WORK
5:16:07.625 PM:  07:18:15p  2.33   0.44   NH4                      9645    WORK
5:16:09.125 PM:  07:18:16p  2.29   0.32   RIN                     10400    WORK";

 
string pattern =
@"^(?<Time1>[^\s]*)  # Start of line, capture first time and place into Time1
   (?:\s*)           # Match but don't capture (MBDC) the space (Used as an anchor)
   (?<AmPm1>[AP]M)   # Get the AM | PM and put it into AmPm1 capture group.
   (?:\:\s*)         # MBDC : and space
   (?<Time2>[^ap]*)  # Time 2 Capture
   (?<AmPm2>[ap])    # AmPm capture
   (?:\s*)
   (?<Col1>[^\s]*)   # Data column 1
   (?:\s*)
   (?<Col2>[^\s]*)   # Data column 2
   (?:\s*)
   (?<Col3>[^\s]*)   # Data column 3
   (?:\s*)
   (?<Col4>[^\s]*)   # Data column 4
   (?:\s*)
   (?<Col5>[^\s]*)   # Data column 5";

 
Regex rgx = new Regex(pattern,
                  RegexOptions.Multiline | // ^ and $ match Beginning and EOL.
                  RegexOptions.IgnorePatternWhitespace); // Allows us to do the comments.
 
 
string[] groupNames = rgx.GetGroupNames();
 
Console.WriteLine("Groups: ({0}){1}", string.Join(") (", groupNames), System.Environment.NewLine);
 
MatchCollection mc = rgx.Matches(text);
 
foreach (Match m in mc)
    if (m.Success)
    {
        Console.WriteLine("Match:");
        foreach (string name in groupNames)
            Console.WriteLine("{0,10} : {1}", name, m.Groups[name]);
 
        Console.WriteLine("{0}Time1 ({1}) Time2 ({2}){0}",
            System.Environment.NewLine,
            m.Groups["AmPm1"].Value,
            ( ( m.Groups["AmPm2"].Value == "a" ) ? "AM" : "PM" ));
    }
 
 
 

.Net Regex MatchEvaluator

The regex match evaluator gives one the ability to post-process each match as it is found. It is a handy way to normalize the match before sending it on. In that process one can easily change or alter the match when needed. It also allows us to eat the match and have it return nothing!

Here is an example which I had from the boards. The user wanted to use regex replace to remove all alphabetic characters but keep all numbers and a decimal point. But he had situations where there were two decimal points. In that situation, only the first one should be returned.

12abc.def34 becomes 12.34

.56a.d78 becomes .5678

Here is the code to accomplish that

// Only worry about decimals and letters.
string pattern = @"
(?<Decimal>\.) |       # Check for decimal
(?<Letter>[A-Za-z]+)    # Check for letter
";

string data = "0.ab.c1d.23"; // We want 0.123

int decimalPointCount = 0;

// Here is the Match Evaluator for Post Processing.
// We will eat the letters and return the decimals.
// The match evaluator will feed us every match found as it finds it.
MatchEvaluator CatchMultipleDecimals = delegate( Match m )
{
// Check for a decimal match only and return only the first one found
if (string.IsNullOrEmpty(m.Groups["Decimal"].Value) == false)
{
   if (decimalPointCount++ > 0) // We have gone over…return nothing!
      return string.Empty;
   else
      return m.Groups["Decimal"].Value; // Return the .
}

// We are only capturing text...so return nothing
// on any other match we may get.
   return string.Empty;

};

MatchEvaluator myEvaluator = new MatchEvaluator(CatchMultipleDecimals);

// Remember the IgnorePatternWhitespace option only applies to how the regex parser
// processes the pattern and not the data. Since I have created my
// pattern split over multiple lines for readability I need to tell
// the regex parser to strip all whitespace from my pattern *before* it
// looks at any data. It has nothing to do with how the data is matched or processed.
Console.WriteLine(Regex.Replace(data, pattern, myEvaluator, RegexOptions.IgnorePatternWhitespace));  

// Outputs 0.123 from the text 0.ab.c1d.23

Explanation

  • Line 2  : The pattern will only capture a decimal point or letter(s).
  • Line 14 : Here is the delegate that has the code which will be called whenever a match occurs.
  • Line 17 : Since we have placed the items into groups, we will check the Decimal group for any data.  If data exists, we return a decimal point only once, otherwise we return string.Empty.
  • Line 27 : We could check for the Letter group, but that is not needed. Since the decimal is handled already, we will eat whatever is matched at this point and return string.Empty so the match is removed from the output.
  • Line 38 : We use regex replace to return any numbers with only one decimal thanks to the Match Evaluator we have created and used.
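The same logic can be folded into a reusable method with a lambda as the evaluator. This is a sketch of one way to do it, assuming C# 3 or later; DecimalCleaner and Clean are names I picked. Keeping the counter local to the method means each call starts counting decimal points from zero, whereas a counter declared outside would need resetting before the evaluator is reused.

```csharp
using System.Text.RegularExpressions;

public static class DecimalCleaner
{
    // Strips letters and keeps only the first decimal point found.
    public static string Clean(string data)
    {
        int decimalPointCount = 0;   // local, so every call starts from zero

        // The lambda is converted to a MatchEvaluator by the compiler.
        return Regex.Replace(data, @"\.|[A-Za-z]+", m =>
            m.Value == "." && decimalPointCount++ == 0 ? "." : string.Empty);
    }
}
```

Run against the examples from the top of this post, 12abc.def34 comes back as 12.34 and .56a.d78 as .5678.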

CSV to .Net Dictionary using Regular Expressions

This post shows how to parse CSV data and place it into a dictionary for storage in C# in .Net 2 and above. The dictionary will have a key of the actual line number, zero based, and a list of the data items with the commas and quotes removed. The regex pattern can handle data that looks like this

"xxx" or "xxx,xxx" or 'xxx' or 'xxx,xxx' or xxx

It will then be placed into a dictionary where each entry is one row of the data. The regex below is designed to do these things

  1. Each match represents one data line or row.
  2. Each of the data items are inserted into the column capture to keep the data consistent with the current match.
  3. Handles both the single and double quote.
  4. The pattern can handle the comma within the quotes.
  5. The pattern uses an if condition see my blog entitled Regular Expressions and the If Conditional.
  6. Use of named capture group Column will hold the data.

Regex rx = new Regex(
@"((?([\x27\x22])         # Regex If single/double quotes
   (?:[\x27\x22])         # \x27\x22 are single/double quotes
   (?<Column>[^\x27\x22]*)# Match this in the quotes
   (?:[\x27\x22])         # Closing quote
|
   (?<Column>[^,\r\n]*))  # Else Not within quotes
(?:,?))+                  # Either a comma or EOL
(?:$|[\r\n]{0,2})         # Handle EOL or EOB",
                 RegexOptions.IgnorePatternWhitespace);
Dictionary<int, List<string>> data
    = new Dictionary<int,List<string>>();
string text =
@"'1','01000000043','2','4',20061102
'2',333,444,'555'";
int lineNumber = 0;
foreach(Match m in rx.Matches(text))
    if (m.Success)
    {
        List<string> line = new List<string>();
        foreach (Capture cp in m.Groups["Column"].Captures)
            if (string.IsNullOrEmpty(cp.Value) == false)
                line.Add(cp.Value);
        if (line.Count > 0)
            data.Add(lineNumber++, line);
    }
foreach (KeyValuePair<int, List<string>> kvp in data)
    Console.WriteLine("Line {0} : {1}",
        kvp.Key.ToString(),
        string.Join(" ", kvp.Value.ToArray()));

Console Output

Line 0 : 1 01000000043 2 4 20061102
Line 1 : 2 333 444 555
