LINQ With Strings

Some interesting approaches to string querying and conversion via LINQ

LINQ seems to have found its way into the vocabulary of the average .NET developer. I find myself using LINQ more and more often, usually as a quick way to manipulate, sort, and filter objects.

LINQ can be used to query and transform strings and collections of strings, and it is especially useful with semi-structured data in text files. LINQ queries can be combined with traditional string functions and regular expressions. For example, you can use the String.Split method to create an array of strings that you can then query or modify by using LINQ. You can call Regex.IsMatch in the where clause of a LINQ query, and you can use LINQ to query or modify the MatchCollection returned by Regex.Matches.
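As a quick illustration of calling IsMatch inside a where clause, here is a minimal sketch. The class name IsMatchSketch, the helper FindDateFields, and the sample line are made up for this example and are not part of the downloadable project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the article's project.
public static class IsMatchSketch
{
    // Returns the comma-separated fields that look like a yyyy-MM-dd date.
    public static List<string> FindDateFields(string line)
    {
        var datePattern = new Regex(@"^\d{4}-\d{2}-\d{2}$");

        // Split produces a string[]; the regular expression filters it in the where clause.
        var query = from field in line.Split(',')
                    let trimmed = field.Trim()
                    where datePattern.IsMatch(trimmed)
                    select trimmed;

        return query.ToList();
    }

    public static void Main()
    {
        string line = "1001, Widget, 2008-03-14, shipped, 2008-03-21";
        foreach (string date in FindDateFields(line))
        {
            Console.WriteLine(date);
        }
        // Prints:
        // 2008-03-14
        // 2008-03-21
    }
}
```

The pattern only checks the shape of the field; it does not validate that the value is a real calendar date.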

Here are some interesting tidbits of sample code that manipulate strings with LINQ. All of them are assembled in a single project that you can download and play with; the link is at the bottom of this article.

FIND SENTENCES:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LINQSTRINGS
{
    public class FindSentences
    {
        public static void Go()
        {
            Console.WriteLine("FIND SENTENCE:\r\n");

            string text = @"Historically, the world of data and the world of objects " +
                @"have not been well integrated. Programmers work in C# or Visual Basic " +
                @"and also in SQL or XQuery. On the one side are concepts such as classes, " +
                @"objects, fields, inheritance, and .NET Framework APIs. On the other side " +
                @"are tables, columns, rows, nodes, and separate languages for dealing with " +
                @"them. Data types often require translation between the two worlds; there are " +
                @"different standard functions. Because the object world has no notion of query, a " +
                @"query can only be represented as a string without compile-time type checking or " +
                @"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " +
                @"objects in memory is often tedious and error-prone.";

            // Split the text block into an array of sentences.
            string[] sentences = text.Split(new char[] { '.', '?', '!' });

            // Define the search terms. This list could also be dynamically populated at runtime.
            string[] wordsToMatch = { "Historically", "data", "integrated" };

            // Find sentences that contain all the terms in the wordsToMatch array.
            // Note that the number of terms to match is not specified at compile time.
            var sentenceQuery = from sentence in sentences
                                let w = sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' },
                                                       StringSplitOptions.RemoveEmptyEntries)
                                where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Count()
                                select sentence;

            // Execute the query. Note that you can explicitly type
            // the iteration variable here even though sentenceQuery
            // was implicitly typed.
            foreach (string str in sentenceQuery)
            {
                Console.WriteLine(str);
            }

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }
    }
    /* Output:
    Historically, the world of data and the world of objects have not been well integrated
    */
}
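The query above matches terms case-sensitively, so "data" would not match "Data". A possible variant, sketched here with method syntax and a case-insensitive HashSet, is shown below. The class SentenceSearchSketch, the method FindSentences, and the sample text are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant, not part of the article's project.
public static class SentenceSearchSketch
{
    static readonly char[] SentenceEnds = { '.', '?', '!' };
    static readonly char[] WordSeparators = { '.', '?', '!', ' ', ';', ':', ',' };

    // Returns the sentences that contain every term, ignoring case.
    public static List<string> FindSentences(string text, IEnumerable<string> terms)
    {
        return text.Split(SentenceEnds, StringSplitOptions.RemoveEmptyEntries)
                   .Select(sentence => sentence.Trim())
                   .Where(sentence =>
                   {
                       // Collect the words of this sentence in a case-insensitive set.
                       var words = new HashSet<string>(
                           sentence.Split(WordSeparators, StringSplitOptions.RemoveEmptyEntries),
                           StringComparer.OrdinalIgnoreCase);
                       return terms.All(term => words.Contains(term));
                   })
                   .ToList();
    }

    public static void Main()
    {
        string text = "Data flows in. The data was historically messy! Nothing here.";
        string[] terms = { "DATA", "historically" };

        foreach (string sentence in FindSentences(text, terms))
        {
            Console.WriteLine(sentence);
        }
        // Prints:
        // The data was historically messy
    }
}
```

Using All with a set lookup expresses "every term must be present" directly, instead of comparing the size of an intersection with the number of terms.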

QUERY A STRING:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LINQSTRINGS
{
    public class QueryAString
    {
        public static void Go()
        {
            Console.WriteLine("QUERY A STRING:\r\n");

            string aString = "ABCDE99F-J74-12-89A";

            // Select only those characters that are digits.
            // A string is an IEnumerable<char>, so it can be queried directly.
            IEnumerable<char> stringQuery =
                from ch in aString
                where Char.IsDigit(ch)
                select ch;

            // Execute the query.
            foreach (char c in stringQuery)
                Console.Write(c + " ");

            // End the line of digits before printing the count.
            Console.WriteLine();

            // Call the Count method on the existing query.
            // This enumerates the query a second time.
            int count = stringQuery.Count();
            Console.WriteLine("Count = {0}", count);

            // Select all characters before the first '-'.
            IEnumerable<char> stringQuery2 = aString.TakeWhile(c => c != '-');

            // Execute the second query.
            foreach (char c in stringQuery2)
                Console.Write(c);

            Console.WriteLine(System.Environment.NewLine + "Press any key to exit");
            Console.ReadKey();
        }
    }
    /* Output:
    9 9 7 4 1 2 8 9
    Count = 8
    ABCDE99F
    */
}
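The sample above only prints the queried characters. If you need the result as a string again, one option is to pass the characters to the string constructor, as in this small sketch. The class CharQuerySketch and its two helper methods are hypothetical names used for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helpers, not part of the article's project.
public static class CharQuerySketch
{
    // Keeps only the digit characters of the input.
    public static string DigitsOnly(string input)
    {
        return new string(input.Where(c => char.IsDigit(c)).ToArray());
    }

    // Returns everything before the first occurrence of the separator.
    public static string BeforeFirst(string input, char separator)
    {
        return new string(input.TakeWhile(c => c != separator).ToArray());
    }

    public static void Main()
    {
        string aString = "ABCDE99F-J74-12-89A";
        Console.WriteLine(DigitsOnly(aString));       // prints 99741289
        Console.WriteLine(BeforeFirst(aString, '-')); // prints ABCDE99F
    }
}
```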

QUERY WITH REGEX:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LINQSTRINGS
{
    public class QueryWithRegEx
    {
        public static void Go()
        {
            Console.WriteLine("QUERY WITH REGEX:\r\n");

            // Modify this path as necessary.
            string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";

            // Take a snapshot of the file system.
            IEnumerable<System.IO.FileInfo> fileList = GetFiles(startFolder);

            // Create the regular expression to find all things "Visual".
            System.Text.RegularExpressions.Regex searchTerm =
                new System.Text.RegularExpressions.Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

            // Search the contents of each .htm file.
            // Remove the first where clause (the extension filter) to find even more matches!
            // This query produces a list of files where a match
            // was found, and a list of the matches in that file.
            // Note: The range variable in the nested from clause is
            // explicitly typed as Match. This is required because
            // MatchCollection is not a generic IEnumerable collection.
            var queryMatchingFiles =
                from file in fileList
                where file.Extension == ".htm"
                let fileText = System.IO.File.ReadAllText(file.FullName)
                let matches = searchTerm.Matches(fileText)
                where matches.Count > 0   // reuse the matches instead of running the regex again
                select new
                {
                    name = file.FullName,
                    matches = from System.Text.RegularExpressions.Match match in matches
                              select match.Value
                };

            // Execute the query.
            Console.WriteLine("The term \"{0}\" was found in:", searchTerm.ToString());

            foreach (var v in queryMatchingFiles)
            {
                // Trim the path a bit, then write
                // the file name in which a match was found.
                string s = v.name.Substring(startFolder.Length - 1);
                Console.WriteLine(s);

                // For this file, write out all the matching strings.
                foreach (var v2 in v.matches)
                {
                    Console.WriteLine(" " + v2);
                }
            }

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }

        // This method assumes that the application has discovery
        // permissions for all folders under the specified path.
        static IEnumerable<System.IO.FileInfo> GetFiles(string path)
        {
            if (!System.IO.Directory.Exists(path))
                throw new System.IO.DirectoryNotFoundException();

            List<System.IO.FileInfo> files = new List<System.IO.FileInfo>();

            string[] fileNames = System.IO.Directory.GetFiles(path, "*.*", System.IO.SearchOption.AllDirectories);
            foreach (string name in fileNames)
            {
                files.Add(new System.IO.FileInfo(name));
            }
            return files;
        }
    }
}
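The same idea can be tried without touching the file system. The following sketch runs the article's regular expression over an in-memory string and uses Cast<Match>() as the method-syntax counterpart of explicitly typing the range variable. The class RegexQuerySketch, the method FindProducts, and the sample sentence are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the article's project.
public static class RegexQuerySketch
{
    // Returns each distinct product name found in the text, in order of first appearance.
    public static List<string> FindProducts(string text)
    {
        var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

        return searchTerm.Matches(text)
                         .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                         .Select(m => m.Value)
                         .Distinct()
                         .ToList();
    }

    public static void Main()
    {
        string text = "Visual Studio ships Visual Basic and Visual C#. " +
                      "Visual Studio also hosts Visual C++.";

        foreach (string product in FindProducts(text))
        {
            Console.WriteLine(product);
        }
        // Prints:
        // Visual Studio
        // Visual Basic
        // Visual C#
        // Visual C++
    }
}
```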

REVERSE:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LINQSTRINGS
{
    public class Reverse
    {
        public static void Go()
        {
            Console.WriteLine("REVERSE:\r\n");

            string s = "The quick brown fox jumped over the henhouse.";

            // A string is an IEnumerable<char>, so Reverse() can be
            // applied to it directly; ToArray() executes the query.
            char[] reversed = s.Reverse().ToArray();

            // We called ToArray above, so we can just create a new string from it.
            string s2 = new string(reversed);

            Console.Write(s2);
            Console.WriteLine(System.Environment.NewLine + "Press any key to exit");
            Console.ReadKey();
        }
    }
}
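Reversing char by char works for plain ASCII text like the sentence above, but it separates surrogate pairs and detaches combining marks from their base letters. One way around that, sketched here, is to reverse text elements using System.Globalization.StringInfo. The class ReverseSketch and its methods are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the article's project.
public static class ReverseSketch
{
    // Yields the text elements of a string: a base character together with its
    // combining marks, or a surrogate pair, is returned as one unit.
    static IEnumerable<string> TextElements(string s)
    {
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }

    // Reverses the order of the text elements, keeping each element intact.
    public static string ReverseTextElements(string s)
    {
        return string.Concat(TextElements(s).Reverse().ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(ReverseTextElements("abc")); // prints cba

        // "e" followed by a combining acute accent (U+0301), then "x".
        string accented = "e\u0301x";
        string reversed = ReverseTextElements(accented);

        // The accent stays attached to the "e" after reversal.
        Console.WriteLine(reversed == "xe\u0301");     // prints True
    }
}
```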

STRING XOR:

using System;
using System.Collections.Generic;
using System.Linq;

namespace LINQSTRINGS
{
    public class StringXOR
    {
        public static void Go()
        {
            Console.WriteLine("XOR:\r\n");

            string s = "The quick brown fox jumped over the henhouse.";

            // XOR each character in the string with a fixed key.
            // The lambda simply returns the new character; it does
            // not need to assign to its parameter.
            IEnumerable<char> stringQuery2 = s.Select(c => (char)(c ^ 129));

            // Execute the query once and build the XOR-ed string from the result.
            string s2 = new string(stringQuery2.ToArray());
            Console.WriteLine(s2);

            // XOR each character of the XOR-ed string with the same key
            // to get back the original string.
            IEnumerable<char> stringQuery3 = s2.Select(c => (char)(c ^ 129));

            foreach (char c in stringQuery3)
            {
                Console.Write(c);
            }

            Console.WriteLine(Environment.NewLine + "Press any key to exit");
            Console.ReadKey();
        }
    }
}
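Because XOR with the same key undoes itself, the two queries above can be folded into one reusable method. The sketch below assumes a hypothetical class XorSketch with a single Xor helper. It is an obfuscation trick for demonstration, not encryption.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the article's project.
public static class XorSketch
{
    // XORs every character with the key. Applying the method twice
    // with the same key returns the original string.
    public static string Xor(string input, int key)
    {
        return new string(input.Select(c => (char)(c ^ key)).ToArray());
    }

    public static void Main()
    {
        string original = "The quick brown fox";
        string scrambled = Xor(original, 129);
        string restored = Xor(scrambled, 129);

        Console.WriteLine(scrambled != original); // prints True
        Console.WriteLine(restored);              // prints The quick brown fox
    }
}
```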

WORD COUNT:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LINQSTRINGS
{
    public class CountWords
    {
        public static void Go()
        {
            Console.WriteLine("WORD COUNT:\r\n");

            string text = @"Historically, the world of data and the world of objects" +
                @" have not been well integrated. Programmers work in C# or Visual Basic" +
                @" and also in SQL or XQuery. On the one side are concepts such as classes," +
                @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
                @" are tables, columns, rows, nodes, and separate languages for dealing with" +
                @" them. Data types often require translation between the two worlds; there are" +
                @" different standard functions. Because the object world has no notion of query, a" +
                @" query can only be represented as a string without compile-time type checking or" +
                @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
                @" objects in memory is often tedious and error-prone.";

            string searchTerm = "data";

            // Convert the string into an array of words.
            string[] source = text.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);

            // Create the query. Execution is deferred: nothing runs
            // until Count() is called below.
            // Use ToLowerInvariant to match "data" and "Data".
            var matchQuery = from word in source
                             where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
                             select word;

            // Count the matches. This executes the query,
            // because Count() produces a single value.
            int wordCount = matchQuery.Count();
            Console.WriteLine("{0} occurrence(s) of the search term \"{1}\" were found.", wordCount, searchTerm);

            // Keep console window open in debug mode.
            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }
    }
    /* Output:
    3 occurrence(s) of the search term "data" were found.
    */
}
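Counting a single search term extends naturally to counting every word at once with GroupBy. The following is a minimal sketch under that assumption. The class WordFrequencySketch, the method CountWords, and the sample text are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the article's project.
public static class WordFrequencySketch
{
    // Builds a case-insensitive word -> occurrence count table.
    public static Dictionary<string, int> CountWords(string text)
    {
        char[] separators = { '.', '?', '!', ' ', ';', ':', ',' };

        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                   .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
                   .ToDictionary(group => group.Key,
                                 group => group.Count(),
                                 StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Dictionary<string, int> counts = CountWords("Data types; the data, and the DATA.");

        Console.WriteLine(counts["data"]); // prints 3
        Console.WriteLine(counts["the"]);  // prints 2
    }
}
```

Passing the comparer to both GroupBy and ToDictionary keeps grouping and later lookups consistent, so "Data", "data", and "DATA" all land in the same entry.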

You can download the Visual Studio 2008 Solution here.

By Peter Bromberg