Writing Efficient String Functions in C#

Posted by: Armando, on 2/12/2010, in Category C#
Abstract: To write efficient string handling functions, it is important to understand the characteristics of string objects in C#.
The .NET Framework provides a set of powerful string functions. These building blocks can be used to write more complex algorithms for handling string data. However, developers aiming to write fast and efficient string functions must be careful about how they use those building blocks.
 
To write efficient string handling functions, it is important to understand the characteristics of string objects in C#.
 
String Characteristics
 
First and foremost, strings in .NET are class objects, that is, reference types. The C# keyword string is simply an alias for System.String, so there is no difference between the two. Unlike local variables of value types, class objects are allocated on the heap rather than the stack. This matters because every new string is a heap allocation, and allocations add pressure on the garbage collector, whose collections are costly in terms of performance. For string functions, this means we want to avoid creating new strings as much as possible.
 
However, that is easier said than done. Another important thing about strings in .NET is that they are immutable: once created, a string object cannot be modified. To "edit" a string, you instead have to create a new string that contains the modification.
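
Immutability is easy to observe directly. The following is a minimal sketch (the class and variable names are made up for illustration) showing that neither a method call nor the += operator changes an existing string object:

```csharp
using System;

public static class ImmutabilityDemo
{
    public static void Main()
    {
        string original = "hello world";

        // ToUpperInvariant cannot change "original"; it returns a separate string.
        string upper = original.ToUpperInvariant();
        Console.WriteLine(original); // hello world
        Console.WriteLine(upper);    // HELLO WORLD
        Console.WriteLine(object.ReferenceEquals(original, upper)); // False

        // += does not append in place either: it builds a new string and
        // points "greeting" at it, leaving the old object untouched.
        string greeting = "hello";
        string alias = greeting;
        greeting += " world";
        Console.WriteLine(alias);    // hello
        Console.WriteLine(greeting); // hello world
    }
}
```

Every += inside a loop therefore allocates one more string, which is the cost the rest of this article tries to avoid.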
 
Working with Characters
 
The solution is to work with characters instead of strings as much as possible. The char type in C# is a value type, so a local char variable typically lives on the stack and needs no heap allocation of its own (a char stored inside an array or a class object lives on the heap as part of that object). Furthermore, since a string is a sequence of characters, converting between chars and strings is very simple.
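
The type facts mentioned so far can be checked with a few lines of reflection. This is only an illustrative sketch; the class name TypeFacts is an assumption, not something from the framework:

```csharp
using System;

public static class TypeFacts
{
    public static void Main()
    {
        // "string" is only a C# keyword alias for System.String.
        Console.WriteLine(typeof(string) == typeof(System.String)); // True

        // string is a reference type, char is a value type.
        Console.WriteLine(typeof(string).IsValueType); // False
        Console.WriteLine(typeof(char).IsValueType);   // True

        // Indexing a string yields a char; no new string is created.
        string word = "hello";
        char first = word[0];
        Console.WriteLine(char.ToUpperInvariant(first)); // H
    }
}
```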
 
To convert a string to a char array, use the ToCharArray() .NET function:
 
 
string myStr = "hello world";
char[] myStrChars = myStr.ToCharArray();
 
To convert a char array back to a string, simply create a new instance of a string:
 
 
char[] myChars = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
string myStr = new string(myChars);
 
Writing efficient string functions thus boils down to working with char arrays. However, you might remember that arrays are also stored in the heap. If we handle a character array the same way we handle strings, creating a new one for every change, there isn’t much difference in performance between the two.
 
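Working with arrays can still be faster, though. Unlike a string, an array can be modified in place, so a single allocation can serve an entire function. We can also make use of dynamic arrays such as List<T> (or ArrayList in .NET Framework 1.1) to keep array management as efficient as possible when the final length is not known in advance.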
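
As a rough sketch of the dynamic-array idea, the hypothetical helper below (RemoveSpaces is not part of the framework or of the original example) collects the characters it wants to keep in a List<char> and only builds a string once, at the end:

```csharp
using System;
using System.Collections.Generic;

public static class DynamicArrayDemo
{
    // Hypothetical helper: returns the input with every space removed.
    public static string RemoveSpaces(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // The result can never be longer than the input, so reserving
        // input.Length up front means the list never has to grow.
        List<char> kept = new List<char>(input.Length);

        foreach (char c in input)
        {
            if (c != ' ')
                kept.Add(c);
        }

        // ToArray copies the kept characters once; the string constructor
        // then copies them into the final (immutable) string.
        return new string(kept.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(RemoveSpaces("hello big world")); // hellobigworld
    }
}
```

The loop itself creates no intermediate strings, no matter how many characters are dropped.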
 
Example Function
 
Let's write a very simple string function and compare the difference between using strings and char arrays. The function will capitalize all the vowels in a string (working with the English alphabet), and make all other characters lowercase.
 
Using just strings: 
 
public string CapitalizeVowels(string input)
{
    if (string.IsNullOrEmpty(input)) //since a string is a class object, it could be null
        return string.Empty;
    else
    {
        string output = string.Empty;
 
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == 'a' || input[i] == 'e' ||
                input[i] == 'i' || input[i] == 'o' ||
                input[i] == 'u')
                output += input[i].ToString().ToUpper(); //Vowel
            else
                output += input[i].ToString().ToLower(); //Not vowel
        }
 
        return output;
    }
}
 
Using character arrays:
 
public string CapitalizeVowels(string input)
{
    if (string.IsNullOrEmpty(input)) //since a string is a class object, it could be null
        return string.Empty;
    else
    {
        char[] charArray = input.ToCharArray();
 
        for (int i = 0; i < charArray.Length; i++)
        {
            if (charArray[i] == 'a' || charArray[i] == 'e' ||
                charArray[i] == 'i' || charArray[i] == 'o' ||
                charArray[i] == 'u')
                charArray[i] = char.ToUpper(charArray[i]); //Vowel
            else
                charArray[i] = char.ToLower(charArray[i]); //Not vowel
        }
 
        return new string(charArray);
    }
}
 
Both functions will produce the exact same results given the same input data. We can perform some basic benchmarks to compare the performance of each function. For example, on my computer the string-based function took an average of 2181 ms to process the string “hello world” 1,000,000 times, while the array-based function took only 448 ms.
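
A benchmark of this kind can be put together with System.Diagnostics.Stopwatch. The harness below is a minimal sketch, not the exact code used for the numbers above: TimeMs and Sample are hypothetical names, and Sample is only a stand-in workload so the listing compiles on its own. Pass either version of CapitalizeVowels in its place to compare them.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Calls "function" on the same input "iterations" times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string, string> function, string input, int iterations)
    {
        function(input); // warm-up call so JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            function(input);
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    // Stand-in workload (an assumption for this sketch only).
    public static string Sample(string input)
    {
        return input.ToUpperInvariant();
    }

    public static void Main()
    {
        long elapsed = TimeMs(Sample, "hello world", 1000000);
        Console.WriteLine("Elapsed: " + elapsed + " ms");
    }
}
```

Timings vary with hardware, runtime version and build configuration, so it is worth running each measurement several times in a release build and comparing averages rather than single runs.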
 
Conclusion
 
As with anything, working with character arrays to write efficient string functions in C# must be done with care. The code can quickly become less readable, and with more complex string algorithms it can become very difficult to maintain. However, since moving between strings and character arrays is easy, a combination of both can reach an advantageous middle ground.
 
Armando is a C#.NET developer who enjoys learning the latest Microsoft technologies. You can find his own C# site at www.vcskicks.com
User Feedback
Comment posted by Andy Till on Friday, February 12, 2010 8:10 AM
I don't think this is a totally fair comparison. The biggest cost is likely to be the constant string creation when appending to the output:

.....
   output += input[i].ToString().ToUpper(); //Vowel
else
   output += input[i].ToString().ToLower(); //Not vowel

If you use a StringBuilder instead you will find that the time taken is cut by more than half.
Comment posted by Neil Sorensen on Friday, February 12, 2010 12:04 PM
In addition to using StringBuilder, as suggested above, take a minute to consider what you've actually saved.  Over 1,000,000 iterations, even using the naive String approach, you've only managed to save 1 1/2 seconds.  That likely means that this kind of optimization won't be necessary for any real applications, since it's not likely to be your bottleneck.  Premature optimization has killed more code than optimization has saved.
Comment posted by Armando on Friday, February 12, 2010 3:49 PM
Thanks for the comments guys. I agree that StringBuilder is definitely something you want to use to avoid constant string creation.

As for it being a bottleneck, it probably isn't. But remember this is about writing functions that are as efficient as possible. There are people who are concerned with performance over large numbers of iterations. It's just something to think about when working with strings in general.
Comment posted by Will Gottlieb on Friday, February 12, 2010 8:51 PM
Good stuff
Comment posted by Russ on Thursday, February 18, 2010 8:04 AM
Strings are immutable; that is the reason for the performance difference.
StringBuilder is the best bet when manipulating strings.
Comment posted by James Curran on Tuesday, March 9, 2010 11:35 AM
Here's the way I would have written the function:
public string CapitalizeVowels3(string input)
{
    if (string.IsNullOrEmpty(input)) //since a string is a class object, it could be null
        return string.Empty;
    else
    {
        var sb = new StringBuilder(input.Length);
        foreach (var ch in input)
        {
            var c = Char.ToLower(ch);

            if (c == 'a' || c == 'e' ||
                c == 'i' || c == 'o' ||
                c == 'u')
                sb.Append(char.ToUpper(c)); //Vowel
            else
                sb.Append(c); //Not vowel
        }

        return sb.ToString();
    }
}

It is slightly slower than yours (but still significantly faster than the original), but I think its cleaner design is worth the extra time. It also fixes a bug in your code (try converting the phrase "Greetings Earthlings" to see the problem).

Comment posted by Gaurav Pandey on Saturday, May 5, 2012 5:22 PM
A bit improved version (& cleaner as well ;) )

static string CapitalizeWords(string input)
{
   if (string.IsNullOrEmpty(input))
      return string.Empty;

   var lookup = new HashSet<Char>() {'a', 'e', 'i', 'o', 'u'};
   return new String(input.Select(x => lookup.Contains(x) ? Char.ToUpper(x) : x).ToArray());
}
Comment posted by ahmed razzak on Saturday, February 2, 2013 2:06 PM
Hi Guys
I Want c# program that compile the for statement
for example if we write the for statement correctly
for(int i=0;i>x;i++)
that console”accept” else
far(iat i&0;i<0;i**) "not accept"
please i want it in main program i don't want it in methods
MAIN PROGRAM
