Some Better-Written Custom String Methods using C#
In my daily web-surfing, I often stumble upon snippets of C# code posted by people. Usually, I can tweak it a bit. Sometimes, I can tweak it a lot. I usually post a quick comment to the site offering it. Today, I came upon some code that was so bad --- which the author said was from his forthcoming book! --- more drastic measures must be taken.
First we have a function to put a string into “Title Case” (which the author refers to as “Proper Case”) – Having the first letter of each word capitalized. Here’s the original:
public static String PCase(String strParam)
{
String strProper = strParam.Substring(0, 1).ToUpper();
strParam = strParam.Substring(1).ToLower();
String strPrev = "";
for (int iIndex = 0; iIndex < strParam.Length; iIndex++)
{
if (iIndex > 1)
{
strPrev = strParam.Substring(iIndex - 1, 1);
}
if (strPrev.Equals(" ") ||
strPrev.Equals("\t") ||
strPrev.Equals("\n") ||
strPrev.Equals("."))
{
strProper += strParam.Substring(iIndex, 1).ToUpper();
}
else
{
strProper += strParam.Substring(iIndex, 1);
}
}
return strProper;
}
What wrong here? Lot’s of really bad string handling. Remember, strings are immutable, so any action on one creates a new string. So, “strParam.Substring(iIndex, 1)” creates a new string. “strParam.Substring(iIndex, 1).ToUpper()” create two new strings, and “strProper += strParam.Substring(iIndex, 1).ToUpper();” creates three new strings. And, that’s within a loop. And, since Substring is always used here to create a one-character string, it easier to just use a char --- except apparently, this book author doesn’t know how to. Nor, doesn’t he apparently know about StringBuilder. Then, we get to the algorithm itself, where he does such bizarre things as pointlessly treat the first character as a special case, in two different places.
Ok, now let’s see the revision:
public static String PCase(String strParam)
{
StringBuilder sb = new StringBuilder(strParam.Length);
char cPrev = '.'; // start with something to force the next character to upper.
foreach(char c in strParam)
{
if (cPrev == '.' || Char.IsWhiteSpace(cPrev))
sb.Append(Char.ToUpper(c));
else
sb.Append(Char.ToLower(c));
cPrev = c;
}
return sb.ToString();
}
First we start with a string builder, preallocated to the size of the string we are building. The method doesn’t change the length of the string, so we know the length of the final string right from the start.
Next, since we are going to capitalize every letter after a period, and also the first letter, why not just pretend the mythical initial “last” character was a period? Suddenly, the first letter is no longer a special case, and we still get what we want.
Then, we just loop through the letters, raising or lowering letter as we need. Note that it works on characters and not strings, and uses the build-in IsWhitespace method, instead of using a hardcoded list of a subset of them. A for() loop can in certain cases (however, not the one used in the original) be faster than a foreach(), but here I figured it was safe to sacrifice a tiny bit of speed for clearer code.
Next up, Reversing a String. The Original:
public static String Reverse(String strParam)
{
if (strParam.Length == 1)
{
return strParam;
}
else
{
return Reverse(strParam.Substring(1)) + strParam.Substring(0, 1);
}
}
Now, here the author might be able to earn a pass. It’s possible that somewhere in his book he talks about recursive functions, and uses this as an example. Then, if might be OK. I mean, it is the only reason I can think of that someone might want a function which reverses a string – despite ubiquity of string reversing functions in libraries like this. But, he presented it on his blog as a collection of string functions, so we’ll have to judge them on that basis.
Again we have lots of string manipulation to accomplish something simple --- where there are already method built into the framework to handle such things:
public static String Reverse(String strParam)
{
byte[] rev = Encoding.ASCII.GetBytes(strParam);
Array.Reverse(rev);
return Encoding.ASCII.GetString(rev);
}
UPDATE: One commentator noted (quite rightly) noted that my Reverse() method would only work for strings of ASCII characters and will fail if there are any Unicode characters. I knew that at the time, but I was hoping no one else would notice. The problem was I needed a method which converted a string into an array of characters, and being unable to find one in the CLR, I substituted a string to byte array method instead. I guess this is one of those times where you just have to step away for a while and come back to it, because now, I found the right method in a few seconds:
public static String Reverse(String strParam)
{ char[] rev = strParam.ToCharArray();
Array.Reverse(rev);
return new String(rev);
}
And while my first version was a big improvement over the original, this is a big improvement over my first version, so you get double the benefits!
Next, we have a simple function to count the occurrences of a substring.
public static int CharCount(String strSource, String strToCount)
{
int iCount = 0;
int iPos = strSource.IndexOf(strToCount);
while (iPos != -1)
{
iCount++;
strSource = strSource.Substring(iPos + 1);
iPos = strSource.IndexOf(strToCount);
}
return iCount;}
The revision isn’t much different but the subtle difference is important. Instead of creating a new, shorter string to search, we tell it to just start looking after the last match. Instead we now merely look at the string.
public static int CharCount(String strSource, String strToCount)
{
int iCount = 0;
int iPos = strSource.IndexOf(strToCount);
while (iPos != -1)
{
iCount++;
iPos = strSource.IndexOf(strToCount, iPos+1);
}
return iCount;
}The next one has a special problem --- It doesn’t do what it claims to do!
// Trim the string to contain only a single whitepace between words
public static String ToSingleSpace(String strParam)
{
int iPosition = strParam.IndexOf(" ");
if (iPosition == -1)
{
return strParam;
}
else
{
return ToSingleSpace(strParam.Substring(0, iPosition) + strParam.Substring(iPosition + 1));
}
}
Now, it says that it should remove repeated space, so that there is only one space between words. However, what it actually does it to remove all spaces. This gives us a problem: Should my rewritten function do what it claims to do, or what it actually does? I decided to give you one of each.
Duplicating the result is quite straightforward:
public static String RemoveAllSpaces(String strParam)
{
return strParam.Replace(" ", "");
}
Writing a function to do what it is supposed to is a little more involved, but still simple:
public static String ToSingleSpace(String strParam)
{
var sb = new StringBuilder(strParam.Length);
var prevWS =true;
foreach(var c in strParam)
{
if (Char.IsWhiteSpace(c))
{
if (!prevWS)
sb.Append(' ');
prevWS = true;
}
else
{
sb.Append(c);
prevWS = false;
}
}
return sb.ToString();
}We create a string builder the size of our source string, which would be the maximum size our trimmed string could be, and we set prevWS to true. This way, it will remove all leading whitespace. Then we just step through the string, character by character, appending that character to our new string if it’s not a whitespace character, and appending a space for the first whitespace character found. Note that this reduces all forms of whitespace (tabs, newlines, spaces etc) to a single space. The original just worked on spaces.
Finally, we have a function to determine is a string is a palindrome.
public static bool IsPalindrome(String strParam)
{
int iLength, iHalfLen;
iLength = strParam.Length - 1;
iHalfLen = iLength / 2;
for (int iIndex = 0; iIndex <= iHalfLen; iIndex++)
{
if (strParam.Substring(iIndex, 1) != strParam.Substring(iLength - iIndex, 1))
{
return false;
}
}
return true;
}
Here the change is subtle (ignore the length calculations at the start, which are a trivial micro-optimization).
public static bool IsPalindrome(String strParam)
{
var iLength = strParam.Length;
var iHalfLen = iLength / 2;
iLength --;
for (int iIndex = 0; iIndex < iHalfLen; iIndex++)
{
if (strParam[iIndex] != strParam[iLength - iIndex])
{
return false;
}
}
return true;
}
In this function, instead of comparing one-character long strings, I compare individual characters. That change, by itself, cause a 4X speed improvement.
So, there you have it. The article had more, but the other were trivial, and I couldn’t make them any better. And in case you think I’m all talk here, each of those rewrites was benchmarked to run 2 to 5 times faster than the original.