Welcome to Honest Illusion Sign in | Join | Help

Some Better-Written Custom String Methods using C#

In my daily web-surfing, I often stumble upon snippets of C# code posted by people.  Usually, I can tweak  it a bit. Sometimes, I can tweak it a lot.  I usually post a quick comment to the site offering it.  Today, I came upon some code that was so bad --- which the author said was from his forthcoming book! --- more drastic measures must be taken.

First we have a function to put a string into “Title Case” (which the author refers to as “Proper Case”) – Having the first letter of each word capitalized.  Here’s the original:

public static String PCase(String strParam)
 {
     String strProper = strParam.Substring(0, 1).ToUpper();
     strParam = strParam.Substring(1).ToLower();
     String strPrev = "";
     for (int iIndex = 0; iIndex < strParam.Length; iIndex++)
     {         
         if (iIndex > 1)
         {
             strPrev = strParam.Substring(iIndex - 1, 1);
         }
         if (strPrev.Equals(" ") ||
         strPrev.Equals("\t") ||
         strPrev.Equals("\n") ||
         strPrev.Equals("."))
         {
             strProper += strParam.Substring(iIndex, 1).ToUpper();
         }
         else
         {
             strProper += strParam.Substring(iIndex, 1);
         }
     }
     return strProper;
 } 

 

What wrong here?  Lot’s of really bad string handling.  Remember, strings are immutable, so any action on one creates a new string.  So, “strParam.Substring(iIndex, 1)” creates a new string. “strParam.Substring(iIndex, 1).ToUpper()” create two new strings, and “strProper += strParam.Substring(iIndex, 1).ToUpper();” creates three new strings.  And, that’s within a loop.   And, since Substring is always used here to create a one-character string, it easier to just use a char --- except apparently, this book author doesn’t know how to.   Nor, doesn’t he apparently know about StringBuilder.  Then, we get to the algorithm itself, where he does such bizarre things as pointlessly treat the first character as a special case, in two different places.

Ok, now let’s see the revision:

    public static String PCase(String strParam)
{
         StringBuilder sb = new StringBuilder(strParam.Length);
         char cPrev = '.';  // start with something to force the next character to upper.
        foreach(char c in strParam)
         {
             if (cPrev == '.' || Char.IsWhiteSpace(cPrev))
                 sb.Append(Char.ToUpper(c));
             else
                 sb.Append(Char.ToLower(c));
             cPrev = c;
         }
         return sb.ToString();
     }  


First we start with a string builder, preallocated to the size of the string we are building.  The method doesn’t change the length of the string, so we know the length of the final string right from the start.

Next, since we are going to capitalize every letter after a period, and also the first letter, why not just pretend the mythical initial “last” character was a period?  Suddenly, the first letter is no longer a special case, and we still get what we want.

Then, we just loop through the letters, raising or lowering letter as we need. Note that it works on characters and not strings, and uses the build-in IsWhitespace method, instead of using a  hardcoded list of a subset of them.  A for() loop can in certain cases (however, not the one used in the original) be faster than a foreach(), but here I figured it was safe to sacrifice a tiny bit of speed for clearer code.

Next up, Reversing a String.  The Original:

    public static String Reverse(String strParam)
     {
         if (strParam.Length == 1)
         {
             return strParam;
         }
         else
         {
             return Reverse(strParam.Substring(1)) + strParam.Substring(0, 1);
         }
     }

Now, here the author might be able to earn a pass.  It’s possible that somewhere in his book he talks about recursive functions, and uses this as an example.  Then, if might be OK.  I mean, it is the only reason I can think of that someone might want a function which reverses a string – despite ubiquity of string reversing functions in libraries like this.  But, he presented it on his blog as a collection of string functions, so we’ll have to judge them on that basis. 

Again we have lots of string manipulation to accomplish something simple --- where there are already method built into the framework to handle such things:

public static String Reverse(String strParam)
{
     byte[] rev = Encoding.ASCII.GetBytes(strParam);
     Array.Reverse(rev);
     return Encoding.ASCII.GetString(rev);
}

Nexy, we have a simple function to count the occurrences of a substring. 

public static int CharCount(String strSource, String strToCount)
{
    int iCount = 0;
    int iPos = strSource.IndexOf(strToCount);
    while (iPos != -1)
    {
        iCount++;
        strSource = strSource.Substring(iPos + 1);
        iPos = strSource.IndexOf(strToCount);
    }
    return iCount;}

The revision isn’t much different but the subtle difference is important.  Instead of creating a new, shorter string to search, we tell it to just start looking after the last match.  Instead we now merely look at the string.

public static int CharCount(String strSource, String strToCount)
{
    int iCount = 0;
    int iPos = strSource.IndexOf(strToCount);
    while (iPos != -1)
    {
        iCount++;
        iPos = strSource.IndexOf(strToCount, iPos+1);
    }
    return iCount;
}
The next one has a special problem --- It doesn’t do what it claims to do!
// Trim the string to contain only a single whitepace between wordspublic static String ToSingleSpace(String strParam)
{
    int iPosition = strParam.IndexOf(" ");
    if (iPosition == -1)
    {
        return strParam;
    }
    else
    {
        return ToSingleSpace(strParam.Substring(0, iPosition) +        strParam.Substring(iPosition + 1));
    }
}

Now, it says that it should remove repeated space, so that there is only one space between words. However, what it actually does it to remove all spaces.  This gives us a problem: Should my rewritten function do what it claims to do, or what it actually does?  I decided to give you one of each.

Duplicating the result is quite straightforward:

public static String RemoveAllSpaces(String strParam)
{
    return strParam.Replace(" ", "");
}

Writing a function to do what it is supposed to is a little more involved, but still simple:

public static String ToSingleSpace(String strParam)
{
    var sb = new StringBuilder(strParam.Length);
    var prevWS =true;
    foreach(var c in strParam)
    {
        if (Char.IsWhiteSpace(c))
        {
            if (!prevWS)
                sb.Append(' ');
            prevWS = true;
        }
        else
        {
            sb.Append(c);
            prevWS = false;
        }
    }
    return sb.ToString();
}
We create a string builder the size of our source string, which would be the maximum size our trimmed string could be, and we set prevWS to true.  This way, it will remove all leading whitespace.  Then we just step through the string, character by character, appending that character to our new string if it’s not a whitespace character, and appending a space for the first whitespace character found.  Note that this reduces all forms of whitespace (tabs, newlines, spaces etc) to a single space.  The original just worked on spaces.

Finally, we have a function to determine is a string is a palindrome. 

public static bool IsPalindrome(String strParam)
{
    int iLength, iHalfLen;
    iLength = strParam.Length - 1;
    iHalfLen = iLength / 2;
    for (int iIndex = 0; iIndex <= iHalfLen; iIndex++)
    {
        if (strParam.Substring(iIndex, 1) != strParam.Substring(iLength - iIndex, 1))
        {
            return false;
        }
    }
    return true;
}

 

Here the change is subtle (ignore the length calculations at the start, which are a trivial micro-optimization).

public static bool IsPalindrome(String strParam)
{
    var iLength = strParam.Length;
    var iHalfLen = iLength / 2;
    iLength --;
    for (int iIndex = 0; iIndex < iHalfLen; iIndex++)
    {
        if (strParam[iIndex] != strParam[iLength - iIndex])
        {
            return false;
        }
    }
    return true;
}

 

In this function, instead of comparing one-character long strings, I compare individual characters.  That change, by itself, cause a 4X speed improvement.

So, there you have it.  The article had more, but the other were trivial, and I couldn’t make them any better.  And in case you think I’m all talk here, each of those rewrites was benchmarked to run 2 to 5 times faster than the original.

Share this post: Email it! | bookmark it! | digg it! | reddit!
Readability Stats: Word Count: 1281; Sentence Count: 110; Grade Level: 5.7, more info...
Published Tuesday, February 02, 2010 8:07 PM by James
Filed under: , , , , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Some Better-Written Custom String Methods using C#

I saw the original article so I know exactly what you mean :-)

The thing I'm most surprised about is the fact the article was linked to from at least two popular coding blogs that I read. I dread to think how many times it will have been copied and pasted by now

Friday, February 05, 2010 1:36 PM by Paul Blamire

# re: Some Better-Written Custom String Methods using C#

This one will not work if you have unicode characters in the string

public static String Reverse(String strParam)

{

    byte[] rev = Encoding.ASCII.GetBytes(strParam);

    Array.Reverse(rev);

    return Encoding.ASCII.GetString(rev);

}

Monday, February 15, 2010 5:13 PM by Maxim

Leave a Comment

(required) 
required 
(required)