Thursday, March 27, 2008

Divisibility Testing and Pattern Matching, Part 2

Last time, I left myself with the challenge of coming up with a regular expression to test for divisibility by 3, and left you with 3 relatively unsatisfying regular expressions to use for determining divisibility (by 2, 5 and 10). (The RE for 4 I'll count as being partly satisfying). First, why were those tests easy to code as a regular expression? If we are testing for divisibility by some number d = 2^a * 5^b, then we only have to consider the last max(a, b) digits. If our test only depends on the last few digits, then at worst we'll be able to enumerate the possibilities. The regular expression (00|04|08|12|...|92|96)$ will also correctly test for divisibility by 4.

For determining divisibility by 3, such an enumeration isn't available to us (we have to consider every digit since there is no power of 10 evenly divisible by 3). How do we know that it's possible to write a regular expression that matches on multiples of 3? Because we can write a finite state automaton that tests for divisibility by 3.


enum State
{
Accept,
Rem1,
Rem2
}

bool IsDivisibleBy3(char* num)
{
State state = State.Accept;
for (; *num; ++num)
{
switch (*num)
{
case '0': case '3': case '6': case '9':
break;
case '1': case '4': case '7':
if (state == State.Accept)
state = State.Rem1;
else if (state == State.Rem1)
state = State.Rem2;
else
state = State.Accept;
break;
case '2': case '5': case '8':
if (state == State.Accept)
state = State.Rem2;
else if (state == State.Rem1)
state = State.Accept;
else
state = State.Rem1;
break;
}
}
return s == State.Accept;
}

I wrote it in this style to make explicit that it represents a finite state automaton. It's essentially an obfuscated version of what you might actually write to test for divisibility by 3:


bool IsDivisibleBy3(char* num)
{
int remainder = 0;
for (; *num; ++num)
{
remainder = ((10 * remainder + *num - '0') % 3);
}
return remainder == 0;
}

This itself is just a slight modification of the code to read in a number and see if 3 divides it, with the important change that this one won't overflow on large input. Notice how close this code comes to implementing the rule that 3 divides a number if 3 divides the sum of its digits!

Since there exists an FSA that checks for divisibility by 3, there must be a regular expression that codes this FSA. To simplify the definition, we'll introduce some character classes:

Z = [0369], O = [147], T = [258]

These partition the digits into equivalence classes modulo 3 (and correspond to the different cases in the code - coincidence?). Then, a regular expression to test for divisibility by 3 is:

^(Z|(OZ*OZ*(TZ*O)*Z*O)|(TZ*TZ*(OZ*T)*Z*T)|(O(OZ*T)*Z*T)|(TZ*(TZ*O)Z*O))+$

If you're not into the whole brevity thing, that expands to:

^
([0369] | ([147][0369]*[147][0369]*([258][0369]*[147])*[0369]*[147]) | ([258][0369]*[258][0369]*([147][0369]*[258])*[0369]*[258]) | ([147]([147][0369]*[258])*[0369]*[258]) | ([258][0369]*([258][0369]*[147])[0369]*[147]))+$

So now we have regular expressions for testing for divisibility by 2, 3, 4, 5, and 10. We could make one for 8 (it would be similar to the one for 4). Testing for divisibility by 6 is equivalent to testing for both 2 and 3, both of which we can do (but which it would not be easy to write a standard RE for, though I believe there are extensions that implement an And operator). Alternately, we could modify the one above to make sure that the last digit is even (ick - that one is left for the reader). What would a regular expression that tests for divisibility by 7 look like?

Sunday, March 16, 2008

Divisibility Testing and Pattern Matching

How do you tell if a number is divisible by 10? Everybody knows you just look to see if the last digit is 0 or not. There are similar rules for checking divisibility by 5 (ends in 0 or 5) and 2 (ends in an even number). What I think is interesting is that these rules are not mathematical in the usual sense. Rather, they are based on simple textual pattern matching. That is, they don't rely on manipulating values, they just require inspecting the textual representation of the value in question.

If you are going to do pattern matching on text, the natural tool is regular expressions. Using the usual regular expression syntax, the above rules can be expressed as 0$, [05]$, and [02468]$, respectively. Those are the easy ones. Can we code more complicated divisibility tests as regular expressions? How about checking for divisibility by 4?

The mathematical shortcut for checking if 4 divides a number is to see if it divides just the last 2 digits. This works because if n = 100 a + b, 4 | n iff 4 | 100 a + b iff 4 | b. If we choose a and b so that b < 100, b will be exactly the last 2 digits of n. So we know we can get away with looking at just the last 2 digits; is there a regular expression we can use? Yes: ([02468][048])|([13579][26])$ works! Unsurprisingly, our regular expression considers only the last 2 digits.

So far, I've been considering the easy cases - testing for divisibility by combinations of 2 and 5 (which are easy since we're using the base 10 representation). You may know the shortcut for testing for divisibility by 3: add up the digits of the number, and if this sum is itself divisible by 3 then so was the original number. Is it possible to code this test as a regular expression?

To be continued...

[Erratum: the regular expression above for divisibility by 4 requires 2 digits, so it won't match 0, 4 and 8]

Saturday, March 15, 2008

C# Extension Methods - Syntactic Sugar That Won't Rot Your Teeth

C# 3.5 introduced extension methods to the language. These are non-member methods that are callable with member method syntax. For example, if we have a class:

class Foo { /**/ }


we can declare an extension method for it like this:
  static class Bar
{
public static void Baz(this Foo f) { /**/ }
}
(The this keyword in the declaration is what marks it as an extension method.) The syntactic sugar is that you are allowed to call Baz on an instance of Foo like this:

new Foo().Baz();

The compiler rewrites the above call to the more traditional:

Bar.Baz(new Foo());

Now that the mechanics are out of the way, let's look at what they are good for. The main benefit of extension methods, in my mind, is that they remove the syntactic penalty of non-member functions. Take Scott Myers's advice [ddj] for C++ class design: "Prefer non-member non-friend functions to member functions." There is little argument that it's a good idea in principle, but there are certainly mixed feelings about the syntactic trade-off [Google seems to bork that thread - the discussion tree it renders doesn't make much sense]. As an example of where you might do this, consider how String.StartsWith() might have been implemented:

public class MyString
{
private Char[] m_chars;
public Char this[int i]
{
get { return m_chars[i]; }
}

public MyString(String s)
{
m_chars = s.ToCharArray();
}

public int Length { get { return m_chars.Length; } }
}

public static class MyStringExt
{
public static bool StartsWith(this MyString str, MyString find)
{
if (str.Length < find.Length) return false;

for (int i = 0; i < find.Length; ++i)
{

if (str[i] != find[i])
return false;
}
return true;
}
}

The same functionality, but with improved encapsulation! There is a downside, of course. Object methods are much more "discoverable" than static functions that happen to accept particular parameters. Member functions have to be declared along with the rest of the class, so there is just one place to look for them [partial classes complicate this a bit]. Microsoft's answer to this is improved Intellisense; extension methods are included alongside regular methods. As long as you're using their IDE, I think it makes up for it.

Sunday, March 2, 2008

The Universe aligns to tell Joel what he wants to hear

Joel Spolsky doesn't think comments belong in blogs. "You don't have a right to post your thoughts at the bottom of someone else's thoughts. That's not freedom of expression, that's an infringement on their freedom of expression." To back this up, he cites some offensive comments that were written in response to a post on a real estate blog. There are so many things wrong with Joel's position that I'm not sure where to begin.

First of all, if the blog allows comments, then yes, you do have the right to post your thoughts at the bottom. If the writer didn't want feedback, they would turn that feature off. There is an implied social contract that commenters will be courteous and civil, which was obviously violated in Joel's real estate example. No big deal, that's what the delete key is for. Joel obviously feels that forum moderation is acceptable in other situations, and goes so far to liken it to picking up trash in the park. This solution may not be perfect, but that Joel doesn't even mention is it interesting.

So why doesn't he mention it? My guess is because of how he thinks about blogging. He doesn't see it as a dialog in the traditional sense, he sees it more like pamphleteering. You write your essay, distribute it, and anyone who cares to respond can write their own pamphlet. Indexing services (bloglines, technorati, etc) make it so that you can find others essays that claim to respond to yours. Perfect.

Perfect, that is, if that's the model you wish to use. A more effective model (in my opinion) is the "presentation with question and answer" model that most blogs use. Short questions that ask about an unclear point and the author's response make the post better when they appear with the original post. Discussion is more focused when it takes place in a single thread (another thing Joel knows); the best place for that is alongside the original.

In the end, I think it's partly a matter of scale. People like Joel can get away with the pamphleteering style of blogging. There are n+1 people responding to his posts, and he certainly doesn't have time to even read them all, so why should he host them. For most of the long tail, there are much fewer readers, and just a handful of replies. For them, accepting (civil!) comments and posting them at the bottom is the way to go.