Thursday, March 27, 2008

Divisibility Testing and Pattern Matching, Part 2

Last time, I left myself with the challenge of coming up with a regular expression to test for divisibility by 3, and left you with 3 relatively unsatisfying regular expressions to use for determining divisibility (by 2, 5 and 10). (The RE for 4 I'll count as being partly satisfying). First, why were those tests easy to code as a regular expression? If we are testing for divisibility by some number d = 2^a * 5^b, then we only have to consider the last max(a, b) digits. If our test only depends on the last few digits, then at worst we'll be able to enumerate the possibilities. The regular expression (00|04|08|12|...|92|96)$ will also correctly test for divisibility by 4.

For determining divisibility by 3, such an enumeration isn't available to us (we have to consider every digit since there is no power of 10 evenly divisible by 3). How do we know that it's possible to write a regular expression that matches on multiples of 3? Because we can write a finite state automaton that tests for divisibility by 3.


enum State
{
Accept,
Rem1,
Rem2
}

bool IsDivisibleBy3(char* num)
{
State state = State.Accept;
for (; *num; ++num)
{
switch (*num)
{
case '0': case '3': case '6': case '9':
break;
case '1': case '4': case '7':
if (state == State.Accept)
state = State.Rem1;
else if (state == State.Rem1)
state = State.Rem2;
else
state = State.Accept;
break;
case '2': case '5': case '8':
if (state == State.Accept)
state = State.Rem2;
else if (state == State.Rem1)
state = State.Accept;
else
state = State.Rem1;
break;
}
}
return s == State.Accept;
}

I wrote it in this style to make explicit that it represents a finite state automaton. It's essentially an obfuscated version of what you might actually write to test for divisibility by 3:


bool IsDivisibleBy3(char* num)
{
int remainder = 0;
for (; *num; ++num)
{
remainder = ((10 * remainder + *num - '0') % 3);
}
return remainder == 0;
}

This itself is just a slight modification of the code to read in a number and see if 3 divides it, with the important change that this one won't overflow on large input. Notice how close this code comes to implementing the rule that 3 divides a number if 3 divides the sum of its digits!

Since there exists an FSA that checks for divisibility by 3, there must be a regular expression that codes this FSA. To simplify the definition, we'll introduce some character classes:

Z = [0369], O = [147], T = [258]

These partition the digits into equivalence classes modulo 3 (and correspond to the different cases in the code - coincidence?). Then, a regular expression to test for divisibility by 3 is:

^(Z|(OZ*OZ*(TZ*O)*Z*O)|(TZ*TZ*(OZ*T)*Z*T)|(O(OZ*T)*Z*T)|(TZ*(TZ*O)Z*O))+$

If you're not into the whole brevity thing, that expands to:

^
([0369] | ([147][0369]*[147][0369]*([258][0369]*[147])*[0369]*[147]) | ([258][0369]*[258][0369]*([147][0369]*[258])*[0369]*[258]) | ([147]([147][0369]*[258])*[0369]*[258]) | ([258][0369]*([258][0369]*[147])[0369]*[147]))+$

So now we have regular expressions for testing for divisibility by 2, 3, 4, 5, and 10. We could make one for 8 (it would be similar to the one for 4). Testing for divisibility by 6 is equivalent to testing for both 2 and 3, both of which we can do (but which it would not be easy to write a standard RE for, though I believe there are extensions that implement an And operator). Alternately, we could modify the one above to make sure that the last digit is even (ick - that one is left for the reader). What would a regular expression that tests for divisibility by 7 look like?

No comments: