Requirements
Perl 5.005    CPAN/src or CPAN/ports
B             CPAN/modules/by-module/B
B::Fathom     CPAN/authors/id/K/KS/KSTAR
What is good coding practice? What is readable code? For some programmers, these questions lead to heated arguments. In the relatively young field of programming, it's natural that generally accepted rules of style and usage haven't yet emerged. Fortunately, our colleagues in the more mature field of philology (the study of language as used in literature) have set examples that we can follow. In this article, I'll describe Fathom, a module that grades the readability of Perl programs.
BACKGROUND
You may have experience with the grammar check feature of some word processors, which finds likely spelling, grammar, and usage errors in your documents. These tools can be quite useful, particularly for people who don't do much writing, or for people who haven't had much writing instruction.
As a programmer who works mostly in teams, often training new or junior programmers during time-critical projects, I want automated ways to encourage compliance with team coding standards. I know that such tools can (and do) work for business writing, but I've been unable to find a tool that would do the job for business coding. I did some investigation to see if any of the available grammar checkers could be adapted for use with program code.
EXISTING MEASURES
There are many well-known measures of readability in literature. You may have heard of Flesch-Kincaid, FOG, SMOG, Bormuth, or other readability or grade level tests; Microsoft Word uses three Flesch tools to evaluate style. These tests generally look at the average number of syllables per word and the average number of words per sentence, then report a single number which indicates either the grade level (1-12) or readability (usually 1-100) of the document. As an example, the Flesch-Kincaid formula for determining the grade level of a document is:
((average sentence length in words) * 0.39) + ((average syllables per word) * 11.8) - 15.59
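To make the formula concrete, here is a minimal sketch of it as a Perl subroutine. The name fk_grade and the sample averages (15 words per sentence, 1.5 syllables per word) are made up for illustration; only the three constants come from the formula above.

```perl
use strict;
use warnings;

# Flesch-Kincaid grade level, using the constants quoted above.
# fk_grade is a hypothetical name chosen for this sketch.
sub fk_grade {
    my ($words_per_sentence, $syllables_per_word) = @_;
    return $words_per_sentence * 0.39
         + $syllables_per_word * 11.8
         - 15.59;
}

# Made-up averages for a short piece of business writing.
my $grade = fk_grade(15, 1.5);

# 15 * 0.39 = 5.85, 1.5 * 11.8 = 17.70, and 5.85 + 17.70 - 15.59 = 7.96
printf "grade level: %.2f\n", $grade;
```

Both inputs are averages over the whole document, which is why the formula needs a way to count syllables and sentences in the first place.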
Unfortunately, these measures don't map well onto code; for example, how many syllables are there in ++ or { or $_? Is select easier to read than gethostbyname?
Once I realized that I wouldn't be able to simply run one of the prose-readability tests on my code and get meaningful results, I began to study the design and function of those tests. Then, I constructed a working model for code readability.
THE BASIC UNITS
After thinking about tools like Flesch-Kincaid, and discussing the idea of a readability tool with colleagues, I came up with a basic model for a code readability metric. I decided to measure the number of tokens per expression, the number of expressions per statement, and the number of statements per subroutine.
Some sample tokens:
++
$foo::bar
;
{
&&
any keyword
Sample expressions:
0.2
($a + 6)
wantarray ? @a : 0
Sample statements:
$a = $foo::bar * 7;
$x++;
THE TOOL
Given the basic model I've described, I wrote a module, Fathom.pm, that grades the readability of a Perl program. It rates on an open-ended scale, where 1 indicates a trivial program, 5 indicates "mature" code, 7 indicates very sophisticated code, and anything over 7 is Very Hairy. I established the following norms for mature code:
3 tokens per expression
6 expressions per statement
20 statements per subroutine
From this, I came up with the following formula:
code complexity = ((average expression length in tokens) * 0.55) + ((average statement length in expressions) * 0.28) + ((average subroutine length in statements) * 0.08)
If you plug the norms (3, 6, 20) into this formula, you'll see that ideal mature code actually gets a score of 4.93; that's because I rounded all the multipliers to 2 decimal digits, to keep things simple.
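The arithmetic can be checked with a few lines of Perl. This is only a sketch of the formula as printed, not Fathom's own source; the subroutine name complexity is an assumption made for the example.

```perl
use strict;
use warnings;

# Weighted sum from the formula above; the arguments are the three averages.
# "complexity" is a hypothetical name, not taken from Fathom.pm.
sub complexity {
    my ($tokens_per_expr, $exprs_per_stmt, $stmts_per_sub) = @_;
    return $tokens_per_expr * 0.55
         + $exprs_per_stmt  * 0.28
         + $stmts_per_sub   * 0.08;
}

# The norms for mature code: 3, 6, and 20.
my $norm_score = complexity(3, 6, 20);

# Terms: 3 * 0.55 = 1.65, 6 * 0.28 = 1.68, 20 * 0.08 = 1.60.
printf "score for the norms: %.2f\n", $norm_score;   # prints 4.93
```

Each of the three terms contributes roughly a third of the total, so no single measure dominates the grade when code sits at the norms.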
USAGE
First, you'll need to install Fathom. You can find it on CPAN, under authors/id/K/KS/KSTAR.
After installing Fathom, you can invoke it as follows:
perl -MO=Fathom filename
The output looks like this:
315 tokens
97 expressions
17 statements
1 subroutine
readability is 4.74 (easier than the norm)
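The sample report can be checked against the formula by hand or with a short script. The sketch below assumes that each average is simply the ratio of two adjacent totals (tokens to expressions, and so on); that assumption reproduces the score shown, but it is inferred from the output rather than read from Fathom's source.

```perl
use strict;
use warnings;

# The counts from the sample report, as one string.
my $report = "315 tokens 97 expressions 17 statements 1 subroutine";

# Pull out each count by its label; the trailing "s" is optional.
my %count;
for my $unit (qw(token expression statement subroutine)) {
    ($count{$unit}) = $report =~ /(\d+)\s+${unit}s?\b/
        or die "no $unit count found in report";
}

# Assumption: averages are ratios of the totals.
my $score = ($count{token}      / $count{expression}) * 0.55
          + ($count{expression} / $count{statement})  * 0.28
          + ($count{statement}  / $count{subroutine}) * 0.08;

printf "readability is %.2f\n", $score;   # prints "readability is 4.74"
```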
WHY THIS WOULD BE A HARD PROBLEM
Perl is an unusual programming language, in that it has dynamic syntax; that is, any programmer can write code that extends or changes the syntax of Perl. Consider the following code:
use Mystery;
if (mystery /1/ . . .
You can't parse this without knowing about Mystery.pm! Let's consider two different versions of Mystery.pm.
Version 1:
package Mystery;
sub main::mystery { return 5; }
1;
Version 2:
package Mystery;
sub main::mystery() { return 5; }
1;
These two packages differ only trivially. Both define one function, mystery(), which returns the value 5, but the second version gives it an empty prototype, which tells Perl that mystery() takes no arguments. In the first case, our program parses as:
if (mystery(the results of matching the regular expression /1/ ...
In the second case, it parses as:
if (mystery() divided by 1 divided by ...
By the time you've written a program which can successfully parse every possible case, you've rewritten Perl!
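The effect is easy to reproduce in a single file. In this sketch, no_proto and with_proto are hypothetical stand-ins for the two versions of mystery(); they return different values from the article's versions so that the two parses give visibly different results.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two versions of mystery().
sub no_proto     { return scalar @_ }   # no prototype: a list operator
sub with_proto() { return 10 }          # empty prototype: takes no arguments

$_ = "line 1";

# After a list operator Perl expects a term, so /1/ is a regular
# expression matched against $_, and its result is passed as an argument.
my $argc = no_proto /1/;

# After a sub with an empty prototype Perl expects an operator, so the
# slashes are division: with_proto() / 5 / 2.
my $quotient = with_proto /5/ 2;

print "no_proto received $argc argument(s)\n";
print "with_proto /5/ 2 = $quotient\n";
```

The slash characters are the same in both calls; only the declaration of the subroutine, which may live in another file entirely, decides what they mean.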
THE PERL COMPILER TO THE RESCUE
Fortunately, Malcolm Beattie's Perl compiler gives us access to the pertinent guts of Perl. Without the compiler, this project would have been prohibitively difficult.
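To give a feel for the kind of access the compiler's B module provides, here is a rough sketch that walks the op tree Perl has already built for a subroutine and tallies the ops by name. It is not how Fathom counts tokens, expressions, or statements; the subroutine sample and the method name tally_op are invented for the example, while svref_2object, walkoptree, ROOT, and name are part of B.

```perl
use strict;
use warnings;
use B qw(svref_2object walkoptree);

# A hypothetical subroutine to inspect.
sub sample {
    my $x = shift;
    return $x * 2 + 1;
}

# walkoptree() calls the named method on every op it visits, so the
# callback is installed in the B::OP base class.
my %op_count;
sub B::OP::tally_op {
    my $op = shift;
    $op_count{ $op->name }++;
}

# Start from the root op of the compiled subroutine.
walkoptree(svref_2object(\&sample)->ROOT, 'tally_op');

printf "%-12s %d\n", $_, $op_count{$_} for sort keys %op_count;
```

Because the tree comes from Perl's own parser, questions like the mystery() ambiguity are already settled by the time a backend sees the code.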
EXAMPLES
Program                 tokens  expressions  statements  subroutines  readability
Benchmark.pm                27            7           5            1   2.91 (very readable)
Apache::AdBlocker           47           13           6            1   3.08 (readable)
CGI/Carp.pm                 66           22          11            1   3.09 (readable)
perl5.005/eg/travesty      259           96          33            1   4.94 (easier than the norm)
s2p                       2588          826         384           11   5.12 (mature)
CGI.pm                     521          180          54            1   6.85 (complex)
DBI.pm                     835          252          58            1   7.68 (very difficult)
diagnostics.pm             767          272          96            1  10.02 (obfuscated)
FUTURE DIRECTIONS
I intend to continue to refine Fathom.pm in several ways: by tweaking its basic formula to produce more accurate grades, by considering the placement and length of comments and PODs, by having it identify problematic code sections, and by having it make specific suggestions for improvement.
There are also some problems which I hope to address in the near future: Fathom doesn't see code which executes at compile time, such as code in BEGIN blocks or use statements; and sometimes it counts implicit tokens, such as $_ in a foreach statement. These limitations probably won't make much statistical difference in a medium-to-large program, but they could give wildly strange grades to one-liners and other short hacks.
Fathom also opens the door to a whole suite of companion tools: a program which checks variable names against a site-wide naming policy; a tool, much like C's indent, to normalize the indentation of Perl code; and likely several more tools, based on experience and feedback. Some of these are already being developed by others.
CONCLUSIONS
Perl's extraordinary architecture makes it possible to produce very powerful companion tools without having to re-invent the wheel. Fathom was developed with a relatively small amount of original code; it simply hooks into the pre-existing Perl internal data structures to do its job. Similarly, the Perl debugger uses built-in features of Perl, plus a minimal amount of black magic, to provide a full-featured debugging environment for your Perl programs.
In most other languages, writing a tool like Fathom would force you to start from scratch, since some of the best tools for other languages (e.g., gdb, indent, and cxref for C) are based on code which is completely independent from the compilers or interpreters which they complement. In the case of languages which are still undergoing refinement (such as C++), maintenance of these tools can be a nightmare. However, Fathom will continue to work even if Perl's syntax changes, because it's hooked into the Perl compiler itself!
I hope that you're so intrigued by Fathom that you'll want to refine it, rewrite it, or develop new tools in a similar vein. Try this at home, kids!
ACKNOWLEDGMENTS
Fathom would not have been possible without Malcolm Beattie's outstanding work on the Perl compiler. Stephen McCamant's B::Deparse module was tremendously helpful in demonstrating how to write a compiler backend. And, of course, I couldn't have done any of this without such a rich language as Perl.
Kurt Starsinic has been programming in Perl since 1994, when he first downloaded the source code to the Lycos search engine. He works at the Institute for Scientific Information in Philadelphia, where he develops (among other things) large CGI applications in Perl. Drop him a line at kstar@isinet.com.