The Perl Journal

#11

Fall 1998

vol 3

num 3

Kurt Starsinic (1998) Perl Style. The Perl Journal, vol 3(3), issue #11, Fall 1998.

How readable is your code?

Kurt Starsinic

Requirements

Perl 5.005    CPAN/src or CPAN/ports
B             CPAN/modules/by-module/B
B::Fathom     CPAN/authors/id/K/KS/KSTAR

What is good coding practice? What is readable code? For some programmers, these questions lead to heated arguments. In the relatively young field of programming, it's natural that generally accepted rules of style and usage haven't yet emerged. Fortunately, our colleagues in the more mature field of philology (the study of language as used in literature) have set examples that we can follow. In this article, I'll describe Fathom, a module that grades the readability of Perl programs.

BACKGROUND

You may have experience with the grammar check feature of some word processors, which finds likely spelling, grammar, and usage errors in your documents. These tools can be quite useful, particularly for people who don't do much writing, or for people who haven't had much writing instruction.

As a programmer who works mostly in teams, often training new or junior programmers during time-critical projects, I want automated ways to encourage compliance with team coding standards. I know that such tools can (and do) work for business writing, but I've been unable to find a tool that would do the job for business coding. I did some investigation to see if any of the available grammar checkers could be adapted for use with program code.

EXISTING MEASURES

There are many well-known measures of readability in literature. You may have heard of Flesch-Kincaid, FOG, SMOG, Bormuth, or other readability or grade level tests; Microsoft Word uses three Flesch tools to evaluate style. These tests generally look at the average number of syllables per word and the average number of words per sentence, then report a single number which indicates either the grade level (1-12) or readability (usually 1-100) of the document. As an example, the Flesch-Kincaid formula for determining the grade level of a document is:

((average sentence length in words) * 0.39)
+ ((average syllables per word) * 11.8)
- 15.59
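
As a minimal sketch, the Flesch-Kincaid formula above can be written directly in Perl. The function name and the sample counts (100 words, 5 sentences, 150 syllables) are assumptions made up for illustration; counting syllables in real prose is a separate problem that this sketch does not attempt.

```perl
use strict;
use warnings;

# Flesch-Kincaid grade level computed from raw counts.
# flesch_kincaid_grade is a hypothetical helper name.
sub flesch_kincaid_grade {
    my ($words, $sentences, $syllables) = @_;
    die "need at least one word and one sentence\n"
        unless $words > 0 && $sentences > 0;
    return 0.39 * ($words / $sentences)      # average sentence length in words
         + 11.8 * ($syllables / $words)      # average syllables per word
         - 15.59;
}

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
printf "grade level: %.2f\n", flesch_kincaid_grade(100, 5, 150);   # grade level: 9.91
```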

Unfortunately, these measures don't map well onto code; for example, how many syllables are there in ++ or { or $_? Is select easier to read than gethostbyname?

Once I realized that I wouldn't be able to simply run one of the prose-readability tests on my code and get meaningful results, I began to study the design and function of those tests. Then, I constructed a working model for code readability.

THE BASIC UNITS

After thinking about tools like Flesch-Kincaid, and discussing the idea of a readability tool with colleagues, I came up with a basic model for a code readability metric. I decided to measure the number of tokens per expression, the number of expressions per statement, and the number of statements per subroutine.

Some sample tokens:

++
$foo::bar
;
{
&&
any keyword

Sample expressions:

0.2
($a + 6)
wantarray ? @a : 0

Sample statements:

$a = $foo::bar * 7;
$x++;

THE TOOL

Given the basic model I've described, I wrote a module, Fathom.pm, that grades the readability of a Perl program. It rates on an open-ended scale, where 1 indicates a trivial program, 5 indicates "mature" code, 7 indicates very sophisticated code, and anything over 7 is Very Hairy. I established the following norms for mature code:

3 tokens per expression
6 expressions per statement
20 statements per subroutine

From this, I came up with the following formula:

code complexity =
  ((average expression length in tokens) * 0.55)
+ ((average statement length in expressions) * 0.28)
+ ((average subroutine length in statements) * 0.08)

If you plug the norms (3, 6, 20) into this formula, you'll see that ideal mature code actually gets a score of 4.93; that's because I rounded all the multipliers to 2 decimal digits, to keep things simple.
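
The arithmetic is easy to check with a few lines of Perl. This is only a sketch of the published formula applied to raw counts, not Fathom's own code; the helper name fathom_score is an assumption. The first call uses counts chosen to give exactly the norms (3, 6, 20); the second uses the counts from the sample output in the USAGE section.

```perl
use strict;
use warnings;

# Apply the article's formula to raw counts.
# fathom_score is a hypothetical helper, not part of B::Fathom.
sub fathom_score {
    my ($tokens, $expressions, $statements, $subroutines) = @_;
    die "all counts must be positive\n"
        if grep { !defined $_ || $_ <= 0 }
           $tokens, $expressions, $statements, $subroutines;
    return 0.55 * ($tokens      / $expressions)    # tokens per expression
         + 0.28 * ($expressions / $statements)     # expressions per statement
         + 0.08 * ($statements  / $subroutines);   # statements per subroutine
}

# 360/120 = 3, 120/20 = 6, 20/1 = 20: the norms for mature code.
printf "norms:  %.2f\n", fathom_score(360, 120, 20, 1);   # norms:  4.93

# Counts taken from the sample output in the USAGE section.
printf "sample: %.2f\n", fathom_score(315, 97, 17, 1);    # sample: 4.74
```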

USAGE

First, you'll need to install Fathom. You can find it on CPAN, under authors/id/K/KS/KSTAR.

After installing Fathom, you can invoke it as follows:

perl -MO=Fathom filename

The output looks like this:

315 tokens
97 expressions
17 statements
1 subroutine
readability is 4.74 (easier than the norm)

WHY THIS WOULD BE A HARD PROBLEM

Perl is an unusual programming language, in that it has dynamic syntax; that is, any programmer can write code that extends or changes the syntax of Perl. Consider the following code:

use Mystery;
if (mystery /1/ ...

You can't parse this without knowing about Mystery.pm! Let's consider two different versions of Mystery.pm.

Version 1:

package Mystery;
sub main::mystery { return 5; }
1;

Version 2:

package Mystery;
sub main::mystery() { return 5; }
1;

These two packages are almost trivially different. They both define one function, named mystery(), which returns the value 5. However, the second version uses a prototype. In the first case, our program parses as:

if (mystery(the results of matching the regular expression /1/ ...

In the second case, it parses as:

if (mystery() divided by 1 divided by ...

By the time you've written a program which can successfully parse every possible case, you've rewritten Perl!
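
The effect can be reproduced in a single file. In this sketch the two subroutines are hypothetical stand-ins for the two versions of mystery(); they are defined side by side under different names so that both parses can be seen at once, and the return values are chosen only to make the results easy to tell apart.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two versions of mystery().
sub list_op      { return "called with (@_)" }   # no prototype
sub no_args_op() { return 6 }                    # empty prototype

$_ = "a1b";   # the bare /1/ below matches against $_

# No prototype: Perl expects an argument next, so /1/ is a pattern
# match, and the result of the match is passed to list_op.
my $as_match = list_op /1/;

# Empty prototype: Perl expects an operator next, so each / is a
# division: 6 / 2 / 3.
my $as_division = no_args_op / 2 / 3;

print "$as_match\n";       # called with (1)
print "$as_division\n";    # 1
```

The only difference between the two call sites is the prototype on the subroutine, which is information a standalone parser would have to obtain by loading and compiling the module.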

THE PERL COMPILER TO THE RESCUE

Fortunately, Malcolm Beattie's Perl compiler gives us access to the pertinent guts of Perl. Without the compiler, this project would have been prohibitively difficult.
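
To give a flavor of what that access looks like, here is a small sketch that uses the B module to walk the op tree of a compiled subroutine and count the ops by name. It is an illustration of the general approach, not Fathom's actual implementation; count_ops and walk_op are hypothetical helper names, and the exact set of ops reported depends on the version of Perl.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Recursively visit an op and its children, tallying op names.
# walk_op and count_ops are hypothetical helpers for this sketch.
sub walk_op {
    my ($op, $counts) = @_;
    return unless $$op;                 # a null B object means "no op here"
    $counts->{ $op->name }++;
    if ($op->flags & OPf_KIDS) {        # only ops with this flag have children
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            walk_op($kid, $counts);
        }
    }
    return;
}

# Count the ops Perl compiled for a code reference.
sub count_ops {
    my ($code_ref) = @_;
    my %counts;
    walk_op(svref_2object($code_ref)->ROOT, \%counts);
    return \%counts;
}

my $counts = count_ops(sub { my $x = $_[0] + 1; return $x });
printf "%-10s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Because the counts come from Perl's own compiled representation, questions like the mystery() ambiguity have already been settled by the time the walker sees the code.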

EXAMPLES

Benchmark.pm
27 tokens
7 expressions
5 statements
1 subroutine
readability is 2.91 (very readable)

Apache::AdBlocker
47 tokens
13 expressions
6 statements
1 subroutine
readability is 3.08 (readable)

CGI/Carp.pm
66 tokens
22 expressions
11 statements
1 subroutine
readability is 3.09 (readable)

perl5.005/eg/travesty
259 tokens
96 expressions
33 statements
1 subroutine
readability is 4.94 (easier than the norm)

s2p
2588 tokens
826 expressions
384 statements
11 subroutines
readability is 5.12 (mature)

CGI.pm
521 tokens
180 expressions
54 statements
1 subroutine
readability is 6.85 (complex)

DBI.pm
835 tokens
252 expressions
58 statements
1 subroutine
readability is 7.68 (very difficult)

diagnostics.pm
767 tokens
272 expressions
96 statements
1 subroutine
readability is 10.02 (obfuscated)

FUTURE DIRECTIONS

I intend to continue to refine Fathom.pm in several ways: by tweaking its basic formula to produce more accurate grades, by considering the placement and length of comments and PODs, by having it identify problematic code sections, and by having it make specific suggestions for improvement.

There are also some problems which I hope to address in the near future: Fathom doesn't see code which executes at compile time, such as code in BEGIN blocks or use statements; and sometimes it counts implicit tokens, such as $_ in a foreach statement. These limitations probably won't make much statistical difference in a medium-to-large program, but they could give wildly strange grades to one-liners and other short hacks.

Fathom also opens the door to a whole suite of companion tools: a program which checks variable names against a site-wide naming policy; a tool, much like C's indent, to normalize the indentation of Perl code; and likely several more tools, based on experience and feedback. Some of these are already being developed by others.

CONCLUSIONS

Perl's extraordinary architecture makes it possible to produce very powerful companion tools without having to re-invent the wheel. Fathom was developed with a relatively small amount of original code—it simply hooks into the pre-existing Perl internal data structures to do its job. Similarly, the Perl debugger uses built-in features of Perl, plus a minimal amount of black magic, to provide a full-featured debugging environment for your Perl programs.

In most other languages, writing a tool like Fathom would force you to start from scratch, since some of the best tools for other languages (e.g., gdb, indent, and cxref for C) are based on code which is completely independent from the compilers or interpreters which they complement. In the case of languages which are still undergoing refinement (such as C++), maintenance of these tools can be a nightmare. However, Fathom will continue to work even if Perl's syntax changes, because it's hooked into the Perl compiler itself!

I hope that you're so intrigued by Fathom that you'll want to refine it, rewrite it, or develop new tools in a similar vein. Try this at home, kids!

ACKNOWLEDGMENTS

Fathom would not have been possible without Malcolm Beattie's outstanding work on the Perl compiler. Stephen McCamant's B::Deparse module was tremendously helpful in demonstrating how to write a compiler backend. And, of course, I couldn't have done any of this without such a rich language as Perl.

Kurt Starsinic has been programming in Perl since 1994, when he first downloaded the source code to the Lycos search engine. He works at the Institute for Scientific Information in Philadelphia, where he develops (among other things) large CGI applications in Perl. Drop him a line at kstar@isinet.com.