
The Perl Journal

Volumes 1–6 (1996–2002)


The Perl Journal
#23
Spring 2002
vol 6
num 1
Damian Conway (2002) ...And Now for Something Completely Similar. The Perl Journal, vol 6(1), issue #23, Spring 2002.

...And Now for Something Completely Similar

Damian Conway


Resources
Perl6::Variables: CPAN
Perl 6 discussions: mailto:perl6-language-subscribe@perl.org
Perl 6 archives: https://dev.perl.org/
Source code: https://yetanother.org/damian/TPJ/ANFSCS

The sky isn't falling!

Though, if you read the various mailing lists and Web sites devoted to discussing Perl 6, you could certainly be forgiven for assuming otherwise. Especially after www.perl.com publishes each new Apocalypse and its accompanying Exegesis, when the ether buzzes with dire predictions of the imminent demise of Perl, which will surely be crushed and buried under the massive changes that Larry is proposing. Run for your lives!

Think of It as Evolution in Action

But the real problem doesn't stem from those changes, nor even from people's exaggerated fear of them. The real problem stems from several billion years of evolution, which have inconveniently wired our brains entirely the wrong way. The trouble is, we're adapted to detect, highlight, and focus on anything new, different, surprising, or unexpected and to ignore the old, the commonplace, and the familiar. That made perfect sense back in the primordial jungle, where "new", "different", "surprising", and "unexpected" were frequently synonyms for "deadly". But it works against us in modern life, where focusing on that new, different, and surprising billboard will eventually get you run over by that old, commonplace, and familiar yellow taxi.

The same dangerous misfocus occurs every time Larry releases another Perl 6 design document. Our brains instinctively skip over the majority of familiar, unchanged Perl landmarks and, instead, zero in on the comparatively few features of the language that are actually changing.

And, frankly, the Exegeses only make things worse. They're entirely about demonstrating the new features of Perl 6, in short, carefully crafted examples. So they're unnaturally full of new, different, surprising, and unexpected code. It's no wonder they raise so many hackers' hackles.

So let's calmly back away from all those snarling differences, and turn our attention instead to the many aspects of Perl 6 that are staying the same (or, at least, reassuringly very similar).

To do that, we're going to make some of the Perl 6 design team "eat their own dog-food". That is, we'll take a few real-world Perl 5 programs that Nathan Torkington, Hugo van der Sanden, Dan Sugalski, Simon Cozens, and I use every day, port them to Perl 6, and see how much they actually change.

Mail Domination

Nathan Torkington deals with a lot of email. A lot of email. So he wrote a remarkably compact tool (called mailmap) to help him manage his extensive mail archives under Windows:

 1  use warnings;
 2
 3  $code = shift
 4     or die "usage: $0 code [file ...]\n";
 5  $process =
 6     eval "sub { local \$_=shift; $code }";
 7  die if $@;
 8
 9  @ARGV = map { glob } @ARGV;
10
11  while (<>) {
12    if (/^From /) {
13      $process->($msg) if $msg;
14      $msg = '';
15    }
16    $msg .= $_;
17  }
18  $process->($msg) if $msg;
The program grabs the first command-line argument from @ARGV (lines 3 and 4). It then uses that argument as the source code for an anonymous subroutine, by interpolating it into a string eval (lines 5 and 6). The resulting subroutine is set up to assign its first argument to a localized $_, so the command-line code that's interpolated can then access that argument implicitly. If the eval fails to create the subroutine for some reason (most likely because the command-line code was invalid), the $@ variable will be automatically assigned an error message and the program will terminate (line 7).

Otherwise, the program takes the remaining command-line arguments in @ARGV, expands each of them to a list of files via the built-in glob command, and assigns the resulting list back to @ARGV (line 9). This list of files then specifies the sources from which the "diamond" operator (line 11) will read successive lines. The loop first looks for the start of a new mail message within the current file (line 12). If it finds one, it processes any previous message using the subroutine referred to by $process (line 13). It then clears the previous message (line 14). Whether or not it found a message boundary, it adds the current line to the message text it's accumulating (line 16). After all the lines from all the files have been read in, the loop terminates and, if there is a message still to be processed, it too is passed to the $process subroutine (line 18).
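The two halves of mailmap described above, building a subroutine from a code string and splitting input into messages at each "From " line, can be tried without any mail files. The following is a minimal Perl 5 sketch, not Nat's program: make_processor, map_messages, and the two-message @mbox are names and data invented here, and the loop collects return values instead of reading from <>.

```perl
use strict;
use warnings;

# Build the per-message subroutine from a code string, as mailmap does.
sub make_processor {
    my ($code) = @_;
    my $process = eval "sub { local \$_ = shift; $code }";
    die $@ if $@;
    return $process;
}

# Split lines into messages at each "From " line and hand every complete
# message to $process; returns whatever $process returned for each one.
sub map_messages {
    my ($process, @lines) = @_;
    my (@results, $msg);
    for my $line (@lines) {
        if ($line =~ /^From /) {
            push @results, $process->($msg) if $msg;
            $msg = '';
        }
        $msg .= $line;
    }
    push @results, $process->($msg) if $msg;   # the final message
    return @results;
}

# Hypothetical two-message mailbox, invented for illustration.
my @mbox = ("From nat\n", "sheep\n", "From larry\n", "camels\n");
my @hits = map_messages(make_processor('/sheep/ ? "match" : "no match"'), @mbox);
print "@hits\n";
```

The trailing call after the loop matters: without it, the last message in the input would never be processed, because no further "From " line arrives to flush it.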

So, for example, Nat can extract the text of any messages from Larry from any file in his E:mail directory like so:

> mailmap 'print if /^From:.*larry/' E:mail\*
or check whether any waiting message mentions Perl 6:

> mailmap '/Perl\s+6/ and die "Yes"' mail\pending
or just print the subject lines of any messages on his X: drive that mention New Zealand fauna:

> mailmap '/sheep/ && print /^Subject:(.*)/' X:*
That last one is worth looking at more closely. The code in the single quotes becomes a subroutine that receives each complete mail message in the $_ variable. The subroutine (implicitly) looks for the pattern /sheep/ anywhere in the message. If it's found, the right side of the && is evaluated. This tries another implicit match against $_, this time looking for the pattern /^Subject:(.*)/. Matching a regex in a list context (such as the argument list of the print) causes the match to return the substrings corresponding to $1, $2, $3, etc. In this case, a successful match returns the contents of the subject line, which the print then prints.
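The list-context behaviour is easy to check in isolation. This is a small Perl 5 sketch under stated assumptions: the message text and the subject_of helper are invented here. Because this sample message begins with a "From " line, the pattern carries an /m modifier so that ^ can match at the start of the Subject line; that modifier is an addition in this sketch, not part of the command line shown above.

```perl
use strict;
use warnings;

# Hypothetical message, invented for illustration.
my $msg = "From nat\nSubject: lost sheep\n\nHas anyone seen my sheep?\n";

# A match in list context returns its captures, so the parentheses
# around $subject matter: they put the match in list context.
sub subject_of {
    my ($text) = @_;
    my ($subject) = $text =~ /^Subject:(.*)/m;   # /m is this sketch's addition
    return $subject;
}

print subject_of($msg), "\n" if $msg =~ /sheep/;
```

If the match fails, the list is empty and $subject stays undefined, which is why the one-liner prints nothing for messages without a subject line.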

It's all devilishly clever (as you'd expect from Nat). But what would it look like written in Perl 6? It would look like this:

 1  use warnings;
 2
 3  $code = shift
 4     or die "usage: $0 code [file ...]\n";
 5  $process =
 6     eval "sub { my \$_ = shift; $code }";
 7  die if $!;
 8
 9  @ARGS = map { glob } @ARGS;
10
11  while (<>) {
12    if (/^From /) {
13      $process($msg) if $msg;
14      $msg = '';
15    }
16    $msg _= $_;
17  }
18  $process($msg) if $msg;
As you can see, there are a few cosmetic differences, but overall it's very similar to its earlier Perl 5 incarnation. And, as we'll see, those cosmetic changes are generally for the better.

The first change is in the eval string (line 6), where local becomes my. In Perl 6, $_ is no longer a global variable, but a lexical. So instead of localizing it to a subroutine (which is hard to explain to beginners, and still confusing for many non-beginners), we simply declare a new lexical of the same name. As a bonus, there's now no confusion about the scope of the temporary alteration to $_. Like any other lexical, it's only accessible within the static scope of the eval'd subroutine.

The next change occurs on the very next line. Instead of checking $@ for an error message, the Perl 6 version checks $!. That's because in Perl 6 all four error variables ($?, $!, $@, and $^E) are -- mercifully -- consolidated into one ($!). This change is intended to make it far easier to write error-checking code without having to puzzle out where the particular error you're checking for is supposed to appear.

The third change is to the @ARGV variable (line 9). It's now called @ARGS, which is easier to remember, easier to read, easier to pronounce, and much easier to explain to Perl novices than @ARGV was. ("Well, you see, in Latin they're argumenti, which the Romans, of course, had to write as ARGVMENTI, so the natural abbreviation was @ARGV.")

The fourth difference is in the way the eval'd subroutine is called via the $process variable (lines 13 and 18). Because Perl 6 can work out that we're trying to call the subroutine referred to by $process, it doesn't require a dereferencing arrow between the variable and the argument list. That not only saves typing, it also saves typos: leaving out the dereferencer is a common mistake in Perl 5.

The final change that's required is to convert the Perl 5 string concatenation operator:

16    $msg .= $_;
to its Perl 6 syntax:

16    $msg _= $_;
This is probably the single most controversial change in the entire Perl 6 redesign. And (sigh) it's probably also the least significant. It's required because the Perl 5 dereferencer arrow (->) is being changed in Perl 6 to a dot (.). Consequently, the dot concatenation operator has to be changed to something else in Perl 6. After exhaustive deliberation and discussion, Larry concluded that underscore was the best alternative. Despite the public outcry at this "wanton desecration", I suspect the new operator will grow on people, as they discover that a lone underscore is much easier to see than a lone dot.

There are, of course, other changes we could have made, had we wished to. We could have allowed the program to handle "empty" code strings:

> mailmap "" E:mail\*
more gracefully, by replacing:

 3  $code = shift
 4     or die "usage: $0 code [file ...]\n";
with:

 3  $code = shift
 4     // die "usage: $0 code [file ...]\n";
That second version would accept an empty first argument as code (a "no-op"), but would still terminate if the first argument were missing. That's because the new // operator evaluates its right-hand side only if its left operand is undefined. So shifting off an empty (but defined) argument doesn't invoke the right-hand die. It would only be called if shift fails to find any arguments at all (and therefore returns undef).
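The difference between testing for truth and testing for definedness can be seen side by side. The // operator was later added to Perl 5 as well (from version 5.10), so this sketch runs under Perl 5. The two helper names are invented here, and a plain 'usage' string stands in for the die so that every case can be printed.

```perl
use strict;
use warnings;
use 5.010;    # the // operator is available from Perl 5.10 onward

# Hypothetical helpers: each receives a would-be argument list.
sub code_with_or  { my @args = @_; return shift(@args) || 'usage'; }
sub code_with_dor { my @args = @_; return shift(@args) // 'usage'; }

for my $case ([], [''], ['print']) {
    my $shown = @$case ? "'$case->[0]'" : 'nothing';
    printf "%-8s ||: '%s'  //: '%s'\n",
        $shown, code_with_or(@$case), code_with_dor(@$case);
}
```

Only the middle case differs: an empty string is false, so || falls through to the fallback, but it is defined, so // keeps it.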

We could also have simplified the eval'd subroutine, by removing the declaration and initialization of $_, and replacing it with a parameter of the same name:

 5  $process = eval "sub (\$_) { $code }";
This parameter would now automatically be bound to the subroutine's argument. But, because it's named $_, it will still be the default target for any pattern match or print.

These extra changes certainly would enhance the program, but the important point is this: we could have made them, but we didn't have to.

Without them, porting mailmap to Perl 6 is as easy as:

while (<>) {
   s/\blocal\b/my/g;   # $_ is now lexical
   s/\$\@/\$!/g;      # single error variable
   s/\@ARGV/\@ARGS/g;  # @ARGV becomes @ARGS
   s/\.=/_=/g;        # new string concatenator
   s/->//g;           # no dereferencer required
   print;
}
Of course, you'll never actually have to write anything like that yourself, since the Perl 6 distribution will come with a fully generalized source code translator program.

Escaping the Tar Pits

I hate tar. No, not the bituminous, high-viscosity, coal distillate. It's the voluminous, high-flexibility, UNIX aggregate that I dislike.

The tar program is used to package up ("to tar") and to unpackage ("to untar") collections of files. And, unfortunately, that's essential to my everyday work. Each time I grab a module distribution from the CPAN, its component files will have been tarred together, and probably gzipped (compressed) too. If someone sends me their code via email, it will often be uuencoded (ASCII-fied) as well. And every time I want to upload a new module to the CPAN, I first have to tar and gzip it myself.

The problem is: I can never remember which combination of tar's 52 command-line options will unpack an existing archive (memo to self: tar -xpf archive.tar), nor which are needed to make it package one up again (n.b.: tar cf archive.tar file1 file2 etc). So using tar is perennially frustrating.

Worse still, whenever I need to tar or untar a file, I usually need to gzip and uuencode it afterwards, or uudecode and gunzip it beforehand. So using tar is perennially frustrating and tedious.

Worst of all, people often tar up a large number of files into an archive without first putting those files in a single subdirectory. Later, when the archive is unpacked, it litters the current directory with those new files, which then have to be manually picked out and moved into a suitable subdirectory. So using tar is perennially frustrating and tedious and messy.

Naturally, I turned to Perl. I wrote two programs -- entar and untar -- that automate the entire packing/unpacking process. They also detect tarred files that are likely to cause a mess and sanitize them before they have the chance.

Let's look at the more frequently used of the two, untar:

 1   sub action { print "$_[0]\n"; system $_[1] }
 2
 3   foreach my $file ( @ARGV ) {
 4      my $original = $file;
 5
 6      if ($file =~ s/[.](uu)$//) {
 7         action "unuu'ing $file.$1",
 8                "uudecode $file.$1";
 8      }
 9
10      if ($file =~ s/[.](t?gz)$//) {
11         action "gunzip'ing $file.$1",
12                "gunzip $file.$1";
13         $file .= ".tar" if $1 eq 'tgz';
14      }
15
16      my ($to, $enbundle, $relfile) =
17                bundling_for($file, $original);
18
19      action "untar'ing $file $to",
20             "$enbundle tar -xpf $relfile";
21   }
22
23   sub bundling_for {
24      my ($file, $dir) = @_;
25      chomp(my @files =
26             open(FILELIST, "tar t <$file|")
27             && <FILELIST>);
28      return ("(no contents)", "", $file)
29         if !@files;
30      return ("to $files[0]",  "", $file)
31         if @files == 1 ||
32            $files[0] =~ m{/$} &&
33               !grep {$_ !~ /^\Q$files[0]/}
34                     @files[1..$#files];
35      $dir .= ".CONTENTS";
36      return ("to $dir",
37              "mkdir $dir; cd $dir;",
38              "../$file");
39   }
The program itself is a straightforward example of the kind of "shell glue" at which Perl excels. It starts with a small utility subroutine (line 1) that prints a progress message and then executes some shell command.

The main program is a single foreach (line 3) that iterates through each file specified on the command line, thereby allowing you to untar multiple archives with one shell command. The program first caches the name of each file (line 4) and then checks to see if the filename has a .uu suffix (line 6). If so, it must have been uuencoded, so the program informs the user and uudecodes the file (lines 7 and 8).

It then performs a similar check, this time for a .gz or .tgz suffix (line 10) -- which would indicate the file was gzipped as well. If so, it tells the user, then decompresses the file (lines 11 and 12). A .tgz suffix is a standard abbreviation of .tar.gz. If the suffix was used, the program re-instates the .tar component of the name (line 13).

It then calls another utility subroutine (lines 16 and 17) to determine whether the archive is likely to indiscriminately spray files across the current directory when it's unpacked. If so, the $enbundle variable is assigned a series of additional shell commands that create a single directory into which the archive can be safely untarred.

All that remains is to inform the user of the untarring and then invoke tar with the mystical -xpf flags (lines 19 and 20).
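The suffix handling in the main loop can be exercised without running uudecode, gunzip, or tar at all. The following Perl 5 sketch is a variant written for illustration: plan_unpack and the archive name are invented here, and the function only returns the commands that untar would pass to system.

```perl
use strict;
use warnings;

# Returns the intermediate .tar name plus the shell commands needed to
# reach it. Nothing is executed here.
sub plan_unpack {
    my ($file) = @_;
    my @commands;

    if ($file =~ s/[.](uu)$//) {          # strip .uu, remember it in $1
        push @commands, "uudecode $file.$1";
    }
    if ($file =~ s/[.](t?gz)$//) {        # strip .gz or .tgz
        push @commands, "gunzip $file.$1";
        $file .= ".tar" if $1 eq 'tgz';   # .tgz is short for .tar.gz
    }
    return ($file, @commands);
}

# Hypothetical archive name, invented for illustration.
my ($tar, @commands) = plan_unpack('Some-Module.tgz.uu');
print "$_\n" for @commands, "tar -xpf $tar";
```

Each substitution both tests for the suffix and removes it, so the filename shrinks step by step until only the name of the .tar file is left.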

The bundling_for subroutine is where the program determines how to cleanly untar multiple files. It takes two arguments: the name of the intermediate .tar file that uudecoding and gunzipping have produced, and the name of the original file (line 24). It returns a list of three items: a message to be printed, a string containing any additional shell commands required to sanitize the unpackaging, and the pathname of the archive to be unpackaged.

It first opens a filehandle (line 26) to the piped output of a tar t command. The tar t generates a list of the names of the files that are packaged in the tarred archive. That list is then read in line-by-line (line 27), each filename is chomped (line 25), and the results are assigned to the @files array (line 25).

If the archive was empty (line 29), then bundling_for returns (line 28):

  • A message to that effect,
  • An empty action (there's nothing to make a mess, so there'll be nothing to clean up),
  • The original name of the archive.

If the archive contained exactly one file (line 31), or the first file in it was a directory containing all its other files (lines 32 to 34), then bundling_for returns (line 30):

  • A message indicating that everything will be unpacked into the first directory,
  • Another empty action (because the archive will unpack cleanly by itself),
  • The original name of the archive.

Otherwise, the archive must contain multiple unrelated files that will be messy to unpack. So bundling_for appends a .CONTENTS suffix to the name of the original file (line 35). This produces a suitable name for a new subdirectory where the archive can be safely unpacked. The subroutine then returns:

  • A message indicating that everything will be unpacked into that new subdirectory (line 36),
  • An action that creates the subdirectory and moves down into it (line 37),
  • The new relative pathname of the archive (line 38).
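The three cases above reduce to a decision on the list of names that tar t reports. Here is a minimal Perl 5 sketch of just that decision, with the pipe to tar left out: unpack_style and its labels are invented for illustration, and the file lists are hypothetical.

```perl
use strict;
use warnings;

# Given the names listed by "tar t", classify the archive.
# Returns a label rather than the message/commands/path triple.
sub unpack_style {
    my @files = @_;
    return 'empty' if !@files;

    my $first = $files[0];
    return 'clean'
        if @files == 1
        || $first =~ m{/$}                                   # a directory...
           && !grep { $_ !~ /^\Q$first\E/ } @files[1..$#files];  # ...holding the rest
    return 'messy';
}

print unpack_style('pkg/', 'pkg/README', 'pkg/lib/Foo.pm'), "\n";
print unpack_style('README', 'Foo.pm'), "\n";
```

The double negative in the grep reads as "no remaining file fails to start with the first entry", which is another way of saying that everything lives under that one directory.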

To summarize all that: you give untar an archive, and it performs the correct sequence of uudecode, gunzip, and tar needed to unpack the contents. And, if necessary, it also creates a subdirectory to unpack them into.

But what if we wanted untar to run under Perl 6 instead? What would it look like then? Well, it would look just about the same:

 1   sub action { print "@_[0]\n"; system @_[1] }
 2
 3   foreach my $file ( @ARGS ) {
 4      my $original = $file;
 5
 6      if ($file =~ s/[.](uu)$//) {
 7         action "unuu'ing $file.$1",
 8                "uudecode $file.$1";
 8      }
 9
10      if ($file =~ s/[.](t?gz)$//) {
11         action "gunzip'ing $file.$1",
12                "gunzip $file.$1";
13         $file _= ".tar" if $1 eq 'tgz';
14      }
15
16      my ($to, $enbundle, $relfile) =
17                bundling_for($file, $original);
18
19      action "untar'ing $file $to",
20             "$enbundle tar -xpf $relfile";
21   }
22
23   sub bundling_for {
24      my ($file, $dir) = @_;
25      chomp(my @files =
26            ($FILELIST = open("tar t <$file|"))
27             && <$FILELIST>);
28      return ("(no contents)", "", $file)
29         if !@files;
30      return ("to @files[0]",  "", $file)
31         if @files == 1 ||
32            @files[0] =~ m{/$} &&
33               !grep {$_ !~ /^\Q@files[0]/}
34                     @files[1..];
35      $dir _= ".CONTENTS";
36      return ("to $dir",
37              "mkdir $dir; cd $dir;",
38              "../$file");
39   }
As with the earlier port of mailmap, there are only a very few cosmetic differences between the Perl 5 and Perl 6 versions of untar.

The first difference is that $_[0] and $_[1] become @_[0] and @_[1], respectively (line 1). Likewise, $files[0] becomes @files[0] (lines 30, 32, and 33). In Perl 6, the @ is an integral part of an array variable's name (it's even part of the variable's key in the symbol table). So the @ stays with the array, even when it's being indexed. The same applies to the % of a hash, though there are no examples of that in untar.

This fundamental change does take a little getting used to, but once you do, it vastly simplifies the rules about which sign to use where. It's also much easier to explain to newcomers, and it magically eliminates some very common programming errors (as explained in Exegesis 2).

But don't just take my word for that. If you download the Perl6::Variables module from the CPAN, you can actually test drive this new variable syntax under Perl 5.

Incidentally, the action subroutine is another place where we could have rewritten the code more idiomatically, by declaring named parameters:

 1   sub action ($say, $do) {
 1a         print "$say\n"; system $do;
 1b  }
We could have done that but, once again, we don't have to.

The second change to untar is one we've already seen in the previous example -- @ARGV becomes @ARGS (line 3). There's really nothing more to say about it here. Sic transit infamia argvmentorvm!

The third difference also relates to arrays. Specifically, to array slices. In Perl 5, to create a list of all the elements of @files except the first, we wrote:

34                   @files[1..$#files]
The somewhat obscure $#array syntax is no longer available in Perl 6. Instead we write:

34                   @files[1..@files.end]
Or we could take advantage of Perl 6's new semi-infinite lists and just write:

34                   @files[1..]
which has exactly the same effect.

The fourth change that's required is to convert the Perl 5 string concatenation operators to their Perl 6 syntax (lines 13 and 35). Once again, this is exactly the same change from .= to _= that we saw with the earlier mailmap example.

The final change is the most significant in untar, and the most visible. In Perl 6, open no longer takes a bareword filehandle name. Instead, it takes just the name of the file to be opened, creates a new anonymous filehandle, and returns it. So that all-in-one open/read/chomp statement:

25      chomp(my @files =
26            open(FILELIST, "tar t <$file|")
27            && <FILELIST>);

becomes:

25      chomp(my @files =
26            ($FILELIST = open("tar t <$file|"))
27             && <$FILELIST>);
Here too, if we wanted to use a native Perl 6 idiom, we could simplify the code to:

25   chomp(my @files =
26            <open "tar t <$file|" or die>);
In this version, the call to open returns a filehandle that is then used directly in the angle brackets of the read. The results are assigned to @files, and then chomped.

Not only is this new version more compact and more readable, it also provides explicit error generation. But, once again, if you don't want those benefits, you don't have to use the new syntax. In which case, the entire port from Perl 5 to Perl 6 could be accomplished with:

while (<>) {
 # arrays retain their @'s...
   s/\$(\w+)\[/\@$1\[/g;

 # @ARGV becomes @ARGS...
   s/\@ARGV/\@ARGS/g;

 # new string concatenator...
   s/\.=/_=/g;

 # No bareword filehandles...
   s/open\((\w+),\s*(.*?)\)/(\$$1 = open($2))/g;
   s/<(\w+)>/<\$$1>/g;

   print;
}
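Perl 5 already offers something close in spirit to the filehandle change discussed above: open can fill in a lexical scalar instead of a bareword. The following sketch shows the same read-then-chomp idiom with a lexical handle. To stay self-contained it reads from an in-memory string rather than from a tar t pipe; listed_files and the sample listing are invented for illustration.

```perl
use strict;
use warnings;

# Read a listing through a lexical filehandle and chomp every line.
sub listed_files {
    my ($listing) = @_;
    open(my $fh, '<', \$listing)              # in-memory file, no tar needed
        or die "Cannot open in-memory listing: $!";
    chomp(my @files = <$fh>);                 # read all lines, then chomp them
    close $fh;
    return @files;
}

# Hypothetical listing, standing in for the output of "tar t".
my @files = listed_files("pkg/\npkg/README\n");
print scalar(@files), " entries, first is $files[0]\n";
```

Because $fh is an ordinary lexical, it goes out of scope with the subroutine, which avoids the global-namespace clashes that bareword handles such as FILELIST invite.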
It's (no longer) Greek to me

Always be careful what you ask for.

For example, when I asked the Perl 6 design team for some short, typical, everyday Perl programs, Simon Cozens sent me a 142-line converter from the 8-bit, single-byte-coded, ISO8859-7:1987 Latin/Greek character set to 7-bit ASCII. Simon has rather atypical everydays.

But because his program is elegantly data-driven, it was easy to adapt it to a more mundane, but still useful, task. Namely, converting data from the MacOS non-ISO 8-bit character set to regular 7-bit ASCII:

  1   #! /usr/bin/perl -w
  2
  3   my %translate = (
  4          "\r" => "\n",
  5      chr(128) => q{A},
  6      chr(129) => q{A},
  7      chr(130) => q{C},
  8      chr(131) => q{E},
  9      chr(132) => q{N},
 10      chr(133) => q{O},
 :
 53      chr(176) => q{[inf]},
 54      chr(177) => q{[+-]},
 55      chr(178) => q{<=},
 56      chr(179) => q{>=},
 57      chr(180) => q{[JPY]},
 :
126      chr(249) => q{},
127      chr(250) => q{},
128      chr(251) => q{},
129      chr(252) => q{},
130      chr(253) => q{},
131      chr(254) => q{},
132      chr(255) => q{},
133   );
134
135   while (<>) {
136       s{(.)}
137        { defined $translate{$1}
138                ? $translate{$1}
139                : $1
140        }ges;
141       print;
142   }
The bulk of the program is just an enormous look-up table (lines 3 to 133) that maps the 8-bit character codes of the MacOS character set to suitable ASCII approximations.

The mapping for characters 246 through 255 is special. Those characters are used as "combiners" -- to produce acutes, umlauts, cedillas, and other diacritical marks on unaccented characters. Since that's not possible in 7-bit ASCII, they are all ignored, by mapping them to an empty string.

Incidentally, the presence of these "empty" values in the %translate hash is the reason we need the defined test within the substitution, and can't simply write:

136 s{(.)}{ $translate{$1} || $1 }ges;
Once the mapping is set up, the while loop (line 135) reads in each line of input and applies the appropriate substitution, using the /g and /s modifiers to catch every character, including newlines. The substitution matches a single character at a time (line 136) and replaces it with the corresponding entry from the %translate table (line 138), if a suitable entry exists (line 137). Otherwise, the character is preserved (line 139).

Using an /e modifier allows the replacement block to check for the presence of an entry in the table using the ternary operator. This reduces the size of the translation table by nearly 50%, because it means there's no need to specify mappings for the first 128 characters of the two sets (which are identical). If the table doesn't specify a mapping, the ternary operator within the substitution simply returns the character itself. So no translation is performed.

Lastly, once the current line has been completely transliterated, it is printed out (line 141).
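The role of the defined test is easiest to see with a cut-down table. This Perl 5 sketch keeps three of the entries from the listing above and compares the real substitution with the tempting || shortcut. The function names are invented for illustration.

```perl
use strict;
use warnings;

# A three-entry stand-in for the full look-up table.
my %translate = (
    "\r"     => "\n",
    chr(128) => q{A},
    chr(255) => q{},     # a "combiner", deliberately mapped to nothing
);

# The substitution as written in the program.
sub to_ascii {
    my ($text) = @_;
    $text =~ s{(.)}{ defined $translate{$1} ? $translate{$1} : $1 }ges;
    return $text;
}

# The shortcut: an empty (false) mapping falls through to $1.
sub to_ascii_with_or {
    my ($text) = @_;
    $text =~ s{(.)}{ $translate{$1} || $1 }ges;
    return $text;
}

my $input = "x" . chr(128) . chr(255) . "y\r";
printf "defined test: %d chars, || shortcut: %d chars\n",
    length to_ascii($input), length to_ascii_with_or($input);
```

With the defined test the combiner disappears and four characters remain; with || the empty mapping is treated as "no mapping", so the combiner survives and five characters remain.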

And how would that look in Perl 6?

Well, the first 134 lines that set up %translate would be exactly the same. So we'll ignore them. And the rest of the program would look like this:

135   while (<>) {
136       s{(.)}
137        { defined %translate{$1}
138               ?? %translate{$1}
139               :: $1
140        }ges;
141       print;
142   }
There are only two trivial differences: the %translate hash retains its leading % sign, even on look-ups (lines 137 and 138); and the ? and : of the ternary operator (lines 138 and 139) are doubled.

Perl 6 hashes retain their % signs for the same reason that arrays retain their @ signs: having them as an integral part of the variable name makes them easier to understand, and less error-prone to use.

The ternary operator doubles up because it was the only boolean test in Perl that didn't already use a doubled symbol. Besides which, Larry needed to steal the single colon for adverbs (as described in Apocalypse 3).

Of course, Perl 6 also provides a much simpler way to achieve the same "only apply defined translations" effect, if you want it:

136 s{(.)}{ %translate{$1} // $1 }ges;

Once again, the new // operator evaluates to its left operand unless that operand is undefined, in which case it evaluates to its right operand. Which is exactly what we need in this case.

And, once again, doing things this clever new way is an option, not a requirement. If we decide not to use any new features, then the Perl 6 port of Simon's mac2ascii program can be achieved with just:

while (<>) {
   s/\$(\w+)\{/\%$1\{/g;  # hashes retain their %
   s/([?:])/$1$1/g;       # ternary ops doubled
   print;
}
The Missing ln

Hugo van der Sanden has a useful little program that extends the behavior of the UNIX ln command. ln is like a MacOS "Make Alias" or a Windows "Create Shortcut"; it creates a file that is really a symbolic link (within the local directory structure) to some other file elsewhere on disk.

One of the problems with these kinds of symbolic link files under UNIX is that they're not easy to "retarget". That is, if we have symbolic links lx, ly, and lz pointing at actual files /olddir/x, /olddir/y, and /olddir/z, there's no easy way to collectively change them to point at the files /newdir/x, /newdir/y, and /newdir/z instead (the way, for example, MacOS lets you "Select New Original" from the information dialog of an alias).

Under UNIX you have to:

  • Mentally work out the name of the new target file.
  • Manually unlink the existing symbolic link.
  • Create a new symbolic link of the same name, but connected to the new target file.

That makes retargeting hundreds of files very tedious. Actually, it makes retargeting even one file very tedious. Hugo evidently thinks so too, because he wrote a Perl program -- lnsub -- that automates the process:

 1   #! /usr/bin/perl -w
 2   use strict;
 3   my $from = shift;
 4   my $to   = shift;
 5   my($ffrom, $fto, $file);
 6   foreach $file (@ARGV) {
 7      next unless -l $file;
 8      $ffrom = readlink $file;
 9      unless (defined $ffrom) {
10         warn "$file: $!\n";
11         next;
12      }
13      if (($fto = $ffrom) =~ s/$from/$to/) {
14         unlink $file;
15         if (symlink $fto, $file) {
16            print "$file: $ffrom -> $fto\n";
17         } else {
18            warn "Couldn't create link $file:",
19                 " $ffrom -> $fto: $!\n";
20         }
21      }
22   }
Having taken the usual precautions (lines 1 and 2), the program grabs the first two command-line arguments passed to it. It will use these two strings as a specification of the retargeting that's required. For example, if we wanted to take a set of symbolic links foo, bar, and baz and retarget them from files in the directory /usr/lib/beta/ to files (with the same names) in the directory /usr/lib/ship/, we would call the script like so:

> lnsub /usr/lib/beta/ /usr/lib/ship/ foo bar baz
Or we could grab a large list of symbolic links within a directory tree (using the UNIX find utility) and retarget them all after an upgrade:

> set files = `find perl-stable`
> lnsub /perl-5.6.0/ /perl-5.6.1/ $files
It's like doing a s/// directly on the filesystem.

The rest of the command line (i.e., what remains in @ARGV) is the list of symbolic links that are to be retargeted. The foreach (line 6) iterates through them. Each file is first checked to see that it actually is a symbolic link (line 7). If not, there's nothing to do, so the next iteration of the loop is requested.

Otherwise, the built-in readlink command is called (line 8) to get the name of the actual file to which the link refers. readlink returns undef if it can't get that information, and sets the system error variable ($!) to explain why. If that happens, the script echoes the warning, and gives up on the current iteration (lines 9 through 12).

Line 13 uses the "copy-then-modify" idiom: first setting up the $fto variable with the actual file name, and then immediately applying to it the substitution specified by the first two command-line arguments. The result is that $fto now has the name of the file to which the current iteration's $file should be retargeted. Of course, if the substitution fails, the if fails too, and once again the foreach moves on to consider the next file.

If the substitution succeeds, then it's just a matter of removing the existing symbolic link (line 14) and creating a new symbolic link to the file named in $fto (line 15). If that can't happen for some reason, the program issues a warning and once again moves on to the next file (lines 18 and 19).
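The copy-then-modify step on line 13 can be tried on its own, without touching any symbolic links. This Perl 5 sketch wraps it in a function invented for illustration (retargeted). The paths are hypothetical. As in lnsub, $from is used as a regular expression, so metacharacters in it keep their special meaning.

```perl
use strict;
use warnings;

# Returns the new target name, or nothing if $from does not match.
sub retargeted {
    my ($from, $to, $ffrom) = @_;
    my $fto;
    # The assignment yields $fto itself, so s/// edits the copy
    # and leaves $ffrom untouched.
    if (($fto = $ffrom) =~ s/$from/$to/) {
        return $fto;
    }
    return;
}

# Hypothetical link target, invented for illustration.
my $old = '/usr/local/perl-5.6.0/bin/perl';
my $new = retargeted('/perl-5.6.0/', '/perl-5.6.1/', $old);
print "$old -> $new\n" if defined $new;
```

Keeping the original in $ffrom is what lets the program report both the old and the new target in its progress message.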

Perl is perfect for this kind of sysadmin automation. After all, how many other languages have readlink, symlink, and unlink built in? Even in UNIX's native language, C, you'd have to explicitly import them from a library.

And what does this very typical Perl program look like in Perl 6? Like this:

 1   #! /usr/bin/perl -w
 2   use strict;
 3   my $from = shift;
 4   my $to   = shift;
 5   my($ffrom, $fto, $file);
 6   foreach $file (@ARGS) {
 7      next unless -l $file;
 8      $ffrom = readlink $file;
 9      unless (defined $ffrom) {
10         warn "$file: $!\n";
11         next;
12      }
13      if (($fto = $ffrom) =~ s/$from/$to/) {
14         unlink $file;
15         if (symlink $fto, $file) {
16            print "$file: $ffrom -> $fto\n";
17         } else {
18         warn "Couldn't create link $file:",
19                 " $ffrom -> $fto: $!\n";
20         }
21      }
22   }
No, it's not quite identical to the Perl 5 original. There's a single character difference between the two versions. But don't feel bad if you had trouble finding that one distinguishing byte. It's hidden in one of those changes that have been specifically designed to make Perl 6 even easier to understand. It's in the renaming of @ARGV to @ARGS.

It didn't stand out because Larry is using a billion years of evolution against you: the Perl 6 name is cunningly chosen so as to look more familiar than the Perl 5 name it replaces!

Best of all, porting this example from Perl 5 to Perl 6 is as easy as pie:

> perl -p -i -e  'tr/V/S/'  lnsub

Open the pod bay door, Dan

Remember, these are not contrived examples; they're real programs that real Perl hackers use every day. And sometimes those real hackers are already hacking in Perl 6 -- without even intending to do so.

Consider this handy program contributed by Dan Sugalski:

 1   my $section_type = 'text';
 2
 3   sub start {
 4      my ($type, $intro) = @_;
 5      unless ($section_type eq $type) {
 6         print $intro;
 7         $section_type = $type;
 8      }
 9   }
10
11   loop: while(<>) {
12      print and next loop if /^\s*$/;
13
14      if (s/^[*]{3}\s+//) {
15         start('list', "\n=over\n");
16         print "\n=item $_\n";
17         next loop;
18      }
19
20      start('text', "\n=back\n\n")
21         if $section_type eq 'list';
22
23      if (s/^([*]{1,2})\s+//) {
24         start('text', "\n");
25         print "=head", length($1), " $_\n";
26         next loop;
27      }
28      elsif (/^\s/) {
29         start('formatted', "\n");
30      }
31      else {
32         start('text', "\n");
33      }
34      print;
35   }

It's a typical text-munging Perl application that converts from Emacs "outline" formatted text:

* A heading
** A subheading
*** A bulleted point
*** Another bulleted point
           open 0 && print <0>;
           some("indented text - not to be formatted");

Some unindented text that I<is> to
be formatted

into Plain Old Documentation:

=head1 A heading

=head2 A subheading

=over

=item A bulleted point

=item Another bulleted point

=back

    open 0 && print <0>;
    some("indented text - not to be formatted");
 
Some unindented text that I<is> to
be formatted

Having set the initial section type to 'text' (line 1), it defines a utility subroutine (lines 3 to 9) that prints a section introducer and updates the current section type, but only if it's not already processing a section of that type.

The program then iterates across each line of the input data (line 11). If the line is empty, it's simply copied to the output (line 12). Otherwise, a series of regex tests are used to determine whether the current line contains:

  • An Emacs bullet point, which is reformatted as an =item directive (lines 14 to 18);
  • The end of a list, which is reformatted as an =back (lines 20 and 21);
  • An Emacs heading, which is reformatted as an =head directive (lines 23 to 27);
  • A preformatted line (lines 28 to 30) or body text (lines 31 to 33), either of which is printed verbatim (line 34).

In other words, it's your archetypical, well-structured, text manipulation program. Perl doing what it does best.
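The series of regex tests can also be read as a small classifier. This is only a sketch: the classify helper and its return labels are hypothetical, but the patterns and the order in which they are tried are taken from the listing.

```perl
use strict;
use warnings;

# Hypothetical helper: name the kind of outline line, using the same
# patterns as the listing, tested in the same order.
sub classify {
    my ($line) = @_;
    return 'blank'     if $line =~ /^\s*$/;
    return 'bullet'    if $line =~ /^[*]{3}\s+/;
    # One or two stars: the star count becomes the heading level.
    return 'heading' . length($1) if $line =~ /^([*]{1,2})\s+/;
    return 'formatted' if $line =~ /^\s/;
    return 'text';
}

print classify($_), "\n"
    for '* A heading', '** A subheading', '*** A bulleted point',
        '    some("indented text");', 'Some unindented text';
```

The order matters: the three-star test has to come before the one-or-two-star test, just as the bullet test on line 14 of the listing precedes the heading test on line 23.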

But the most interesting feature of this program as far as we're concerned is that, in addition to being vanilla Perl 5, it's also perfectly valid Perl 6. No changes required.

Of course, if Dan were to write it in Perl 6 from scratch, he would almost certainly give start a proper parameter list:

 3   sub start($type, $intro) {
 4
 5      unless ($section_type eq $type) {
 6         print $intro;
 7         $section_type = $type;
 8      }
 9   }

He might even choose to use a single Perl 6 case statement, rather than a cascaded if:

12      given ($_) {
13         when /^\s*$/ : { print; }
14         when s/^[*]{3}\s+// : {
15            start('list', "\n=over\n");
16            print "\n=item $_\n";
17         }
18         start('text', "\n=back\n\n")
19            if $section_type eq 'list';
20         when s/^([*]{1,2})\s+// : {
21            start('text', "\n");
22            print "=head", length($1), " $_\n";
23         }
24         when /^\s/ : {
25            start('formatted', "\n");
26            print;
27         }
28         else {
29            start('text', "\n");
30            print;
31         }
32      }

But, of course, he wouldn't have to make either enhancement. In Perl 6, subroutines like start that don't have an explicit parameter list still pass their arguments via @_, just as Perl 5 subroutines do. And if statements are still exactly the same in Perl 6.

So Dan's program will continue to work perfectly for years to come. Just as it is.

The Ascent of Perl

And that's the message.

Perl 6 will be different from Perl 5, but never gratuitously so. When syntax or semantics change, it will always be a change for the better: for greater consistency, for more intuitability, for extra Do-What-I-Meanness. And when new syntax and semantics are added, they will almost always be added as options -- alternatives that you can safely ignore until you actually need them.

Many Perl 5 programs will require only trivial syntactic changes in order to run under Perl 6.

Most Perl 5 programs will be able to be ported to Perl 6 automatically, via the standard translator program.

All Perl 6 programs will still look and feel like Perl.

The sky isn't falling.

Damian Conway is a highly evolved bipedal anthropoid, native to the urban jungles of South-Eastern Australia. He is slightly shorter than the average human, and more heavily built. His brownish facial fur shows the characteristic silvering of a mature male. In his natural habitat (the auditorium) he proclaims his territory with stentorian calls -- "INKONSTANTIME!", "BIKRISMUS!", and "OIHAVAPAYPERONTHAT!" -- which can be heard up to several miles away. In early 2001, he was finally captured by the Yeti Another Society and is now regularly exhibited around the world for the amusement and edification of the Perl-hacking public.

