
The Perl Journal

Volumes 1–6 (1996–2002)


The Perl Journal #10, Summer 1998 (vol 3, num 2)

  • Just the FAQs: Understand References Today - The essentials of data structures.
  • Infobots and Purl - IRC Robots And The People Who Love Them.
  • Perl 5.005 - The Next Big Perl.
  • Learning Japanese - Using an HTML filter to read a foreign language.
  • Parsing Command Line Options - The Getopt::Long module and friends.
  • Safely Empowering Your CGI Scripts - What to do when your CGI scripts need superuser powers.
  • Perl News - What's new in the Perl community.
  • OLE Automation with Perl - Controlling Excel, Notes, and Access with Win32::OLE.
  • Ray Tracing - Rendering three-dimensional images.
  • Threads - Parallel execution paths in Perl.
  • Debugging and Devel:: - Modules to help you bulletproof your code.
  • The Third Annual Obfuscated Perl Contest
  • The Perl Journal One-Liners
Gurusamy Sarathy (1998) Perl 5.005. The Perl Journal, vol 3(2), issue #10, Summer 1998.

Perl 5.005

The Next Big Perl.

Gurusamy Sarathy


I hate waiting. But when I have to wait, I always feel better if I know exactly what it is that I'm waiting for. It's probably the same for most of you.

The next big version of Perl is getting polished up as I type this, and it occurred to me that I might say something to soothe the poor souls who have slipped into interminable waitstates over its arrival. I might even do some good and correct some of the rumors. So in this article, I'll describe the upcoming major release of Perl, uncovering some facts, interjecting a few opinions, and debunking a few myths.

PROCESS

THEORY

Software development is somewhat of a witches' brew. No one can quite explain how it's done. All anyone can suggest is that it takes a lot of creativity.

I hear a certain large corporation has given up on traditional software development and has invented their own model. It works like this: Hire cheap chimps by the cageful. Imprison them with dumb terminals in vast underground windowless labyrinths. Sample the network traffic at random intervals. Recombine samples until the result is good enough to run on the most popular architecture. Cut CDs and sell. One of my colleagues has a name for this model: "monkeyware."

There are other models. One of them goes like this: Do all the thinking within one head, or a few heads. Make real sure it sounds good. Write the spec and hype it. Using pack animals, translate it into source code. Yell to sell. Add more hype to taste. This model I call "donkeyware". It generates a lot of noise, but usually doesn't sing very well.

There is a third model: Choose a nice, open location on the ether. Plant a tree. It may be deserted at first, but soon enough, a little puddle forms. It looks like an oasis from afar, so nomads and other wandering critters come looking. Sometimes there are arguments and people leave ("freely they came, freely they shall go"). But a culture develops.

Behold "peopleware". The network is both the medium and the place. Try it. Like it? Keep it. There's no buying or selling. Perl is built this way.

HOW IT'S MADE

To give you a taste of Perl culture, some history is in order.

Scene: The Perl Porters mailing list, the primary Perl development forum, around the end of 1997. The mailing list was in a particularly vile mood. Flame wars were frequent and long-lasting. Some of the core developers (including Larry Wall) felt that progress required a very high degree of coordination, and the mailing list wasn't sufficient. Enter oneperl, an effort to coordinate Perl 5.005.

When Perl 5.004 was released, there were two ports of Perl for Win32 - ActiveState's Perl, and the Win32 support in the 5.004 Perl source code. The oneperl mailing list had been set up as a private forum to facilitate merging the two, along with possibly MacPerl and other divergent Perl ports. It also serves as the forum for tasks identified in biweekly teleconferences between the maintainers of the various ports and Larry. Both the list and the teleconferences are sponsored by O'Reilly & Associates as a continuation of the unification effort that they helped initiate at the Perl Conference v1.0. O'Reilly also provided early funding for the development track, heralding a new level of industry participation in the development of Perl.

It briefly seemed that oneperl would be the de facto forum for discussing technical issues quietly and efficiently. However, the participants quickly realized that the presence of the private list was to some extent subverting what could be useful discussions on perl5-porters ("This list is the Cathedral", Chip Salzenberg exclaimed amid a lengthy discussion). At that point it was resolved that all substantive issues would be discussed on perl5-porters - no matter what the temperature - and the oneperl list would be used only for the nitty-gritty of synchronizing the maintenance activity on the Perl repository. It has served that purpose very well since.

End of story. I hope this scares perl5-porters into behaving.

A major release wouldn't be major if it didn't have significant new features. The next release has dozens, including the ability to create operating-system-level threads using Perl code (see Dan Sugalski's article in this issue), a real compiler that can produce binary or bytecode executables from Perl code, a Perl C++ object API, more reliable signals, and a much-revamped regular expression engine.

Probably more significantly, there have been - quite literally - hundreds of bug-fixes since the last major version, 5.004. Suffice it to say that this will be the biggest ever major release of Perl.

HOW IT WAS MADE

This will be the first major Perl release that was developed using a centralized repository accessible to multiple Perl developers. The version control system enabled safe and rapid parallel development - something that was difficult to do before. Perl is now developed on two separate tracks: a maintenance track (managed by Tim Bunce), and a development track (managed by Malcolm Beattie and now yours truly). The maintenance track collects bugfixes and changes that have a low impact on compatibility with the major released version, while the development track supports more ambitious changes. Both tracks generate independent releases, distinguished by their release numbers. Maintenance releases have a "subversion" number less than 50, while developer releases (being more subversive) have numbers of 50 and above. Thread support was developed on the development track, so versions 5.004_50 and above have it.
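The numbering convention is mechanical enough to sketch in a few lines of Perl. The helper below, `release_track`, is hypothetical (it is not part of any Perl distribution); it simply applies the rule described above, treating a subversion below 50 as the maintenance track and 50 or above as the development track, to version strings of the 5.004_NN form.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a 5.004-era version string using the
# convention described above (subversion < 50: maintenance track,
# subversion >= 50: development track).
sub release_track {
    my ($version) = @_;
    my ($sub) = $version =~ /^\d+\.\d+(?:_(\d+))?$/
        or die "unrecognized version string: $version\n";
    return 'major release' unless defined $sub;   # e.g. plain "5.004"
    return $sub < 50 ? 'maintenance' : 'development';
}

for my $v (qw(5.004 5.004_04 5.004_05 5.004_50 5.004_66)) {
    printf "%-10s %s\n", $v, release_track($v);
}
```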

The two tracks are independent, but are merged at every major version of Perl. So 5.005 will have the features found in both tracks.

This release also heralds a new level of industry participation in the development of Perl. O'Reilly & Associates facilitated development by arranging teleconferences and early funding of the development track.

ActiveState Tool Corp. worked long hours with the maintainers of core Win32 Perl support to merge their C++ Perl API into the standard source code. As a result, ActiveState will be able to build their Perl products from the standard distribution after 5.005. That makes it possible for their applications built with the Perl Object API (described later) to be used with Perl binaries built from the standard sources. More importantly, this makes the Perl Object API an intrinsic part of Perl, to be developed in the future as the community sees fit.

As a reflection of the sheer number of new developments in the upcoming release, Larry has suggested that it may need a deci-increment rather than a milli-increment. Don't be surprised if you hear people calling it version 5.1 instead of 5.005.

PRODUCT

Now it's time to disclaim all my claims. Everything mentioned in this article should be considered tentative information (at best) or uninformed opinion (at worst).

Most of the features mentioned here can be experienced in the development versions. The latest development version as of this writing is 5.004_66.

Almost all bugfixes should also be available in the maintenance versions of 5.004. The latest maintenance version as of this writing is 5.004_04, and 5.004_05 is undergoing trials.

Maintenance releases:
https://www.perl.com/CPAN/authors/Tim_Bunce

PERFORMANCE

There are a number of optimizations that will, potentially, speed up your programs.

In 5.004, sort() uses (internally) the quicksort routine provided by your system's C library. Perl now has its own implementation of quicksort, highly optimized to minimize the number of comparisons. Thus, your programs that use sort() will probably run faster. The newer implementation is also fully resistant to coredumps - unlike the quicksorts in most C libraries - when faced with badly written comparison functions.
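Both properties can be observed from Perl code. This sketch counts how many times a comparison block is called, then sorts with a deliberately inconsistent comparator to show that the result is merely scrambled rather than a crash. Note that perls from 5.8 onward use a mergesort by default rather than the 5.005 quicksort, so the counts you see today will not match what 5.005 produced; the data and variable names are illustrative.

```perl
use strict;
use warnings;

my @data = map { ($_ * 7919) % 1000 } 1 .. 200;   # arbitrary test data

# Count comparator calls for an ordinary numeric sort.
my $comparisons = 0;
my @sorted = sort { $comparisons++; $a <=> $b } @data;
printf "%d elements, %d comparisons\n", scalar @data, $comparisons;

# A badly written comparator: it answers at random, so it is not a
# consistent ordering. Perl should hand back some permutation of the
# input instead of dumping core.
my @scrambled = sort { (-1, 0, 1)[ int rand 3 ] } @data;
printf "scrambled sort returned %d elements\n", scalar @scrambled;
```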

When you build 5.005, you can tailor Perl's malloc to reduce memory usage, at a cost: your programs will run a little slower. Now you can compromise between memory and speed.

The regex engine is faster for many common operations, such as /[a-z]/i. The engine now avoids copying strings whenever possible, so that string-shortening transformations (such as s/foobar/bar/) operate more quickly. In previous releases of Perl, regexes were limited to a compiled size of 32767 bytes; this is no longer the case. It is possible to store the compiled representation of a regex in a variable, so the same regex can be used in multiple places. In other words, 5.005 allows this:

$re = qr/blah/;
$str =~ $re;
$str2 =~ $re;

This makes it possible to compile a regex once and use it over and over again, or to interpolate the compiled form in other regexen ($str3 =~ /$re/), thereby avoiding the cost of repeated recompilation.
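A slightly longer sketch using qr//, the quoting operator that returns a compiled pattern object. It reuses one compiled pattern in several matches and interpolates it into a larger pattern. The patterns and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Compile a pattern once with qr// and keep it in a variable.
my $word = qr/bl[aeiou]+h/i;

# Reuse it directly in matches...
my @hits = grep { $_ =~ $word } ('blah', 'BLEAH', 'bulk', 'Booh');

# ...or interpolate it into a larger pattern.
my $alone_on_line = qr/^\s*$word\s*$/;

print "hits: @hits\n";
print "ref: ", ref($word), "\n";                 # qr// objects are Regexp refs
for my $line ('  blah  ', 'a blah b') {
    printf "%-10s %s\n", "'$line'", ($line =~ $alone_on_line ? 'match' : 'no match');
}
```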

The compiled regex is now optimized using a peephole optimizer which eliminates redundancies like inconsequential branches and recursions.

Counting characters using tr/a/a/ is much faster.
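For readers who haven't used the idiom: tr/// returns the number of characters it matched, so it doubles as a character counter. A minimal sketch with a made-up string:

```perl
use strict;
use warnings;

my $text = 'banana bandana';

# With an empty replacement list (and no /d), tr/// replaces each
# character with itself, so the string is unchanged and only the
# count is returned.
my $a_count = ($text =~ tr/a//);
my $n_count = ($text =~ tr/n/n/);      # the form used in the text
my $vowels  = ($text =~ tr/aeiou//);   # several characters at once

print "a: $a_count, n: $n_count, vowels: $vowels\n";
```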

THREADING

This is perhaps the most exciting new feature. Users can spawn native threads from Perl, and use a simple locking mechanism to synchronize access to all kinds of Perl datatypes. Asynchronous procedures can be readily implemented using a simple interface. Here's an example:

use Thread qw(async);
my $child = async { return { Foo => do_stuff() }; };
# ...do other stuff while the child thread runs...
my $resulthash = $child->join;

As you can see, a thread can return an arbitrary Perl value. See Dan Sugalski's article in this issue for more details.

Note that there are the beginnings of support for "fake" threads, for platforms where threading is not supported at the operating system level. This user-level threading is handled by Perl itself using a fake scheduler (running at the granularity of a Perl opcode). It is therefore unlikely to be as efficient as native threading by the operating system, and it may not preserve native thread semantics.

Since threading is a completely new feature, it's considered to be in beta status until the major release that follows this one (5.006, if this version is indeed called 5.005). As with all beta features that make it into a production release, it won't be available by default. You'll have to request thread capabilities when you build Perl.

SIGNAL RELIABILITY

I don't mean to shock you, but no production release of Perl has ever had reliable signal delivery. All previous versions supported signals, true, but they were unreliable under certain conditions. Even the Perl documentation was silent about this (probably because such failures are rare and not consistently reproducible). Consider a loop such as this (code adapted from a version by Ilya Zakharevich):

$SIG{ALRM} = sub { $a = -$a };
sub dec { --$_[0] }
while (1) {
     $a = 0;
     alarm(1);
     dec($a) while $a <= 0;
     print ++$b, ": $a\n";
}

This code crashes randomly after an unpredictable number of iterations. Maintenance versions of 5.004 have some black magic to make the failures rarer still, so the outer loop may need to run hundreds (even thousands) of times before you see it fail. Happy staring! (This won't run on systems like Win32, where alarm() is unsupported. The concept of signals is not exactly well supported by Win32, either, so you're not missing much.)

The problem arises because Perl has no mechanism to control when the operating system delivers the signal. That is, signal delivery usually happens asynchronously, and Perl's data structures might well be in an inconsistent state when it happens. Modern operating systems do provide mechanisms for applications to block and unblock signal delivery, but these methods are either too inefficient or too non-portable for Perl.
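For reference, this is what blocking and unblocking looks like at the application level through the POSIX module's sigprocmask() interface, on a POSIX system. It is a sketch of the OS mechanism only, not of anything Perl does internally; the handler and variable names are illustrative. A signal sent while it is blocked stays pending and is delivered once the old mask is restored.

```perl
use strict;
use warnings;
use POSIX qw(sigprocmask SIG_BLOCK SIG_SETMASK SIGUSR1);

my $delivered = 0;
$SIG{USR1} = sub { $delivered++ };

my $block = POSIX::SigSet->new(SIGUSR1);
my $old   = POSIX::SigSet->new;

# Block SIGUSR1 around a critical section.
sigprocmask(SIG_BLOCK, $block, $old) or die "sigprocmask: $!";
kill USR1 => $$;                  # arrives, but stays pending
my $during = $delivered;          # still 0: the handler has not run
sigprocmask(SIG_SETMASK, $old) or die "sigprocmask: $!";

# Once the old mask is restored, the pending signal is delivered.
my $after = $delivered;
print "during critical section: $during, after unblocking: $after\n";
```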

Two variations of reliable signals are being developed. In the first method, Perl records signals delivered by the operating system in the background (that is, whenever the operating system asynchronously delivers the signal), and runs the corresponding Perl handlers only when it is absolutely safe to do so. There is a small efficiency hit associated with this method, since Perl needs to periodically (say, at every statement boundary) check whether a signal handler needs to be executed. This method might also change the semantics of signal handling in subtle ways due to the delayed delivery.
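The idea behind the first method can be imitated in ordinary Perl code: keep the handler trivial, have it only record that a signal arrived, and act on it at a point the program knows to be safe. This is a user-level sketch (the `%pending` hash and the `dispatch_pending` helper are hypothetical names), not Perl's internal implementation, although perls from 5.8 onward do defer signal handlers internally in a similar spirit.

```perl
use strict;
use warnings;

my %pending;        # signal name => deliveries not yet handled
my @handled;        # what the safe-point dispatcher actually did

# The handler does as little as possible: it only records the event.
$SIG{USR1} = sub { $pending{USR1}++ };

# Hypothetical dispatcher, called only at points the program chooses
# as safe (here, between units of work).
sub dispatch_pending {
    for my $sig (sort keys %pending) {
        my $count = delete $pending{$sig};
        push @handled, "$sig x$count";
    }
}

for my $step (1 .. 3) {
    kill USR1 => $$ if $step == 2;   # a signal arrives mid-run
    # ... a unit of work that must not be interrupted ...
    dispatch_pending();              # the chosen "statement boundary"
}
print "handled: @handled\n";
```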

The second variation requires an operating system that supports building a natively threaded version of Perl. (Most operating systems qualify.) Perl spawns a separate thread that waits for signals and processes them as they arrive. This method will result in fewer changes to signal semantics, but it brings with it the baggage of threaded Perl.

There are other improvements in reliability. Regular expressions have a tendency to run out of stack space, since Perl's regex implementation uses recursion. Changes in the regex implementation ensure that this happens less often.

All Perl programs are executed in a two step process. In the first step, Perl internally "compiles" your program into a set of opcodes (affectionately called "ops"). In the second step, the ops are run one-by-one, in a very particular order, determined by the ops themselves. The internal machinery that executes this second step is generally known for short as Perl's runtime.

Now, the runtime machinery can be called recursively by an op. In fact, most code that gets called "magically", like code associated with tied variables, is implemented this way. In 5.004, this reentrancy resulted in reallocation of Perl's internal data structures (like its main "stack"), and cached copies of pointers into these data structures could suddenly become invalid. If you encountered this particular problem, you could most probably get by with just pre-extending the Perl stack with a little trick:

     { my @a = (0) x 20000; }

This way, the stack never needs to be reallocated in the first place. In 5.005, the underlying problem was solved by using a "stack of stacks". A fresh stack is used whenever the runtime reenters, so that local copies of pointers into the stack are guaranteed to remain valid. As a result of these changes, it is now safe to do things like invoking one sort from within another.
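Here is the kind of reentrancy in question: a sort whose comparator itself calls sort(). The groups and the smallest() helper are made up for illustration. The comparator copies $a and $b into lexicals first, so the comparison does not depend on those package variables after the nested sort has used them.

```perl
use strict;
use warnings;

my @groups = ([3, 1, 2], [9, 8], [5]);

# Inner sort: smallest element of a group.
sub smallest { my @s = sort { $a <=> $b } @_; return $s[0] }

# Outer sort whose comparator runs a nested sort.
my @by_min = sort {
    my ($x, $y) = ($a, $b);
    smallest(@$x) <=> smallest(@$y);
} @groups;

print join(' | ', map { "@$_" } @by_min), "\n";
```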

Several memory leaks have been fixed. Embedding multiple Perl interpreters is safer now.

Locale support has been vastly improved:

use locale;
$not_this_again = "déjàvu";
($x) = ($not_this_again =~ /(\w+)/);
($y = $x) =~ s/(\w+)/\U$1/;
($z) = sort ($y, $x);
print "x = $x\n"; # length($x) should be 6.
print "y = $y\n"; # length($y) should be 6 and all uppercase.
print "z = $z\n"; # $z could be equal to $not_this_again.

The behavior of local() on array and hash elements is now well-defined. It used to dump core or cause other undesirable behavior if the localized element was modified.
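A short sketch of localizing a single hash element and modifying it while it is localized. The `%config` hash and the subroutine names are illustrative. On current perls, a localized element whose key did not exist beforehand is deleted again when the scope exits, which the last line checks.

```perl
use strict;
use warnings;

our %config = (verbose => 0);

sub mode { return $config{verbose} ? 'loud' : 'quiet' }

sub loud_report {
    local $config{verbose} = 1;   # only this element is localized
    $config{verbose}++;           # modifying the localized element
    local $config{debug} = 1;     # a key that did not exist before
    return mode();
}

my $inside  = loud_report();
my $outside = mode();
print "inside: $inside, outside: $outside\n";
print "debug key ", (exists $config{debug} ? 'survived' : 'was removed'), "\n";
```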

The Perl configuration process has been improved. 64-bit systems are slightly better supported now. It is easier to maintain persistent site policies for building Perl, so build-time questions don't have to be answered every time Perl is built. See the INSTALL file in the Perl distribution for details. Extension modules that come with architecture dependent files (those that have XS, for instance) are now fully installed in architecture dependent locations.

SECURITY

A few taint leaks (situations where an operation removes taintedness when it shouldn't) and taint omissions (situations where an unsafe operation should taint data, but doesn't) have been eliminated.

The -e switch has been made more secure. Previously, -e switch processing created temporary files on the file system, which could, in theory, be tampered with.

COMPATIBILITY

A few mandatory warnings that were introduced in 5.004 have been made optional, only appearing when -w is in effect. This includes the notoriously useful warning that reads: "my" variable %s masks earlier declaration in same scope. It seems this warning breaks more code than it fixes. We hear you.
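With the later `use warnings` pragma (the lexically scoped counterpart of -w), you can see that this warning is opt-in. The sketch captures warnings through `$SIG{__WARN__}` and compiles the redeclaration with a string eval, so the compile-time warning happens while the sketch is running.

```perl
use strict;
use warnings;

my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

# With warnings enabled, redeclaring a lexical in the same scope warns.
eval q{ my $x = 1; my $x = 2; 1 } or die $@;
my $with = grep { /masks earlier declaration/ } @caught;

# With warnings turned off for that code, the same thing is silent.
@caught = ();
eval q{ no warnings; my $y = 1; my $y = 2; 1 } or die $@;
my $without = grep { /masks earlier declaration/ } @caught;

print "with warnings: $with, without: $without\n";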

Most code written for Perl 4 should still run largely unmodified. There are even a few changes that improve compatibility with the old-age Perl, like the behavior of eval @a. The eval() function now provides a scalar context to its argument, as it used to in Perl 4.
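Because eval() gives its argument scalar context, `eval @a` evaluates the array's element count, not its contents. A minimal sketch with made-up strings:

```perl
use strict;
use warnings;

my @a = ('1 + 1', '2 * 3', '40 + 2');

# Scalar context turns @a into its length (3), and that number is
# what gets evaluated as Perl code.
my $result = eval @a;
print "eval \@a returned $result\n";

# To evaluate one of the strings, say so explicitly.
my $last = eval $a[-1];
print "eval \$a[-1] returned $last\n";
```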

COMPILER

In case you didn't know, the Perl interpreter compiles programs into an internal tree of opcodes before your program is executed. However, this compilation is done on the fly, and the results aren't saved. Thus the compilation cost is incurred whenever your Perl program is invoked.

Perl now comes with a real compiler that can digest a script and emit equivalent C code, which can then be compiled into a native executable. The result won't necessarily run any faster, because it uses the same runtime engine as the regular Perl interpreter. But you won't have to wait for your program to compile.

As with most compilers, the Perl compiler has a number of backends that digest your Perl program in different ways. The previous paragraph described the 'C' backend. Other backends exist too: the 'Bytecode' backend emits precompiled, platform-independent bytecode, which can then be run with the supplied bytecode interpreter. The 'CC' backend generates optimized C code that is the result of semantic translation of straight Perl into not-so-straight C. The CC backend holds the most promise for realizing significant performance improvements, but a number of optimizations have yet to be implemented.

The Perl compiler is also considered a beta version. Since it is only an extension module - it doesn't affect the Perl core itself - the default build will install it.

MULTIPLATFORM SUPPORT

The Win32 support found in the Perl 5.004 sources has been vastly improved. Additional keywords are supported, including times(), wait(), waitpid() and crypt(). Besides Visual C++, two other compilers are now supported: Borland C and mingw32/gcc. The capability to build Perl using gcc is important, because it opens up all manner of do-it-yourself possibilities for Win32 users, including building extensions from the CPAN that require a C compiler.

OS/2 and VMS support have seen a lot of activity to support threads, and other general enhancements. DOS is now a supported platform, via the djgpp compiler. BeOS is now supported too.

See the operating system-specific README files for more information on a particular port.

PERL OBJECT

The Perl Object is a new object-oriented abstraction, implemented in C++. Everything you can do with Perl, you can do with this object. This has two purposes. First, it incorporates the operating system and C features needed by Perl into a set of abstract C++ classes. This makes it easier to identify which features are lacking on a particular system, and to implement them when possible. Second, it makes it possible for the "host" (that is, the entity that creates and uses the Perl Object) to create and use multiple independent Perl Objects within the same process space, possibly under different threads.

The astute reader will note that the Perl Object support described here resembles the already existing support for creating multiple Perl interpreters. The similarity is mostly accurate, but note that the multiple interpreter support makes distinct interpreters share some of the global data space, while different Perl Objects have no data at all in common. Both models have their advantages and disadvantages.

REGEX ENHANCEMENTS

The regex engine has been seriously overhauled. It now supports the following major features.

  1. Positive and negative zero-width lookbehind assertions. These are like lookahead assertions, only the assertion applies to the text that precedes what's being matched.
  2. A zero-width assertion for evaluating arbitrary code. This can be used to perform a side effect when a portion of a regex matches. Since the assertion always holds true, the code itself has no control over the matching behavior.
  3. Independent subexpression assertion. This allows "interpolating" one regex within another. The interpolated regex doesn't backtrack, so it behaves as though it is anchored to the string at whatever point the regex engine is examining at the time.
  4. Conditional branches. These allow the outcome of an assertion to determine which regex "branch" to attempt later.

See the perlre documentation for the details of these features.
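All four constructs fit in one short sketch, using the syntax documented in perlre: (?<=...) and (?<!...) for lookbehind, (?{ ... }) for embedded code, (?>...) for an independent subexpression, and (?(1)...) for a branch conditional on capture group 1. The sample strings are made up.

```perl
use strict;
use warnings;

# 1. Lookbehind: digits preceded by '$'; the '$' itself is not captured.
my @amounts = ('cost $42, tax 7, tip $5' =~ /(?<=\$)(\d+)/g);

# 1b. Negative lookbehind: 'bar' not preceded by 'foo'.
my $plain_bars = () = 'foobar bazbar bar' =~ /(?<!foo)bar/g;

# 2. Code assertion: a side effect each time the group matches an 'a'.
my $steps = 0;
'aaa' =~ /^(?:a(?{ $steps++ }))+$/;

# 3. Independent subexpression: (?>a*) takes every 'a' and never gives
#    one back, so the following 'ab' can no longer match.
my $backtracking = 'aaab' =~ /^a*ab$/      ? 1 : 0;
my $independent  = 'aaab' =~ /^(?>a*)ab$/  ? 1 : 0;

# 4. Conditional: require ')' only if the optional '(' matched.
my @ok = grep { /^(\()?\d+(?(1)\))$/ } ('(42)', '42', '(42', '42)');

print "amounts: @amounts\n";
print "plain bars: $plain_bars\n";
print "code assertion ran $steps times\n";
print "backtracking: $backtracking, independent: $independent\n";
print "balanced: @ok\n";
```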

MORE COMPLETE TIES

Perl has a mechanism whereby arbitrary behaviors can be tied to various basic Perl datatypes. More specifically, there are TIESCALAR, TIEHASH, TIEARRAY, and TIEHANDLE mechanisms to allow regular Perl variables to behave in arbitrary ways that you define.

Previously, the TIEARRAY and TIEHANDLE mechanisms were incompletely implemented. The new release fixes that, allowing the behavior of those two datatypes to be almost completely implemented externally. See Tie::Array and Tie::Handle modules for details.
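A minimal tied-array sketch. UpperArray is a hypothetical class; it inherits from Tie::StdArray (defined inside Tie/Array.pm), which implements every array method on top of a plain array, so only STORE needs overriding to upper-case whatever is stored.

```perl
use strict;
use warnings;

package UpperArray;            # hypothetical example class
use Tie::Array;
our @ISA = ('Tie::StdArray');

sub STORE {
    my ($self, $index, $value) = @_;
    $self->[$index] = uc $value;
}

package main;

tie my @words, 'UpperArray';
@words = ('foo', 'bar');       # list assignment calls CLEAR, then STORE
$words[2] = 'baz';             # element assignment calls STORE
print scalar(@words), " words: @words\n";
```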

NEW MODULES

There are several new modules:
attrs                 sets subroutine attributes
B::                   Perl compiler and tools
Thread::              Perl thread creation and support
Fatal                 make functions/builtins succeed or die
fields                compile-time class fields
ExtUtils::Packlist    manage .packlist files
ExtUtils::Installed   inventory management of installed modules
Test                  framework for writing test suites
base                  declare base classes
Tie::Array            base class for tied arrays
Tie::Handle           base class for tied handles

Many existing modules have been extensively improved:
DB_File      now supports version 2 of Berkeley DB
Benchmark    keeps more accurate time
Cwd          is faster
MakeMaker    supports writing empty makefiles
Debugger     now supports "watching" expressions

MISCELLANY

There is a new composite type, informally known as the pseudohash. This is a data structure written and accessed like a hash, but stored internally as an array. The keys allowed in a pseudohash are declared with a statement such as use fields qw(foo bar). Perl then checks the keys for validity during compilation, making it useful for implementing objects. As of this writing, usage is restricted to "typed" lexical references.

That brings us to lexicals and strong typing. A lexical scalar may be "typed" as being of a particular variety. To declare $spot as being of the Dog type, you'd say this:

   my Dog $spot;

Currently, pseudohashes and the compiler's CC backend are the only two features that take advantage of this facility, but a number of other optimizations can be built around it.

Keywords can be globally overridden by importing user subroutines into the special package CORE::GLOBAL::. Previously, keywords could only be overridden on a per-package basis.
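A sketch of a global override: install a replacement for sleep() in CORE::GLOBAL:: at compile time (hence BEGIN), and code in every package compiled afterwards picks it up. The recording sub, the @sleep_requests array, and the Worker package are illustrative; the override records each request instead of actually sleeping.

```perl
use strict;
use warnings;

our @sleep_requests;

# Must be installed before the calls below are compiled.
BEGIN {
    *CORE::GLOBAL::sleep = sub (;$) {
        push @main::sleep_requests, $_[0];
        return $_[0];
    };
}

package Worker;                # hypothetical package; imports nothing
sub nap { sleep 5; return 'rested' }

package main;

my $status = Worker::nap();
sleep 1;
print "$status; requested naps: @sleep_requests\n";
```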

The lock keyword is new, as is the INIT keyword. These are described later.

There is support for per-interpreter, and possibly per-thread, extension data.

$^E is now supported on Win32, and contains the value of the GetLastError() function. It can be set, which does whatever SetLastError() does on that platform.

The syntax EXPRESSION foreach EXPRESSION is now supported. Note, however, that a lexical declared in a statement modifier's expression is not visible in the part of the statement that precedes the modifier, due to limitations of the parsing technology used in Perl.
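A minimal sketch of the foreach statement modifier, with made-up data. As in a full foreach loop, $_ is aliased to each element in turn, so the modifier form can also update an array in place.

```perl
use strict;
use warnings;

my @prices = (3, 4, 5);
my $total  = 0;

$total += $_ foreach @prices;     # EXPRESSION foreach EXPRESSION
print "total: $total\n";

$_ *= 2 foreach @prices;          # $_ aliases each element
print "doubled: @prices\n";
```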

Slice notation on glob elements is supported. For example, *foo{ SCALAR, CODE} returns a list of two values: a reference to the scalar value, and a reference to the code value, both extracted from the symbol table entry for foo.

Bareword package names can now end in ::. This helps disambiguate barewords when you use the indirect object syntax for method calls: my $spot = new Dog::.

Many new diagnostic messages have been added and can be activated with -w.

prototype('CORE::open') now returns useful results.

exists $Foo::{Bar::} can be used to test whether a package exists.
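A nested package Foo::Bar shows up as the key "Bar::" in the %Foo:: symbol table, so exists can test for it without creating it. In this sketch the keys are quoted, to avoid depending on how the bareword form is parsed; the packages are hypothetical.

```perl
use strict;
use warnings;

package Foo::Bar;              # hypothetical package, defined here
sub hello { return 'hi' }

package main;

my $has_bar = exists $Foo::{'Bar::'} ? 1 : 0;
my $has_baz = exists $Foo::{'Baz::'} ? 1 : 0;
print "Foo::Bar exists: $has_bar, Foo::Baz exists: $has_baz\n";
```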

You may now re-bless an object within its DESTROY() method to delegate its destruction (for instance, to superclasses).
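A sketch of the delegation idiom with hypothetical Animal and Dog classes. When an object is re-blessed into another class during DESTROY, Perl calls that class's DESTROY in turn; an explicit $self->SUPER::DESTROY call is the other way to chain destructors.

```perl
use strict;
use warnings;

our @log;

package Animal;
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { push @main::log, 'Animal cleanup' }

package Dog;
our @ISA = ('Animal');
sub DESTROY {
    my ($self) = @_;
    push @main::log, 'Dog cleanup';
    bless $self, 'Animal';      # re-bless: Animal's DESTROY runs next
}

package main;
{
    my $spot = Dog->new;
}                               # $spot is destroyed here
print join(', ', @log), "\n";
```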

Perl now handles printf() format conversions consistently, regardless of the operating system.

PROBLEMS

PERFORMANCE

Initial tests with a threads-enabled 5.005 indicate a slowdown on the order of 5-10% if you have only a single thread in your application, and 20-30% if you have multiple threads. I view these numbers with vast suspicion - and so should you. For one thing, these numbers may not reflect reality when it comes to real-world code, and for another, the numbers may be very different when Perl is actually released. I suspect the performance won't be worse than these numbers, and predict it will be much better.

Of course, Perl built without threads should be comparable to the last major release, and many individual areas like regexes will be noticeably better.

MAKING YOUR MODULE THREAD-SAFE

While Perl itself will be threads-capable, which parts of CPAN will be is a sticky question. What is clear is that thread-safe modules will have to be designed that way. Some rules of thumb for module design:

  • Don't use global variables, only lexicals (my variables). Lexicals are automatically per-thread. If you must use globals, consider using file-scoped lexicals.
  • Any non-lexicals that you do use should be read-only.
  • Don't hardwire globals of other kinds (like filenames) in your module. Have your functions take them as parameters instead.
  • If you're writing an object-oriented module, store all the state associated with an object within the object itself, and provide access to the state common to all objects in a way that can be locked when needed.
  • If your module includes XS code, ensure that the above rules apply to your C code as well (substituting "auto variables" wherever you see "lexicals").
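Applied to a small object-oriented module, the rules above might look like this sketch. Counter is a hypothetical class; it spawns no threads and shows no locking, only where state lives: mutable state in the object, shared configuration in a read-only file-scoped lexical, and the output filehandle passed in by the caller.

```perl
use strict;
use warnings;

package Counter;               # hypothetical module

# Read-only, file-scoped configuration: never written after load.
my %DEFAULTS = (start => 0, step => 1);

sub new {
    my ($class, %args) = @_;
    my $self = { %DEFAULTS, %args };   # all mutable state lives here
    $self->{value} = $self->{start};
    return bless $self, $class;
}

sub tick {
    my ($self) = @_;
    $self->{value} += $self->{step};
    return $self->{value};
}

# The caller supplies the filehandle; nothing is hardwired.
sub report {
    my ($self, $fh) = @_;
    print {$fh} "value=$self->{value}\n";
}

package main;

my $a_counter = Counter->new;
my $b_counter = Counter->new(start => 10, step => 5);
$a_counter->tick for 1 .. 3;
$b_counter->tick;
$a_counter->report(\*STDOUT);
$b_counter->report(\*STDOUT);
```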

SOURCE INCOMPATIBILITIES

There are very few user-visible incompatibilities at the Perl language level. However, some changes were made to undocumented Perl behavior, and these might bite you if you're unlucky enough to rely on them. Make sure you read the perldelta documentation to catch these changes.

If you defined a subroutine named INIT, you will get unexpected results. INIT subroutines are now special, like BEGIN and END, and are called just before the Perl runtime begins executing the internally-compiled opcodes. If you're affected by this, specify the package when you declare INIT, like so:

sub Foo::INIT {
    # ...whatever...
}
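A small sketch of when an INIT block runs relative to BEGIN and the main program. Run it as a script: INIT blocks compiled after the main run has started (for example, in code loaded with require at run time) are skipped with a "Too late to run INIT block" warning.

```perl
use strict;
use warnings;

our @order;

BEGIN { push @order, 'BEGIN' }    # runs as soon as it is compiled
INIT  { push @order, 'INIT'  }    # runs after compilation, before the main run

push @order, 'main program';
print join(' -> ', @order), "\n";
```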

There's a new lock keyword. It's different from other keywords, because it tries to be smart about whether you meant it as a keyword or a regular subroutine. If Perl can determine at compile time that you declared a subroutine named lock(), and that you did not use Thread, then (and only then) the keyword interpretation is skipped and Perl treats it as a call to your subroutine. As with INIT, you may continue to define and call subroutines named lock() by providing explicit package qualifiers.

Magically called code now has more restrictions. You may not jump out of magically invoked code (say, a FETCH() method) using one of the loop control operators or goto LABEL. (Prior to 5.005, these restrictions were imposed by the operating system in the form of core dumps.)

If you rely on fine details of the signal implementation in prior releases, you may be unable to take advantage of the reliable signal handling support.

The Perl sources have been converted to ANSI C. This was partly necessary to support C++. It might also be a sign of progress, depending on how you feel about ANSI. If you happen to be stuck with a C compiler that doesn't support ANSI C, you might want to investigate free compilers (like gcc) that do. It is also likely that an external conversion tool like ansi2knr will be supported in time for the official 5.005.

If you have written Perl extensions that use XS (and therefore C) code, you might want to ensure that your C code is ANSI-compliant. This is largely a matter of declaring your prototypes in the right way.

There are also a few incompatible changes to Perl's C API made to accommodate threads. These incompatibilities don't impact you unless you build Perl with thread support, but you might want to obey them if you're designing a module for maximum portability.

The threaded version of Perl moves globals that must remain thread-specific into a per-thread structure, thr. Preprocessor macros with the same names as the old globals make access to them look as though threads didn't exist. If your C code accesses these globals, you'll need a dTHR; declaration in some instances. The declaration initializes a pointer to the thread-specific structure. You don't need it if you've already declared a dSP (since dSP also declares dTHR for you). A good way to find out where the declaration is needed is to let your C compiler tell you: just add a dTHR to every function where it complains about thr being undeclared.

The global variables errgv and defgv are now thread-specific, so the old idiom of GvSV(errgv), etc., won't work in threaded Perl (it is still legal in Perl without threads). perl_get_sv("@", TRUE) is the recommended API for accessing all magic globals. This works with older versions also.

BINARY INCOMPATIBILITIES

The incorporation of threading and major bugfixes has introduced a few changes to the internal structure of Perl. This means that extension binaries built with older versions of Perl won't work with 5.005. Most extensions that didn't come with Perl (say, those that you downloaded from the CPAN) are placed in a site library that is shared across Perl versions by default. Upgrading to the next major version will cause binaries in the site library to become incompatible with the newer Perl. You will need to carefully read the INSTALL document in the Perl distribution if you want to continue using the older version(s) of Perl after installing the new one.

Note that the decision to break binary compatibility was not made lightly. We hope not to have to do this again in the near future.

HOW TO COPE

If you are using Perl for production-level tasks, it makes sense to thoroughly test the new Perl at your site before depending on it for mission-critical purposes. Remember, every Perl release is put through its paces by a comprehensive test suite and hundreds of volunteers who have built and tested it on virtually every software and hardware architecture on the planet. However, there is still a chance that your programs rely on a misfeature of a previous version that we "fixed", so it's important to do your own testing.

Here's a suggested plan of action if you're using Perl in a production environment.

  • Thoroughly read the release notes, the INSTALL document, and the perldelta documentation. These cover most of the user-visible changes. If you are interested in the details of a particular change, you should also scan the Changes file. Also read the Changes5.00x files if you are not upgrading from the immediately prior major release.
  • Build Perl and install it in a completely different location from your standard Perl. Don't delete your old Perl yet. Build and test all the extensions you normally use with the new Perl.
  • Assuming you haven't hit any bumps, switch your PATH (or do whatever else it takes to get the new Perl instead of the old one) and try running your production applications. Try this on a weekend if you anticipate trouble. One possible source of trouble is new warnings. The next release doesn't introduce any new mandatory warnings, and even withdraws a few old ones, so the only way you'd see new warnings is if you use -w in your production applications. (This is widely considered a questionable practice.) The -w switch catches a vast number of potential problems, but I normally don't enable it in production applications, because it gives the Perl interpreter carte blanche to make all sorts of noise. That said, if you care at all about the correctness of your Perl code, you should test your production code with -w as soon as possible, just to see whether Perl uncovers any dormant problems.
  • Notify all Perl users in your establishment that a new Perl is being stress tested. Prepare them for any potential problems. Be sure to tell them if you have built Perl in a non-standard or non-default way. Make a clear note of any beta features you included (like threads).
  • If you're satisfied that the new Perl works for you, switch to it. On some systems this may be as simple as pointing a symbolic link at the new installation; on others, it might involve changing the system registry or setting a system-wide path to the new location. Even after doing this, I would keep the old version of Perl around for a few months.
  • If you have found a "confirmed gratuitous incompatibility" and are sure it isn't noted in the documentation, we'd like to hear about it. Send a detailed message about it using the perlbug utility that comes with Perl. If you have been unable to build Perl, you may have to compose your message manually and send it to perlbug@perl.com. Be sure to give enough details about the version of Perl, the operating system, and any other relevant information.
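Part of the testing step above can be automated. The following is a minimal sketch under stated assumptions. The NEW_PERL environment variable is hypothetical and is assumed to name the new perl installed under its own prefix. The check is compile-only (perl -w -c), so it reports syntax errors and compile-time warnings without running the program. It is no substitute for actually running your applications.

```perl
use strict;

# Assumption: NEW_PERL names the freshly built perl installed under its own
# prefix; if it is unset we fall back to whichever perl runs this script.
my $new_perl = $ENV{NEW_PERL} || $^X;

# Compile-only smoke test: "perl -w -c script" parses the script and runs
# its BEGIN blocks and use statements, but not the main program. Returns
# true if the script compiled; any -w warnings go to STDERR for you to read.
sub smoke_test {
    my ($script) = @_;
    return system($new_perl, '-w', '-c', $script) == 0;
}

# Demo: a throwaway file standing in for a production script.
my $demo = "smoke_demo.$$.pl";
open(DEMO, "> $demo") or die "can't write $demo: $!";
print DEMO 'my $total = 1 + 1;', "\n";
close(DEMO);

my $ok = smoke_test($demo);
unlink $demo;
print "$demo: ", ($ok ? "compiles" : "FAILED"), "\n";
```

In real use you would loop smoke_test over your production scripts and read the warnings it prints before switching PATH.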

POSSIBILITIES

Some features have been discussed but not yet included in the development track. This section summarizes them.

LEXICAL WARNINGS

In current versions of Perl, -w is a global switch that turns all warnings on. There is no way to turn it on or off for particular scopes. You can localize $^W, of course, but local gives it dynamic rather than lexical scope, so external code might set it and thus "break" a module that was never meant to be run with warnings enabled. The boolean nature of the switch also means that warnings have no granularity: you either get all of them or none.

Lexical warnings change all of this. $^W becomes a lexically scoped variable, and can be set at compile time to enable or disable specific classes of warnings. This provides much more fine-grained control over warnings. It also allows new warnings to be added without fear of breaking old code.
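The dynamic-scoping problem with $^W is easy to demonstrate. In this sketch, the Legacy package and its join_pair function are hypothetical stand-ins for a module written without warnings in mind. The caller localizes $^W, and a $SIG{__WARN__} handler traps the resulting warnings so they can be counted.

```perl
use strict;

$^W = 0;    # start with the global warning switch off

package Legacy;
# A module written without warnings in mind: it happily concatenates undef.
sub join_pair { my ($x, $y) = @_; return $x . $y }

package main;

my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $r1 = Legacy::join_pair("a", undef);      # $^W is off: no warning
{
    local $^W = 1;                           # dynamically scoped...
    my $r2 = Legacy::join_pair("a", undef);  # ...so Legacy warns too
}
print scalar(@caught), " warning(s) from code that never asked for them\n";
```

Only the second call warns, but that call still reaches into Legacy. Here local changes $^W for as long as the block runs, including every subroutine called from it, rather than for the text of the block alone. Lexical warnings are meant to close exactly this gap.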

foreach OPTIMIZATION

When you say foreach (1..10000000), Perl creates a ten-million-element list, gobbling up lots of memory. We hope to implement this list as an iterator, so that its memory needs would be negligible.
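Here is the idea behind the iterator, sketched at the Perl level with a closure. range_iter is a hypothetical helper and does not show how the interpreter itself would do it. The point is that only the current counter is kept in memory, never the whole list.

```perl
use strict;

# Hypothetical helper: returns a closure that yields $lo..$hi one value per
# call and undef when exhausted, so memory use stays constant.
sub range_iter {
    my ($lo, $hi) = @_;
    my $next = $lo;
    return sub {
        return undef if $next > $hi;
        return $next++;
    };
}

my $it  = range_iter(1, 1_000_000);
my $sum = 0;
while (defined(my $n = $it->())) {
    $sum += $n;
}
print "$sum\n";    # prints 500000500000
```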

MORE GENEROUS INTERPRETATION OF TEXT FILES

For historical reasons, text files on different platforms have different notions about what begins a new line of text. On Unix platforms, the newline character is the linefeed (ASCII 10), and the operating system makes no distinction between text files and binary files. On DOS and its descendants (OS/2 and Win32), files can be accessed in two distinct modes: binary mode and text mode. Binary mode behaves just the same as Unix. In text mode, however, a newline is a carriage-return (ASCII 13) followed by a linefeed (ASCII 10). On the Macintosh, a newline is represented by ASCII 13.

The next version is likely to be more generous in accepting text files from all three universes, by treating both ASCII 13 and ASCII 10 as whitespace, and by intuiting a consistent interpretation of "newline" for a given source file from its actual contents.
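The kind of heuristic described above can be sketched in a few lines. guess_newline and split_lines are hypothetical helpers, not part of Perl. The sketch checks for CR LF first, then for a bare CR, and otherwise assumes Unix line feeds. It uses octal escapes (\015 for CR, \012 for LF) instead of \r and \n because the meaning of \n varies by platform (under MacPerl it is CR).

```perl
use strict;

# Hypothetical helper: guess a source's newline convention from its bytes.
# CR LF means DOS/Win32 text mode, a bare CR means Macintosh, otherwise Unix.
sub guess_newline {
    my ($text) = @_;
    return "\015\012" if $text =~ /\015\012/;
    return "\015"     if $text =~ /\015/;
    return "\012";
}

# Split text into lines, accepting any of the three conventions.
# CR LF is tried first so it is not mistaken for two line breaks.
sub split_lines {
    my ($text) = @_;
    return split /\015\012|\015|\012/, $text;
}

my $dos  = "use strict;\015\012print 1;\015\012";
my $mac  = "use strict;\015print 1;\015";
my $unix = "use strict;\012print 1;\012";

for my $src ($dos, $mac, $unix) {
    my @lines = split_lines($src);
    my @codes = map { ord($_) } split(//, guess_newline($src));
    # prints "2 lines, newline = 13 10", then "... = 13", then "... = 10"
    printf "%d lines, newline = %s\n", scalar(@lines), join(' ', @codes);
}
```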

PREDICTIONS

Thus far, you've heard my guesses about the upcoming release, which are based in reality. Now I'll try to predict what will happen after 5.005 is released. These are opinions, nothing more.

I believe that threading will be alpha quality. Various unresolved issues have been identified, and it seems unlikely that all of them will reach a satisfactory conclusion before release time. For example, the important question of how to limit execution of thread-unsafe modules hasn't been resolved. Another is what to do about data in global symbol tables that needs to be used in a thread-specific fashion; a good example is opening files in multiple threads. Perl stores filehandles in the global symbol table, so multiple threads would trample over one another's state if they all used the same filehandle to open files. Ideally we would have a way to store filehandles as lexicals, so that they would automatically be per-thread. The Symbol module can help by generating anonymous globs to hold our filehandles, but the general problem remains hard.
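The Symbol workaround mentioned above looks like this in practice. Symbol::gensym returns a reference to a new anonymous glob, so each call gets its own filehandle instead of sharing a bareword like FH in the symbol table. This is a minimal sketch: open_for_read and the scratch file name are hypothetical. It avoids only one shared global and does not by itself make any code thread-safe.

```perl
use strict;
use Symbol qw(gensym);

# Open a file on a fresh anonymous glob from Symbol::gensym instead of a
# bareword like FH, so no two callers share one symbol-table entry.
sub open_for_read {
    my ($path) = @_;
    my $fh = gensym;
    open($fh, "< $path") or die "can't open $path: $!";
    return $fh;
}

# Demo: write a scratch file, then read it through two independent handles.
my $path = "gensym_demo.$$.txt";
my $out  = gensym;
open($out, "> $path") or die "can't write $path: $!";
print $out "first\nsecond\n";
close($out);

my $fh1 = open_for_read($path);
my $fh2 = open_for_read($path);
my $a1  = <$fh1>;    # "first\n"
my $a2  = <$fh1>;    # "second\n": $fh1 has advanced
my $b1  = <$fh2>;    # "first\n": $fh2 keeps its own position
close($fh1);
close($fh2);
unlink $path;
print $a1, $a2, $b1;
```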

I feel that it will take a few revisions to iron out all the wrinkles in the performance of threaded Perl. There are likely to be race conditions that haven't been identified, since not much Perl code uses threading yet. The development of real threaded applications will spur this refinement.

The compiler is also likely to have bugs, and a few important capabilities are still unimplemented in the C backend. For example, compiling modules that use XSUBs or the AutoLoader is still tricky.

There has been some talk of concentrating on performance improvements after the next major release. This will most likely happen in the maintenance branch.

The Perl repository is still isolated from most of the active contributors in perl5-porters. Setting up a transparent mechanism for allowing people to peek into the repository will be important for keeping up the momentum and the "release early, release often" credo.

Larry is working on supporting Unicode within Perl. This will be an important feature for later versions of Perl, both for interoperability with other Unicode environments and for internationalization.

XML (the Extensible Markup Language) is an emerging standard for representing web content. It has been talked about a lot in Perl circles lately. Perl-based XML translation tools will become available soon (Larry is working on one).

PRICE

$0.00.

If you are feeling good about yourself and/or about Perl, send an explanatory note to perl-thanks@perl.org. Consider becoming a member of The Perl Institute (www.perl.org) and subscribing to The Perl Journ...umm, never mind.


Gurusamy Sarathy (the first name is silent; friends just call him saa-raa-thee) has played an active role in "fixing" perl in various ways over the last four years. He works and studies at the University of Michigan and can be reached at gsar@engin.umich.edu. His Ph.D. thesis is reportedly about a way to make electrons as wholesome as atoms, so that virtually all reality can be fixed.