Felix S. Gallo
There's been talk about a new interpreted language designed with the Internet and the World Wide Web in mind, one which combines safety, object-oriented power, near-native speed and platform independence with a rich and extensible class framework.
There's also been more than a little hype about a competitor to this language: an obscure, relatively unsafe, complex Internet language upstart which was born from a failed effort to make kitchen appliances programmable. Does this language, inexplicably named after coffee, have a chance against Perl's Penguin?
A Peek at Penguin
Penguin is a freely available set of Perl 5 modules which implement the transmission, receipt, setup and execution of Perl code between any two computers on any kind of network. Like Java, its intent is to ease and promote Internet application development; unlike Java, it's designed with philosophies and features which simplify development of complex non-toy applications.
The basic tenet of Penguin is that you can write code and send it to a remote computer to be executed, and conversely, that you can receive code from somewhere else and reliably trust that it won't scribble obscenities on your disk drive where your precious data used to be. This promise enables all sorts of interesting programs, ranging from tiny functional applets embedded in web pages to massive intelligent agents which roam the Internet talking to other agents, buying information and acting to serve your interests.
These futures are brought into view by Penguin's main paradigm: all computers are peers. In this model, there is no distinction between a client and a server; in fact, two computers might simultaneously send each other Penguin programs to accomplish a transaction. On the Internet, this paradigm makes more sense than Java's client-centric model. After all, although the local client computer may be able to display great graphics, it's statistically 30 million times more likely that some remote computer holds the information or data which the client needs to display.
The idea of executing someone else's code (or indeed of sending your own code somewhere else) raises the issue of security. How can you be sure that the code, when run, won't send hate mail to the White House or play the delete fandango on your favorite spreadsheet? Penguin solves this problem with a three stage process.
First, all Penguin code carries with it a cryptographic "digital signature," which guarantees that the code is intact and that a specific someone (the "signing authority") took personal responsibility for the code going out on the network. Once signed, the code and the signature cannot be tampered with in any way.
Second, Penguin formalizes the idea of trust. When your computer receives some Penguin code, the Penguin-enabled Perl interpreter figures out who signed it and then checks to see how much you trust them. Penguin enables you to allow or deny each signing authority's code to execute any specific Perl operation. For example, you might allow all Felix Gallo-signed code to use simple math, arrays, hashes, and print(), but not use loops, open(), system(), or write() - but you might allow Larry Wall-signed code to do anything it wants. The choice of operations to limit and allow is completely customizable. Further, you can limit the amount of memory, disk space, and self-knowledge that each signing authority is allowed to access.
Lastly, Penguin compiles and runs all code in a special interpreter with illegal functions "masked out." If the code attempts to perform any operation you've disallowed, it is loudly rejected at compile time before execution. The moment a Penguin program attempts to cross the bounds of its allowed memory or disk space at run time, or when it uses up more time than you've allowed, it is terminated with extreme prejudice by the operating system.
By comparison, the marketing team at Sun has heavily promoted the fact that Java uses formally "provable" methods and a bytecode verification pass to ensure that Java programs don't run amok - but they have yet to explain why the Computer Emergency Response Team has twice had to issue global warnings of major Java viruses and bugs in the system. Interested parties are free to find and fix bugs in the source themselves, of course - as long as they have $250,000 for the license.
Another pillar of Penguin's power is its extensible class system. Although the idea of a software agent scampering around the net gathering information is neat by itself, implementing such a beast without a standard set of interfaces and protocols for all executing environments to share would be untenably difficult.
To help solve this problem, Penguin provides each executing program with two lists. The first list is the "law" of the environment; it details what the program can and cannot do. The second list is the "way" of the environment: it lets the program know what the local environment is for and provides clues for what the program might want to do. For example, if the computer is a storefront for selling shoes, the second list might tell all executing programs that the function &buy() is available, and that there are 50 pairs of brown wingtips available for $65 each.
Interface standardization helps, but what about extension? Like Perl, Penguin takes an intellectual-free-market approach by permitting Penguin environments to overload, extend or perform radical surgery on any class or function. If a site administrator wants incoming programs to have access to functions outside the accepted set, she's free to implement them and make them available.
Figure 1: A Machine Sending Code to Another
This draws a sharp distinction from Java, which has no standard methods for letting programs know what sort of environment they're running in and no master library of classes for an "Internet agent" to use. Confessedly, Java has a superior set of graphical user interface classes, but Penguin's catching up - it has interfaces for OpenGL, the only cross-platform freely available three-dimensional graphics standard currently supported in hardware, and Tk, Sun's own popular user interface library.
The Penguin story is already a runaway winner by the time we get to the most subtle and yet most compelling part: the underlying language. While language evaluations and comparisons are notoriously complex and fraught with emotional boosterism, the issues for Internet application languages are fairly clear-cut and well-defined.
A good language must handle data well. The Internet is expanding exponentially because people are entranced with the possibilities that accompany the avalanche of information. All of humanity's secrets, from its foibles to its successes, seem just two mouseclicks away. And as we go now from the raw document phase, in which we must gather information and interact ourselves using crude browsers, to the enhanced information phase, in which the deluge of content requires that we make tools to optimize our time and put the burden on computers, our language must filter, search, manipulate and process arbitrary data in a fast and powerful way.
Perl is, of course, very good with information. Regular expressions, associative arrays, references, filehandles and typelessness are all features which enhance Perl's suitability for major data manipulation jobs.
Perl's strength is Java's weakness. Java, based on C++, still shows its systems implementation language ancestry. Although it has some good "stream" classes which help get data from the net to the program, Java's very poor at handling large quantities of information. Lacking regular expressions, and feeling the constrictive effects of a rigorous type system, Java programmers must write great swaths of code to accomplish text manipulation tasks that are relatively simple in Perl.
An Internet application language must also be a big tent. Just as the Web has proven that the media giants alone aren't capable of satisfying humanity's entire thirst for information, the Internet is proceeding to prove that no one software company (in fact, no thousand software companies) can encompass users' ever growing needs for ever more personalized programs. Our language must provide as many people as possible with the ability to do as much as possible.
Perl is strong in this area too. Although much has been said about its initial learning curve, its nature lends itself well to incremental learning. With just $scalars, <FILEHANDLES>, open() and print(), new users can see results that clearly lead to important data manipulation programs. The powerful bits, such as regular expressions, need not be learned all at first or all at once. As a result, Perl is comfortable for even the occasional programmer.
Java is far harder to grasp than Perl. The strictures and rules it applies take effect even before programming starts -- if the file the program is in isn't named according to exacting rules, the compiler won't even run. Going a step further along the road, the programmer must understand classes, objects, methods, functions, the 'public'/'private' distinction, return values, the special 'void' return value, 'static' class members, typed arrays, Java's 'System' class and a function called 'println' before she can make "hello, world!" appear on a screen. Worse, esoteric concepts such as threading, exception handling, and garbage collection are mandatory for even intermediate programmers. Java may be an easy step up for those experienced with C and C++, but it's a mountain for those who haven't taken programming as their profession.
The last important feature of an Internet application language is its scalability. As the Internet expands to fill all available bandwidth and explore new frontiers, it forces the creation of large, sophisticated programs to handle its boundaries. Small programs written carelessly are extended and extended again to serve ever increasing appetites. A good Internet application language must make software engineering possible in the face of even the Internet's unchecked rapaciousness.
Perl is moderately good at permitting careful growth. Penguin, too, is designed with extensive scaffolding to permit future expansion. However, the easy-going nature of the language and the fact that it doesn't enforce anything but the most basic syntactical correctness can result in poor coding discipline and practices which inhibit reuse. As a result, Perl can be difficult to write long-term reusable code in. The only consolation is that on the Internet, most code has a lifespan of months anyway.
Figure 2: Java's Complexity Onramp vs Perl's
Java is better than Perl in the scalability area. It has flaws, such as its odd way of implementing multiple inheritance, its typefulness, and the too-numerous 'finalized' (non-extensible) classes in its provided libraries, but it gains from being derived from a software engineering language. Once people stop writing musical animated bouncing head applets, Java will likely prove a good system for programming in-the-large.
So, Perl with Penguin is a safer, more consistent, more powerful, and more suitable Internet application language. Its network model is in line with the Internet's; its security relies on widely known cryptographic methods, rather than on unknown hidden algorithms that aren't, apparently, working; it provides interfaces specifically designed for agents; and the underlying application language addresses the core needs and strengths of the Internet.
Penguin is freely available (under Perl's Artistic License) on the CPAN site nearest you. The first public release ("One Penguin") is now available - feel free to download it, test it, beat the heck out of it, and submit suggestions, comments, criticism and patches to me. Really interested people should join the Penguin mailing list by sending mail to penguin-request@amicus.com.
Next issue, in 'Penguin: The First Tentative Waddle', we track the implementation of an honest-to-Betsy Internet agent. Stay tuned!
Felix Gallo goes with the floe in Austin, Texas as the lead Perl hacker for Amicus Networks.