SWIG is a freely available compiler developed by Dave Beazley, now at the University of Utah. It works on many Unix variants, as well as NT, Win95, and MacOS, although the Win95 and MacOS versions are less polished. It converts C and C++ files into interface code (called a "wrapper") that makes the contents available to Perl, Tcl, Guile, and Python - all automatically. Like Perl's own XS language, SWIG creates Perl wrappers around C code. Unlike XS, SWIG requires little or no programming effort.
I'm not going to describe the mechanics of SWIG in detail; a more thorough treatment would address function prototypes, complex data structures, multiple inheritance, Perl classes, Tcl 8 modules, exception handling, makefile generation, strict type checking in the interpreter, conditional compilation, and automatic generation of documentation. There's an excellent user's guide bundled with SWIG that covers all of these topics; in this article, I'll just show you the basics: how to take an existing application and create a Perl wrapper around it.
This tutorial is divided into three sections. First, in "Take a Sip," I'll show you a simple use of SWIG: creating a Perl wrapper around a lone C function. Second, you'll see how to make C data structures available to Perl via what SWIG calls an interface file, in "Take Another Sip." Finally, in "Take a SWIG," I'll take the source code for an entire utility and "port" it to Perl.
Along the way, you'll see examples of how SWIG is invoked, how shared libraries are built, and how the resulting Perl modules are used. Finally, I'll demonstrate how to write a Perl script that does a great job of imitating top - a widely-used process monitor - because it uses the actual top source code.
Hooks by Hand
I first started using Perl and Tcl in the early 1990s, when both were new. Perl was an immediate hit, and I happily used a2p and s2p to convert all my awk and sed scripts to Perl. Then, looking at the generated Perl for clues, I converted my sh scripts too. Tcl, while not nearly as powerful as Perl, had the nifty feature of allowing user-defined C functions to be exported to the interpreter. While Perl would eventually do that as well, it was still years away. (I was using Perl 3, after all.)
I spent an inordinate amount of time trying to simplify the writing of Tcl glue code. Eventually an entire library was developed that used a C template structure to convert a vector of Tcl string arguments into C data structures. This made it much easier to call C functions without an argc/argv style interface. Even so, the linkage between C and Tcl was still a hassle.
Perl caught up with Tcl when XS was introduced. Not only could developers specify how to convert between the compiled and interpreted worlds, but the compiled code could be dynamically loaded via a shared library. On the surface, this seemed a wonderful solution.
The problem with interface code is that, well, you have to write it. That might be acceptable if you're creating entirely new code to be made available via a Perl wrapper, but what about legacy code? Wouldn't you really rather develop new programs than spend your time creating wrappers around old code?
Take a Sip: Wrapping a C Function
Let's start out with a simple example. Many people who post to the Usenet newsgroup comp.lang.perl.misc ask how to determine the amount of time used by chunks of their Perl programs. The usual solution is to use one of the timing modules, such as Benchmark.pm. To demonstrate SWIG, we'll create and use a C function instead.
The file elapsed.c contains nothing but the elapsed_seconds() function.
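As a rough illustration of the kind of function involved, here is a minimal sketch. The body is an assumption rather than the article's own listing: it assumes a POSIX system with gettimeofday() and returns the wall-clock time in seconds, so that the difference between two calls gives the elapsed time.

```c
/* Hypothetical elapsed.c: one way such a function could be written.
 * Assumes a POSIX system that provides gettimeofday(). */
#include <stddef.h>
#include <sys/time.h>

/* Return wall-clock time in seconds, with microsecond resolution,
 * or -1.0 if the clock cannot be read. */
double elapsed_seconds(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1.0;
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}
```

A caller would record the value before and after the code being timed and subtract the two.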
Because elapsed.c is so simple, we can feed it directly to SWIG:

    % swig -perl5 -shadow -module Elapsed elapsed.c
    Generating wrappers for Perl 5

This generates three files: the Perl module in Elapsed.pm, XS wrapper code in elapsed_wrap.c, and documentation in elapsed_wrap.doc.
I won't discuss elapsed_wrap.doc. It's enough to say that SWIG's surprisingly rich documentation generation can create plain ASCII, HTML, or LaTeX. There are options to locate, extract, and format comments from the source code. As with most configuration preferences, the options can be selected either by the SWIG command line or via directives embedded in the source code.
SWIG isn't a full C/C++ parser; some snippets of code will give it fits. So instead of throwing full-fledged C programs at it, it's more common to process merely a .h (header) file. If you're going to be intermixing C/C++ source code with SWIG directives, then an interface file (ending in .i by convention) would be a better choice. SWIG defines the SWIG preprocessor token, so you can make a portion of the C source code visible only to SWIG by enclosing it between #ifdef SWIG and #endif lines, or render it invisible to SWIG with #ifndef SWIG and #endif.
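The conditional-compilation trick can be sketched as follows. The header name, the include guard, and the seconds_to_millis() helper are hypothetical and exist only to show which parts each tool sees; this is an illustration, not a file from the SWIG distribution.

```c
/* elapsed.h: hypothetical header read by both the C compiler and SWIG. */
#ifndef ELAPSED_H
#define ELAPSED_H

#ifdef SWIG
/* Only SWIG sees this directive; the C compiler skips it. */
%module Elapsed
#endif

/* Both tools see this prototype, so SWIG generates a wrapper for it. */
double elapsed_seconds(void);

#ifndef SWIG
/* Hidden from SWIG: no wrapper is generated for this helper. */
static double seconds_to_millis(double seconds)
{
    return seconds * 1000.0;
}
#endif

#endif /* ELAPSED_H */
```

The C compiler never defines SWIG, so it ignores the %module line and compiles the helper; SWIG does the opposite.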
In addition to providing the name of the source module, we need to tell SWIG a few other things. In the command line above, we set the output language to Perl, request shadow classes (more on those later), and set the module name to Elapsed.
The Elapsed.pm and elapsed_wrap.c files generated by SWIG constitute a full-fledged Perl extension. They verify function arguments, translate them into C data structures, invoke the functions, and translate return values into a form palatable to Perl. They handle not only functions, but global variables and read-only constants as well. The wrapper, along with the original source file, is compiled and turned into a shared library.
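To make those four jobs concrete, here is a toy model of a wrapper. It is not SWIG output and uses no Perl API: the interpreter's arguments are modeled as plain strings, and scale(), to_double(), and wrap_scale() are hypothetical names invented for this sketch.

```c
/* Toy model of a wrapper's job. This is NOT SWIG output and uses no Perl
 * API: the interpreter's arguments are modeled as plain strings. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* The "legacy" C function being wrapped (hypothetical). */
static double scale(double value, double factor)
{
    return value * factor;
}

/* Convert one string argument to a double; return 0 on success. */
static int to_double(const char *text, double *out)
{
    char *end;

    errno = 0;
    *out = strtod(text, &end);
    return (errno != 0 || end == text || *end != '\0') ? -1 : 0;
}

/* Wrapper: verify arguments, translate, invoke, translate the result.
 * Returns 0 on success and writes the result as text into `result`. */
int wrap_scale(int argc, const char *argv[], char *result, size_t size)
{
    double value, factor;

    if (argc != 2)                          /* 1. verify argument count */
        return -1;
    if (to_double(argv[0], &value) != 0 ||  /* 2. translate to C types  */
        to_double(argv[1], &factor) != 0)
        return -1;
    /* 3. invoke the C function; 4. translate the return value */
    snprintf(result, size, "%g", scale(value, factor));
    return 0;
}
```

A real generated wrapper does the same steps against the interpreter's own value types, which is where most of its bulk comes from.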
One of the obstacles to using SWIG (and nearly everything else) on different platforms is the variety of ways to create shared libraries. Here's the particular incantation for FreeBSD:
    % gcc -DPIC -fpic -I/opt/perl5.004/lib/i386-freebsd/5.00403/CORE \
          -c elapsed_wrap.c
    % gcc -DPIC -fpic -Wall -c elapsed.c
    % ld -Bshareable -L/usr/local/lib -o Elapsed.so \
          elapsed_wrap.o elapsed.o
You'll need to figure out what commands and options your platform requires. This information should be available from your Perl configuration; here's an ugly but effective make rule that appends the pertinent variables to the Makefile. Whenever I move to a new platform, make localvars deletes the old variables and adds the new ones. It's not pretty, but it works.
    localvars:
            perl -MConfig -e ' \
            printf("LD = %s %s\n", $$Config{ld}, $$Config{lddlflags}); \
            printf("CC = %s %s\n", $$Config{cc}, $$Config{cccdlflags});\
            printf("PERLINC =-I%s/CORE\n", $$INC[0]);' \
            >> Makefile
Our test of elapsed_seconds() is encapsulated in a Perl program called elapsed. This program computes the Fibonacci sequence and the ratio between successive numbers in the sequence, which converges to the Golden Mean. We'll use elapsed_seconds() to time our program. (Remember, this is only so that we can demonstrate SWIG; to time Perl code you can always just use Perl's own Benchmark module, bundled with the standard distribution.)
SWIG's home page is https://www.cs.utah.edu/~beazley/SWIG/swig.html.
You can download it from the CPAN site nearest you in CPAN/authors/Dave_Beazley, or from ftp://ftp.cs.utah.edu/pub/beazley/SWIG.
To join the SWIG mailing list, send a message saying subscribe swig to Majordomo@cs.utah.edu. The message volume is low and the signal-to-noise ratio is high, because Dave Beazley responds to most questions himself.
As you'd hope, the calls to the C function elapsed_seconds() and the assignments of $before and $after look like regular Perl. Once you include the C function with use Elapsed, you can treat it like any other Perl subroutine; the fact that the underlying code happens to be written in C makes no difference.
Here's the output of our elapsed program:
    1: 1
    2: 1 1
    3: 2 2
    4: 3 1.5
    5: 5 1.666666667
    6: 8 1.6
    7: 13 1.625
    8: 21 1.615384615
    9: 34 1.619047619
    10: 55 1.617647059
    11: 89 1.618181818
    12: 144 1.617977528
    13: 233 1.618055556
    14: 377 1.618025751
    15: 610 1.618037135
    16: 987 1.618032787
    17: 1597 1.618034448
    18: 2584 1.618033813
    19: 4181 1.618034056
    20: 6765 1.618033963
    Elapsed time is 0.001777 seconds.
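The arithmetic behind this output can be checked independently of SWIG. The following C sketch is an assumption about the computation, not a translation of the Perl program: it prints each Fibonacci number followed by its ratio to the previous one. The function name and the %.10g format are choices made for this illustration.

```c
#include <stdio.h>

/* Print the first `count` Fibonacci numbers and the ratio of each to its
 * predecessor; return the last ratio computed (0.0 if count < 2). */
double print_fibonacci_ratios(int count)
{
    unsigned long prev = 0, cur = 1, next;
    double ratio = 0.0;
    int i;

    for (i = 1; i <= count; i++) {
        if (i == 1) {
            /* The first term has no predecessor, so no ratio is shown. */
            printf("%d: %lu\n", i, cur);
        } else {
            ratio = (double)cur / (double)prev;
            printf("%d: %lu %.10g\n", i, cur, ratio);
        }
        next = prev + cur;   /* advance the sequence */
        prev = cur;
        cur = next;
    }
    return ratio;
}
```

With a count of 20 the final ratio is 6765/4181, which is already within about 3e-8 of the Golden Mean.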
Take Another Sip: Interface Files
Now we'll provide SWIG with an interface file, which lets us use SWIG pragmas. We'll skip over compilation and linking, and look at a few complex data structures instead.
In gettime.i, shown below, we set the module name, make the global variable errno read-only, and ask for default structure constructors. We also define two time structures and provide the prototypes for gettimeofday() and settimeofday().
    %module Gettime;        // Alternative to command line
                            // arguments (for naming)
    %readonly               // Make all variables read only.
    int errno;
    %readwrite              // Restore default behavior.
    %pragma make_default;   // Generate default constructors.

    struct timeval {
        long tv_sec;        // seconds
        long tv_usec;       // and microseconds
    };
    struct timezone {
        int tz_minuteswest; // minutes west of Greenwich
        int tz_dsttime;     // type of dst correction
    };
    int gettimeofday(struct timeval * tp, struct timezone * tzp);
    int settimeofday(const struct timeval * tp, const struct timezone * tzp);
We can now build the module much as before. This time we don't need the -module option, since the name of the module is set in the interface file with the statement %module Gettime.

    % swig -perl5 -shadow gettime.i
    Generating wrappers for Perl 5
    % gcc -DPIC -fpic -I/opt/perl5.004/lib/i386-freebsd/5.00403/CORE \
          -c gettime_wrap.c
    % ld -Bshareable -L/usr/local/lib -o Gettime.so gettime_wrap.o
Now using gettimeofday() is easy:
    #!/usr/bin/perl -w
    use Gettime;
    my $tv = new timeval();    # Allocate a timeval structure

    # Below, undef maps to a null pointer
    Gettime::gettimeofday($tv, undef) &&
        warn("gettimeofday() failed, errno = $Gettime::errno.\n");

    # The shadow option is what allows these symbolic references
    # to structure fields.
    printf("Time is %d.%06d\n", $tv->{tv_sec}, $tv->{tv_usec});
The script's output:
    Time is 877438914.248738
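For comparison, this is roughly what the same call looks like from C, which is the work the generated wrapper ends up doing on Perl's behalf. The function name print_time_of_day() is hypothetical, and the sketch assumes a POSIX system.

```c
#include <stdio.h>
#include <sys/time.h>

/* Fill `tv` with the current time and print it the way the Perl script
 * does. Returns 0 on success, -1 on failure (errno is set). */
int print_time_of_day(struct timeval *tv)
{
    /* A NULL second argument corresponds to undef in the Perl script. */
    if (gettimeofday(tv, NULL) != 0) {
        perror("gettimeofday");
        return -1;
    }
    printf("Time is %ld.%06ld\n", (long)tv->tv_sec, (long)tv->tv_usec);
    return 0;
}
```

The Perl version differs mainly in that the timeval is allocated by the SWIG-generated constructor rather than declared on the stack.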
Let's look at a program that doesn't work. This script tries to set the time, but fails because it wasn't run by the superuser. Then it fails again because $errno is read-only.
    #!/usr/bin/perl -w
    use Gettime;
    my $tv = new timeval();              # Allocate timeval structure.
    $tv->{tv_sec} = $tv->{tv_usec} = 0;  # Turn back the clock.
    # This if statement will fail unless you're root
    if (Gettime::settimeofday($tv, undef)) {
        warn("settimeofday() failed, errno = $Gettime::errno.\n");
        # This fails since it's read-only
        $Gettime::errno = 0;
    }
The output:
    settimeofday() failed, errno = 1.
    Value is read-only. at ./Gettime-test.pl line 9.
A Top Notch Utility
The remainder of this article shows how the power of top can be made available to Perl. The top utility, developed by William LeFebvre and a cast of dozens for more than a decade, is a great system utility similar to ps: It displays a system summary followed by a listing of processes. Unlike ps, the display is updated at regular intervals. There are various other nifty features, but the kicker is that top is portable. Source code is available from ftp://ftp.groupsys.com/pub/top.
Before we get started, I'll point out three reasons why I chose top. First, since I didn't write it, it serves as a good test of adapting legacy code to a new environment. Second, top was written with portability in mind, and that makes our job easier. Version 3.4 runs on over two dozen Unix variants - pretty unusual for a program so sensitive to internal kernel structures. top's portability makes it an ideal candidate for SWIG. Finally, I've always wanted to have access to top's information in my system monitoring scripts without having to decode the internal structures of yet another operating system.
One of top's header files, machine.h, contains three structure definitions and a few function prototypes. That's it. The operating system-specific code for each port need only populate an array of those structures. Those three structures and a few functions are all SWIG needs to know about.
My interface file, top.i, has two sections. The first section, delimited by %{ and %}, is literal source code that will be needed by the Perl extension generated by SWIG. This section is opaque to SWIG and can be as complex as necessary. The code after the %} is almost straight C too; in fact, if the special %include statements weren't necessary, the entire interface file could be nothing but C code.
From %{ to %}. This section contains three functions to be included in the Perl extension. The printable() function mimics a function in the top source code. The original is in a file that, if compiled and linked, introduces many more platform dependencies. So to keep the number of dependencies to a minimum, I just duplicated what I needed.
The two functions after printable(), full_format_header() and full_format_next_process(), are only needed in the interface file for their complete prototype definitions. As you can tell, the original prototypes in machine.h lack arguments, so we have to provide them here. (As LeFebvre points out, this code predates ANSI C and is due for an overhaul.)
Complete prototypes are required, and for a good reason. SWIG isn't as permissive as Perl. It doesn't have Perl's anything-goes attitude, and in fact performs extensive type checking of function arguments. This should be reassuring to people leery of integrating a low-level application into a typeless language.
After the %{... %} block. The next lines in top.i are a few %include statements. The first two import a couple of SWIG's built-in interface files: pointer.i and typemaps.i. SWIG supports basic data types, but more sophisticated types (structures, arrays, complex pointers, and the like) require additional help. These built-in interface files handle common C constructs, such as a null-terminated array of strings represented as a char **.
The next two include statements pull in two top header files: top.h and machine.h. The first contains some constants, and the second, as you've already seen, contains the portable structures used by top.
Finally, the five externs at the bottom of top.i are function prototypes. The first two should look familiar; they need to be included so SWIG will know to generate wrappers. (Remember, the earlier code block was opaque to SWIG.) The last three statements are prototypes for the top functions we'll be calling from our Perl program.
If this interface file seems overly complex, it's due to my desire to leave the legacy top sources untouched. Were this new code, the interface file might have been just a few include statements - or the .h files might even have been used directly.
You can have SWIG process the interface file as follows (assuming you're in the same directory as the code). The only new option, -Itop-3.4, indicates where to look for include files. The other options should all look familiar:
% swig -Itop-3.4 -perl5 -shadow -module Top top.i
SWIG then creates three files: top_wrap.c, Top.pm, and top_wrap.doc. The wrapper source file contains all the Perl to C interfaces - well over 2000 lines of code that you, thankfully, don't have to write. This may seem like a lot; it's because of SWIG's type checking and its tests for end cases. You might be able to do as good a job by hand, but I doubt it.
Top.pm is the Perl module proper. It uses Perl's built-in DynaLoader module to load the top shared library dynamically (assuming your system supports shared libraries). This makes the functions and C constants in top.h available to the module.
The final step is to compile the C code. top_wrap.c is compiled along with three files from the top sources. They're linked together into a shared library (Top.so on my system) and we're ready to go.
    % swig -Itop-3.4 -perl5 -shadow -module Top top.i
    top-3.4/machine.h : Line 28. Warning. Array member will be read-only.
    Generating wrappers for Perl 5
    % gcc -DPIC -fpic -Itop-3.4 \
          -I/opt/perl5.004/lib/i386-freebsd/5.00403/CORE -c top_wrap.c
    % gcc -DPIC -fpic -Itop-3.4 -c top-3.4/machine.c
    % gcc -DPIC -fpic -Itop-3.4 -c top-3.4/utils.c
    % gcc -DPIC -fpic -Itop-3.4 -c top-3.4/username.c
    % ld -Bshareable -L/usr/local/lib -o Top.so \
          top_wrap.o machine.o utils.o username.o -lkvm
The warning from line 28 about the read-only array member (double load_avg[NUM_AVERAGES]) is telling - the distinction between an array and a pointer is subtle in C. To avoid such problems, SWIG treats references as read-only by default. You can usually create unambiguous types with typedef if you need to.
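The array-versus-pointer distinction behind that warning can be seen in plain C. In the sketch below, struct load_info is a cut-down stand-in rather than the structure from machine.h, the samples member and the value of NUM_AVERAGES are assumptions, and the helper functions are hypothetical.

```c
#include <stddef.h>

#define NUM_AVERAGES 3   /* assumed value, for illustration only */

/* Cut-down stand-in for a structure with an array member. */
struct load_info {
    double load_avg[NUM_AVERAGES];   /* storage lives inside the struct */
    double *samples;                 /* pointer: storage lives elsewhere */
};

/* A pointer member can be reassigned wholesale... */
void set_samples(struct load_info *info, double *samples)
{
    info->samples = samples;
}

/* ...but an array member cannot: `info->load_avg = values;` does not
 * compile, so elements must be copied one at a time. */
void copy_load_avg(struct load_info *info, const double *values)
{
    size_t i;

    for (i = 0; i < NUM_AVERAGES; i++)
        info->load_avg[i] = values[i];
}

/* Read access is unproblematic either way; -1.0 flags a bad index. */
double get_load_avg(const struct load_info *info, size_t index)
{
    return index < NUM_AVERAGES ? info->load_avg[index] : -1.0;
}
```

Because there is no single assignment that "sets" an array member, a generated setter has nothing safe to do, which is one way to read SWIG's decision to expose such members as read-only.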
The last line above uses ld to link the four .o files with the KVM library. KVM is the kernel memory interface for FreeBSD; it will be loaded along with the top library at the first call to the top module. (Since the module opens sensitive kernel structures, you'll most likely need to run it as the superuser.)
    load averages: 0.14, 0.03, 0.01 22:44:41
    30 processes: 1 running, 29 sleeping,
    CPU states: 5.3% user, 0.0% nice, 1.5% system, 0.8% interrupt, 92.4% idle
    Mem: 27M Active, 8M Inact, 14M Wired, 5060K Cache, 7640K Buf, 6952K Free

      PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
    15121 root      28    0 1276K 1748K RUN     0:00  2.13%  0.84% perl
    14499 root      18    0  684K  896K pause   0:00  0.00%  0.00% tcsh
    14498 scott      2    0  608K 1828K select  0:00  0.04%  0.04% xterm
    13917 scott      3    0 1448K 1652K ttyin   0:02  0.00%  0.00% vi
      242 scott     18    0  828K 1016K pause   0:04  0.00%  0.00% tcsh
      241 scott      3    0  652K  944K ttyin   0:00  0.00%  0.00% tcsh
      240 scott      3    0  664K  952K ttyin   0:00  0.00%  0.00% tcsh
      239 root       2    0 1244K 1380K select  1:08  0.00%  0.00% perl
      238 scott      3    0  772K  980K ttyin   0:05  0.00%  0.00% tcsh
      234 scott      2    0  532K 1340K select  0:00  0.00%  0.00% xterm
      233 scott      2    0  532K 1408K select  0:19  0.00%  0.00% xterm
      232 scott      2    0  208K 1172K select  0:02  0.00%  0.00% xclock

Take a SWIG: The top Emulator
The listing titled "A SWIG-enabled Top Emulator" is a Perl program that emulates top. There are a few new aspects to it, notably the ptrvalue calls, but most of the program is straightforward.
If you look back at machine.h you'll see that the statics structure has pointers to character arrays. Normally, the potential ambiguity of pointers to pointers would cause SWIG to punt unless it had explicit directions for what to do. However, the array-of-strings construct is so common that SWIG provides support via the pointer.i interface file.
The names() subroutine steps through the char ** array, pulling off the strings one by one until NULL is reached. ptrvalue() is a standard SWIG function that requires an array reference and an index; it returns the element at that index. The assembled array is then returned by names(). The memfix() routine is more mundane. It just converts a number from kilobytes to megabytes.
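The walk that names() performs is the standard C idiom for a NULL-terminated array of strings. A minimal sketch on the C side follows; count_names() is a hypothetical helper, not part of top or SWIG.

```c
#include <stddef.h>

/* Count the strings in a NULL-terminated char ** array by indexing it
 * until the terminating NULL pointer is reached. */
size_t count_names(char **names)
{
    size_t n = 0;

    if (names == NULL)
        return 0;
    while (names[n] != NULL)   /* each names[n] is one C string */
        n++;
    return n;
}
```

The Perl subroutine does the same thing, except that each indexing step goes through ptrvalue() and each string is pushed onto a Perl array.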
The initialization section of the script creates three structures ($statics, $si, and $ps) via calls to new(). SWIG's %make_default pragma (back in top.i) automatically allocated the structures and created the new() methods for you.
The ps structure is initialized by hand so that its values make sense the first time they are used, and the Top::machine_init() method populates the statics structure. The three calls to names() extract the respective names into arrays.
The remainder of the script is just a loop that repeats sixty times. On each iteration, the script pauses, gathers current statistics, and prints a top-style report. The only feature missing is the sorting of processes according to their CPU usage.
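The missing sort is easy to picture. The sketch below shows one way to order process records by CPU usage in C with qsort(); struct proc_row and both function names are hypothetical and are not taken from the top sources.

```c
#include <stdlib.h>

/* Hypothetical per-process record; not a structure from the top sources. */
struct proc_row {
    int pid;
    double cpu_percent;
};

/* qsort comparator: highest CPU usage first, ties broken by lower PID. */
static int by_cpu_descending(const void *a, const void *b)
{
    const struct proc_row *x = a;
    const struct proc_row *y = b;

    if (x->cpu_percent > y->cpu_percent) return -1;
    if (x->cpu_percent < y->cpu_percent) return 1;
    return (x->pid > y->pid) - (x->pid < y->pid);
}

/* Sort `count` rows in place, busiest process first. */
void sort_by_cpu(struct proc_row *rows, size_t count)
{
    qsort(rows, count, sizeof rows[0], by_cpu_descending);
}
```

In the Perl script the equivalent would be a sort over the collected process records before they are printed.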
This script is simple-minded; it merely emulates top rather than extending it. Still, it runs equally well on all top-ready operating systems. More sophisticated scripts might extend top in different ways: data trending, event triggers, real-time plots of system data, and so on.
A Toast
SWIG's 300-page user manual goes into great detail about features not covered in this article: pointers, input constraints, typemaps for complex data types, exception handling, and further customization. It also covers C++ and Objective-C.
Other common uses of SWIG include rapid prototyping, interactive debugging, script-based testing of systems, and optimizing existing scripts by implementing slow portions in C or C++. SWIG makes it simple to embed C and C++ code in your favorite interpreter. In addition to Perl, SWIG can just as easily generate interface code for Tcl, Guile, and Python. Best of all, SWIG is portable and free. A toast is in order.
Scott Bolte (bolte@niss.com) recently joined GE Medical Systems in Milwaukee as a Lead Software Architect for the Unix Foundation Group.