2024 π Daylatest newsbuy art
Twenty — minutes — maybe — more.Naomichoose four wordsmore quotes
very clickable
data + munging

The Perl Journal

Volumes 1–6 (1996–2002)

Code tarballs available for issues 1–21.

I reformatted the CD-ROM contents. Some things may still be a little wonky — oh, why hello there <FONT> tag. Syntax highlighting is iffy. Please report any glaring issues.

The Perl Journal
#18
Summer 2000
vol 5
num 2
Stephen Lidie (2000) TkComics, A Perl/Tk Web Client. The Perl Journal, vol 5(2), issue #18, Summer 2000.

TkComics, A Perl/Tk Web Client

Stephen Lidie


Packages
Perl/Tk.........................................................................CPAN
LWP............................................................................CPAN
Tk::JPEG, Tk::PNG......................................................CPAN
MIME::Base64.............................................................CPAN
Storable........................................................................CPAN
Win32::MemMap.........https://www.generation.net/~aminer/Perl
Tie::Win32MemMap..................................https://www.sysadminmag.com/tpj

This article describes how to write eventful Perl/Tk web client programs using the LWP (library for WWW access in Perl) bundle. We'll develop a web client called tkcomics that retrieves comic strips with LWP and displays them with Perl/Tk.

LWP

LWP is a Perl library providing access to the web, and is used primarily for writing client applications that mimic what your browser does. It "speaks" HTTP (the Hypertext Transfer Protocol), and one of its most basic duties is fetching the content of URLs. The beauty of LWP is that it hides all the operational details while allowing us to do our work with simple objects and methods.

In essence, a user agent object working on our behalf takes a request object and does whatever low-level communication and error handling is required to send a request for web content on its way. The user agent then waits for the reply (or error) and hands it back to our program in the form of a response object. Our job is to create the user agent and then use various LWP methods to manipulate requests and responses. But if all we want to do is simply fetch web content, we can use LWP in "simple" mode and just say "fetch this URL". LWP will create a user agent and request/response objects internally, eventually returning content data to our program.

At first glance, the simple LWP mechanism would appear to suffice for a client like tkcomics, but it has some drawbacks. As you might suspect, they have to do with blocking: there might be a considerable time delay between a request and when the network finally delivers the content. Even manual handling of LWP's user agent won't surmount all of these problems, so we'll craft a solution that launches multiple processes to minimize the adverse effects of blocking.

An excellent place for private viewing of your favorite comics is https://www.comics.com/, although all the glamour, glitz, and advertisements may be too much for the stomach. But there's nothing to stop us from fetching just the comics and using Perl/Tk to display them. We'll start by learning now to fetch and interpret the www.comics.com home page, and then build our Perl/Tk client around that framework.

Fetching Web Content With LWP::Simple

This module is so simple it's not even object oriented; rather, it exports a handful of subroutine for fetching and mirroring web content. All we're interested in is fetching content, which we can do with the get() subroutine:

use LWP::Simple qw/get/;

To retrieve a web page all we do is call get() with the desired URL:

  my $comics_home = 'https://www.comics.com';
  my $comics = get $comics_home or die "Can't get $comics_home.";

So now $comics contains a ton of raw HTML, stuff we normally let our browser interpret and display for us. If we actually did browse that URL, one of the things we'd see is the actual comic selector widget, what appears to be an Optionmenu (Figure 1): Figure 1. The comic selector widget
The comic selector widget.

It's not a Tk Optionmenu, of course, but it's what the browser renders when it encounters a <SELECT> tag. Looking at the HTML, either by printing $comics or viewing the page source in the browser, we see this:

  <SELECT NAME= ... >
      <OPTION>Click to select a comic
      <OPTION VALUE="/universal/adam/ab.html">Adam
      <OPTION VALUE="/comics/alleyoop/ab.html">Alley Oop
      <OPTION VALUE="/comics/andycapp/ab.html">Andy Capp
      <OPTION VALUE="/comics/arlonjanis/ab.html">Arlo & Janis
      <OPTION VALUE="/comics/askshagg/ab.html">Ask Shagg
      <OPTION VALUE="/comics/bc/ab.html">B.C.

  ...

      <OPTION VALUE="/comics/wizardofid/ab.html">Wizard of Id
      <OPTION VALUE="/universal/ziggy/ab.html">Ziggy
  </SELECT>
 

This looks promising indeed; we seem to have a list of URLs, one per comic. If we're lucky, we should be able to key on the string OPTION VALUE and then use the following code to pick out the comic names and their associated URLs.

  my $n = 0;
  foreach (split /\n/, $comics) {
      next unless /OPTION\s+VALUE/i;
      if (my($comic_url, $comic) = m\"([^"]+)">(.*)\) {
          $comic =~ s/\t//g;
          $comic =~ s/\r//g;
          printf "%-30s : %s\n", $comic, $comic_url;
          $n++;
      }
  }
  print "\nfound $n comics\n";

And luck is with us. If we run the code, this is what we'll see:

  Adam	: /universal/adam/ab.html
  Alley Oop	: /comics/alleyoop/ab.html
  Andy Capp	: /comics/andycapp/ab.html
  Arlo & Janis	: /comics/arlonjanis/ab.html
  Ask Shagg	: /comics/askshagg/ab.html
  B.C.	: /comics/bc/ab.html

  ...

  Wizard of Id	: /comics/wizardofid/ab.html
  Ziggy	: /universal/ziggy/ab.html

  found 91 comics
 

The URLs on the right are not the comic images, but another page relative to the site's home address containing the actual URL of the comic. For instance, if we concatenate Ask Shagg's relative URL with $comics_home and view the HTML at https://www.comics.com/comics/askshagg/ab.html, we see an <IMG> tag with the relative URL of the actual GIF image:

<IMG SRC="/comics/askshagg/archive/images/askshagg21461240000515.gif" 
ALT="today's comics strip" ALIGN=TOP BORDER="0">
tkcomics can easily extract this URL, fetch the image file, convert it to a Tk Photo and display it. So, we have proof-of-concept code. On the upside, it's extremely simple, but on the downside it's blocking, and there's no timeout mechanism. Let's try to address these concerns.

lwp-request and fileevent

To keep Perl/Tk events flowing, we need to use a separate process (or thread), and ensure that the two processes can talk in a non-blocking way. The first process, the Tk parent, handles the GUI and event processing, while the child fetches comic images and forwards them to the parent.

In the article Signals, Sockets, and Pipes from issue TPJ #5 of TPJ, we saw a Perl/Tk client program that used TCP/IP sockets in concert with pipe, fork, exec, and fileevent(), all to monitor the disk space utilization of a cluster of server machines. That IPC solution was unusually specialized because it used bi-directional pipes for a two-way conversation between the Perl/Tk parent and the child.

In contrast, if the tkcomics child can get its URL from the command line and send comics to standard output, we can use the pipe form of the open command and let Perl do the hard work.

Included in the LWP package is lwp-request, a standalone program perfectly matched for our needs -- it accepts a URL on the command line and prints the resulting web content on its STDOUT.

So that's our plan, and with Perl 5.6.0, it might work on Win32 too.

We start with the de rigeur declarations. All the comic images are binary data, but to use them in Perl/Tk we must first Base64 encode them -- that's why we need MIME::Base64. Additionally, although most of the comics are GIFs, a few are in JPEG or PNG format. Note that Tk::JPEG and Tk::PNG are not bundled with Perl/Tk, so you'll have to install them yourself.

  #!/usr/bin/perl -w
  #
  # tkcomics - display comics courtesy of https://www.comics.com
  #
  # Because LWP::Simple and LWP::UserAgent can block, do the
  # network I/O in another thread (er, child) with fileevent().
  # Add a Stop/Cancel button that kills the pipe.
  
  use MIME::Base64;
  use Tk;
  use Tk::JPEG;
  use Tk::PNG;
  use subs qw/get_url show_comic status stop_get/;
  use strict;

Here we open the MainWindow in the standard fashion, and then initialize a few global variables. $photo holds the comic image object reference. $status displays periodic status messages that keep the user informed of the state of data transfers. $eof is set either when the lwp-request child completes (signaling that the web fetch is complete), or by a user button click to prematurely interrupt a transfer. $pid holds the process ID of the child, used to kill it if we tire of waiting for the response. The %ext hash maps a comic file name extension to a Tk Photo image processor. Finally, $help is a special status message that's periodically displayed telling the user how to work the application. We'll stuff the names of all available comics into a List widget and use a <Button-1> event to select one for viewing. <Button-2>, obviously, is used to interrupt a long (or hung) transfer.

  my $mw = MainWindow->new;
  my $photo = '';
  my $status = '';
  my($eof, $pid);
  my %ext = qw/
               gif gif
               jpg jpeg
               png png
               tif tiff
               xbm xbm
               ppm ppm
              /;   # file extension => Photo format map
  my $help = '<Button-1> fetch comic, <Button-2> interrupt transfer';

As Figure 2 shows, the entire application consists of just three Tk widgets: a Listbox that holds the names of the comics, one Label that displays the comic images (empty initially, of course), and a second Label that displays relevant status information. Figure 2. The initial tkcomics display
The initial tkcomics display

  my $s  = $mw->Label(-textvariable => \$status, -width => 100);
  my $lb = $mw->Scrolled('Listbox');
  my $l  = $mw->Label;
  $s->pack (qw/-side bottom -fill x -expand 1/);
  $lb->pack(qw/side left -fill y -expand 1 -anchor w/);
  $l->pack (-side => 'right');

The following code is essentially our old proof-of-concept example, except that instead of printing comic names and URLs, we build a hash of comic URLs indexed by comic name. It's important to note that we've eliminated LWP::Simple, hence use our own get_url() subroutine in place of LWP::Simple::get().

  my $comics_home = 'https://www.comics.com';
  my $comics = get_url $comics_home 
or die "Can't get $comics_home.";
  
  my (%comics, $comic_url, $comic);
  foreach (split /\n/, $comics) {
      next unless /OPTION\s+VALUE/i;
      if (($comic_url, $comic) = m\"([^"]+)">(.*)\) {
          $comic =~ s/\t//g;
          $comic =~ s/\r//g;
          $comics{$comic} = $comic_url;
      }
  }

We now display our first status message, telling us how many comics were found, and how to use tkcomics. The status() subroutine simply sets the status Label's -textvariable, and then invokes idletasks() to flush Tk's low priority idle events queue. This updates the entire Tk display so that the status message appears immediately.

  status 'Found '.scalar(keys %comics).' comics, '.$help;
  
  sub status {
      $status = $_[0];
      $mw->idletasks;
  }

Finally, we populate the Listbox with the comic names, sorted alphabetically, establish two button bindings that give life to tkcomics, and enter the main Perl/Tk event loop.

  foreach (sort keys %comics) {
      $lb->insert('end', $_);
  }
  
  $lb->bind('<ButtonRelease-1>' => \&show_comic);
  $mw->bind('<ButtonRelease-2>' => \&stop_get);
  
  MainLoop;
  

The initial tkcomics display is shown in Figure 2.

The heart of tkcomics is the subroutine get_url(), shown below. Look it over before I explain it. Note that there's an implicit tradeoff between efficiency and simplicity. For an important program it would be better to fork a persistent child once and establish a dialog between it and the Tk parent. In this case however, it's considerably easier just to use a pipe open and run lwp-request for every URL.

  sub get_url {
  
      my ($url) = @_;
      
      status "Fetching $url";
      $pid = open PIPE, "lwp-request -m GET -t 20 -e $url 2>&1 |" 
          or die "Open error: $!";
      binmode PIPE if $^O eq 'MSWin32';
  
      my $content;
      $mw->fileevent(\*PIPE, 'readable' => sub {
          my($stat, $data);
          while ($stat = sysread PIPE, $data, 4096) {
              $content .= $data;
          }
          die "sysread error:  $!" unless defined $stat;
          $eof = 1 if $stat == 0;
      });
      $mw->waitVariable(\$eof);
      $mw->fileevent(\*PIPE, 'readable' => '');
      close PIPE;
      
      $pid = undef;
      
      (my $response, $content) = 
$content =~ /(.*?)\n\n(.*)/is if $content;
  
      return wantarray ? ($response, $content) : $content;
                     
  } # end get_url

Subroutine get_url() is passed a single argument, the URL to fetch, which is immediately posted in the status Label. The open statement performs the requisite pipe, fork, and execing of lwp-request for us, so all we (the Tk parent) need do is establish a fileevent() handler to read web content. The lwp-request option -t causes a twenty second timeout, and the -e option says to return the response headers along with the web content. The response headers are returned first, separated from the content by an empty line.

If you're on Win32, the binmode statement is very important, because the comic images are binary data. On Unix, binmode isn't required, but it also does no harm.

Now we set up the fileevent() readable callback on the lwp-request output pipe. The callback simply appends up to 4K of web data to the $content variable, and nominally ends at the end-of-file. Meanwhile, Tk event processing continues because the Tk parent is spinning on the waitVariable statement, waiting for the $eof variable to change. $eof changes in one of two ways, either when the fileevent() callback detects end-of-file, or, as we shall see, when the user clicks Button-2.

Once the waitVariable is satisfied, we cancel the fileevent() readable callback, close the pipe handle and undef $pid. Notice that get_url() uses wantarray to determine whether it was called in scalar or list context. In list context, we assume the caller wants two strings: the response headers and the actual URL content. Otherwise we assume that only the content is desired.

To stop a web get(), we click Button-2, which invokes this subroutine, stop_get(). We then set $eof so the fileevent() readable callback terminates, and if $pid is defined (that is, if the lwp-request child is still running), we kill it.

  sub stop_get {
      status "Stopping transfer ...";
      $mw->after(5000, sub {status $help});
      $eof = -1;
      kill 'TERM', $pid if defined $pid;
  }
 

The Recipe for Displaying Web Images

Let's take a peek at what the tkcomics application looks like when rendering a comic. Figure 3 depicts the program displaying a GIF file. To see how we rendered it, read on. Basically, using the active Listbox element, we find the comic URL and fetch its contents. Within the page is an <IMG> tag with another URL pointing to the actual image, which we then fetch, convert to a Photo, and display. Periodically we examine $eof to see if any transfer was aborted. Figure 3. Ask Shagg Camel Facts
Ask Shagg Camel Facts

As with any binding callback of the form

$lb->bind('<ButtonRelease-1>' => \& show_comic);

Perl/Tk implicitly passes the bound widget reference -- the Listbox -- as the callback's first argument, which we assign to the variable $lb.

  sub show_comic {

      my ($lb) = @_;

Since we got here on a ButtonRelease-1 event, we're guaranteed that the active Listbox entry is our target comic. We get() it, and index into the %comics hash to get the URL of the page containing the comic image URL. We return immediately if the transfer was interrupted.

      my $comic = $lb->get('active');
      my $comic_url = $comics{$comic};
      my $comic_html = get_url 
          $comic_url =~ /^https:/i ? $comic_url 
            : "$comics_home$comic_url";
      return if $eof == -1;

Now we extract the image URL from the mass of HTML sitting in $comic_html, and call get_url() in list context to get both the response header and the binary comic image. If the transfer wasn't interrupted, we can assume all the returned data is correct.

      my ($image_url) =
          $comic_html =~ m\.*<IMG SRC="([^"]+)".*? 
ALT="(today|daily)\is;

      my ($response, $image) = 
get_url "$comics_home$image_url";
      return if $eof == -1;
	  

Perl/Tk images can be generated either from an external file or embedded data. Due to its Tcl heritage where everything is (or more accurately, was) a string, embedded image data must be in printable characters, which is why we first Base64 encode the image data. Now we do a little bookkeeping with the variable $photo -- the second and subsequent invocations of this callback delete any previous image to stem a possible memory leak. Finally, we create an image of the appropriate format, configure the image Label to display it, and update the status help message. Q.E.D.


      my $photo_data = encode_base64($image);
      $photo->delete 
if UNIVERSAL::isa($photo => 'Tk::Photo');
      my($ext) = $image_url =~ /\.(.*)?/;
      $ext ||= 'gif';
      status "Creating $bytes byte $ext Photo";
      $photo = $mw->Photo(-data => $photo_data, 
                        -format => $ext{lc $ext});
      $l->configure(-image => $photo);
  
      status $help;
  
  } # end show_comic
  

Win32 Considerations

This is all well and good; the code is sound, and should work with Perl 5.6.0 and a recent Tk, such as Tk800.022. But, alas, it doesn't. It hangs on the fileevent(), which is never triggered. What can we do?

Threads are out, as Tk is not thread-safe. TCP/IP comes to mind, but I've got code that forks a child and uses local sockets, and it hangs on the fileevent() too. We could arrange for the child to write a file and signal the Tk parent by some means, perhaps a semaphore, but that lacks style. We can't use shared memory, since the Unix shm* shared memory functions aren't available in Win32, right? True, but Win32 has its own shared memory capabilities, so let's investigate further...

A chance email message from Grant Hopwell clued me into his Tie::Win32MemMap module. With it, Grant would spin off Win32 children usingWin32::Process::Create, and the two processes would talk using a shared Perl hash! This seemed an interesting possibility -- create a shared hash with two keys, CONTENT and COMPLETE, and have the child run lwp-request and store web content in $shared_hash{CONTENT}, while the Perl/Tk parent watches (using waitVariable) for a change in $shared_hash{COMPLETE}, set by the child when ready.

Tie::Win32MemMap requires Storable, available from CPAN, and Win32::MemMap, written by Amine Moulay Ramdane, available at https://www.generation.net/~aminer/Perl. Tie::Win32MemMap itself is available from The Perl Journal programs web page.

As it happens, we can indeed write a drop-in replacement for subroutine get_url(), specifically for Win32, and keep the rest of tkcomics intact. Let's look at get_url() for Win32 now.

  sub get_url {

      my($url) = @_;
    
      status "Fetching $url";
	  

Here we create and initialize a new chunk of shared memory and tie it to the hash %content. The shared memory is tagged with the name tkcomics, which any Win32 process can access if it uses the proper MapName.

      use Win32::Process;
      use Tie::Win32MemMap;
  
      my %content;
      tie %content, 'Tie::Win32MemMap', {
          Create  => MEM_NEW_SHARE,
          MapName => 'tkcomics',
      };
      $content{'COMPLETE'} = 0;
      $content{'CONTENT'}  = '';
	 

Now we fire up the child process, another Perl program stored in the file tkcwin32.kid, whose job is to connect to the shared memory for tkcomics, fill the CONTENT entry with web data from the URL passed on the command line, and set the COMPLETE entry when it's finished.

      Win32::Process::Create(
          my $child,
          'c:\\perl\\bin\\perl.exe',
          "perl tkcwin32.kid $url",
          0,
          NORMAL_PRIORITY_CLASS,
          '.',
      ) or die Win32::FormatMessage(Win32::GetLastError);
  
      $eof = 0;
      $mw->update;
	 

Here we wait for the signal from the child that it has completed. Normally we would use the statement waitVariable(\$content{'COMPLETE'}) to do this, but there is competing magic between the tie module and Tk, so we have to synthesize our own using this loop:

      while ( $content{'COMPLETE'} != 1 ) {
          last if $eof == -1;
          $mw->after(10);
          $mw->update;
      }
And once the child completes, we separate the response headers from the actual content, and return the particular data required by our caller, just like the Unix version.
      my $content = $content{'CONTENT'};
      (my $response, $content) = 
$content =~ /(.*?)\n\n(.*)/is if $content;
      return wantarray ? ($response, $content) : $content;
                     
  } # end get_url

For our purposes the child, tkcwin32.kid, must reside in the current working directory because we haven't qualified the pathname in the Win32::Process::Create() call. It's certainly trivial to embed the child in tkcomics proper and create it during initialization, an exercise left to the reader. Here, then, is the Win32 child program:

  #!/usr/bin/perl -w
  #
  # Win32 tkcomics helper program that shovels 
  # web content to the Tk parent.
  
  use Tie::Win32MemMap;
  
  my $url = shift;
 

Because we're not in the context of a subroutine, the naked shift statement uses as its argument @ARGV -- the command line -- rather than @_, thus providing the child the URL specified by the parent.

By this point in real-time the Tk parent has already created and tied the shared memory to its hash, so all the child need do is tie to the same MapName in "share" mode.

  my %content;
  tie %content, 'Tie::Win32MemMap', {
      Create  => MEM_VIEW_SHARE,
      MapName => 'tkcomics',
  };

Once again (with Perl 5.6.0 only!) the child is free on Win32 to do a pipe open and run lwp-request in the same manner as the Unix code. Don't forget the binmode statement:

  open(PIPE, "lwp-request -m GET -t 20s -e $url|") 
                                 or die "open failure: $!";
  binmode PIPE;

And once again, read 4K chunks of web content and build up the scalar $content{'CONTENT'}. When end-of-file is reached, close the pipe and set the COMPLETE marker, which signals the Tk parent to proceed.

  my($stat, $data);
  while ($stat = sysread PIPE, $data, 4096) {
      $content{'CONTENT'} .= $data;
  }
  die "sysread error:  $!" unless defined $stat;
  close PIPE;

  $content{'COMPLETE'} = 1;
  exit(0);

In case you don't believe all this actually works, gaze upon Figure 4 and witness tkcomics for Win32! Figure 4. tkcomics works on Win32 too.
tkcomics works on Win32 too.

Until next time,

use Tk;

Martin Krzywinski | contact | Canada's Michael Smith Genome Sciences CentreBC Cancer Research CenterBC CancerPHSA
Google whack “vicissitudinal corporealization”
{ 10.9.234.152 }