

The Perl Journal #14, Summer 1999 (vol 4, num 2)
Rob Svirskas (1999) Downloading Web Pages Through A Proxy Server. The Perl Journal, vol 4(2), issue #14, Summer 1999.

Downloading Web Pages Through A Proxy Server

How LWP can cope with firewalls.

Rob Svirskas


Modules Used

LWP, Text::Wrap

In TPJ #13 ("Five Quick Hacks: Downloading Web Pages"), Jon Orwant and Dan Gruhl presented five simple but elegant programs that download information from various web services: stock quotes, weather predictions, currency information, U.S. postal address correction, and CNN headline news. If you're like me, your company uses a firewall to repel wily hackers, which means that you have to use a proxy server to access most URLs. A proxy server (sometimes called a "gateway") is simply an intermediary computer that forwards your request to a server and returns the server's response to you. The bad news: If you call the LWP::Simple get() function without first letting it know about your proxy server, the request fails and get() returns undef instead of the page.

The good news: There's a simple way around this. The LWP::Simple module checks an environment variable called http_proxy. If $ENV{http_proxy} contains the URL of a proxy server, your calls to get() are routed through it. You can set environment variables in two ways: either by assigning a value to $ENV{http_proxy} in your program, or by using whatever mechanism your shell or operating system provides. For instance, you can define your proxy server under the Unix bash shell as follows:

% export http_proxy=http://proxy.mycompany.com:1080
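
The other route, assigning to $ENV{http_proxy} inside the program, might look like the sketch below. It sets the same proxy as the shell command above. The BEGIN block is a precaution I am assuming is needed: it makes sure the variable is in place before LWP::Simple is loaded, in case the module reads the environment at load time. The proxy URL uses the http scheme, the usual form for an HTTP proxy.

```perl
#!/usr/bin/perl -w
use strict;

# Set the proxy before LWP::Simple is loaded (precaution; see above).
BEGIN { $ENV{http_proxy} = 'http://proxy.mycompany.com:1080' }

use LWP::Simple;

# Fetch the URL named on the command line, if one was given.
if (my $url = shift @ARGV) {
    my $page = get($url);
    print defined $page ? $page : "Couldn't fetch $url\n";
}
```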

This makes LWP::Simple route requests through port 1080 of the proxy server proxy.mycompany.com. You may need to use the set or setenv command instead of export, depending upon what shell you're using. There are also related environment variables for non-HTTP services: ftp_proxy, gopher_proxy, and wais_proxy. There's also a no_proxy variable, but we'll talk about that in a bit.

Since we are using Perl, There's More Than One Way To Do It. We can still access URLs via a proxy without mucking with environment variables if we replace LWP::Simple with LWP::UserAgent and HTTP::Request::Common. Let's look at a version of the currency converter (the first example from TPJ #13) that uses LWP::UserAgent; the complete listing appears at the end of this article.

The line beginning $ua->proxy defines our proxy server. This routes the user agent's HTTP requests through proxy.mycompany.com. To use a proxy server for multiple protocols, pass them as an array reference, as below:

$ua->proxy(['http','ftp','wais'] =>
            'http://proxy.mycompany.com:1080');
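
To check what a multi-protocol call like this actually configured, one option is to ask the user agent afterwards. The sketch below relies on proxy() returning the current setting when it is called with only a scheme name. The scheme list in the loop is just for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->proxy(['http', 'ftp', 'wais'] => 'http://proxy.mycompany.com:1080');

# Called with a scheme name only, proxy() reports the current setting.
for my $scheme (qw(http ftp wais gopher)) {
    my $proxy = $ua->proxy($scheme);
    printf "%-6s %s\n", $scheme, defined $proxy ? $proxy : '(no proxy set)';
}
```

Schemes that were left out of the list, such as gopher here, should report no proxy.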

The programs that download the weather report and the CNN top story (the second and third examples from TPJ #13) are equally simple to convert: replace LWP::Simple with LWP::UserAgent and HTTP::Request::Common, and the calls to get() with the user agent code as described above. The U.S. Postal Address program, zip4, already has the UserAgent code -- all we need to do is add the single line of code after the UserAgent has been created:

$ua = new LWP::UserAgent();
$ua->proxy('http','http://proxy.mycompany.com:1080');

Or, if you're into brevity, create the user agent and set its proxy server in one line:

($ua=(new LWP::UserAgent))->proxy('http',
          'http://proxy.mycompany.com:1080');
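
For the programs that call get(), the conversion described above could be wrapped in a small helper so that the rest of the program barely changes. This is only a sketch: fetch() is a hypothetical name, not part of LWP, and it mimics get() by returning the content on success and undef on failure.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common;

my $ua = LWP::UserAgent->new;
$ua->proxy('http', 'http://proxy.mycompany.com:1080');

# Hypothetical stand-in for LWP::Simple's get(): returns the page
# content on success and undef on failure, but goes through $ua.
sub fetch {
    my ($url) = @_;
    my $resp = $ua->request(GET $url);
    return $resp->is_success ? $resp->content : undef;
}

# Fetch the URL named on the command line, if one was given.
if (my $url = shift @ARGV) {
    my $page = fetch($url);
    print defined $page ? $page : "Couldn't fetch $url\n";
}
```

With a helper like this, each get($url) in the original programs becomes fetch($url).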

Most proxy servers will not let you access URLs within your own domain. That's why you often need to use your browser's Preferences menu to identify exceptions, telling your browser which domains to access without using the proxy. Fortunately, we can do that in our programs as well. If you prefer using environment variables:

export no_proxy="mycompany.com"

This will bypass the proxy server for URLs ending in "mycompany.com" (including URLs like www.itsmycompany.com). As you might expect, this can be done in the program instead:

$ua->no_proxy('mycompany.com');

If your program only needed to access web sites inside your firewall, you wouldn't need to declare the proxy server in the first place, so the no_proxy would be superfluous.
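
One caveat when moving from LWP::Simple to LWP::UserAgent: as far as I know, a user agent created with new does not consult http_proxy or no_proxy by default. If you would rather keep the settings in the environment, the env_proxy method loads them. In the sketch below, the two assignments to %ENV are stand-ins for values that would normally be exported by your shell.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Stand-ins for values normally exported by the shell.
$ENV{http_proxy} = 'http://proxy.mycompany.com:1080';
$ENV{no_proxy}   = 'mycompany.com';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;    # load http_proxy, no_proxy, and friends from %ENV

print "http requests go through: ", $ua->proxy('http'), "\n";
```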

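#!/usr/bin/perl -w

# Currency converter.
# Usage: currency [amount] [from curr] [to curr]

use strict;
use LWP::UserAgent;
use HTTP::Request::Common;

die "Usage: currency [amount] [from curr] [to curr]\n" unless @ARGV == 3;

my $ua = new LWP::UserAgent();

# Set up your proxy server in the next line. The URL requested below
# is https, so the proxy is registered for both schemes.
$ua->proxy(['http','https'], 'http://proxy.mycompany.com:1080');
my $resp = $ua->request(GET
        "https://www.oanda.com/converter/classic?value="
        . "$ARGV[0]&exch=" . uc($ARGV[1]) .
        "&expr=" . uc($ARGV[2]));
die "Request failed: ", $resp->status_line, "\n" unless $resp->is_success;

# Use the content() accessor rather than reaching into the object.
$_ = $resp->content;

# Keep only the text between the two marker comments, then strip
# the HTML tags and collapse runs of whitespace.
s/^.*<!-- conversion result starts//s;
s/<!-- conversion result ends.*$//s;
s/<[^>]+>//g;
s/[ \n]+/ /gs;
print $_, "\n";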