Lincoln D. Stein
It's a sad fact of life that computer programs aren't bug free. Loops overwrite array boundaries, memory is mistakenly freed twice, if-then statements make decisions based on random data in uninitialized variables, and while blocks go into endless loops. Perl programmers like ourselves have much to rejoice about because we don't have to worry about the memory management problems that plague C and C++ programmers. Of course, we have our own idiosyncratic problems, such as inadvertently using a string in a numeric context.
Many of the programs you use on any given day have bugs. Many of the bugs are minor, and most are invisible. You won't know a program contains a bug until a particular combination of conditions triggers it. For example, a word processing program might work fine for months, and then crash one day when memory is tight and you attempt a global search and replace on a large document.
Bugs are usually just a nuisance. The text editor eats your homework, the printer pours out reams of gibberish, the graphics program flood fills the diagram you've labored over for hours with magenta polka dots. When the bug occurs in software that's part of a web site, however, the consequences can be more frightening. A bug in a web server can cause it to crash, making the site unavailable until someone notices and reboots the server. A bug in a CGI script or server module may cause the browser to display a bewildering "Internal Error" message like the one shown below.
Worse, however, is the risk that a bug in the server software or one of its CGI scripts can be exploited by a malicious remote user. A bug that crashes a web server can be used deliberately to bring a site down in a denial-of-service attack. Scarier still is the possibility that the bug can be exploited to break into the host machine, steal information from it, or modify its files. If this is possible, the software bug becomes a major security hole.
In the past, two major types of bugs have blasted security holes in web sites. The first is the failure of the programmer to check user-provided input before passing it to a command shell. This kind of bug shows up frequently in CGI scripts, and unfortunately more often than not in Perl CGI scripts. A clever user can trick a CGI script containing this bug into executing whatever Unix or NT command he likes. Fortunately, there's an easy way to avoid this trap: activate Perl taint checks by placing a -T flag on the top line of your script, right after #!/usr/bin/perl. (A tainted variable is one that contains data from the outside world, like $name = <>, that might be used by nefarious people for nefarious purposes. Taint checking ensures that $name is a name and not something suspicious like /etc/passwd.)
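Here's a minimal sketch of what taint mode looks like in practice. The file name and the \w+ pattern are illustrative placeholders only; the point is that under -T, Perl refuses to pass outside data to the shell until you launder it by extracting the part you trust with a regular expression match:

#!/usr/bin/perl -T
# With -T, any value read from outside the program is "tainted" and
# can't reach the shell, eval(), or file-modifying calls until laundered.

delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};  # standard taint-mode hygiene
$ENV{PATH} = '/bin:/usr/bin';              # subprocesses inherit PATH

my $name = <STDIN>;       # tainted: it came from the outside world
chomp $name;

# Launder by capturing only the characters we explicitly allow;
# a successful match makes $1 untainted.
if ($name =~ /^(\w+)$/) {
    my $safe = $1;
    system 'fgrep', $safe, 'phonebook.txt';  # list form bypasses the shell
} else {
    die "That doesn't look like a name to me\n";
}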
The second type of bug is more commonly found in CGI scripts and web servers written in a compiled language, typically C. In this type of bug, the programmer fails to check the length of data before copying it into a statically allocated buffer. Unexpectedly long data will overwrite memory, and again, a clever remote user can exploit this to gain access to both Unix and NT shells. This bug plagued the NCSA httpd server up to version 1.3. Remote users could exploit it to run any Unix program they cared to on the web server. More recently, this bug surfaced in Microsoft's Internet Information Server (up to version 3.0). By sending the server a URL of a particular length, remote users could make it crash.
In this column, I present a short Perl script called torture.pl designed to catch web servers and CGI programs that suffer from memory allocation problems. It employs a technique called "random input testing" in which the web server is pummeled with a long series of requests for random URLs of varying lengths, some of which can be quite large. A well-designed server or CGI script will accept the random input gracefully and produce some sort of reasonable error message. Software with a memory allocation bug will crash or behave unpredictably. Although this type of testing is very inefficient, the program does catch the problem in both the NCSA httpd and IIS servers.
torture.pl can be used locally to test your own server, or remotely to test other people's (but be sure to get their permission first!). To use it, provide torture.pl with the URL of the server or CGI script you wish to test. For example:
$ torture.pl https://www.foo.bar.com/cgi-bin/search
torture.pl version 1.0 starting
Base URL: https://www.foo.bar.com/cgi-bin/search
Max random data length: 1024
Repetitions: 1
Post: 0
Append to path: 0
Escape URLs: 0

200 OK
In this example, we've asked the script to test the server at www.foo.bar.com, using the default settings of one test repetition and a maximum URL length of 1024 bytes. After echoing its settings, the script does a single fetch of the indicated URL and returns the HTTP result code, in this case 200 OK, indicating that the URL was fetched successfully. If we look at the server log, we'll see an entry something like this:
pico lstein - [22/Nov/1997:12:54:17 -0400] "GET /cgi-bin/search?%F18n%99%DB%15_a%5E8%C2%A7)%7D%AD%196%9DZ%C1%0FX%%D9K%5D%AA%BA=%CC%C7%85%A4%93%81%A9%7F%E3%B7%A6%B0%E1%_%FA%5B%FCV%1D%AEC%E6%F9%A0%91%B4%DE%5E%7De%04%11%85%85%BA%05j%C3%BD%12t%9F7%D4%9A%93%D1%F1%B1%DE%A0%F4%C5%9B%96XPu%B7%CD%DB%BB%DFbB%9Ag%AC_&%BE%D4%C6%F6%A9b%8A%7CT%3C%5C%F42 HTTP/1.0" 200 928
This entry shows the URL that torture.pl generated, with a query string made of random, URL-escaped data. (The final two fields, 200 928, are the response status and the number of bytes the server returned.)
Fetching a URL just once isn't much of a torture test. Let's make things more challenging for the CGI script by fetching 1000 random URLs, each containing a random query string of up to 5K in length:
$ torture.pl -t 1000 -l 5000 https://www.foo.bar.com/cgi-bin/search
torture.pl version 1.0 starting
Base URL: https://www.foo.bar.com/cgi-bin/search
Max random data length: 5000
Repetitions: 1000
Post: 0
Append to path: 0
Escape URLs: 0

200 OK
200 OK
200 OK
200 OK
...
This time we use the -t option to set the repetitions to 1000, and the -l option to set the maximum length to 5000 bytes. The script fires off the URLs and begins printing out the server responses.
By default, the torture script uses the GET method to access the server. This imposes a server-specific limit on the length of the request, since many servers truncate URLs longer than some reasonable maximum. If you'd like to blast a CGI script with truly large requests, use the POST method instead. To do this, pass -P to torture.pl.
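For instance, something like the following (the repetition count and length here are arbitrary) would hit the script with POSTed payloads of up to 100K:

$ torture.pl -P -t 1000 -l 100000 https://www.foo.bar.com/cgi-bin/search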
If you're more interested in testing the server itself than a CGI script, you should use the -p option. That makes the script randomly generate URL paths rather than query strings. The output looks like this:
$ torture.pl -p -t 1000 -l 5000 https://www.foo.bar.com/
torture.pl version 1.0 starting
Base URL: https://www.foo.bar.com/
Max random data length: 5000
Repetitions: 1000
Post: 0
Append to path: 1
Escape URLs: 0

400 Bad Request
404 File Not Found
404 File Not Found
404 File Not Found
400 Bad Request
400 Bad Request
...
Now, because the script is generating random URL path names, the expected outcome is either 400 Bad Request for a URL that contains invalid characters, or 404 File Not Found for a valid URL that doesn't point to any particular file.
The server log shows the difference between this and the previous tests:
pico.foo.bar.com lstein - [22/Nov/1997:13:21:03 -0400] "GET /%F2%F1%FE%98%8C%F5%8E0%BC%17%A0%F1%DE%DD%9D%99%D4%9C%ACb%EA%AEg%BC*%B3%D2E%8C%E39~%E3%D1%D9%60=%97x%DE%89W%BC'%F0%91%C4%FA?(%E5%EE%90%A3%19Ew_%D1%5C%98QAj%5D%1B%CB%9A%B3Dz%3E%9C7e%8D%C9+%88 HTTP/1.0" 500 404
When -p is used, the random data is appended to the URL after a / character rather than a ?. The server treats the request as an attempt to fetch a document, rather than as an attempt to pass a query string to a CGI script.
The final option that torture.pl recognizes is -e. When it's provided, the script escapes the random data, producing URLs that are nonsense but syntactically legal. Otherwise, the script sends raw binary data, including nulls and control characters, which tests the server's ability to handle binary input.
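To see the difference, here's a quick illustration (the input string is arbitrary). With URI::Escape's default rules, everything outside the URL-safe character set comes back percent-encoded:

use URI::Escape 'uri_escape';

my $garbage = "foo bar\x00\x07/..";   # binary junk: space, null, bell, slash
print uri_escape($garbage), "\n";     # prints: foo%20bar%00%07%2F..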
In all the examples I've shown so far, the script and server have passed the test by processing the request or exiting with an error. What happens when a program fails the test? If the testing causes a CGI script to crash, you'll see something like this:
200 OK
200 OK
200 OK
500 Internal Server Error
200 OK
200 OK
500 Unexpected EOF
torture.pl
0  #!/usr/bin/perl
1
2  # file: torture.pl
3  # Torture test web servers and scripts by sending them large
4  # arbitrary URLs and record the outcome.
5
6  use LWP::UserAgent;
7  use URI::Escape 'uri_escape';
8  require "getopts.pl";
9
10 $USAGE = <<USAGE;
11 Usage: $0 -[options] URL
12 Torture-test Web servers and CGI scripts
13
14 Options:
15     -l <integer>  Max length of random URL to send [1024 bytes]
16     -t <integer>  Number of times to run the test [1]
17     -P            Use POST method rather than GET method
18     -p            Attach random data to path instead of query string
19     -e            Escape the query string before sending it
20 USAGE
21 ;
22 $VERSION = '1.0';
23
24 # process command line
25 &Getopts('l:t:Ppe') || die $USAGE;
26 # seed the random number generator (not necessary in Perl 5.004)
27 srand();
28
29 # get parameters
30 $URL    = shift || die $USAGE;
31 $MAXLEN = $opt_l ne '' ? $opt_l : 1024;
32 $TIMES  = $opt_t || 1;
33 $POST   = $opt_P || 0;
34 $PATH   = $opt_p || 0;
35 $ESCAPE = $opt_e || 0;
36
37 # cannot do both a post and a path at the same time
38 $POST = 0 if $PATH;
39
40 # create an LWP agent
41 my $agent = new LWP::UserAgent;
42
43 print <<EOF;
44 torture.pl version $VERSION starting
45 Base URL: $URL
46 Max random data length: $MAXLEN
47 Repetitions: $TIMES
48 Post: $POST
49 Append to path: $PATH
50 Escape URLs: $ESCAPE
51
52 EOF
53 ;
54
55 # Do the test $TIMES times
56 while ($TIMES--) {
57     # create a string of random stuff
58     my $garbage = random_string(rand($MAXLEN));
59     $garbage = uri_escape($garbage) if $ESCAPE;
60     my $url = $URL;
61     my $request;
62
63     if (length($garbage) == 0) {  # if no garbage to add, fetch base URL
64         $request = new HTTP::Request ('GET', $url);
65     }
66
67     elsif ($POST) {  # handle POST request
68         my $header = new HTTP::Headers (
69             Content_Type   => 'application/x-www-form-urlencoded',
70             Content_Length => length($garbage)
71         );
72         # garbage becomes the POST content
73         $request = new HTTP::Request ('POST', $url, $header, $garbage);
74
75     } else {  # handle GET request
76
77         if ($PATH) {  # append garbage to the base URL
78             chop($url) if substr($url, -1, 1) eq '/';
79             $url .= "/$garbage";
80         } else {      # append garbage to the query string
81             $url .= "?$garbage";
82         }
83
84         $request = new HTTP::Request ('GET', $url);
85     }
86
87     # do the request and fetch the response
88     my $response = $agent->request($request);
89
90     # print the numeric response code and the message
91     print $response->code, ' ', $response->message, "\n";
92 }
93
94 # return some random data of the requested length
95 sub random_string {
96     my $length = shift;
97     return unless $length >= 1;
98     return join('', map chr(rand(255)), 0 .. $length - 1);
99 }
Every so often the random testing triggers a bug in the CGI script that causes it to abort. Depending on whether the bug occurred before or after the script printed its HTTP header, you may see a 500 Internal Server Error message or a 500 Unexpected EOF message. Either way, you've got a problem.
If the server itself crashes during testing, the results are even more dramatic:
200 OK
200 OK
200 OK
200 OK
200 OK
500 Internal Server Error
500 Could not connect to www.foo.bar.com:80
500 Could not connect to www.foo.bar.com:80
500 Could not connect to www.foo.bar.com:80
500 Could not connect to www.foo.bar.com:80
...
In this sequence, everything went along well until the torture script triggered a bug in the server, causing a 500 Internal Server Error message. The server then went down completely, making it unavailable for future incoming connections.
The Code
The previous listing shows the code for torture.pl. It makes extensive use of Martijn Koster and Gisle Aas's excellent LWP web client library, available from the CPAN as LWP/libwww-perl.
Lines 6-8: We bring in the LWP::UserAgent library, which provides all the functions we need for generating and processing HTTP requests. We next import the uri_escape() function from the URI::Escape module, which implements the rules for escaping URLs. Finally, we load the getopts library (part of the standard Perl distribution), a handy package for parsing a script's command line options.
Lines 24-38: We process the command-line options and assign defaults to any not provided. The only required argument is the base URL to fetch. If present on the command line, we assign it to $URL. Otherwise we abort with a usage statement.
We also seed the random number generator in order to avoid generating the same series of random URLs each time the script is run. This step is no longer necessary in Perl 5.004, which seeds the random number generator automatically the first time you invoke the rand() function.
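For older perls, a common idiom of the era (a conventional recipe, not part of the listing) mixed the process ID into the seed so that two runs started in the same second wouldn't replay the same sequence:

# time alone changes only once a second; folding in the PID
# distinguishes simultaneous runs
srand(time ^ ($$ + ($$ << 15)));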
Lines 41-53: We create a new UserAgent object (think of it as a virtual browser) that will connect to the web server and make the URL request. We then print the test parameters so that they can be recorded.
Lines 56-92: This is the meat of the program. We enter a loop that repeats as many times as requested. Each time through the loop, we create a string of random data by calling the random_string() function described below, assigning the result to a variable with the inelegant but descriptive name $garbage. We also assign the base URL to a local variable named $url.
What we do now depends on the length of the random data and the script's options. If the random data happens to be of zero length, we do nothing with it. We simply generate a GET request to fetch the base URL by creating a new HTTP::Request object (line 64). The two arguments to HTTP::Request::new() are the request method (GET in this case) and the URL to fetch.
Otherwise, if the user requested a POST transaction, we need to set up the HTTP headers that will be sent to the server. We do this by creating a new HTTP::Headers object in line 68, passing the new() method a hash with the HTTP headers we wish to send. For a valid POST operation, we'll need two header fields: a Content-Type field with a value of application/x-www-form-urlencoded, to fool the script into thinking that the random data was generated by a bona fide fill-out form, and a Content-Length field containing the length of the random data. We now create an HTTP::Request using the four-argument form of HTTP::Request::new() (line 73). As before, the first and second arguments correspond to the request method and the URL to fetch. The optional third and fourth arguments contain the HTTP::Headers object and content to be POSTed. In this case, the content is the random data that we generated earlier.
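As an aside, libwww-perl also ships an HTTP::Request::Common module that can express the same POST more compactly. A sketch, assuming the same $url and $garbage variables as the listing:

use HTTP::Request::Common 'POST';

# builds the headers and body in one call; roughly equivalent to
# lines 67-73 (Content-Length is computed automatically)
my $request = POST $url,
    Content_Type => 'application/x-www-form-urlencoded',
    Content      => $garbage;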
In lines 77-84, we create a GET request for non zero-length random data. This is merely a matter of appending the random data to the requested URL and generating the appropriate HTTP::Request object. If the command-line options indicate that we're to generate a query string for a CGI script, we append the random data to the base URL after a ? character. If the user wishes to generate a random URL instead, we append the data after a / character.
On line 88, we perform the actual network fetch by calling the UserAgent object's request() method. The response is returned as an HTTP::Response object, and stored in a like-named variable. We use this object on line 91 to print the result code (e.g. 500) and result message (e.g. Internal Server Error).
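If you'd rather not wade through thousands of 200 OK lines, one possible variation (not in the listing) is to report only the responses that suggest trouble:

# print only failures; "bad request" and "not found" are expected
my $response = $agent->request($request);
print $response->status_line, "\n"
    unless $response->is_success
        || $response->code == 400
        || $response->code == 404;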
Lines 95-99 define the random_string() function, which builds a list of random characters, one per position in the requested string, using Perl's map(), chr(), and rand() functions. (Note that rand(255) yields values from 0 to 254, so the byte 255 never actually appears.) Notice that this function isn't particularly memory efficient, since it generates a temporary list as long as the requested random string. Replace it with a loop if this bothers you, as in the sketch below.
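Here's one way such a loop might look (a sketch, not from the original listing; it uses rand(256) so the full byte range 0-255 can appear):

# build the string one character at a time instead of
# materializing a temporary list of the same length
sub random_string {
    my $length = shift;
    return unless $length >= 1;
    my $string = '';
    $string .= chr(rand(256)) for 1 .. $length;
    return $string;
}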
Wrapping Up
That's all there is to it. Point the script at your favorite server and let it rip! For best results I recommend that you run the torture script overnight using at least a thousand test repetitions (the more the better). Redirect its output to a file so that you can analyze the results at your leisure. Be careful not to run the tests on a server that's being used for a live web site. Even if there aren't any bugs to trigger, the script will load down the server and might hurt its performance.
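When the run is done, a quick tally of the response codes makes anomalies stand out. For example (the repetition count and file name here are arbitrary):

$ torture.pl -t 10000 https://www.foo.bar.com/cgi-bin/search > results.txt
$ perl -ne 'next unless /^\d{3} /; chomp; $count{$_}++;
            END { printf "%6d  %s\n", $count{$_}, $_ for sort keys %count }' results.txt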
In the next issue, I'll continue the theme of web server testing by presenting a simple server benchmarking application.
Lincoln Stein wrote CGI.pm.