Martin Krzywinski - The Perl Journal - Canada's Michael Smith Genome Sciences Centre

Volumes 1–6 (1996–2002)

Code tarballs available for issues 1–21.

I reformatted the CD-ROM contents. Some things may still be a little wonky — oh, why hello there <FONT> tag. Syntax highlighting is iffy. Please report any glaring issues.

The Perl Journal

#14

Summer 1999

vol 4

num 2

file_download download code 142,848 bytes

Perl News

What's new in the Perl community.

Chris Nandor

What Is Truth?

Truth and falsehood aren't black and white.

Nathan Torkington

Downloading Web Pages Through A Proxy Server

How LWP can cope with firewalls.

Rob Svirskas

Seven Useful Uses of local

Some rare occasions when my won't do.

Mark-Jason Dominus

On-the-Fly Web Plots Made Easy

Using Gnuplot to graph web logs

Lincoln D. Stein

E-mail with Attachments

Using MIME to send images, audio, and more.

Dan Sugalski

Review of Perl: The Programmer's Companion

A Perl book for experienced programmers.

Perl/Tk Menus: Past, Present, and Future

Creating menubars in Perl/Tk 4 and Perl/Tk 8.

Steve Lidie

Manipulating Images with Perl and the Gimp

Creating plug-ins for a free alternative to Adobe's Photoshop.

Aaron Sherman

Review of Learning Perl/Tk

An introductory text for graphics programming with Perl/Tk.

Building A Better Hash

How a problem was solved with a homebrew data structure.

Dan Schmidt

Using Databases with DBI: What Not To Do

Speeding up your database connections.

Thomas Akin

Sending mail without sendmail

Sending mail from Perl in a portable way.

Dan Sugalski

International Sorting with sort

Grappling with "funny" letters? Bi-level sorting can help.

Sean M. Burke

The Solitaire 500 Results

The fastest card players from last issue's contest.

David Nicol

The Fourth Annual Obfuscated Perl Contest

Confuse us and win a prize.

Felix Gallo

The Perl Journal One Liners

Thomas Akin (1999) Using Databases with DBI: What Not To Do. The Perl Journal, vol 4(2), issue #14, Summer 1999.

Speeding up your database connections.

Thomas Akin

Perl's DBI module allows you to use Perl with database engines. It's one of the most popular Perl utilities, and I have dozens of clients who make extensive use of DBI for their daily production needs. Some clients use DBI much more effectively than others, and in this article we'll see how.

DBI and loops

DBI statements are often used within loops that store or fetch data from a database. You might insert the contents of a text file into a database line by line, or read and parse multiple lines of output from a database query. I often see code like the following from new DBI programmers:

foreach $line (<FILE>) {
  chomp $line;
  ($alpha, $beta, $charlie, $delta) = split(/,/,$line);  
  $sql = qq{ 
      insert into $table (col_a, col_b, col_c, col_d)
      values($alpha,$beta,$charlie,$delta)
  };

  $dbh = DBI->connect($dsn,'login','password');
  $sth = $dbh->prepare($sql);
  $sth->execute;
  $dbh->disconnect;
}

This code works, but not very efficiently. With the above code it took an hour to insert 200,000 rows into a MySQL database -- and MySQL is pretty fast! It might take half a day with Oracle.

There are lots of improvements we can make. The first is just common sense: Never put anything in a loop that doesn't absolutely have to be there. For each of our 200,000 insertions, we connect to the database, prepare our SQL statement, execute it, and disconnect from the database. However, we only need to connect and disconnect once. We simply have to move our connect() and disconnect() outside the loop to drastically improve our performance:

$dbh = DBI->connect($dsn,'login','password');

foreach $line (<FILE>) {
  chomp $line;
  ($alpha, $beta, $charlie, $delta) = split(/,/,$line);
  $sql = qq{
      insert into $table (col_a, col_b, col_c, col_d)
      values($alpha,$beta,$charlie,$delta)
  };

  $sth = $dbh->prepare($sql);
  $sth->execute;
}

$dbh->disconnect;

This reduces our run time to half an hour, doubling our speed.

Placeholders

DBI allows us to further improve our code by using placeholders. We now connect and disconnect the database only once, but we still prepare and execute our SQL statement 200,000 times. The statement we prepare doesn't change much; all that changes is the insertion value. We can create a generic statement that we prepare only once, outside the loop. With this generic statement we can specify the insertion values at execution time. DBI uses a ? to define a placeholder, so our new SQL statement looks like this:

$sql = qq{
  insert into $table (col_a, col_b, col_c, col_d)
  values(?, ?, ?, ?)
};

The statement is prepared the same as before:

$sth = $dbh->prepare($sql);

Now we can fill in each ? at execution time, once we fill in the ? values. There are two ways to do this. First, we can use the bind_param() subroutine to indicate what values to use:

$sql = qq{
  insert into $table (col_a, col_b, col_c, col_d)
  values(?, ?, ?, ?)
};

$sth = $dbh->prepare($sql);

foreach $line (<FILE>) {
  chomp $line;
  ($alpha, $beta, $charlie, $delta) = split(/,/,$line);
  $sth->bind_param(1, $alpha);
  $sth->bind_param(2, $beta);
  $sth->bind_param(3, $charlie);
  $sth->bind_param(4, $delta);
  $sth->execute;
}

The bind_param() subroutine allows a data type as a third argument. Normally, this is used to tell the driver whether the placeholder is a number or a string:

$sth->bind_param(1, $alpha, SQL_INTEGER);

$sth->bind_param(2, $beta, SQL_VARCHAR);

The second way to specify values at execution time is to pass them directly as arguments to the execute statement:

$sth->execute($alpha, $beta, $charlie, $delta);

Using this method our code simplifies to:

$dbh = DBI->connect($dsn,'login','password');
$sql = qq{
    insert into $table (col_a, col_b, col_c, col_d)
    values(?, ?, ?, ?)
}

$sth = $dbh->prepare($sql);

foreach $line (<FILE>) {
    chomp $line;
    ($alpha, $beta, $charlie, $delta) = split(/,/,$line);
    $sth->execute($alpha, $beta, $charlie, $delta);
}

$dbh->disconnect;

Our code now takes twenty minutes to run. We achieved this improvement despite the fact that the MySQL database driver only emulates placeholders. Had this been a database that directly supports placeholders, such as Oracle, the improvement would have been even more dramatic. Placeholders are any easy way to improve your DBI coding, but there are a few gotchas that beginners need to avoid. First, notice that placeholders don't have quotes. This is correct:

$sql = qq{
        insert into $table (col_a, col_b, col_c, col_d)
        values(?, ?, ?, ?)
};

This is not correct:

$sql = qq{
        insert into $table (col_a, col_b, col_c, col_d)
        values('?', '?', '?', '?')
};

Placeholders cannot be used for column names or table names. This won't work:

$sql = qq{
         select alpha, ? from table
};

Nor will this:

$sql = qq{
         select alpha, beta from ?
};

Or even this:

$sql = qq{
          select alpha, beta from table where ? > 200
};

However, you can use Perl variables for column and table names:

$sql = qq{
        select alpha, beta from $table where $col > 200
};

You'll have to re-prepare this statement whenever you want to change the values of $table or $col.

Fetches

We've dealt mostly with inserting data into a database, so now we'll look at the ways DBI provides to get data out. Most of those ways follow the same basic format:

$dbh = DBI->connect($dsn, 'login', 'passwd');

$sql = qq{
    select * from $table
};

$sth = $dbh->prepare($sql);
$sth->execute;

while (@row = $sth->fetchrow_array) {
    print "@row\n";
}

$dbh->disconnect;
      The most efficient of these is fetchrow_arrayref():

while ($row = $sth->fetchrow_arrayref) {
    print "@$row\n";
}

There's also fetchrow_hashref(), which stores the keys and values of the database into a hash:

while ($ref = $sth->fetchrow_hashref) {
    foreach $key (keys %{$ref}) {
       print "$ref->{$key}, ";
    }
    print "\n";
}

Bind Columns

Database fetches can be made even more efficient and elegant with the use of bind columns. Bind columns allow you to bind a variable to each column of a table, so that whenever a fetch is performed the bound variables are automatically updated:

$sth->bind_columns(\%attr,@references_to_variables_to_bind);

We can use it like this:

$sql = qq{
     select alpha, beta, delta from table
};

$sth->prepare($sql);
$sth->execute;

my ($alpha, $beta, $delta);
$sth->bind_columns(undef, \$alpha, \$beta, \$delta);

while ($sth->fetch) {
    print "$alpha - $beta - $delta\n";
}

Here, the fetch() subroutine is synonymous with fetchrow_arrayref(). This subroutine, combined with bind_columns(), is the fastest way to fetch data with DBI.

Error Checking

There's a big problem with all of code snippets you've seen in this article: We never check the return values of DBI methods. We simply assume that every DBI call was successful. The obvious solution is to add an or die clause after every DBI method call:

$dbh = DBI->connect($dsn,'login','passwd') 
             || die $DBI::errstr;
$sth = $dbh->prepare($sql) || die $dbh->errstr;
$sth->execute || die $sth->errstr;

However, there is a cleaner solution. Each DBI handle has a boolean attribute called RaiseError. If we set this to true, any errors triggered by that handle cause the script to die with an appropriate error message, saving us from having to check every DBI call we make. RaiseError can be set via the connect statement:

$dbh = DBI->connect($dsn,'login','passwd', 
            { RaiseError => 1 });

We can also call it after a connection is already made:

$dbh->{RaiseError} = 1;

Transactions

I've found that many administrators shy away from using transactions with DBI. For those unfamiliar, transactions provide a way to group multiple SQL statements together so that database changes are only made if every statement is successful. For example, a transaction for updating a customer order database might include statements to update inventory, update accounts payable, and update shipping manifests. All updates must succeed; otherwise, we don't want to perform any of them. If only some of the updates succeeded, we might bill a customer for an order not shipped, or ship an order to a customer without billing them.

Many administrators shy away from using DBI transactions.

By default, DBI commits each statement as it's performed. To use transactions, we need to disable this behavior with the AutoCommit variable:

$dbh = DBI->connect($dsn,'login','passwd', {
	      RaiseError => 1,
	      AutoCommit => 0,
});

We can also just say:

   $dbh->{AutoCommit} = 0;

With AutoCommit off, changes are revocable until commit() is called. Uncommitted statements can be undone with rollback().

Next, we need a way to group our statements that lets us test whether any failed. The easiest and most robust way to do this is to set RaiseError and wrap the statements in an eval block:

	$dbh->{RaiseError} = 1;
        eval {
           $sth_1->execute;    # Update Inventory
           $sth_2->execute;    # Update Accounts Payable
           $sth_3->execute;    # Update Shipping
        };
        if ($@) 
           $dbh->rollback; 
        } else { 
           $dbh->commit; 
        }

If any statement in the eval block fails, $@ is set and we revoke all three statements.

Don't forget to commit your statements. If you do forget, what happens next depends on the database. Some databases automatically commit them; others roll them back.

References

The material for this article came from personal experience, a thousand or so readings of the DBI documentation and FAQ, and a generous review by DBI guru Tim Bunce. You can find the DBI module and its related documentation at https://www.symbolstone.org/technology/perl/ DBI.

Thomas Akin is President & CEO of CrossRealm Consulting, Inc. in Atlanta, Georgia. He spends his spare time teaching, writing, and as a Perl evangelist. He can be reached at takin@crossrealm.com.