As I write this, canvassing boards in Palm Beach and Broward Counties, Florida, are desperately trying to get manual recounts done in time for a court-imposed deadline. Countless lawsuits and countersuits launched by the Republican and Democratic adversaries are in various stages of adjudication, citizens are up in arms because they feel they have been disenfranchised by poorly-designed "butterfly ballots" and other election day mistakes, the Florida legislature has threatened to take the election into its own hands, and the U.S. Supreme Court has just agreed to add its considerable weight to the fracas.
All this because many of Florida's counties ballot using the Hollerith punchcard, an antiquated balloting technology whose main virtue is its low price tag. On the nightly news, election officials discourse gravely on "chads," those little bits of paper that some voters have trouble dislodging. We hear of pregnant chads, hanging chads, and dimpled chads. One election board volunteer is even accused of having eaten the chads that dropped out of the ballots he was handling.
Is this any way to run an election? I don't think so. Fortunately, Perl can help rebuild democracy, and in this article I'll show a simple Perl-based framework for a secure Internet-based balloting system.
About Secure Elections
Much of the inspiration for this article comes from Bruce Schneier's magnum opus Applied Cryptography and specifically from section 6.1, "Secure Elections." As explained there, the guiding principles of a secure election are to maintain privacy and prevent cheating. Schneier lays out six minimal requirements for a good election protocol:
- Only authorized voters can vote.
- No one can vote more than once.
- No one can determine for whom anyone else voted.
- No one can duplicate anyone else's vote.
- No one can change anyone else's vote without being discovered.
- Every voter can verify that his vote has been taken into account in the final tabulation.
Conventional paper ballots satisfy requirement 1 by voter registration, a process that ensures that only American citizens of a certain age can vote. Requirements 2 and 4 are satisfied by crossing the registered voter's name off a list when the voter enters the polling location, and 3 is satisfied by using an anonymous ballot. The last two requirements, however, are not completely satisfied by paper ballots, and are the source of much of the uncertainty and accusations in the current election fiasco.
Schneier's book describes several cryptographic protocols which meet these six requirements for secure elections. Some of them are quite elaborate and require new software at the voter's side of the connection. In this article, we will use one of the simpler ones that happens to be well suited for web-based voting.
This protocol requires two independent central facilities to work, called the CEA and the CLA. The CEA is the Central Enumeration Agency (although the Bush camp might call it "Chad Eaters Anonymous"). It is responsible for collecting and tallying ballots, and publishing the results on election night. The CLA is the Central Legitimization Agency (or "Controlling Legal Authority" in Gore-speak). It is responsible for registering and credentialing voters.
Here's how it works:
- Before the election, the CLA supervises voter registration. Each registered voter is issued a Voter Registration Number (VRN), which is simply a large random number. VRNs are issued electronically, for example, by email or floppy (see Figure 1).
- The CLA maintains a list of all VRNs, and a list of who VRNs were issued to, in order to prevent someone from registering twice. There is no record of who a particular VRN was issued to.
- Prior to election day, the CLA sends the CEA the list of VRNs.
- On election day, the voter sends in an electronic ballot that contains his choices for elected office. The ballot contains his VRN from step 1.
- The CEA checks that the VRN is valid, and crosses it off the list in order to prevent someone from voting twice.
- The CEA generates a large random confirmation number (CN) for the voter, and uses it to enter the voter's choices into the vote tally.
- The CEA returns the CN to the voter.
- After all votes have been received, the CEA publishes the outcome, along with the lists of CNs and for whom their owners voted.
Figure 1. The voter registration number.
Dear Registered Voter, Attached is your voter registration number. PLEASE KEEP A COPY OF THIS E-MAIL; YOU WILL NEED IT IN ORDER TO VOTE. DO NOT BEND, FOLD OR SPINDLE. --REGISTRATION-START-- 8334290304185989711211356 0784480602526249967749323 7190985476311629502773466 9927297350189474428770582 --REGISTRATION-END--
The privacy of the ballot is ensured by the separation of the CLA and the CEA. One facility knows the identity of the voters, but not who they voted for. The other has access to their vote, but not their identity.
This protocol discourages election fraud in a number of ways. If a registered voter tries to vote twice, he will be caught in step 5. We can discourage non-registered voters from trying to guess valid VRNs by using large random numbers (in this example, we use 100-digit numbers).
If the CEA itself tries to cheat by stuffing the ballot box or "losing" ballots, this can be detected in step 8. By publishing a list of CNs and their votes, the protocol allows voters to check the list to make sure that their votes were tallied correctly.
There are still ways to defraud this protocol. For example, the CEA and CLA can collude to figure out the identity of a voter; if voting is being done over the web, the CEA can use the voter's IP address to figure out who he is. The first of these problems can be addressed by election auditors and statutory law. The second can be addressed using proxy servers or by having the balloting take place in central polling places equipped with ATM-like web browsers.
It is also important to note that the protocol described here is a slight departure from the one described by Schneier. The original protocol is built along an email model, in which the voter himself generates the CN, rather than letting the CEA do it for him.
The Database Schema
Tallying votes is a task for a database, and this application uses MySQL to do the heavy lifting. Listing 1 shows the schema used by the CEA's database. In addition to managing the vote tally itself, the information in the database is used to generate the ballot on the fly. This avoids having to design a new web page for each ballot, and discourages election officials from coming up with butterfly ballots and other innovative designs.
The schema has six tables:
- party. This table lists political parties. Each party has a unique ID, numbered from 1, and a short description, such as "Republican Party".
- office. This table lists the offices that are up for grabs. Each office has a unique ID and a short description like "Town Dogcatcher".
- registration. This table lists valid VRNs. Each VRN has a small integer associated with it that indicates whether the VRN has been used in the current balloting.
- candidate. This table lists infor-mation about the candidates in the current election. Each candidate has a unique ID as well as fields that describe who he is and what he's running for. The
party_id
andoffice_id
fields describe the candidate's party affiliation and the office he is a candidate for. Thefirst_name
andlast_name
fields are self-evident. AUNIQUE
constraint ensures that there cannot be two candidates from the same party running for the same office. The same candidate can, however, run for two different offices, which should make Senator and/or Vice President Lieberman happy.
- tally. This table keeps track of the vote tally. The
voter_id
field corresponds to the voter's CN ballot confirmation number (not the VRN). Thecandidate_id
contains the voter's choice for candidate, and office_id contains the office the voter wants to place him in. AUNIQUE
constraint ensures that a voter can only vote once for a given office.
- writein. This table keeps tabs on write-in candidates.
Listing 2 gives a set of MySQL statements that insert some test values into the CEA database. There are three offices on this test ballot, including President of the United States, State Senator, and Town Dogcatcher. (No soft money was used for the selection of the various celebrities listed in this example, and I was not promised a night in the Lincoln bedroom.)
Generating VRNs
We won't develop the CLA very fully in this article. The CLA should maintain a database of voter registration information, such as birth dates, addresses, and driver's licenses. For testing purposes, we'll generate blocks of VRNs using the program shown in Listing 3, and load the VRNs into the database using the program, shown in Listing 4.
The generate_vrns.pl
script generates the number of VRNs requested on the command line. The part that does all the work is the subroutine random_digits()
between lines 12 and 19. It is critical to use a good random number generator to generate VRNs; otherwise, valid VRNs would be too easy to guess. The Math::TrulyRandom module (available on CPAN) promises to do this, but it hangs on my Linux system. Instead, I use the /dev/urandom device, which uses a Linux kernel driver that generates random data from non-deterministic system information, such as interrupts. We read the requisite number of random bytes from the device, and then transform them into a set of base 10 digits.
The enter_vrn.pl
script shown in Listing 4 takes a list of VRNs generated by the previous script, and enters them into the CEA database using the DBI module and its DBD::mysql driver.
Line 5 connects to the CEA database or dies with an error message. Line 6 prepares a SQL INSERT
statement that will add VRNs to the registration table. The code between lines 8-14 loops through the list of VRNs one paragraph at a time, extracts the VRN information, and calls the SQL statement's execute()
method to perform the insertion. After the last VRN is processed, we call finish()
to close the SQL statement, and disconnect()
to disconnect from the database.
The E-Ballot
The fun part is the electronic ballot generated by the CEA. Figure 2 shows how it looks to the voter. There are three steps to voting. In the first step, the voter makes his choices by selecting radio buttons in the ballot. Each candidate is sorted by his party and the office he is running for. There is also a text field that allows for write-ins.
In step 2, the voter enters his VRN. He can do so by cutting and pasting the VRN into a large text field, or by uploading a file provided by the CLA that contains the VRN.
When satisifed, the voter presses the VOTE button. The CEA checks that the ballot is filled out, that the voter hasn't voted twice for candidates for the same office, and that the VRN is valid and has only been used once. If these checks are satisified, the CEA enters the voter's choices into the database, generates a CN, and displays the confirmation to the user (Figure 3).
Later, when the votes have been tallied and published, the user can go to the published results and make sure that his vote was correctly counted.
Of course, for the e-ballot to be at all secure, all communication between the voter and the CEA's web site must use SSL, and the voter should be instructed to check the web site's SSL certificate to make sure that it is valid.
The vote
Program
The e-ballot is implemented by a conventional CGI script shown in Listing 5. The listing is a bit long, but most of it is fancy formatting in the ballot section. We'll step through it a line at a time.
Lines 0-4: Load modules. We turn on taint checking and Perl warnings. Taint checking ensures that we will be prevented from doing anything stupid with user-supplied input, such as passing it to shells, and warnings alert us of uninitialized variables and the like. We load the CGI and DBI modules. One trick to notice is that among the symbols loaded from CGI are *table
, *Tr
, and *dl
. The asterisk means to automagically generate functions to start and end the corresponding HTML tags, such as start_table()
to generate a <TABLE>
tag and end_table() to generate a </TABLE>
tag.
Lines 5-7: Define constants and globals. We define a constant for the length of a valid VRN, and declare globals that will hold various information about the ballot. @CANDIDATES
is a two-dimensional list of candidates, in which the first dimension is the candidate's party affiliate and the second dimension is the office the candidate is running for. The values of this array are candidate ID's. Each of @PARTIES
, @OFFICES
, and @CANDIDATE_NAME
map from database IDs into human-readable labels. To adjust for the fact that 1 is the lowest ID used in the CEA schema, we adjust each of the indexes by 1. For example, the party_id
for the "Republican Party" is 1, and so it can be found in @PARTIES
at $PARTIES[0]
.
We set the PATH environment variable to a safe known value in order to satisfy Perl's taint-check requirements.
Lines 8-9: Connect to the database. We call the DBI->connect()
method to connect to the CEA database running on the local machine. We don't use any password authorization here, but in a real application we would want to.
Lines 10-20: Start the page. We emit the standard HTTP header by calling the CGI module's header()
function, and start the top of the HTML page by calling start_html()
and h1()
to generate HTML boilerplate and a level-one header.
Lines 21-22: Initialize globals. We call get_globals()
to initialize the four global variables that describe the current ballot. get_globals()
will make the appropriate database calls to do this.
Lines 23-26: Handle a submitted ballot. We call the CGI module's param()
method to look for a CGI field named VOTE. If such a field exists, then it indicates that the user has submitted his ballot by pressing the VOTE button. We immediately call a subroutine named validate()
, which checks that the ballot is filled out correctly. If the ballot checks out, it returns the user's VRN, and we pass the VRN to a subroutine named enter_ballot()
that adds the information to the growing tally.
Lines 27-30: Generate a new ballot. Otherwise, we call generate_ballot()
to create the ballot that the user sees on the page. This subroutine will be called the first time the user loads the page, as well as when the validate()
subroutine detects an error in a previously-submitted ballot.
Lines 31-34: Finish up. We call end_html()
to generate the boilerplate at the bottom of the HTML page, and disconnect from the database. We then exit.
Lines 40-60: get_globals()
subroutine. This subroutine is responsible for loading the global variables with information about the current election. For efficiency's sake, it fetches all the information it needs in a single large SQL statement that retrieves each of the candidates, their party and office IDs, and the human-readable labels for candidates, offices, and parties.
We do this by passing the appropriate SQL statement to the database's prepare()
method, and then executing the resulting statement handle. We then loop over each row of the returned table, populating the @CANDIDATES
, @CANDIDATE_NAME
, @PARTIES
, and @OFFICES
arrays as we go. Notice how we offset each ID by one in order to use it as an array index.
Lines 61-71: The generate_ballot()
subroutine. This subroutine is responsible for generating the HTML for the ballot. It calls the CGI module's start_multipart_form()
function in order to start a fill-out form. We use this function rather than the more common start_form()
because we will be accepting a file containing the voter's VRN for upload, and only the multipart-style form can accept file uploads.
We then call three functions to generate the top, middle, and bottom of the form, and call end_form()
to generate the form's closing tag.
Lines 72-104: The voting_matrix(
) subroutine. This large subroutine generates the table that shows the ballot information. Don't be intimidated. The part of the subroutine that does all the work is just two nested loops. The outer one loops through parties, which become the rows of the ballot, and the inner one loops through offices, which become the columns. Within the inner loop, we check whether @CANDIDATES
contains a candidate for the current party affiliation and office. If there is, we generate a radio button whose name is the index into @OFFICES
and whose value is the candidate ID. For the label we use the human-readable version of the candidate's name, derived from @CANDIDATE_NAME
.
After creating the radio buttons for standard candidates, we create a series of write-ins, one for each office. These are text fields with the name "writein X", where X is the ID of the corresponding office.
Lines 105-116: The registration_number()
subroutine. This subroutine generates the section of the ballot that prompts the user for his VRN. There are two fields that can be used. One is a large text area named registration_id
, where the user can cut and paste his VRN. The other is a file field named registration_file
, which the user can use to upload a text file containing the VRN.
Lines 117-123: The vote()
subroutine. This subroutine generates a single HTML submission button labeled "VOTE". This concludes the portion of the script that generates the e-ballot.
Lines 124-165: The validate()
subroutine. This is responsible for validating the voter's submission. There are several checks on the integrity of the ballot. First we check for the easy things: whether the VRN has been filled in, and whether any of the radio buttons in the ballot have been selected (we only require a minimum of one office to be selected; it's perfectly valid for the user to vote for some offices and enter no selection for others).
We now check for more subtle problems. Lines 134-138 verify that each office has exactly 0 or 1 votes. Although the fill-out form only allows a single candidate from each office to be selected, a malicious voter could roll his own fill-out form and try to vote for multiple candidates from the same office.
Having passed these checks, we recover the VRN. If the registration_file
field is present, then the user has chosen to upload a file. We call the CGI module's param()
function to recover a filehandle for the uploaded file, and parse out its contents. Otherwise, if the registration_id
field is present, we use its contents to recover a cut-and-pasted VRN.
Having recovered the VRN, we ensure that it is valid. First, we check that the VRN is the correct length. If so, we consult the database to see whether the VRN is in the registration table, and whether it is still unused. If both these tests pass, then we declare that the submitted ballot is valid and return the VRN to the caller.
When any of the tests fails, it calls a utility subroutine named error()
. The error()
function displays a bold red error message on the top of the page, and prompts the user to make corrections and try again.
Lines 166-205: The enter_ballot()
subroutine. The enter_ballot()
subroutine is where the information from the ballot is collected and entered into the database, registering the voter's intent and keeping the sacred flame of Democracy alight.
The first thing that we do is to update the database in order to mark the VRN as used. This prevents the VRN from being used again. We do the update in a way that will cause it to fail unless the VRN is currently marked as unused, and avoids an attack based on race conditions while updating the database.
We now generate a confirmation number for the ballot by caling random_digits()
. We use the newly-generated ID to generate two SQL insert statements, one for regular candidates, and the other for write-ins. Each statement uses "?" as placeholders for the office and candidate fields.
We now enter the voter's choices into the database, simultaneously generating a confirmation page as we do so. We loop over the @OFFICES
array, looking for CGI parameters corresponding either to a regular candidate for the office or to a write-in. If we find a write-in, we recover it and insert it into the writein table using the appropriate insert statement. Otherwise we insert the voter's choice into the tally table. Notice how we add 1 to the office index in order to convert it back into the 1-based ID used in the MySQL tables.
Each time through the loop, we print out the office and the selected candidate, using a definition list (<DL>) style HTML list.
At the end of the subroutine, we finish()
both SQL statements, and then print out a nicely-formatted version of the voter's ballot confirmation number.
Lines 209-223: Utility subroutines. We've already seen the random_digits()
subroutine. The error()
subroutine takes its arguments and incorporates them into an HTML paragraph, using a red font and a large font size. The subroutine explicitly returns undef, which allows this type of idiom in the caller:
return error('Please stop munching chads
and start punching ballots')
unless $is_valid;
Tallying the Votes
On election night, tallying the vote is simply a matter of issuing a SQL statement to add up each candidate's counts and grouping the results by office. Here's one that will do the trick:
By this count Ogden Nash deserves to be the next President, Timothy O'Leary next State Senator, and Morticia Addams the next Dogcatcher. A definite improvement over this year's choices!
Lincoln D. Stein wrote CGI.pm.