
The Perl Journal

Volumes 1–6 (1996–2002)


Mark-Jason Dominus (1999) Just the FAQs: Precedence Problems. The Perl Journal, vol 4(3), issue #15, Fall 1999.

Just the FAQs: Precedence Problems

There's more to it than what you learned in fourth grade.

Mark-Jason Dominus


What is Precedence?

What's 2+3x4?

You probably learned about this in grade school; it was fourth-grade material in the New York City public school I attended. If not, that's okay too; I'll explain everything.

It's well-known that 2+3x4 is 14, because you are supposed to do the multiplication before the addition. 3x4 is 12, and then you add the 2 and get 14. What you do not do is perform the operations in left-to-right order; if you did that you would add 2 and 3 to get 5, then multiply by 4 and get 20.

This is just a convention about what an expression like 2+3x4 means. It's not an important mathematical fact; it's just a rule we set down about how to interpret certain ambiguous arithmetic expressions. It could have gone the other way, or we could have the rule that the operations are always done left-to-right. But we don't have those rules; we have the rule that says that you do the multiplication first and then the addition. We say that multiplication takes precedence over addition.

What if you really do want to say 'add 2 and 3, and multiply the result by 4'? Then you use parentheses, like this: (2+3)x4. The rule about parentheses is that expressions in parentheses must always be fully evaluated before anything else.

If we always used the parentheses, we wouldn't need rules about precedence. There wouldn't be any ambiguous expressions. We have precedence rules because we're lazy and we like to leave out the parentheses when we can. The fully-parenthesized form is always unambiguous. The precedence rule tells us how to interpret a version with fewer parentheses to decide what it would look like if we wrote the equivalent fully-parenthesized version. In the example above:

a. 2+(3x4) b. (2+3)x4

Is 2+3x4 like (a) or like (b)?

The precedence rule just tells us that it is like (a).
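
You can check the grade-school rule directly in Perl, which writes multiplication as *. This is only a minimal sketch, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $implicit = 2 + 3 * 4;      # no parentheses: Perl multiplies first
my $like_a   = 2 + (3 * 4);    # interpretation (a)
my $like_b   = (2 + 3) * 4;    # interpretation (b)

print "2 + 3 * 4   = $implicit\n";
print "2 + (3 * 4) = $like_a\n";
print "(2 + 3) * 4 = $like_b\n";
```

The first two lines print 14 and the third prints 20.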

Rules and More Rules

In grade school you probably learned a few more rules:
    4 x 5^2

Which of these interpretations is correct?

    (4 x 5)^2 = 400 or 4 x (5^2) = 100

The rule is that exponentiation takes precedence over multiplication, so it's 100, and not 400.

What about 8 – 3 + 4? Is this like (8 – 3) + 4 = 9 or 8 – (3 + 4) = 1? Here the rule is a little different. Neither + nor – has precedence over the other. Instead, the – and + are just done left-to-right. This rule handles the case of 8 – 4 – 3 also. Is it (8 – 4) – 3 = 1 or is it 8 – (4 – 3) = 7? Subtractions are done left-to-right, so it's 1 and not 7. A similar left-to-right rule handles ties between x and /.

Our rules are getting complicated now:

  1. Exponentiation first
  2. Then multiplication and division, left to right
  3. Then addition and subtraction, left to right

Maybe we can leave out the 'left-to-right' part and just say that all ties will be broken left-to-right? No, because for exponentiation that isn't true.

    2^2^3

means

    2^(2^3) = 256,    not   (2^2)^3 = 64.

So a tower of exponents is resolved from the top down, that is, from upper-right to lower-left. Perl uses the token ** to represent exponentiation, writing x**y instead of x^y (x raised to the power y). In this case x**y**z means x**(y**z), not (x**y)**z, so ** is resolved right-to-left.
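
The grade-school rules so far carry over to Perl unchanged. Here is a small sketch that tries each of them; the variable names are invented for the example:

```perl
use strict;
use warnings;

my $times_power = 4 * 5 ** 2;      # 4 * (5 ** 2)
my $minus_plus  = 8 - 3 + 4;       # (8 - 3) + 4
my $minus_minus = 8 - 4 - 3;       # (8 - 4) - 3
my $tower       = 2 ** 2 ** 3;     # 2 ** (2 ** 3), right to left
my $forced_left = (2 ** 2) ** 3;   # parentheses force the other grouping

print "$times_power $minus_plus $minus_minus $tower $forced_left\n";
```

This prints 100 9 1 256 64.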

Programming languages have this same notational problem, except that they have it even worse than mathematics does, partly because they have so many different operator symbols. For example, Perl has at least seventy different operator symbols. This is a problem, because communication with the compiler and with other programmers must be unambiguous. You don't want to be writing something like 2+3x4 and have Perl compute 20 when you wanted 14, or vice versa.

Nobody knows a really good solution to this problem, and different languages solve it in different ways. For example, the language APL, which has a whole lot of unfamiliar operator symbols such as ρ, dispenses with precedence entirely and resolves them all from right to left. The advantage of this is that you don't have to remember any rules, and the disadvantage is that many expressions are confusing: If you write 2x3+4, you get 14, not 10. In Lisp the issue never comes up, because in Lisp the parentheses are required, and so there are no ambiguous expressions. (Now you know why Lisp looks the way it does.)

Perl, with its seventy operators, has to solve this problem somehow. The strategy Perl takes (and most other programming languages as well) is to take the fourth-grade system and extend it to deal with the new operators. The operators are divided into many 'precedence levels', and certain operations, like multiplication, have higher precedence than other operations, like addition. The levels are essentially arbitrary, and are chosen without any deep plan, but with the hope that you will be able to omit most of the parentheses most of the time and still get what you want. So, for example, Perl gives * a higher precedence than +, and ** a higher precedence than *, just like in grade school.

An Explosion of Rules

Let's see some examples of the reasons for which the precedence levels are set the way they are. Suppose you wrote something like this:
    $v = $x + 3;

This is actually ambiguous. It might mean

    ($v = $x) + 3;

or it might mean

    $v = ($x + 3);

The first of these is silly, because it stores the value $x into $v, and then computes the value of $x+3 and then throws the result of the addition away. In this case the addition was useless. The second one, however, makes sense, because it does the addition first and stores the result into $v. Since people write things like

    $v = $x + 3;

all the time, and expect to get the second behavior and not the first, Perl's = operator has low precedence, lower than the precedence of +, so that Perl makes the second interpretation.

Here's another example:

    $result = $x =~ /foo/;

means this:

    $result = ($x =~ /foo/);

which looks to see if $x contains the string foo, and stores a true or false result into $result. It doesn't mean this:

    ($result = $x) =~ /foo/;

which copies the value of $x into $result and then looks to see if $result contains foo. In this case it's likely that the programmer wanted the first meaning, not the second. But sometimes you do want it to go the other way. Consider this expression:

    $p = $q =~ s/w//g;

Again, this expression is interpreted this way:

    $p = ($q =~ s/w//g);

All the w's are removed from $q, and the number of successful substitutions is stored into $p. However, sometimes you really do want the other meaning:

    ($p = $q) =~ s/w//g;

This copies the value of $q into $p, and then removes all the w's from $p, leaving $q alone. If you want this, you have to include the parentheses explicitly, because = has lower precedence than =~.
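
To see the two groupings side by side, here is a short sketch. The sample string and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Default grouping: $count = ($q =~ s/w//g)
my $q     = 'brown cow';
my $count = $q =~ s/w//g;           # edits $q, stores the number of substitutions
print "q='$q' count=$count\n";

# Explicit grouping: copy first, then edit only the copy
my $original = 'brown cow';
(my $copy = $original) =~ s/w//g;   # $original is left alone
print "original='$original' copy='$copy'\n";
```

In the first case $q ends up as 'bron co' and $count is 2; in the second, only $copy loses its w's.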

Often the rules do what you want. Consider this:

    $worked = 1 + $s =~ /pattern/;

There are five ways to interpret this:

a.  ($worked = 1) + ($s  =~ /pattern/);

b. (($worked = 1) + $s)  =~ /pattern/;

c.  ($worked = (1 + $s)) =~ /pattern/;

d.   $worked = ((1 + $s) =~ /pattern/);

e.   $worked = (1 + ($s  =~ /pattern/));

We already know that + has higher precedence than =, so it happens before =, and that rules out (a) and (b).

We also know that =~ has higher precedence than =, so that rules out (c).

To choose between (d) and (e) we need to know whether + takes precedence over =~ or vice versa. (d) will convert $s to a number, add 1 to it, convert the resulting number to a string, and do the pattern match. That is a pretty silly thing to do. (e) will match $s against the pattern, return a boolean result, add 1 to that result to yield the number 1 or 2, and store the number into $worked. That makes a lot more sense; perhaps $worked will be used later to index an array. We should hope that Perl chooses interpretation (e) rather than (d). And in fact that is what it does, because =~ has higher precedence than +. =~ behaves similarly with respect to multiplication.
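
A quick way to convince yourself that Perl picks (e) is to run it on one string that matches and one that doesn't. This is just a sketch with invented sample strings:

```perl
use strict;
use warnings;

my $hit  = 'a pattern appears here';
my $miss = 'nothing to see';

my $worked = 1 + $hit  =~ /pattern/;   # 1 + ($hit  =~ /pattern/)
my $failed = 1 + $miss =~ /pattern/;   # 1 + ($miss =~ /pattern/)

print "match: $worked, no match: $failed\n";
```

The successful match yields 2 and the failed one yields 1, which is what interpretation (e) predicts.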

Our table of precedence is shaping up:

1. **         (right to left)
	
2. =~

3. *, /       (left to right)

4. +, –       (left to right)

5. =        

How are multiple = symbols resolved? Left to right, or right to left? The question is whether this:

    $a = $b = $c;

will mean this:

    ($a = $b) = $c;

or this:

    $a = ($b = $c);

The first one means to store the value of $b into $a, and then to store the value of $c into $a; this is obviously not useful. But the second one means to store the value of $c into $b, and then to store that value into $a also, and that obviously is useful. So = is resolved right to left.
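
Here is a small sketch of both groupings, using made-up variable names rather than $a, $b, and $c:

```perl
use strict;
use warnings;

# Default, right to left: $first = ($second = $third)
my ($first, $second, $third) = (1, 2, 3);
$first = $second = $third;
print "$first $second $third\n";

# Forced left to right: ($p = $q) = $r assigns to $p twice
my ($p, $q, $r) = (1, 2, 3);
($p = $q) = $r;
print "$p $q $r\n";
```

The first line printed is 3 3 3; the second is 3 2 3, because $q was never assigned to.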

Why does =~ have lower precedence than **? No good reason. It's just a side effect of the low precedence of =~ and the high precedence of **. It's probably very rare to have =~ and ** in the same expression anyway. Perl tries to get the common cases right. Here's another common case:

    if ($x == 3 && $y == 4) { … }

Is this interpreted as:

a. (($x == 3)  &&  $y) == 4

b.  ($x == 3)  && ($y == 4)

c.  ($x == (3  &&  $y)) == 4

d.   $x == ((3 &&  $y) == 4)

e.   $x == (3  && ($y == 4))

We really hope that it will be (b). To make (b) come out, && will have to have lower precedence than ==; if the precedence is higher we'll get (c) or (d), which would be awful. So && has lower precedence than ==. If this seems like an obvious decision, consider that Pascal got it wrong.

|| has precedence about the same as &&, but slightly lower, in accordance with the usual convention of mathematicians, and by analogy with * and +. ! has high precedence, because when people write

    !$x …some long complicated expression…

they almost always mean that the ! applies to the $x, not to the entire long complicated expression. In fact, almost the only time they don't mean this is in cases like this one:

    if (! $x->{'annoying'}) { … }

it would be very annoying if this were interpreted to mean

    if ((! $x)->{'annoying'}) { … }

The same argument we used to explain why ! has high precedence works even better and explains why -> has even higher precedence. In fact, -> has the highest precedence of all. If ## and @@ are any two operators at all, then

    $a ## $x->$y

and

    $x->$y @@ $b

always mean

    $a ## ($x->$y)

and

    ($x->$y) @@ $b

and never

    ($a ## $x)->$y

or

    $x->($y @@ $b)
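
As a sketch of how tightly -> binds, here it is next to !, + and *. The hash contents and variable names are invented for the example:

```perl
use strict;
use warnings;

my $x = { annoying => 0, count => 3 };

my $not_annoying = ! $x->{'annoying'};      # !($x->{'annoying'})
my $sum          = 2 + $x->{count} * 4;     # 2 + (($x->{count}) * 4)

print "not annoying\n" if $not_annoying;
print "sum = $sum\n";
```

Both ! and the arithmetic operators see the value fetched by ->, never the reference $x itself.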

For a long time, the operator with lowest precedence was the , operator. The , operator is for evaluating two expressions in sequence. For example,

    $a*=2 , $c*=3

doubles $a and triples $c. It would be a shame if you wrote something like this:

    $a*=2 , $c*=3 if $change_the_variables;

and Perl interpreted it to mean this:

    $a*= (2, $c) *= 3 if $change_the_variables;

That would just be bizarre. The very very low precedence of , ensures that you can write

    EXPR1, EXPR2

for any two expressions at all, and be sure that they are not going to get mashed together to make some nonsense expression like $a*= (2, $c) *= 3.
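
A minimal sketch of the comma operator at work, with made-up variable names in place of $a and $c:

```perl
use strict;
use warnings;

my ($m, $n) = (5, 7);
my $change_the_variables = 1;

# Parsed as (($m *= 2), ($n *= 3)) if $change_the_variables;
$m *= 2, $n *= 3 if $change_the_variables;

print "$m $n\n";
```

This prints 10 21: both assignments run, each on its own variable.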

, is also the list constructor operator. If you want to make a list of three things, you have to write

    @list = ('Gold', 'Frankincense', 'Myrrh');

because if you left off the parentheses, like this:

    @list = 'Gold', 'Frankincense', 'Myrrh';

what you would get would be the same as this:

    (@list = 'Gold'), 'Frankincense', 'Myrrh';

This assigns @list to have one element (Gold) and then executes the two following expressions in sequence, which is pointless. So this is a prime example of a case where the default precedence rules don't do what you want. But people are already in the habit of putting parentheses around their list elements, so nobody minds this very much, and the problem isn't really a problem at all.
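
You can count the elements to see the difference. In this sketch the array names are invented, and the warning that Perl would otherwise print for the stray constants is switched off so the output stays readable:

```perl
use strict;
use warnings;

my @with_parens = ('Gold', 'Frankincense', 'Myrrh');

my @without_parens;
{
    no warnings 'void';    # silence "Useless use of a constant in void context"
    @without_parens = 'Gold', 'Frankincense', 'Myrrh';
}

print scalar(@with_parens), " ", scalar(@without_parens), "\n";
```

This prints 3 1.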

Precedence Traps and Surprises

This very low precedence for , causes some other problems, however. Consider this common idiom:
open(F, "< $file") || die "Couldn't open $file: $!";

This tries to open a filehandle, and if it can't it aborts the program with an error message. Now watch what happens if you leave the parentheses off the open call:

open F, "< $file"  || die "Couldn't open $file: $!";

, has very low precedence, so the || takes precedence here, and Perl interprets this expression as if you had written this:

open F, ("< $file"  || die "Couldn't open $file: $!");

This is totally bizarre, because the die will only be executed when the string "< $file" is false, which never happens. Since the die is controlled by the string and not by the open call, the program will not abort on errors the way you wanted. Here we wish that || had lower precedence, so that we could write

try to perform big long hairy complicated action  || die ;

and be sure that the || was not going to gobble up part of the action the way it did in our open example. Perl 5 introduced a new version of || that has low precedence for exactly this purpose. It's spelled or, and in fact it has the lowest precedence of all Perl's operators. You can write

    try to perform big long hairy complicated action or die ;

and be quite sure that or will not gobble up part of the action the way || did in our open example, whether or not you leave off the parentheses. To summarize:

open(F, "< $file") or die "Couldn't open $file: $!";  # OK
open F, "< $file"  or die "Couldn't open $file: $!";  # OK
open(F, "< $file") || die "Couldn't open $file: $!";  # OK
open F, "< $file"  || die "Couldn't open $file: $!";  # Whooops!

If you use or, you're safe from this error, and if you always put in the parentheses, you're safe. Pick a strategy you like and stick with it.
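
Here is a sketch that shows the difference without aborting the program, by wrapping each attempt in eval. The path is hypothetical and is assumed not to exist on your system; the lexical filehandle is a choice made for the example, not part of the idiom above:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist.
my $file = '/no/such/directory/precedence-demo.txt';

# || grabs the string "< $file", which is always true, so die never runs.
my $survived_pipes = eval {
    open my $fh, "< $file" || die "Couldn't open $file: $!\n";
    1;
};

# or has lower precedence than the list operator, so it tests open's result.
my $survived_or = eval {
    open my $fh, "< $file" or die "Couldn't open $file: $!\n";
    1;
};

print "with ||: ", ($survived_pipes ? "no error reported" : "died"), "\n";
print "with or: ", ($survived_or    ? "no error reported" : "died"), "\n";
```

The || version sails past the failed open; the or version dies as intended.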

The other big use of || is to select a value from the first source that provides it. For example:

$directory = $opt_D || $ENV{DIRECTORY} || $DEFAULT_DIRECTORY;

It looks to see if there was a -D command-line option specifying the directory first; if not, it looks to see if the user set the DIRECTORY environment variable; if neither of these is set, it uses a hard-wired default directory. It gets the first value that it can, so, for example, if you have the environment variable set and supply an explicit -D option when you run the program, the option overrides the environment variable. The precedence of || is higher than that of =, so this means what we wanted:

$directory = ($opt_D || $ENV{DIRECTORY} || $DEFAULT_DIRECTORY);

But sometimes people have a little knowledge and end up sabotaging themselves, writing this:

$directory = $opt_D or $ENV{DIRECTORY} or $DEFAULT_DIRECTORY;

or has very very very low precedence, even lower than =, so Perl interprets this as:

($directory = $opt_D) or $ENV{DIRECTORY} or $DEFAULT_DIRECTORY;

$directory is always assigned from the command-line option, even if none was set. Then the values of the expressions $ENV{DIRECTORY} and $DEFAULT_DIRECTORY are thrown away. Perl's -w option will warn you about this mistake if you make it. To avoid it, remember this rule of thumb: use || for selecting values, and use or for controlling the flow of statements.
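
The following sketch puts the two spellings side by side. The hash %fake_env is a stand-in for %ENV and the directory names are made up, so that the result doesn't depend on your environment:

```perl
use strict;
use warnings;

my $opt_D             = '';                                   # pretend no -D option was given
my %fake_env          = (DIRECTORY => '/from/environment');   # stand-in for %ENV
my $DEFAULT_DIRECTORY = '/usr/local/default';

# ||: the first true value is selected, then assigned
my $right = $opt_D || $fake_env{DIRECTORY} || $DEFAULT_DIRECTORY;

# or: the assignment happens first, and the other values are thrown away
my $wrong;
{
    no warnings 'void';    # Perl would otherwise warn about the discarded values
    $wrong = $opt_D or $fake_env{DIRECTORY} or $DEFAULT_DIRECTORY;
}

print "right='$right' wrong='$wrong'\n";
```

$right picks up the environment setting, while $wrong is left holding the empty option.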

List Operators and Unary Operators

A related problem is that all of Perl's 'list operators' have very low precedence with respect to whatever appears to their right, lower even than the comma, and so tend to gobble up everything to their right. (A 'list operator' is a Perl function that accepts a list of arguments, like open as above, or print.) We already saw this problem with open. Here's a similar sort of problem:
@successes = (unlink $new, symlink $old, $new, open N, $new);

This isn't even clear to humans. What was really meant was

@successes = (unlink($new), symlink($old, $new), open(N, $new));

which performs the three operations in sequence and stores the three success-or-failure codes into @successes. But what Perl thought we meant here was something totally different:

@successes = (unlink($new, symlink($old, $new, open(N, $new))));

It thinks that the result of the open call should be used as the third argument to symlink, and that the result of symlink should be passed to unlink, which will try to remove a file with that name. This won't even compile, because symlink needs two arguments, not three. We saw one way to disambiguate this; another is to write it like this:

@successes = ((unlink $new), (symlink $old, $new), (open N, $new));

Again, pick a style you like and stick with it.
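
The same gobbling can be seen with a list operator that doesn't touch the filesystem. This sketch uses join in place of unlink and symlink; the strings are invented for the example:

```perl
use strict;
use warnings;

# The first join swallows everything to its right, including the second join.
my @gobbled  = (join '-', 'a', 'b', join '+', 'c', 'd');

# Parentheses on each call keep the argument lists apart.
my @separate = (join('-', 'a', 'b'), join('+', 'c', 'd'));

print scalar(@gobbled),  ": @gobbled\n";
print scalar(@separate), ": @separate\n";
```

The first list has a single element, a-b-c+d; the second has two, a-b and c+d.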

Why do Perl list operators gobble up everything to the right? Often it's very handy. For example:

@textfiles = grep -T, map "$DIRNAME/$_", readdir DIR;

Here Perl behaves as if you had written this:

@textfiles = grep(-T, (map("$DIRNAME/$_", (readdir(DIR)))));

Some filenames are read from the dirhandle with readdir and the resulting list is passed to map, which turns each filename into a full path name and returns a list of paths. Then grep filters the list of paths, extracts all the paths that refer to text files, and returns a list of just the text files from the directory.
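
Here is a sketch of the same kind of pipeline that runs without a directory handle. It swaps readdir for sort and the -T test for a pattern match, and the file names are made up:

```perl
use strict;
use warnings;

my @names = ('todo.txt', 'image.png', 'notes.txt');

# Each list operator takes everything to its right as its argument list.
my @text = grep /\.txt$/, map "docs/$_", sort @names;

# The fully parenthesized equivalent.
my @same = grep(/\.txt$/, (map("docs/$_", (sort(@names)))));

print "@text\n";
print "@same\n";
```

Both lines print docs/notes.txt docs/todo.txt.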

One possibly fine point is that the parentheses might not always mean what you want. For example, suppose you had this:

    print $a, $b, $c;

Then you discover that you need to print out double the value of $a. If you do this you're safe:

    print 2*$a, $b, $c;

but if you do this, you might get a surprise:

    print (2*$a), $b, $c;

If a list operator is followed by parentheses, Perl assumes that the parentheses enclose all the arguments, so it interprets this as:

    (print (2*$a)), $b, $c;

It prints out twice $a, but doesn't print out $b or $c at all. (Perl will warn you about this if you have -w on.) To fix this, add more parentheses:

    print ((2*$a), $b, $c);

Some people will suggest that you do this instead:

    print +(2*$a), $b, $c;

Perl does what you want here, but I think it's bad advice because it looks bizarre.
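
To compare the three spellings, this sketch captures what print writes into a string. The capture helper is hypothetical, written just for this example, and it relies on in-memory filehandles, which postdate the Perl this article was written for:

```perl
use strict;
use warnings;

# Hypothetical helper: run a code block and return whatever it printed.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
    my $previous = select $out;    # make $out the default handle for print
    $code->();
    select $previous;
    close $out;
    return $buffer;
}

my ($x, $y, $z) = (5, 'b', 'c');

# no warnings: silences "print (...) interpreted as function"
my $surprise = capture(sub { no warnings; print (2*$x), $y, $z; });
my $fixed    = capture(sub { print ((2*$x), $y, $z); });
my $plus     = capture(sub { print +(2*$x), $y, $z; });

print "surprise='$surprise' fixed='$fixed' plus='$plus'\n";
```

The first form prints only 10; the other two print 10bc.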

Here's a similar example:

    print @items, @more_items;

Say we want to join up the @items with some separator, so we use join:

    print join '---', @items, @more_items;

Oops; this is wrong; we only want to join @items, not @more_items also. One way we might try to fix this is:

    print (join '---', @items), @more_items;

This falls afoul of the problem we just saw: Perl sees the parentheses, assumes that they contain the arguments of print, and never prints @more_items at all. To fix, use

    print ((join '---', @items), @more_items);

or

    print join('---', @items), @more_items;
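
The difference is easy to see if you collect the results in an array instead of printing them. A minimal sketch with invented list contents:

```perl
use strict;
use warnings;

my @items      = ('a', 'b', 'c');
my @more_items = ('x', 'y');

# join gobbles both arrays
my @too_much = (join '---', @items, @more_items);

# join's own parentheses stop it after @items
my @fixed = (join('---', @items), @more_items);

print scalar(@too_much), ": @too_much\n";
print scalar(@fixed),    ": @fixed\n";
```

The first list is the single string a---b---c---x---y; the second is a---b---c followed by x and y.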

Sometimes you won't have this problem. Some of Perl's built-in functions are unary operators, which means that they always get exactly one argument. defined and uc are examples. They don't have the problem that the list operators have of gobbling everything to the right; they only gobble one argument. Here's an example similar to the one I just showed:

    print $a, $b;

Now we decide we want to print $a in all lower case letters:

    print lc $a, $b;

Don't we have the same problem as in the print join example? If we did, it would print $b in all lowercase also. But it doesn't, because lc is a unary operator and only gets one argument. This doesn't need any fixing.
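
Here is a sketch that contrasts a unary operator with a list operator in the same position; the strings are made up for the example:

```perl
use strict;
use warnings;

my ($greeting, $name) = ('HELLO', 'WORLD');

my @unary   = (lc $greeting, $name);         # (lc($greeting), $name)
my @list_op = (join '', $greeting, $name);   # (join('', $greeting, $name))

print "@unary\n";
print "@list_op\n";
```

lc takes only $greeting, so the first line is hello WORLD; join takes both, so the second is HELLOWORLD.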

Complete Rules of Precedence

Finally, here's Perl's complete precedence table:

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    .. ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

This is straight out of the perlop documentation that comes with Perl. left and right mean that the operators associate to the left or the right, respectively; nonassoc means that the operators don't associate at all. For example, if you try to write

    $a < $b < $c

Perl will deliver a syntax error message. Perhaps what you really meant was

    $a < $b && $b < $c
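
The && form works on any Perl. As far as I know, Perl releases from 5.32 on also accept the chained form, so this sketch tests for it at run time with a string eval instead of assuming either answer. The variable names are invented:

```perl
use strict;
use warnings;

my ($low, $mid, $high) = (1, 5, 10);

# Parsed as ($low < $mid) && ($mid < $high)
my $in_order = $low < $mid && $mid < $high;
print "in order\n" if $in_order;

# A string eval keeps a possible syntax error from stopping the program.
my $chained_compiles = defined eval q{ 1 < 2 < 3 };
print "chained form ",
      ($chained_compiles ? "compiles" : "is a syntax error"),
      " on this perl\n";
```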

The precedence table is much too big and complicated to remember; that's a problem with Perl's approach. You have to trust it to handle the common cases correctly, and be prepared to deal with bizarre, hard-to-find bugs when it doesn't do what you wanted. The alternatives, as I mentioned before, have their own disadvantages.

How to Remember all the Rules

Probably the best strategy for dealing with Perl's complicated precedence hierarchy is to cluster the operators in your mind:
Arithmetic:	+, -, *, /, %, **
Bitwise:	&, |, ~, <<, >>
Logical:	&&, ||, !
Comparison:	==, !=, >=, <=, >, <
Assignment:	=, +=, -=, *=, /=, etc.

and try to remember how the operators behave within each group. Mostly the answer will be 'they behave as you expect.' For example, the operators in the 'arithmetic' group all behave according to the rules you learned in fourth grade. The 'comparison' group all have about the same precedence, and you aren't allowed to mix them anyway, except to say something like

    $a<$b == $c<$d

which compares the truth values of $a<$b and $c<$d.

Then, once you're familiar with the rather unsurprising behavior of the most common groups, just use parentheses liberally everywhere else.

Quiz

Now try to guess how Perl will interpret the following expressions:
a. $x = $x | $y << 3;

b. $y % 4 == 0 && $y % 100 != 0 || $y % 400 == 0

c. $V = 4/3*$PI*$r**3;

d. $x >= 1 || $x <= 10

Answers

a. $x = ($x | ($y << 3));

b. ((($y % 4) == 0) && (($y % 100) != 0)) || (($y % 400) == 0) 
   (This computes whether or not the year $y is a leap year.)

c. $V = ((4/3)*$PI*($r**3)); (Volume of a sphere with radius $r.)

d. ($x >= 1) || ($x <= 10)
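
If you'd rather check answers (a) and (b) than take them on trust, here is a sketch that runs the unparenthesized forms. The is_leap helper and the sample values are made up for the example:

```perl
use strict;
use warnings;

# (a): << binds tighter than |, so this is $bits | ($shift << 3)
my ($bits, $shift) = (1, 2);
my $combined = $bits | $shift << 3;
print "combined = $combined\n";

# (b): the leap-year test, written without parentheses
sub is_leap {
    my ($y) = @_;
    return $y % 4 == 0 && $y % 100 != 0 || $y % 400 == 0;
}

for my $year (1900, 1996, 1999, 2000) {
    print "$year: ", (is_leap($year) ? "leap" : "not leap"), "\n";
}
```

$combined comes out as 17 (1 | 16), not the 24 that ($bits | $shift) << 3 would give, and only 1996 and 2000 are reported as leap years.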


Mark-Jason Dominus recently quit his consulting job to take an extended vacation and possibly write a book. Instead of programming, he is trying to make a living by giving classes on Perl. He just got back from O'Reilly's Perl conference, where he gave classes on regular expressions, web security, and tricks of the wizards. He likes to get email, so send him some at mjd-tpj@plover.com.