chapters – Page 5 – Learning Perl

“The stat preceding -l _ wasn’t an lstat”

I ran into a fatal error that I hadn’t encountered before, and I couldn’t find a good explanation where I expected one. The -l file test operator can use the virtual _ filehandle only if the preceding lookup was an lstat.

The file test operators, all documented under the -X entry in perlfunc, can use the virtual filehandle _, the single underscore, to reuse the results of the previous file lookup. They don’t look up just the single attribute you test. They fetch all of the file’s information (through stat) and filter it to answer the question you ask. The _ reuses that information to answer the next question instead of looking it up again.

I had a program similar to this one, where I used several file test operators, including -l to test whether a file is a symbolic link.

use v5.14;

my $filename = join ".", $0, $$, time, 'txt';
my $symname  = $filename =~ s/\.txt/-link.txt/r;

open my $fh, '>', $filename
	or die "Could not open [$filename]: $!";
say $fh 'Just another Perl hacker,';
close $fh;

symlink $filename, $symname 
	or die "Could not symlink [$symname]: $!";

# http://perldoc.perl.org/functions/-X.html
foreach( $filename, $symname ) {
	say;
	say "\texists"           if -e;
	say "\thas size " . -s _ if -s _;
	say "\tis a link"        if -l _;
	}

I get this fatal error:

The stat preceding -l _ wasn't an lstat at test_link_test.pl line 19

The entry in perlfunc doesn’t say anything about this, but it hints that -l is a bit special:

If any of the file tests (or either the stat or lstat operator) is given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn’t work with -t, and you need to remember that lstat() and -l leave values in the stat structure for the symbolic link, not the real file.) (Also, if the stat buffer was filled by an lstat call, -T and -B will reset it with the results of stat _).

Adding the diagnostics pragma, which pulls in the longer explanation from perldiag, gives the answer that isn’t in perlfunc:

The stat preceding -l _ wasn't an lstat at test_link_test.pl line 19 (#1)
    (F) It makes no sense to test the current stat buffer for symbolic
    linkhood if the last stat that wrote to the stat buffer already went
    past the symlink to get to the real file.  Use an actual filename
    instead.

The other file test operators perform a stat. If the file is a symlink, stat follows the symlink and gets the information for its target. With a symlink to a symlink, it keeps going until it finally reaches something that isn’t a symlink. After a stat, -l _ could never be true. The buffer always describes the target, or else the stat failed because the target doesn’t exist. That’s why Perl makes it a fatal error instead of quietly returning false.

An lstat doesn’t follow the link, so its buffer can answer the -l _ question. For a symlink, it holds the information about the link itself. For anything else, it works just like stat.
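Here is a small sketch of both behaviors, assuming a Unix-like system where symlink works. It creates a scratch file and a link to it in a temporary directory; the file names are made up for the example. It shows that stat reports the target’s size while lstat reports the link’s own size. It also catches the fatal error from -l _ with eval:

```perl
use v5.14;
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files; CLEANUP removes the directory at exit
my $dir    = tempdir( CLEANUP => 1 );
my $target = "$dir/target.txt";
my $link   = "$dir/target-link.txt";

open my $fh, '>', $target or die "Could not open [$target]: $!";
say $fh 'Just another Perl hacker,';
close $fh;
symlink $target, $link or die "Could not symlink [$link]: $!";

# stat follows the symlink, so the buffer describes the target file
my @stat_info = stat $link or die "Could not stat [$link]: $!";
my $is_link_after_stat = eval { -l _ };    # dies: the buffer came from stat
my $stat_error = $@;

# lstat does not follow the symlink, so the buffer describes the link itself
my @lstat_info = lstat $link or die "Could not lstat [$link]: $!";
my $is_link_after_lstat = -l _;            # allowed, and true for a link

say "size from stat:  $stat_info[7]";      # bytes in the target file
say "size from lstat: $lstat_info[7]";     # size of the link itself
print "after stat, -l _ died: $stat_error" if $stat_error;
say 'after lstat, -l _ says it is a link' if $is_link_after_lstat;
```

The two size lines should differ. On Linux, the lstat size of a symlink is typically the length of the path it points to.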

As the long version of the error message says, it’s probably better to skip the _ filehandle and use the full filename instead. Sure, Perl has to redo the work, but you won’t be surprised by a fatal error because an earlier lookup was the wrong type.
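Following that advice, the loop from my program might look like this sketch, where each test names the file instead of reusing _. The describe_file subroutine is a hypothetical name made up for the example. Because -l now does its own lstat, the order of the tests no longer matters:

```perl
use v5.14;
use strict;
use warnings;

# describe_file is a hypothetical helper; every test gets the real filename,
# so -l does its own lstat instead of relying on an earlier lookup
sub describe_file {
    my ($file) = @_;
    my @lines = ($file);
    push @lines, "\texists"                 if -e $file;
    push @lines, "\thas size " . -s $file    if -s $file;
    push @lines, "\tis a link"              if -l $file;
    return join "\n", @lines;
}

say describe_file($_) foreach @ARGV;
```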

Why Perl’s conditional operator is right associative

What happens if you change the associativity of the conditional operator? PHP implemented it incorrectly and now it’s part of the language. In What does this PHP print?, Ovid posted a bit of PHP code that gives him unexpected results. The code comes from a much longer rant by Alex Munroe titled PHP: a fractal of bad design:

The result is 'horse', and it will be for almost all values of $arg.

% php test.php
horse

I don’t care so much about the rant, but it told me the answer to this problem. The conditional operator is left associative in PHP, as documented in Operator Precedence. That almost made sense to me, and I know that putting parentheses around these things makes them clearer. I’m almost embarrassed to say that I couldn’t do it right away in this case. Where do I put them? With other operators it’s easy because the operator characters sit together between the operands. I started writing this to figure out the grouping when the operator characters are separated by other things, as the ? and : are.

Let’s simplify that a bit so we don’t have a big mess. Now there are only two conditionals:

The result is still 'horse' because we haven’t really changed anything:

% php simple.php
horse

Joel Berger gave a hint when he said that changing 'car' to '' yields 'feet':

And it does yield 'feet':

% php null.php
feet

In Perl, the language I do know, the same operator is right associative (Why is the conditional operator right associative? on Stackoverflow explains why). Associativity, documented in perlop, comes into play when the compiler has to decide which operation to do first because the same operator appears several times in a row. In Learning Perl, we show this with the exponentiation operator, since many other operators, such as multiplication and addition, don’t really care which way you group them. Exponentiation is right associative because that’s what Larry decided it should be (C doesn’t have this operator). That means Perl does the operation on the right before it does the operation on the left. You can see this when you use parentheses, the highest precedence operator, to denote the order you want and compare it to the version without the explicit grouping:

my $default = 4**3**2;    # 262144
my $right   = 4**(3**2);  # 262144
my $left    = (4**3)**2;  # 4096

We can do the same for the conditional operator in Perl. First, we translate the simplified PHP code to Perl, which mostly means changing == to eq:

# perl.pl
use v5.10;

my $arg = 'C';
my $vehicle = (
               ( $arg eq 'C' ) ? 'car' :
               ( $arg eq 'H' ) ? 'horse' : 'feet'
             );
say $vehicle;

This only outputs “car”:

% perl perl.pl
car

In Perl, we get the same behavior if we put parentheses around the second conditional:

# right.pl
use v5.10;

my $arg = 'C';
my $vehicle = (
               ( $arg eq 'C' ) ? 'car' :
               ( ( $arg eq 'H' ) ? 'horse' : 'feet' )
             );
say $vehicle;

We get the same result as perl.pl because we haven’t changed the order of anything:

% perl right.pl
car

To get the PHP behavior, we have to move the parentheses so they surround everything up to the next ?. It took quite a mental leap for me to get this far because it’s so unnatural:

# left.pl
use v5.10;

my $arg = 'C';                                                        
my $vehicle = (
               ( ( $arg eq 'C' ) ? 'car' : ( $arg eq 'H' ) ) 
               	? 'horse' : 'feet'
             );
say $vehicle;

Now we get different behavior:

% perl left.pl
horse

That’s really odd, but it’s also a small gotcha we mention in the Learning Perl class: you can have things such as ( $arg eq 'H' ) as a branch. That probably isn’t useful, but it’s a consequence of the syntax. We can do assignments in the branches, for instance:

my $result = $value ? ( $n = 5 ) : ( $m = 6 );

It’s easier to see the paths through the conditionals as a picture. In the right associative version, each branch goes either to an endpoint or to another decision, so there’s only one way to get to each endpoint.

Right associative, as in Perl

The left associative version has multiple ways to get to the same endpoint because either result of the previous conditional becomes the condition for the next test. This also shows how 'car' isn’t the endpoint that you think it should be:

Left associative, as in PHP

Going back to do the same thing with the original chain of conditionals, we get this diagram that looks more like a corset lacing instruction than something we meant to program.

The full monty

However, we already know the answers in this particular case because the branch values are non-empty string literals, which are always true, so we can remove several paths. Now it’s much clearer that many paths feed into a path that must end up at 'horse'.

The full monty

In fact, the only way to get to 'feet' is for $arg to be something other than B, A, T, C, or H. Joel figured this out by changing 'car' to the empty string, which gives this diagram:

Joel’s change

With that change, the only way to get to 'horse' is for $arg to be exactly H. B, A, T, and C all flow into the empty string, which is false, so they end up at 'feet'. Every other string also ends up at 'feet' because it isn’t H.
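To check the diagrams, here is a Perl sketch that spells out PHP’s left associative grouping for the full chain with explicit parentheses, next to the natural right associative Perl version. The text only names 'car', 'horse', and 'feet'. The 'bus', 'airplane', and 'train' values for B, A, and T are my placeholders, which is fine because only their truthiness matters. php_vehicle and perl_vehicle are made-up helper names. The optional second argument to php_vehicle replaces 'car', so passing '' reproduces Joel’s change:

```perl
use v5.10;
use strict;
use warnings;

# Natural right associative chain: Perl groups it as B ? ... : (A ? ... : (...))
sub perl_vehicle {
    my ($arg) = @_;
    return $arg eq 'B' ? 'bus'          # placeholder value
         : $arg eq 'A' ? 'airplane'     # placeholder value
         : $arg eq 'T' ? 'train'        # placeholder value
         : $arg eq 'C' ? 'car'
         : $arg eq 'H' ? 'horse'
         :               'feet';
}

# PHP's left associative grouping, written out with explicit parentheses.
# $car replaces 'car'; pass '' to reproduce Joel Berger's change.
sub php_vehicle {
    my ( $arg, $car ) = @_;
    $car //= 'car';
    my $vehicle =
        ( ( ( ( ( $arg eq 'B' ) ? 'bus'      : ( $arg eq 'A' ) )
                                ? 'airplane' : ( $arg eq 'T' ) )
                                ? 'train'    : ( $arg eq 'C' ) )
                                ? $car       : ( $arg eq 'H' ) )
                                ? 'horse'    : 'feet';
    return $vehicle;
}

foreach my $arg (qw(B A T C H Z)) {
    printf "%s  perl: %-8s  php: %-5s  php with '' for car: %s\n",
        $arg, perl_vehicle($arg), php_vehicle($arg), php_vehicle( $arg, '' );
}
```

With the explicit grouping, each of B, A, T, C, and H should come out as 'horse' in the PHP column. With '' in place of 'car', only H should.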

Maybe the complicated stuff makes sense to PHP programmers. I don’t know. It’s more likely that they don’t do these sorts of things, at least if they’ve read the advice in the PHP manual. Some people blame Perl since PHP inherited from Perl, but it looks more like a mistake in PHP’s yacc grammar that they can’t fix without breaking backward compatibility. It’s not like that’s never happened to Perl.

There’s a better (correct) way to case fold

We show you the wrong way to do a case insensitive sort in Learning Perl, 6th Edition. That edition showed many of Perl’s Unicode features, which we had mostly ignored in all of the previous editions (despite Unicode support starting in Perl v5.6). In our defense, it wasn’t an easy thing to do without CPAN modules before the upcoming Perl v5.16.

In the “Strings and Sorting” chapter, we show this subroutine:

sub case_insensitive { "\L$a" cmp "\L$b" }

In the Unicode world, that doesn’t work (which I explain in Fold cases properly at The Effective Perler). With Perl v5.16, we should use the new fc built-in, which does case folding according to Unicode’s rules:

use v5.16; # when it's released
sub case_insensitive { fc($a) cmp fc($b) }

We could use the double-quote case shifter \F to do the same thing:

use v5.16; # when it's released
sub case_insensitive { "\F$a" cmp "\F$b" }
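Here’s a small sketch of the difference, assuming Perl v5.16 or later. The German ß (U+00DF) is a handy test case. lc leaves it alone, but Unicode case folding turns it into ss. As a result, only the fc and \F versions treat 'Straße' and 'STRASSE' as the same string. The variable names are only for illustration:

```perl
use v5.16;    # enables fc (and say)
use strict;
use warnings;
use utf8;     # the string literal below contains ß

binmode STDOUT, ':encoding(UTF-8)';

my ( $x, $y ) = ( 'Straße', 'STRASSE' );

my $lc_same = "\L$x" eq "\L$y";    # "straße" vs "strasse": not equal
my $fc_same = fc($x) eq fc($y);    # "strasse" vs "strasse": equal
my $F_same  = "\F$x" eq "\F$y";    # \F folds case the same way fc does

say 'lc: ', ( $lc_same ? 'same' : 'different' );
say 'fc: ', ( $fc_same ? 'same' : 'different' );
say '\F: ', ( $F_same  ? 'same' : 'different' );
```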

Without Perl v5.16, we could use the Unicode::CaseFold module which defines an fc function.

Why we teach bareword filehandles

There’s a debate raging in Perl 5 Porters over some updates to perlopentut. Mike Doherty sent a patch to remove the bareword filehandles, stepping into the overlapping minefields of fashionable practice and documentation authorship. That second one is beyond the scope of Learning Perl, so I’ll ignore it even though that’s almost the entire debate.

There are three reasons we teach bareword filehandles in Learning Perl: generality, immediacy, and history. Having said that, we are not choosing between barewords and filehandle references, even if we do use the references for most of the book. We’re agnostic, and we leave it to our readers to make the best choice for them. We don’t see it as our job to tell you how to program. We just want to explain what Perl actually lets you do.

Generality: We don’t write books for individual people. That just doesn’t work, since it would take more time than any author has, even if authors did nothing but write every second of every day. For a book such as Learning Perl, which sells tens of thousands of copies just in the paper version, we have many sorts of people to think about. Virtually none of them have we met or will we ever meet, and most of them will never give us feedback. We don’t know why you are using Perl. You could be a system administrator, a web programmer, a data munger, or, sometimes, a non-techie. We don’t know if you are writing new code or adjusting old code. We don’t know what you need, so we don’t make judgments based on what we don’t know. If you are talking to one of us during a face-to-face class, we can ask you questions and find out more about what you need.

Perl suffers from success in vastly different areas. You can write quick one-liners or huge, enterprise systems. You might need an interactive user interface, or not. Different tasks and different domains require different practices. Instead of assuming that you are going to limit yourself to just one of those areas, we aim to make Learning Perl useful to most of those areas.

Immediacy: In a tutorial like Learning Perl, we have to start somewhere. A bad book can quickly overload a reader with several concepts at once. Our goal is to get people writing code as soon as possible, which is the fundamental goal of Perl. Larry Wall wants people to be able to program even if they don’t consider themselves programmers. He doesn’t want people to have to spend hours on research before they can do something useful. Larry’s term for this is “baby Perl”.

There are several concepts that go into this single line of Perl, none of which are immediately useful to someone who wants to write the sort of short program we consider in Learning Perl:

open my $fh, '<', $filename or die '...';

In their first afternoon of Perl, the new programmer has to be comfortable with my as well as the concepts around open, its modes, and its lack of punctuation. That my $fh inside a larger expression is really confusing for beginners. We don't make any judgements based on that. We just know that's the way it is. To avoid the extra sources of confusion, it's easier to reduce the problem to one with the fewest distractions:

open FILE, $filename or die '...';

Don't forget that the tutorial for open is almost 1,000 lines long. This is not a simple built-in. Once people get used to the idea of a very basic open, we can expand on that to add to the readers' knowledge and understanding. All education is a continual process of refinement and adjustment as you integrate new concepts with what you already understand.
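As a rough sketch of that progression, here is the same line-counting task written twice. The first version uses the beginner's bareword form, and the second uses a lexical filehandle with the three-argument open. The subroutine names are mine, not from the book. Both work under strict:

```perl
use v5.10;
use strict;     # bareword filehandles are still allowed under strict
use warnings;

# The beginner's form from the text: a bareword filehandle, two-argument open
sub count_lines_bareword {
    my ($filename) = @_;
    open FILE, $filename or die "Could not open [$filename]: $!";
    my $count = 0;
    $count++ while <FILE>;
    close FILE;
    return $count;
}

# The same task with a lexical filehandle and the three-argument open
sub count_lines_lexical {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Could not open [$filename]: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;    # the handle would also close when $fh goes out of scope
    return $count;
}
```

The two-argument form treats characters such as < or > at the start of $filename as a mode. That is one reason to graduate to the three-argument form once the basics make sense.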

History: Finally, no matter what we personally think about one form or the other, we can't deny that people are going to see the bareword versions. The "Modern Perl" movement would like to pretend that bareword filehandles don't exist, but they can't ignore STDOUT, STDERR, STDIN, or DATA. Perl has bareword filehandles, and most people are going to write new code that uses or relies on them, even if they subscribe to all of "Modern Perl". Putting that aside, though, people are going to see bareword filehandles in old code. Our readers need to know what those are and what they do. Programming is not just about writing code; it's about reading code. We can't hide widespread patterns because they've become unfashionable to some people.

A Unicode demonstration: Roman::Unicode

I just uploaded Roman::Unicode. It’s a silly little module to represent Perl numbers in roman numerals. I adapted it from Roman, but use the special roman numeral characters in the Unicode Character Set (UCS) instead of the ASCII versions. It’s not complete, but it’s a start.

Eventually I want to use this module to demonstrate various Unicode character features in my “Unicode in Perl” class. For instance:

There are fancy characters for higher numbers, like ↁ (U+2181) for 5,000; ↂ (U+2182) for 10,000; ↇ (U+2187) for 50,000; and ↈ (U+2188) for 100,000. The ASCII-only module limits itself to 3,999 because of its limited character set. This module limits itself to 399,999.
The roman numeral characters can compatibility-decompose to their ASCII versions. The Ⅰ (U+2160) is a compatibility equivalent of the capital I (U+0049) you already use. The single character Ⅷ (U+2167) compatibility-decomposes to four characters: V (U+0056) followed by three I (U+0049).
I want to do something to convert ASCII versions to the higher character versions.
The roman numeral characters know they are numbers because they have the right properties. The ASCII versions don’t know that they are numbers.
I want to use some, but not all, of the characters from U+2160 to U+2188, so I need to invent some custom character classes (although we didn’t talk about that in Learning Perl).
You can lowercase these, even though they aren’t ASCII.
These characters have numeric values, even though they aren’t 0-9.
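Here is a sketch of a few of those points in core Perl, assuming v5.14 or later for Unicode::UCD’s num. InRomanSymbol is a hypothetical user-defined character property. Perl lets a subroutine whose name starts with In or Is define one. The set of characters it lists is my guess for illustration, not what Roman::Unicode actually uses:

```perl
use v5.14;
use strict;
use warnings;
use utf8;                          # the comments below contain non-ASCII
use Unicode::Normalize qw(NFKD);
use Unicode::UCD qw(num);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical user-defined property: the single-symbol capital numerals
# I, V, X, L, C, D, M plus the large ones from 5,000 to 100,000.
# Each line is one hex code point or a "start end" range.
sub InRomanSymbol {
    return join "\n",
        '2160',         # Ⅰ ONE
        '2164',         # Ⅴ FIVE
        '2169',         # Ⅹ TEN
        '216C 216F',    # Ⅼ Ⅽ Ⅾ Ⅿ (50, 100, 500, 1,000)
        '2181 2182',    # ↁ ↂ (5,000 and 10,000)
        '2187 2188',    # ↇ ↈ (50,000 and 100,000)
        '';             # trailing newline after the last entry
}

my $eight = "\x{2167}";            # Ⅷ ROMAN NUMERAL EIGHT, one character

say 'NFKD:          ', NFKD($eight);     # compatibility decomposition
say 'lowercase:     ', lc $eight;        # ⅷ (U+2177)
say 'numeric value: ', num($eight);      # Numeric_Value from the UCD
say 'Ⅷ is \p{Nl}:   ', ( $eight =~ /\p{Nl}/ ? 'yes' : 'no' );
say 'V is \p{N}:    ', ( 'V' =~ /\p{N}/ ? 'yes' : 'no' );
say 'Ⅿↁ in class:   ',
    ( "\x{216F}\x{2181}" =~ /\A\p{InRomanSymbol}+\z/ ? 'yes' : 'no' );
```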

I’m still thinking about the next bit, which actually isn’t classic roman numerals. After a while, people figured out they needed larger numbers. They started drawing bars over the numerals, and gates around groups of numerals, to indicate higher orders of magnitude. I’d like to see if I could do that with the UCS.