Learning Perl

Why Perl’s conditional operator is right associative

What happens if you change the associativity of the conditional operator? PHP implemented it incorrectly and now it’s part of the language. In What does this PHP print?, Ovid posted a bit of PHP code that gives him unexpected results. The code comes from a much longer rant by Alex Munroe titled PHP: a fractal of bad design:

The result is 'horse', and it will be for almost all values of $arg.

% php test.php
horse

I don’t care so much about the rant, but it told me the answer to this problem. The conditional operator is left associative in PHP, as documented in Operator Precedence. That almost made sense to me, and I know that putting parentheses around these things makes it more clear. I’m almost embarrassed to say that I couldn’t do it right off in this case. Where do I put them? With other operators it’s easy because the operator characters are next to each other. I started writing this to figure out the grouping when the operator characters are separated by other things.

Let’s simplify that a bit to we don’t have a big mess. Now there are only two:

The result is still 'horse' because we haven’t really changed anything:

% php simple.php
horse

Joel Berger gave a hint when he said that changing 'car' to '' yields 'feet':

And it does yield 'feet'::

% php null.php
feet

In Perl, the language I do know, the same operator is right associative (Why is the conditional operator right associative? on Stackoverflow explains why). Associativity, documented in perlop, comes into play when the compiler has to figure out which operator to do first when it has the same operator next to each other. In Learning Perl, we show this with the expontentiation operator since many other operators, such as multiplication and addition, don’t really care. The expontentiation is right associative because that’s what Larry decided it was (C doesn’t have this operator). That means it does the operation on the right before it does the operation on the left. You can see this when you use parentheses, the highest precedence operator, to denote the order you want and compare it to the version without the explicit grouping:

my $num = 4**3**2;    # 262144
my $num = 4**(3**2);  # 262144
my $num = (4**3)**2;  # 4096

We can do the same for the conditional operator in Perl. First, we translate the code to PHP, which is mostly changing == to eq:

# perl.pl
use v5.10;

my $arg = 'C';
my $vehicle = (
               ( $arg eq 'C' ) ? 'car' :
               ( $arg eq 'H' ) ? 'horse' : 'feet'
             );
say $vehicle;

This only outputs “car”:

% perl.pl
car

In Perl, we get the same behavior if we put parentheses around the second conditional:

# right.pl
use v5.10;

my $arg = 'C';
my $vehicle = (
               ( $arg eq 'C' ) ? 'car' :
               ( ( $arg eq 'H' ) ? 'horse' : 'feet' )
             );
say $vehicle;

We get the same result as perl.pl because we haven’t changed the order of anything:

% perl right.pl
car

To get the PHP behaviour, we have to change the parentheses like this, to surround everything up to the next ?. It took quite a mental leap for me to get this far because it’s so unnatural:

# left.pl
use v5.10;

my $arg = 'C';                                                        
my $vehicle = (
               ( ( $arg eq 'C' ) ? 'car' : ( $arg eq 'H' ) ) 
               	? 'horse' : 'feet'
             );
say $vehicle;

Now we get different behaviour:

% perl left.pl
horse

That’s really odd, but it’s also a small gotcha we mention in the Learning Perl class. You can have things such as ( $arg == 'H' ) as a branch. This use probably isn’t useful, but it’s a consequence of the syntax. We can do assignments, for instance:

my $result = $value ? ( $n = 5 ) : ( $m = 6 );

It’s easier to see this as a picture for the path through the conditionals. The right associative version branches either to an endpoint or another decision and there’s only one way to get to that endpoint.

Right associative, as in Perl

The left associative version has multiple ways to get to the same endpoint because either branch in the previous conditional can be the value for the next test. This also shows how 'car' isn’t the endpoint that you think it should be:

Left associative, as in PHP

Going back to do the same thing with the original chain of conditionals, we get this diagram that looks more like a corset lacing instruction than something we meant to program.

The full monty

However, we already know the answers in this particular case because some values are literals, so we can remove several paths. Now it’s much more clear that many paths are feeding into a path that must end up at 'horse'.

The full monty

In fact, the only way to get to 'feet' is to be any letter that is not B, A, T, C, or H. Joel figured this out by changing 'car' to the empty string, which has this diagram:

Joel’s change

The only way to get to 'horse' is to be exactly H. The other letters must end up at 'feet' because they all end up at the empty string. Every other string ends up at 'feet' because they are not exactly H.

Maybe the complicated stuff makes sense to PHP programmers. I don’t know. It’s more likely that they don’t do these sorts of things, at least if they’ve read the advice in the PHP manual. Some people blame Perl since PHP inherited from Perl, but it seems like a yacc error that they can’t fix for backward compatibility. It’s not like that’s never happened to Perl

Learning Perl Challenge: popular history (Answer)

June’s challenge counted the most popular commands from a shell history. Some shells remember the last commands you used so you can start a new session and still have them available. For this exercise, I’ll assume the bash shell.

You setup the history feature by telling your shell to track the history. You want to remember 3,000 previous commands:

HISTFILE=/Users/brian/.bash_history
HISTFILESIZE=3000
HISTSIZE=1500

There’s much you can do with your command history, but that’s not what I’m covering here. The bash history cheat sheet explains most of it.

There are two ways you can get at the history. You can run the history command and pipe it to your Perl program as standard input, or you can read the history file (perhaps also piping it at standard input). The shell only writes to the file at the end of the session, so the file doesn’t know about recent commands. Also, each session merely appends to the file. Each session’s history is contiguous, so the file is not necessarily chronological. This doesn’t matter much to the challenge to find the overall popular commands.

Once you get the input, you have to figure out which part is the command. There are several issues there too. The first part of the line might be the command, a path to the command, or some sort of modifier for the command. A command line might have a pipeline of multiple commands or a series of separate commands. Some of the commands might be shell built-ins while others are external programs. Some commands might be user-defined aliases:

tail -f /var/log/system.log
/usr/bin/tail/ -f /var/log/system.log
sudo vi /etc/groups
history | perl -pe 's/\A\s*\d+\s*//'
grep ^_x /etc/passwd | cut -d : -f 1,5 | perl -C -Mutf8 -pe 's/:/ → /g'
(cd /git/dumbbench; git pull origin master)
perldoc -l SQL::Parser | xargs bbedit
export HISTFILESIZE=3000
l

This breaks the challenge into three parts, the last of which is basic accumulator stuff that we show Learning Perl for many tasks.

Get the history
Extract the commands
Count and report the results

I’ll cover each of those parts separately as I go through the answers.

Get the history

Most people opened a filehandle on the history file. Some people hard-coded the path while others made it relative to the home directory:

open( FILE, "/Users/me/.bash_history" );     # rm

my $hist_file = "$ENV{HOME}/.bash_history";  # jose
open my $hf, "<", $hist_file;

my $path    = $ENV{HOME};                    # Dave M
my $history = $path . '/' . '.bash_history';
open( my $f, '<', $history );

open HISTORY, $ENV{'HOME'}.'/.bash_history'  # ulric
  or die "Cannot open .bash_history: $!";

Daniel Keane was the only person to use zsh, which has a history format with timestamps (one of the issues I noted with the bash history):

my $history_path = "$ENV{'HOME'}/.zhistory";
die "Cannot locate file: $history_path" unless -e $history_path;
open(my $history_fh, '<', $history_path);

Neil Bowers used File::Slurp to get it all at once:

read_file($ENV{'HOME'}.'/.bash_history', chomp => 1);

Anonymous Coward set a default value for the history file but also allows people to override it with a command-line option:

my $histfile = "$ENV{HOME}/.bash_history";
my $position = 0;
my $number = 10;
GetOptions (
  'histfile=s' => \$histfile,
  'position=i' => \$position,
  'number=i'   => \$number,
);

A couple of people shelled out to run the history command. WK used history -r to add the history from the history file to the current history, then read the history from the command line:

  my @history = qx/$ENV{ SHELL } -i -c "history -r; history"/; # WX

Javier's answer did much of the work in the shell and awk:

    my $get_cmds =
        qq/$ENV{'SHELL'} -i -c "history -r; history"/
        . q/ | awk '{for (i=2; i>NF; i++) printf $i " "; print $NF}'/;

    chomp(my @cmds = qx# $get_cmds #);

The two winners for getting the data, however, are the two people who provided one-liners. They used standard input, which means they could handle either a file in @ARGV or a pipe from history:

VINIAN might get a "Useless use of cat Award", something Perl's Randal Schwartz used to hand out. The -a switch (see perlrun splits on whitespace and puts the list in @F:

% cat ~/.bash_history | perl  -lane 'if ($F[0] eq "sudo"){$hash{$F[1]}++ } else { $hash{$F[0]}++ }; $count ++; END { @top = map {  [ $_, $hash{$_} ] } sort { $hash{$b}  <=> $hash{$a} } keys %hash; @max=@top[0..9]; printf("%10s%10d%10.2f%%\n", $_->[0], $_->[1], $_->[1]/$count*100) for @max}'

But, VINIAN's program would work with a file argument, As Chris Fedde uses:

% perl -lanE '$sum{$F[0]}++; END{ say "$_ $sum{$_}" for (reverse sort {$sum{$a}  <=> $sum{$b}} keys %sum)}' ~/.bash_history | less

I would think that a better answer to this part would examine the environment variable for HISTFILE, but jose notes that it doesn't work on cygwin. I did not investigate that.

Extract the commands

Extracting the commands is the tough part of the problem, but many people skipped most of that problem. They assumed the first group of non-whitespace was the command, possibly turning an absolute path to its basename.

jose used basename is the command started with a /:

$cmd = basename $cmd if $cmd and $cmd =~ /\//;

You don't really need to check that though. You could just take the basename though, as Dave M did unconditionally:

my $basename = basename( $args[0] );

This ignores something that would be harder to solve; what if two different commands have the same name? For instance, the system perl and a user-installed one might have the same name. Either might be specified as just perl depending on the PATH, and the user-installed one might be a relative path. Most people counted the basename only.

Some people took the commands and checked that they were in the PATH:

    for my $p (@paths) {
        if ( -e "$p/$basename" ) {
            $words{$basename}++;
        }
    }

No one checked for symbolic links.

WK is the only person who translated aliases:

my %alias = get_aliases_hash();

sub get_aliases_hash {
  my @alias = qx/$ENV{'SHELL'} -i -c "alias"/;
  return map { m/\Aalias\s+(.+)='(.+)'\s*$/; $1 => $2 } @alias;
}

ulric and WK are the only people who didn't count sudo by skipping it. Here's how ulric did it:

  if ($commandline[0] =~ 'sudo') {
    shift @commandline;
  } # sudo is ignored as a metacommand

I think most people get fair marks for this part, but if I had to choose a winner, I'd go with WK.

Count and report the results

Once you have the commands, most people used the command as a hash key and adding one to the value. There's not too much interesting there.

My Answer

My answer differs substantially only in the way that I examine the command line. I know about the Text::ParseWords module that can break up a line like the shell would. This is, it can break up a line while preserving quoting. I want to handle the shell special separators ; and | except when they are parts of quoted strings. The parse_lines can handle that:

;
my $delims = '(?:\s+|\||;)';
parse_line( $delims, 'delimiters', $_ );

The first argument is the regular expression I use to recognize a delimiter outside of a quoted string. I can't rely on whitespace (although at first I tried), but the shell can handle things such as tail -f file|perl -pe '...' where the separator has no whitespace around it.

The second argument, if it is the special value delimiters, returns the delimiter string. I need to use this to recognize the start of new commands on the same line.

My program is long and not very fun, since I limit myself to the material in Learning Perl. Fun things such as splice only show up as "things we didn't cover". It's in the Learning Perl Challenges GitHub repository with my other answers.

As well as keeping track of the commands, I track which ones were modified by other commands that I specify in %modifiers. I didn't handle aliases or basenames like many other people did:

#!perl
use v5.10;
use strict;
use warnings;

use Text::ParseWords;

my %commands;
my %modifieds;

my $N = 10;

my %modifiers = map { $_, 1 } qw( sudo xargs );
my $delims = '(?:\s+|\||;)';

while( <> ) {
	my @shellwords    = 
		grep { defined && /\S/ }
		parse_line( $delims, 'delimiters', $_ );
	next unless @shellwords;

	my @start_indices = get_starts( @shellwords );

	# go through the shellwords to find the delimiters like ; and |
	# one command line can have multiple commands, so find all of 
	# them.
	I: foreach my $i ( 0 .. $#start_indices ) {
		my( $start, $end ) = ( 
			$start_indices[$i], 
			$i < $#start_indices ? $start_indices[$i+1] - 1 : $#shellwords
			);
		
		# look through a command group to find the the command
		my $modified = 0;
		J: foreach my $j ( $start .. $end ) {
			next if $shellwords[$j] =~ m/\A$delims\Z/;
			if( exists $modifiers{$shellwords[$j]} ) {
				$modified = $shellwords[$j];
				next;
				}
			if( $modified ) {
				$modifieds{"$modified $shellwords[$j]"}++;
				}
			$commands{$shellwords[$j]}++;
			last J;
			}
		}	
	}

say "------ Top commands";
report_top_ten( $N, %commands );

say "------ Top modified commands";
report_top_ten( $N, %modifieds );


sub get_starts {
	my @starts = 0;
	while ( my( $i, $value ) = each @_ ) {
		push @starts, $i if $value =~ /\A$delims\z/;
		}
	return @starts;
	}

sub report_top_ten {
	my( $top_count, %hash ) = @_;
	
	my @top_commands  = sort { $hash{$b} <=> $hash{$a} } keys %hash;
	my $max_width = length $hash{$top_commands[0]};
	while( my( $i, $value ) = each @top_commands ) {
		last if $i >= $top_count;
		printf '%*d %s' . "\n", $max_width, $hash{$top_commands[$i]}, $top_commands[$i];
		}
	}

And, with that, I find my top commands by reading from standard input and using my environment variable to locat the file:

$ perl history.pl $HISTFILE
------ Top commands
843 make
518 perl
364 ls
343 git
244 cd
156 bb
117 open
 68 pwd
 51 rm
 42 ssh
------ Top modified commands
3 xargs rm
2 xargs bbedit
2 xargs bb
1 xargs stripper
1 sudo cp

The bb is an alias for bbedit, the command-line program to do various things with that GUI editor. The stripper is a program I use to remove end-of-line whitespace. I most often use it with find, which is why it shows up after xargs.

That make comes mostly as make test, and the perl as perl Makefile.PL.

Learning Perl Challenge: directory with the most files

Find the top 10 directories on your system that has the most files in it. For that, only count the files immediately under it. That is, don’t count files in subdirectories. For file, I mean just about anything that isn’t a symbolic link or weird device. Think about how you want to show the person running your program what it’s doing.

This challenge isn’t about counting so much as traversing, remembering, and displaying. How do you know what you need to handle next and which one was largest?

I’m actually writing my own version of this right now because I’m making some benchmarks on opendir versus glob and need some test cases. I could just create some new directories and make a bunch of fake files in them, but that’s no fun.

I don’t care how long your program takes, although you might. Let it run in a window (or screen) on its own. Test it on a small directory first (so, there’s a hint there).

I made a Curses version (but don’t look at it until you’ve tried your own solution!):

You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.

50% off Learning Perl Video Series during OSCON

During OSCON, you can get Randal’s Schwartz’s Learning Perl Video Series for $74.99 (50% off). This is the same class that Stonehenge presents to its corporate clients, based on Learning Perl (or, more correctly, the book is based on the class).

You can also get the Programming Perl ebook for $19.99 (50% the list price) when you buy it through the open source geeks promotion or by using offer code CFOSCON.

Learning Perl Challenge: popular history

For June’s challenge, write a program to read the history of shell commands and determine which ones you use the most.

In bash, you can set the HISTFILE and HISTFILESIZE to control history‘s behavior. To get the commands to count, you can read the right file or shell out to the history command. From there, you have to decide how to split up each line to figure out what the command is. Can you handle commands with relative paths and absolute paths so you don’t count them separately?

This was an meme for a little while, and I wish I could find my answer to it. If you find my answer, please let me know.

You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.