Perl’s file globbing is FreeBSD-style, but it works almost everywhere since Perl handles it internally through the File::Glob module. I’m working on the “Directory Operations” chapter for Learning Perl, 7th edition, where we cover glob. I’m trying to make the book more Windows friendly, so I’ve been considering how this stuff translates.
I ran across Raymond Chen’s “How did wildcards work in MS-DOS?”. He lays out the steps for turning what we think of as a pattern (such as “*.txt”) into the CP/M-style pattern that MS-DOS used. He shows how to convert the glob pattern to the primitive 11-character pattern:
- Initialize the target pattern to 11 spaces and set the cursor to 0.
- Read the next character from the input. Stop if there are no more characters.
- If the input is “.”, set positions 8 to 10 to spaces. Set the cursor to position 8 and go back to step 2.
- If the input is “*”, fill in the remaining places with “?” (the CP/M wildcard). Set the cursor to position 11 and go back to step 2.
- If the cursor is not at position 11, copy the input character to the cursor position and advance the cursor.
I translated this to Perl, just for fun. I used only one feature that we did not cover in Learning Perl: the /g flag in scalar context. If that matches, it remembers where it matched and picks up there the next time, allowing me to walk through the string in $_ without destroying it:
    while( <DATA> ) {
        chomp;
        my $dos_pattern = ' ' x 11;   # 8 name positions plus 3 extension positions
        my $cursor = 0;

        while( m/(.)/g ) { # /g in scalar context remembers where it left off
            my $input = $1;
            last unless defined $input;

            if( $input eq '.' ) {
                # the dot is implicit: blank the extension and jump to it
                substr( $dos_pattern, 8, 3, ' ' x 3 );
                $cursor = 8;
                next;
                }
            elsif( $input eq '*' ) {
                # fill everything from the cursor to the end with the CP/M wildcard
                substr( $dos_pattern, $_, 1, '?' ) for ( $cursor .. 10 );
                $cursor = 11;
                next;
                }
            elsif( $cursor != 11 ) {
                substr( $dos_pattern, $cursor++, 1 ) = $input;
                }
            }

        printf "%-12s -> %12s\n", $_, $dos_pattern;
        }

    __END__
    ABCD.TXT
    ABCDEFGHIJK
    A*B.TXT
    *.*
    *
    *.TXT
    .TXT
The output shows the translation of glob patterns:
    ABCD.TXT     ->  ABCD    TXT
    ABCDEFGHIJK  ->  ABCDEFGHIJK
    A*B.TXT      ->  A???????TXT
    *.*          ->  ???????????
    *            ->  ???????????
    *.TXT        ->  ????????TXT
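As an aside, the scalar-context /g behavior that the program relies on can be seen in isolation. This is only a small sketch: the variable names are mine, and it uses pos() to report where the next match would start.

```perl
use strict;
use warnings;

my $string = 'A*B.TXT';
my @seen;

# In scalar context, each m//g match resumes where the previous one ended.
# pos() reports that offset; the string itself is never modified.
while( $string =~ m/(.)/g ) {
    push @seen, sprintf '%s@%d', $1, pos( $string );
}

print "@seen\n";
print "still intact: $string\n";
```

Each entry pairs a character with the offset just past it, and the original string is unchanged after the loop.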
Some things to note:
- This assumes that all filenames are 8.3 names. The dot is implicit.
- Names shorter than eight characters have implicit spaces to pad them.
- These only allow one “*”, so any characters after a “*” and before a “.” are ignored.
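To see what these notes mean in practice, here is a rough sketch of how a filename might be compared against one of the 11-character patterns. The helpers to_fcb_name and dos_match are hypothetical names of my own, and the rule that “?” matches any character, including a padding space, is my assumption about the old matcher rather than something this program verifies.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an 8.3 name such as "AB.TXT" into the
# 11-character, space-padded form with the dot left implicit.
sub to_fcb_name {
    my( $name ) = @_;
    my( $base, $ext ) = split /\./, uc( $name ), 2;
    return sprintf '%-8.8s%-3.3s', $base // '', $ext // '';
}

# Hypothetical helper: compare position by position. A '?' in the pattern
# is assumed to match anything, including the padding spaces.
sub dos_match {
    my( $dos_pattern, $name ) = @_;
    my $fcb = to_fcb_name( $name );
    for my $i ( 0 .. 10 ) {
        my $p = substr( $dos_pattern, $i, 1 );
        next if $p eq '?';
        return 0 unless $p eq substr( $fcb, $i, 1 );
    }
    return 1;
}

for my $name ( qw( AB.TXT AC.TXT ALPHABET.TXT B.TXT AB.DOC ) ) {
    printf "%-12s %s\n", $name,
        dos_match( 'A???????TXT', $name ) ? 'matches' : 'does not match';
}
```

Under those assumptions, the pattern that came from “A*B.TXT” accepts AB.TXT, AC.TXT, and ALPHABET.TXT alike, because the B was dropped when the “*” pushed the cursor to position 11.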
This isn’t what Perl does on Windows, though. It’s only a bit of fun programming, maybe worthy of an exercise in the book.
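For contrast, here is a small sketch of what Perl’s own File::Glob does with the same pattern. It creates three empty files in a temporary directory (the file names are made up for illustration) and asks bsd_glob for “A*B.TXT”. I have only reasoned through this for a Unix-like system, so treat the Windows behavior as an assumption.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory that is removed when the program exits.
my $dir = tempdir( CLEANUP => 1 );

for my $name ( qw( AB.TXT AC.TXT AXXB.TXT ) ) {
    open my $fh, '>', "$dir/$name" or die "Could not create $name: $!";
    close $fh;
}

# The BSD-style glob keeps the literal B: the name must end in B before the dot.
my @matches = map { basename( $_ ) } bsd_glob( "$dir/A*B.TXT" );
print "@matches\n";
```

Here the B after the “*” still counts, so AC.TXT is left out, which is different from the MS-DOS translation above.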
Is it my predisposition to BSD-style globs, or is the third case meant to magically match the B somewhere in that glob? I’d have expected it to match a B (and, importantly, only a B) just before the dot, but $dos_pattern doesn’t show that (or rather, it could match A*C as well)?