Perl's File::Find on Windows: A path forward?

In Implementation by Wishful Thinking, I looked at a problem caused by a simple change to how File::Find handles symlink related options. The problem itself was moved several steps from the actual change in File::Find and manifested itself in Module::Pluggable not being able to find plugins.

Before the change that gave rise to my not being able to install pwhich, File::Find used to have this:

$full_check        = $Is_Win32 ? 0 : $wanted->{follow};
$follow            = $Is_Win32 ? 0 :
                         $full_check || $wanted->{follow_fast};

That is, it used to be that if the program was running on Windows, full_check and follow were always disabled regardless of the options passed in.

If you think this does not make much sense to begin with, you are correct. It doesn’t. The code is conflating something that is more a property of filesystems with being a property of an operating system. More importantly, let’s say we know no Windows version will ever support symlinks and no Perl script running on Windows will ever encounter a filesystem with symlinks on it. Even then, there is no a priori reason to create a situation whereby chunks of code are never exercised at all on one operating system. This way, we ended up with two separate libraries instead of one. Only, the fact is not obvious. It is hidden in messy, accreted spaghetti code. It is almost like the messes created by C’s preprocessor except that it is not even apparent to the programmer who’s originally introducing the hidden sibling library that that’s what they are doing.

My goal in discussing this situation where 1) problems with the code remained undiscovered for a long time; 2) they were surfaced by a seemingly innocent change; 3) the developer then decided to ignore the alarm by changing the tests to pass; and 4) made 5.34’s File::Find and anything that depends on it unusable if you build Perl using a supported compiler is not to beat up on Perl and P5P. These are the kinds of things that happen every day in all sorts of codebases. However, the problems are rarely exhibited so clearly in such a specific instance in publicly available code, so this case provides a very good example to discuss.

In general, sticking with straight line code with few conditionals, giving functions and methods meaningful names, and not relying on weird undocumented constants and oddly contracted or meaningless names tends to help avoid whole categories of problems. Given the fact that it is easy to dismiss concerns at the time the code is being reviewed and the fact that benefits seem intangible and too far in the future or irrelevant leads to a compounding of one small, easily avoidable problem after another causing an actual avalanche of problems when things do go wrong.

The importance to me of this bug is minimal: My livelihood does not depend on Perl. I do not use Perl for anything important. I like Perl. I would like more people to use Perl. I also like writing software that is not fragile. I like writing code that lends itself to straightforward maintenance when the world it deals with changes.

File::Find consists entirely of convoluted and fragile code. I don’t know exactly when the first version was written, but I do know it was there in the mid-90s and the world has changed a lot since then. Each change introduced was in response to a specific problem that arose. Each change could well have been justified at the time. But, taken together, they have resulted in a module that is hard to maintain, exemplified by the fact that this kind of bug made it into a production release.

On r/perl, mpersico asked:

So net-net, we need to patch File::Find?

To me, that is not obvious: While installing Git or Visual Studio means you automatically get perl on Windows, very few people are building Perl using cl and I have a feeling even fewer are depending on that process to keep anything going with the most recent version of Perl. It may not be worth anyone’s time to worry about this or the general fragility of path handling on Windows: It certainly is clear from the way the File::Find change made it into the release with tests being modified to pass instead of code that fails the tests being scrutinized that it wasn’t worth core developers’ time to think carefully about code that targets Windows. I take the world as it is. The attitude is likely at least partially based on the fact that it is not easy to patch File::Find: The simple act of enabling an option to be set exposed code that had never run on Windows for a quarter of a century to a new environment. What if the changes we make cause further Heisenbugs that only appear when Perl is built using GCC on Linux? The impact of that problem on Perl’s reputation will be greater. In this particular case, people can just smugly say “You are using MSVC… my condolences” and move on. I am not sure anyone needs to do anything.

I still like discussing this will illustrate how sticking with certain rules of thumb can help reduce the chances that you’ll be pulling your hair out at 3 am with every availability indicator going red at the same time.

perlancar said:

I think (at least some) Perl core functions and core libraries need to do more automatic path style conversion (foo/bar to foo\bar and vice versa).

I agree completely. Let’s apply “be liberal in what you accept, conservative in what you produce”. It’s a good rule in this context regarless of its shortcomings in others. That is, within core perl code, we can use 🙈 to separate directory names in a file path, but when we produce any kind of output, let’s use that platform’s convention.

This, unfortunately, does not address the issue in File::Find though. In too many places, there are obscure substr manipulations and regex replacements with no explanation to help the poor “maintainer of the future”. So, the maintainer’s incentive is to avoid touching those no matter what.

When I write code, I use Path::Tiny which uses / in all paths internally, but gives you a canonpath method for when you want to interface with the outside world. That still leaves the responsibility up to the programmer which results in things like non-functioning symlinks created by cpanm. That indicates that perl itself should provide the conversion when interfacing with the OS.

File::Find users, on the other hand, may have come to rely on getting Unix style paths in the output (or, if they actually did build their perl using MSVC, they might have come to expect having a mix of \ and / which is why my scripts using File::Find tend to have a canonpath invocation as the first thing.) So, a better fix in this case might be to switch to canonicalizing all paths used in internal manipulations, but only ever exposing Unix style paths to wanted.

One might even question whether there is any point in using File::Find in a script instead of just running find ... -exec script.pl {} \; on the command line. Sure, all those extra processes are costly, but one does get to utilize the cores and memory much more easily with that. As it is the case with all things in life, the answer is likely to be “it depends”.

Let’s stick with the specific issue at hand.

Ultimately, the question of what needs to change depends on what our goal is: Do we want a more maintainable File::Find or do we want Module::Pluggable and similar functionality to be restored to a working state with the tiniest possible change to the current File::Find?

I am curious about where the decision to categorically remove the ability to set follow on Windows came from. This bifurcation seems to have happened 16 years ago in response to rt.perl.org#37223. If you look at the ticket, you will see that running:

mkdir "$dir/dir" or die;
find({wanted => sub { 1 }, follow => 1}, $dir);
print STDERR "ok $i\n";

generated the message:

C:/Temp/__empty__test__dir__/dir encountered a second time at C:/perl5/lib/File/Find.pm line 560.

The solution at the time should have involved figuring out why on earth File::Find was seeing this path twice on an operating system with no symlinks. Instead, we get

Presumably “follow” (and “follow_fast”?) should be no-ops on Win32 since symbolic links are not supported on that OS.

which is just sweeping the problem under the rug so it is no longer visible instead of investigating the actual cause. This is the attitude that results in disintegrating space shuttles and other less visible incidents. The root cause is revealed further down the thread:

On perl-5.8.6 I get this from the test script in the bug report:

not ok 1: The stat preceding -l _ wasn’t an lstat at D:/Perls/perl586/lib/File/Find.pm line 532.

There is a glimmer of hope in this comment:

since symbolic links aren’t available on Win32, below is a patch to default Win32 to not follow, regardless of what is passed in. The scary part is that this change allowed all the tests to continue to pass on Win32.

but it is immediately extinguished.

The robust solution to this problem would have been for the code to deal with the fact that lstat might not be a true lstat. But that’s too far gone now.

The sad reality is that File::Spec does not give enough low level information nor does it provide enough higher level operations to enable File::Find to categorically avoid direct manipulation of file paths. The first step would be to map those low level manipulations to actual concepts.

To that end, let’s look at contract_name. The first problem is that there are no unit tests for this function. Second, there is no documentation. Ironically, we do not know the contract of contract_name. Each person attempting to maintain it has to decipher the deeper meaning of a bunch of slicing and dicing:

sub contract_name {
    my ($cdir,$fn) = @_;

    return substr($cdir,0,rindex($cdir,'/')) if $fn eq $File::Find::current_dir;

    $cdir = substr($cdir,0,rindex($cdir,'/')+1);

    $fn =~ s|^\./||;

    my $abs_name= $cdir . $fn;

    if (substr($fn,0,3) eq '../') {
       1 while $abs_name =~ s!/[^/]*/\.\./+!/!;
    }

    return $abs_name;
}

Right off the bat, we see why the rest of the code is so adamant about not having a trailing slash. One certainly cannot rely on the return value of substr($cdir,0,rindex($cdir,'/')) if $cdir might have a trailing slash. With the assumption that there will never be a trailing slash, though, the code seems return the directory containing $cdir if the $fn parameter represents the same path as $File::Find::current_dir. In theory, we should be able to replace the first stanza with:

if ($fn eq $File::Find::current_dir) {
    return File::Spec->canonpath(File::Spec->join($cdir, File::Spec->updir));
}

In practice, that is already worrysome because of 1) the reliance on Unix directory separators everywhere else; and 2) I still do not understand the logic of returning the parent directory of $cdir if $fn is the current directory.

After that, the only purpose of the line

$cdir = substr($cdir,0,rindex($cdir,'/')+1);

seems to be to get the parent of $cdir but have a trailing slash at the end of the parent path. Using File::Spec->join, we don’t need to worry about that. In fact, we also should not need the

$fn =~ s|^\./||;

line if we stick with File::Spec->join. Finally, we get to:

if (substr($fn,0,3) eq '../') {
   1 while $abs_name =~ s!/[^/]*/\.\./+!/!;
}

If $fn starts with an upward traversal, remove all upward traversals from $abs_name … I am a bit confused. Assuming the meaning of the code is correct, though, one must admit that there is no corresponding method in File::Spec for doing this. To understand the intent, we need to look at 51393fc07, fecbda2b, and, finally 81793b90 which is where contract_name is introduced without any explanation or documentation. The original version looks like this:

sub contract_name {
    my ($cdir,$fn) = @_;

    return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';

    $cdir = substr($cdir,0,rindex($cdir,'/')+1);

    $fn =~ s|^\./||;

    my $abs_name= $cdir . $fn;

    if (substr($fn,0,3) eq '../') {
    do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
    }

    return $abs_name;
}

I do not know how this plays with the remark in File::Spec->canonpath documentation

Note that this does not collapse x/../y sections into y. This is by design. If /foo on your system is a symlink to /bar/baz, then /foo/../quux is actually /bar/quux, not /quux as a naive ../-removal would give you.

Indeed, let’s see the difference between canonpath and the current code in File::Find. First, using the Visual Studio compiled perl 5.34.0:

C:\> perl -MFile::Spec::Functions=canonpath -E "say canonpath('/foo/../bar/mar/../quux')"
\bar\quux

Now, let’s look at the perl 5.32.1 that comes with Cygwin:

$ perl -MFile::Spec::Functions=canonpath -E 'say canonpath("/foo/../bar/mar/../quux")'
/foo/../bar/mar/../quux

Should one try to understand how on earth the simple difference of the convention of using \ instead of / as directory separators in file paths leads to this kind of discrepancy? How can the behavior on Windows deviate from the documented contract in such a drastic manner?

Clearly, 1 while $abs =~ s!/[^/]*/\.\./+!/!; is different than canonpath on Unix.

I decided to stop trying to figure out the actual implemented contract of File::Find and gave up any intention of rehabilitating File::Find and File::Spec. Instead, I sat down and worked on the smallest changeset that could possibly “work”.

No pull request (yet?), but here’s WiP code that lets tests pass on my machines. That is my fork of Perl5 which I am using to figure out what to send upstream.

PS: I am no longer on Reddit, but you can discuss this post on r/perl.