Slashing one's feet with tests, or, how to fix 2,950 test failures in one fell swoop

This morning’s cpan-outdated | cpanm failed with a torrent of test failures from Pod::Simple.

The log file was so large that I was intimidated at first. Then, I switched to the build directory to see what was failing. Running nmake test filled my screen with a scrolling avalanche of test failures. When the dust settled, I noticed a pattern to the last few tests that were still visible on the screen:

...

#   Failed test ' find('pods::perldiag') should match survey's name2where{pods::perldiag}'
#   at t\search50.t line 78.
#          got: 'C:\...\lib\pods\perldiag.pod'
#     expected: 'C:/.../lib/pods/perldiag.pod'

#   Failed test ' find('DateTime::Locale::uz_Cyrl') should match survey's name2where{DateTime::Locale::uz_Cyrl}'
#   at t\search50.t line 78.
#          got: 'C:\...\site\lib\DateTime\Locale\uz_Cyrl.pm'
#     expected: 'C:/.../site/lib/DateTime/Locale/uz_Cyrl.pm'

...

# Looks like you failed 2950 tests of 2958.
t\search50.t .. Dubious, test returned 254 (wstat 65024, 0xfe00)
Failed 2950/2958 subtests

Notice that the paths actually match, except for the directory separators. The module produces Windows directory separators, when the script expects Unix-style directory separators.

Running prove -b t\search50.t told me I was facing a seemingly insurmountable obstacle:

Test Summary Report
-------------------
t\search50.t (Wstat: 65024 Tests: 2958 Failed: 2950)
  Failed tests:  8-2957
  Non-zero exit status: 254
Files=1, Tests=2958, 71 wallclock secs ( 1.17 usr +  0.08 sys =  1.25 CPU)
Result: FAIL

2,950 failed tests!

2,950 failed tests!!

2,950 failed tests!!!

This must be serious.

OK, let’s look at line 78 in t\search50.t:

# Search for all files in $name2where.
while (my ($testmod, $testpath) = each %{ $name2where }) {
  unless ( $testmod ) {
    fail;  # no 'thatpath/<name>.pm' means can't test find()
    next;
  }
  my @x = ($x->find($testmod)||'(nil)', $testpath);
  # print "# Comparing \"$x[0]\" to \"$x[1]\"\n";
  for(@x) { s{[/\\]}{/}g; }
  # print "#        => \"$x[0]\" to \"$x[1]\"\n";
  is(
      File::Spec->rel2abs($x[0]), ← This is line 78
      $x[1],
      " find('$testmod') should match survey's name2where{$testmod}"
  );
}

First things first: Where does $name2where come from?

It is populated earlier on in the test script by invoking the survey method in Pod::Simple::Search with no arguments.

By default, survey collects all the modules in your @INC. So, the extract above from t\search50.t carries out the same test for every single module in every single directory in your @INC.

That’s how I ended up 2,950 test failures. Your failures may vary.

What we have here are perfectly correlated tests. It is hard to imagine one test passing without all of them passing, or one test failing without all of them failing.

Given that they are perfectly correlated, running more of them gives you zero additional information about any aspect of the code.

It is a waste of time, and, literally, energy to do the same thing 2,950 times for no additional benefit.

What type of non-determinacy in the fabric of our universe would cause there to be a discrepancy between the path returned for DateTime::Locale::uz_Cyrl to be correct and the path returned for Perl::Critic::Policy::Modules::ProhibitExcessMainComplexity to be incorrect at the same time, and vice versa?

I am not a fan of inflated test counts resulting from running the same piece of code against thousands and thousands of tiny little cases. ~~The otherwise excellent Capture::Tiny also suffers from a similar testing approach.~~ (As David Golden points out, the comparison to Capture::Tiny is not really fair). You are just not going to get a clear picture from test failures. Even a single simple typo may cause thousands of test failures, and obscure the real problem.

Back to the problem at hand: Of course, changing:

  is(
      File::Spec->rel2abs($x[0]),
      $x[1],
      " find('$testmod') should match survey's name2where{$testmod}"
  );

  is(
      File::Spec->rel2abs($x[0]),
      File::Spec->rel2abs($x[1]),
      " find('$testmod') should match survey's name2where{$testmod}"
  );

will fix the test failures.

After all, if you are going to canonicalize one, you should canonicalize the other as well, right?

But, a thorny question remains: Why does survey seem to return paths with Unix style directory separators on Windows?

Except, it doesn’t!

If you look at the relevant code, you’ll notice that Pod::Simple::Search->survey meticulously uses File::Spec methods. Therefore, one would not expect to have the expected path contain the “wrong” directory separators.

Let’s look at line 75 in t\search50.t:

for(@x) { s{[/\\]}{/}g; }

Wait, what?!

This part of the test script attempts to canonicalize the paths used in the subsequent test by replacing all Windows and Unix style directory separators with forward slashes.

OK …

Then, immediately, in the invocation of is, it transforms one of the paths using File::Spec->rel2abs($x[0]) whereas the other is passed unaltered (following the transformation of all directory separators to forward slashes).

Of course, the whole point of File::Spec is to use the correct platform specific path components without explicitly worrying about it.

I don’t know if there is a more clear-cut case of self-sabotage.

Yes, just commenting out line 75 without changing anything else also fixes the 2,950 test failures.

Imagine that!

Just a single character, the lowly #, fixes almost three thousand test failures in one fell swoop.

While I like the general idea of TDD, this is a perfect illustration of why I am vary of its practitioners. Since tests are code, they will also have bugs.

It is essential for each additional test to provide at least some additional information about the validity of your code.

Having thousands upon thousands of individual tests that are perfectly correlated with each other does not provide that information. Just like regression lines with R²=0.99, the tests themselves are useless.

Also, keep in mind, in this case, just like in many others I have come across, these thousands of test failures do not actually indicate a failure of the code being tested.

So, what should the immediate patch for this test script be?

I decided I would make the smallest reasonable change instead of performing massive surgery. First, line 75 must go. Second, both arguments to is must be canonicalized. Unless you are testing arguments to another program, test code on Windows should not treat some\dir\file and some/dir\file and some\dir\file differently. They refer to the same file.

While the methods in Pod::Simple all seem to do the right thing by using File::Spec, it is still possible that some configuration or environment variable will use Unix style paths, and you do not want that to possibly cause superfluous test failures.

diff --git a/t/search50.t b/t/search50.t
index b049e15..f72c7c0 100644
--- a/t/search50.t
+++ b/t/search50.t
@@ -72,11 +72,10 @@ while (my ($testmod, $testpath) = each %{ $name2where }) {
   }
   my @x = ($x->find($testmod)||'(nil)', $testpath);
   # print "# Comparing \"$x[0]\" to \"$x[1]\"\n";
-  for(@x) { s{[/\\]}{/}g; }
   # print "#        => \"$x[0]\" to \"$x[1]\"\n";
   is(
       File::Spec->rel2abs($x[0]),
-      $x[1],
+      File::Spec->rel2abs($x[1]),
       " find('$testmod') should match survey's name2where{$testmod}"
   );
 }

C:\…\src\pod-simple> prove -b t\search50.t
t\search50.t .. ok
All tests successful.
Files=1, Tests=2958, 20 wallclock secs ( 0.39 usr +  0.02 sys =  0.41 CPU)
Result: PASS

OK, so fixing 2,950 with a single keystroke would have been cool, but I believe the additional robustness in the test gained from canonicalizing both arguments is worth the additional 25 or so keypresses.

After all, a programmer thinks more than s/he types.

My point is not to bash the contributors to Pod::Simple, but to use this perfect case in point to motivate everyone 1) to write more portable code by not hard-coding OS and FS specific parameters; and 2) to think harder about each test you are adding to the code base.

In this case, the for(@x) { s{[/\\]}{/}g; } was added to the codebase in 2009, and the application of File::Spec->rel2abs happened earlier this year. It is clear why the person making the latter change would not notice the bug that was introduced by the fact that both these pieces of code exist, and the bug in the test script wouldn’t be noticed by anyone using systems that do not produce Windows style directory separators.

PS: You can discuss this post on /r/perl.

PPS: Pull request.

Slashing one's feet with tests, or, how to fix 2,950 test failures in one fell swoop

A. Sinan Unur

August 25, 2015