Who's testing the tests? The case of an interesting false negative

I still follow Perl releases and build them from source using the most recent version of Microsoft Visual C. Each new version of Perl brings in some improvements, interesting features, and some new curiosities to deal with.

After building 5.34, I tried to install JSON::MaybeXS which is a layer of indirection for picking among various Perl libraries for handling JSON files. I believe the community’s preferred library at this time is CPanel::JSON::XS. I use cpanm to test & install a library and all its dependencies in one go.

So, I typed cpanm JSON::MaybeXS and went to get coffee. When I got back, I reviewed the log file and noticed something curious:

Number found where operator expected at (eval 7) line 1, near "require threads::shared 1.21"
    (Do you need to predeclare require?)
t\125_shared_boolean.t ..... skipped: no shared_clone)

Hmmm?! “Do you need to predeclare require?” I shouldn’t. It is a Perl builtin.

What is that string which is being evaled?

eval "require threads::shared 1.21"

Did Perl get the ability to require modules with a version constraint and did I miss it?

Let’s look at the test code

BEGIN {
  plan skip_all => 'no threads' if !$Config{usethreads};
  eval "require threads::shared 1.21;";
  plan skip_all => 'no shared_clone' if ;
  plan tests => 8;
}

Nope. It seems like we have a case of “implementation by wishful thinking”. We know that we can specify a version constraint when use a module:

C:\> perl -e "use threads 3.14"
threads version 3.14 required--this is only version 2.26 at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

That is equivalent to BEGIN { require threads; threads->VERSION(3.14) }.

require does not support require MODULE VERSION. The only thing one use is require VERSION which is a runtime check that the perl running your script is at least as recent as VERSION.

Anyone could have made this mistake. I sometimes find myself wishing Perl (and other languages) could load multiple distinct versions of a library at the same time so that library X which requires library Z to be older than version 5, and library Y which requires Zto be newer than 7 can work within the same program. Then I think about all the implications for ossification of outdated crap and give up wishing that and deal with reality.

Knowing that use supports the syntax use MODULE VERSION, knowing that use is equivalent to require and import in a BEGIN block, it is natural to have a short-circuit in the brain which leads to one assuming require MODULE VERSION also works and does the thing we want.

I am more interested in how such a mistake makes it into the released version as that has implications for the kinds of SDLC, review process, etc I prefer. As I discussed yesterday, false negatives, tests that pass even though the code is bad, are a burden and timebombs waiting to blow up at the worst possible time. They can remain hidden for a long time and they can stop your momentum in its tracks when some unrelated change now causes these tests to start failing. False negatives tend to be overlooked because, by definition, they do not prevent delivery at the time they are injected into the codebase. Even though the problems foregone cannot be explicitly accounted for, it is worth taking small extra steps to make sure they are not injected in the codebase.

So, let’s look at the commit which introduced this statement. The associated issue was discussed among three developers are experienced in Perl. I am not sure the actual commit was reviewed before being merged as there does not seem to be an associated PR.

One might ask why should we care. After all, the tests passed, the module got installed, my script ran.

The problem is that the tests in t/125_shared_boolean_t are presumably there for a reason. Failing to notice that the evaled code was throwing not because the requisite version of threads::shared was not installed, but because the argument to eval was incorrect Perl means if the tested for functionality stopped working, we’d never notice. If then we run into problems in production that are hard to track, we would be making incorrect assumptions about tests passing and this blindness might even lengthen our downtime.

So, is there a simple rule of thumb that would have prevented this problem?

Sure: Avoid the string form of eval except in a few specific circumstances.

If, instead of

eval "require threads::shared 1.21;";

we had used

eval { require threads::shared 1.21 };

the nmake test would have failed:

Test Summary Report
-------------------
t\125_shared_boolean.t   (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: No plan found in TAP output
Files=60, Tests=2844, 14 wallclock secs ( 0.39 usr +  0.16 sys =  0.55 CPU)
Result: FAIL

The message in the log remains similar:

t\125_shared_boolean.t ..... Number found where operator expected at t\125_shared_boolean.t line 9, near "require threads 1.21"
        (Do you need to predeclare require?)
syntax error at t\125_shared_boolean.t line 9, near "require threads 1.21"
BEGIN not safe after errors--compilation aborted at t\125_shared_boolean.t line 13.
t\125_shared_boolean.t ..... Dubious, test returned 2 (wstat 512, 0x200)

but now the developer cannot fail to notice it because testing fails instead of skipping tests.

Note that the problem was not surfaced on TravisCI because none of the perls used supported threads in the first place and on AppVeyor because tests were silently skipped instead of failing.

I submitted a proposed fix.

PS: I am no longer on Reddit, but you can discuss this post on r/perl