It all boils down to stat vs fstat from MSVCRT on Windows

Last time, I noticed that my perl’s stat $filename and stat $fh did not agree on the modification time of a given file.

This turns out to be due to a difference in the way the underlying system calls on Windows behave.

Even though NTFS stores file times in UTC, how you access that information determines what you get. If you look at Microsoft’s File Times documentation, you will notice that every function mentioned there operates on a file handle rather than on the name of the file.

As far as I can see, Perl’s implementation of stat is here. After a bit of a prelude, we arrive at:

#if defined(WIN64) || defined(USE_LARGE_FILES)
    res = _stati64(path, sbuf);
#else
    res = stat(path, sbuf);
#endif

which means, with my build, stat $filename calls _stati64.

Further down, we have:

DllExport int
win32_fstat(int fd, Stat_t *sbufptr)
{
#if defined(WIN64) || defined(USE_LARGE_FILES)
     return _fstati64(fd, sbufptr);
#else
     return fstat(fd, sbufptr);
#endif
}

I am assuming this implements stat $fh.

Let’s take a look at an experiment:

C:\...\t> c:\opt\cygwin64\bin\touch -t 201502210108.00 test.txt

C:\...\t> dir test.txt
2015-02-21  02:08 AM                 0 test.txt

Already, we can see cmd.exe disagrees with us: we asked touch for 01:08, and dir reports 02:08.

Look at the property sheet of the file:

So, first of all, cmd.exe and Explorer cannot agree on the modification time of this file: cmd.exe says 02:08 AM, while Explorer's property sheet says 01:08 AM.

Now, let’s see what Perl’s stat tells us:

C:\...\t> perl -E "say scalar localtime +(stat 'test.txt')[9]"
Sat Feb 21 01:08:00 2015

and

C:\..\t> perl -E "open $fh, 'test.txt'; say scalar localtime +(stat $fh)[9]"
Sat Feb 21 02:08:00 2015

and

C:\...\t> perl -E "say scalar gmtime +(stat 'test.txt')[9]"
Sat Feb 21 06:08:00 2015

and

C:\...\t> perl -E "open $fh, 'test.txt'; say scalar gmtime +(stat $fh)[9]"
Sat Feb 21 07:08:00 2015

I guess it’s good that they both apply the correct offset for February 21, 2015.

Apparently, a lot of people already knew about this craziness. I vaguely knew that something wasn’t right about the way Windows handles Daylight Saving Time transitions, but this is the first time it has really sunk in.

I had also noted that I didn’t observe this discrepancy with Strawberry Perl because it has its own C runtime instead of relying on MSVCRT.

C:\...\t> c:\opt\strawberry\perl\bin\perl -E "say scalar gmtime +(stat 'test.txt')[9]"
Sat Feb 21 07:08:00 2015

and

C:\...\t> c:\opt\strawberry\perl\bin\perl -E "open $fh, 'test.txt'; say scalar gmtime +(stat $fh)[9]"
Sat Feb 21 07:08:00 2015

which agrees with what cmd.exe shows me. Keep in mind, though, that I created this file by telling touch to use a time of 01:08:00 EST, that is, 06:08:00 UTC.

Now, let’s compare the output of these small C programs:

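#include <sys/stat.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
   struct _stati64 buf;
   int result;

   if (argc < 2) {
       fprintf(stderr, "usage: %s filename\n", argv[0]);
       return 1;
   }

   /* Look the file up by name. */
   result = _stati64(argv[1], &buf);
   if (result == 0) {
       /* st_mtime may be 64 bits wide, so cast it to match the format. */
       printf("%16llX\n", (unsigned long long)buf.st_mtime);
   }
   return result == 0 ? 0 : 1;
}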

gives

C:\...\t> mystat test.txt
        54E820C0

and

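#include <io.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#include <share.h>

int main(int argc, char *argv[]) {
    struct _stati64 buf;
    int fd, result;

    if (argc < 2) {
        fprintf(stderr, "usage: %s filename\n", argv[0]);
        return 1;
    }

    /* Open the file first so that the time is read through a descriptor. */
    if (_sopen_s(&fd, argv[1], _O_RDONLY, _SH_DENYNO, _S_IREAD) != 0) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }

    result = _fstati64(fd, &buf);
    if (result == 0) {
        /* st_mtime may be 64 bits wide, so cast it to match the format. */
        printf("%16llx\n", (unsigned long long)buf.st_mtime);
    }
    _close(fd);
    return result == 0 ? 0 : 1;
}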

gives

C:\...\t> fstat test.txt
        54e82ed0

We have:

C:\...\t> perl -E "say scalar localtime 0x54E820C0"
Sat Feb 21 01:08:00 2015

and

C:\...\t> perl -E "say scalar localtime 0x54e82ed0"
Sat Feb 21 02:08:00 2015

So, stat agrees with Explorer and fstat agrees with cmd.exe.

Finally, let’s consider FindFirstFile:

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int _tmain(int argc, TCHAR *argv[]) {
   WIN32_FIND_DATA ffd;
   SYSTEMTIME systim;
   HANDLE hFind;

   hFind = FindFirstFile(argv[1], &ffd);
   if (hFind != INVALID_HANDLE_VALUE)
   {
      /* FileTimeToSystemTime yields UTC; no time zone or DST adjustment is applied. */
      FileTimeToSystemTime(&ffd.ftLastWriteTime, &systim);
      _tprintf(
        TEXT("%4d-%02d-%02dT%02d:%02d:%02d\n"),
        systim.wYear, systim.wMonth, systim.wDay,
        systim.wHour, systim.wMinute, systim.wSecond
      );
      FindClose(hFind);
   }
   return 0;
}

which gives

C:\...\t> ff test.txt
2015-02-21T06:08:00

Well, alright then.

Conclusion

I don’t think I have a useful conclusion, other than to speculate that it may be a good idea to work around this craziness so that stat $filename and stat $fh return consistent values.

If that is a good idea, should stat $fh use FindFirstFile? I don’t know the right answer. I am just thinking out loud at this point.

PS: You can discuss this post on /r/perl.