UTF-8 output from perl in cmd.exe: It can't be just WriteFile

At first glance, the spurious trailing output perl produces in a cmd.exe window set to code page 65001 seems to be trivially explained by the WriteFile bug brought to my attention by Tony Cook.

After all, look at this pattern:

C:\> chcp 65001

C:\> perl -e "print qq{\xce\xb1a}" ← 3 bytes, 2 characters
αaa
C:\> perl -e "print qq{\xce\xb1\xce\xb1a}" ← 5 bytes, 3 characters
ααa�a
C:\> perl -e "print qq{\xce\xb1\xce\xb1aa}" ← 6 bytes, 4 characters
ααaaaa

Then, you hit this one:

C:\> perl -e "print qq{\xce\xb1\xce\xb1\xce\xb1a}" ← 7 bytes, 4 characters
αααaαaa

If WriteFile reporting the number of characters written instead of bytes were the sole culprit, one would expect to see the original string of “αααa” and the last three bytes, i.e. “αa”, displayed. Instead, the extra output consists of “αaa”. Why is that? Probably because the sequence of events goes:

  1. Send seven bytes (representing the four character string “αααa”)
  2. All seven bytes are output, but we are told of only four
  3. Send three bytes (representing the two character string “αa”)
  4. All three bytes are output, but we are told of only two
  5. Send one more byte (representing the single character “a”)
  6. We are told one byte was written, and it indeed was
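
The sequence above can be modeled in a few lines of Perl. This is only a sketch of the hypothesis, not perl's actual I/O code: miscounting_write and write_all are hypothetical names, and the assumption is a writer that emits every byte it receives but reports the number of characters, paired with a caller that resends whatever it believes was left unwritten.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the console write: every byte it is handed
# reaches the "console" ($$sink), but the return value is the number of
# characters those bytes decode to, not the number of bytes.
sub miscounting_write {
    my ($sink, $bytes) = @_;
    $$sink .= $bytes;
    my $copy = $bytes;    # decode a copy so the caller's buffer is untouched
    return length(decode('UTF-8', $copy));
}

# Hypothetical caller that trusts the reported count and resends the
# tail it believes was not written.
sub write_all {
    my ($bytes) = @_;
    my $sink = '';
    my @calls;
    while (length $bytes) {
        my $reported = miscounting_write(\$sink, $bytes);
        push @calls, [ length($bytes), $reported ];
        $bytes = substr($bytes, $reported);
    }
    return ($sink, \@calls);
}

# The seven-byte, four-character string from the example above.
my ($sink, $calls) = write_all("\xce\xb1\xce\xb1\xce\xb1a");
printf "sent %d byte(s), told %d\n", @$_ for @$calls;
print join(' ', unpack('(H2)*', $sink)), "\n";    # hex dump of what was emitted
```

For the seven-byte input, the model makes three writes (7 bytes sent, 4 reported; 3 sent, 2 reported; 1 sent, 1 reported), and the accumulated bytes spell “αααa” followed by “αa” and “a”, which is the pattern seen in the console.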

So, what’s the problem?!

I got this from a perl where I modified PerlIOWin32_write to ignore what WriteFile says, and always return the count argument it was passed!

Yes, I searched: PerlIOWin32_write seems to be the only relevant place from which WriteFile is called.

As the short example at the end of my previous post shows, it is not WriteFile that keeps looping in response to the confusion between the number of bytes and the number of characters written. If it were, then pushing extra layers onto STDOUT would not eliminate the problem of extraneous output:

#!/usr/bin/env perl

use utf8;
use strict;
use warnings;

use PerlIO::Layers qw( get_layers );
use YAML::XS;

binmode STDOUT, ':unix:encoding(utf8):crlf';
print qq{αβγabc};

print Dump get_layers(\*STDOUT);

C:\> perl g.pl
αβγabc--- ← Note correct output
- unix
- ~
- - CANWRITE
  - OPEN
  - TRUNCATE
  - CRLF
---
- crlf
- ~
- - CANWRITE
  - LINEBUF
  - TRUNCATE
  - FASTGETS
  - CRLF
---
- unix
- ~
- - CANWRITE
  - OPEN
  - TRUNCATE
---
- encoding
- utf8
- - CANWRITE
  - LINEBUF
  - UTF8
  - TRUNCATE
  - FASTGETS
---
- crlf
- ~
- - CANWRITE
  - LINEBUF
  - UTF8
  - TRUNCATE
  - FASTGETS
  - CRLF
  - WRBUF

Don’t get me wrong: as I showed before, the WriteFile bug is real.

But the fact that the spurious output persists when perl is compiled to ignore what WriteFile reports, and disappears when I push extra layers onto STDOUT, suggests something else might be at play as well.

That CRLF flag on the bottom-most unix layer keeps bothering me, too.