nu42.com: Sinan Unur's blog on programming and information technology. ν42: It is geek and cliché, but I like geek and I like cliché.
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Bleach your files faster!</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-10-22T15:30:00Z" class="dt-published">October 22, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>This is a quick follow up to my previous post “<a href="/2021/10/another-optimization-tale.html">Another optimization story</a>”. It was inspired by <a href="https://articles.foletta.org/">Greg Foletta</a>’s “<a href="https://articles.foletta.org/post/a-tale-of-two-optimisations/">A Tale Of Two Optimisations</a>” and <a href="https://news.ycombinator.com/item?id=28833985">subsequent commentary on HN</a>.</p>
<p>Briefly, we are trying to find ways to encode bytes using four whitespace characters and decode them back.</p>
<p>Encoding is done half a nibble at a time and corresponds to the mapping:</p>
<pre class="text"><code> x | y
--------------------------------
0 | '\t' (0x09, 0000_1001)
1 | '\n' (0x0a, 0000_1010)
2 | '\r' (0x0d, 0000_1101)
3 | ' ' (0x20, 0010_0000)</code></pre>
<p>That means given a byte containing a half nibble, denoted by <code>c</code>, we can use <code>(c * c) + 14 * (c & (c >> 1)) + 0x09;</code> to do the encoding because:</p>
<pre class="text"><code> c | c² + 9
----------------
0 | 9
1 | 10
2 | 13
3 | 18</code></pre>
<p>In case <code>c == 3</code>, and only in that case, we need to add <code>14</code>, and we’d like to do this without using a conditional. That is where <code>(c & (c >> 1))</code> comes in: Both bits of the half nibble are set if and only if it is equal to <code>3</code>, so the expression evaluates to <code>1</code> for <code>3</code> and to <code>0</code> otherwise.</p>
<p>This part was already in the original post, but I moved things around a little.</p>
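<p>The encoding expression above can be checked against the mapping table with a few lines of C. This is only a sketch: the function name mirrors the one used later in this post, and <code>check_encoder</code> is a hypothetical helper added for illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free encoder from the text; c is a half nibble in 0..3. */
static uint8_t
half_nibble_to_ws(uint8_t c)
{
    /* c & (c >> 1) is 1 only when both bits are set, i.e. when c == 3. */
    return (uint8_t)((c * c) + 14 * (c & (c >> 1)) + 0x09);
}

/* Hypothetical helper: compares the expression with the mapping table. */
static void
check_encoder(void)
{
    static const uint8_t expected[4] = { '\t', '\n', '\r', ' ' };

    for (uint8_t c = 0; c < 4; ++c) {
        assert(half_nibble_to_ws(c) == expected[c]);
    }
}
```

<p>Calling <code>check_encoder()</code> from a test program should pass silently if the expression matches the table for all four inputs.</p>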
<p>On the decoding side, we have the inverse mapping:</p>
<pre class="text"><code> x | y
---------------------------------------
'\t' (0x09, 0000_1001) | 0 (00)
'\n' (0x0a, 0000_1010) | 1 (01)
'\r' (0x0d, 0000_1101) | 2 (10)
' ' (0x20, 0010_0000) | 3 (11)</code></pre>
<p>Using integer arithmetic, <code>x/10</code> almost gets us there:</p>
<pre class="text"><code> x | x/10
---------------------------------------
'\t' (0x09, 0000_1001) | 0 (00)
'\n' (0x0a, 0000_1010) | 1 (01)
'\r' (0x0d, 0000_1101) | 1 (01)
' ' (0x20, 0010_0000) | 3 (11)</code></pre>
<p>In case <code>x</code> is <code>0x0d</code>, we need <code>x/10 + 1</code>. Luckily, <code>0x0d</code> is the only <code>x</code> value with bit 2 set, so <code>x/10 + !!(x & 4)</code> does the trick without using a conditional.</p>
<p>Division tends to be more expensive than multiplication, so we can replace <code>x/10</code> with <code>d*x</code> where <code>d</code> is an approximation to <code>1/10</code>. Luckily, we only need this to work for a very small number of small values, so we can approximate <code>1/10</code> by <code>7/64</code>, using <code>ceil(64/10) = 7</code>, and replace <code>x/10</code> with <code>(x * 7) >> 6</code>. You can verify that:</p>
<pre class="text"><code> x | (7 * x) >> 6
-------------------------------------------
'\t' (0x09, 0000_1001) | 0 (00)
'\n' (0x0a, 0000_1010) | 1 (01)
'\r' (0x0d, 0000_1101) | 1 (01)
' ' (0x20, 0010_0000) | 3 (11)</code></pre>
<p>holds. Since the largest product, <code>7 * 32 = 224</code>, still fits in 8 bits, we can do all our arithmetic with 8 bits without needing to upcast/downcast anything.</p>
<p>Hence, mapping a whitespace character to half a nibble now becomes <code>((7 * x) >> 6) + !!(x & 4)</code>.</p>
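<p>As a sanity check, here is a minimal C sketch of that decoder. It assumes the input is always one of the four whitespace characters; <code>check_decoder</code> is a hypothetical helper that also confirms the multiply-and-shift agrees with <code>x/10</code> for these inputs and that the product stays within 8 bits.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free decoder from the text; x must be '\t', '\n', '\r', or ' '. */
static uint8_t
ws_to_half_nibble(uint8_t x)
{
    /* (7 * x) >> 6 stands in for x / 10; !!(x & 4) adds 1 only for '\r'. */
    return (uint8_t)(((7 * x) >> 6) + !!(x & 4));
}

/* Hypothetical helper: checks the four valid inputs only. */
static void
check_decoder(void)
{
    static const uint8_t ws[4] = { '\t', '\n', '\r', ' ' };

    for (uint8_t i = 0; i < 4; ++i) {
        assert(((7 * ws[i]) >> 6) == ws[i] / 10); /* approximation is exact here */
        assert(7 * ws[i] <= UINT8_MAX);           /* product fits in 8 bits */
        assert(ws_to_half_nibble(ws[i]) == i);
    }
}
```

<p>The approximation is only verified for these four values; it is not a general replacement for division by 10.</p>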
<p>With <a href="https://github.com/nanis/wscoder/tree/200b1ac060a42e3d27015adeb4acb53bc1349ea5">these changes in place</a>, encoding a 256 MiB file into 1,024 MiB on a <a href="https://ark.intel.com/content/www/us/en/ark/products/39312/intel-core2-duo-processor-t9900-6m-cache-3-06-ghz-1066-mhz-fsb.html">T9900</a> running Windows 10 64-bit took:</p>
<pre class="text"><code>C:\> timethis "wse < test.data > NUL"
TimeThis : Command Line : wse < test.data > NUL
TimeThis : Elapsed Time : 00:00:02.187</code></pre>
<p>Decoding the resulting 1,024 MiB output file on the same system:</p>
<pre class="text"><code>C:\> timethis "wsd < test.encoded > NUL"
TimeThis : Command Line : wsd < test.encoded > NUL
TimeThis : Elapsed Time : 00:00:01.852</code></pre>
<p>This basically concludes the matter of replacing the lookup table with a function without having to resort to fitting polynomials. We can, indeed, and the resulting straightforward implementation gives me about twice the throughput on this ancient T9900 compared to what was reported for the 8th generation i7 laptop in the original article.</p>
<p>This led me to ask: Can I bleach files faster?</p>
<p>On and off, I tried to figure out if I can use SSE/AVX instructions in a similarly elegant fashion, but didn’t get anywhere (see also <a href="https://news.ycombinator.com/item?id=28859061">lifthrasiir’s solution on HN</a>).</p>
<p>Another approach is to ask whether multi-threading would help. While this task is mostly IO-bound, once caches are populated, <a href="https://www.nu42.com/2012/04/can-parallelforkmanager-speed-up.html">dividing the work can make things go faster</a>. To that end, I first wrote an implementation using the <a href="https://en.cppreference.com/w/c/thread">C11 thread support library</a> before realizing <a href="https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/">Visual Studio did not yet support it</a>. So, I converted the code to Franken-C++ to test the idea. I was not disappointed: With two threads, encoding took:</p>
<pre class="text"><code>TimeThis : Command Line : wse < test.data > NUL
TimeThis : Elapsed Time : 00:00:01.404</code></pre>
<p>which represents approximately 35% improvement in time and 55% improvement in encoding throughput. Decoding performance improved as well:</p>
<pre class="text"><code>TimeThis : Command Line : wsd < test.encoded > NUL
TimeThis : Elapsed Time : 00:00:01.505</code></pre>
<p>This corresponds to about 19% improvement in time and 23% improvement in throughput.</p>
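<p>The percentages above follow directly from the elapsed times. A small sketch of the arithmetic, with hypothetical helper names, assuming the amount of data processed is the same in both runs:</p>

```c
#include <assert.h>

/* Percentage reduction in elapsed time, going from `before` to `after` seconds. */
static double
time_improvement_pct(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Percentage increase in throughput for the same amount of data. */
static double
throughput_improvement_pct(double before, double after)
{
    return 100.0 * (before / after - 1.0);
}

/* Hypothetical check using the elapsed times reported in this post. */
static void
check_reported_improvements(void)
{
    /* Encoding: 2.187 s single-threaded, 1.404 s with two threads. */
    assert(time_improvement_pct(2.187, 1.404) > 35.0);
    assert(time_improvement_pct(2.187, 1.404) < 37.0);
    assert(throughput_improvement_pct(2.187, 1.404) > 55.0);
    assert(throughput_improvement_pct(2.187, 1.404) < 57.0);

    /* Decoding: 1.852 s single-threaded, 1.505 s with two threads. */
    assert(time_improvement_pct(1.852, 1.505) > 18.0);
    assert(time_improvement_pct(1.852, 1.505) < 19.5);
    assert(throughput_improvement_pct(1.852, 1.505) > 22.5);
    assert(throughput_improvement_pct(1.852, 1.505) < 23.5);
}
```

<p>The two measures differ because a time reduction of fraction <code>r</code> corresponds to a throughput gain of <code>r / (1 - r)</code>.</p>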
<p>The pipeline performance got worse on the dual core T9900 since each part of the pipeline was running two threads reducing parallelism in the pipeline.</p>
<p>While this has been fun (never been one to enjoy crossword puzzles) and I like the neat equations above, this is clearly not as performant as it can be.</p>
<p>In the HN thread, there was one comment which pointed out a very simple and straightforward replacement of a lookup table or a switch statement:</p>
<blockquote>
<p>clang does something smart with the switch, equivalent to this C code:</p>
</blockquote>
<blockquote>
<pre><code> unsigned char lookup2_encode(const unsigned char dibit) {
return ' \r\n\t' >> (dibit * 8);
}</code></pre>
</blockquote>
<p>For clarity, I rewrote that as (note we are in the Franken-C++ world with the threads now):</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode C++"><code class="sourceCode cpp"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="dt">uint8_t</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>half_nibble_to_ws(<span class="at">const</span> <span class="dt">uint8_t</span>& c)</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="at">static</span> <span class="at">const</span> <span class="dt">uint32_t</span> x = (<span class="ch">' '</span> << <span class="dv">24</span>) |</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> (<span class="ch">'</span><span class="sc">\r</span><span class="ch">'</span> << <span class="dv">16</span>) |</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> (<span class="ch">'</span><span class="sc">\n</span><span class="ch">'</span> << <span class="dv">8</span>) |</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'</span><span class="sc">\t</span><span class="ch">'</span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> ;</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> (x >> (<span class="dv">8</span> * c)) & <span class="bn">0xff</span>;</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>I doubt the masking is necessary, but I am not sure so it stays.</p>
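<p>A plain C sketch of the same packed lookup makes it easy to check. The name <code>WS_PACKED</code> and the helper <code>check_packed_encoder</code> are made up for this illustration. The check also compares against a bare conversion to <code>uint8_t</code>, which C defines as reduction modulo 256, so for this return type the mask and the conversion should agree.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Byte i of this constant is the whitespace character for half nibble i. */
static const uint32_t WS_PACKED =
    ((uint32_t)' ' << 24) | ('\r' << 16) | ('\n' << 8) | '\t';

static uint8_t
half_nibble_to_ws(uint8_t c)
{
    /* Shift the wanted byte down to the bottom and keep only that byte. */
    return (WS_PACKED >> (8 * c)) & 0xff;
}

/* Hypothetical check, not part of the original code. */
static void
check_packed_encoder(void)
{
    static const uint8_t expected[4] = { '\t', '\n', '\r', ' ' };

    assert(WS_PACKED == 0x200d0a09u);
    for (uint8_t c = 0; c < 4; ++c) {
        assert(half_nibble_to_ws(c) == expected[c]);
        /* Converting to uint8_t keeps the low 8 bits, just like the mask. */
        assert((uint8_t)(WS_PACKED >> (8 * c)) == expected[c]);
    }
}
```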
<p>How about the decoding part? My first thought was to position the half-nibbles in a 32-bit integer with offsets corresponding to the byte values of the characters. There are problems with that: First, <code>' '</code> has code <code>32</code>, and, of course, shifting a 32-bit integer left or right by 32 bits is not going to give us much (keep in mind that we are trying to avoid conditionals and special cases). Second, the code for <code>'\t'</code> is <code>9</code> while the code for <code>\n</code> is <code>10</code>. We can’t place <code>00</code> at bit <code>9</code> and <code>01</code> at bit <code>10</code> for obvious reasons. So, I went for the simplest solution: Place the half-nibbles at twice the difference between each character and the tab. That is:</p>
<pre class="text"><code> x | offset (value)
-----------------------------------
'\t' (0x09) | 0 (00)
'\n' (0x0a) | 2 (01)
'\r' (0x0d) | 8 (10)
' ' (0x20) | 46 (11)</code></pre>
<p>which means we need a 64-bit integer to store this map:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode C++"><code class="sourceCode cpp"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="dt">uint8_t</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>ws_to_half_nibble(<span class="at">const</span> <span class="dt">uint8_t</span>& ws)</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="at">static</span> <span class="at">const</span> <span class="dt">uint64_t</span> x = (<span class="dv">3</span><span class="bu">ULL</span> << <span class="dv">46</span>) | (<span class="dv">2</span> << <span class="dv">8</span>) | (<span class="dv">1</span> << <span class="dv">2</span>);</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> (x >> (<span class="dv">2</span> * (ws - <span class="ch">'</span><span class="sc">\t</span><span class="ch">'</span>))) & <span class="dv">3</span>;</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
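<p>The same map written as plain C, with a hypothetical <code>check_packed_decoder</code> helper that walks through the four offsets from the table. Note the assumption baked into the sketch: the input must be one of the four whitespace characters. Any byte below <code>'\t'</code> or above <code>40</code> would produce a negative or too-large shift count, which is undefined behavior in C.</p>

```c
#include <assert.h>
#include <stdint.h>

static uint8_t
ws_to_half_nibble(uint8_t ws)
{
    /* Two-bit values stored at bit offsets 0 (tab), 2 (LF), 8 (CR), 46 (space). */
    static const uint64_t x = (3ULL << 46) | (2 << 8) | (1 << 2);

    return (x >> (2 * (ws - '\t'))) & 3;
}

/* Hypothetical check covering only the four valid inputs. */
static void
check_packed_decoder(void)
{
    static const uint8_t ws[4] = { '\t', '\n', '\r', ' ' };

    for (int i = 0; i < 4; ++i) {
        assert(2 * (ws[i] - '\t') < 64); /* shift count stays in range */
        assert(ws_to_half_nibble(ws[i]) == i);
    }
}
```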
<p>Encoding the same 256 MiB file now took:</p>
<pre class="text"><code>TimeThis : Command Line : wse < test.data > NUL
TimeThis : Elapsed Time : 00:00:00.933</code></pre>
<p>which is a 34% improvement on the nice, elegant looking (in the eye of the beholder) expression I had come up with to replace the OP’s polynomial.</p>
<p>Decoding the resulting 1,024 MiB file took:</p>
<pre class="text"><code>TimeThis : Command Line : wsd < test.encoded > NUL
TimeThis : Elapsed Time : 00:00:01.457</code></pre>
<p>which seems to represent a consistent 3% improvement over the expression</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>((<span class="dv">7</span> * x) >> <span class="dv">6</span>) + !!(x & <span class="dv">4</span>)</span></code></pre></div>
<p>I formulated above. I still like that expression (approximating division by 10 using multiplication by 7 followed by shifting right 6 bits and the straight-line handling of the special case just feels more elegant than just shuffling bits), but, oh well.</p>
<p>I should note that I stuck with the <code>stdio</code> functions (<code>fread</code>/<code>fwrite</code>), because using <a href="https://en.cppreference.com/w/cpp/io/basic_istream/read"><code>istream::read</code></a> and <a href="https://en.cppreference.com/w/cpp/io/basic_ostream/write"><code>ostream::write</code></a> caused approximately 30% slower overall operation.</p>
<p>The current implementation is <a href="https://github.com/nanis/wscoder">available on GitHub</a>. You can <a href="https://news.ycombinator.com/item?id=28958419">discuss this post on HN</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Another optimization story</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-10-15T15:30:00Z" class="dt-published">October 15, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>I quickly skimmed through “<a href="https://articles.foletta.org/post/a-tale-of-two-optimisations/">A Tale Of Two Optimisations</a>” a couple of days ago on a phone after noticing the title <a href="https://news.ycombinator.com/item?id=28833985">on HN</a>. As the author mentions in the <a href="https://github.com/gregfoletta/whitespacer/blob/c439402a65f4614cefb8b6a28eed2a172fab0349/README.md#seems-pointless">repo</a>:</p>
<blockquote>
<p>I’m not sure there are many useful applications of this program. But it was a fun little project with a small scope.</p>
</blockquote>
<p>Indeed, it is not easy these days to find fun, well defined projects to exercise our programming muscles just for the fun of it: Leetcode is everywhere and StackOverflow is not really that much fun for me any more.</p>
<p>I was a little struck by what seemed to be low throughput:</p>
<blockquote>
<p>on my Intel i7-8650U laptop, running over a file that’s cached in memory and outputting to /dev/null, the encoding / decoding process runs at 258MiB/s.</p>
</blockquote>
<p>Of course, it is unreasonable to expect to achieve max memory bandwidth, but I thought it ought to be possible to do better than that. Among the optimization attempts, this one caught my eye:</p>
<blockquote>
<p>I had an idea about using mathematical functions (rather than the lookup table) to perform the encoding/decoding.</p>
</blockquote>
<p>If the lookup table is small enough to fit in 4 bytes, it will probably stay in the fastest cache available to the CPU so that is probably not a huge concern. Also, as some commenters on <a href="https://news.ycombinator.com/item?id=28859877">HN noted</a>, compilers can come up with neat tricks.</p>
<p>I was surprised that the author went the way of fitting a polynomial using linear regression to the four points and using its inverse to speed things up. I would not expect much of a return there: Even if the mapping in one direction might help, the inverse mapping is unlikely to be symmetrically performant. In fact, the author did run two separate regressions, presumably for that reason.</p>
<p>It can be a good idea to see if you can replace lookups with computation, but it is worth looking for something simpler than polynomials. It is not obvious that a simple enough mapping exists, especially one that avoids branching.</p>
<p>The <a href="https://github.com/gregfoletta/whitespacer/blob/f86771d10447e1bf57b35ae710637e9e80576d69/encoding.c#L6">encoding scheme</a> chosen by the author is straightforward:</p>
<blockquote>
<pre><code> char encode_lookup_tbl[] = { '\t', '\n', '\r', ' ' };</code></pre>
</blockquote>
<p>We want a function that will map the values <code>(0, 1, 2, 3)</code> to whitespace characters <code>('\t', '\n', '\r', ' ')</code>. At first blush, it looks like there is no obvious systematic relationship we can exploit:</p>
<pre class="text"><code> x | y
-------------
0 | 9
1 | 10
2 | 13
3 | 32</code></pre>
<p>which is what I think led the author to consider fitting polynomials. This is where representation starts to matter. We are used to looking at numbers in decimal notation, but that can hide patterns that might otherwise be useful to us. After all, all we need is to map four values on one side to four values on the other side. Let’s look at it in hex:</p>
<pre class="text"><code> x | y
-------------
0 | 0x09
1 | 0x0a
2 | 0x0d
3 | 0x20</code></pre>
<p>OK, this already gives me something. Three of the values have their high nibble set to zero and the other one has its low nibble set to zero. To be frank, I stared at this for a while to figure out if I can come up with some simple arithmetic, but that went nowhere. That’s when I thought I might gain more insight by looking at the bit patterns. So, I expressed both sides in binary:</p>
<pre class="text"><code> x | y
-----------------
00 | 00001001
01 | 00001010
10 | 00001101
11 | 00100000</code></pre>
<p>Look at that! The first three <code>x</code> values map straight to bits 1 and 2 in the <code>y</code> values.</p>
<p>This actually made the decoding, that is, going from whitespace values to half-<a href="https://en.wikipedia.org/wiki/Nibble">nibble</a> values rather straightforward:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="dt">uint8_t</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ws_to_half_nibble(<span class="dt">uint8_t</span> ws)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> ((ws >> <span class="dv">1</span>) & <span class="dv">3</span>) | (ws >> <span class="dv">4</span>) | (ws >> <span class="dv">5</span>);</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The <code>(ws >> 1) & 3</code> part maps the tab, newline, and carriage return characters to 0, 1, and 2, respectively.</p>
<p>If the whitespace value is space (<code>0x20</code>, <code>00100000</code>), then <code>(ws >> 4) | (ws >> 5)</code> gives me 3 (<code>11</code> in binary).</p>
<p>Therefore, decoding becomes:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">uint8_t</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>ws_decode(<span class="dt">const</span> <span class="dt">uint8_t</span>* buf)</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> (ws_to_half_nibble(buf[<span class="dv">0</span>]) << <span class="dv">6</span>) |</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> (ws_to_half_nibble(buf[<span class="dv">1</span>]) << <span class="dv">4</span>) |</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> (ws_to_half_nibble(buf[<span class="dv">2</span>]) << <span class="dv">2</span>) |</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> ws_to_half_nibble(buf[<span class="dv">3</span>])</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> ;</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>I did not unroll for performance: This seemed just as clear as, if not clearer than, having a <code>for</code> loop with computed indexes.</p>
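<p>To see the decoder at work, here is a small self-contained sketch that repeats the two functions above and adds a hypothetical <code>check_decode</code> helper. It decodes the four characters in table order, which should give the bit pattern <code>00 01 10 11</code>, i.e. <code>0x1b</code>.</p>

```c
#include <assert.h>
#include <stdint.h>

static uint8_t
ws_to_half_nibble(uint8_t ws)
{
    /* Bits 1 and 2 cover tab, LF and CR; the two shifts turn space into 3. */
    return ((ws >> 1) & 3) | (ws >> 4) | (ws >> 5);
}

static uint8_t
ws_decode(const uint8_t *buf)
{
    return (ws_to_half_nibble(buf[0]) << 6) |
           (ws_to_half_nibble(buf[1]) << 4) |
           (ws_to_half_nibble(buf[2]) << 2) |
            ws_to_half_nibble(buf[3]);
}

/* Hypothetical check, assuming only valid whitespace input. */
static void
check_decode(void)
{
    static const uint8_t in_order[4]  = { '\t', '\n', '\r', ' ' };
    static const uint8_t all_tabs[4]  = { '\t', '\t', '\t', '\t' };
    static const uint8_t all_space[4] = { ' ', ' ', ' ', ' ' };

    assert(ws_decode(in_order) == 0x1b);  /* 00 01 10 11 */
    assert(ws_decode(all_tabs) == 0x00);
    assert(ws_decode(all_space) == 0xff);
}
```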
<p>Finding a similarly straightforward mapping in the opposite direction, one that maps <code>0</code>, <code>1</code>, <code>2</code>, and <code>3</code> to <code>\t</code>, <code>\n</code>, <code>\r</code>, <code>' '</code>, respectively felt harder for me until I decided to use offsets from the tab character value in the <code>y</code> column:</p>
<pre class="text"><code> x | Δ
----------------------------
00 | 00000000 ( 0, 0x00)
01 | 00000001 ( 1, 0x01)
10 | 00000100 ( 4, 0x04)
11 | 00010111 (23, 0x17)</code></pre>
<p>Once again, the first three values seem straightforward to map: Just square the half-nibble. How do we special-case <code>3</code> without introducing branching? The answer turns out to look trivial: <code>3&times;3</code> is 9. <code>23 - 9 = 14</code>. So, just add <code>14</code> to the result of squaring if bit <code>0</code> and bit <code>1</code> are both set. Therefore, encoding a byte into four whitespace characters becomes:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>ws_encode(<span class="dt">uint8_t</span> c, <span class="dt">uint8_t</span>* buf)</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">4</span>; ++i) {</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b = (c >> <span class="dv">2</span> * (<span class="dv">3</span> - i)) & <span class="dv">3</span>; <span class="co">// move half nibble into position</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b0 = b & <span class="dv">1</span>;</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b1 = (b >> <span class="dv">1</span>);</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> buf[i] = <span class="ch">'\t'</span> + (b * b) + <span class="dv">14</span> * (b0 & b1);</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Hmmm … while actually elegant, I am not sure if the squaring is doing me any favors here. An alternative way to write this (in the context of this specific problem) is:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>ws_encode(<span class="dt">uint8_t</span> c, <span class="dt">uint8_t</span>* buf)</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">4</span>; ++i) {</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b = (c >> <span class="dv">2</span> * (<span class="dv">3</span> - i)) & <span class="dv">3</span>; <span class="co">// move half nibble into position</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b0 = b & <span class="dv">1</span>;</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> b1 = b & <span class="dv">2</span>;</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> buf[i] = <span class="ch">'\t'</span> + b0 + <span class="dv">2</span> * b1 + <span class="dv">18</span> * (b0 & (b1 >> <span class="dv">1</span>));</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>This then answers my original question of whether there is a way to replace the lookup tables with functions which have no branching for both encoding and decoding.</p>
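<p>One way to gain some confidence in these expressions is to round-trip every possible byte. The sketch below repeats the functions from this post, renames the second encoder to <code>ws_encode_alt</code> so both can coexist, and adds a hypothetical <code>check_round_trip</code> helper. Both names are made up for this illustration.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encoder using the squaring trick. */
static void
ws_encode(uint8_t c, uint8_t *buf)
{
    for (int i = 0; i < 4; ++i) {
        const uint8_t b = (c >> 2 * (3 - i)) & 3; /* move half nibble into position */
        const uint8_t b0 = b & 1;
        const uint8_t b1 = (b >> 1);
        buf[i] = '\t' + (b * b) + 14 * (b0 & b1);
    }
}

/* Alternative encoder without squaring; the _alt suffix exists only here. */
static void
ws_encode_alt(uint8_t c, uint8_t *buf)
{
    for (int i = 0; i < 4; ++i) {
        const uint8_t b = (c >> 2 * (3 - i)) & 3;
        const uint8_t b0 = b & 1;
        const uint8_t b1 = b & 2;
        buf[i] = '\t' + b0 + 2 * b1 + 18 * (b0 & (b1 >> 1));
    }
}

static uint8_t
ws_to_half_nibble(uint8_t ws)
{
    return ((ws >> 1) & 3) | (ws >> 4) | (ws >> 5);
}

static uint8_t
ws_decode(const uint8_t *buf)
{
    return (ws_to_half_nibble(buf[0]) << 6) |
           (ws_to_half_nibble(buf[1]) << 4) |
           (ws_to_half_nibble(buf[2]) << 2) |
            ws_to_half_nibble(buf[3]);
}

/* Hypothetical check: both encoders agree and decoding inverts encoding. */
static void
check_round_trip(void)
{
    for (int c = 0; c < 256; ++c) {
        uint8_t a[4], b[4];

        ws_encode((uint8_t)c, a);
        ws_encode_alt((uint8_t)c, b);
        assert(memcmp(a, b, sizeof a) == 0);
        assert(ws_decode(a) == c);
    }
}
```

<p>This only checks correctness, not speed; it says nothing about which encoder variant is faster.</p>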
<p>I am not sure how this would affect overall performance as I decided to limit the time I spent on this post by focusing solely on the task of expressing the mapping of half-nibbles to the chosen whitespace characters. The specific order in which the original lookup table was constructed happened to be very helpful in coming up with relatively simple expressions.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Perl's File::Find on Windows: A path forward?</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-09-22T16:30:00Z" class="dt-published">September 22, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>In <a href="/2021/09/implementation-by-wishful-thinking.html">Implementation by Wishful Thinking</a>, I looked at a problem caused by a simple change to how <a href="https://metacpan.org/pod/File::Find">File::Find</a> handles symlink related options. The problem itself was several steps removed from the actual change in <code>File::Find</code> and manifested itself in <code>Module::Pluggable</code> not being able to find plugins.</p>
<p>Before <a href="https://github.com/Perl/perl5/commit/0d00729c03a1f68e1b51e986d1ce9000b0e3d301">the change that gave rise to my not being able to install <code>pwhich</code></a>, <code>File::Find</code> used to have this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$full_check</span> = <span class="dt">$Is_Win32</span> ? <span class="dv">0</span> : <span class="dt">$wanted</span>->{follow};</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="dt">$follow</span> = <span class="dt">$Is_Win32</span> ? <span class="dv">0</span> :</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">$full_check</span> || <span class="dt">$wanted</span>->{follow_fast};</span></code></pre></div>
<p>That is, it used to be that if the program was running on Windows, <code>full_check</code> and <code>follow</code> were always disabled regardless of the options passed in.</p>
<p>If you think this does not make much sense to begin with, you are correct. It doesn’t. The code is conflating something that is more a property of filesystems with being a property of an operating system. More importantly, let’s say we know no Windows version will ever support symlinks and no Perl script running on Windows will ever encounter a filesystem with symlinks on it. Even then, there is no <em>a priori</em> reason to create a situation whereby chunks of code are never exercised at all on one operating system. This way, we ended up with two separate libraries instead of one. Only, the fact is not obvious. It is hidden in messy, accreted spaghetti code. It is almost like the messes created by C’s preprocessor except that it is not even apparent to the programmer who’s originally introducing the hidden sibling library that that’s what they are doing.</p>
<p>My goal in discussing this situation where 1) problems with the code remained undiscovered for a long time; 2) they were surfaced by a seemingly innocent change; 3) the developer then decided to ignore the alarm by <em>changing the tests to pass</em>; and 4) made 5.34’s <code>File::Find</code> and anything that depends on it unusable if you build Perl using a supported compiler is not to beat up on Perl and P5P. These are the kinds of things that happen every day in all sorts of codebases. However, the problems are rarely exhibited so clearly in such a specific instance in publicly available code, so this case provides a very good example to discuss.</p>
<p>In general, sticking with straight line code with few conditionals, giving functions and methods meaningful names, and not relying on weird undocumented constants and oddly contracted or meaningless names tends to help avoid whole categories of problems. However, it is easy to dismiss such concerns at the time the code is being reviewed, and the benefits seem intangible, too far in the future, or irrelevant. That leads to a compounding of one small, easily avoidable problem after another, causing an actual avalanche of problems when things do go wrong.</p>
<p>The importance to me of this bug is minimal: My livelihood does not depend on Perl. I do not use Perl for anything important. I like Perl. I would like more people to use Perl. I also like writing software that is not fragile. I like writing code that lends itself to straightforward maintenance when the world it deals with changes.</p>
<p><code>File::Find</code> consists entirely of convoluted and fragile code. I don’t know exactly when the first version was written, but I do know it was there in the mid-90s and the world has changed a lot since then. Each change introduced was in response to a specific problem that arose. Each change could well have been justified at the time. But, taken together, they have resulted in a module that is hard to maintain, exemplified by the fact that this kind of bug made it into a production release.</p>
<p>On r/perl, <a href="https://old.reddit.com/r/perl/comments/pp89k5/implementation_by_wishful_thinking/hd3901v/">mpersico asked</a>:</p>
<blockquote>
<p>So net-net, we need to patch File::Find?</p>
</blockquote>
<p>To me, that is not obvious: While installing Git or Visual Studio means you automatically get <code>perl</code> on Windows, very few people are building Perl using <code>cl</code> and I have a feeling even fewer are depending on that process to keep anything going with the most recent version of Perl. It may not be worth anyone’s time to worry about this or the general fragility of path handling on Windows: It certainly is clear from the way the <code>File::Find</code> change made it into the release with tests being modified to pass instead of code that fails the tests being scrutinized that it wasn’t worth core developers’ time to think carefully about code that targets Windows. I take the world as it is. The attitude is likely at least partially based on the fact that it is not easy to patch <code>File::Find</code>: The simple act of enabling an option to be set exposed code that had never run on Windows for a quarter of a century to a new environment. What if the changes we make cause further Heisenbugs that only appear when Perl is built using GCC on Linux? The impact of that problem on Perl’s reputation will be greater. In this particular case, people can just smugly say “<a href="/2014/12/yeah-you-put-me-in-my-place-real-good.html">You are using MSVC… my condolences</a>” and move on. I am not sure anyone needs to do anything.</p>
<p>I still think discussing this illustrates how sticking with certain rules of thumb can help reduce the chances that you’ll be pulling your hair out at 3 am with every availability indicator going red at the same time.</p>
<p><a href="https://www.reddit.com/r/perl/comments/pp89k5/implementation_by_wishful_thinking/hd9rx4f/">perlancar said</a>:</p>
<blockquote>
<p>I think (at least some) Perl core functions and core libraries need to do more automatic path style conversion (<code>foo/bar</code> to <code>foo\bar</code> and vice versa).</p>
</blockquote>
<p>I agree completely. Let’s apply “<a href="https://datatracker.ietf.org/doc/html/rfc761">be liberal in what you accept, conservative in what you produce</a>”. It’s a good rule in this context <a href="https://tools.ietf.org/id/draft-thomson-postel-was-wrong-03.html">regardless of its shortcomings in others</a>. That is, within core <code>perl</code> code, we can use <abbr title="See no evil">🙈</abbr> to separate directory names in a file path, but when we produce any kind of output, let’s use that platform’s convention.</p>
<p>This, unfortunately, does not address the issue in <code>File::Find</code> though. In too many places, there are obscure <code>substr</code> manipulations and regex replacements with no explanation to help the poor “maintainer of the future”. So, the maintainer’s incentive is to avoid touching those no matter what.</p>
<p>When I write code, I use <a href="https://metacpan.org/pod/Path::Tiny">Path::Tiny</a> which uses <code>/</code> in all paths internally, but gives you a <code>canonpath</code> method for when you want to interface with the outside world. That still leaves the responsibility up to the programmer which results in things like non-functioning symlinks created by <code>cpanm</code>. That indicates that <code>perl</code> itself should provide the conversion when interfacing with the OS.</p>
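<p>A minimal sketch of that kind of boundary conversion, using only core modules. It calls <code>File::Spec::Win32</code> directly so that the result is the same regardless of the platform it runs on; the path is a made-up example, not something taken from <code>cpanm</code>.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical path, kept in Unix style for all internal manipulation.
my $internal = 'C:/Users/user/.cpanm/work';

# Convert only at the boundary, when handing the path to the OS or the user.
my $external = File::Spec::Win32->canonpath($internal);

print "$external\n";   # C:\Users\user\.cpanm\work
</code></pre></div>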
<p><code>File::Find</code> users, on the other hand, may have come to rely on getting Unix style paths in the output (or, if they actually did build their <code>perl</code> using MSVC, they might have come to expect having a mix of <code>\</code> and <code>/</code> which is why my scripts using <code>File::Find</code> tend to have a <code>canonpath</code> invocation as the first thing.) So, a better fix in this case might be to switch to canonicalizing all paths used in internal manipulations, but only ever exposing Unix style paths to <code>wanted</code>.</p>
<p>One might even question whether there is any point in using <code>File::Find</code> in a script instead of just running <code>find ... -exec script.pl {} \;</code> on the command line. Sure, all those extra processes are costly, but one does get to utilize the cores and memory much more easily with that. As is the case with all things in life, the answer is likely to be “<em>it depends</em>”.</p>
<p>Let’s stick with the specific issue at hand.</p>
<p>Ultimately, the question of what needs to change depends on what our goal is: Do we want a more maintainable <code>File::Find</code> or do we want <code>Module::Pluggable</code> and similar functionality to be restored to a working state with the tiniest possible change to the current <code>File::Find</code>?</p>
<p>I am curious about where the decision to categorically remove the ability to set <code>follow</code> on Windows came from. This <a href="https://github.com/Perl/perl5/commit/204b4d7f800e266ce239f9e434271307a9c45b3e">bifurcation seems to have happened 16 years ago</a> in response to <a href="https://github.com/Perl/perl5/issues/8120">rt.perl.org#37223</a>. If you look at the ticket, you will see that running:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">mkdir</span> <span class="ot">"</span><span class="dt">$dir</span><span class="st">/dir</span><span class="ot">"</span> <span class="ot">or</span> <span class="fu">die</span>;</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>find({wanted => <span class="kw">sub </span>{ <span class="dv">1</span> }, follow => <span class="dv">1</span>}, <span class="dt">$dir</span>);</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span> STDERR <span class="ot">"</span><span class="st">ok </span><span class="dt">$i</span><span class="ch">\n</span><span class="ot">"</span>;</span></code></pre></div>
<p>generated the message:</p>
<blockquote>
<p><code>C:/Temp/__empty__test__dir__/dir</code> encountered a second time at <code>C:/perl5/lib/File/Find.pm</code> <a href="https://github.com/Perl/perl5/blob/667342e9372792655bd9e69275759a3f66394d54/lib/File/Find.pm#L560">line 560</a>.</p>
</blockquote>
<p>The solution at the time should have involved figuring out why on earth <code>File::Find</code> was seeing this path twice on an operating system with no symlinks. Instead, we get</p>
<blockquote>
<p>Presumably “<code>follow</code>” (and “<code>follow_fast</code>”?) should be no-ops on Win32 since symbolic links are not supported on that OS.</p>
</blockquote>
<p>which is just sweeping the problem under the rug so it is no longer visible instead of investigating the actual cause. This is the attitude that results in disintegrating space shuttles and other less visible incidents. The root cause is revealed further down the thread:</p>
<blockquote>
<p>On perl-5.8.6 I get this from the test script in the bug report:</p>
</blockquote>
<blockquote>
<p>not ok 1: The stat preceding <code>-l _</code> wasn’t an lstat at <code>D:/Perls/perl586/lib/File/Find.pm</code> <a href="https://github.com/Perl/perl5/blob/667342e9372792655bd9e69275759a3f66394d54/lib/File/Find.pm#L533">line 532</a>.</p>
</blockquote>
<p>There is a glimmer of hope in this comment:</p>
<blockquote>
<p>since symbolic links aren’t available on Win32, below is a patch to default Win32 to not follow, regardless of what is passed in. The scary part is that this change allowed all the tests to continue to pass on Win32.</p>
</blockquote>
<p>but it is immediately extinguished.</p>
<p>The robust solution to this problem would have been for the code to deal with the fact that <code>lstat</code> might not be a true <code>lstat</code>. But that’s too far gone now.</p>
<p>The sad reality is that <code>File::Spec</code> does not give enough low level information nor does it provide enough higher level operations to enable <code>File::Find</code> to categorically avoid direct manipulation of file paths. The first step would be to map those low level manipulations to actual concepts.</p>
<p>To that end, let’s look at <a href="https://github.com/Perl/perl5/blob/blead/ext/File-Find/lib/File/Find.pm#L28"><code>contract_name</code></a>. The first problem is that there are no unit tests for this function. Second, there is no documentation. Ironically, we do not know the contract of <code>contract_name</code>. Each person attempting to maintain it has to decipher the deeper meaning of a bunch of slicing and dicing:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">contract_name</span> {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> (<span class="dt">$cdir</span>,<span class="dt">$fn</span>) = <span class="dt">@_</span>;</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)) <span class="kw">if</span> <span class="dt">$fn</span> <span class="ot">eq</span> <span class="dt">$File</span>::<span class="dt">Find</span>::<span class="dt">current_dir</span>;</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">$cdir</span> = <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)+<span class="dv">1</span>);</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">$fn</span> =~ <span class="ot">s|</span><span class="ch">^</span><span class="ot">\./||</span>;</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$abs_name</span>= <span class="dt">$cdir</span> . <span class="dt">$fn</span>;</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="fu">substr</span>(<span class="dt">$fn</span>,<span class="dv">0</span>,<span class="dv">3</span>) <span class="ot">eq</span> <span class="ot">'</span><span class="ss">../</span><span class="ot">'</span>) {</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="kw">while</span> <span class="dt">$abs_name</span> =~ <span class="ot">s!/</span><span class="ch">[^</span><span class="bn">/</span><span class="ch">]*</span><span class="ot">/\.\./</span><span class="ch">+</span><span class="ot">!</span><span class="st">/</span><span class="ot">!</span>;</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="dt">$abs_name</span>;</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Right off the bat, we see why the rest of the code is so adamant about not having a trailing slash. One certainly cannot rely on the return value of <code>substr($cdir,0,rindex($cdir,'/'))</code> if <code>$cdir</code> might have a trailing slash. With the assumption that there will never be a trailing slash, though, the code seems to return the directory containing <code>$cdir</code> if the <code>$fn</code> parameter represents the same path as <code>$File::Find::current_dir</code>. In theory, we should be able to replace the first stanza with:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">if</span> (<span class="dt">$fn</span> <span class="ot">eq</span> <span class="dt">$File</span>::<span class="dt">Find</span>::<span class="dt">current_dir</span>) {</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="fu">File::Spec</span>->canonpath(<span class="fu">File::Spec</span>-><span class="fu">join</span>(<span class="dt">$cdir</span>, <span class="fu">File::Spec</span>->updir));</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>In practice, that is already worrisome because 1) everything else relies on Unix directory separators; and 2) I still do not understand the logic of returning the parent directory of <code>$cdir</code> if <code>$fn</code> is the current directory.</p>
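<p>The trailing-slash sensitivity of the original first stanza is easy to check in isolation. In this small sketch, the directory names are made up and <code>parent_of</code> is a hypothetical helper that wraps the <code>substr</code>/<code>rindex</code> expression:</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;

# Hypothetical helper: the expression from the first stanza of contract_name.
sub parent_of {
    my ($cdir) = @_;
    return substr($cdir, 0, rindex($cdir, '/'));
}

print parent_of('a/b/c'),  "\n";   # a/b
print parent_of('a/b/c/'), "\n";   # a/b/c (only the trailing slash is removed)
</code></pre></div>
<p>With a trailing slash, the expression returns the directory itself rather than its parent.</p>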
<p>After that, the only purpose of the line</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$cdir</span> = <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)+<span class="dv">1</span>);</span></code></pre></div>
<p>seems to be to get the parent of <code>$cdir</code> but have a trailing slash at the end of the parent path. Using <code>File::Spec->join</code>, we don’t need to worry about that. In fact, we also should not need the</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$fn</span> =~ <span class="ot">s|</span><span class="ch">^</span><span class="ot">\./||</span>;</span></code></pre></div>
<p>line if we stick with <code>File::Spec->join</code>. Finally, we get to:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">if</span> (<span class="fu">substr</span>(<span class="dt">$fn</span>,<span class="dv">0</span>,<span class="dv">3</span>) <span class="ot">eq</span> <span class="ot">'</span><span class="ss">../</span><span class="ot">'</span>) {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="kw">while</span> <span class="dt">$abs_name</span> =~ <span class="ot">s!/</span><span class="ch">[^</span><span class="bn">/</span><span class="ch">]*</span><span class="ot">/\.\./</span><span class="ch">+</span><span class="ot">!</span><span class="st">/</span><span class="ot">!</span>;</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>If <code>$fn</code> starts with an upward traversal, remove <em>all</em> upward traversals from <code>$abs_name</code> … I am a bit confused. Assuming the meaning of the code is correct, though, one must admit that there is no corresponding method in <code>File::Spec</code> for doing this. To understand the intent, we need to look at <a href="https://github.com/Perl/perl5/commit/51393fc07355ffd0a4b6b212fd676ee37de23e09">51393fc07</a>, <a href="https://github.com/Perl/perl5/commit/fecbda2b590e985946f0a69ff09a806c69267f6f">fecbda2b</a>, and, finally <a href="https://github.com/Perl/perl5/commit/81793b9077171abb50d56e2bdf5d4208e13a783d">81793b90</a> which is where <code>contract_name</code> is introduced without any explanation or documentation. The original version looks like this:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">contract_name</span> {</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> (<span class="dt">$cdir</span>,<span class="dt">$fn</span>) = <span class="dt">@_</span>;</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)) <span class="kw">if</span> <span class="dt">$fn</span> <span class="ot">eq</span> <span class="ot">'</span><span class="ss">.</span><span class="ot">'</span>;</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">$cdir</span> = <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)+<span class="dv">1</span>);</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">$fn</span> =~ <span class="ot">s|</span><span class="ch">^</span><span class="ot">\./||</span>;</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$abs_name</span>= <span class="dt">$cdir</span> . <span class="dt">$fn</span>;</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="fu">substr</span>(<span class="dt">$fn</span>,<span class="dv">0</span>,<span class="dv">3</span>) <span class="ot">eq</span> <span class="ot">'</span><span class="ss">../</span><span class="ot">'</span>) {</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">do</span> <span class="dv">1</span> <span class="kw">while</span> (<span class="dt">$abs_name</span>=~ <span class="ot">s|/</span><span class="ch">(?>[^</span><span class="bn">/</span><span class="ch">]+)</span><span class="ot">/\.\./|</span><span class="st">/</span><span class="ot">|</span>);</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="dt">$abs_name</span>;</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>I do not know how this plays with the remark in the <a href="https://metacpan.org/pod/File::Spec#canonpath"><code>File::Spec->canonpath</code> documentation</a>:</p>
<blockquote>
<p>Note that this does <em>not</em> collapse <code>x/../y</code> sections into <code>y</code>. This is by design. If <code>/foo</code> on your system is a symlink to <code>/bar/baz</code>, then <code>/foo/../quux</code> is actually <code>/bar/quux</code>, not <code>/quux</code> as a naive <code>../</code>-removal would give you.</p>
</blockquote>
<p>Indeed, let’s see the difference between <code>canonpath</code> and the current code in <code>File::Find</code>. First, using the Visual Studio compiled <code>perl</code> 5.34.0:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>C:\> perl <span class="ot">-M</span><span class="fu">File::Spec</span>::<span class="fu">Functions</span>=canonpath -E <span class="ot">"</span><span class="st">say canonpath('/foo/../bar/mar/../quux')</span><span class="ot">"</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>\bar\quux</span></code></pre></div>
<p>Now, let’s look at the <code>perl</code> 5.32.1 that comes with Cygwin:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>$ perl <span class="ot">-M</span><span class="fu">File::Spec</span>::<span class="fu">Functions</span>=canonpath -E <span class="ot">'</span><span class="ss">say canonpath("/foo/../bar/mar/../quux")</span><span class="ot">'</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">/foo/</span>..<span class="ot">/bar/m</span>ar/..<span class="ot">/quux</span></span></code></pre></div>
<p>Should one try to understand how on earth the simple difference of the convention of using <code>\</code> instead of <code>/</code> as directory separators in file paths leads to this kind of discrepancy? How can the behavior on Windows deviate from the documented contract in such a drastic manner?</p>
<p>Clearly, <code>1 while $abs =~ s!/[^/]*/\.\./+!/!;</code> is different than <code>canonpath</code> on Unix.</p>
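<p>The three behaviors can be compared side by side on any platform by calling the platform-specific <code>File::Spec</code> classes directly. In this sketch, <code>collapse_updirs</code> is a hypothetical wrapper around the substitution loop from <code>contract_name</code>; it is not a function that exists in <code>File::Find</code>.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

# Hypothetical wrapper around the loop used in contract_name.
sub collapse_updirs {
    my ($abs_name) = @_;
    1 while $abs_name =~ s!/[^/]*/\.\./+!/!;
    return $abs_name;
}

my $path = '/foo/../bar/mar/../quux';

print File::Spec::Unix->canonpath($path),  "\n";   # /foo/../bar/mar/../quux
print File::Spec::Win32->canonpath($path), "\n";   # \bar\quux
print collapse_updirs($path),              "\n";   # /bar/quux
</code></pre></div>
<p>So the loop in <code>contract_name</code> behaves like the Windows flavor of <code>canonpath</code>, separators aside, and unlike the Unix one.</p>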
<p>I decided to stop trying to figure out the actual implemented contract of <code>File::Find</code> and gave up any intention of rehabilitating <code>File::Find</code> and <code>File::Spec</code>. Instead, I sat down and worked on the smallest changeset that could possibly “work”.</p>
<p>No pull request (yet?), but <a href="https://github.com/nanis/perl5/pull/1/files">here’s WiP code that lets tests pass on my machines</a>. That is my fork of Perl5 which I am using to figure out what to send upstream.</p>
<p>PS: I am no longer on Reddit, but you can <a href="https://old.reddit.com/r/perl/comments/pti2az/perls_filefind_on_windows_a_path_forward/">discuss this post on r/perl</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Implementation by Wishful Thinking</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-09-15T16:15:00Z" class="dt-published">September 15, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>It is natural to want the world to be simpler. To wish legacy code away. To want to avoid dealing with platform specific differences.</p>
<p>It is natural. But, in most cases, it is also not possible for the world to be simpler. Legacy code is always there. And, there are unavoidable platform specific differences.</p>
<p>I brought up the fact <a href="https://www.nu42.com/2015/04/windows-has-symlinks.html">that Windows has symlinks</a> more than five years ago. At the time, creating them required admin privileges. Soon afterwards, <a href="https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/">the ability for non-admin users in developer mode to create symlinks</a> was implemented. In the Perl world, Bayan Maxim wrote <a href="https://metacpan.org/pod/Win32::NTFS::Symlink">Win32::NTFS::Symlink</a>.</p>
<p>Fast forward to me playing with Perl 5.34 built on Windows with MSVC recently. I wanted to install <a href="https://metacpan.org/pod/App::pwhich"><code>pwhich</code></a> so I ran <code>cpanm App::pwhich</code>. After a while, the installation failed deep inside the chain of dependencies. I changed into my <code>.cpanm</code> directory and noticed:</p>
<p><code>2021-09-13 10:39 AM <SYMLINK> build.log [C:\Users\user\.cpanm\work\$time.4012\build.log]</code></p>
<p><code>2021-09-13 10:39 AM <SYMLINK> latest-build [C:\Users\user/.cpanm/work/$time.4012]</code></p>
<p>Oh, cute … Let’s use those symlinks:</p>
<pre class="text"><code>C:\Users\user\.cpanm> cd latest-build
The directory name is invalid.</code></pre>
<p>and</p>
<pre class="text"><code>C:\Users\user\.cpanm> more build.log
<<< output snipped >>></code></pre>
<p>Why does one symlink work and not the other? Simple: Internal APIs in DOS (I think since v.3) and Windows really do not care whether directories are separated using <code>\</code> or <code>/</code>, but mixing the two styles within a single path <em>and</em> passing that to an external program is not good. In the above listing, you will notice that the symlink for <code>build.log</code> is using a canonicalized target whereas the symlink for <code>latest-build</code> is appending a Unix style subdirectory to a Windows style parent directory path.</p>
<p>As the synopsis for <a href="https://metacpan.org/pod/Module::CoreList">Module::CoreList</a> shows, <a href="https://metacpan.org/pod/File::Spec">File::Spec</a> was added to Perl core with version <code>5.005</code> in <a href="https://metacpan.org/dist/perl/view/pod/perlhist.pod#THE-RECORDS">1998</a>. I’d say 23 years is enough to consistently handle file paths. In <code>cpanm</code>, the code for creating these symlinks looks like this:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">if</span> (CAN_SYMLINK){</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$build_link</span> = <span class="ot">"</span><span class="dt">$self</span>-><span class="st">{home}/latest-build</span><span class="ot">"</span>;</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">unlink</span> <span class="dt">$build_link</span>;</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">symlink</span> <span class="dt">$self</span>->{base}, <span class="dt">$build_link</span>;</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">unlink</span> <span class="dt">$final_log</span>;</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">symlink</span> <span class="dt">$self</span>->{<span class="fu">log</span>}, <span class="dt">$final_log</span>;</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The difference between the two links is simple to see:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$self</span>->{base} = <span class="ot">"</span><span class="dt">$self</span>-><span class="st">{home}/work/</span><span class="ot">"</span> . <span class="fu">time</span> . <span class="ot">"</span><span class="st">.</span><span class="wa">$$</span><span class="ot">"</span>;</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># versus</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="dt">$self</span>->{<span class="fu">log</span>} = <span class="fu">File::Spec</span>->catfile(<span class="dt">$self</span>->{base},<span class="ot">"</span><span class="st">build.log</span><span class="ot">"</span>)</span></code></pre></div>
<p>What we see here is the fact that the developer embedded their then current knowledge of the world (“Perl on Windows does not support symlinks”) as a permanent assumption in the code. Manipulating all paths using <code>File::Spec</code> would not have been significantly more complicated at the time, but would have avoided the maintenance burden when the state of the world changed (and change it will).</p>
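<p>To make the difference concrete, here is a small sketch that builds the same directory path both ways. It uses <code>File::Spec::Win32</code> explicitly so it produces the same output on any platform, and the values of <code>$home</code> and <code>$stamp</code> are hypothetical stand-ins for <code>$self->{home}</code> and the <code>time . ".$$"</code> suffix.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical values standing in for $self->{home}, time, and $$.
my $home  = 'C:\Users\user\.cpanm';
my $stamp = '1631543940.4012';

# String interpolation appends Unix-style segments to a Windows-style parent.
my $mixed = "$home/work/$stamp";

# catdir builds the same path with one consistent separator.
my $clean = File::Spec::Win32->catdir($home, 'work', $stamp);

print "$mixed\n";   # C:\Users\user\.cpanm/work/1631543940.4012
print "$clean\n";   # C:\Users\user\.cpanm\work\1631543940.4012
</code></pre></div>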
<p>This is not the most interesting thing. This bug is a mere inconvenience, one that doesn’t matter that much.</p>
<p>What caused <code>cpanm App::pwhich</code> to fail?</p>
<p>It turns out running the tests for <code>App::pwhich</code> requires <code>Test2::V0</code>. And, that depends on <a href="https://metacpan.org/pod/Module::Pluggable">Module::Pluggable</a>. The tests for <code>Module::Pluggable</code> failed. <code>Module::Pluggable</code> is in the Perl core.</p>
<p>Why did it fail?</p>
<pre class="text"><code># Failed test at t\02alsoworks.t line 13.
# Failed test 'is deeply'
# at t\02alsoworks.t line 17.
# Structures begin differing at:
# $got->[0] = Does not exist
# $expected->[0] = 'MyOtherTest::Plugin::Bar'
# etc etc etc</code></pre>
<p>A tedious study of the module’s code and tests did not reveal any immediate hints. Then I tried single stepping, but that proved to be rather boring. So I resorted to <code>printf</code> debugging. … OMFG! Look at that path!</p>
<p><code>C:/Users/user/.cpanm/work/$time.1592/Module-Pluggable-5.2/C:\Users\user\.cpanm\work\$time.1592\Module-Pluggable-5.2\t\lib\MyOtherTest\Plugin</code></p>
<p>That’s when I went trawling for anything symlink related in <code>Module::Pluggable</code>. It <a href="https://github.com/simonwistow/Module-Pluggable/blob/master/lib/Module/Pluggable/Object.pm#L74">wasn’t hard to find</a>:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$self</span>->{<span class="ot">'</span><span class="ss">follow_symlinks</span><span class="ot">'</span>} = <span class="dv">1</span> <span class="kw">unless</span> <span class="fu">exists</span> <span class="dt">$self</span>->{<span class="ot">'</span><span class="ss">follow_symlinks</span><span class="ot">'</span>};</span></code></pre></div>
<p>So, if the thing that is instantiating the <code>Module::Pluggable</code> does not specify a value, <code>Module::Pluggable</code> defaults to setting <code>follow_symlinks</code>, which it then passes as the <code>follow</code> option when invoking <code>find</code>:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">File::Find</span>::<span class="fu">find</span>( { no_chdir => <span class="dv">1</span>,</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> follow => <span class="dt">$self</span>->{<span class="ot">'</span><span class="ss">follow_symlinks</span><span class="ot">'</span>},</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> wanted => <span class="kw">sub </span>{</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="co"># Inlined from File::Find::Rule C< name => '*.pm' ></span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="kw">unless</span> <span class="dt">$File</span>::<span class="dt">Find</span>::<span class="dt">name</span> =~ <span class="ot">/</span><span class="dt">$file_regex</span><span class="ot">/</span>;</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> (<span class="kw">my</span> <span class="dt">$path</span> = <span class="dt">$File</span>::<span class="dt">Find</span>::<span class="dt">name</span>) =~ <span class="ot">s#</span><span class="ch">^</span><span class="ot">\\./##</span>;</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">push</span> <span class="dt">@files</span>, <span class="dt">$path</span>;</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> }, <span class="dt">$search_path</span> );</span></code></pre></div>
<p>There is nothing fundamentally wrong with that. After all, <code>find</code> should know what to do.</p>
<p>Except … <code>File::Find</code> is as legacy as it gets with a whole bunch of assumptions from decades ago permeating every line. In the olden days, passing <code>follow => 1</code> to <code>find</code> did not even matter: It categorically did <em>not</em> follow symlinks on Windows. What changed?</p>
<p>This commit titled <a href="https://github.com/Perl/perl5/commit/0d00729c03a1f68e1b51e986d1ce9000b0e3d301">File::Find support Win32 symlinks</a> where the code change consists of replacing:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$full_check</span> = <span class="dt">$Is_Win32</span> ? <span class="dv">0</span> : <span class="dt">$wanted</span>->{follow};</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="dt">$follow</span> = <span class="dt">$Is_Win32</span> ? <span class="dv">0</span> :</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">$full_check</span> || <span class="dt">$wanted</span>->{follow_fast};</span></code></pre></div>
<p>with</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$full_check</span> = <span class="dt">$wanted</span>->{follow};</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="dt">$follow</span> = <span class="dt">$full_check</span> || <span class="dt">$wanted</span>->{follow_fast};</span></code></pre></div>
<p>and expecting everything will just work out. I can only call this “implementation by wishful thinking.”</p>
<p>The only substantial code change is in <a href="https://github.com/Perl/perl5/commit/0d00729c03a1f68e1b51e986d1ce9000b0e3d301#diff-7533486a4554444c46583e4a8909408439de617a2d14b8ab9c71cbe9931a8f2f">taint.t</a> which consists of adding:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">BEGIN</span> {</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">require</span> <span class="fu">File::Spec</span>;</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="wa">$ENV</span>{PERL_CORE}) {</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="co"># May be doing dynamic loading while @INC is all relative</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="wa">@INC</span> = <span class="fu">map</span> { <span class="wa">$_</span> = <span class="fu">File::Spec</span>->rel2abs(<span class="wa">$_</span>); <span class="ot">/</span><span class="ch">(</span><span class="ot">.</span><span class="ch">*)</span><span class="ot">/</span>; <span class="wa">$1</span> } <span class="wa">@INC</span>;</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="wa">$^O</span> <span class="ot">eq</span> <span class="ot">'</span><span class="ss">MSWin32</span><span class="ot">'</span> || <span class="wa">$^O</span> <span class="ot">eq</span> <span class="ot">'</span><span class="ss">cygwin</span><span class="ot">'</span> || <span class="wa">$^O</span> <span class="ot">eq</span> <span class="ot">'</span><span class="ss">VMS</span><span class="ot">'</span>) {</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> <span class="co"># This is a hack - at present File::Find does not produce native names</span></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="co"># on Win32 or VMS, so force File::Spec to use Unix names.</span></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> <span class="co"># must be set *before* importing File::Find</span></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">require</span> <span class="fu">File::Spec</span>::<span class="fu">Unix</span>;</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">@File</span>::<span class="dt">Spec</span>::<span class="dt">ISA</span> = <span class="ot">'</span><span class="ss">File::Spec::Unix</span><span class="ot">'</span>;</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">require</span> <span class="fu">File::Find</span>;</span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">import</span> <span class="fu">File::Find</span>;</span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>So, tests fail with the new change in place and the instinct is to modify the tests to pass (treat this as a false positive) instead of finding the problem in the code under test and fixing the code. In case it is not obvious, this code is telling <code>File::Spec</code> to assume it is running on a Unix flavor disregarding the actual platform.</p>
<p>Allowing <code>follow</code> to be set causes code paths that used to be skipped on Windows to now be executed. That’s how we end up with a path like <code>C:/Users/user/.cpanm/work/$time.1592/Module-Pluggable-5.2/C:\Users\user\.cpanm\work\$time.1592\Module-Pluggable-5.2\t\lib\MyOtherTest\Plugin</code>.</p>
<p><code>File::Find</code>, despite the fact that <a href="https://github.com/Perl/perl5/blob/2f1eff3d4e0c24e2ac28c8bcaa8eb740b8e22c48/ext/File-Find/lib/File/Find.pm#L18">it loads <code>File::Spec</code></a>, relies on hard-coded <code>/</code> characters. That’s how we end up executing <a href="https://github.com/Perl/perl5/blob/2f1eff3d4e0c24e2ac28c8bcaa8eb740b8e22c48/ext/File-Find/lib/File/Find.pm#L206">line 206</a>:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$abs_dir</span> = contract_name(<span class="ot">"</span><span class="dt">$cwd</span><span class="st">/</span><span class="ot">"</span>,<span class="dt">$top_item</span>);</span></code></pre></div>
<p>and since <code>contract_name</code> looks like this:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">contract_name</span> {</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> (<span class="dt">$cdir</span>,<span class="dt">$fn</span>) = <span class="dt">@_</span>;</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)) <span class="kw">if</span> <span class="dt">$fn</span> <span class="ot">eq</span> <span class="dt">$File</span>::<span class="dt">Find</span>::<span class="dt">current_dir</span>;</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">$cdir</span> = <span class="fu">substr</span>(<span class="dt">$cdir</span>,<span class="dv">0</span>,<span class="fu">rindex</span>(<span class="dt">$cdir</span>,<span class="ot">'</span><span class="ss">/</span><span class="ot">'</span>)+<span class="dv">1</span>);</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">$fn</span> =~ <span class="ot">s|</span><span class="ch">^</span><span class="ot">\./||</span>;</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$abs_name</span>= <span class="dt">$cdir</span> . <span class="dt">$fn</span>;</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="fu">substr</span>(<span class="dt">$fn</span>,<span class="dv">0</span>,<span class="dv">3</span>) <span class="ot">eq</span> <span class="ot">'</span><span class="ss">../</span><span class="ot">'</span>) {</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="kw">while</span> <span class="dt">$abs_name</span> =~ <span class="ot">s!/</span><span class="ch">[^</span><span class="bn">/</span><span class="ch">]*</span><span class="ot">/\.\./</span><span class="ch">+</span><span class="ot">!</span><span class="st">/</span><span class="ot">!</span>;</span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="dt">$abs_name</span>;</span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>we end up with the weird path. Of course, that path does not exist and <code>Module::Pluggable</code> fails to build & install.</p>
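<p>To see the concatenation in isolation, here is a minimal sketch that copies <code>contract_name</code> from the listing above and feeds it a forward-slash working directory and a backslash-separated absolute path. The shortened paths are hypothetical stand-ins for the real ones, <code>$File::Find::current_dir</code> is replaced with a literal <code>'.'</code> so the sketch runs without loading <code>File::Find</code>, and it assumes <code>$top_item</code> arrives as an absolute Windows path.</p>

```perl
use strict;
use warnings;

# Copy of contract_name from the listing above. $File::Find::current_dir is
# replaced with a literal '.' (an assumption made so this runs standalone).
sub contract_name {
    my ($cdir, $fn) = @_;

    return substr($cdir, 0, rindex($cdir, '/')) if $fn eq '.';

    # Keep everything up to and including the last forward slash.
    $cdir = substr($cdir, 0, rindex($cdir, '/') + 1);

    $fn =~ s|^\./||;

    # No check for $fn already being absolute: it is simply appended.
    my $abs_name = $cdir . $fn;

    if (substr($fn, 0, 3) eq '../') {
        1 while $abs_name =~ s!/[^/]*/\.\./+!/!;
    }

    return $abs_name;
}

# Hypothetical, shortened stand-ins for the paths in the post.
my $cwd      = 'C:/work/Module-Pluggable-5.2';
my $top_item = 'C:\\work\\Module-Pluggable-5.2\\t\\lib';

print contract_name("$cwd/", $top_item), "\n";
```

<p>Because the function only ever looks for <code>/</code>, it has no way to recognize that the second argument is already an absolute path, so the two get glued together.</p>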
<p>Note:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v5.<span class="dv">34</span>;</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">strict</span>;</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">File::Spec</span> ();</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$up</span> = <span class="ot">'</span><span class="ss">C:/Users/user/.cpanm/work/$time.1592/Module-Pluggable-5.2/</span><span class="ot">'</span>;</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$wp</span> = <span class="ot">'</span><span class="ss">C:</span><span class="ch">\\</span><span class="ss">Users</span><span class="ch">\\</span><span class="ss">user</span><span class="ch">\\</span><span class="ss">.cpanm</span><span class="ch">\\</span><span class="ss">work</span><span class="ch">\\</span><span class="ss">$time.1592</span><span class="ch">\\</span><span class="ss">Module-Pluggable-5.2</span><span class="ch">\\</span><span class="ss">t</span><span class="ch">\\</span><span class="ss">lib</span><span class="ch">\\</span><span class="ss">MyOtherTest</span><span class="ch">\\</span><span class="ss">Plugin</span><span class="ot">'</span>;</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a><span class="fu">say</span> <span class="fu">File::Spec</span>->catfile(<span class="dt">$up</span>, <span class="fu">File::Spec</span>->abs2rel(<span class="dt">$wp</span>,<span class="dt">$up</span>));</span></code></pre></div>
<p>Running this code correctly produces:</p>
<pre class="text"><code>C:\Users\user\.cpanm\work\$time.1592\Module-Pluggable-5.2\t\lib\MyOtherTest\Plugin</code></pre>
<p>It is unclear to me why <code>File::Find</code> after decades avoids another core library <code>File::Spec</code>.</p>
<h2 id="what-are-the-next-steps">What are the next steps?</h2>
<ul>
<li><p>It is reasonable for <code>Module::Pluggable</code> to default to supporting symlinks unless overridden except that it seems to have been written with the assumption that the calling module knows what works for the user. It cannot. So, <code>Module::Pluggable</code> should allow the functionality to be overridden using an environment variable. Something like <code>PERL5_MODULE_PLUGGABLE_FOLLOW_SYMLINKS</code>. After all, this module is used to locate other code to load and organizations may justifiably want to disallow the functionality instead of having to audit every library/module/script that uses <code>Module::Pluggable</code>.</p></li>
<li><p>It is reasonable for <code>Test2::V0</code> to <a href="https://github.com/Test-More/Test2-Suite/blob/baeb89caad240a427d84cc8be4f65ba4dafe6ed9/lib/Test2/Tools/Tester.pm#L8">use features provided through <code>Module::Pluggable</code></a> and probably appropriate for it to defer to <code>Module::Pluggable</code> for low level settings. Once again, however, since test suites might be running in interesting environments, it might be reasonable to provide a mechanism for an organization to categorically override the ability to follow symlinks for loading code to be executed during the test run. I would recommend something like <code>PERL5_TEST2_SUITE_FOLLOW_SYMLINKS</code>.</p></li>
<li><p>It is not reasonable for a module that is as fundamental as <code>File::Find</code> to continue to pretend that the only computing environments in the world are just variations on a single flavor of Unix.</p></li>
<li><p>Above all, injecting fundamental changes in behavior in core modules with nary a new test in sight is not consistent with stability and backwards compatibility as priorities.</p></li>
</ul>
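<p>As a rough illustration of the environment variable idea from the list above, here is a minimal sketch. The helper <code>follow_symlinks</code> is hypothetical and not part of <code>Module::Pluggable</code>; the variable name is the one suggested in the list. It assumes that an explicit setting in the environment should win over whatever the calling module asked for.</p>

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any existing module: decide whether
# symlinks may be followed. A non-empty value in the environment overrides
# the caller's request; otherwise the caller's request stands.
sub follow_symlinks {
    my ($requested, $env) = @_;
    $env //= \%ENV;    # allow a fake environment to be passed in for testing

    my $override = $env->{PERL5_MODULE_PLUGGABLE_FOLLOW_SYMLINKS};

    return ($requested ? 1 : 0)
        unless defined $override && length $override;

    return $override ? 1 : 0;
}

# The caller asked to follow symlinks, but the organization said no.
print follow_symlinks(1, { PERL5_MODULE_PLUGGABLE_FOLLOW_SYMLINKS => 0 }), "\n";
```

<p>Passing the environment in as an argument keeps the decision testable without touching the real <code>%ENV</code>.</p>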
<p>PS: I am no longer on Reddit, but you can <a href="https://redd.it/pp89k5">discuss this post on r/perl</a></p>
</div>
</article>
Sinan UnurWho's testing the tests? The case of an interesting false negativetag:www.nu42.com,2021-09-14:/2021/09/who-is-testing-tests-false-negatives.html2021-09-14T15:45:00Z
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Who's testing the tests? The case of an interesting false negative</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-09-14T15:45:00Z" class="dt-published">September 14, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>I still follow Perl releases and <a href="/2014/11/64-bit-perl-5201-with-visual-studio.html">build them from source using the most recent version of Microsoft Visual C</a>. Each new version of Perl brings in some improvements, interesting features, and some new curiosities to deal with.</p>
<p>After building <a href="https://metacpan.org/dist/perl/changes">5.34</a>, I tried to install <a href="https://metacpan.org/pod/JSON::MaybeXS">JSON::MaybeXS</a> which is a layer of indirection for picking among various Perl libraries for handling JSON files. I believe the community’s preferred library at this time is <a href="https://metacpan.org/pod/Cpanel::JSON::XS">Cpanel::JSON::XS</a>. I use <a href="https://metacpan.org/pod/App::cpanminus">cpanm</a> to test & install a library and all its dependencies in one go.</p>
<p>So, I typed <code>cpanm JSON::MaybeXS</code> and went to get coffee. When I got back, I reviewed the log file and noticed something curious:</p>
<pre class="text"><code>Number found where operator expected at (eval 7) line 1, near "require threads::shared 1.21"
(Do you need to predeclare require?)
t\125_shared_boolean.t ..... skipped: no shared_clone)</code></pre>
<p>Hmmm?! “<em>Do you need to predeclare <code>require</code>?</em>” I shouldn’t. It is a <a href="https://perldoc.perl.org/perlfunc">Perl builtin</a>.</p>
<p>What is that string which is being <code>eval</code>ed?</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">eval</span> <span class="ot">"</span><span class="st">require threads::shared 1.21</span><span class="ot">"</span></span></code></pre></div>
<p>Did Perl get the ability to <a href="https://perldoc.perl.org/functions/require"><code>require</code></a> modules with a version constraint and did I miss it?</p>
<p>Let’s look at the <a href="https://github.com/rurban/Cpanel-JSON-XS/blob/8f502994f7a14ada53b13fe548c8a8ccec498ed9/t/125_shared_boolean.t#L6">test code</a></p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">BEGIN</span> {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> plan skip_all => <span class="ot">'</span><span class="ss">no threads</span><span class="ot">'</span> <span class="kw">if</span> !<span class="dt">$Config</span>{usethreads};</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">eval</span> <span class="ot">"</span><span class="st">require threads::shared 1.21;</span><span class="ot">"</span>;</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> plan skip_all => <span class="ot">'</span><span class="ss">no shared_clone</span><span class="ot">'</span> <span class="kw">if</span> <span class="wa">$@</span>;</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> plan tests => <span class="dv">8</span>;</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Nope. It seems like we have a case of “<em>implementation by wishful thinking</em>”. We know that we can specify a version constraint when we <a href="https://perldoc.perl.org/functions/use">use</a> a module:</p>
<pre class="text"><code>C:\> perl -e "use threads 3.14"
threads version 3.14 required--this is only version 2.26 at -e line 1.
BEGIN failed--compilation aborted at -e line 1.</code></pre>
<p>That is equivalent to <code>BEGIN { require threads; threads->VERSION(3.14) }</code>.</p>
<p><code>require</code> does <em>not</em> support <code>require MODULE VERSION</code>. The only thing one can use is <code>require VERSION</code> which is a runtime check that the <code>perl</code> running your script is at least as recent as <code>VERSION</code>.</p>
<p>Anyone could have made this mistake. I sometimes find myself wishing Perl (and other languages) could load multiple distinct versions of a library at the same time so that library <code>X</code> which requires library <code>Z</code> to be older than version 5, and library <code>Y</code> which requires <code>Z</code> to be newer than 7 can work within the same program. Then I think about all the implications for ossification of outdated crap and give up wishing that and deal with reality.</p>
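<p>What does work at run time is the two-step form shown earlier: load the module, then call its <code>VERSION</code> method. A minimal sketch follows; the helper name <code>module_at_least</code> is hypothetical, and <code>List::Util</code> is used only because it ships with <code>perl</code>, so the example runs regardless of thread support.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module can be loaded and reports a version
# of at least $version, false otherwise.
sub module_at_least {
    my ($module, $version) = @_;

    # require with a string needs a file name, so turn Foo::Bar into Foo/Bar.pm
    (my $file = "$module.pm") =~ s{::}{/}g;

    return eval {
        require $file;
        $module->VERSION($version);    # dies if the loaded version is too old
        1;
    } ? 1 : 0;
}

print module_at_least('List::Util', '1.0')
    ? "List::Util is recent enough\n"
    : "List::Util is missing or too old\n";
```

<p>Because the block form of <code>eval</code> is compiled along with the rest of the file, a typo inside it is reported when the file is compiled rather than silently at run time.</p>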
<p>Knowing that <code>use</code> supports the syntax <code>use MODULE VERSION</code>, knowing that <code>use</code> is equivalent to <code>require</code> and <code>import</code> in a <code>BEGIN</code> block, it is natural to have a short-circuit in the brain which leads to one assuming <code>require MODULE VERSION</code> also works and does the thing we want.</p>
<p>I am more interested in how such a mistake makes it into the released version as that has implications for the kinds of SDLC, review process, etc I prefer. As I discussed yesterday, <em>false negatives</em>, tests that pass even though the code is bad, are a burden and timebombs waiting to blow up at the worst possible time. They can remain hidden for a long time and they can stop your momentum in its tracks when some unrelated change now causes these tests to start failing. False negatives tend to be overlooked because, by definition, they do not prevent delivery at the time they are injected into the codebase. Even though the problems foregone cannot be explicitly accounted for, it is worth taking small extra steps to make sure they are not injected in the codebase.</p>
<p>So, let’s look at the <a href="https://github.com/rurban/Cpanel-JSON-XS/commit/1f8971c0f2ea54d22b1d26fee4c8b39e1f79a72f">commit</a> which introduced this statement. The <a href="https://github.com/rurban/Cpanel-JSON-XS/issues/170">associated issue</a> was discussed among three developers who are experienced in Perl. I am not sure the actual commit was reviewed before being merged as there does not seem to be an associated PR.</p>
<p>One might ask why should we care. After all, the tests passed, the module got installed, my script ran.</p>
<p>The problem is that the tests in <code>t/125_shared_boolean.t</code> are presumably there for a reason. Failing to notice that the <code>eval</code>ed code was throwing not because the requisite version of <code>threads::shared</code> was not installed, but because the argument to <code>eval</code> was incorrect Perl means that if the tested-for functionality stopped working, we’d never notice. If then we run into problems in production that are hard to track, we would be making incorrect assumptions about tests passing and this blindness might even lengthen our downtime.</p>
<p>So, is there a simple rule of thumb that would have prevented this problem?</p>
<p>Sure: Avoid the string form of <code>eval</code> except in a few specific circumstances.</p>
<p>If, instead of</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">eval</span> <span class="ot">"</span><span class="st">require threads::shared 1.21;</span><span class="ot">"</span>;</span></code></pre></div>
<p>we had used</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">eval</span> { <span class="fu">require</span> <span class="fu">threads::shared</span> <span class="fl">1.21</span> };</span></code></pre></div>
<p>the <code>nmake test</code> would have failed:</p>
<pre class="text"><code>Test Summary Report
-------------------
t\125_shared_boolean.t (Wstat: 512 Tests: 0 Failed: 0)
Non-zero exit status: 2
Parse errors: No plan found in TAP output
Files=60, Tests=2844, 14 wallclock secs ( 0.39 usr + 0.16 sys = 0.55 CPU)
Result: FAIL</code></pre>
<p>The message in the log remains similar:</p>
<pre class="text"><code>t\125_shared_boolean.t ..... Number found where operator expected at t\125_shared_boolean.t line 9, near "require threads 1.21"
(Do you need to predeclare require?)
syntax error at t\125_shared_boolean.t line 9, near "require threads 1.21"
BEGIN not safe after errors--compilation aborted at t\125_shared_boolean.t line 13.
t\125_shared_boolean.t ..... Dubious, test returned 2 (wstat 512, 0x200)</code></pre>
<p>but now the developer cannot fail to notice it because testing fails instead of skipping tests.</p>
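<p>The silent half of that comparison can be reproduced in a few lines. This is a sketch only: it substitutes the core module <code>List::Util</code> for <code>threads::shared</code> so that it runs on any <code>perl</code>, and it relies on <code>require MODULE VERSION</code> being rejected by the parser as in the logs above.</p>

```perl
use strict;
use warnings;

# String eval: the invalid statement is compiled only at run time, so the
# syntax error ends up in $@ and the program carries on regardless.
my $loaded = eval "require List::Util 1.21; 1";
my $error  = $@;

print "string eval: ", (defined $loaded ? "loaded" : "failed"), "\n";
print "reason: $error";

# A test that only asks whether $@ is set cannot tell this apart from
# the module being absent or too old.
```

<p>The module is installed, yet the check reports failure, and the only hint at the real cause is the text in <code>$@</code>.</p>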
<p>Note that the problem was not surfaced <a href="https://travis-ci.org/github/rurban/Cpanel-JSON-XS/jobs/724266224#L483">on TravisCI</a> because none of the <code>perl</code>s used supported <code>threads</code> in the first place and <a href="https://ci.appveyor.com/project/rurban/cpanel-json-xs/builds/35037308/job/32fmeyhqhuocpnyk#L983">on AppVeyor</a> because tests were silently skipped instead of failing.</p>
<p>I submitted a <a href="https://github.com/rurban/Cpanel-JSON-XS/pull/185/files">proposed fix</a>.</p>
<p>PS: I am no longer on Reddit, but you can <a href="https://redd.it/ponulf">discuss this post on r/perl</a></p>
</div>
</article>
Sinan UnurSome principles of unit testingtag:www.nu42.com,2021-09-13:/2021/09/principles-of-unit-testing.html2021-09-13T14:35:00Z
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Some principles of unit testing</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-09-13T14:35:00Z" class="dt-published">September 13, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>Much pontification exists when it comes to unit testing. Developers get introduced to the idea of <abbr title="Test Driven Development">TDD</abbr> with simple code such as testing a method that adds two numbers etc so that they can learn the mechanics, but such motivators obscure away the very real tradeoffs that arise from the fact that tests are also code.</p>
<p><a href="/2015/05/who-is-testing-the-tests.html">No one writes code that tests the tests</a>. When faced with complicated, hard to test code and behavior they do not understand fully, developers tend to fall into the <a href="/2017/02/deception-in-tests-harmful.html">self-deception trap</a> so as to keep tests passing instead of investing time in refactoring. Sometimes, tests are written but <a href="/2015/11/tests-never-ran.html">are never exercised</a> due to unforeseen reasons and no one notices the fact because everyone assumes a green check next to a commit means things are all right! Sometimes, tests do not actually test code, but <a href="/2016/12/cpp-boost-median-test.html">whether computers can do arithmetic</a>. And, sometimes, the reason your test suite takes a long time to run is because someone thought it was a good idea to run <a href="/2015/08/fix-2950-test-failures.html">thousands of tests that will either all fail or all pass</a>. Occasionally, your deployment pipeline grinds to a halt because the tests are <a href="/2018/03/dont-complicate-things.html">overly-complicated</a>.</p>
<p>No one writes perfect code. Software development is change management: As external conditions change, code needs to continue to work and that’s why we benefit from having unit tests: First, they help codify our understanding of how the code should behave. Second, they help us feel confident that our changes will not break behavior on which we depend.</p>
<p>So, how do we avoid chasing our tails in a neverending cycle of testing the code that tests the tests that test the code, etc.?</p>
<p>There is no perfect solution, but I will share a few principles that have helped. I have not seen the motivations for TDD explained this way. If TDD is presented as a rigid set of rules instead of helping developers understand the motivations, you end up with tests that are hard to maintain and are flaky.</p>
<h3 id="understand-false-positives-and-false-negatives">Understand false positives and false negatives</h3>
<p>When tests are run, there are <em>four</em> (not two) possible outcomes:</p>
<div class="thumb">
<table style="width:450px">
<thead>
<tr>
<td width="33.33333%"></td>
<th id="tests-pass" style="width:33.33333%;padding:0.5em">Tests pass</th>
<th id="tests-fail" style="width:33.33333%;padding:0.5em">Tests fail</th>
</tr>
</thead>
<tbody>
<tr>
<th id="good-code" style="padding:0.5em">Code is good</th>
<td headers="good-code tests-pass">True Negative<sup>(1)</sup></td>
<td headers="good-code tests-fail">False Positive<sup>(2)</sup></td>
</tr>
<tr>
<th id="bad-code" style="padding:0.5em">Code is bad</th>
<td headers="bad-code tests-pass">False Negative<sup>(3)</sup></td>
<td headers="bad-code tests-fail">True Positive<sup>(4)</sup></td>
</tr>
</tbody>
</table>
</div>
<p>Cases (1) and (4) correspond to true negatives and true positives. If your code and test suite never exhibit any other behavior, congratulations.</p>
<p>The false positive scenario, case (2), causes wasted resources: When tests fail, you have to investigate the reasons. Deployment pipelines stall. Developers are people, and people in general do not want to be put in the position of having ignored warnings. Different teams deal with this in different ways. A rather counterproductive strategy I’ve seen is to say “we expect 10% of tests to fail” or some such. Why include such “tests” in the first place? Bonus points if the subset of tests that fail changes with every deployment.</p>
<p>The false negative scenario, case (3), is no less insidious. When bad code is deployed because tests are written to pass, problems manifest themselves in production in the form of errors that are hard to pin down, unexpected downtime, or worse.</p>
<p>This is where TDD is really helpful. By writing tests before you write the code that implements functionality, you know that the test failure you get there is a true positive: The test is failing because there is no code to provide the required behavior. When you then write some code and the test passes, you know that tests are passing because of the code you wrote and not the other way around.</p>
<p>This doesn’t mean the code is not buggy. However, it does get you started on the path of writing a set of tests that fully specify the expected behavior of a piece of code. In most cases, we start writing code with an imperfect understanding of the world with which the code will be expected to interact. In fact, even if we knew everything perfectly on Monday, by Friday, an external change might invalidate one of our assumptions and we might need to specify a new constraint which must be satisfied by the code. Being able to have the contract build upon strong foundations of known valid statements improves the maintainability of the code and reduces the amount of information developers must keep in their heads when making changes.</p>
<h3 id="write-simple-tests">Write simple tests</h3>
<p><a href="https://www.cs.princeton.edu/~bwk/">Brian Kernighan</a> said:</p>
<blockquote>
<p>Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?</p>
</blockquote>
<p>A similar principle applies to tests. Since tests codify the expected behavior of a piece of code, it is best to be able to state that expected behavior as clearly and directly as possible. Otherwise, confusion ensues. Developers cannot see how changes in code relate to outcomes of tests. If you find yourself writing a couple of hundred lines to create the appropriate pre-conditions for a test, realize that that time can be much better spent refactoring the code under test so testing it does not require so much set up.</p>
<p>It is far preferable to have clear, concise, easily reviewable tests that offer partial coverage than to have 100% coverage which requires baroque code to have them run. That code will not have tests. It will be hard to assure yourself that tests are passing for the right reasons. Of course, do try to increase coverage over time, but sometimes you can’t get there immediately. All team members should be aware of where coverage is lacking and try to increase it, but don’t accept the Faustian bargain of checking in unmaintainable spaghetti mocking code for increased coverage stats.</p>
<h3 id="unit-tests-are-not-uptime-indicators-for-remote-services">Unit tests are not uptime indicators for remote services</h3>
<p>Unit tests should not reach outside of the environment (machine, container, jail) in which they are running. This includes network access. If you want to ascertain that your app can correctly handle a response from another API, your tests do not need to reach out to the service. Ideally, you separate the thing that makes the request and the thing that deals with the response and you can invoke the handler with just a response object you have constructed. Dynamic languages such as Perl, Python, Ruby, JavaScript make this rather easy. Otherwise, I prefer relying on interfaces rather than class hierarchies.</p>
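<p>A minimal sketch of that separation, under some loud assumptions: the response is modeled as a plain hash reference with <code>status</code> and <code>body</code> keys, the handler name <code>handle_response</code> and the <code>name</code> field are hypothetical, and the body is assumed to be JSON. Nothing here touches the network.</p>

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module, no installation needed

# Hypothetical handler: it only ever sees a response that has already been
# received, modeled here as { status => ..., body => ... }.
sub handle_response {
    my ($res) = @_;

    return { ok => 0, error => "unexpected status $res->{status}" }
        unless $res->{status} == 200;

    # decode_json dies on malformed input, so trap that and report it.
    my $data = eval { decode_json($res->{body}) };
    return { ok => 0, error => 'malformed body' }
        unless ref($data) eq 'HASH';

    return { ok => 1, name => $data->{name} };
}

# In a test, construct the response by hand instead of making a request.
my $result = handle_response({ status => 200, body => '{"name":"doodad"}' });
print "ok=$result->{ok} name=$result->{name}\n";
```

<p>Simulated downtime then becomes just another hand-built response, for example one with a status of <code>503</code>.</p>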
<p>Unit tests that try to hit other services over the internet reduce the security of your testing and deployment infrastructure. On a purely practical level, checking whether a remote service on which your app depends is up is not the job of the unit tests. Downtime is a fact of life and <em>apps</em> that need to work with services have to be able to handle that. Even code testing this functionality should not be making requests to the real service, however, as you cannot expect to be able to simulate downtime conditions on actual services.</p>
<p>Further, does your organization really want to tell outsiders every time you are trying out some new code?</p>
<h3 id="always-test-whether-the-code-compiles-the-module-loads-and-the-expected-methods-exist">Always test whether the code compiles, the module loads, and the expected methods exist</h3>
<p>This is much more important in the case of dynamic languages where simple typos may remain undiscovered until the right call chain happens. However, even in C++, it is useful to ensure that each test run starts with a clean slate so as to catch stupid simple problems early on. In scripting languages, where the chances of catching <code>doodad.frobncate</code> when you had intended to invoke <code>frobnicate</code> on the <code>doodad</code> are slim without actually running the code, this becomes more important.</p>
<p>Also, even, or especially, when working with languages without a concept of interfaces, codify the expected interfaces in tests. With Ruby, you can use <a href="https://relishapp.com/rspec/rspec-expectations/docs/built-in-matchers/respond-to-matcher"><code>respond_to</code></a>. With Perl, you can use <a href="https://metacpan.org/pod/Test2::Tools::Class#can_ok($thing,-@methods)"><code>can_ok</code></a>. In Python, depending on the situation, a combination of <a href="https://docs.python.org/3/library/functions.html#getattr"><code>getattr</code></a> with handling <a href="https://docs.python.org/3/library/exceptions.html#AttributeError"><code>AttributeError</code></a> should help. When you are faced with a pages long backtrace in production that ends with the equivalent of “<em>method not found</em>”, it helps to know that the methods that are supposed to exist do exist and therefore the problem must be near the invocation not the definition.</p>
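<p>For the Python case, a rough sketch of such a check might look like the following. The helper <code>missing_methods</code> and the class <code>Doodad</code> are hypothetical names invented for this example; only <code>getattr</code>, <code>AttributeError</code>, and <code>callable</code> are actual Python built-ins.</p>

```python
def missing_methods(obj, names):
    """Return the names that are absent from `obj` or are not callable."""
    missing = []
    for name in names:
        try:
            attr = getattr(obj, name)
        except AttributeError:
            missing.append(name)
            continue
        if not callable(attr):
            missing.append(name)
    return missing


class Doodad:
    """Hypothetical class under test."""

    def frobnicate(self):
        return "frobnicated"

    def reset(self):
        return None


expected = ["frobnicate", "reset"]
typoed = ["frobncate", "reset"]  # note the misspelled first name

ok = missing_methods(Doodad(), expected)
bad = missing_methods(Doodad(), typoed)
```

<p>A test would then assert that the list returned for the expected interface is empty, so that a renamed or misspelled method fails the test run rather than a production request.</p>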
<p>This is not a panacea, but given that we all make “stupid” mistakes sometimes, it is useful to rule out whole classes of them at the outset.</p>
<h2 id="finally">Finally</h2>
<p>We exist in an imperfect world. It is fun to strive for perfection in our <a href="https://www.bryanbraun.com/checkboxland/">hobby projects</a>, but when solving business problems in a dynamic environment with preexisting codebases and processes, tradeoffs must be made. By sticking with a simple set of principles rather than rigid processes, we ought to be able to make better tradeoffs more often than not. Despite all the glossy advertising for various fashionable “methodologies”, that’s really the best one can hope for.</p>
<p>PS: I am no longer on Reddit, but you can <a href="https://redd.it/ponvaq">discuss this post on r/perl</a></p>
</div>
</article>
Sinan UnurUse your own WiFi connection test server in Windowstag:www.nu42.com,2021-09-13:/2021/09/use-your-own-wifi-connection-test.html2021-09-13T14:35:00Z
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Use your own WiFi connection test server in Windows</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-09-13T14:35:00Z" class="dt-published">September 13, 2021</time></h3>
</header>
</div>
<div class="article-content"><p><em><strong>This post describes changes to the <a href="https://docs.microsoft.com/en-us/troubleshoot/windows-server/performance/windows-registry-advanced-users">Windows registry</a>. Here is Microsoft’s warning about that:</strong></em></p>
<div style="background:#fadaba;font-size:80%;padding:1em">
<b title="You've been warned">⚠️ Warning</b>
<blockquote>
<p>Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall the operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.</p>
</blockquote>
</div>
<p>Please keep that in mind and do not try any of this unless you are prepared to solve whatever problems you might experience.</p>
<p>Sometimes, things work for the wrong reasons. I had been having problems with my T400’s WiFi connection for a long time. I recently decided to <a href="https://www.nu42.com/2021/07/upgrade-intel-7260-mpe-ax3000h.html">replace</a> the Intel 7260 AC <a title="Amazon affiliate link" href="https://amzn.to/3krxI2m">with a WiFi6 card I found on Amazon</a>. For the first week, I lived with the illusion that there must have been something wrong with either the old card or the no-longer-updated drivers for it, because my problems went away immediately.</p>
<p>Then, I realized that I had forgotten to change the DNS server settings for the new card! <em>Sigh</em>. Upon reverting the custom DNS settings, the flakiness returned. The card would connect, go through the motions, then fail to detect a connection in a loop. Eventually, Windows would turn it off. I would reboot. Sometimes though, it would latch on just fine and be fine for days. By this time, I had tried all sorts of weird ideas including messing with the OpenWRT firmware on the router to provide more diagnostics but I had nothing. The <code>$WORK</code> laptop on which I could not change any settings worked extremely reliably, while my ancient Frankenstein’s monster of a <a href="https://www.nu42.com/2016/12/another-one-bites-the-dust.html">T400</a> would struggle. It wasn’t like the thing had never worked: The <a href="https://www.nu42.com/2015/06/upgrade-laptop-wireless-intel-7260.html">7260</a> was able to give me download speeds that matched the cable internet provider’s advertised download speeds and I was happy with it for a long time.</p>
<p>The fact that the new card got flaky with my custom DNS settings gave me a hint that I should look away from network drivers, antenna connections, OpenWRT Linux kernel versions etc. I tried a bunch of permutations of keywords which landed me on <a href="https://superuser.com/a/277964/2077">this answer by Jeff Atwood</a> from 2011. A light bulb went off immediately.</p>
<p>I put the cart before the horse, searched for a cute domain name related to “WiFi” and “Check”, bought it, set up a simple vhost for <a href="https://httpd.apache.org">Apache</a>, and fiddled with the registry entries:</p>
<div class="thumb"><a href="windows-10-registry-nla-connect-test-params.png"><img src="windows-10-registry-nla-connect-test-params.png" width="400" alt="[ Screenshot of registry settings for customizing Windows NLA connection test ]"></a></div>
<p>Windows establishes whether a <a href="https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-vista/cc766017(v=ws.10)?redirectedfrom=MSDN#BKMK_How">network connection has internet connectivity</a> by making a DNS lookup followed by an <code>HTTP</code> request. One option is to disable this check. This becomes problematic if you expect to be able to deal with captive portals.</p>
<p>So, I went with the other option. I set the value of <code>ActiveWebProbeHost</code> to my cute new <code>.com</code> and the value of <code>ActiveWebProbePath</code> to <code>success.txt</code>. Then, I put a file called <code>success.txt</code> in the vhost’s <code>htdocs</code> directory with the exact contents of <code>ActiveWebProbeContent</code>, which I had set to some custom text. Note that the comparison is exact and Windows does not trim leading/trailing whitespace, so the file must not have a newline at the end. I used one of the very reasonably priced and rather well-provisioned personal <a title="SSDNodes affiliate link" href="https://www.ssdnodes.com/manage/aff.php?aff=139">SSDNodes VPS</a> I keep around for these kinds of odds and ends.</p>
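<p>Since a stray trailing newline is easy to introduce with an editor, here is a small Python sketch of writing the file as raw bytes and checking it the way an exact comparison would. The probe text <code>My Custom Connect Test</code> and the helper names are placeholders I made up; substitute whatever is stored in <code>ActiveWebProbeContent</code>, and point the directory at the vhost’s <code>htdocs</code> instead of a temporary directory.</p>

```python
import tempfile
from pathlib import Path

# Hypothetical probe text; use whatever is stored in ActiveWebProbeContent.
PROBE_CONTENT = "My Custom Connect Test"


def write_probe_file(directory, content=PROBE_CONTENT):
    """Write success.txt as raw bytes so no newline or BOM sneaks in."""
    path = Path(directory) / "success.txt"
    path.write_bytes(content.encode("ascii"))
    return path


def probe_matches(path, expected=PROBE_CONTENT):
    """Model an exact comparison: no trimming of leading/trailing whitespace."""
    return path.read_bytes() == expected.encode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    probe = write_probe_file(tmp)
    exact = probe_matches(probe)

    # What many editors produce: the same text followed by a newline.
    probe.write_bytes(PROBE_CONTENT.encode("ascii") + b"\n")
    with_newline = probe_matches(probe)
```

<p>Here <code>exact</code> is true and <code>with_newline</code> is false, which is the failure mode to watch for when creating the file by hand.</p>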
<p>And, that is it. Absolutely no connection problems since then. After trying it out for a week or so, I decided to roll this out to all the family computers I can.</p>
<p>As a minor side benefit, you shut down one avenue through which you are tracked on the intarwebs, but that is really minor because Windows and other software installed on your computer talk about you all the time.</p>
<p>Interestingly, Google has already indexed my server and the customized connection test string is already recorded in its cache.</p>
</div>
</article>
Sinan UnurUTF-8 everywhere and command line argument expansion on Windowstag:www.nu42.com,2021-07-22:/2021/07/windows-utf-8-everywhere-argv.html2021-07-22T23:05:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">UTF-8 everywhere and command line argument expansion on Windows</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-07-22T23:05:00+00:00" class="dt-published">July 22, 2021</time></h3>
</header>
</div>
<div class="article-content"><p><a href="https://utf8everywhere.org/">UTF-8 Everywhere</a> is a good idea.</p>
<p>In particular, see their <a href="https://utf8everywhere.org/#windows">advice on how to do text on Windows</a>. It is possible to follow their advice manually.</p>
<p>This morning, I thought of a utility I could write very easily in any scripting language, but decided I would implement it in <a href="https://en.cppreference.com/w/cpp/compiler_support#C.2B.2B17_core_language_features">modernish C++</a>. In writing the utility, I thought I should take advantage of the standalone version of <a href="https://github.com/boostorg/nowide">boost::nowide</a> so as to minimize the amount of code I’d need to write to make sure it could handle command line arguments including fancy characters in both Windows and *nixy environments.</p>
<p>One of the facilities this library provides is <a href="https://www.boost.org/doc/libs/master/libs/nowide/doc/html/classboost_1_1nowide_1_1args.html">nowide::args</a>. It “<em>temporarily replaces standard main() function arguments with their equal, but UTF-8 encoded values under Microsoft Windows for the lifetime of the instance.</em>”</p>
<blockquote>
<p>The class uses <a href="https://docs.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getcommandlinew"><code>GetCommandLineW()</code></a>, <a href="https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw"><code>CommandLineToArgvW()</code></a> and <a href="https://docs.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getenvironmentstringsw"><code>GetEnvironmentStringsW()</code></a> in order to obtain Unicode-encoded values. It does not relate to actual values of argc, argv and env under Windows.</p>
</blockquote>
<p>This is not wrong <em>per se</em>, but it interacts badly with another dimension of handling command line arguments on Windows: <code>cmd.exe</code> does not do glob expansion. Instead, if you want <code>prog *.txt</code> to give you <code>file1.txt</code>, <code>file2.txt</code>, etc in <code>argv</code>, you need to explicitly link with <a href="https://docs.microsoft.com/en-us/cpp/c-language/expanding-wildcard-arguments?view=msvc-160"><code>setargv.obj</code></a> or <a href="https://docs.microsoft.com/en-us/cpp/c-language/expanding-wildcard-arguments?view=msvc-160"><code>wsetargv.obj</code></a>. That way, the runtime sets up an expanded <code>argv</code> using either the OEM charset or the “Unicode” charset depending on whether the program has a <code>main</code> or <code>wmain</code>.</p>
<p>Since <code>boost::nowide::args</code> bypasses the actual <code>argv</code>, and instead reparses the “Unicode” version of the command line as originally given, it is oblivious to the now expanded arguments. Since there is no Win32 API function you can call to do the filename expansion on the result of <code>CommandLineToArgvW()</code> (at least, I could not find it), this means the Windows version of my utility will need to have a <code>wmain</code> instead of <code>main</code>.</p>
<p>I’ve written about <a href="/2017/02/unicode-windows-command-line.html">fixing this in MoarVM</a> a few years ago and <a href="https://github.com/MoarVM/MoarVM/pull/528/files">submitted a PR</a>. When I first read about <code>boost::nowide::args</code>, I thought it was going to help me avoid the need to engage in various contortions. Unfortunately, it seems like if you do want file name expansion in command line arguments, you cannot use <code>boost::nowide::args</code> (or its standalone equivalent).</p>
<p>It sure is not rocket surgery, but disappointing nevertheless.</p>
<p>I am going to include a few examples to illustrate the problems I mentioned here.</p>
<h2 id="no-filename-expansion-in-cmd">No filename expansion in <code>cmd</code></h2>
<p>Consider the following C program:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdio.h></span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main(<span class="dt">int</span> argc, <span class="dt">char</span> *argv[])</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">1</span>; i < argc; ++i)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> {</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> puts(argv[i]);</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">0</span>;</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Compile it using:</p>
<pre class="text"><code>C:\Temp> cl t.c</code></pre>
<p>and now run it in <code>cmd</code>:</p>
<pre class="text"><code>C:\Temp> t t.*
t.*</code></pre>
<p>Now, open a <a href="https://cygwin.com/">Cygwin</a> or <a href="https://git-scm.com/download/win">Git</a> Bash shell and try again without re-compiling:</p>
<pre class="text"><code>$ ./t t.*
t.c
t.c.swp
t.exe
t.obj</code></pre>
<h2 id="link-with-setargvobj-for-filename-expansion">Link with <code>setargv.obj</code> for filename expansion</h2>
<p>Now, let’s recompile:</p>
<pre class="text"><code>C:\Temp> cl t.c /link setargv.obj</code></pre>
<p>and try again in <code>cmd.exe</code>:</p>
<pre class="text"><code>C:\Temp> t t.*
t.c
t.c.swp
t.exe
t.obj</code></pre>
<h2 id="cant-handle-funny-characters">Can’t handle “funny” characters</h2>
<p>In <code>cmd</code>:</p>
<pre class="text"><code>C:\Temp> dir /b k*
kârlı.txt
C:\Temp> t k*
kΓrli.txt</code></pre>
<h2 id="no-file-name-expansion-with-nowideargs">No file name expansion with <code>nowide::args</code></h2>
<p>Let’s try this minimal program:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><nowide/args.hpp></span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><nowide/iostream.hpp></span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>main(<span class="dt">int</span> argc, <span class="dt">char</span>* argv[])</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> nowide::args a(argc, argv);</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> nowide::cout << <span class="st">"With 'nowide::args'</span><span class="sc">\n</span><span class="st">"</span>;</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">1</span>; i < argc; ++i) {</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> nowide::cout << argv[i] << <span class="ch">'</span><span class="sc">\n</span><span class="ch">'</span>;</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">0</span>;</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Compile using:</p>
<p><code>cl /EHsc /DUNICODE /D_UNICODE /MD /Ic:\...\opt\include t.cpp /link setargv.obj c:\...\opt\lib\nowide.lib Shell32.lib</code></p>
<p>In <code>cmd</code>:</p>
<pre class="text"><code>C:\Temp> t k*
With 'nowide::args'
k*</code></pre>
<p>In <code>bash</code>:</p>
<pre class="text"><code>$ ./t k*
With 'nowide::args'
kârlı.txt</code></pre>
<p>Let’s make a simple modification by deleting the instantiation of the <code>nowide::args</code> object:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><nowide/args.hpp></span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><nowide/iostream.hpp></span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>main(<span class="dt">int</span> argc, <span class="dt">char</span>* argv[])</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> nowide::cout << <span class="st">"Without 'nowide::args'</span><span class="sc">\n</span><span class="st">"</span>;</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">1</span>; i < argc; ++i) {</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> nowide::cout << argv[i] << <span class="ch">'</span><span class="sc">\n</span><span class="ch">'</span>;</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">0</span>;</span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Compile using the same command line and run in <code>cmd</code>:</p>
<pre class="text"><code>C:\Temp> t k*
Without 'nowide::args'
k�li.txt</code></pre>
<p>So, why do we want to use <code>nowide::args</code> anyway? Simple:</p>
<pre class="text"><code>C:\Temp> t kârlı.txt
Without 'nowide::args'
k�li.txt</code></pre>
<p>whereas:</p>
<pre class="text"><code>C:\Temp> t kârlı.txt
With 'nowide::args'
kârlı.txt</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>I want the utility I am writing to both handle filenames containing non-OEM characters <em>and</em> have the benefit of file name expansion in command line arguments. Therefore, I can’t take advantage of <code>nowide::args</code> and will need to ensure the entry point for the Windows version is <code>wmain</code> and will need to handle the UTF-8 encoding of <code>argv</code> myself.</p>
</div>
</article>
Sinan UnurUpgrading from Intel 7260AC to MPE-AX3000Htag:www.nu42.com,2021-07-18:/2021/07/windows-c-time-in-nanoseconds.html2021-07-18T22:45:00-00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Upgrading from Intel 7260AC to MPE-AX3000H</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-07-18T22:45:00-00:00" class="dt-published">July 18, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>I had first installed this Intel 7260AC card in my <a href="/2015/06/upgrade-laptop-wireless-intel-7260.html">old Lenovo 300 N100</a>. When <a href="https://www.nu42.com/2016/12/another-one-bites-the-dust.html">that died</a>, some of its internals, including this WiFi card, moved to the T400 I got on Ebay. The card served me well for a while through Windows 8 and 10 upgrades, different router + OpenWRT combinations etc. But, recently, with drivers no longer being updated, I started noticing issues. The card would frequently go into fits, repeatedly appearing and disappearing in device manager, causing me all sorts of annoyances. Given that the T400 can only use mini PCIe cards, I was not hopeful that I could find a decent replacement.</p>
<p>Eventually, I did decide to take a chance on something called <a href="https://amzn.to/3krxI2m" title="Amazon affiliate link">“MPE-AX3000H Dual Band WiFi 6 Card 802.11ax Wireless Half Mini PCI-E WiFi Card”</a> on Amazon. I am glad I did.</p>
<p>Given that I lose a screw or two every time I open this laptop, I decided to also go ahead and put in the <a href="https://ark.intel.com/content/www/us/en/ark/products/39312/intel-core-2-duo-processor-t9900-6m-cache-3-06-ghz-1066-mhz-fsb.html">T9900</a> that I had picked up at a very decent price on Ebay some months ago. In addition, I had been meaning to put in a new <a href="https://amzn.to/2VXHi2L" title="Affiliate link">BIOS battery</a> given that the original was about 13 years old at this point.</p>
<p>All went really well. I am using the <a href="https://downloadcenter.intel.com/product/189347/Intel-Wi-Fi-6-AX200-Gig-">latest drivers from Intel</a> for the WiFi card, the laptop immediately recognized the new CPU, and weird BIOS issues at boot time are gone. These were all shots in the dark, but they worked :-)</p>
</div>
</article>
Sinan UnurOn Windows, how do you get time in nanoseconds in C?tag:www.nu42.com,2021-07-18:/2021/07/windows-c-time-in-nanoseconds.html2021-07-18T12:45:00-00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">On Windows, how do you get time in nanoseconds in C?</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-07-18T12:45:00-00:00" class="dt-published">July 18, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>I recently decided to go back to dabbling in <a href="https://ocaml.org/">OCaml</a>. The easier path would have been to install it in a Linux VM and play with it there, but I decided to build it from source to use Microsoft’s C compiler. The <a href="https://github.com/ocaml/ocaml/blob/trunk/README.win32.adoc#compilation-from-the-sources">instructions</a> are clear and easy to follow. The advantage of compiling with <code>cl</code> is that while the actual build does need the Cygwin tools, the resulting binaries have no restrictions on them as they are not linked with the Cygwin DLL.</p>
<p>After <code>ocaml</code> itself was built and installed, it was <code>opam</code> next. Similarly, build and install were uneventful, made easy by the availability of excellent instructions.</p>
<p>Once <code>opam</code> was installed and initialized, I tried <code>opam install core</code> to install <a href="https://opam.ocaml.org/packages/core/">JaneStreet’s Core</a> library. The installation failed because <code>core_kernel</code> depends on <code>time_now</code> and <code>time_now</code> does not compile with MS Visual C:</p>
<pre class="text"><code> * install time_now v0.14.0
<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
▼ retrieved time_now.v0.14.0 (cached)
[ERROR] The compilation of time_now.v0.14.0 failed at "dune build -p time_now -j 1".
#=== ERROR while compiling time_now.v0.14.0 ===================================#
# context 2.1.0~rc2 | win32/x86_64 | ocaml.4.12.0 | https://opam.ocaml.org#6609b442
# path ~\.opam\default\.opam-switch\build\time_now.v0.14.0
# command ~\.opam\default\bin\dune.exe build -p time_now -j 1
# exit-code 1
# env-file ~\.opam\log\time_now-8632-6ce4ee.env
# output-file ~\.opam\log\time_now-8632-6ce4ee.out
### output ###
# cl src/time_now_stubs.obj (exit 2)
# (cd _build/default/src && "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\
VC\Tools\MSVC\14.29.30037\bin\HostX64\x64\cl.exe" -nologo -O2 -Gy- -MD -D_CRT_SECURE_NO_DEP
RECATE -nologo -O2 -Gy- -MD -I c:/opt/ocaml/lib/ocaml -I C:\opt\cygwin64\home\user\.opam\d
efault\lib\base -I C:\opt\cygwin64\home\user\.opam\default\lib\base\base_internalhash_type
s -I C:\opt\cygwin64\home\user\.[...]
# time_now_stubs.c
# time_now_stubs.c(25): fatal error C1083: Cannot open include file: 'sys/time.h': No such
file or directory</code></pre>
<p>If you look at the <a href="https://github.com/janestreet/time_now/blob/e9901a6ee11567c82dd264b2d7ffbb4be1dd55c2/src/time_now_stubs.c#L25">time_now source code</a>, the reason is easy to see: POSIX <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_getres.html">clock_gettime</a> is not available, so we try to compile the fallback, which uses <code>gettimeofday</code>. Its prototype comes from <code>sys/time.h</code>, which is non-standard in the sense that, while it is generally available on Linux, it is not in any C standard. A simple Google search for getting time in nanoseconds on Windows yields a number of cases where people wrote a simple function that provides the same prototype and similar functionality for use with older Microsoft compilers, e.g., <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/gettimeofday.c;h=75a91993b74414c0a1c13a2a09ce739cb8aa8a08;hb=HEAD">gettimeofday in PostgreSQL</a>, <a href="https://stackoverflow.com/a/26085827/100754">gettimeofday on Stackoverflow</a>, or an <a href="http://cs.uccs.edu/~cchow/pub/master/isemwal/new_webstone/WebStone2.5/src/gettimeofday.c">implementation</a> in <a href="http://www.mindcraft.com/webstone/">Webstone</a>.</p>
<p>I decided to dig a little deeper instead of copying and pasting one of these solutions.</p>
<p>It turns out, MS Visual C has supported <a href="https://en.cppreference.com/w/c/chrono/timespec_get">timespec_get</a> since Visual Studio 2015:</p>
<pre class="text"><code>C:\> cl /?
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
...</code></pre>
<p>Compiling and executing the following C program:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><inttypes.h></span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdint.h></span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdio.h></span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdlib.h></span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><time.h></span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="dt">static</span> <span class="dt">uint64_t</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>time_now(<span class="dt">void</span>)</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">struct</span> timespec ts;</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (timespec_get(&ts, TIME_UTC) != TIME_UTC)</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> {</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> fputs(<span class="st">"timespec_get failed!"</span>, stderr);</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">0</span>;</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">1000000000</span> * ts.tv_sec + ts.tv_nsec;</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main(<span class="dt">void</span>)</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a> printf(<span class="st">"%"</span> PRIu64 <span class="st">"</span><span class="sc">\n</span><span class="st">"</span>, time_now());</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> EXIT_SUCCESS;</span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>gives:</p>
<pre class="text"><code>C:\> cl t.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
t.c
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation. All rights reserved.
/out:t.exe
t.obj
C:\> perl -E "my $t = `t`; say $t; say scalar localtime($t/1_000_000_000)"
1626610211198802200
Sun Jul 18 08:10:11 2021</code></pre>
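<p>The Perl one-liner divides by a billion in floating point, which is fine for display. As a quick cross-check, the same value can be split exactly into seconds and nanoseconds with integer arithmetic. The short Python sketch below does that for the number printed above and formats the seconds in UTC; the local time shown above is consistent with it if the machine was four hours behind UTC at the time, which is an assumption and not something the output itself states.</p>

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000

stamp = 1626610211198802200  # value printed by the C program above

# Integer division keeps every digit; no floating point rounding involved.
seconds, nanos = divmod(stamp, NS_PER_SECOND)

utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
formatted = utc.strftime("%Y-%m-%d %H:%M:%S")
```

<p>This gives <code>seconds == 1626610211</code>, <code>nanos == 198802200</code>, and a UTC time of <code>2021-07-18 12:10:11</code>.</p>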
<p>It seems like the time has about 100 ns resolution:</p>
<pre class="text"><code>C:\> perl -E "system 't' for 1 .. 10"
1626610285846156200
1626610285862472200
1626610285875507600
1626610285886970500
1626610285902410000
1626610285918170900
1626610285932019000
1626610285946177400
1626610285960105900
1626610285973125100</code></pre>
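<p>One way to sanity check that reading of the output is to look at the trailing digits of the ten samples: if the clock ticks in 100 ns units, every value should be a multiple of 100, but not necessarily of 1000. The Python sketch below only re-examines the numbers listed above. It does not measure anything itself, and ten samples are of course suggestive rather than conclusive.</p>

```python
# The ten values printed by the runs above.
samples = [
    1626610285846156200,
    1626610285862472200,
    1626610285875507600,
    1626610285886970500,
    1626610285902410000,
    1626610285918170900,
    1626610285932019000,
    1626610285946177400,
    1626610285960105900,
    1626610285973125100,
]

# Every sample ends in "00", so all are multiples of 100 ns ...
all_multiples_of_100 = all(s % 100 == 0 for s in samples)

# ... but they are not all multiples of 1000 ns (1 microsecond).
all_multiples_of_1000 = all(s % 1000 == 0 for s in samples)
```

<p>Here <code>all_multiples_of_100</code> is true while <code>all_multiples_of_1000</code> is false, which fits a granularity of about 100 ns.</p>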
<p>At this point, I decided that going from the state of the world where <code>time_now</code> had no implementation that worked with <code>cl</code> to one where it works with versions of the compiler released since 2015 was good enough progress that <a href="https://github.com/janestreet/time_now/pull/2/files">I opened a PR</a> to provide a standard <code>timespec_get</code> implementation when <code>time_now</code> is being compiled with recent versions of <code>cl</code>. The PR is limited in scope on purpose as I do not want to impose on the library authors a choice to move away from stuff that works well enough on POSIX/Linux systems just to cater to this specific case.</p>
<p>I also added the information to two of the questions I found on Stackoverflow: <a href="https://stackoverflow.com/a/68429021/100754">What should I use to replace gettimeofday() on Windows?</a> and <a href="https://stackoverflow.com/a/68429012/100754">Equivalent of gettimeday() for Windows</a>. The original questions/answers predate the availability of <code>timespec_get</code> and are reasonable, but, these days, it is possible to use the standard C function.</p>
<p>Finally, if you are using C++, note that Visual C++ provides C++17 and experimental C++20 support and it pays to keep an eye on <a href="https://en.cppreference.com/w/cpp/chrono">chrono</a>.</p>
<p>PS: You can <a href="https://news.ycombinator.com/item?id=27872800">discuss</a> this post on HN.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Reading the state of the 4x4 keypad on the HC-35 using an ATtiny85</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-02-13T00:00:00-00:00" class="dt-published">February 13, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>This post is the third in a series where I try to figure out how to use the three main functional components on the <a href="https://amzn.to/3raaH4b" title="Amazon affiliate link to Liyafy HC-35">Liyafy HC-35</a> using an <a href="https://amzn.to/2NNFi97" title="Amazon affiliate link to ATtiny85 10 pack">ATtiny85</a>. In the <a href="/2021/01/attiny85-liyafy-hc-35-8-led-keypad-serial-in-parallel-out-shift-register.html">first part</a>, I discovered how to drive the LEDs from the <a href="https://www.microchip.com/wwwproducts/en/ATtiny85" title="ATtiny85 datasheet">ATtiny85</a> through a <a href="https://amzn.to/3ah4zBB" title="Amazon affiliate link to 25 pack of 595 serial to parallel shift registers">74HC595</a> serial to parallel shift register. That allowed me to dedicate three pins on the ATtiny85 to turn eight LEDs on/off independently. More importantly, it still left me with two open pins on that chip which I can use to receive information about the state of the two sets of keys on the HC-35.</p>
<p>In the <a href="/2021/02/attiny85-chirper-red-keys-liyafy-hc35.html">second part</a>, I verified that I could simply read the pins for the four red keys on the <a href="https://amzn.to/3raaH4b" title="Amazon affiliate link to Liyafy HC-35">HC-35</a> and have a pin free to transmit key state out. That was almost trivial.</p>
<p>Those experiments gave me the confidence to proceed with the assumption that I could simply read the row and column pins and figure out which key was pressed. I was wrong. In retrospect, I should have known from the lack of resistors for the black keys that they were not going to work like the red keys. Here is a picture of the <a href="https://amzn.to/3raaH4b" title="Amazon affiliate link to Liyafy HC-35">HC-35</a> so you can see what I am referring to:</p>
<div class="thumb"><a href="/2021/01/liyafy-hc-35-lg.jpg"><img src="https://www.nu42.com/2021/01/liyafy-hc-35-sm.jpg" width="400"></a></div>
<p>Recall that the ATtiny85 has five pins which can be used for input-output. Emboldened by the naive assumption that the black keys worked just like the red ones did, I proceeded to fish out a <a href="https://amzn.to/3jNAlJe" title="Amazon affiliate link to 20 pack of 74HC165 parallel to serial registers">74HC165</a> parallel to serial shift register, connect all the row/column pins on the HC-35 to the input pins on the <code>165</code>, wire up the signals between the ATtiny85 and the <code>165</code>, and write a short sketch to show everything worked.</p>
<p>Except, it didn’t.</p>
<p>That’s when I noticed the lack of resistors for the 4x4 keypad. At first, I thought it was a cost saving measure, so I put some 10K resistors between the row/column pins and the inputs to the <code>165</code> … Hmmm.</p>
<p>That didn’t work.</p>
<p>I did some searching and found the scanning routine in <a href="https://www.arduino.cc/reference/en/libraries/keypad/">Keypad</a>.</p>
<p>It assumes that you have eight pins to dedicate, so it is not useful to me in that regard, but I finally understood how the 4x4 keyboard state is supposed to be read. I should note that I am writing this as I discover, so the current working theory might also be wrong, but, as I understand this, the 4x4 keypad does not use the power from the 3.3V input that the LEDs and the red keys use. Instead, when you want to read the state of the keyboard, you set the column pins to <code>HIGH</code> one by one, and read the row pins. If a button is pressed, that row will read <code>LOW</code>. That’s how you know which key is pressed.</p>
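<p>Here is a minimal host-side sketch of the column-by-column scan idea, so the logic can be checked without any hardware. It assumes the common matrix arrangement where rows idle <code>HIGH</code> through pull-ups and the selected column is driven <code>LOW</code>, which may not match the HC-35. The names <code>scan_keypad</code> and <code>row_is_low</code> are made up for this example.</p>

```cpp
#include <cstdint>
#include <functional>

// Host-side model of scanning a 4x4 matrix keypad; no Arduino headers needed.
// ASSUMPTION: rows idle HIGH through pull-ups and the selected column is
// driven LOW, so a pressed key pulls its row LOW. The HC-35 may differ.
constexpr int kRows = 4;
constexpr int kCols = 4;

// row_is_low(row, col) stands in for "select column col, then read row row".
// On real hardware that would be a digitalWrite followed by a digitalRead.
// Returns a bit mask where bit (row * kCols + col) is set for a pressed key.
std::uint16_t scan_keypad(const std::function<bool(int, int)>& row_is_low) {
    std::uint16_t pressed = 0;
    for (int col = 0; col < kCols; ++col) {
        for (int row = 0; row < kRows; ++row) {
            if (row_is_low(row, col)) {
                pressed |= static_cast<std::uint16_t>(1u << (row * kCols + col));
            }
        }
    }
    return pressed;
}
```

<p>Each key ends up with its own bit, so several simultaneous presses can be reported in one 16-bit value.</p>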
<p>There doesn’t seem to be a way the <a href="https://www.ti.com/lit/ds/symlink/sn74hc165.pdf" title="Datasheet for the 74HC165">165</a> is going to be sufficient for this:</p>
<div class="thumb"><a href="sn74hc165-pins.jpg"><img src="https://www.nu42.comsn74hc165-pins.jpg" width="380"></a></div>
<p>Those parallel input pins don’t change direction.</p>
<p>After some research, I realized what I really need seems to be a <a href="https://amzn.to/3rR50IM" title="Amazon affiliate link to 10 pack of 74HC4051N 8-bit multiplexer/demultiplexer">74HC4051N</a> 8-bit multiplexer/demultiplexer. It <em>seems</em> like I need three pins on the ATtiny85 to tell it to select which pin to read from/write to and one more pin to actually read/write. I am operating under the assumption that I can just wire the enable pin to <code>GND</code>. That means I need one more pin which I can use to communicate the key state with the outside world. Without that pin, there is no point to this. This is pure speculation though. I do not have the <a href="https://assets.nexperia.com/documents/data-sheet/74HC_HCT4051.pdf" title="Datasheet for the 74HC4051N">4051N</a> in hand yet. Here is the description from the datasheet for reference:</p>
<blockquote>
<p>74HCT4051 is a single-pole octal-throw analog switch (SP8T) suitable for use in analog or digital 8:1 multiplexer/demultiplexer applications. The switch features three digital select inputs (<code>S0</code>, <code>S1</code> and <code>S2</code>), eight independent inputs/outputs (<code>Yn</code>), a common input/output (<code>Z</code>) and a digital enable input (<code>E</code>). When <code>E</code> is <code>HIGH</code>, the switches are turned off.</p>
</blockquote>
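<p>To make the pin arithmetic concrete, here is a small sketch of how a channel number would map to the three select inputs. It assumes <code>S0</code> is the least significant bit of the channel number and that the enable pin is tied to <code>GND</code>. <code>SelectLines</code> and <code>select_for_channel</code> are made-up names, and none of this has been tried on an actual 4051.</p>

```cpp
#include <cstdint>

// Levels for the three select inputs of a 74HC4051-style 8:1 multiplexer.
struct SelectLines {
    bool s0;
    bool s1;
    bool s2;
};

// ASSUMPTION: S0 is the least significant bit of the channel number (0..7).
constexpr SelectLines select_for_channel(std::uint8_t channel) {
    return SelectLines{
        (channel & 0x01) != 0,
        (channel & 0x02) != 0,
        (channel & 0x04) != 0,
    };
}

// Pin budget: three select lines plus the common Z line.
constexpr int kMuxPinsNeeded = 3 + 1;
constexpr int kAttinyIoPins = 5;  // with RESET left enabled
constexpr int kPinsLeft = kAttinyIoPins - kMuxPinsNeeded;
```

<p>With five usable pins on the ATtiny85, that leaves exactly one pin for reporting key state.</p>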
<p>I would still like to test if I understood how to read the state of the 4x4 keypad correctly. To that end, I am going to use another 595. It is going to be used to set the columns to <code>HIGH</code>. When looking at each column, we’ll also set an LED corresponding to the column. I’ll use the left-most LEDs to indicate which column is being set to <code>HIGH</code> (i.e. being prepared to be read from). I am going to wire the row pins to the right-hand side LED pins so that the LED corresponding to the row will turn off when the key at the intersection of the row/column is pressed. Not very practical, but, at least it will verify that I understood how this thing works.</p>
<p>For a recap of the wiring between the <code>595</code> and the <code>ATtiny85</code>, please refer to “<a href="/2021/01/attiny85-liyafy-hc-35-8-led-keypad-serial-in-parallel-out-shift-register.html">Fun with an ATTiny85, Liyafy HC-35 keypad with eight LEDs, and a serial to parallel shift register</a>”.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">ATtiny85 chirping in response to red key presses on a Liyafy HC-35 keypad</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-02-07T23:45:00-00:00" class="dt-published">February 7, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>For background on this post, please see “<a href="/2021/01/attiny85-liyafy-hc-35-8-led-keypad-serial-in-parallel-out-shift-register.html" title="Controlling LEDs on the Liyafy HC-35 using ATtiny85">Fun with an ATTiny85, Liyafy HC-35 keypad with eight LEDs, and a serial to parallel shift register</a>”.</p>
<p><a href="/2021/01/attiny85-liyafy-hc-35-8-led-keypad-serial-in-parallel-out-shift-register.html" title="Controlling LEDs on the Liyafy HC-35 using ATtiny85">Last time</a>, I was just excited to get some lights to blink. Since then, I’ve figured out a few things. For example, if a bit in the output is set, the corresponding LED on the HC-35 is <em>turned off</em>. Therefore, to get the pleasing animated counting effect I wanted, I needed to send the complement of the number to the shift register. I also worked on tidying up some of the wiring to get things slightly more organized on the breadboard to prepare for six more wires coming in from the HC-35 and placing another ATtiny85 on the board. It turned out that I needed to run a separate pair of VCC/GND to the second ATtiny85, which is where that power supply really shone.</p>
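<p>The complement step is easy to check on its own. This is a tiny host-side sketch, with <code>led_byte_for</code> being a name made up for the example:</p>

```cpp
#include <cstdint>

// The LEDs are active-low: a set bit in the shift register turns an LED off.
// To show `value` in binary on the LEDs, shift out its bitwise complement.
constexpr std::uint8_t led_byte_for(std::uint8_t value) {
    return static_cast<std::uint8_t>(~value);
}
```

<p>For example, showing <code>5</code> (binary <code>0000_0101</code>) means sending <code>0xFA</code> (binary <code>1111_1010</code>), and sending <code>0xFF</code> turns all LEDs off.</p>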
<p>So, before proceeding further, here is the improved code for flashing the LEDs on the HC-35:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">// Adapted sample code in https://www.arduino.cc/en/Tutorial/Foundations/ShiftOut</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co">// Code sample 1: Hello World https://www.arduino.cc/en/Tutorial/ShftOut11</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co">// ATtiny85 SN74HC595</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co">// ----------------------</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co">// PB0 (5) <-> RCLK (12)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="co">// PB1 (6) <-> SRCLK (11)</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">// PB2 (7) <-> SER (14)</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="co">// ATtiny85</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="co">// --------</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="co">// 5V <-> VCC(8)</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="co">// GND <-> GND(4)</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="co">// SN74HC595</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="co">// ---------</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="co">// GND <-> GND (8)</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a><span class="co">// VCC <-> SRCLR (10)</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="co">// VCC <-> OE (13)</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a><span class="co">// VCC <-> VCC (16)</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="co">// PB0 (5) <-> RCLK (12)</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="co">// ST_CP / latch / Storage register clock pin in tutorial</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">int</span> rclk_out = <span class="dv">0</span>;</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="co">// PB1 (6) <-> SRCLK (11)</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a><span class="co">// SH_CP / Shift register clock pin in tutorial</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">int</span> srclk_out = <span class="dv">1</span>;</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a><span class="co">// PB2 (7) <-> SER (14)</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a><span class="co">// DS / Serial data input in tutorial</span></span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">int</span> ser_out = <span class="dv">2</span>;</span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> setup() {</span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">250</span>); <span class="co">// let things settle</span></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a> pinMode(rclk_out, OUTPUT);</span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a> pinMode(srclk_out, OUTPUT);</span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a> pinMode(ser_out, OUTPUT);</span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">250</span>);</span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">255</span>); <span class="co">// lights off to start</span></span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> led_out(<span class="dt">int</span> b) {</span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a> digitalWrite(rclk_out, LOW); <span class="co">// so LEDs don't change while the bits are being transmitted</span></span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"></a> shiftOut(ser_out, srclk_out, LSBFIRST, b); <span class="co">// send the data</span></span>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> digitalWrite(rclk_out, HIGH); <span class="co">// make the new eight bits available</span></span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> one_at_a_time() {</span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> <span class="co">// Light up each LED once</span></span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a> <span class="co">// Should take about a second</span></span>
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">8</span>; ++i) {</span>
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"></a> led_out(~(<span class="dv">1</span> << i));</span>
<span id="cb1-54"><a href="#cb1-54" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">125</span>);</span>
<span id="cb1-55"><a href="#cb1-55" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-56"><a href="#cb1-56" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-57"><a href="#cb1-57" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-58"><a href="#cb1-58" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> all_combos() {</span>
<span id="cb1-59"><a href="#cb1-59" aria-hidden="true" tabindex="-1"></a> <span class="co">// Count from 0 to 255 in a fraction under a minute</span></span>
<span id="cb1-60"><a href="#cb1-60" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">256</span>; ++i) {</span>
<span id="cb1-61"><a href="#cb1-61" aria-hidden="true" tabindex="-1"></a> led_out(~i);</span>
<span id="cb1-62"><a href="#cb1-62" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">234</span>);</span>
<span id="cb1-63"><a href="#cb1-63" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-64"><a href="#cb1-64" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">255</span>);</span>
<span id="cb1-65"><a href="#cb1-65" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-66"><a href="#cb1-66" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-67"><a href="#cb1-67" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> blink_all() {</span>
<span id="cb1-68"><a href="#cb1-68" aria-hidden="true" tabindex="-1"></a> <span class="co">// Rapid blinking for five seconds</span></span>
<span id="cb1-69"><a href="#cb1-69" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">50</span>; ++i) {</span>
<span id="cb1-70"><a href="#cb1-70" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">0</span>);</span>
<span id="cb1-71"><a href="#cb1-71" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">50</span>);</span>
<span id="cb1-72"><a href="#cb1-72" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">255</span>);</span>
<span id="cb1-73"><a href="#cb1-73" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">50</span>);</span>
<span id="cb1-74"><a href="#cb1-74" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-75"><a href="#cb1-75" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">255</span>);</span>
<span id="cb1-76"><a href="#cb1-76" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-77"><a href="#cb1-77" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-78"><a href="#cb1-78" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> loop() {</span>
<span id="cb1-79"><a href="#cb1-79" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>); <span class="co">// wait a little before the show</span></span>
<span id="cb1-80"><a href="#cb1-80" aria-hidden="true" tabindex="-1"></a> one_at_a_time();</span>
<span id="cb1-81"><a href="#cb1-81" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb1-82"><a href="#cb1-82" aria-hidden="true" tabindex="-1"></a> all_combos();</span>
<span id="cb1-83"><a href="#cb1-83" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb1-84"><a href="#cb1-84" aria-hidden="true" tabindex="-1"></a> blink_all();</span>
<span id="cb1-85"><a href="#cb1-85" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Satisfied that each LED was being turned on from the LSB to MSB and they were counting from <code>0</code> to <code>255</code> in binary, I turned my attention to the task of getting key press information.</p>
<p>I took for granted that I could read the keypresses using four wires. That turned out to be a correct assumption. So, four pins for reading keys, one pin for VCC, one pin for GND, the RESET pin which I am <em>not</em> going to disable because <a href="https://electronics.stackexchange.com/q/258997/276109">I do not want to <q>need a high voltage serial programmer to reprogram the chip</q></a>. That leaves one pin to get information out. My mistake was to assume that I should immediately figure out how to use <a href="https://www.arduino.cc/en/Reference/SoftwareSerialConstructor" title="SoftwareSerial documentation">SoftwareSerial</a> or similar to output bytes on that pin. I decided to try to use <code>PB4</code> because mapping four keys to <code>PB0</code>–<code>PB3</code> just made sense even though I know <code>PB2</code> has abbreviations associated with serial interfaces next to it. At first, I had been using the pin change interrupts to read key states, but then <code>SoftwareSerial</code> wants to install its own handlers for receive operations. Remember, I am only interested in getting information out. Stuff was just not working out. So, I decided to take a step back and proceed in small, discrete steps.</p>
<p>How about <em>proving</em> that I can read key states using the simplest approach? I decided I wanted audible confirmation that I was able to distinguish which specific key was pressed. So, I just wrote a simple busy loop and wired up the active buzzer to <code>PB4</code>. And, it works:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>setup() {</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> pinMode(<span class="dv">0</span>, INPUT_PULLUP);</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> pinMode(<span class="dv">1</span>, INPUT_PULLUP);</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> pinMode(<span class="dv">2</span>, INPUT_PULLUP);</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> pinMode(<span class="dv">3</span>, INPUT_PULLUP);</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> pinMode(<span class="dv">4</span>, OUTPUT);</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> digitalWrite(<span class="dv">4</span>, LOW);</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">5000</span>);</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="dt">bool</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>key_pressed(<span class="dt">int</span> pin)</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> digitalRead(pin) == LOW;</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a>blink(<span class="dt">int</span> times, <span class="dt">int</span> duration)</span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < times; ++i) {</span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a> digitalWrite(<span class="dv">4</span>, HIGH);</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a> delay(duration);</span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a> digitalWrite(<span class="dv">4</span>, LOW);</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a> delay(duration);</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a>loop() {</span>
<span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (key_pressed(<span class="dv">0</span>)) blink(<span class="dv">50</span>, <span class="dv">50</span>);</span>
<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (key_pressed(<span class="dv">1</span>)) blink(<span class="dv">20</span>, <span class="dv">125</span>);</span>
<span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (key_pressed(<span class="dv">2</span>)) blink(<span class="dv">10</span>, <span class="dv">250</span>);</span>
<span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (key_pressed(<span class="dv">3</span>)) blink(<span class="dv">5</span>, <span class="dv">500</span>);</span>
<span id="cb2-36"><a href="#cb2-36" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
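<p>The four <code>blink</code> calls are chosen so that each key chirps for about the same total time at a different rate. A quick sketch of the arithmetic, ignoring the small overhead of the <code>digitalWrite</code> calls (<code>chirp_total_ms</code> is a name made up for this check):</p>

```cpp
// Total time blink(times, duration) keeps the buzzer busy, in milliseconds:
// each iteration is one HIGH phase plus one LOW phase of `duration_ms` each.
constexpr long chirp_total_ms(int times, int duration_ms) {
    return 2L * times * duration_ms;
}
```

<p>All four combinations, <code>(50, 50)</code>, <code>(20, 125)</code>, <code>(10, 250)</code> and <code>(5, 500)</code>, work out to 5000 ms.</p>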
<p>Here’s a photo of the slightly cleaned up placement and wiring of all the components:</p>
<div class="thumb"><a title="One ATtiny85 blinking LEDs, another reading keypresses and chirping an active buzzer in response" href="attiny85-active-buzzer-liyafy-hc-35-red-keys.jpg"><img alt="[ One ATtiny85 blinking LEDs, another reading keypresses and chirping an active buzzer in response ]" width="400" src="https://www.nu42.comattiny85-active-buzzer-liyafy-hc-35-red-keys-thumb.jpg"></a></div>
<p>And here’s a <a href="https://youtu.be/7ZpFFdNyAks" title="Video on YouTube">video of the buzzer chirping in response to key presses and LEDs flashing pleasingly</a>. I kept the buzzer covered because otherwise it was getting too loud on the video. In case the chirping is not audible, I added a yellow LED inline to also give a visual indication that different keypresses generate different output.</p>
<p>At this point, I was ready to call it a day, but something else came up. Later, I did a search for using SoftwareSerial at low bit rates, and I found <a href="https://forum.arduino.cc/index.php?topic=351913.0">a forum post</a> where the possibility of fiddling with the calibration of the oscillator was mentioned. The OP posted a calibration routine which looped through values <code>0</code> to <code>255</code> for <code>OSCCAL</code>, sending the output on the serial line it was trying to use. Watching on the monitor, you could see at what <code>OSCCAL</code> value you stopped seeing gibberish. My first few attempts were discouraging because after running it for five minutes or so, I would see long periods of nothing punctuated by some gibberish. I decided to fiddle with the calibration routine by instead starting at the mid-point by setting <code>OSCCAL = 128</code>, and then adding <code>Δ ∈ {±1, ±2, ±3, ... ±127}</code>. Yeah, this misses <code>OSCCAL = 0</code> but we’ve already established I don’t see anything at that value.</p>
<p>Surprise!</p>
<p>Here’s the calibration routine:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><SoftwareSerial.h></span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>SoftwareSerial comm(-<span class="dv">1</span>, <span class="dv">0</span>);</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">int</span> anchor = <span class="dv">128</span>;</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>print_osccal(<span class="dt">int</span> v) {</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> comm.println(F(<span class="st">"********************************"</span>));</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> comm.print(F(<span class="st">"OSCCAL = "</span>));</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> comm.println(v);</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> comm.println(F(<span class="st">"********************************"</span>));</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a>setup() {</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">5000</span>);</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> comm.begin(<span class="dv">300</span>);</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> OSCCAL = anchor;</span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> print_osccal(anchor);</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">5000</span>);</span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a>loop() {</span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> x;</span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">1</span>; i < <span class="dv">128</span>; ++i) {</span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a> x = anchor + i;</span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> OSCCAL = x;</span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> print_osccal(x);</span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> x = anchor - i;</span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> OSCCAL = x;</span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> print_osccal(x);</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
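<p>To double-check which values the routine above visits and in what order, the same loop can be run on a host machine. This sketch only mirrors the index arithmetic; <code>osccal_search_order</code> is a made-up name and nothing here touches the real <code>OSCCAL</code> register.</p>

```cpp
#include <vector>

// Order in which the calibration routine tries OSCCAL values: the anchor
// first (in setup), then anchor + i and anchor - i for i = 1 .. 127 (in loop).
std::vector<int> osccal_search_order() {
    const int anchor = 128;
    std::vector<int> order;
    order.push_back(anchor);
    for (int i = 1; i < 128; ++i) {
        order.push_back(anchor + i);
        order.push_back(anchor - i);
    }
    return order;
}
```

<p>The sequence starts 128, 129, 127, 130, 126 and ends with 255, 1, covering 255 distinct values and never reaching 0.</p>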
<p>Now, I have decided I’d rather not use <code>SoftwareSerial</code> in the actual project, but it does the job of helping me discover the <code>OSCCAL</code> value at which serial communication works. In this case, I have a baud rate of <code>300</code> mostly because it creates a retro effect in the serial monitor. It might also be the right baud rate to use for this project, but let’s not get ahead of ourselves. I can’t adapt one of those send-only serial libraries without knowing the value of <code>OSCCAL</code> which makes serial work reliably.</p>
<p>It is at this point that I decided to get fancy: Why not give myself a button to tell the ATtiny85 to save the current value in EEPROM so I don’t have to remember to set it manually in the actual code that uses serial comms? Note that it is not clear I gain much from this as now I have to remember to add EEPROM support to the code that uses the value. But, I am doing this for fun, so why not? My first search landed me on <a href="https://arduino.stackexchange.com/q/25080/70572">this question</a> and <a href="https://arduino.stackexchange.com/a/25086/70572">this helpful answer</a> told me I needed to modify <code>boards.txt</code> in Arduino IDE to set <a href="http://eleccelerator.com/fusecalc/fusecalc.php?chip=attiny85&LOW=E2&HIGH=D7&EXTENDED=FF&LOCKBIT=FF">“high fuse” to <code>0xD7</code> instead of <code>0xDF</code></a> for the EEPROM to be preserved when I reprogram the chip.</p>
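<p>The difference between the two fuse values is a single bit. Per the ATtiny85 datasheet, EESAVE is bit 3 of the high fuse byte, and a fuse counts as programmed when its bit is <code>0</code>. A rough sketch of that arithmetic in plain C follows; the macro and function names are made up for illustration and do not come from any library:</p>
<pre class="c"><code>#include <stdint.h>

/* Bit position of EESAVE in the ATtiny85 high fuse byte (per the datasheet). */
#define EESAVE_BIT 3

/* Fuses are "programmed" when the bit is 0, so clear the bit to keep EEPROM. */
static uint8_t
with_eesave(uint8_t high_fuse)
{
    return (uint8_t)(high_fuse & ~(1u << EESAVE_BIT));
}

/* Returns 1 if this high fuse value asks for EEPROM to be preserved. */
static int
preserves_eeprom(uint8_t high_fuse)
{
    return (high_fuse & (1u << EESAVE_BIT)) == 0;
}</code></pre>
<p>Calling <code>with_eesave(0xDF)</code> clears bit 3 and yields <code>0xD7</code>, which is the value the linked fuse calculator shows.</p>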
<p>When you have the temptation to get fancy … resist. This is where things went a bit wrong. First, I shorted the push button on the breadboard. Then, I burned the ATtiny85 by inserting it the wrong way in the programmer. Finally, my <a href="https://amzn.to/39Vjj90" title="Amazon.com affiliate link for DSD TECH SH-U09C5 USB to TTL UART Converter Cable with FTDI Chip">fancy USB to TTL UART converter</a> became unresponsive (possibly due to me connecting one of the legs of that button to 5V instead of GND). So, I took another break from this and came back to it later.</p>
<p>The next ATtiny85 did not seem to have an initial <code>OSCCAL</code> value anywhere near the middle of the 0 … 255 range, so I decided to fiddle with the search to make it asymmetric around the value of <code>OSCCAL</code> at startup. Here is the code (note, I’ve kind of left the saving to EEPROM thing as an exercise for the reader):</p>
<pre class="arduino"><code>#include <EEPROM.h>
#include <SoftwareSerial.h>

#define TX_BAUD 300
#define TX_PIN 0
#define BUTTON_PIN 2
#define LED_PIN 4
#define LONG_WAIT 1000
#define SHORT_WAIT 250

static int initial_osccal;

static void blink_on_button_press();
static void blink_on_done();
static void blink_on_print_osccal();
static void blink_on_startup();
static void do_blink(int, int);
static void print_osccal(int);
static int restore();
static void save(int);
static int search_osccal();
static bool should_save();
static int try_osccal(int);

SoftwareSerial* comm;

static void
print_osccal(int v) {
    comm->println("\r\n*-- OSCCAL -------------------------------");
    comm->print("* At startup = ");
    comm->println(initial_osccal);
    comm->print("* Current = ");
    comm->println(v);
    comm->println("*--- Press button to save in EEPROM ------\r\n");
}

static void
save(int v)
{
    // Three signature bytes followed by the value itself
    EEPROM.update(0, 'O');
    EEPROM.update(1, 'S');
    EEPROM.update(2, 'C');
    EEPROM.update(3, (byte)(v & 0xff));
}

static int
restore()
{
    // Check signature to avoid bogus values
    if (
        (EEPROM.read(0) == 'O') &&
        (EEPROM.read(1) == 'S') &&
        (EEPROM.read(2) == 'C')
    )
    {
        return EEPROM.read(3);
    }
    return -1;
}

static bool
should_save()
{
    // Figure out what this should do
    return false;
}

// Returns v if the user accepted it, -1 otherwise (including out of range)
static int
try_osccal(int v)
{
    if ((v < 0) || (v > 255)) return -1;
    OSCCAL = v;
    print_osccal(v);
    blink_on_print_osccal();
    delay(SHORT_WAIT);
    if (should_save()) return v;
    return -1;
}

static int
search_osccal()
{
    // Maybe the initial value is good.
    if (try_osccal(initial_osccal) >= 0) return initial_osccal;
    int anchor = initial_osccal;
    // Distance to the farther end of the 0 ... 255 range
    int limit = max(initial_osccal, 255 - initial_osccal);
    int x;
    // Search up and down; <= so that the farther end itself is tried
    for (int i = 1; i <= limit; ++i) {
        if ((x = try_osccal(anchor + i)) >= 0) return x;
        if ((x = try_osccal(anchor - i)) >= 0) return x;
    }
    return -1;
}

static void
do_blink(int times, int delta)
{
    for (int i = 0; i < times; ++i)
    {
        digitalWrite(LED_PIN, HIGH);
        delay(delta);
        digitalWrite(LED_PIN, LOW);
        delay(delta);
    }
}

static void
blink_on_button_press()
{
    do_blink(25, 20);
}

static void
blink_on_done()
{
    do_blink(50, LONG_WAIT/50);
}

static void
blink_on_print_osccal()
{
    do_blink(8, LONG_WAIT/8);
}

static void
blink_on_startup()
{
    do_blink(75, LONG_WAIT/75);
}

void
setup() {
    // Configure the pins first so the startup blink actually drives the LED
    pinMode(LED_PIN, OUTPUT);
    pinMode(BUTTON_PIN, INPUT);
    blink_on_startup();
    initial_osccal = restore();
    if (initial_osccal < 0)
    {
        // Nothing valid in EEPROM: start from the value the chip booted with
        initial_osccal = OSCCAL;
    }
    OSCCAL = initial_osccal;
    comm = new SoftwareSerial(-1, TX_PIN);
    comm->begin(TX_BAUD);
    delay(5000);
    blink_on_startup();
}

void
loop() {
    int good_osccal;
    while ((good_osccal = search_osccal()) < 0)
    {
        // Spin until desired value is determined
    }
    save(good_osccal);
    OSCCAL = good_osccal;
    blink_on_done();
    comm->print("OSCCAL value = ");
    comm->print(good_osccal);
    comm->println(" should be saved now.");
}</code></pre>
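<p>The order in which an up-and-down search like <code>search_osccal</code> visits candidate values is easy to check on a desktop before flashing anything. Below is a minimal host-side sketch in plain C, not AVR code. The function name is hypothetical, and it assumes the loop runs all the way to the farther end of the range (note the <code><=</code>) so that every value from 0 to 255 is visited exactly once:</p>
<pre class="c"><code>#include <stdint.h>

/*
 * List the OSCCAL candidates in the order the search visits them,
 * skipping values outside 0 ... 255. Expects 0 <= initial <= 255.
 * Writes up to 256 values into out[] and returns how many were written.
 */
static int
osccal_candidates(int initial, uint8_t out[256])
{
    int n = 0;
    int reach_down = initial;       /* steps available below the start */
    int reach_up = 255 - initial;   /* steps available above the start */
    int limit = reach_down > reach_up ? reach_down : reach_up;

    out[n++] = (uint8_t)initial;
    for (int i = 1; i <= limit; ++i) {
        if (initial + i <= 255) out[n++] = (uint8_t)(initial + i);
        if (initial - i >= 0)   out[n++] = (uint8_t)(initial - i);
    }
    return n;
}</code></pre>
<p>For a starting value near the top of the range, the list alternates above and below the start until the upper end runs out, and then keeps walking down to 0 on its own.</p>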
<p>The screenshot below links to an animated GIF of the <a href="serial-monitor-attiny85-osccal-search.gif">serial monitor during the <code>OSCCAL</code> search</a>:</p>
<div class="thumb"><a href="serial-monitor-attiny85-osccal-search.gif" title="Link to animated GIF of the serial monitor during the OSCCAL search"><img src="serial-monitor-attiny85-osccal-search.png" width="640" alt="[ Screenshot of the serial monitor during the search for an appropriate OSCCAL ]"></a></div>
<p>The code for this takes up about 3.5 KB of the 8 KB flash space and uses something like 330 bytes out of the 512 bytes of RAM. So, I have no intention of including any of this in the solution for sending key state to another ATtiny85, but getting reliable serial output was good. I am still going to attempt to put together, using others’ good ideas, obviously, a decent send-only serial bit banger to operate at 300 baud and 1 MHz clock speed (<code>SoftwareSerial</code> requires at least 8 MHz clock).</p>
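<p>For a sense of what a send-only bit banger has to produce, here is a rough host-side sketch in plain C of the timing budget and of one frame on the wire. It assumes the common 8N1 framing (one start bit, eight data bits sent least significant first, one stop bit); the macro and function names are invented for this illustration:</p>
<pre class="c"><code>#include <stdint.h>

#define F_CPU_HZ 1000000UL   /* 1 MHz clock, as planned above */
#define BAUD     300UL

/* CPU cycles available per bit: 1,000,000 / 300 = 3333 (rounded down). */
#define CYCLES_PER_BIT (F_CPU_HZ / BAUD)

/*
 * Fill bits[] with the ten line levels of one 8N1 frame:
 * a start bit (0), eight data bits LSB first, and a stop bit (1).
 */
static void
uart_frame_8n1(uint8_t byte, uint8_t bits[10])
{
    bits[0] = 0;                          /* start bit pulls the line low */
    for (int i = 0; i < 8; ++i) {
        bits[1 + i] = (byte >> i) & 1u;   /* data, least significant first */
    }
    bits[9] = 1;                          /* stop bit returns the line high */
}</code></pre>
<p>At 300 baud each of those ten levels is held for about 3.33 ms, which leaves a few thousand clock cycles per bit even at 1 MHz, so the timing loop does not need to be especially tight.</p>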
<p><u>PS</u>: You can discuss this post on <a href="https://redd.it/lezfra">r/attiny</a> and <a href="https://news.ycombinator.com/item?id=26059463">HackerNews</a>.</p>
</div>
</article>
Sinan UnurFun with an ATTiny85, Liyafy HC-35 keypad with eight LEDs, and a serial to parallel shift registertag:www.nu42.com,2021-01-31:/2021/01/attiny85-liyafy-hc-35-8-led-keypad-serial-in-parallel-out-shift-register.html2021-01-31T21:00:00-00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Fun with an ATTiny85, Liyafy HC-35 keypad with eight LEDs, and a serial to parallel shift register</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2021-01-31T21:00:00-00:00" class="dt-published">January 31, 2021</time></h3>
</header>
</div>
<div class="article-content"><p>Some months ago, I got curious about ESP32 microcontrollers. The exact reason is not important. I ended up learning about <a href="https://www.nodemcu.com/">NodeMCU</a>, <a href="https://micropython.org/">MicroPython</a>, and <a href="https://www.i2c-bus.org/">I<sup>2</sup>C</a>. In the end, I was able to wire up an ESP32 to an OLED screen which showed a small set of rotating fortune cookies.</p>
<p>It turned out to be rather straightforward. I wanted more artificial constraints. After all, I am not doing this for real work. It’s just a different way of having a well-defined task that can be accomplished in a short amount of time with a visible outcome. In that sense, it is similar to <a href="https://stackoverflow.com/users/100754/sinan-%c3%9cn%c3%bcr">answering questions on Stack Overflow</a>. I find it helpful in a way others might find meditation helpful.</p>
<p>Before going further, I know I am late to the party. People have been doing this stuff for quite some time now. This gives me the ability to more easily discover how to solve a given problem because others did and wrote blog posts about it years ago.</p>
<p>My search for a <em>reasonably</em> constrained environment led me to <a href="https://www.microchip.com/wwwproducts/en/ATtiny85" title="ATtiny85 datasheet">ATtiny85</a>. I decided I wanted to do something with it, but I did not know what. So, the first thing I tried was to wire up an ATtiny85 to an OLED screen and display a (now much smaller) set of rotating <a href="/2020/12/small-is-beautiful.html">fortune cookies</a>.</p>
<p>ATtiny85 has 8 kilobytes of <a href="https://en.wikipedia.org/wiki/Flash_memory">programmable flash</a>, 512 bytes of programmable <a href="https://en.wikipedia.org/wiki/EEPROM">EEPROM</a>, and 512 bytes of <a href="https://en.wikipedia.org/wiki/Static_random-access_memory">SRAM</a>. As I said, for a while I did not know what I wanted to do with it. I would occasionally browse Amazon and order things that looked cheap and fun.</p>
<p><em>Note that in what follows, I will be using shortened Amazon affiliate links to the products I actually bought and used in making this blog post. The advantage of shortened links is that just loading the page does not immediately result in tracking images and cookies being fetched (of course, if your browser does aggressive <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ">link prefetching</a>, all bets are off). Linking to those products does not mean I endorse them or even recommend them. I am just writing about my experience using them.</em></p>
<p>Among the things I bought because they looked cheap at the time was a <a title="Amazon affiliate link to Liyafy HC-35 product page" href="https://amzn.to/3raaH4b">Liyafy HC-35 8 LED 4x4 push button matrix keyboard</a>. From my perspective, it looked appealing because it came with no documentation, there were no useful comments or Q&A on the product page, and a Google search only revealed <a href="https://forum.arduino.cc/index.php?topic=500018.0">an unanswered question from 2017 on the Arduino forums</a>.</p>
<p>In the package were two keypads on a single board with a slight perforation running through the middle. So, I cracked them apart :-) Now, I had two keypads with no documentation. Here is one of them:</p>
<div class="thumb"><a href="liyafy-hc-35-lg.jpg"><img src="liyafy-hc-35-sm.jpg"></a></div>
<p>Documentation or not, the task seemed simple to me: There are three banks of pins on the left side of the board. The bottom bank is for manipulating the LEDs, and the two banks above that are for reading the state of the red buttons and the state of the black buttons, respectively. Eight LEDs, eight bits in a byte, not a huge leap to assume that each LED maps to a bit in a byte. It is not really clear whether the LEDs are ordered MSB first or LSB first. I am not good at reading board traces.</p>
<p>So, we need to have eight wires to drive the LEDs. VCC is clearly marked, but I didn’t know whether the board needed 5V or 3.3V. In fact, at first I guessed wrong and tried 5V. It doesn’t seem to have harmed the board, but after the fact, I thought maybe it would have been better to try the lower voltage first :-)</p>
<p>The ATtiny85 has 8 pins in <em>total</em>. We need a further four wires to read the state of the KEY B bank consisting of the red buttons, and we need an additional eight wires to read the state of the KEY A bank consisting of the 4x4 keypad. Note the pins on the KEY A bank are labeled <var>L<sub>i</sub></var> and <var>R<sub>i</sub></var> where I would expect L to stand for “line” and R to stand for “row” which seem like the same thing to me. I decided to anchor on “R is row” and “L is column” as a working assumption, and focus solely on getting some lights to blink first.</p>
<p>Before going further, I should note that by the time I got around to thinking about this, a couple of months had gone by. Luckily, I had picked up some other bits and pieces I thought might come in handy if I ever got around to trying this. Here is a list of what I ended up using:</p>
<ul>
<li><a href="https://amzn.to/2NNFi97" title="Amazon affiliate link to ATtiny85 10 pack">A ten-pack of ATtiny85 chips</a></li>
<li><a href="https://amzn.to/2NNF1TD" title="Amazon affiliate link to Belker 45W/3A AC/DC adapter">Belker 45W 5V 6V 7.5V 9V 12V 13.5V 15V Universal AC DC Power Supply</a></li>
<li><a href="https://amzn.to/3tfScgj" title="Amazon affiliate link to REXQualis Electronics Component Fun Kit">REXQualis Electronics Component Fun Kit</a>
<ul>
<li>Breadboard</li>
<li>Jumper wires</li>
<li>Extra LEDs</li>
<li>Power supply module</li>
<li><a href="https://www.ti.com/product/SN74HC595" title="595 8 bit serial to parallel shifter data sheet">74HC595</a></li>
</ul></li>
<li><a href="https://amzn.to/3r5mXmo" title="Amazon affiliate link to Tiny AVR Programmer">Tiny AVR programmer</a></li>
</ul>
<p>Note that stuff you find on Amazon tends to have a markup compared to the unit prices you find when shopping at speciality or overseas suppliers. And, prices on Amazon tend to vary a lot. I was not in a rush, did not have a specific project in mind, and just grabbed a thing or two when I thought the price was right. The Belker adapter is great. I have, of course, a box of wall warts accumulated over time, but with those either the specs turn out not to be what you want (if you can read them) or the tips don’t match etc. Since I bought this, it’s already been useful in multiple other contexts.</p>
<p>Well, I figured the first task was to actually be able to get something on the ATtiny85 to execute. I knew that I needed to install some drivers for the “programmer” and stick the chip in the right way (here’s where the 10-pack comes in handy: If you insert the chip in the wrong way, it burns because you end up swapping VCC and GND and 5V runs through going the wrong way … don’t trust your eyes, triple check before plugging the programmer into your computer’s USB port). The <a href="https://www.microchip.com/wwwproducts/en/ATtiny85" title="ATtiny85 datasheet">datasheet</a> has the pin out diagram as well as other good information:</p>
<div class="thumb"><a href="attiny85-8pin-dip.png"><img title="ATtiny85 8 pin DIP package pinout diagram" alt="[ ATtiny85 8 pin DIP package pinout diagram ]" src="attiny85-8pin-dip-thumb.png"></a></div>
<p>I worked through <a href="https://learn.sparkfun.com/tutorials/tiny-avr-programmer-hookup-guide/all">SparkFun’s Tiny AVR programming hookup guide</a> and then <a href="https://hackaday.com/2018/11/01/drawing-on-an-oled-with-an-attiny85-no-ram-buffers-allowed/">Drawing On An OLED With An ATtiny85</a> to produce some output on a <a href="https://amzn.to/3aiv5JA" title="Amazon affiliate link to SSD1306 128x32 OLED 3-pack">128x32 OLED screen</a> to get warmed up.</p>
<p>The next task was to figure out how to drive the LEDs which require 8 wires using the five pins I had at my disposal. So, I started by staring at the contents of the “Fun Kit”. There are only two ICs included in the pack. I did not know what the 4N35 was for (it turned out to be an <a href="https://www.vishay.com/docs/81181/4n35.pdf"><del>light sensor</del> optocoupler</a>) but it has only six pins, so clearly it could not do anything with mapping one bit at a time (aka serial) output to eight bits at a time (parallel). So, I searched the web for the only other IC in the package, <a href="https://www.ti.com/product/SN74HC595">74HC595</a>. TI’s page for the product mentions “<a href="ti-595-snip.png">8-bit shift registers with 3-state output registers</a>” right under the product name. To be honest, that didn’t mean anything to me and I almost navigated away from the page, but scrolling down a little (is it me, or is everything on the web in huge and huger fonts now?) revealed the description:</p>
<blockquote>
<p>The SNx4HC595 devices contain an 8-bit, serial-in, parallel-out shift register that feeds an 8-bit D-type storage register. The storage register has parallel 3-state outputs. Separate clocks are provided for both the shift and storage register. The shift register has a direct overriding clear (SRCLR) input, serial (SER) input, and serial outputs for cascading. When the output-enable (OE) input is high, the outputs are in the high-impedance state.</p>
</blockquote>
<p>Yay! It looks like the “fun kit” includes just the part I needed for this! Good.</p>
<p>Time to learn what the gobbledygook means … It sounds like it’s saying it takes serial input, puts bits in eight bins, and, when you give it the all clear, it puts those zeros and ones on some IO pins. Here is the pin-out:</p>
<div class="thumb"><a href="SN74HC595-pins.png"><img alt="[ SN74HC595 pin out diagram ]" width="326" src="SN74HC595-pins.png"></a></div>
<p><var>Q<sub>A</sub></var> - <var>Q<sub>H</sub></var> are the eight output pins. VCC is VCC, GND is GND. <u>SER</u> better be serial. O̅E̅ is “output enable” meaning “put the bits you buffered on the pins”. <var>Q<sub>H</sub><sup>′</sup></var> seems to provide the same output serially.</p>
<p>OK, so I need to connect <var>Q<sub>A</sub></var> - <var>Q<sub>H</sub></var> to the LED inputs … I think I did this the “wrong” way by connecting <var>Q<sub>A</sub></var> to <var>D<sub>1</sub></var> on the keypad, but at this stage, I didn’t care what lit up … just that something did.</p>
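<p>One way to make sense of the two clocks in the datasheet description is a toy model. The sketch below, in plain C, is only my reading of the quoted description and not anything from TI: each rising edge on SRCLK pushes the SER level into the stage next to Q<sub>A</sub> and moves everything else one step towards Q<sub>H</sub>, and a rising edge on RCLK copies the shift register to the outputs. It assumes O̅E̅ is held low so the outputs are driven, and all names are made up:</p>
<pre class="c"><code>#include <stdint.h>

/* Toy model of a 74HC595: a shift register feeding a storage register. */
struct hc595 {
    uint8_t shift;    /* bit 0 is the stage next to SER (feeds QA) */
    uint8_t storage;  /* what the pins show: bit 0 = QA ... bit 7 = QH */
};

/* Rising edge on SRCLK: stages move one step towards QH, SER enters at QA. */
static void
hc595_srclk_rising(struct hc595 *chip, int ser)
{
    chip->shift = (uint8_t)((chip->shift << 1) | (ser ? 1u : 0u));
}

/* Rising edge on RCLK: copy the shift register to the storage register. */
static void
hc595_rclk_rising(struct hc595 *chip)
{
    chip->storage = chip->shift;
}

/* Level on output pin 0 (QA) through 7 (QH), assuming OE is held low. */
static int
hc595_output(const struct hc595 *chip, int pin)
{
    return (chip->storage >> pin) & 1;
}</code></pre>
<p>In this model, the first bit clocked in ends up on Q<sub>H</sub> after eight shifts and the last one on Q<sub>A</sub>, and nothing changes on the outputs until RCLK rises.</p>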
<p>The big question is what to do with the five other pins. How do I drive them from the ATtiny85? I was confronted with a choice: I could read the datasheet, draw on my “vast” knowledge of electronics all gained in 8th grade, and translate the information in the datasheet to a program. Or, I could see if anyone else had done something similar. Searching for 74HC595 and Arduino led me to the documentation for the <a href="https://www.arduino.cc/reference/en/language/functions/advanced-io/shiftout/">shiftOut</a> function:</p>
<blockquote>
<p>Shifts out a byte of data one bit at a time. Starts from either the most (i.e. the leftmost) or least (rightmost) significant bit. Each bit is written in turn to a data pin, after which a clock pin is pulsed (taken high, then low) to indicate that the bit is available.</p>
</blockquote>
<p>Nice. It turns out Arduino documentation is really good, like Perl documentation (docs on both focus on giving you the information you need to be able to use a specific piece of functionality – for a contrast, see Python’s documentation which tends to obscure the information you need in dense prose). The docs for <code>shiftOut</code> came with a complete example as well as a link to a <a href="https://www.arduino.cc/en/Tutorial/Foundations/ShiftOut" title="Serial to Parallel Shifting-Out with a 74HC595">tutorial on controlling the 74HC595 shift register</a>. Of course, the mnemonics used in the pin-out diagram in the tutorial do not fully match the datasheet, but it is really nice to have these resources when you are trying to take the baby steps. It is easier to understand how you ended up blinking those if you can get them to blink in the first place.</p>
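<p>The bit order part of that description can be spelled out in a few lines. This is a host-side sketch in plain C of the order in which the bits of a byte reach the data pin. It is not Arduino’s implementation, the enum and function names are stand-ins chosen for this example, and the clock pulse after each bit is not modeled:</p>
<pre class="c"><code>#include <stdint.h>

/* Stand-ins for Arduino's LSBFIRST and MSBFIRST constants. */
enum bit_order { LSB_FIRST, MSB_FIRST };

/* Write the eight bits of value into sent[] in the order they go out. */
static void
bits_in_send_order(uint8_t value, enum bit_order order, uint8_t sent[8])
{
    for (int i = 0; i < 8; ++i) {
        int bit = (order == LSB_FIRST) ? i : 7 - i;
        sent[i] = (value >> bit) & 1u;
    }
}</code></pre>
<p>Sending the value <code>1</code> least significant bit first puts the single set bit at the front of the stream, while most significant bit first puts it at the very end. That choice decides which end of a shift register the bit lands on.</p>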
<p>Obviously, the tutorial is great. Explains all you need to know really. It is geared towards an Arduino board and those tend to have way more available pins than the ATtiny85. Laying cable on a breadboard is not my strong suit (never touched a soldering iron either), but after spending a moment considering my options, I decided to use the three available pins on the right side of the ATtiny85 (<var>PB<sub>0</sub></var> … <var>PB<sub>2</sub></var>) for interfacing with the 595. While reading the tutorial, I noticed this:</p>
<blockquote>
<p>“3 states” refers to the fact that you can set the output pins as either high, low or “high impedance.” … Neither example takes advantage of this feature and you won’t usually need to worry about getting a chip that has it.</p>
</blockquote>
<p>Good. One less thing to worry about.</p>
<p>Anyway, I used whatever wires were available within the routing constraints and came up with this wiring:</p>
<div class="thumb"><a href="attiny85-SN74HC595-HC-35-breadboard.jpg"><img src="attiny85-SN74HC595-HC-35-breadboard-thumb.jpg" width="400" title="ATtiny85 SN74HC595 Liyafy HC-35 keypad breadboard wiring" alt="[ ATtiny85 SN74HC595 Liyafy HC-35 keypad breadboard wiring ]"></a></div>
<p>The colors of the wires were dictated by available lengths because I did not feel like cutting and trimming or trying to lay out everything so that colors could be consistent. Right is mostly for communication between the 595 and the ATtiny85 and the left side is mostly for communicating with the LEDs. The power adapter is putting 5V on the left rails. The right rails are off. After discovering that the LEDs did not work at 5V, the power adapter proved to be really useful by letting me tap into a separate 3.3V fed by the same power source. Nice.</p>
<p>Note that the “fun pack” also included an <a href="https://electronics.stackexchange.com/a/224442">active buzzer</a> which is what helped me discover that the output pins on the 595 really were sending some signals by just plugging in to the breadboard for each of them. It also helped me reassure myself that the ATtiny85 was at least doing something on those pins. Not the most sophisticated debugging technique, but, then, <code>printf</code> debugging tends to be the quickest way to pinpoint problem code. The buzzer had a sticker that said “remove after washing”. I did not know if that meant the component <em>had</em> to be washed, but <a href="https://electronics.stackexchange.com/a/98566">thanks to this answer on Electronics SE</a>, I realized I could leave it on if only for the muting effect.</p>
<p>You may notice the extra RGB LEDs plugged in to the board (no resistors – that must be violating a cardinal rule somewhere). I mean, if I am going to flash lights, why not flash some more lights?</p>
<p>Here is the detail of the connections between the ATtiny85 and the 595:</p>
<div class="thumb"><a href="attiny85-SN74HC595-HC-35-breadboard-detail.jpg"><img src="attiny85-SN74HC595-HC-35-breadboard-detail-thumb.jpg" width="400" title="ATtiny85 SN74HC595 Liyafy HC-35 keypad breadboard wiring" alt="[ ATtiny85 SN74HC595 Liyafy HC-35 keypad breadboard wiring ]"></a></div>
<p>And, here is the code I used (basically just fiddled with the sample code until it looked pleasing to <em>my eye</em>):</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">// Adapted sample code in https://www.arduino.cc/en/Tutorial/Foundations/ShiftOut</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co">// Code sample 1: Hello World https://www.arduino.cc/en/Tutorial/ShftOut11</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co">// ATtiny85 SN74HC595</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co">// ----------------------</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co">// PB0 (5) <-> RCLK (12)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="co">// PB1 (6) <-> SRCLK (11)</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">// PB2 (7) <-> SER (14)</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="co">// ATtiny85</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="co">// --------</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="co">// 5V <-> VCC(8)</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="co">// GND <-> GND(4)</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="co">// SN74HC595</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="co">// ---------</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="co">// GND <-> GND (8)</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a><span class="co">// GND <-> SRCLR (10)</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="co">// </span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="co">// PB0 (5) <-> RCLK (12) ST_CP / latch / Storage register clock pin in tutorial</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> rclk_out = <span class="dv">0</span>;</span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="co">// PB1 (6) <-> SRCLK (11) SH_CP / Shift register clock pin in tutorial</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> srclk_out = <span class="dv">1</span>;</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a><span class="co">// PB2 (7) <-> SER (14) DS / Serial data input in tutorial </span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> ser_out = <span class="dv">2</span>;</span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> setup() {</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> pinMode(rclk_out, OUTPUT);</span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> pinMode(srclk_out, OUTPUT);</span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> pinMode(ser_out, OUTPUT);</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> led_out(<span class="dt">int</span> b) {</span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a> digitalWrite(rclk_out, LOW); <span class="co">// so LEDs don't change while the bits are being transmitted</span></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a> shiftOut(ser_out, srclk_out, LSBFIRST, b); <span class="co">// send the data</span></span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a> digitalWrite(rclk_out, HIGH); <span class="co">// make the new eight bits available</span></span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> one_at_a_time() {</span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">8</span>; ++i) {</span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">1</span> << i);</span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">500</span>);</span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a> } </span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> all_combos() {</span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">256</span>; ++i) {</span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a> led_out(i);</span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">500</span>);</span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-54"><a href="#cb1-54" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> blink_all() {</span>
<span id="cb1-55"><a href="#cb1-55" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < <span class="dv">100</span>; ++i) {</span>
<span id="cb1-56"><a href="#cb1-56" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">0</span>);</span>
<span id="cb1-57"><a href="#cb1-57" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">250</span>);</span>
<span id="cb1-58"><a href="#cb1-58" aria-hidden="true" tabindex="-1"></a> led_out(<span class="dv">255</span>);</span>
<span id="cb1-59"><a href="#cb1-59" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">250</span>);</span>
<span id="cb1-60"><a href="#cb1-60" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-61"><a href="#cb1-61" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-62"><a href="#cb1-62" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-63"><a href="#cb1-63" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> loop() {</span>
<span id="cb1-64"><a href="#cb1-64" aria-hidden="true" tabindex="-1"></a> one_at_a_time();</span>
<span id="cb1-65"><a href="#cb1-65" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb1-66"><a href="#cb1-66" aria-hidden="true" tabindex="-1"></a> all_combos();</span>
<span id="cb1-67"><a href="#cb1-67" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb1-68"><a href="#cb1-68" aria-hidden="true" tabindex="-1"></a> blink_all();</span>
<span id="cb1-69"><a href="#cb1-69" aria-hidden="true" tabindex="-1"></a> delay(<span class="dv">1000</span>);</span>
<span id="cb1-70"><a href="#cb1-70" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>It is not visible in the pictures, but running this code revealed that I probably miswired the 595 outputs to the LEDs. I expected the <code>one_at_a_time</code> to basically walk from one side to the other regardless of wiring, but it really doesn’t look like that’s happening.</p>
<p>For now, I got my blinking lights. If I tap <var>PB<sub>2</sub></var> with the buzzer, I get a nice rhythmic pulse.</p>
<p>Clearly, the next step is to figure out how to correctly map the output pins of the 595 to the individual LEDs. There is also the issue of getting some input from the board. I figure with the two open pins on the left side of this ATtiny, I have one more pin than I need to receive serial data. So, reading the red buttons seems to be a straightforward proposition: Just use a separate ATtiny85 to read the four pins, and send the output to the one that controls the LEDs. That way, I can also more easily debug the wiring. Press a button, see if I can toggle a specific LED.</p>
<p>I can also handle the keypad using one ATtiny85 for the rows and another for the columns. The row reader reads from the column reader and then sends to the LED-controlling ATtiny85 on the remaining pin. It looks like <a href="https://www.arduino.cc/en/Reference/SoftwareSerialConstructor">SoftwareSerial</a> can help here. Combining row, column, and red button information into a single value seems to require <del>nine bits</del> <ins>six bits</ins> for daisy-chaining these.</p>
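<p>The six bits come from two bits each for the row, the column, and the red button, since each takes one of four values. Here is a minimal sketch of one possible layout in plain C. The field order and the function names are arbitrary choices for illustration, and the sketch assumes exactly one key per bank is being reported, with no code reserved for “nothing pressed”:</p>
<pre class="c"><code>#include <stdint.h>

/* Pack row (0-3), column (0-3) and red button (0-3) into the low six bits. */
static uint8_t
pack_keys(uint8_t row, uint8_t col, uint8_t red)
{
    return (uint8_t)(((red & 3u) << 4) | ((col & 3u) << 2) | (row & 3u));
}

/* Accessors for the three two-bit fields of a packed value. */
static uint8_t key_row(uint8_t packed) { return packed & 3u; }
static uint8_t key_col(uint8_t packed) { return (packed >> 2) & 3u; }
static uint8_t key_red(uint8_t packed) { return (packed >> 4) & 3u; }</code></pre>
<p>A value packed this way fits in one byte with two bits to spare, so a single serial frame per key event would be enough.</p>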
<p>Of course, there are also parallel to serial shift registers, but apparently they are not regularly included in “fun packs”. So, the question is whether to take on the challenge of daisy chaining three ATtiny85s to give the LED controller full key state information, or whether it is better to locate a cousin of the 595 such as the <a href="https://www.ti.com/lit/ds/symlink/sn74hc165.pdf">74HC165</a>. One of those is still not enough to read both sets of key states, though. Another alternative is the <a href="https://media.digikey.com/pdf/Data%20Sheets/Fairchild%20PDFs/74F676.pdf">74F676</a> which is a 16-bit Serial/Parallel-In Serial Out shift register. But 16-bit shift registers in DIP packages seem to be a rarity.</p>
<p>We’ll see what strikes me as more fun when I get another couple of hours to try something. In the meantime, here is a <a title="Short clip of the contraption in action on YouTube" href="https://youtu.be/z32jE9T6wS8">short video of some flashing lights</a>.</p>
<p>You can discuss this post <a href="https://redd.it/l9m2r5">on r/attiny</a> and <a href="https://news.ycombinator.com/item?id=25982614">HackerNews</a>.</p>
<p><b>Note</b>: Hope you enjoyed reading this post. Please note that I tend not to be too careful playing with electronics. Heck, I burned a poor little ATtiny85 to a crisp when I plugged in the AVR programmer for the first time. It is definitely possible to fry various components (from the cheap ATtiny85 to your expensive computer) when working with things that carry a current and might end up short-circuiting. Further, you can really do a lot of damage if you interface with anything connected to AC. So, please be careful. While I am sharing this information for fun and informational purposes, it is your responsibility to take precautions and work in a safe manner.</p>
</div>
</article>
Sinan UnurSmall is beautifultag:www.nu42.com,2020-12-06:/2020/12/small-is-beautiful2020-12-06T02:26:00-00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Small is beautiful</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2020-12-06T02:26:00-00:00" class="dt-published">December 6, 2020</time></h3>
</header>
</div>
<div class="article-content"><p><a href="attiny85-128x32-oled-murphy-was-an-optimist.jpg" title="Attiny85 and a 128x32 I2C OLED screen on a breadboard"><img src="https://www.nu42.comattiny85-128x32-oled-murphy-was-an-optimist-400x400.jpg" alt="[Attiny85 driving a 128x32 I2C OLED screen on a breadboard]" width="400" height="400"></a></p>
</div>
</article>
Sinan UnurDon't complicate thingstag:www.nu42.com,2018-03-13:/2018/03/dont-complicate-things.html2018-03-13T18:30:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Don't complicate things</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2018-03-13T18:30:00+00:00" class="dt-published">March 13, 2018</time></h3>
</header>
</div>
<div class="article-content"><p>Every week, I look forward to receiving <a href="https://perlmaven.com/">Gabor</a>’s <a href="http://perlweekly.com/">Perl Weekly</a>. I noticed <a href="https://domm.plix.at/perl/2018_03_forking_tests.html">Forking tests</a> in the <a href="http://perlweekly.com/archive/346.html">latest issue</a>. The author has written a password generation module, and wants to ensure that child and parent processes produce different sequences of pseudo-random passwords following a <a href="https://perldoc.perl.org/functions/fork.html">fork</a>. That, in and of itself, is a commendable objective.</p>
<p>However, the author makes the test script unnecessarily complicated, and makes some misleading statements. For example, in closing, the author states:</p>
<blockquote>
<h3 id="windows-does-not-like-to-fork">Windows does not like to fork()</h3>
</blockquote>
<blockquote>
<p>Another easy fix: just skip the test if we’re running on windows:</p>
</blockquote>
<blockquote>
<pre><code>if ( $^O eq 'MSWin32' ) {
plan( skip_all => 'skip fork tests on MSWin32' ) ;
}</code></pre>
</blockquote>
<p>It is true that Perl’s <code>fork</code> on Windows does not really do a <a href="http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html">POSIX <code>fork</code></a>. However, for stuff like this, you would not notice the difference.</p>
<p>The author also states:</p>
<blockquote>
<p>… what I needed to do is to collect the passwords generated in the child process and the parent process and then make sure that they are not the same. I needed what is called <abbr title="Inter Process Communication">IPC</abbr> … Which is ugly and messy</p>
</blockquote>
<blockquote>
<p>I played a bit with <a href="https://metacpan.org/pod/IPC::Shareable">IPC::Shareable</a> but did not get it working, so I resorted to a simple and battle-tested way to share data between processes: the file system!</p>
</blockquote>
<blockquote>
<p>I open a temp-file and write each password into this file (in the child and the parent process). After closing all the child processes, I read the file and can now inspect the passwords and make sure they do not repeat.</p>
</blockquote>
<p>Yes, IPC can get hairy. But, not this. This is explained very well in <a href="https://perldoc.perl.org/perlipc.html#Using-open()-for-IPC"><code>perldoc perlipc</code></a>. For a simple demonstration, see also my crazy stupid <a href="https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=perl&id=4">regex-redux entry</a> in the Benchmarks Game. That script helped Perl edge ahead of Python3, and runs without modification on Windows and Linux systems (in a sense, it is less stupid than the one which <a href="https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=perl&id=3">mixes <code>fork</code> with threads</a>).</p>
<p>I decided to install <code>CtrlO::Crypt::XkcdPassword</code> on my Windows testbed. …:</p>
<pre class="text"><code>Building and testing Class-Accessor-0.51 ... OK
Successfully installed Class-Accessor-0.51
! Finding WordList (0) on mirror https://cpan.metacpan.org failed.
! Couldn't find module or a distribution WordList
...
! Installing the dependencies failed: Module 'WordList' is not installed
! Bailing out the installation for CtrlO-Crypt-XkcdPassword-1.003.</code></pre>
<p>Well, OK then. I know <a href="https://metacpan.org/pod/WordLists::WordList">WordLists::WordList</a> exists, but I am not sure what <code>WordList</code> is … So, instead of going down that rabbit hole, let me close with a short script which should help the author simplify the tests (<em>and</em> enable them on Windows):</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">strict</span>;</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">constant</span> N_RANDOMS => <span class="dv">10</span>;</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">Test::More</span>;</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>run();</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">genrand</span> { <span class="fu">int</span>(<span class="fu">rand</span>(<span class="dv">2</span>**<span class="dv">32</span>)) }</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">run</span> {</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@parents_randoms</span>;</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@childs_randoms</span>;</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">pipe</span>(<span class="kw">my</span> <span class="dt">$reader</span>, <span class="kw">my</span> <span class="dt">$writer</span>) <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Cannot set up pipe: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$pid</span> = <span class="fu">fork</span>;</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> ( <span class="dt">$pid</span> ) {</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$writer</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Cannot close child's writer in parent: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">@parents_randoms</span> = <span class="fu">map</span> genrand(), <span class="dv">1</span> .. +N_RANDOMS;</span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> <span class="fu">chomp</span>(<span class="dt">@childs_randoms</span> = <<span class="dt">$reader</span>>);</span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$reader</span></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Cannot close parent's reader in parent: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a> <span class="fu">waitpid</span>(<span class="dt">$pid</span>, <span class="dv">0</span>);</span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> {</span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> <span class="fu">defined</span>(<span class="dt">$pid</span>) <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Failed to fork: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$reader</span></span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Cannot close parent's reader in child: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a> <span class="fu">srand</span>();</span>
<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span> <span class="dt">$writer</span> genrand(), <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span> <span class="kw">for</span> <span class="dv">1</span> .. +N_RANDOMS;</span>
<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$writer</span></span>
<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> <span class="fu">die</span> <span class="ot">"</span><span class="st">Cannot close child's writer in child: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a> <span class="fu">exit</span>( <span class="dv">0</span> );</span>
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> ok(<span class="dt">@parents_randoms</span> == +N_RANDOMS,</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">parent produced the right number of pseudo random integers</span><span class="ot">"</span>);</span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> ok(<span class="dt">@childs_randoms</span> == +N_RANDOMS,</span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">child produced the right number of pseudo random integers</span><span class="ot">"</span>);</span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">%all_randoms</span> = <span class="fu">map</span> +(<span class="wa">$_</span> => <span class="dv">1</span>), <span class="dt">@parents_randoms</span>, <span class="dt">@childs_randoms</span>;</span>
<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-53"><a href="#cb3-53" aria-hidden="true" tabindex="-1"></a> ok(<span class="fu">keys</span>(<span class="dt">%all_randoms</span>) > +N_RANDOMS,</span>
<span id="cb3-54"><a href="#cb3-54" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">parent and child produced different sequences of random integers</span><span class="ot">"</span>);</span>
<span id="cb3-55"><a href="#cb3-55" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-56"><a href="#cb3-56" aria-hidden="true" tabindex="-1"></a> ok(<span class="fu">keys</span>(<span class="dt">%all_randoms</span>) == <span class="dv">2</span> <span class="ot">*</span> N_RANDOMS,</span>
<span id="cb3-57"><a href="#cb3-57" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">parent and child produced disjoint sets of pseudo random integers</span><span class="ot">"</span>);</span>
<span id="cb3-58"><a href="#cb3-58" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-59"><a href="#cb3-59" aria-hidden="true" tabindex="-1"></a> done_testing;</span>
<span id="cb3-60"><a href="#cb3-60" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-61"><a href="#cb3-61" aria-hidden="true" tabindex="-1"></a><span class="kw">__END__</span></span></code></pre></div>
<p>Here is a sample run on Windows 10 with a Visual Studio 2017 built <code>perl</code> 5.26.1:</p>
<pre class="text"><code>C:\Users\sinan\AppData\Local\Temp> prove -v --no-color t.pl
t.pl ..
ok 1 - parent produced the right number of pseudo random integers
ok 2 - child produced the right number of pseudo random integers
ok 3 - parent and child produced different sequences of random integers
ok 4 - parent and child produced disjoint sets of pseudo random integers
1..4
ok
All tests successful.
Files=1, Tests=4, 0 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Result: PASS</code></pre>
<p>There is a small probability (I didn’t bother to calculate) that the last test will fail just due to randomness. In fact, one might consider it sufficient if the generated sequences just differ instead of being completely disjoint. I expect more from my pseudo-random number generators <code>;-)</code></p>
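<p>For the curious, here is a rough sketch of that calculation. It treats the 20 integers as independent, uniform draws from 2<sup>32</sup> equally likely values, which is an idealization of what <code>rand</code> actually does, and <code>collision_probability</code> is just a name I made up for illustration. Under that assumption, this is the classic birthday problem:</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use strict;
use warnings;

# Probability that $k independent, uniform draws from $n equally likely
# values are NOT all distinct (birthday problem). This assumes an ideal
# random source, not any particular pseudo-random number generator.
sub collision_probability {
    my ($k, $n) = @_;

    # P(all distinct) = (1 - 0/n) * (1 - 1/n) * ... * (1 - (k-1)/n)
    my $all_distinct = 1;
    $all_distinct *= 1 - $_ / $n for 0 .. $k - 1;

    return 1 - $all_distinct;
}

# 10 integers from the parent plus 10 from the child, each in [0, 2**32)
printf "%.3g\n", collision_probability(20, 2**32);</code></pre></div>
<p>To first order, the result is <code>k * (k - 1) / (2 * n)</code>, or 190 / 2<sup>32</sup>, which is on the order of 4.4e-8. If the idealization holds, a spurious failure of the last test should be a very rare event indeed.</p>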
<p>If you are using an older <code>perl</code> on Windows, <a href="/2010/09/be-vary-of-using-built-in-rng-for.html">don’t use builtin <code>rand</code></a>.</p>
<p>Test driven development is great in principle, but in all too many instances, the tests themselves can be the source of problems. Simplicity is important, especially in tests.</p>
<p>Preemptive response: No, I am not going to put together a pull request. There are too many nitty gritty details to get into. If the author wants to improve the module’s code and tests based on the example above, he/she can do it with an acknowledgement. If others reading this post are prompted to think “maybe I can make this piece of code simpler” as a result of comparing and contrasting the snippet above with the significantly more complicated original <a href="https://github.com/domm/CtrlO-Crypt-XkcdPassword/blob/master/t/40-fork.t">test script</a>, that’s even better.</p>
<p>As a side note, I still run into modules that try to create temporary files in the root directory of my <code>C:</code> drive. That usually happens due to the script clearing the environment and not saving temporary directory locations. This is an unfortunate interaction with <a href="https://metacpan.org/source/XSAWYERX/PathTools-3.74/lib/File/Spec/Win32.pm#L70"><code>File::Spec->tmpdir</code></a> which defaults to trying to write to the root (hey, Windows 95 allowed it!) of the current drive if it can’t locate the customary directories. I think <code>File::Spec->tmpdir</code> ought to <code>croak</code> if the environment does not contain one of <code>TMP</code>, <code>TEMP</code>, or <code>TMPDIR</code>, instead of offering <code>C:\system\temp</code> or <code>C:\temp</code> or <code>/tmp</code> or <code>/</code> on Windows. Regardless of <code>File::Spec</code>’s behavior, scripts, modules, etc should not delete those environment variables.</p>
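<p>To illustrate the kind of behavior I have in mind, here is a minimal sketch of a wrapper around <code>File::Spec->tmpdir</code>. The name <code>strict_tmpdir</code> is hypothetical and not part of <code>File::Spec</code>; it merely refuses to proceed when none of the three environment variables is set:</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use strict;
use warnings;

use Carp qw( croak );
use File::Spec;

# Hypothetical wrapper (not part of File::Spec): croak instead of
# guessing when the environment does not name a temporary directory.
sub strict_tmpdir {
    my @candidates = grep { defined $ENV{$_} and length $ENV{$_} }
                     qw( TMPDIR TEMP TMP );

    @candidates
        or croak 'None of TMPDIR, TEMP, or TMP is set in the environment';

    # At least one variable is set, so let File::Spec do its usual checks
    return File::Spec->tmpdir;
}

my $tmpdir = eval { strict_tmpdir() };
print defined $tmpdir ? "Using $tmpdir\n" : "No temp dir: $@";</code></pre></div>
<p>A script that really must sanitize <code>%ENV</code> could save those three variables first and put them back afterwards, so that a check like this keeps passing.</p>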
<p>You can discuss this post on <a href="https://redd.it/84chwx">r/perl</a>.</p>
</div>
</article>
Sinan UnurWhat is Perl 6 to Perl?tag:www.nu42.com,2018-03-13:/2018/03/perl-vs-perl6.html2018-03-13T18:30:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">What is Perl 6 to Perl?</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2018-03-13T18:30:00+00:00" class="dt-published">March 13, 2018</time></h3>
</header>
</div>
<div class="article-content"><p>From time to time, people I meet ask me if they should learn Perl 6 since that is the “latest” Perl. For the most part, they have not written a line of Perl in their lives. They also ask me how Perl 6 relates to Perl.</p>
<p>I think everyone should at least dabble in a variety of languages, and I would be the last person to dissuade someone from learning a new language. However, I do think one would benefit more from learning Perl instead. After all, the installed base must be growing pretty well. Without realizing it, a lot of people have installed a recent (I think most are <code>5.24</code>) MinGW based Perl distribution on their Windows machines when they installed Git for Windows either alongside their Visual Studio or VS Code installation. They can open the gates to pure Perl magic with minimal effort. I am going to guess that there are more Windows machines with a decent Perl distribution installed now than ever.</p>
<p>That aside, during one of these conversations, I was asked about what Perl 6 is to Perl. For example, “is it an upgrade?” “Does it run existing Perl code?”</p>
<p>Earlier, I had been suckered into accepting the <a href="https://www.perl.com/article/an-open-letter-to-the-perl-community/">“sister language”</a> narrative. Then, briefly, <a href="https://www.reddit.com/r/perl/comments/7s41yp/perl_weekly_issue_339_perl_vs_perl/dt2kghx/">I thought Perl 6 was trying to be the Borg</a>. Just a couple of days ago, it dawned on me: Perl is <a href="https://en.wikipedia.org/wiki/Omar_Khayyam">Omar Khayyam</a> and Perl 6 is <a href="https://en.wikipedia.org/wiki/Hassan-i_Sabbah">Hassan Sabbah</a>. If you haven’t heard these names before, the legends have been fictionalized in Western media in books such as <a href="https://amzn.to/2FEgxDz">Samarkand</a>, <a href="https://amzn.to/2Hw0ZlC">Alamut</a>, and in the movie <a href="https://amzn.to/2FEhnAd">The Keeper: The Legend of Omar Khayyam</a>.</p>
<p>This may not make sense to anyone else, but I am happy to finally have been able to categorize them.</p>
<p>You can discuss this post on <a href="https://redd.it/84arjd">r/perl</a>.</p>
<p>PS: Understanding the juxtaposition of Khayyam and Sabbah will improve your understanding of the world today as well. It might even help you understand <a href="https://en.wikipedia.org/wiki/Rumi">Mevlânâ</a>.</p>
<p>PPS: I am not a fan of Fitzgerald’s translation. Don’t get me wrong, it’s great work, and has the exact right ring <em><strong>IF</strong></em> you can get yourself in a particularly heavy <em>Victorian</em> mood, but these days the original meaning is easily lost to a reader who is not familiar with Victorian literature. Luckily, the Internet Archive provides <a href="https://en.wikipedia.org/wiki/Edward_Heron-Allen">Edward Heron-Allen</a>’s <a href="https://archive.org/details/in.ernet.dli.2015.173075">translation of Khayyam’s Rubaiyat</a>:</p>
<div style="background-color:#bacfba;padding:.5em"><p style="width:80%;margin:2em auto"><span style="margin-left:40%">148.</span><br><br>In a thousand places on the road I walk, Thou placest snares.<br>
Thou sayest, “I will catch thee if thou placest step <i>in them</i>”;<br>
in no smallest thing is the world independent of Thee,<br>
Thou orderest all things, and callest me rebellious.</p></div>
<p>PPPS: If you really want to learn Perl 6, you should take a look at <a href="https://www.learningperl6.com/">brian’s book</a>. I have been following him work his way through the various <a href="https://stackoverflow.com/search?tab=newest&q=%5bperl6%5d%20user%3a2766176%20is%3aquestion">gotchas and design flaws</a> of Perl 6 so you don’t have to. But, I am afraid Perl 6 is too full of those to gain widespread adoption.</p>
</div>
</article>
Sinan UnurAnother look at stock market behavior around "change" presidential elections in the U.S.tag:www.nu42.com,2017-04-12:/2017/04/look-at-sp500-since-election.html2017-04-12T19:30:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Another look at stock market behavior around "change" presidential elections in the U.S.</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-04-12T19:30:00+00:00" class="dt-published">April 12, 2017</time></h3>
</header>
</div>
<div class="article-content"><blockquote>
<p>Nothing discussed herein should be taken as investment advice or as a recommendation regarding any particular investment vehicle or course of action. All statements herein are statements of subjective opinion and for information and entertainment purposes only. Seek a duly licensed professional for investment advice.</p>
</blockquote>
<p>In January, prompted by stories about <a href="http://www.marketwatch.com/story/how-long-post-election-rallies-last-after-inauguration-day-in-one-sp-chart-2017-01-13">selling the inauguration</a>, I <a href="https://www.nu42.com/2017/01/sell-inauguration.html">looked at</a> <a href="https://www.nu42.com/2017/01/stock-market-presidential-election.html">stock market performance</a> around change elections in the U.S. Following the MarketWatch article, I used the <a href="https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC">S&P 500</a> as a broad stock market indicator, and looked at its performance over the period 100 trading days before and after each election. Of course, that interval is rather arbitrary, but, then, so was the interval of 54 days before to 66 days after the election used in the original article: I tend to consider nice round arbitrary numbers less arbitrary than oddly specific arbitrary numbers <code>;-)</code></p>
<p>Due to a number of recent distractions, I totally missed the fact that it has now been 105 trading days since the 2016 presidential election. In my previous posts, I noted that, with the data available up to that point, the stock market reaction to Trump’s election looked most similar to that following Clinton’s election in 1992. But, the future had not been written at that point, so we did not know how it would compare to the 7.4% gain after 100 days of the Clinton administration. Below, I am going to go over some Perl and SQL to fetch and transform the S&P 500 data you can obtain from Yahoo! to do your analysis, but, if you are curious, at the end of the 100th trading day following Trump’s election, the S&P 500 was up by 10.31% relative to its level on election day, compared with the 7.35% gain over the same period following Clinton’s election.</p>
<p>In fact, among change elections (those in 1952, 1960, 1968, 1976, 1980, 1992, 2000, 2008, and 2016), the performance of S&P 500 100 trading days after the election was second only to its performance over a comparable period following Kennedy’s election: On April 4, 1961, S&P 500 closed at about 66 points, compared to about 55 points around election day (markets were closed on November 8, 1960).</p>
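<p>The percentages quoted here are simple relative changes in the index level. As a quick sketch, using the rounded Kennedy-era levels mentioned above (the helper name <code>pct_change</code> is mine, purely for illustration):</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use strict;
use warnings;

# Percentage change from one index level to another
sub pct_change {
    my ($from, $to) = @_;
    return 100 * ($to - $from) / $from;
}

# Approximate levels quoted in the text: about 55 around election day
# in 1960, about 66 on April 4, 1961
printf "%.1f%%\n", pct_change(55, 66);    # prints 20.0%</code></pre></div>
<p>So, with those rounded levels, the gain following Kennedy’s election works out to about 20%.</p>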
<p>Once again, let me stress that the future is yet to be written. I don’t put any more stock in this kind of “analysis” than I put in <a href="http://www.turkishstylegroundcoffee.com/turkish-coffee-reading/">Turkish coffee reading</a>. Looking forward, there are plenty of reasons to be cautious: There is ample uncertainty due to the looming threat of war and terrorism and political volatility. After all, S&P 500 did not perform well over March, and today looks like it’s going to be another down day. Where we go from here will depend on what happens in the future, not what happened in 1992 or 1952.</p>
<p>With that aside, let’s look at some code.</p>
<p>This time, I decided to leverage <a href="https://www.sqlite.org/">SQLite</a> to produce the output tables I wanted instead of munging Perl arrays. I could have written the whole analysis using a mix of SQLite directives and SQL, but I dove into Perl first.</p>
<p>The first order of business was getting rid of the step of manually copying the URL for the relevant date ranges of S&P 500 data on Yahoo! Finance. So, I wrote the following simple function so I did not have to remember which single letter in the URL corresponded to which parameter:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Assumes $start and $end contain YMD dates with no separators</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Yahoo! Finance expects zero-based month number</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">sp500_url</span> ( <span class="dt">$start</span>, <span class="dt">$end</span> ) {</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> const <span class="kw">state</span> <span class="dt">$URL_TMPL</span> => <span class="ot">'</span><span class="ss">https://chart.finance.yahoo.com/table.csv?s=^GSPC&c=%d;a=%d&b=%d&f=%d&d=%d&e=%d&g=d&ignore=.csv</span><span class="ot">'</span>;</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@start</span> = <span class="fu">unpack</span> <span class="ot">'</span><span class="ss">A4 A2 A2</span><span class="ot">'</span>, <span class="dt">$start</span>;</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@end</span> = <span class="fu">unpack</span> <span class="ot">'</span><span class="ss">A4 A2 A2</span><span class="ot">'</span>, <span class="dt">$end</span>;</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> -- <span class="wa">$_</span>->[<span class="dv">1</span>] <span class="kw">for</span> \(<span class="dt">@start</span>, <span class="dt">@end</span>);</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">sprintf</span> <span class="dt">$URL_TMPL</span>, <span class="dt">@start</span>, <span class="dt">@end</span>;</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>A few notes: I am using my favorite constant handling module <a href="https://metacpan.org/pod/Const::Fast">Const::Fast</a>, the <a href="https://metacpan.org/pod/distribution/perl/pod/perl5100delta.pod#state()-variables">state variables</a> feature introduced in Perl 5.10, and the <a href="https://www.effectiveperlprogramming.com/2015/04/use-v5-20-subroutine-signatures/">subroutine signatures</a> feature introduced in Perl 5.20. In a sense, these uses are all gratuitous in that it is easy to write Perl without depending on features beyond what exists in 5.8, but why not take advantage of niceties if there are no pressing backward compatibility constraints?</p>
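<p>In case the date handling in <code>sp500_url</code> looks cryptic, here is a small standalone sketch of just that step. The helper name <code>split_ymd_zero_based</code> is made up for this illustration; it splits a separator-free YMD string and makes the month zero-based, the way the function above does before filling in the URL template:</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: split 'YYYYMMDD' into year, zero-based month, day
sub split_ymd_zero_based {
    my ($ymd) = @_;

    my @parts = unpack 'A4 A2 A2', $ymd;    # ('2016', '11', '08')
    --$parts[1];                            # November: 11 becomes 10

    return map { $_ + 0 } @parts;           # numify: '08' becomes 8
}

my @election_day = split_ymd_zero_based('20161108');
print "@election_day\n";    # prints 2016 10 8</code></pre></div>
<p>In <code>sp500_url</code>, the <code>%d</code> conversions in the template take care of turning strings like <code>'08'</code> into plain integers, so no explicit numification is needed there.</p>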
<p>Similarly, I wanted to make sure the filenames I used adhered to a simple convention, so, I wrote:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">sp500_filename</span> ( <span class="dt">$start</span>, <span class="dt">$end</span>, <span class="dt">$ext</span> ) {</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">sprintf</span> <span class="ot">'</span><span class="ss">sp500-%s-%s.%s</span><span class="ot">'</span>, <span class="dt">$start</span>, <span class="dt">$end</span>, <span class="dt">$ext</span>;</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>This way, I only have to change one thing if I want to name files differently.</p>
<p>To download the data, I used <a href="https://metacpan.org/pod/HTTP::Tiny">HTTP::Tiny</a>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">download</span> ( <span class="dt">$url</span>, <span class="dt">$file</span> ) {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$response</span> = <span class="fu">HTTP::Tiny</span>->new( verify_SSL => <span class="dv">1</span> )->mirror( <span class="dt">$url</span> => <span class="dt">$file</span> );</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">unless</span> ( <span class="dt">$response</span>->{success} ) {</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> croak <span class="ot">"</span><span class="st">Failed to download from '</span><span class="dt">$url</span><span class="ot">'</span><span class="st"> and save in '</span><span class="dt">$file</span><span class="ot">'"</span>;</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The main routine is rather straightforward. You can run the script with a start and end date for the daily S&P 500 data table you want to download from Yahoo! Finance. If those dates are not specified, the script defaults to downloading the entire series. I manually looked up the date of each presidential election on Wikipedia. Somehow, that was quicker than writing a script to do it for me.</p>
<p>One complication has to do with the fact that markets were closed on election days in 1952, 1960, 1968, 1976, and 1980, and open on election days in 1992, 2000, 2008, and 2016, but I decided to ignore that as there is not a huge difference in the levels of the index on the days before and after the election in those years:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>sqlite<span class="op">></span> <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sp500 <span class="kw">where</span> dt <span class="kw">between</span> <span class="st">'1952-11-03'</span> <span class="kw">and</span> <span class="st">'1952-11-05'</span>;</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="dv">1952</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">03</span>|<span class="fl">24.6</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="dv">1952</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">05</span>|<span class="fl">24.67</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>sqlite<span class="op">></span> <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sp500 <span class="kw">where</span> dt <span class="kw">between</span> <span class="st">'1960-11-07'</span> <span class="kw">and</span> <span class="st">'1960-11-09'</span>;</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="dv">1960</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">07</span>|<span class="fl">55.110001</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="dv">1960</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">09</span>|<span class="fl">55.349998</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>sqlite<span class="op">></span> <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sp500 <span class="kw">where</span> dt <span class="kw">between</span> <span class="st">'1968-11-04'</span> <span class="kw">and</span> <span class="st">'1968-11-06'</span>;</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="dv">1968</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">04</span>|<span class="fl">103.099998</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="dv">1968</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">06</span>|<span class="fl">103.269997</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>sqlite<span class="op">></span> <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sp500 <span class="kw">where</span> dt <span class="kw">between</span> <span class="st">'1976-11-01'</span> <span class="kw">and</span> <span class="st">'1976-11-03'</span>;</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="dv">1976</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">01</span>|<span class="fl">103.099998</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="dv">1976</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">03</span>|<span class="fl">101.919998</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>sqlite<span class="op">></span> <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sp500 <span class="kw">where</span> dt <span class="kw">between</span> <span class="st">'1980-11-03'</span> <span class="kw">and</span> <span class="st">'1980-11-05'</span>;</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a><span class="dv">1980</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">03</span>|<span class="fl">129.039993</span></span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a><span class="dv">1980</span><span class="op">-</span><span class="dv">11</span><span class="op">-</span><span class="dv">05</span>|<span class="fl">131.330002</span></span></code></pre></div>
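<p>To put a number on “not a huge difference”, here is a minimal sketch that computes the percentage change across each closed election day from the closing values in the query output above. The helper name <code>pct_change</code> is made up for this illustration and is not part of the script.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical helper: percentage change from $before to $after.
sub pct_change ( $before, $after ) {
    return 100 * ( $after / $before - 1 );
}

# Closing values on the trading days either side of each election day,
# copied from the sqlite output above: [ year, day before, day after ].
my @closes = (
    [ 1952,  24.6,        24.67      ],
    [ 1960,  55.110001,   55.349998  ],
    [ 1968, 103.099998,  103.269997  ],
    [ 1976, 103.099998,  101.919998  ],
    [ 1980, 129.039993,  131.330002  ],
);

for my $row ( @closes ) {
    my ( $year, $before, $after ) = $row->@*;
    printf "%d: %+.2f%%\n", $year, pct_change( $before, $after );
}</code></pre></div>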
<p>As you can see below, the routine <code>create_election_tables</code> takes ‘days before’ and ‘days after’ arguments. I decided that ‘days before’ should include election day if markets were open. I did this because I did not want to think too much. Looking at it now, omitting election day from all analyses might have made more sense. But then you have to decide which day to use as the base. Relative to the day before election day, S&P 500 would be up 10.73% over the 100 trading days following the 2016 election. Relative to the day following the election, it would be up 9.1% over the same period.</p>
<p>It makes more sense <em>a priori</em> to look at performance relative to the point when the winner of the election had not yet been revealed, but then one has to think about whether to ignore everything that happened between November 7, 2000 and December 12, 2000, and that would have been incompatible with my desire to avoid thinking.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co">=for PURPOSE</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="co">Download S&P 500 data from Yahoo!, and produce an output file whose</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="co">rows are trading days relative to each president's election, and whose</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="co">columns are the values of S&P 500 relative to its value on election</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="co">day.</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="co">=cut</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v5.<span class="dv">24</span>; <span class="co"># why not?!</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> feature <span class="ot">'</span><span class="ss">signatures</span><span class="ot">'</span>;</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a><span class="fu">no</span> <span class="kw">warnings</span> <span class="ot">'</span><span class="ss">experimental::signatures</span><span class="ot">'</span>;</span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> autouse Carp => <span class="ot">'</span><span class="ss">croak</span><span class="ot">'</span>;</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> autouse <span class="ot">'</span><span class="ss">YAML::XS</span><span class="ot">'</span> => <span class="ot">'</span><span class="ss">Dump</span><span class="ot">'</span>;</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">Const::Fast</span>;</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> DBI;</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">HTTP::Tiny</span>;</span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a>run( <span class="wa">@ARGV</span> );</span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">run</span> {</span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a> const <span class="kw">my</span> <span class="dt">@ELECTIONS</span> => (</span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a> [ eisenhower => <span class="ot">'</span><span class="ss">1952-11-04</span><span class="ot">'</span> ],</span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a> [ kennedy => <span class="ot">'</span><span class="ss">1960-11-08</span><span class="ot">'</span> ],</span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a> [ nixon => <span class="ot">'</span><span class="ss">1968-11-05</span><span class="ot">'</span> ],</span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a> [ carter => <span class="ot">'</span><span class="ss">1976-11-02</span><span class="ot">'</span> ],</span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a> [ reagan => <span class="ot">'</span><span class="ss">1980-11-04</span><span class="ot">'</span> ],</span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a> [ clinton => <span class="ot">'</span><span class="ss">1992-11-03</span><span class="ot">'</span> ],</span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a> [ bush => <span class="ot">'</span><span class="ss">2000-11-07</span><span class="ot">'</span> ],</span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a> [ obama => <span class="ot">'</span><span class="ss">2008-11-04</span><span class="ot">'</span> ],</span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a> [ trump => <span class="ot">'</span><span class="ss">2016-11-08</span><span class="ot">'</span> ],</span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a> const <span class="kw">my</span> <span class="dt">%CONFIG</span> => (</span>
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a> colswanted => [<span class="dv">0</span>, -<span class="dv">1</span>],</span>
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a> sp500_start => <span class="ot">'</span><span class="ss">19500103</span><span class="ot">'</span>,</span>
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$start</span> = <span class="wa">$_</span>[<span class="dv">0</span>] ? <span class="wa">$_</span>[<span class="dv">0</span>] : <span class="dt">$CONFIG</span>{sp500_start};</span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$end</span> = <span class="wa">$_</span>[<span class="dv">1</span>];</span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> <span class="kw">unless</span> ( <span class="dt">$end</span> ) {</span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> <span class="fu">require</span> DateTime;</span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> <span class="dt">$end</span> = DateTime->now(time_zone => <span class="ot">'</span><span class="ss">America/New_York</span><span class="ot">'</span>)->ymd(<span class="ot">''</span>);</span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$db</span> = sp500_filename( <span class="dt">$start</span>, <span class="dt">$end</span>, <span class="ot">'</span><span class="ss">db</span><span class="ot">'</span>);</span>
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$file</span> = sp500_filename( <span class="dt">$start</span>, <span class="dt">$end</span>, <span class="ot">'</span><span class="ss">csv</span><span class="ot">'</span> );</span>
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$url</span> = sp500_url( <span class="dt">$start</span>, <span class="dt">$end</span> );</span>
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a> <span class="ot">-e</span> <span class="dt">$file</span> <span class="ot">or</span> download( <span class="dt">$url</span>, <span class="dt">$file</span> );</span>
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$dbh</span> = <span class="fu">import</span>( <span class="dt">$file</span>, <span class="dt">$db</span>, <span class="dt">$CONFIG</span>{colswanted} );</span>
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a> create_election_tables(</span>
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>,</span>
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a> \<span class="dt">@ELECTIONS</span>,</span>
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a> <span class="dv">101</span>,</span>
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a> <span class="dv">100</span></span>
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a> export_analysis_table(</span>
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>,</span>
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a> \<span class="dt">@ELECTIONS</span>,</span>
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a> <span class="dv">101</span>,</span>
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a> <span class="ot">'</span><span class="ss">sp500-elections.tsv</span><span class="ot">'</span>,</span>
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-73"><a href="#cb5-73" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-74"><a href="#cb5-74" aria-hidden="true" tabindex="-1"></a> const <span class="kw">my</span> <span class="dt">@ELECTIONS_PRE16</span> => <span class="fu">grep</span> <span class="wa">$_</span>->[<span class="dv">0</span>] <span class="ot">ne</span> <span class="ot">'</span><span class="ss">trump</span><span class="ot">'</span>, <span class="dt">@ELECTIONS</span>;</span>
<span id="cb5-75"><a href="#cb5-75" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-76"><a href="#cb5-76" aria-hidden="true" tabindex="-1"></a> create_election_tables(</span>
<span id="cb5-77"><a href="#cb5-77" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>,</span>
<span id="cb5-78"><a href="#cb5-78" aria-hidden="true" tabindex="-1"></a> \<span class="dt">@ELECTIONS_PRE16</span>,</span>
<span id="cb5-79"><a href="#cb5-79" aria-hidden="true" tabindex="-1"></a> <span class="dv">51</span>,</span>
<span id="cb5-80"><a href="#cb5-80" aria-hidden="true" tabindex="-1"></a> <span class="dv">200</span></span>
<span id="cb5-81"><a href="#cb5-81" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-82"><a href="#cb5-82" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-83"><a href="#cb5-83" aria-hidden="true" tabindex="-1"></a> export_analysis_table(</span>
<span id="cb5-84"><a href="#cb5-84" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>,</span>
<span id="cb5-85"><a href="#cb5-85" aria-hidden="true" tabindex="-1"></a> \<span class="dt">@ELECTIONS_PRE16</span>,</span>
<span id="cb5-86"><a href="#cb5-86" aria-hidden="true" tabindex="-1"></a> <span class="dv">51</span>,</span>
<span id="cb5-87"><a href="#cb5-87" aria-hidden="true" tabindex="-1"></a> <span class="ot">'</span><span class="ss">sp500-electionspre16.tsv</span><span class="ot">'</span>,</span>
<span id="cb5-88"><a href="#cb5-88" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb5-89"><a href="#cb5-89" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-90"><a href="#cb5-90" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">disconnect</span>;</span>
<span id="cb5-91"><a href="#cb5-91" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The <code>import</code> routine is below. It reads the downloaded file and creates a simple table whose primary key is the date column and whose only other column is the adjusted daily close value for S&P 500. The <code>$colswanted</code> argument gives the indexes of the columns we want. Of course, one could make this routine much more generic, but my goal is a small script that records, reproducibly and with minimum fuss, each step taken to create the data tables I want. I use <a href="https://metacpan.org/pod/DBI">DBI</a>, although opening a pipe to <code>sqlite3</code> would have been just as easy and would probably have been faster. I wrote the first thing that popped into my head.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">import</span> ( <span class="dt">$file</span>, <span class="dt">$db</span>, <span class="dt">$colswanted</span> ) {</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$data</span> = read_data( <span class="dt">$file</span>, <span class="dt">$colswanted</span> );</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$dbh</span> = DBI-><span class="fu">connect</span>(<span class="ot">"</span><span class="st">dbi:SQLite:</span><span class="dt">$db</span><span class="ot">"</span>, <span class="fu">undef</span>, <span class="fu">undef</span>,</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> { AutoCommit => <span class="dv">0</span>, RaiseError => <span class="dv">1</span> }</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">do</span>( <span class="ot">q{</span><span class="ss">DROP TABLE IF EXISTS sp500</span><span class="ot">}</span> );</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">do</span>( <span class="ot">q{</span><span class="ss">CREATE TABLE sp500 (</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a><span class="ss"> dt CHAR[10] PRIMARY KEY,</span></span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a><span class="ss"> p REAL NOT NULL</span></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="ss"> )</span><span class="ot">}</span> );</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">commit</span>;</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$sth</span> = <span class="dt">$dbh</span>-><span class="dt">prepare</span>(<span class="ot">q{</span><span class="ss">INSERT INTO sp500(dt, p) VALUES (?, ?)</span><span class="ot">}</span>);</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> <span class="kw">my</span> <span class="dt">$i</span> ( <span class="dv">0</span> .. <span class="wa">$#</span><span class="dt">$data</span> ) {</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">$sth</span>-><span class="dt">bind_param_array</span>(<span class="dt">$i</span> + <span class="dv">1</span>, <span class="dt">$data</span>->[<span class="dt">$i</span>]);</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">$sth</span>-><span class="dt">execute_array</span>({}, <span class="dt">$data</span>-><span class="dt">@</span><span class="ot">*</span>);</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">commit</span>;</span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="dt">$dbh</span>;</span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>My next step was to write <code>create_election_tables</code> whose job is to select <code>$lo</code> days before (and possibly including) election day and <code>$hi</code> days following it. The SQL below looks like an abomination to me, but the multiple levels of <code>SELECT</code> statements were the only way I was able to get the correct output:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">create_election_tables</span> ( <span class="dt">$dbh</span>, <span class="dt">$elections</span>, <span class="dt">$lo</span>, <span class="dt">$hi</span> ) {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> <span class="kw">my</span> <span class="dt">$election</span> ( <span class="dt">$elections</span>-><span class="dt">@</span><span class="ot">*</span> ) {</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> (<span class="dt">$president</span>, <span class="dt">$date</span>) = <span class="dt">$election</span>-><span class="dt">@</span><span class="wa">*;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">do</span>(<span class="ot">"</span><span class="st">DROP TABLE IF EXISTS </span><span class="dt">$president</span><span class="ot">"</span>);</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">commit</span>;</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">do</span>(<span class="fu">sprintf</span>(<span class="ot">q{</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="ss"> CREATE TABLE %s AS</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT * FROM</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="ss"> (</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT * FROM</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="ss"> (</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT dt, p</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="ss"> FROM sp500</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="ss"> WHERE dt <= ?</span></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="ss"> ORDER BY dt DESC</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a><span class="ss"> LIMIT ?</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a><span class="ss"> )</span></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a><span class="ss"> )</span></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a><span class="ss"> UNION ALL</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT * FROM</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a><span class="ss"> (</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT dt, p FROM</span></span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a><span class="ss"> (</span></span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a><span class="ss"> SELECT dt, p</span></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a><span class="ss"> FROM sp500</span></span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a><span class="ss"> WHERE dt > ?</span></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a><span class="ss"> ORDER BY dt</span></span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a><span class="ss"> LIMIT ?</span></span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a><span class="ss"> )</span></span>
<span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a><span class="ss"> )</span></span>
<span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a><span class="ss"> ORDER BY dt</span></span>
<span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a><span class="ss"> </span><span class="ot">}</span>, <span class="dt">$president</span>), {}, <span class="dt">$date</span>, <span class="dt">$lo</span>, <span class="dt">$date</span>, <span class="dt">$hi</span>);</span>
<span id="cb7-35"><a href="#cb7-35" aria-hidden="true" tabindex="-1"></a> <span class="dt">$dbh</span>-><span class="dt">commit</span>;</span>
<span id="cb7-36"><a href="#cb7-36" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-37"><a href="#cb7-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-38"><a href="#cb7-38" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb7-39"><a href="#cb7-39" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
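<p>If the SQL is hard to read, the selection it performs can be sketched in plain Perl. This is only an illustration of the logic, not a replacement for the query: <code>window_around</code> is a hypothetical helper, and it assumes the rows are already sorted by date and that dates are <code>YYYY-MM-DD</code> strings so that string comparison orders them correctly. The two sample rows are the 1952 values shown earlier.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical pure-Perl stand-in for the query. $rows is an array of
# [ date, close ] pairs sorted by date.
sub window_around ( $rows, $date, $lo, $hi ) {
    my @before = grep { $_->[0] le $date } $rows->@*;
    my @after  = grep { $_->[0] gt $date } $rows->@*;

    # Keep the last $lo rows on or before the date
    # (the ORDER BY dt DESC LIMIT ? half of the query).
    splice @before, 0, @before - $lo if @before > $lo;

    # Keep the first $hi rows after the date
    # (the ORDER BY dt LIMIT ? half of the query).
    splice @after, $hi if @after > $hi;

    return [ @before, @after ];
}

# Markets were closed on 1952-11-04, so the last 'before' row is 1952-11-03.
my @rows = (
    [ '1952-11-03', 24.6  ],
    [ '1952-11-05', 24.67 ],
);

my $window = window_around( \@rows, '1952-11-04', 1, 1 );
say join "\t", $_->@* for $window->@*;</code></pre></div>
<p>When the market was open on election day, the <code>le</code> comparison puts that day in the ‘before’ half, which matches the <code>dt <= ?</code> condition in the query.</p>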
<p>After <code>create_election_tables</code> returns, we have a table for each election listed in our <code>@ELECTIONS</code> array. The next step is to join these tables so the trading days around each election line up. Luckily, SQLite gives each of those tables a <code>rowid</code> column with just the property we need:</p>
<blockquote>
<p>Tables created using <code>CREATE TABLE AS</code> are initially populated with the rows of data returned by the <code>SELECT</code> statement. Rows are assigned contiguously ascending <code>rowid</code> values, starting with <code>1</code>, in the order that they are returned by the <code>SELECT</code> statement (see <a href="https://sqlite.org/lang_createtable.html">SQLite documentation for <code>CREATE TABLE</code></a>).</p>
</blockquote>
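<p>Because <code>rowid</code> starts at <code>1</code> and follows date order, the row whose <code>rowid</code> equals <code>$lo</code> is the base trading day, and every other row's offset from it is its <code>rowid</code> minus <code>$lo</code>. Here is a rough pure-Perl sketch of that arithmetic. The closing values are made up, <code>relative_to_base</code> is a hypothetical helper, and the sketch assumes the offset is computed as a plain difference of row numbers.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical helper: $prices holds closing values in date order and
# $base_row is the 1-based position (the rowid) of the base trading day.
# Returns [ offset in trading days, value relative to base ] pairs.
sub relative_to_base ( $prices, $base_row ) {
    my $base = $prices->[ $base_row - 1 ];    # rowid is 1-based

    my @out;
    for my $i ( 0 .. $#$prices ) {
        push @out, [ $i + 1 - $base_row, sprintf( '%.4f', $prices->[$i] / $base ) ];
    }

    return \@out;
}

# Made-up closing values; the third row plays the role of the base day.
my $table = relative_to_base( [ 100, 102, 101, 104, 99 ], 3 );
say join "\t", $_->@* for $table->@*;</code></pre></div>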
<p>The following creates the SQL needed to join the tables created above so that a <code>t</code> column gives the number of trading days relative to the base, and there is a column named after each president whose rows give S&P 500 closing values relative to the base trading day. It fetches the resulting rows, and writes them to an output file so I can import it into another application for further manipulation.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">export_analysis_table</span> ( <span class="dt">$dbh</span>, <span class="dt">$elections</span>, <span class="dt">$lo</span>, <span class="dt">$output_file</span> ) {</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">open</span> <span class="kw">my</span> <span class="dt">$fh</span>, <span class="ot">'</span><span class="ss">></span><span class="ot">'</span>, <span class="dt">$output_file</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Failed to open '</span><span class="dt">$output_file</span><span class="ot">'</span><span class="st"> for writing: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$header</span> = <span class="fu">join</span>(<span class="ot">"</span><span class="ch">\t</span><span class="ot">"</span>, t => <span class="fu">map</span> <span class="wa">$_</span>->[<span class="dv">0</span>], <span class="dt">$elections</span>-><span class="dt">@</span><span class="ot">*</span>);</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">say</span> <span class="dt">$fh</span> <span class="dt">$header</span>;</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$vars</span> = <span class="fu">join</span> <span class="ot">'</span><span class="ss">, </span><span class="ot">'</span>, <span class="fu">map</span> {</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$president</span> = <span class="wa">$_</span>->[<span class="dv">0</span>];</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">sprintf</span>(</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> <span class="ot">'</span><span class="ss">round( %s.p / (select p from %s where rowid = %d), 4 ) as %s</span><span class="ot">'</span>,</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a> (<span class="dt">$president</span>) x <span class="dv">2</span>, <span class="dt">$lo</span>, <span class="dt">$president</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> } <span class="dt">$elections</span>-><span class="dt">@</span><span class="wa">*;</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$tables</span> = <span class="fu">join</span> <span class="ot">q{</span><span class="ss"> JOIN </span><span class="ot">}</span>, <span class="fu">map</span> <span class="wa">$_</span>->[<span class="dv">0</span>], <span class="dt">$elections</span>-><span class="dt">@</span><span class="wa">*;</span></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$cond</span> = <span class="fu">join</span> <span class="ot">q{</span><span class="ss"> AND </span><span class="ot">}</span>, <span class="fu">map</span> <span class="fu">sprintf</span>(</span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a> <span class="ot">'</span><span class="ss">( %s.rowid = %s.rowid )</span><span class="ot">'</span>, <span class="dt">$elections</span>->[<span class="wa">$_</span> - <span class="dv">1</span>][<span class="dv">0</span>], <span class="dt">$elections</span>->[<span class="wa">$_</span>][<span class="dv">0</span>]</span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a> ), <span class="dv">1</span> .. <span class="wa">$#</span><span class="dt">$elections</span>;</span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$sth</span> = <span class="dt">$dbh</span>-><span class="dt">prepare</span>(<span class="ot">qq{</span></span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true" tabindex="-1"></a><span class="st"> SELECT clinton.rowid - </span><span class="dt">$lo</span><span class="st"> AS t, </span><span class="dt">$vars</span></span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true" tabindex="-1"></a> FROM <span class="ot">$</span><span class="st">tables</span></span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true" tabindex="-1"></a><span class="st"> WHERE </span><span class="ot">$</span>cond</span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true" tabindex="-1"></a> });</span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true" tabindex="-1"></a> <span class="dt">$sth</span>-><span class="dt">execute</span>;</span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true" tabindex="-1"></a> <span class="kw">while</span> ( <span class="kw">my</span> <span class="dt">$row</span> = <span class="dt">$sth</span>-><span class="dt">fetch</span> ) {</span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true" tabindex="-1"></a> <span class="fu">say</span> <span class="dt">$fh</span> <span class="fu">join</span>(<span class="ot">"</span><span class="ch">\t</span><span class="ot">"</span>, <span class="dt">$row</span>-><span class="dt">@</span><span class="ot">*</span>);</span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-34"><a href="#cb8-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-35"><a href="#cb8-35" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$fh</span></span>
<span id="cb8-36"><a href="#cb8-36" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Failed to close '</span><span class="dt">$output_file</span><span class="ot">'</span><span class="st">: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb8-37"><a href="#cb8-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-38"><a href="#cb8-38" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb8-39"><a href="#cb8-39" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Note that I am using another favorite syntax improvement that was introduced experimentally with Perl 5.20 and became stable in 5.24: <a href="https://www.effectiveperlprogramming.com/2014/09/use-postfix-dereferencing/">Postfix dereferencing</a>. Once again, writing <code>@$row</code> is not horrible, but <em>postfix dereferencing</em> is really useful when you are doing some deep dereferencing, and I prefer to stick with it unless there is an overriding backward compatibility requirement, or unless I simply forget <code>;-)</code></p>
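<p>To make the difference concrete, here is a minimal sketch that puts postfix and circumfix forms side by side. The <code>$elections</code> and <code>%series</code> structures and all the values in them are made up for illustration; only the syntax is the point.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

# Hypothetical nested structure: each election is [ table_name, election_date ].
my $elections = [
    [ clinton => '1992-11-03' ],
    [ trump   => '2016-11-08' ],
];

# Whole-array dereference: same as @$elections or @{ $elections }.
my @names = map $_->[0], $elections->@*;

# Deep dereference reads left to right instead of inside out.
# The numbers below are invented, not actual index values.
my %series = ( sp500 => { clinton => [ 1.0000, 1.0042, 0.9987 ] } );

my @values = $series{sp500}{clinton}->@*;         # @{ $series{sp500}{clinton} }
my $last   = $series{sp500}{clinton}->$#*;        # $#{ $series{sp500}{clinton} }
my @first2 = $series{sp500}{clinton}->@[ 0, 1 ];  # @{ $series{sp500}{clinton} }[ 0, 1 ]

say "names:      @names";
say "values:     @values";
say "last index: $last";
say "first two:  @first2";</code></pre></div>
<p>The circumfix equivalents in the comments show why the postfix form pays off as the chain of subscripts gets longer: there are no braces to balance around the whole expression.</p>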
<p>Here are a couple of pictures. First, a look at S&P 500 around all the elections under consideration. Note that S&P 500 performance 100 days from the election is bounded from below by its performance during the first days of the Bush and Obama presidencies, and from above by its performance under Kennedy. While the S&P 500 is down since March 1, it is still about 10% above its level on election day. By inauguration day, it was up by about 6.2% relative to election day. As I write this, today is not over and S&P 500 has given up about 2.2% since March 1, but it is still up 3.2% since inauguration day. So, if you had bought into the “sell the inauguration” calls, you would have given up somewhere between 5.5% and 3.2%, corresponding to selling on March 1 and today, respectively.</p>
<div class="thumb"><a href="https://www.nu42.com/2017/04/sp500-elections-all.png"><img src="https://www.nu42.com/2017/04/sp500-elections-all.png" width="500" alt="[ S&P 500 around change elections ]" title="S&P 500 around change elections"></a></div>
<p>And, here is the comparison between S&P 500 around the 1992 election and the 2016 election:</p>
<div class="thumb"><a href="https://www.nu42.com/2017/04/sp500-elections-clinton-trump.png"><img src="https://www.nu42.com/2017/04/sp500-elections-clinton-trump.png" width="500" alt="[ S&P 500 around change elections ]" title="S&P 500 around change elections"></a></div>
<p>PS: You can discuss this post on <a href="https://redd.it/650m7i">r/perl</a></p>
</div>
</article>
Sinan UnurData work is dirty worktag:www.nu42.com,2017-03-29:/2017/03/data-work-dirty-work.html2017-03-29T17:00:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Data work is dirty work</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-03-29T17:00:00+00:00" class="dt-published">March 29, 2017</time></h3>
</header>
</div>
<div class="article-content"><blockquote cite="https://en.wikipedia.org/wiki/Josiah_Stamp,_1st_Baron_Stamp#Quotes"><p> "The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the chowky dar (village watchman in India), who just puts down what he damn pleases."</p>
<p style="text-align:right"><a href="https://en.wikipedia.org/wiki/Josiah_Stamp,_1st_Baron_Stamp#Quotes">Stamp's Law, Sir Josiah Stamp</a><br><small>Quoted often in my Stats classes</small></p>
</blockquote>
<p>The recent popularity of something called “<a title="You're probably not dealing with big data" href="https://en.wikipedia.org/wiki/Data_science">data science</a>” has created an environment where every day I run into some post about some data set and how you can analyze it in a few lines of Python or R or Julia or something else. These posts come with pretty and colorful pictures. They ever so subtly seem to send the message that anyone can become a data “scientist”.</p>
<p>I hate to break it to you, but no matter how good it sounds, the phrase “data science” does not make sense.</p>
<p>The <a href="https://en.oxforddictionaries.com/definition/science">definition of the word science</a> seems straightforward enough:</p>
<blockquote>
<p>The intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment.</p>
</blockquote>
<p>Data are what you generate “through observation and experiment.”</p>
<p><strong>Data are not the objective of science</strong>. Every field adopts certain techniques suitable for the analysis of the types of data studies in such fields generate. For example, most data economists have to work with are observational and hence we have <a href="https://en.wikipedia.org/wiki/Econometrics">econometrics</a>.</p>
<p>Dealing with observational data, organizing them, double-checking them, reshaping them to something suitable for analysis is hard work. In fact, if I had one-tenth the talent of <a href="http://mickens.seas.harvard.edu/wisdom-james-mickens">James Mickens</a>, I would write something like <a href="http://scholar.harvard.edu/files/mickens/files/towashitallaway.pdf">this</a> about data work.</p>
<p>I started working with real, observational data some decades ago. A popular assignment when teaching <a href="https://en.wikipedia.org/wiki/Linear_programming">Linear Programming</a> is to calculate the minimum cost of obtaining a healthy diet (since both the objective function and constraints are linear), assume about 40% of one’s income goes to nutrition, and calculate a “living wage” on the basis of that. To complete the assignment, students are supposed to collect data on prices of main food items available locally etc. It’s a fun way to introduce people to all sorts of issues they should be familiar with if they are going to deal with empirical models without being too overwhelming.</p>
<p>After that, I dealt with some micro- and some macro-economic data. Then, during my senior year, I worked as an analyst at the Central Bank of Turkey where I was responsible for the specification of the labor market component of the annual macroeconometric model of the Turkish economy. Of course, I was not the first person who had dealt with this model. But, I think I was the first person who had decided to check the underlying data.</p>
<p>Up to that point, the underlying data tables had been entered by hand from the publications of the <a href="https://en.wikipedia.org/wiki/State_Planning_Organization_(Turkey)">State Planning Organization</a> and the <em>State Institute of Statistics</em> (<abbr title="Devlet İstatistik Enstitüsü">DİE</abbr>, now <a href="http://www.turkstat.gov.tr/">TÜİK</a>). I decided to double check some past entries, and I realized the numbers we had in our tables did not match the numbers found in some of those publications. That led me to the realization that <a href="http://wrap.warwick.ac.uk/164/">central planning is impossible</a>: Central planning requires timely, extensive, and correct data. On the other hand, keeping one’s job in the state apparatus requires one to please one’s political bosses (one of the reasons I decided not to stick with the lucrative position with the prestigious Research Department of the Central Bank). That means you write a five-year plan that forecasts impressive wage and productivity gains, great increases in employment etc. If, within the horizon of the plan, your political bosses change, past forecasts must be modified to make sure the new administration starts from a lower base so their achievements look more impressive. Or, if important measures are falling behind the rosy forecasts, you may want to make sure unfavorable data are not included in a report right before a crucial debate in the parliament or before a visit by the IMF.</p>
<p>So, I decided to look into the actual data underlying these published reports. It took some doing, but I got access to the original tabulations still stored in a basement at the State Institute of Statistics. Those archived pieces of yellow, low quality paper were moldy. They were falling apart. I gleaned whatever I could from them but that wasn’t enough to come up with a consistent and well defined series for the cost of labor in industry.</p>
<p>The effort was not wasted though: I learned the lesson of <a href="https://en.wikipedia.org/wiki/Josiah_Stamp,_1st_Baron_Stamp#Quotes">Stamp’s Law</a> first hand before I had even heard of it.</p>
<p>By the mid-90s, in the United States, putting together data tables, at least for data sets produced by the <a href="https://www.census.gov/">Census Bureau</a>, <a href="https://www.bls.gov/">Bureau of Labor Statistics</a>, and <a href="https://www.bea.gov/">Bureau of Economic Analysis</a> had become considerably easier. You could actually FTP stuff. Many more data sets were now coming on CD-ROMs and personal computers with CD drives were everywhere. But, intuitively understanding how individual pieces of information ended up as bits and bytes on digital media has been invaluable to me.</p>
<p>These days, the power to tabulate, aggregate, and collate data sets of sizes previously unimaginable is available to pretty much anyone. On the one hand, this democratization of the ability to access and analyze data is a good thing. On the other hand, the ease with which data flow from sources through analysis tools into pretty pictures has created an environment where really hard questions are cast aside in favor of pretty graphs and over-glamourization of applying canned techniques to data which may or may not fit assumptions implicit in those techniques.</p>
<p><a href="http://perlweekly.com/archive/296.html">This Monday’s Perl Weekly</a> featured <a href="https://culturedperl.com/gun-violence-using-perl-to-analyze-publicly-available-data-4697b6d7dc1f">a post by James Keenan</a> on the so-called <a href="http://www.gunviolencearchive.org/">Gun Violence Archive</a> (<abbr title="Gun Violence Archive">GVA</abbr>) published by the <a href="https://www.theguardian.com/world/2017/mar/20/mapping-gun-murders-micro-level-new-data-2015">Guardian</a>. As far as I understand, the data are produced using an <em>attempt</em> at a <a href="http://www.st.nmfs.noaa.gov/recreational-fisheries/Understanding-Estimation/census-vs-sampling">census</a> of all incidents in the United States. An incomplete census may have <a href="https://stats.stackexchange.com/questions/21403/why-is-it-claimed-that-a-sample-is-often-more-accurate-than-a-census">biases</a> that cannot be controlled for using straightforward statistical techniques. Ultimately, the underlying data come from individual cities and counties and their police departments etc. These data are <a href="https://www2.fbi.gov/ucr/ucr_general.html#methodology">voluntarily reported</a> (or not reported, as the case might be) by those units. One cannot assume that <a href="https://en.wikipedia.org/wiki/Missing_data#Types_of_missing_data">missing data</a> are missing at random. So, that’s the first problem, from the get go, with using that data set to do serious analysis.</p>
<p>In his <a href="https://culturedperl.com/gun-violence-using-perl-to-analyze-publicly-available-data-4697b6d7dc1f">post</a>, Jim notices some oddities. For example:</p>
<blockquote>
<p>If we look at our report more closely, we see that there are many large cities where the 2014 Murder Rate was listed as zero.</p>
</blockquote>
<p>The problem is simple. Not all departments report rates. When a rate is not reported, it will be a missing entry in the data set (see also <a href="https://twitter.com/sinan_unur/status/846290680230154240">my comment on Twitter</a> for a screenshot). Clearly, Honolulu reported a positive count of murders, but did not report a murder rate. The value was missing in the original data set <a href="https://www.ucrdatatool.gov/">provided by the FBI</a>, but Guardian’s “data scientists” replaced missing values with zeros in putting together this data set.</p>
<p>So, lesson one in data work: Do not ever trust <a href="https://www.youtube.com/watch?v=1tWLDhJ6mjQ">regurgitated data</a>, no matter how much you trust the entity doing the processing. We are not little birds.</p>
<p>If you use FBI’s oddly restricted data access tool (which means you’d have to write a bot to gather all available data, or, order a CD from FBI’s CJIS division if you can figure out whom to contact), you find out that Honolulu’s reporting has become sparser since 2011 (see <a href="honolulu-crime-ucr-20170329.csv">CSV file produced by the extraction tool</a>. It is horribly formatted, but I left it untouched for your reference). The table below shows the number of months in each year in which Honolulu Police Department submitted data to the UCR:</p>
<pre class="text"><code>Year Months
1985 12
1986 12
1987 12
1988 12
1990 12
1991 12
1992 12
1993 12
1994 12
1995 12
1996 12
1997 12
1998 12
1999 12
2000 12
2001 12
2002 12
2003 12
2004 12
2005 12
2006 12
2007 12
2008 12
2009 12
2010 12
2011 5
2012 3
2013 6
2014 4</code></pre>
<p>So, immediately you run into the question of what determines which months were reported and <em><strong>why</strong></em>. I can’t answer that, so we’ll move on, but there can be no <em>analysis</em> without a satisfactory answer to that question. A cursory look suggests that the sparse reporting regime started with the <a href="https://en.wikipedia.org/wiki/Peter_Carlisle">previous mayor</a> of Honolulu and continued with the current one. Also, it seems to have started right after the <a href="http://www.honolulupd.org/department/index.php?page=chief10">former police chief</a> took the position. He <a href="http://khon2.com/2017/01/06/police-commission-decides-on-fate-on-chief-kealoha/">retired</a> early this year. We’ll have to wait and see if data for 2015, 2016, and 2017 will be more complete. The important thing to keep in mind is that if the probability of a month’s data not being reported is not independent of the phenomena we want to analyze, then straightforward inference or comparisons are really difficult. We first need to model the process of sparse reporting.</p>
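<p>A small sketch shows why this matters. The monthly counts below are entirely made up, and the naive “scale the reported months up to twelve” estimator is my assumption about what a careless analysis might do, not anything the FBI or Guardian documents. The same agency-year gives very different annual figures depending on <em>which</em> four months happen to be reported.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;
use List::Util qw( sum0 );

# Made-up monthly counts for one agency-year (January .. December); not real data.
my @monthly = ( 1, 2, 1, 3, 2, 6, 7, 8, 3, 2, 1, 2 );

# Naive estimator: scale the total of the reported months up to a full year.
# $reported is an array ref of 0-based month indexes.
sub annualize {
    my ( $counts, $reported ) = @_;
    my $n = scalar $reported->@*;
    return undef unless $n;
    return 12 * sum0( $counts->@[ $reported->@* ] ) / $n;
}

my @quiet  = ( 0, 2, 9, 10 );    # only low-count months were reported
my @spread = ( 1, 4, 7, 10 );    # reported months spread over the year

printf "%-32s %5.1f\n", 'true annual total',            sum0 @monthly;
printf "%-32s %5.1f\n", 'estimate from quiet months',   annualize( \@monthly, \@quiet );
printf "%-32s %5.1f\n", 'estimate from spread months',  annualize( \@monthly, \@spread );</code></pre></div>
<p>With these invented numbers the true total is 38, the evenly spread months give 39, and the quiet months give 15. Nothing in the reported data alone tells you which of those situations you are in.</p>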
<p>Guardian also provides an incident database which seems to have been derived from FBI’s <a href="https://ucr.fbi.gov/nibrs-overview">NIBRS</a>. There is a problem though. The Guardian article says this:</p>
<blockquote>
<p>Incident-level data – each row includes a gun homicide incident.</p>
</blockquote>
<p>Well, try this:</p>
<pre class="text"><code>$ curl https://interactive.guim.co.uk/2017/feb/09/gva-data/gva_release_2015_raw_incidents.csv | wc -l
...
0</code></pre>
<p>Yes, there are <em><strong>zero</strong></em> rows in the incident file!</p>
<p>The file is not empty though. No, it is a single 2,302,651 byte string with no newlines in it! Seriously?!</p>
<p>What is going on here?</p>
<p>Let’s look at the underlying bytes:</p>
<pre class="text"><code>$ xxd gva_release_2015_raw_incidents.csv | head -n 25
00000000: 6776 615f 6964 2c69 6e63 6964 656e 745f gva_id,incident_
00000010: 6461 7465 2c73 7461 7465 2c63 6974 795f date,state,city_
00000020: 6f72 5f63 6f75 6e74 795f 6775 6172 6469 or_county_guardi
00000030: 616e 5f63 6f72 7265 6374 6564 2c63 6974 an_corrected,cit
00000040: 795f 6f72 5f63 6f75 6e74 795f 6f72 6967 y_or_county_orig
00000050: 696e 616c 5f67 7661 2c61 6464 7265 7373 inal_gva,address
00000060: 2c6e 756d 5f6b 696c 6c65 642c 6e75 6d5f ,num_killed,num_
00000070: 696e 6a75 7265 642c 6c61 7469 7475 6465 injured,latitude
00000080: 2c6c 6f6e 6769 7475 6465 2c67 7661 5f75 ,longitude,gva_u
00000090: 726c 2c66 6970 735f 6675 6c6c 2c66 6970 rl,fips_full,fip
000000a0: 735f 7374 6174 652c 6669 7073 5f63 6f75 s_state,fips_cou
000000b0: 6e74 792c 6669 7073 5f74 7261 6374 2c66 nty,fips_tract,f
000000c0: 6970 735f 626c 6f63 6b2c 6669 7073 5f66 ips_block,fips_f
000000d0: 756c 6c5f 7472 6163 742c 7472 6163 745f ull_tract,tract_
000000e0: 6c61 6e64 5f73 7175 6172 655f 6d69 6c65 land_square_mile
000000f0: 732c 7472 6163 745f 7761 7465 725f 7371 s,tract_water_sq
00000100: 7561 7265 5f6d 696c 6573 0d33 3736 3435 uare_miles.37645
00000110: 372c 372f 3136 2f31 352c 416c 6162 616d 7,7/16/15,Alabam
00000120: 612c 416e 6461 6c75 7369 612c 416e 6461 a,Andalusia,Anda
00000130: 6c75 7369 612c 4c65 6f6e 2057 6967 6769 lusia,Leon Wiggi
00000140: 6e73 2052 6f61 642c 312c 302c 3331 2e32 ns Road,1,0,31.2
00000150: 3631 362c 2d38 362e 3337 3639 2c68 7474 616,-86.3769,htt
00000160: 703a 2f2f 7777 772e 6775 6e76 696f 6c65 p://www.gunviole
00000170: 6e63 6561 7263 6869 7665 2e6f 7267 2f69 ncearchive.org/i
00000180: 6e63 6964 656e 742f 3337 3634 3537 2c30 ncident/376457,0</code></pre>
<p>The header ends at byte 266 where there is a single solitary carriage return, <code>0x0d</code>:</p>
<blockquote>
<pre><code>00000100: 7561 7265 5f6d 696c 6573 *0d*33 3736 3435 uare_miles.37645</code></pre>
</blockquote>
<p>In fact, this file uses carriage returns for line endings. Curious. Either Guardian did all this work on a Macintosh, or, more likely, someone tried to replace CRLFs with LFs but instead of using a capable program such as <code>dos2unix</code>, used a one-liner incorrectly.</p>
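<p>Rather than eyeballing a hex dump, you can count the line-ending styles directly. The following is a minimal sketch; the helper names <code>count_line_endings</code> and <code>slurp_raw</code> are mine, and the demonstration runs on a short in-memory string that mimics the start of the file rather than on the file itself.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

# Count CRLF pairs, bare CRs, and bare LFs in a string of raw bytes.
# The alternation tries CRLF first so a pair is never counted twice.
sub count_line_endings {
    my ($bytes) = @_;
    my %n = ( crlf => 0, cr => 0, lf => 0 );
    while ( $bytes =~ /(\r\n|\r|\n)/g ) {
        my $eol = $1;
        $n{ $eol eq "\r\n" ? 'crlf' : $eol eq "\r" ? 'cr' : 'lf' }++;
    }
    return \%n;
}

# Read a whole file in binary mode so no I/O layer translates line endings.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path
        or die "Cannot open '$path': $!";
    local $/;
    my $bytes = readline $fh;
    close $fh
        or die "Cannot close '$path': $!";
    return $bytes;
}

# In-memory sample with bare carriage returns, like the incidents file.
my $sample = "gva_id,incident_date\r376457,7/16/15\r";
my $n      = count_line_endings($sample);
say "CRLF: $n->{crlf}  CR: $n->{cr}  LF: $n->{lf}";</code></pre></div>
<p>For a real file, call <code>count_line_endings( slurp_raw($path) )</code>. A file where only the <code>CR</code> count is positive has the problem described above.</p>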
<p>Let’s fix that:</p>
<pre class="text"><code>$ perl -pi.bak -e "s/\R/\n/g" gva_release_2015_raw_incidents.csv
$ head gva_release_2015_raw_incidents.csv
gva_id,incident_date,state,city_or_county_guardian_corrected,city_or_county_original_gva,address,num_killed,num_injured,latitude,longitude,gva_url,fips_full,fips_state,fips_county,fips_tract,fips_block,fips_full_tract,tract_land_square_miles,tract_water_square_miles
376457,7/16/15,Alabama,Andalusia,Andalusia,Leon Wiggins Road,1,0,31.2616,-86.3769,http://www.gunviolencearchive.org/incident/376457,010399623003016,01,039,962300,3016,01039962300,107.68,1.04
...</code></pre>
<p>Still, I would rather try to get source data from the FBI for anything serious. I am not going to cross check each individual record. Given these amateur hour mistakes by Guardian’s “data science” team, I don’t have much trust in the rest of the data set (replacing missing values with zeros is <em>extremely</em> problematic).</p>
<p>I am going to close with some of my own recommendations:</p>
<h3 id="make-sure-all-steps-in-your-analysis-including-data-retrieval-are-replicable">Make sure all steps in your analysis, including data retrieval, are replicable</h3>
<p>Don’t issue individual command lines in a console. If possible, put raw data files in version control, so you notice changes when they occur. Write a script to retrieve the data file. In this particular case, something like this should work:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v5.<span class="dv">24</span>; <span class="co"># why not?!</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">Const::Fast</span>;</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">File::Basename</span> <span class="ot">qw(</span> basename <span class="ot">)</span>;</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">HTTP::Tiny</span>;</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">@URI</span> => <span class="ot">qw(</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> https://interactive.guim.co.uk/2017/feb/09/gva-data/gva_release_2015_raw_incidents.csv</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> https://interactive.guim.co.uk/2017/feb/09/gva-data/gva_release_2015_grouped_by_city_and_ranked.csv</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> https://interactive.guim.co.uk/2017/feb/09/gva-data/gva_release_2015_grouped_by_tract.csv</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> https://interactive.guim.co.uk/2017/feb/09/gva-data/UCR-1985-2015.csv</span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a><span class="ot">)</span>;</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$http</span> = <span class="fu">HTTP::Tiny</span>->new;</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a><span class="kw">for</span> <span class="kw">my</span> <span class="dt">$uri</span> ( <span class="dt">@URI</span> ) {</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$file</span> = basename <span class="dt">$uri</span>;</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$response</span> = <span class="dt">$http</span>-><span class="dt">mirror</span>( <span class="dt">$uri</span> => <span class="dt">$file</span> );</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a> <span class="kw">unless</span> ( <span class="dt">$response</span>->{success} ) {</span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a> <span class="fu">warn</span> <span class="ot">"</span><span class="st">Problem fetching '</span><span class="dt">$uri</span><span class="ot">'</span><span class="st">: </span><span class="dt">$response</span>-><span class="st">{status} </span><span class="dt">$response</span>-><span class="st">{reason}</span><span class="ch">\n</span><span class="ot">"</span>;</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>If the source files are too large for source control, have the download script generate hashes and save them in a file. Then, you can track that file.</p>
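<p>Here is one way that could look, as a sketch using the core <code>Digest::SHA</code> module. The manifest name <code>SHA256SUMS</code> and the helper names are my choices; the “digest, two spaces, file name” layout is meant to match what <code>sha256sum</code> writes, so the manifest should also be checkable with <code>sha256sum -c</code>.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;
use Digest::SHA;

# Return the hex SHA-256 digest of a file, read in binary mode.
sub sha256_of {
    my ($path) = @_;
    return Digest::SHA->new(256)->addfile( $path, 'b' )->hexdigest;
}

# Write one "digest  filename" line per file, sorted by name so that the
# manifest only changes when a file's contents change.
sub write_manifest {
    my ( $manifest, @files ) = @_;
    open my $out, '>', $manifest
        or die "Cannot open '$manifest': $!";
    for my $file ( sort @files ) {
        say {$out} sha256_of($file), '  ', $file;
    }
    close $out
        or die "Cannot close '$manifest': $!";
    return;
}

# Hash only the files that are actually present in the current directory.
my @downloaded = grep { -f $_ } qw(
    gva_release_2015_raw_incidents.csv
    gva_release_2015_grouped_by_city_and_ranked.csv
    gva_release_2015_grouped_by_tract.csv
    UCR-1985-2015.csv
);

write_manifest( 'SHA256SUMS', @downloaded ) if @downloaded;</code></pre></div>
<p>Commit the manifest instead of the data. A diff in that file after a fresh download tells you the publisher changed something.</p>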
<h3 id="when-you-download-files-run-wc--l-on-them">When you download files, run <code>wc -l</code> on them</h3>
<p>It is important to ensure that the number of records in your extract is within the ballpark of your expectations.</p>
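<p>If you want the retrieval script to do this check for you, a sketch along the following lines may help. Unlike <code>wc -l</code>, it treats bare carriage returns as line endings too. It assumes there is one header line and that no quoted field contains an embedded line break; the bounds passed to <code>check_count</code> are placeholders you would pick from what you know about the source.</p>
<div class="sourceCode"><pre class="sourceCode perl"><code class="sourceCode perl">use v5.24;
use warnings;

# Count data records in CSV text: non-empty lines minus one header line.
# \R matches CRLF, bare CR, and bare LF alike.
sub count_records {
    my ($text) = @_;
    my @lines = grep { length } split /\R/, $text;
    return @lines ? @lines - 1 : 0;
}

# True when the count falls inside a range chosen before looking at the data.
sub check_count {
    my ( $count, $min, $max ) = @_;
    return $count >= $min && $count <= $max;
}

# Made-up sample with bare carriage returns; wc -l would report 0 lines here.
my $text = "id,state\r1,Alabama\r2,Alaska\r";
my $n    = count_records($text);

say "$n records";
warn "Unexpected record count: $n\n" unless check_count( $n, 1, 100 );</code></pre></div>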
<h3 id="dont-parse-the-source-data-repeatedly">Don’t parse the source data repeatedly</h3>
<p>Instead, write a script to put the data in a database. I prefer <a href="https://www.sqlite.org/">SQLite</a> at this point. SQL is easier than custom programs in any programming language for <em>ad hoc</em> inspection of a data set.</p>
<p>For this purpose, I find it handy to keep around an <code>sqlite3</code> binary which is compiled with my favorite options. If you are using <code>gcc</code>, make sure you take advantage of <code>-O3 -march=native</code>. With Visual C, I like <code>/Ox /favor:INTEL64</code>. If your CPU supports it, also use <code>/arch:AVX2</code>.</p>
<p>Note that both <code>gva_release_2015_grouped_by_city_and_ranked.csv</code> and <code>gva_release_2015_grouped_by_tract.csv</code> also suffer from having bare carriage returns (<code>0x0d</code> aka Macintosh line endings), so make sure to fix that before running this step.</p>
<p>After correcting the EOL problems, you can import all the tables using this simple SQLite script:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">DROP</span> <span class="kw">TABLE</span> <span class="cf">IF</span> <span class="kw">EXISTS</span> city_level;</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">DROP</span> <span class="kw">TABLE</span> <span class="cf">IF</span> <span class="kw">EXISTS</span> tract_level;</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">DROP</span> <span class="kw">TABLE</span> <span class="cf">IF</span> <span class="kw">EXISTS</span> incidents;</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="kw">DROP</span> <span class="kw">TABLE</span> <span class="cf">IF</span> <span class="kw">EXISTS</span> ucr;</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>.separator ,</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>.import gva_release_2015_grouped_by_city_and_ranked.csv city_level</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>.import gva_release_2015_grouped_by_tract.csv tract_level</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>.import gva_release_2015_raw_incidents.csv incidents</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>.import UCR<span class="op">-</span><span class="dv">1985</span><span class="op">-</span><span class="fl">2015.</span>csv ucr</span></code></pre></div>
<p>Run it using <code>sqlite3 gva_20170329.db < import.sql</code>. By distinguishing the database using the date it was created, you retain the option of comparing differences in case source files are updated.</p>
<h3 id="come-up-with-short-mnemonic-variable-names">Come up with short, mnemonic variable names</h3>
<p>While the Guardian data set already has short variable names, the <a href="honolulu-crime-ucr-20170329.csv">CSV file I downloaded from the FBI</a> has column headings such as “Murder and nonnegligent Manslaughter”, which are not easy to work with. Come up with a short name for each. Other data sets come with codebooks which list the mnemonic for each variable along with the possible values it can take. E.g., see <a href="https://www.cdc.gov/brfss/annual_data/annual_2015.html">BRFSS</a>.</p>
<h3 id="reshape-the-data-tables">Reshape the data tables</h3>
<p>For example, the UCR data contains umpteen columns named:</p>
<blockquote>
<p><code>1985_raw_murder_num</code>, <code>1986_raw_murder_num</code>, …, <code>2014_raw_murder_num</code>, <code>2015_raw_murder_num</code>, <code>1985_murder_rate</code>, <code>1986_murder_rate</code>, …, <code>2014_murder_rate</code>, <code>2015_murder_rate</code></p>
</blockquote>
<p>Your life will be simpler if you instead transform the table so that you have the columns <code>agency</code>, <code>city</code>, <code>state</code>, <code>state_short</code>, <code>year</code>, <code>murder_count</code>, and <code>murder_rate</code>.</p>
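<p>Here is a rough sketch of that reshaping in Python. It assumes the wide table already has the identifying columns under the names listed above, and that the per-year columns follow the naming pattern quoted earlier. The function and constant names are invented for this example.</p>

```python
import re

# Columns that identify a record; assumed to exist under these names.
ID_COLUMNS = ("agency", "city", "state", "state_short")

# Matches headings such as 1985_raw_murder_num and 1985_murder_rate.
YEAR_COLUMN = re.compile(r"^(\d{4})_(raw_murder_num|murder_rate)$")

# Short names used in the long table.
NEW_NAMES = {"raw_murder_num": "murder_count", "murder_rate": "murder_rate"}


def wide_to_long(rows):
    """Turn one dict per agency into one dict per agency and year."""
    long_rows = []
    for row in rows:
        by_year = {}
        for column, value in row.items():
            match = YEAR_COLUMN.match(column)
            if match is None:
                continue  # not a per-year column
            year, kind = match.groups()
            by_year.setdefault(int(year), {})[NEW_NAMES[kind]] = value
        for year in sorted(by_year):
            record = {name: row[name] for name in ID_COLUMNS}
            record["year"] = year
            # A year missing one of the two measures gets None for it.
            record["murder_count"] = by_year[year].get("murder_count")
            record["murder_rate"] = by_year[year].get("murder_rate")
            long_rows.append(record)
    return long_rows
```

<p>The input rows can come straight from <code>csv.DictReader</code>, in which case every value stays a string until you decide how to treat missing entries.</p>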
<p>This data table is completely unsuitable for anything even semi-serious because it does not include a variable telling us how many months of data are missing for each record. The numbers from agencies that reported all 12 months of data each year are mingled with numbers from agencies that only reported data for a few months in some years.</p>
<p>To actually analyze the data, make inferences and arrive at conclusions and predictions, you need to develop a model and combine these observations with other relevant sociological, demographic, and economic information. You need to build your model before looking at the data: If you let the data tell you which model to choose, you are not doing anything remotely scientific. That kind of data torture has its uses, but it is not science.</p>
<p>Finally, collating, tabulating, and graphing data is not analysis. Those comprise just the dirty work we must do to get to a point where it is possible to analyze data.</p>
<p>PS: You can discuss this post on <a href="https://redd.it/627xd3">r/perl</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">For your 'İ's and 'ı's only</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-02-27T17:00:00+00:00" class="dt-published">February 27, 2017</time></h3>
</header>
</div>
<div class="article-content"><p>I have been poking around <code>perl</code>’s internals in <a href="https://www.nu42.com/2017/02/perl-unicode-windows-trilogy-one.html">my quest</a> to help <code>perl</code> see interesting characters in names and values of environment variables. The first step was to translate the UTF-16 environment which <code>wmain</code> received to a UTF-8 encoded one. The next step will be to ensure that the relevant parts of the code know about this. This requires <a href="https://perl5.git.perl.org/perl.git/blob/443bd156a6baaf7a8fe6b6b05fcf6c4178140ed2:/mg.c#l1185">a small change in <code>mg.c</code></a> and a much <a href="https://perl5.git.perl.org/perl.git/blob/443bd156a6baaf7a8fe6b6b05fcf6c4178140ed2:/hv.c#l336">more significant change in <code>hv.c</code></a> (I must admit, I had not realized until now most of Perl’s hash functionality existed in a single 600 line function). The mechanics of the changes are not that hard, but this made me realize something which I thought was interesting. So, this post is not part of the N-part trilogy of adding Unicode support to <code>perl</code> on the Windows command line.</p>
<p>The reason I ended up at this point is that I realized I would have to deal with the <code>ENV_IS_CASELESS</code> code in <code>hv.c</code>. The code uses <code>strupr</code> to make all environment variables upper case on platforms like Windows where environment variables are case insensitive. A small problem with this is the fact that the Windows environment has been <em>case preserving</em> since XP. I do remember some people used this fact to detect whether their programs were running under Windows 9x or XP, but I don’t think that technique is something to be relied on.</p>
<p>Upon realizing I would have to deal with casing issues, the first thing that popped into my head was the question of how any code I wrote or changed would deal with the <a href="http://www.i18nguy.com/unicode/turkish-i18n.html">Turkish I problem</a>. In a nutshell, the <a href="https://www.howtosayinturkish.com/contents/alphabet/">Turkish alphabet</a> has two ’I’s. We have the dotless <code>ı</code> whose upper case version is <code>I</code> and the dotted <code>i</code> whose upper case version is <code>İ</code>. If you are given an <code>i</code>, you don’t know whether to map that to <code>I</code> or <code>İ</code> without knowing if it is used in Turkish or another language. Similarly, given an <code>I</code>, you don’t know whether the lower case version of that is <code>i</code> or <code>ı</code> without knowing if it is used in Turkish. There are two cases without ambiguity: If you have an <code>İ</code> the lower case of that is unambiguously <code>i</code> and if you have an <code>ı</code>, the upper case of that is unambiguously <code>I</code>.</p>
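<p>To make the ambiguity concrete, here is a small Python sketch (Python rather than Perl, purely for illustration). The helpers <code>tr_upper</code> and <code>tr_lower</code> are hypothetical names. They hard-code the Turkish pairing and fall back on the default, language-neutral mapping for everything else.</p>

```python
def tr_upper(text):
    """Upper-case text using the Turkish pairing of i with İ and ı with I."""
    # Map the two lower-case letters by hand, then let the default
    # mapping handle every other character.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def tr_lower(text):
    """Lower-case text using the Turkish pairing of İ with i and I with ı."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()
```

<p>With the default mapping, <code>"iş".upper()</code> gives <code>IŞ</code>, whereas <code>tr_upper("iş")</code> gives <code>İŞ</code>. Which of the two is right depends entirely on the language of the text, which is exactly the information a case-insensitive lookup does not have.</p>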
<p>However, very few environments do any of this correctly, so I gave up on things like Turkish characters in file names many decades ago, and I haven’t looked back. This is the one situation where I really have to think hard about it, because if making <code>perl</code> Unicode aware on the Windows command line is going to break anything that uses the environment, then the effort is not worth it.</p>
<p>So, I went experimenting.</p>
<p>On a modern Windows 10 machine (with OS code page set to 437), here is what I observe:</p>
<pre class="shell"><code>$ set iş=kârlı
$ echo %iş%
kârlı
$ echo %İŞ%
%İŞ%
$ echo %IŞ%
kârlı</code></pre>
<p>which makes sense. Now, let’s start out with upper case <code>İ</code>:</p>
<pre class="shell"><code>$ set İş=kârlı
$ echo %iş%
%iş%
$ echo %ış%
%ış%
$ echo %İŞ%
kârlı</code></pre>
<p>That doesn’t make so much sense. I am not sure what <code>cmd.exe</code> does in the background, but it is probably using something like <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms647475(v=vs.85).aspx">CharUpperBuff</a>:</p>
<blockquote>
<p>Note that CharUpperBuff always maps lowercase I (“i”) to uppercase I, even when the current language is Turkish or Azeri.</p>
</blockquote>
<p>or</p>
<p><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/dd318700(v=vs.85).aspx">LCMapString</a> which supposedly maps <code>i</code> to <code>İ</code> if the current language is Turkish or Azeri. I can’t test this on a computer with a Turkish locale because I am unwilling to deal with any unintended consequences of using anything other than the U.S. English locale.</p>
<p>Regardless of which function Windows uses, I don’t see why mapping <code>İ</code> to <code>i</code> presents a problem. <strong>Update:</strong> Of course, the problem is that when I set <code>İş</code> in the environment and ask for the value of <code>%iş%</code>, Windows upper-cases the <code>i</code> in <code>%iş%</code> to <code>I</code> because I am not working in a Turkish locale. Duh!</p>
<p>This made me curious about how <code>perl</code> and <code>perl6</code> deal with case transformations of Turkish <code>İ</code> and <code>ı</code>. To abstract away from any issues having to do with <code>cmd.exe</code>, I wrote the simplest script I can run using both interpreters:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span> <span class="fu">lc</span>( <span class="ot">'</span><span class="ss">İ</span><span class="ot">'</span> ), <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span>;</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span> <span class="fu">uc</span>( <span class="fu">lc</span> <span class="ot">'</span><span class="ss">İ</span><span class="ot">'</span> ), <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span>;</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span> <span class="fu">lc</span>( <span class="fu">uc</span> <span class="ot">'</span><span class="ss">ı</span><span class="ot">'</span>), <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span>;</span></code></pre></div>
<p>I also changed my code page to 65001 (UTF-8) in the <code>cmd.exe</code> window I was going to use to run these experiments.</p>
<pre class="shell"><code>$ perl -Mutf8 -CS t.pl
i̇
İ
i</code></pre>
<p>Now, <code>gvim</code> displays <code>lc( 'İ' )</code> as something that looks like <code>i</code>, but <code>cmd.exe</code> showed this:</p>
<div class="thumb"><a href="turkish-eyes-cmdexe.png"><img src="turkish-eyes-cmdexe.png" alt="transforming İ to lower case"></a></div>
<p>Let’s look at what octets are produced:</p>
<pre class="shell"><code>$ perl -Mutf8 -CS t.pl |xxd
00000000: 69cc 870d 0a49 cc87 0d0a 690d 0a i....I....i..</code></pre>
<p>That’s curious. That is <code>i</code> followed by another Unicode character. What is that?</p>
<pre class="shell"><code>print charnames::viacode( ord(lc 'İ') ), "\n";
LATIN SMALL LETTER I</code></pre>
<p>That did not reveal much, did it?</p>
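<p>Without further ado, the second character is Unicode code point <code>\x307</code>, <a href="http://www.fileformat.info/info/unicode/char/0307/index.htm">COMBINING DOT ABOVE</a>. This means <code>perl</code> can preserve the identity <code>'İ' ≡ uc( lc 'İ' )</code>, at least up to canonical equivalence: as the octets above show, <code>uc</code> yields <code>I</code> followed by <code>COMBINING DOT ABOVE</code> rather than the single precomposed code point.</p>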
<p>Let’s look at the output I get from <code>perl6</code> running the same script:</p>
<pre class="shell"><code>$ perl6 t.pl|xxd
00000000: 69cc 870d 0ac4 b00d 0a69 0d0a i........i..</code></pre>
<p>Again, <code>lc( 'İ' )</code> becomes <code>i</code> followed by <code>COMBINING DOT ABOVE</code> which means <code>uc(lc 'İ')</code> becomes <a href="http://www.fileformat.info/info/unicode/char/0130/index.htm">LATIN CAPITAL LETTER I WITH DOT ABOVE</a> as a by product of the fact that <code>perl6</code> deals in <a href="https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-perl-6-and-you/">graphemes</a>, which is a good thing:</p>
<pre class="shell"><code>say 'İ' eq 'İ'.lc.uc.lc.uc;
True</code></pre>
<p>Well, that’s neither here nor there, but I thought it was rather clever to map <code>lc( 'İ' )</code> to <code>i</code> followed by “combining dot above” so that <code>'İ' ≡ uc( lc 'İ' )</code> still held.</p>
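<p>The same round trips can be checked outside of <code>perl</code>. The sketch below uses Python, whose default case mappings come from the same Unicode tables. It is not a statement about how <code>perl</code> or <code>perl6</code> implement this internally, and the helper names are made up for the example. The comparison is done after NFC normalization, so <code>I</code> followed by a combining dot counts as equal to the precomposed <code>İ</code>.</p>

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def code_points(text):
    """List the code points in text as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in text]


def round_trips(text, first, second):
    """True if second(first(text)) equals text after NFC normalization."""
    result = unicodedata.normalize("NFC", second(first(text)))
    return result == unicodedata.normalize("NFC", text)
```

<p>Here <code>round_trips(DOTTED_CAPITAL_I, str.lower, str.upper)</code> holds, while <code>round_trips(DOTLESS_SMALL_I, str.upper, str.lower)</code> does not, because the upper case of <code>ı</code> is a plain <code>I</code> that carries no trace of where it came from.</p>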
<p>I am wondering if there is another codepoint that means something like “no diacritic above” but looking at Wikipedia’s <a href="https://en.wikipedia.org/wiki/Combining_character">combining characters</a>, I do not see anything that could be useful.</p>
<p>Is there a way within the Unicode specification of preserving the identity <code>'ı' ≡ lc( uc 'ı' )</code>?</p>
<p>PS: You can discuss this post on <a href="https://redd.it/5widdc">r/perl</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Fixing Perl's Unicode problems on the command line on Windows: A trilogy in N parts</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-02-18T13:45:00+00:00" class="dt-published">February 18, 2017</time></h3>
</header>
</div>
<div class="article-content"><p>I have used Perl on Windows for decades without being seriously hampered by any of its past or current limitations. Still, it would be nice to solve some of the issues, if only so I can post <a href="https://twitter.com/sinan_unur/status/831233924072370176">cute screenshots</a>.</p>
<p>Here are some problems with Perl and Unicode on the command line in Windows.</p>
<h4 id="1-cant-pass-interesting-characters-to-perl-on-the-command-line">1. Can’t pass interesting characters to <code>perl</code> on the command line</h4>
<p>You can’t pass characters that are outside of the Windows code page to <code>perl</code> on the command line. It doesn’t matter whether you have set the code page to <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx">65001</a> and use the <code>-CA</code> command line argument to <code>perl</code>: Because <code>perl</code> uses <code>main</code> instead of <code>wmain</code> as the entry point, it never sees anything other than characters in the ANSI code page.</p>
<p>For example:</p>
<pre class="text"><code>$ chcp 65001
$ perl -CAS -E "say for @ARGV" şey
sey</code></pre>
<p>That’s because <code>ş</code> does not appear in CP 437 which is what my laptop is using. By the time it reaches the internals of <code>perl</code>, it has already become <code>s</code>.</p>
<p>On the other hand,</p>
<pre><code>$ perl -CAS -E "say for @ARGV" ünür
Malformed UTF-8 character (unexpected end of string) in say at -e line 1.
�n�r</code></pre>
<p>because <code>ü</code> does appear in CP 437 so it remains intact. But then we lied: the command line is not UTF-8 encoded.</p>
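<p>The effect is easy to reproduce away from the console. The Python sketch below is only an illustration, with hypothetical helper names. It encodes text in a single-byte code page and then decodes the resulting bytes as if they were UTF-8, which is in effect what <code>-CA</code> asks for. Note that Python refuses to encode a character the code page lacks, whereas Windows substitutes a look-alike as shown earlier.</p>

```python
def fits_code_page(text, codepage):
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def reinterpret_as_utf8(text, codepage):
    """Encode text in a legacy code page, then decode the bytes as UTF-8.

    Bytes that do not form valid UTF-8 come back as U+FFFD, the
    replacement character.
    """
    return text.encode(codepage).decode("utf-8", errors="replace")
```

<p>In CP 437, <code>ü</code> is the single byte <code>0x81</code>, which is not valid UTF-8 on its own, so <code>reinterpret_as_utf8("ünür", "cp437")</code> comes back with a replacement character in place of each <code>ü</code>.</p>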
<p>This “works”:</p>
<pre class="text"><code>$ perl -CS -E "say for @ARGV" ünür
ünür</code></pre>
<p>but not for the right reasons.</p>
<h4 id="2-cant-use-interesting-characters-in-perl-one-liners">2. Can’t use interesting characters in <code>perl</code> one-liners</h4>
<p>For example:</p>
<pre class="text"><code>$ perl -Mutf8 -CS -E "say 'şey'"
sey</code></pre>
<p>Again, by the time <code>perl</code> sees the source of the one-liner, it is too late for <code>-Mutf8</code>.</p>
<h4 id="3-cant-use-interesting-characters-in-script-names">3. Can’t use interesting characters in script names</h4>
<p>For the same reason:</p>
<pre class="text"><code>$ type şey
use v5.24;
use utf8;
say 'şey';
$ perl şey
Can't open perl script "sey": No such file or directory</code></pre>
<p>This one comes with the added caveat that even if <code>perl</code> did get the name of the file right, it would still not be able to run the script because it would be using the ANSI API which would once again not be able to deal with characters outside of the current code page.</p>
<h4 id="4a-cant-access-environment-variables-with-interesting-characters-in-their-names">4.a. Can’t access environment variables with interesting characters in their names;</h4>
<h4 id="4b-cant-access-values-of-environment-variables-if-they-contain-intersting-characters">4.b. Can’t access values of environment variables if they contain interesting characters</h4>
<p>For example:</p>
<pre class="text"><code>$ set iş=kârlı
$ set hava=karlı
$ echo %iş%
kârlı
$ echo %hava%
karlı
$ type t.pl
use v5.24;
use utf8;
say $ENV{$_} for qw(iş hava);
$ perl t.pl
karli</code></pre>
<p>So, business is profitable, and the weather is snowy, but we can’t look up <code>$ENV{iş}</code> and the value of <code>$ENV{hava}</code> is misspelled.</p>
<h4 id="5-cant-read-lines-containing-interesting-characters-from-the-console">5. Can’t read lines containing interesting characters from the console</h4>
<p>For example:</p>
<pre class="text"><code>$ perl -e "print while <>"
hava yağmurlu mu karlı mı olacak?
hava yagmurlu mu karli mi olacak?</code></pre>
<p>Depending on your Windows version, this script may terminate prematurely.</p>
<h4 id="6-using-standard-perl-functions-interacting-with-data-files-with-interesting-characters-in-their-names-is-weird">6. Using standard Perl functions, interacting with data files with interesting characters in their names is weird</h4>
<p><code>perl</code> tries to access files using their short names which doesn’t work if you have disabled short name creation. Even if it does, it’s ugly:</p>
<pre class="text"><code>$ dir
...
2017-02-17 09:45 AM 38 şey
$ perl -E "opendir $d, '.'; say for grep !/^\./, readdir $d"
EY61AE~1</code></pre>
<p>This one is not a huge problem, because one can use <a href="https://metacpan.org/pod/Win32::LongPath">Win32::LongPath</a> to deal with the issue.</p>
<p>In fact, none of these are huge problems: I have done useful work with Perl on Windows for decades despite the occasional glitch.</p>
<p>However, they are things I thought I should make some effort to fix some day. After I <a href="https://github.com/MoarVM/MoarVM/pull/528/files?diff=split">fixed Perl6’s Unicode issues on the command line in Windows</a>, I felt slightly guilty that I had not given Perl the same TLC. Maybe “some day” has arrived.</p>
<p>The easiest to fix among the problems I mentioned above is the case of command line arguments, and that’s what I am going to start with. I will dig deeper in subsequent posts.</p>
<p>I approached this problem a little sideways: I decided to leverage Perl’s support for UTF-8 encoded command line arguments. I just had to modify the arguments before <code>perl</code>’s internals saw them to make sure they were <code>UTF-8</code> encoded. To try out my idea, I first wrote a wrapper for <code>perl.exe</code>. The wrapper was very simple: It used <code>wmain</code> as its entry point, and constructed a UTF-8 encoded command line argument array with <code>-CA</code> inserted between the first and the second elements to invoke <code>perl.exe</code>. It was ugly, but it worked in the sense that any interesting characters I used in command line arguments made it to the Perl side of things intact.</p>
<p><code>perl</code> itself comes with a wrapper, <a href="https://perl5.git.perl.org/perl.git/blob/c3d9aeb96afe725795daceaf39f6b133c0593328:/win32/runperl.c"><code>runperl.c</code></a> which becomes <code>perlmain.c</code> during build. This would be the ideal place to transform both the command line arguments and the environment array <code>perl</code> sees before it sets up anything. Basically, the idea is to always run <code>perl</code> with UTF-8 encoded arguments. This keeps any changes we want to make to Perl’s internals minimal. Of course, Windows does not have an API for console programs to receive their arguments as UTF-8 encoded strings. Instead, we use <code>wmain</code> as the entry point so we receive the command line arguments and the environment as UTF-16 encoded strings. Then, we create UTF-8 encoded command line argument and environment arrays using standard Windows APIs. The patch is rather straightforward:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode diff"><code class="sourceCode diff"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="kw">diff --git a/win32/runperl.c b/win32/runperl.c</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>index 2157224..9cd3c7c 100644</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="dt">--- a/win32/runperl.c</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="dt">+++ b/win32/runperl.c</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="dt">@@ -2,6 +2,11 @@</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> #include <crtdbg.h></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> #endif</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="va">+#include <windows.h></span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a><span class="va">+#include <fcntl.h></span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="va">+#include <io.h></span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a><span class="va">+#include <stdlib.h></span></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> #include "EXTERN.h"</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a> #include "perl.h"</span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a><span class="dt">@@ -21,9 +26,54 @@ int _CRT_glob = 0;</span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> #endif</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a><span class="va">+static void</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a><span class="va">+error_exit(const wchar_t *msg)</span></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a><span class="va">+{</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a><span class="va">+ int err = GetLastError();</span></span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a><span class="va">+ _setmode(_fileno(stderr), _O_U16TEXT);</span></span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a><span class="va">+ fwprintf(stderr, L"%s: %d\n", msg, err);</span></span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a><span class="va">+ exit( err );</span></span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a><span class="va">+}</span></span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a><span class="va">+static char *</span></span>
<span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a><span class="va">+utf8_encode_wstring(const wchar_t *src)</span></span>
<span id="cb9-32"><a href="#cb9-32" aria-hidden="true" tabindex="-1"></a><span class="va">+{</span></span>
<span id="cb9-33"><a href="#cb9-33" aria-hidden="true" tabindex="-1"></a><span class="va">+ char *encoded;</span></span>
<span id="cb9-34"><a href="#cb9-34" aria-hidden="true" tabindex="-1"></a><span class="va">+ int len;</span></span>
<span id="cb9-35"><a href="#cb9-35" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-36"><a href="#cb9-36" aria-hidden="true" tabindex="-1"></a><span class="va">+ len = WideCharToMultiByte( CP_UTF8, WC_ERR_INVALID_CHARS, src,</span></span>
<span id="cb9-37"><a href="#cb9-37" aria-hidden="true" tabindex="-1"></a><span class="va">+ -1, NULL, 0, NULL, NULL);</span></span>
<span id="cb9-38"><a href="#cb9-38" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-39"><a href="#cb9-39" aria-hidden="true" tabindex="-1"></a><span class="va">+ encoded = malloc(len + 1);</span></span>
<span id="cb9-40"><a href="#cb9-40" aria-hidden="true" tabindex="-1"></a><span class="va">+ if (!encoded) {</span></span>
<span id="cb9-41"><a href="#cb9-41" aria-hidden="true" tabindex="-1"></a><span class="va">+ error_exit(L"Failed to allocate memory for UTF-8 encoded string");</span></span>
<span id="cb9-42"><a href="#cb9-42" aria-hidden="true" tabindex="-1"></a><span class="va">+ }</span></span>
<span id="cb9-43"><a href="#cb9-43" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-44"><a href="#cb9-44" aria-hidden="true" tabindex="-1"></a><span class="va">+ (void) WideCharToMultiByte( CP_UTF8, WC_ERR_INVALID_CHARS, src,</span></span>
<span id="cb9-45"><a href="#cb9-45" aria-hidden="true" tabindex="-1"></a><span class="va">+ -1, encoded, len, NULL, NULL);</span></span>
<span id="cb9-46"><a href="#cb9-46" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-47"><a href="#cb9-47" aria-hidden="true" tabindex="-1"></a><span class="va">+ return encoded;</span></span>
<span id="cb9-48"><a href="#cb9-48" aria-hidden="true" tabindex="-1"></a><span class="va">+}</span></span>
<span id="cb9-49"><a href="#cb9-49" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-50"><a href="#cb9-50" aria-hidden="true" tabindex="-1"></a><span class="va">+static void</span></span>
<span id="cb9-51"><a href="#cb9-51" aria-hidden="true" tabindex="-1"></a><span class="va">+utf8_encode_warr(const wchar_t **warr, const int n, char **arr)</span></span>
<span id="cb9-52"><a href="#cb9-52" aria-hidden="true" tabindex="-1"></a><span class="va">+{</span></span>
<span id="cb9-53"><a href="#cb9-53" aria-hidden="true" tabindex="-1"></a><span class="va">+ int i;</span></span>
<span id="cb9-54"><a href="#cb9-54" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-55"><a href="#cb9-55" aria-hidden="true" tabindex="-1"></a><span class="va">+ for (i = 0; i < n; ++i) {</span></span>
<span id="cb9-56"><a href="#cb9-56" aria-hidden="true" tabindex="-1"></a><span class="va">+ arr[i] = utf8_encode_wstring(warr[i]);</span></span>
<span id="cb9-57"><a href="#cb9-57" aria-hidden="true" tabindex="-1"></a><span class="va">+ }</span></span>
<span id="cb9-58"><a href="#cb9-58" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-59"><a href="#cb9-59" aria-hidden="true" tabindex="-1"></a><span class="va">+ return;</span></span>
<span id="cb9-60"><a href="#cb9-60" aria-hidden="true" tabindex="-1"></a><span class="va">+}</span></span>
<span id="cb9-61"><a href="#cb9-61" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-62"><a href="#cb9-62" aria-hidden="true" tabindex="-1"></a> int</span>
<span id="cb9-63"><a href="#cb9-63" aria-hidden="true" tabindex="-1"></a><span class="st">-main(int argc, char **argv, char **env)</span></span>
<span id="cb9-64"><a href="#cb9-64" aria-hidden="true" tabindex="-1"></a><span class="va">+wmain(int argc, wchar_t **wargv, wchar_t **wenv)</span></span>
<span id="cb9-65"><a href="#cb9-65" aria-hidden="true" tabindex="-1"></a> {</span>
<span id="cb9-66"><a href="#cb9-66" aria-hidden="true" tabindex="-1"></a><span class="va">+ char **argv;</span></span>
<span id="cb9-67"><a href="#cb9-67" aria-hidden="true" tabindex="-1"></a><span class="va">+ char **env;</span></span>
<span id="cb9-68"><a href="#cb9-68" aria-hidden="true" tabindex="-1"></a><span class="va">+ int env_count;</span></span>
<span id="cb9-69"><a href="#cb9-69" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-70"><a href="#cb9-70" aria-hidden="true" tabindex="-1"></a> #ifdef _MSC_VER</span>
<span id="cb9-71"><a href="#cb9-71" aria-hidden="true" tabindex="-1"></a> /* Arrange for _CrtDumpMemoryLeaks() to be called automatically at program</span>
<span id="cb9-72"><a href="#cb9-72" aria-hidden="true" tabindex="-1"></a> * termination when built with CFG = DebugFull. */</span>
<span id="cb9-73"><a href="#cb9-73" aria-hidden="true" tabindex="-1"></a><span class="dt">@@ -36,6 +86,30 @@ main(int argc, char **argv, char **env)</span></span>
<span id="cb9-74"><a href="#cb9-74" aria-hidden="true" tabindex="-1"></a> _CrtSetBreakAlloc(-1L);</span>
<span id="cb9-75"><a href="#cb9-75" aria-hidden="true" tabindex="-1"></a> #endif</span>
<span id="cb9-76"><a href="#cb9-76" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb9-77"><a href="#cb9-77" aria-hidden="true" tabindex="-1"></a><span class="va">+ ++argc; /* we are going to insert -CA between argv[0] and argv[1] */</span></span>
<span id="cb9-78"><a href="#cb9-78" aria-hidden="true" tabindex="-1"></a><span class="va">+ argv = malloc((argc + 1) * sizeof(*argv));</span></span>
<span id="cb9-79"><a href="#cb9-79" aria-hidden="true" tabindex="-1"></a><span class="va">+ if (!argv) {</span></span>
<span id="cb9-80"><a href="#cb9-80" aria-hidden="true" tabindex="-1"></a><span class="va">+ error_exit(L"Failed to allocate memory of UTF-8 encoded argv");</span></span>
<span id="cb9-81"><a href="#cb9-81" aria-hidden="true" tabindex="-1"></a><span class="va">+ }</span></span>
<span id="cb9-82"><a href="#cb9-82" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-83"><a href="#cb9-83" aria-hidden="true" tabindex="-1"></a><span class="va">+ argv[0] = utf8_encode_wstring(wargv[0]);</span></span>
<span id="cb9-84"><a href="#cb9-84" aria-hidden="true" tabindex="-1"></a><span class="va">+ argv[1] = "-CA";</span></span>
<span id="cb9-85"><a href="#cb9-85" aria-hidden="true" tabindex="-1"></a><span class="va">+ argv[ argc ] = NULL;</span></span>
<span id="cb9-86"><a href="#cb9-86" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-87"><a href="#cb9-87" aria-hidden="true" tabindex="-1"></a><span class="va">+ utf8_encode_warr(wargv + 1, argc - 2, argv + 2); /* argc now counts -CA too */</span></span>
<span id="cb9-88"><a href="#cb9-88" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-89"><a href="#cb9-89" aria-hidden="true" tabindex="-1"></a><span class="va">+ env_count = 0;</span></span>
<span id="cb9-90"><a href="#cb9-90" aria-hidden="true" tabindex="-1"></a><span class="va">+ while ( wenv[env_count] ) {</span></span>
<span id="cb9-91"><a href="#cb9-91" aria-hidden="true" tabindex="-1"></a><span class="va">+ ++env_count;</span></span>
<span id="cb9-92"><a href="#cb9-92" aria-hidden="true" tabindex="-1"></a><span class="va">+ }</span></span>
<span id="cb9-93"><a href="#cb9-93" aria-hidden="true" tabindex="-1"></a><span class="va">+ env = malloc( (env_count + 1) * sizeof(*env));</span></span>
<span id="cb9-94"><a href="#cb9-94" aria-hidden="true" tabindex="-1"></a><span class="va">+ if (!env) {</span></span>
<span id="cb9-95"><a href="#cb9-95" aria-hidden="true" tabindex="-1"></a><span class="va">+ error_exit(L"Failed to allocate memory for UTF-8 encoded environment");</span></span>
<span id="cb9-96"><a href="#cb9-96" aria-hidden="true" tabindex="-1"></a><span class="va">+ }</span></span>
<span id="cb9-97"><a href="#cb9-97" aria-hidden="true" tabindex="-1"></a><span class="va">+ env[ env_count ] = NULL;</span></span>
<span id="cb9-98"><a href="#cb9-98" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-99"><a href="#cb9-99" aria-hidden="true" tabindex="-1"></a><span class="va">+ utf8_encode_warr(wenv, env_count, env);</span></span>
<span id="cb9-100"><a href="#cb9-100" aria-hidden="true" tabindex="-1"></a><span class="va">+</span></span>
<span id="cb9-101"><a href="#cb9-101" aria-hidden="true" tabindex="-1"></a> return RunPerl(argc, argv, env);</span>
<span id="cb9-102"><a href="#cb9-102" aria-hidden="true" tabindex="-1"></a> }</span></code></pre></div>
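<p>Stripped of the memory management, the argument handling in the patch amounts to the following. This is a paraphrase in Python with a made-up function name, where a list of <code>str</code> stands in for the <code>wchar_t **</code> that <code>wmain</code> receives. It is not part of the patch itself.</p>

```python
def utf8_argv(wargv):
    """Build a UTF-8 encoded argv with -CA inserted after the program name.

    wargv is a list of str standing in for the wide-character argument
    array passed to wmain.
    """
    argv = [wargv[0].encode("utf-8"), b"-CA"]
    # Only the len(wargv) - 1 arguments after the program name remain.
    argv.extend(arg.encode("utf-8") for arg in wargv[1:])
    return argv
```

<p>The resulting array is exactly one element longer than the one Windows handed over, and every element is a byte string that <code>perl</code> can decode when told to treat its arguments as UTF-8.</p>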
<p>Here’s what this change gets us:</p>
<pre class="text"><code>$ ..\perl -Mopen=:std,:utf8 -E "say for @ARGV" iş
iş</code></pre>
<p>or</p>
<pre class="text"><code>$ ..\perl -CAS -E "say for @ARGV" iş
iş</code></pre>
<p>Yes, if we are going to use <code>-CS</code> we must actually use <code>-CAS</code> because, apparently, <code>-C</code> flags are not cumulative. That is, it looks like <code>perl -CA -CS</code> is equivalent to <code>perl -CS</code> and not <code>perl -CAS</code>. I have considered whether to make <code>-CAS</code> the default, but that is of doubtful usefulness because it would require everyone using Perl to use the UTF-8 codepage in the console. That is a bigger change than transparently converting anything passed on the command line to UTF-8.</p>
<p>These changes solve only part of the problem:</p>
<pre class="text"><code>$ ..\perl.exe şey
Can't open perl script "şey": No such file or directory</code></pre>
<p><code>perl</code> looks for the correct script file, but can’t open it because it uses the ANSI functions in the Windows API.</p>
<pre class="text"><code>$ ..\perl.exe -Mutf8 -E "say $ENV{iş}"
kârlı</code></pre>
<p>But, of course,</p>
<pre class="text"><code>$ ..\perl.exe -Mutf8 -Mopen=:std,:utf8 -E "say $ENV{iş}"
kÃ¢rlÄ±</code></pre>
<p>By the way, I know why <code>kârlı</code> gets double UTF-8 encoded, I know what needs to be fixed, but, as the title says, this post is the first in a series, and I will discuss those issues and their fixes in follow-up posts. The most important criterion for me is to change the smallest number of lines possible to get correct behavior. Otherwise, making changes all over the place in such a large codebase with a long history is bound to get one in trouble by breaking things.</p>
<p>The good news is I got very few test failures due to these changes.</p>
<p>I know at least some people think no one ought to write about anything unless they have filed bug reports and sent patches in triplicate with the blue copy stamped and filed with the Open Source Planning Agency or something, but, rest assured, I will do all that … When I have a complete patch set I can submit with confidence (i.e., when the set of tests failing with the patched <code>perl</code> is identical to the set of tests failing with blead <code>perl</code>).</p>
<p>Along the way, I am going to share how I arrived at that state.</p>
<p>You can discuss this post on <a href="https://redd.it/5ush7c">r/perl</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Deception in tests considered harmful</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-02-16T17:00:00+00:00" class="dt-published">February 16, 2017</time></h3>
</header>
</div>
<div class="article-content"><p>I felt really good after finally <a href="https://github.com/MoarVM/MoarVM/pull/528/files?diff=split">ironing out all the kinks in PR #528</a> which added support to Perl 6 on MoarVM for handling Unicode script names, command line arguments, and environment strings on the Windows command line. (As a side note, I am indeed working on a similar fix for Perl, but that is going to take a bit more than 60 lines of changes).</p>
<p>A few days ago, after building <a href="https://github.com/MoarVM/MoarVM">MoarVM</a>, <a href="https://github.com/perl6/nqp">NQP</a>, and <a href="https://github.com/rakudo/rakudo">rakudo</a> from a fresh <code>git pull</code>, I decided to check if any spec tests were failing. I <a href="https://github.com/perl6/roast/issues/232">reported</a> my results. I was rather pleased that very few tests were failing and some of those failures were because I had forgotten to set my code page to UTF-8.</p>
<p>Zoffix Znet <a href="https://github.com/perl6/roast/issues/232#issuecomment-279866951">pointed me to <code>nmake stresstest</code></a>, so I ran that yesterday. That produced <a href="https://github.com/perl6/roast/issues/232#issuecomment-280061047">more test failures</a>, but I thought the failures were probably due to a small number of actual problems so that fixing one small thing might fix a number of test failures. Then, I looked more closely at one of the test failures, and got a knot in my stomach:</p>
<pre class="text"><code>$ perl t\harness5 t\spec\S17-procasync\basic.rakudo.moar --verbosity=5
... not ok 33 - Tapping stdout supply after start of process does not lose data
# Failed test 'Tapping stdout supply after start of process does not lose data'
# at t\spec\S17-procasync\basic.rakudo.moar line 109
# expected: "Hello World\n"
# got: "Hello World\r\n"
...</code></pre>
<p>I have been, shall we say, <em>alarmed</em> by the way <a href="/2015/12/perl6-newline-translation-broken.html">Perl 6 handled differing EOL conventions</a>, but <a href="/2015/12/perl6-newline-behavior-fixed.html">I thought it had been fixed</a>. Soon after I filed my <a href="https://rt.perl.org/Public/Bug/Display.html?id=130788">bug report</a>, Zoffix Znet <a href="https://rt.perl.org/Public/Bug/Display.html?id=130788#txn-1448881">pointed out</a>:</p>
<blockquote>
<p>Looks like several other tests in <code>S17-procasync/basic.t</code> would be failing as well if it weren’t for the explicit kludges added to replace <code>"\r\n"</code> to <code>"\n"</code>. And <code>grep -nFR '\r\n' | grep subs</code> shows 32 potential places with a similar workaround.</p>
</blockquote>
<p>My heart sank!</p>
<p>Handling different EOL conventions is straightforward: On platforms where <a href="https://en.wikipedia.org/wiki/Newline">EOL is different than <code>0x0a</code></a>, the platform specific EOL sequences in text streams are converted to a canonical form when reading. When writing out text streams, the canonical EOLs are converted back to the platform specific byte sequence. I don’t know when people started doing this, but I have been aware of it for more than a few decades. But, alas:</p>
<pre class="text"><code>$ perl -wE "$x = `perl -E ""say 'hello'""`; say 'ok' if $x eq qq{hello\n}"
ok</code></pre>
<p>Good.</p>
<pre class="text"><code>$ perl6 -e "my $x = qx<perl -E ""say 'hello'"">; say $x eq qq<hello\n>"
True
$ perl6 -e "my $x = qx<perl6 -e ""say 'hello'"">; say $x eq qq<hello\n>"
True</code></pre>
<p>Still good. But, then, we have this:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>$ more t.pl6</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v6.c;</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$proc</span> = <span class="fu">Proc::Async</span>.new(<span class="ot">'</span><span class="ss">perl</span><span class="ot">'</span>, <-E <span class="ot">"</span><span class="st">say 'hello'</span><span class="ot">"</span>>);</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="dt">$proc</span>.stdout.tap(-> <span class="dt">$v</span> { <span class="fu">say</span> <span class="dt">$v</span> <span class="ot">eq</span> <span class="ot">"</span><span class="st">hello</span><span class="ch">\n</span><span class="ot">"</span>; <span class="fu">say</span> <span class="dt">$v</span>.perl });</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$promise</span> = <span class="dt">$proc</span>.start;</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>await <span class="dt">$promise</span>;</span></code></pre></div>
<pre class="text"><code>$ perl6 t.pl6
False
"hello\r\n"</code></pre>
<p>So, the problem is when <code>Proc::Async</code> is used with text streams, EOL conversion goes out the window. To mask this programming error, test files contain things like:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">$so</span>.act: { <span class="dt">$stdout</span> ~= <span class="wa">$_</span>.subst(<span class="ot">"</span><span class="ch">\r\n</span><span class="ot">"</span>, <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span>, :g) };</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="dt">$se</span>.act: { <span class="dt">$stderr</span> ~= <span class="wa">$_</span>.subst(<span class="ot">"</span><span class="ch">\r\n</span><span class="ot">"</span>, <span class="ot">"</span><span class="ch">\n</span><span class="ot">"</span>, :g) };</span></code></pre></div>
<p>These kludges create false negatives: Tests that should have failed pass.</p>
<p>Lying in tests like this serves no purpose other than to deceive not just the author of the tests, but also anyone else who relies on these tests to alert them to problems in their code. Remember, this code is testing <code>Proc::Async</code>, and <code>Proc::Async</code> should be handling EOL conversion on text streams.</p>
<p>Simply put, this is disappointing.</p>
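<p>For reference, the canonicalization step on the reading side is small. Here is a minimal sketch in C, assuming a hypothetical helper named <code>canonicalize_eol</code> (my own name, not anything in MoarVM or Rakudo) that rewrites <code>"\r\n"</code> as <code>"\n"</code> in place in a NUL-terminated buffer:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite every "\r\n" in s as "\n", in place.
 * A lone '\r' is left untouched. Returns the new length of s. */
size_t canonicalize_eol(char *s)
{
    size_t len, in, out = 0;

    assert(s != NULL);
    len = strlen(s);

    for (in = 0; in < len; ++in) {
        /* s[len] is the terminating NUL, so looking at s[in + 1] is safe */
        if (s[in] == '\r' && s[in + 1] == '\n') {
            continue; /* drop the CR; the LF is copied on the next pass */
        }
        s[out++] = s[in];
    }
    s[out] = '\0';
    return out;
}
```

<p>A real stream decoder also has to remember a <code>\r</code> at the very end of one chunk in case the next chunk starts with <code>\n</code>. This sketch ignores that case.</p>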
<p>While looking at this, I also discovered something else. I copied and pasted the first code example in the <a href="https://docs.perl6.org/type/Proc::Async">docs for <code>Proc::Async</code></a> and ran it without modification:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>$ more asyncex.pl6</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co"># command with arguments</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$proc</span> = <span class="fu">Proc::Async</span>.new(<span class="ot">'</span><span class="ss">echo</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">foo</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">bar</span><span class="ot">'</span>);</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="co"># subscribe to new output from out and err handles:</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="dt">$proc</span>.stdout.tap(-> <span class="dt">$v</span> { <span class="fu">print</span> <span class="ot">"</span><span class="st">Output: </span><span class="dt">$v</span><span class="ot">"</span> }, quit => { <span class="fu">say</span> <span class="ot">'</span><span class="ss">caught exception </span><span class="ot">'</span> ~ .^name });</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="dt">$proc</span>.stderr.tap(-> <span class="dt">$v</span> { <span class="fu">print</span> <span class="ot">"</span><span class="st">Error: </span><span class="dt">$v</span><span class="ot">"</span> });</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="fu">say</span> <span class="ot">"</span><span class="st">Starting...</span><span class="ot">"</span>;</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$promise</span> = <span class="dt">$proc</span>.start;</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="co"># wait for the external program to terminate</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a>await <span class="dt">$promise</span>;</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="fu">say</span> <span class="ot">"</span><span class="st">Done.</span><span class="ot">"</span>;</span></code></pre></div>
<p>Note that <code>echo</code> is a <code>cmd.exe</code> builtin.</p>
<p>First, a couple of examples not using Perl 6’s <code>Proc::Async</code>:</p>
<pre class="text"><code>$ perl -e "print `echo`"
ECHO is on.
$ perl6 -e "print qx<echo>"
ECHO is on.</code></pre>
<p>So, both <code>perl</code> and <code>perl6</code> are able to handle a <code>qx</code> using a <code>cmd.exe</code> builtin by spawning a shell and running the command there. That is nice, but not required.</p>
<p>Let’s see what happens when the <code>Proc::Async</code> example is run:</p>
<pre class="text"><code>$ perl6 asyncex.pl6
Starting...
MoarVM panic: use of invalid eventloop work item index -1</code></pre>
<p>The fix is simple: Change the constructor invocation to:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$proc</span> = <span class="fu">Proc::Async</span>.new(<span class="ot">'</span><span class="ss">cmd</span><span class="ot">'</span>, </c echo foo bar>);</span></code></pre></div>
<p>but <a href="https://docs.perl6.org/type/Proc::Async">there should be no panic from MoarVM</a> just because <code>Proc::Async</code> cannot spawn something.</p>
<p>Finally, another test failure was a <a href="https://github.com/perl6/roast/issues/233">false positive</a>: A test that should not have been expected to pass was being included in the run. It was failing not due to a problem with any part of the Perl 6 machinery but due to the way symlinks are restricted on Windows.</p>
<p>When I write about flaws and problems I observe in Perl 6, people call me names and claim I am mocking Perl 6, lying about its stability etc. In truth, I want to be enthusiastic about Perl 6, but I am disappointed more often than not.</p>
<p>At this point, correct handling of differing EOL conventions should not be an issue.</p>
<p>You can comment on this post on <a href="https://redd.it/5ug6ws">r/perl</a>.</p>
</div>
</article>
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Notes on Unicode on the command line in Windows with applications to Perl and Perl 6</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-02-08T17:15:00+00:00" class="dt-published">February 8, 2017</time></h3>
</header>
</div>
<div class="article-content"><p>Handling of interesting characters on the command line in Windows or DOS environments has never been an annoyance-free experience. Heck, 30 years ago, I was patching lookup tables in keyboard drivers for IBM PCs and compatibles at METU so we could write stuff using Turkish characters. At the time, there wasn’t even a standard Turkish keyboard layout. So, we have come a long way.</p>
<p>If you are writing a C program from scratch, it is simple to accept all sorts of characters on the command line <em>and</em> work solely with UTF-8 encoded stuff. Instead of <code>main</code>, use <code>wmain</code>:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> wmain(<span class="dt">int</span> argc, <span class="dt">wchar_t</span> *argv[], <span class="dt">wchar_t</span> *envp[]) {</span></code></pre></div>
<p>Your program will now receive command line arguments in UTF-16. You can convert the <code>argv</code> and <code>envp</code> arrays to UTF-8 encoding and just work with them or stick with the <code>wchar_t</code> and compatible functions, depending on which makes the most sense for your specific situation.</p>
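<p>To make the conversion step concrete, here is a rough sketch of a UTF-16 to UTF-8 encoder. On Windows you would normally just call <code>WideCharToMultiByte</code> with <code>CP_UTF8</code>; the hand-rolled version below only shows what that conversion does. The name <code>utf16_to_utf8</code> is made up for this example, and I am assuming <code>uint16_t</code> as a stand-in for Windows’ 16-bit <code>wchar_t</code> so that the code compiles anywhere:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
 * Returns a malloc'd, NUL-terminated buffer (caller frees), or NULL if
 * allocation fails. Unpaired surrogates are replaced with U+FFFD. */
char *utf16_to_utf8(const uint16_t *in)
{
    size_t n = 0, o = 0, i;
    unsigned char *out;

    assert(in != NULL);
    while (in[n]) ++n;

    /* one UTF-16 unit needs at most 3 bytes; a surrogate pair needs 4 */
    out = malloc(3 * n + 1);
    if (!out) return NULL;

    for (i = 0; i < n; ++i) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n
                && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* combine a high/low surrogate pair into one code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

<p>With this, “iş” (<code>0x0069 0x015F</code> in UTF-16) comes out as the three bytes <code>69 C5 9F</code>.</p>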
<p>However, when you are dealing with a script interpreter written mostly for *nix folks, things get hairier. For example, the following behavior always annoyed me:</p>
<pre class="text"><code>C:\> chcp 65001
Active code page: 65001
C:\> perl -CS -E "say 'kârlı iş'"
kârli is
C:\> perl -CS -E "say 'kârlı iş'"|xxd
00000000: 6bc3 a272 6c69 2069 730d 0a k..rli is..</code></pre>
<p>What happened there? Why do we see “â” but not “ı” or “ş”?</p>
<p>Simple: <code>perl</code> does not define a <code>wmain</code>, but uses the standard <code>main</code> function as the entry point. Therefore, arguments to it are simple <code>char</code>s corresponding to entries in the current ANSI code page. Windows looks at the string passed to this program, and tries to map the arguments to their best representation using the characters available in the OS’ code page. In my case, this is CP 437 (I have <em>never</em> used anything other than the US code page simply because, throughout the decades, it was easier to give up using “ş” in filenames than to deal with various uncertainties in various incarnations of DOS and Windows). As luck would have it, “â” does exist in CP 437 (at <code>0x83</code>; <code>0xE2</code> is its position in Windows-1252 and its Unicode code point). Using <code>-CS</code>, I told <code>perl</code> to encode the output in UTF-8 and I set the console code page to UTF-8, so I get the correctly encoded output displayed correctly. <em>phew!</em></p>
<p>But, the string lost its original meaning in the process: A “profitable business” has become “profitable soot” (most Turks are not fooled by accidental substitutions of “i” for “ı” :-) That is because neither “ı” nor “ş” is in CP 437.</p>
<p>This behavior is not specific to <code>perl</code>, but Perl is the language I use most often.</p>
<p>What happens if we ask <code>perl</code> to execute a file whose name contains another character that does not exist in the ANSI code page?</p>
<pre class="text"><code>C:\> perl yağmur
Can't open perl script "yagmur": No such file or directory</code></pre>
<p>Yup, the <a href="https://www.howtosayinturkish.com/contents/alphabet/">Turkish soft g, “ğ”</a> does not exist in CP 437 either, so “g” is substituted in its place with predictable effects.</p>
<p>None of this is original or new. And none of it prevented me from doing extremely useful work in many languages on Windows using Perl by avoiding the trouble spots. I avoided investing time into figuring out a solution, because I was convinced such a fix would have to touch way too many spots all throughout Perl’s source code and I did not feel up for that.</p>
<p>So, that was a long intro. I am going to ask you to tuck that away for a bit while I digress a little.</p>
<p>A couple of weeks ago, <a href="https://www.learningperl6.com/">brian</a> and I were discussing a hidden gotcha with <code>perl6</code>. Currently, <code>perl6</code> on Windows is a batch file and on *nix systems it is a shell script. Which means invoking it via <code>system</code> or opening a pipe ends up involving a shell no matter what you do … That is not the end of the world, but it is problematic in certain contexts.</p>
<p>These shell scripts and batch files are just wrappers around <code>moar</code> invocations. The thought occurred to me that one could just templatize a simple OS-specific C file to wrap the invocation of <code>moar</code>. Then, <code>Configure.pl</code> would fill in the various paths, and, bingo, <code>system perl6 => @args</code> no longer needs to involve the shell. Of course, the Windows version of this idea is more fiddly because you have to take the command line arguments passed to the wrapper and flatten them correctly to a string containing both the arguments to <code>moar</code> <em>and</em> the arguments passed to the wrapper because <code>CreateProcess</code> expects command line arguments in a string.</p>
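<p>The fiddly part is the quoting. As a rough illustration, here is how a single argument can be quoted so that a child program which parses its command line with the Microsoft C runtime rules sees it as one <code>argv</code> entry again. This is only a sketch: <code>quote_arg</code> is a name I made up, it works on plain <code>char</code> strings, and it does nothing about <code>cmd.exe</code> metacharacters.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: quote one argument following the Microsoft C
 * runtime's argv parsing rules. Returns a malloc'd string (caller frees),
 * or NULL if allocation fails. */
char *quote_arg(const char *arg)
{
    size_t len, i, k, o = 0;
    char *out;

    assert(arg != NULL);
    len = strlen(arg);

    /* worst case: every character doubled, plus two quotes and the NUL */
    out = malloc(2 * len + 3);
    if (!out) return NULL;

    /* nothing to do if there is no whitespace and no double quote */
    if (len > 0 && strpbrk(arg, " \t\n\v\"") == NULL) {
        memcpy(out, arg, len + 1);
        return out;
    }

    out[o++] = '"';
    for (i = 0; i < len; ++i) {
        size_t backslashes = 0;

        while (i < len && arg[i] == '\\') {
            ++backslashes;
            ++i;
        }
        if (i == len) {
            /* trailing backslashes sit before the closing quote: double them */
            for (k = 0; k < 2 * backslashes; ++k) out[o++] = '\\';
            break;
        }
        if (arg[i] == '"') {
            /* double the backslashes and add one more to escape the quote */
            backslashes = 2 * backslashes + 1;
        }
        for (k = 0; k < backslashes; ++k) out[o++] = '\\';
        out[o++] = arg[i];
    }
    out[o++] = '"';
    out[o] = '\0';
    return out;
}
```

<p>The flattened command line is then just the quoted arguments joined with single spaces.</p>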
<p>While writing the code for flattening the arguments (and making sure everything is correctly quoted and escaped), another thought popped up: <code>perl</code> has for a long time allowed one to specify that command line arguments are UTF-8 encoded. Except, on Windows, it doesn’t work well because by the time <code>perl</code>’s main sees the arguments, they have already been mapped to whatever ANSI code page by Windows.</p>
<p>What if my wrapper used <code>wmain</code> so it received the command line arguments in UTF-16, but used <code>CreateProcessA</code> to invoke <code>perl</code> with the <code>-CA</code> argument along with any additional arguments specified on the command line? (As far as I can tell, I can’t use a similar flag with <code>perl6</code> or <code>moar</code>.)</p>
<p>If I did that, I could encode the path to <code>perl</code> using the ANSI code page and append the arguments to the wrapper to the plain <code>char</code> array holding the command line after encoding them in UTF-8. I wrote a simple proof of concept. Lo and behold, it works on my simple setup:</p>
<pre class="text"><code>C:\> p5run -Mutf8 -CS -E "say 'kârlı iş'"
kârlı iş</code></pre>
<p>except …</p>
<pre class="text"><code>C:\> p5run -CS yağmur
Can't open perl script "yağmur": No such file or directory</code></pre>
<p>That’s what we economists call a Pareto-improvement: The situation is made better in some contexts and no worse in others. Not perfect, but a movement in the right direction.</p>
<p>At this point, I remembered that Perl 6 is designed from the ground up around Unicode and the wrapper may have more success there. So, I cobbled together something and I was met with disappointment:</p>
<pre class="text"><code>C:\> p6run -e "say 'kârlı iş'"
kârlı iş</code></pre>
<p>Ouch! Clearly, something somewhere was re-encoding things.</p>
<p>I must admit, I am still not comfortable with exactly how all the layers involved in executing Perl 6 code fit together, so I went searching in GitHub repositories. During the process, I <a href="https://github.com/perl6/nqp/issues/346">filed a confused bug report</a> because I got fooled by GitHub’s syntax highlighting inside a POD section, but that serendipitously led to <a href="https://github.com/perl6/nqp/issues/346#issuecomment-278118005">timo pointing me in the right direction</a>.</p>
<p>The deed indeed happens in <a href="https://github.com/MoarVM/MoarVM/blob/8193e8e983e2e57d6dc868be0d3547c55f2697bd/src/io/procops.c#L1243">MoarVM/src/io/procops.c</a>:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a> MVMROOT(tc, clargs, {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> MVMuint16 acp = GetACP();</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> MVMint64 num_clargs = instance->num_clargs;</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> MVMint64 count;</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> MVMString *prog_string = MVM_string_utf8_c8_decode(tc,</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> instance->VMString,</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> instance->prog_name, strlen(instance->prog_name));</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> MVMObject *boxed_str = MVM_repr_box_str(tc,</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> instance->boot_types.BOOTStr, prog_string);</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> MVM_repr_push_o(tc, clargs, boxed_str);</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (count = <span class="dv">0</span>; count < num_clargs; count++) {</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">char</span> *raw_clarg = instance->raw_clargs[count];</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">char</span> * <span class="dt">const</span> _tmp = ANSIToUTF8(acp, raw_clarg); <span class="co">/* <-- here, line 1243 */</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a> MVMString *string = MVM_string_utf8_c8_decode(tc,</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a> instance->VMString, _tmp, strlen(_tmp));</span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a> MVM_free(_tmp);</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a> boxed_str = MVM_repr_box_str(tc,</span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a> instance->boot_types.BOOTStr, string);</span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a> MVM_repr_push_o(tc, clargs, boxed_str);</span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a> });</span></code></pre></div>
<p>So when my wrapper encodes the command line arguments in UTF-8 and passes them to <code>moar</code>, they go through the blender … and out come some minced guts or some such. To verify my intuition, I deleted lines 1243 and 1246 and rebuilt MoarVM. This time, my wrapper gave the correct output.</p>
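<p>To see what the blender does to bytes that are already UTF-8, here is a small sketch. I am assuming Latin-1 as a stand-in for the ANSI code page (Windows-1252 differs from it in the <code>0x80</code> to <code>0x9F</code> range), and <code>latin1_to_utf8</code> is a hypothetical function of my own, not MoarVM’s <code>ANSIToUTF8</code>:</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ANSI-to-UTF-8 step: treat every input byte
 * as a Latin-1 code point and encode it as UTF-8. Returns a malloc'd
 * string (caller frees), or NULL if allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len, i, o = 0;
    unsigned char *out;

    assert(in != NULL);
    len = strlen(in);

    out = malloc(2 * len + 1); /* each byte becomes at most two bytes */
    if (!out) return NULL;

    for (i = 0; i < len; ++i) {
        unsigned char b = (unsigned char)in[i];

        if (b < 0x80) {
            out[o++] = b; /* ASCII passes through unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

<p>Feed it the UTF-8 bytes for “â” (<code>C3 A2</code>) and you get <code>C3 83 C2 A2</code> back, which a UTF-8 console displays as “Ã¢”. ASCII is untouched, which is why this kind of bug goes unnoticed for so long.</p>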
<p>That meant I just had to make sure command line arguments got encoded in UTF-8 at the earliest opportunity. I added the following function to <code>procops.c</code>:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>MVM_PUBLIC <span class="dt">char</span> **</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>UnicodeToUTF8_argv(<span class="dt">const</span> <span class="dt">int</span> argc, <span class="dt">const</span> <span class="dt">wchar_t</span> **wargv)</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> i;</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">char</span> **argv = MVM_malloc((argc + <span class="dv">1</span>) * <span class="kw">sizeof</span>(*argv));</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (i = <span class="dv">0</span>; i < argc; ++i)</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> {</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> argv[i] = UnicodeToUTF8(wargv[i]);</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> argv[i] = NULL;</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> argv;</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>and modified <code>MoarVM/main.c</code> to use <code>wmain</code> on Windows:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode C"><code class="sourceCode c"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#ifndef _WIN32</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main(<span class="dt">int</span> argc, <span class="dt">char</span> *argv[])</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#else</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="dt">char</span> ** UnicodeToUTF8_argv(<span class="dt">const</span> <span class="dt">int</span> argc, <span class="dt">const</span> <span class="dt">wchar_t</span> **wargv);</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> wmain(<span class="dt">int</span> argc, <span class="dt">wchar_t</span> *wargv[])</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="pp">#endif</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>{</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> MVMInstance *instance;</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">char</span> *input_file;</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">char</span> *executable_name = NULL;</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">char</span> *lib_path[<span class="dv">8</span>];</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="pp">#ifdef _WIN32</span></span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">char</span> **argv = UnicodeToUTF8_argv(argc, wargv);</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a><span class="pp">#endif</span></span></code></pre></div>
<p>and rebuilt MoarVM (note that creating the UTF-8 encoded <code>argv</code> array involves allocating memory which needs to be freed at some point, but, at this point, I am just exploring).</p>
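<p>For completeness, the cleanup would look something like the sketch below. I am assuming each string and the array itself came from a <code>malloc</code>-compatible allocator and that the array is <code>NULL</code>-terminated, as in the function above. The name <code>free_utf8_argv</code> is hypothetical, and inside MoarVM one would presumably use <code>MVM_free</code> rather than plain <code>free</code>.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: release a NULL-terminated array of heap-allocated
 * strings, then the array itself. Returns the number of strings freed. */
size_t free_utf8_argv(char **argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n]) {
        free(argv[n]);
        ++n;
    }
    free(argv);
    return n;
}
```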
<p>And, here we go:</p>
<pre class="text"><code>C:\> perl6 -e "say 'kârlı iş'"
kârlı iş</code></pre>
<p>and</p>
<pre class="text"><code>C:\> type yağmur
say "it's raining!";
C:\> perl6 yağmur
it's raining!</code></pre>
<p>I haven’t had time to run the test suites yet. In addition, <a href="https://github.com/MoarVM/MoarVM/blob/8193e8e983e2e57d6dc868be0d3547c55f2697bd/src/io/procops.c#L54"><code>MVM_proc_getenvhash</code></a> also needs to be fixed in a similar manner:</p>
<pre class="text"><code>C:\> set iş=kârlı
C:\> @echo %iş%
kârlı
C:\> perl6 -e "say %*ENV<iş>"
(Any)
C:\> perl6 -e "say %*ENV<is>"
kârli</code></pre>
<p>That’s why I haven’t put together a <a href="https://github.com/MoarVM/MoarVM/pull/528/files?diff=split">pull request</a> yet.</p>
<p>The discovery process itself was interesting enough for me to want to share it. I’ll take care of the <a href="https://github.com/MoarVM/MoarVM/pull/528/files?diff=split">pull request</a> as soon as I can. If someone decides to go ahead and patch MoarVM with these changes or improve upon them, I am OK with that, too. In that case I would really appreciate an acknowledgement. I think I deserved one in response to <a href="https://www.nu42.com/2015/12/perl6-newline-behavior-fixed.html">my discovery of erroneous EOL handling</a>, among others.</p>
<p>I am not sure if the fix to <code>perl</code> will be so straightforward.</p>
<p>PS: For reference, examples using other interpreters:</p>
<pre class="text"><code>C:\> ruby -e "print 'kârlı iş'"
kârlı iş
C:\> python3.6.exe -c "print('kârlı iş')"
kârlı iş
C:\> python2.7.exe -c "print 'kârlı iş'"
k�rli is</code></pre>
<p>PPS: I still think wrapping <code>moar</code> using a proper C program is the way to go and I am working on a nice templatable wrapper on Windows which I’ll make available soon.</p>
<p>PPPS: You can <a href="https://redd.it/5su0dc">discuss this post</a> on <a href="https://redd.it/5su0dc">r/perl</a>.</p>
<p>PPPPS: Here is the <a href="https://github.com/MoarVM/MoarVM/pull/528/files?diff=split">pull request</a>.</p>
</div>
</article>
Sinan Unur\c[PERSON FROWNING, ZERO WIDTH JOINER, PROGRAMMER]tag:www.nu42.com,2017-02-01:/2017/02/perl6-programmer-frowning.html2017-02-01T18:15:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">\c[PERSON FROWNING, ZERO WIDTH JOINER, PROGRAMMER]</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-02-01T18:15:00+00:00" class="dt-published">February 1, 2017</time></h3>
</header>
</div>
<div class="article-content"><p>In December 2015, I was rather <a href="/2015/12/build-rakudo-star-perl6-visual-studio-windows.html">looking forward to the release of Perl 6</a>. I went ahead and tried to see if it would build with Microsoft’s tools on Windows. I immediately ran into some issues which I detailed in that post. One of the issues was a discrepancy between the specifications and the specification tests that only arose on Windows (<a href="https://rt.perl.org/Public/Bug/Display.html?id=126876">bug report</a>, <a href="https://github.com/perl6/roast/pull/87">pull request</a>). Later, digging deeper into other test failures I had seen, I noticed that <a href="/2015/12/perl6-newline-translation-broken.html">Perl 6 was breaking</a> well-established ways of treating cross-platform EOL conventions. The behavior was eventually fixed before the release.</p>
<p>Of course, my intention was not to spend hours tracking down elusive bugs. I was eager to try things out. I was enthusiastic. But, I did end up spending time on those issues. Then I ran out of time. Time is the ultimate scarce resource, after all.</p>
<p>My enthusiasm for Perl 6 flared up again when brian started writing <a href="https://www.learningperl6.com/">Learning Perl 6</a>. I decided to build the Rakudo Star release from source following <a href="http://rakudo.org/how-to-get-rakudo/#Installing-Rakudo-Star-Source-Manual">instructions available on their web site</a>. I built everything, installed <a href="https://github.com/tadzik/panda">panda</a> as was recommended at the time and then proceeded with <code>panda install Task::Star</code>. I ran into <a href="https://stackoverflow.com/q/41416011/100754">a problem</a> with one of the modules <code>Task::Star</code> wanted to install.</p>
<p>In response to my question, I was told <code>panda</code> was no longer recommended, and I should have used <a href="https://github.com/ugexe/zef"><code>zef</code></a>, but, of course that would not have fixed the problem I was having with building the module (which, keep in mind, is only being installed because <code>Task::Star</code> requires it). My question prompted <a href="https://github.com/hoelzro/p6-linenoise/pull/19">the build bug to be fixed</a>. I proceeded, got my <code>perl6</code> and jumped into the REPL to be greeted by this:</p>
<div class="thumb"><a href="/2017/02/perl6-repl-error-2016-12.jpg"><img src="https://www.nu42.com/2017/02/perl6-repl-error-2016-12.jpg" width="600" title="Failure to load Linenoise results in cryptic error message in Perl 6 REPL"></a></div>
<p>When <a href="https://twitter.com/sinan_unur/status/816304327761547264">I tweeted about this</a>, I was told <q>Your “adventures” will be much smoother if you use the user distribution instead of building the dev branch.</q> Well, that’s more than a little passive aggressive: The reason I am building from the dev branch is so I am not re-discovering bugs that have already been fixed: I am trying to be helpful by pointing out what the devs are missing.</p>
<p>So, what are some of the problems illustrated in the screenshot above?</p>
<ul>
<li><p>First, and foremost, the module was just built in front of my eyes, passed its tests, yet it can’t be used. There is clearly something missing from the tests.</p></li>
<li><p>Second, failure to load a module for supporting tab completions etc is not a big deal. There is no need to make so much noise every time I get in the REPL. Just a gentle reminder would be enough. Keep the noise optional.</p></li>
<li><p>Third, in fact, I do have basic line editor functionality, just no tab completion.</p></li>
<li><p>Fourth, the REPL tells me I can exit using <kbd>CTRL-D</kbd> which does not work. Neither does the corresponding Windows keypress, <kbd>CTRL-Z</kbd>. The resulting error message is weird.</p></li>
<li><p>In response to above, I was told <kbd>CTRL-D</kbd> works when <code>perl6</code> is built with MinGW … That’s also incorrect behavior. On Windows, EOF is signalled with <kbd>CTRL-Z</kbd>. I am in a CMD shell. Not in Cygwin bash or anything else. Besides, I am trying to see how <code>perl6</code> works when built with Microsoft’s tools. So, what works with MinGW is not relevant.</p></li>
</ul>
<p>Given the non-cooperative response from <a href="http://news.perlfoundation.org/2017/01/january-2017-grant-votes.html">that leading Perl 6 developer</a>, I decided to suspend my tinkering and wait for the next release that was due soon.</p>
<p>A few days ago, <a href="http://rakudo.org/2017/01/30/announce-rakudo-star-release-2017-01/">Rakudo Star 2017.01 was released</a>. I immediately downloaded the source (note that the downloads are over http and no checksums are given, so do download at your own risk), and proceeded to follow the instructions. This time, thankfully, I did not have to wait long to run into a problem:</p>
<div class="thumb"><a href="/2017/02/rakudo-star-2017-01-build-problem.png"><img src="https://www.nu42.com/2017/02/rakudo-star-2017-01-build-problem.png" width="600" title="Rakudo Star 2017.01 build failure"></a></div>
<p>It turns out the <a href="https://github.com/MoarVM/MoarVM/commit/357438a99c63c2caa7c927e60dac16ee2e60a3a7#diff-c62c27f5e5a1685d74e427c9a639f10a">fix is easy</a> … But, that condescending remark about sticking with user releases rather than playing with dev branches for a smooth experience … well, I am just going to say that this bug would probably have been discovered sooner had I kept playing with the dev branch.</p>
<p>So, I deleted the offending line and proceeded to build. When I typed <code>nmake test</code>, I got this:</p>
<div class="thumb"><a href="/2017/02/rakudo-star-2017-01-nmake-test-message.png"><img src="https://www.nu42.com/2017/02/rakudo-star-2017-01-nmake-test-message.png" width="600" title="Rakudo Star 2017.01 nmake test message gobbles directory separators"></a></div>
<p>What is the problem? Well, where are those pesky directory separators?</p>
<p>Exactly!</p>
<p>Anyway, I ran all of those tests. There were some failures. Some failures went away when I issued <code>chcp 65001</code> before running them. Honestly, at this point, I am losing steam: My only interest in Perl 6 is as an enthusiast who wants it to succeed mostly because it is related to Perl and it is a Larry Wall project and a good friend is writing a book about it. All sentimental. There is no problem I can solve with Perl 6 I can’t already solve with either Perl or C++ or SAS or Stata or SQL. If tomorrow Perl 6 ceased to exist, it would make no difference in my life or the satisfaction I derive from tinkering with other things.</p>
<p>Lest you think I think I am “all that”, I am not. If Perl 6 is ever going to grow outside of the tight-knit group who are all high fiving each other feverishly every time a new emoji is minted, it is going to have to attract enthusiasts in significant numbers who are going to carry the language forward. Take a look at the way we write Perl now: Does that bear any relation to the way Larry Wall wrote Perl? In fact, the only reason I fell in love with Perl is because I started with 5.6, not any of the earlier versions.</p>
<p>If Perl 6 is going to grow, it is going to grow among people who will use it and write it in ways the people who built it did not anticipate. For that to happen, a lot of friction needs to go away so that one does not run into some kind of problem every time one starts tinkering.</p>
<p>In any case, I finally had my fresh <code>perl6</code> binary. As I was familiarizing myself with regex stuff, I was running a number of one-liners. Some of those one-liners repeatedly read stuff from <code>STDIN</code>, carried out some string manipulation, and printed output in a loop. The usual way to get out of such loops whether you are in <code>fish</code> or <code>cmd.exe</code> is to signal EOF via the keyboard: On Unix-like OSes, this is <kbd>CTRL-D</kbd>. In the DOS/Windows world, <kbd>CTRL-Z</kbd> has carried that role since time immemorial. But, it seemed like it was not working. I have a faint memory of it working at some point, but I may be wrong. In any case, I kept tinkering with my one-liners, until I was pretty sure it wasn’t my fault that <kbd>CTRL-Z</kbd> wasn’t working. To illustrate the problem, I posted a <a href="https://twitter.com/sinan_unur/status/826443292204216320">screenshot on Twitter</a>.</p>
<p>And, of course, I should have done something else:</p>
<blockquote>
<p>I don’t get the errors <a href="https://twitter.com/sinan_unur/status/826443292204216320"><code>@sinan_unur</code> see’s</a> (sic) in the REPL… but I installed using .msi installer, which is how most people should.</p>
</blockquote>
<p>Well … No, I am not going to do that. These setup programs have their own ideas of where things should go etc. Not interested. It is not productive to say “do it some other way” every time I run into a problem while <em>following published instructions</em>.</p>
<p>And, of course, it is not helpful to point out “Perl 6 one can be done with <code>.say for lines</code>.” Who cares? Does using <code>.say for lines</code> magically make <code>perl6</code> terminate the loop? No. My screenshot had <code>perl6 -e "for $*IN.lines -> $x { say $x }"</code> because the one-liner I had started from was embarrassingly long and I kept taking things out trying to figure out what was going on. I didn’t say, “oh, let me write the shortest single line <code>echo</code>”.</p>
<p>Then we come to the</p>
<blockquote>
<p>As for <code>^Z</code>, I question whether <code>@rakudoperl</code> should treat <code>\x1A</code> as <code>EOF</code> in this age.</p>
</blockquote>
<p>Well, how do you propose to signal <code>EOF</code> from the command line then? Clearly, <kbd>CTRL-D</kbd> did not work. And, if I have to <kbd>CTRL-C</kbd> to terminate terminal input, well, that has the presumably not so desirable side effect of terminating the program as well. So, what do you propose?</p>
<p>That’s a rhetorical question. The answer will probably be “Don’t build from source using Visual Studio. Stick with MinGW.”</p>
<p>I am a fan of <code>gcc</code>, but using <a href="/2016/03/tar-anomaly.html">MinGW</a> is not always the right answer.</p>
<p>There will probably be responses to this post along the usual lines, “get on IRC”, “submit bug reports”, “submit pull requests” etc. I am not interested. Apparently, trying to make Perl 6 better by exploring problematic corners and publicly discussing issues is considered “trashing Perl 6” among the regulars of <a href="https://irclog.perlgeek.de/perl6/2017-01-26#i_13994397">#perl6</a>.</p>
<div class="thumb"><a href="/2017/02/perl6-irc-bullying.png"><img src="https://www.nu42.com/2017/02/perl6-irc-bullying.png" width="600" title="Recent TPF grant recipient is not very civil"></a></div>
<p>Apparently, publicly announcing my support for Learning Perl 6 makes me “brian’s pet”. Why the insult? I have only ever made logical arguments without calling anyone names even when expressing my frustrations.</p>
<p>Now I find out that people are casually trying to marginalize me by resorting to name calling with absolutely no reaction from other participants in <code>#perl6</code>, especially Larry Wall whom I hold in very high esteem.</p>
<p>It has been almost five years since schwern made a bunch of us stand up so he could <a href="/2012/06/dont-box-me-in-yapcna-2012-impressions.html">try to shame us for our “privilege”</a>. I resented then being lumped into one homogeneous group and being told I am the cause of lack of diversity at YAPC::NA 2012 just because of my skin color. And, I resent now the attitude that somehow advocating for a project I believe in makes me someone’s pet. I wonder if people would be just as comfortable with this language if my name were “Pete”? Do they think I should just know my place and not voice my opinion? Do they think my opinions could not possibly have been mine? For what reason other than blind prejudice?</p>
<p>I am still interested in Perl 6. It is an intriguing and promising language, even though they still can’t get some of the <a href="https://rt.perl.org/Public/Bug/Display.html?id=125757">basics right</a>. Try running this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v6;</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> Test;</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>plan <span class="dv">1</span>;</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">$p</span> = shell(<span class="ot">'</span><span class="ss">false</span><span class="ot">'</span>, :out);</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>isnt <span class="dt">$p</span>.exitcode, <span class="dv">0</span>, <span class="ot">'</span><span class="ss">exit code of false should not be 0</span><span class="ot">'</span>;</span></code></pre></div>
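<p>For comparison, here is a minimal sketch in C of the behavior that test expects, assuming a POSIX system where <code>popen</code> and <code>pclose</code> are available. The helper name <code>run_and_capture</code> is made up for illustration and says nothing about how Rakudo or MoarVM implement <code>shell</code>; it only shows that reading a child’s output and then inspecting its exit status need not conflict.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `cmd` through the shell, copy at most cap - 1
 * bytes of its standard output into `out` (always NUL-terminated), and
 * return the child's exit code, or -1 if it could not be determined.
 */
int run_and_capture(const char *cmd, char *out, size_t cap)
{
    FILE *p;
    char buf[256];
    size_t used = 0, n;
    int status;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    /* Drain the pipe completely, keeping only what fits in `out` */
    while ((n = fread(buf, 1, sizeof buf, p)) > 0) {
        size_t room = cap - 1 - used;
        if (n > room)
            n = room;
        memcpy(out + used, buf, n);
        used += n;
    }
    out[used] = '\0';

    /* pclose waits for the child and hands back its wait status */
    status = pclose(p);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

<p>Called with <code>"false"</code>, this should return a nonzero exit code while still leaving whatever output was produced (none, in that case) in the buffer.</p>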
<p>Going forward, I am going to occasionally try it out and, of course, write about my experiences because I enjoy trying things out and writing about my experiences. But, until Perl 6 developers accept input from people who are not members of their tribe more gracefully, I will stay away.</p>
<p>In the meantime, enjoy inserting emoji in your programs, but good luck if you need to capture output <em><strong>and</strong></em> test an exit code.</p>
</div>
</article>
Sinan UnurStock market behavior around "change" presidential elections in the U.S.tag:www.nu42.com,2017-01-17:/2017/01/stock-market-presidential-election.html2017-01-17T21:30:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Stock market behavior around "change" presidential elections in the U.S.</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-01-17T21:30:00+00:00" class="dt-published">January 17, 2017</time></h3>
</header>
</div>
<div class="article-content"><blockquote>
<p>Nothing discussed herein should be taken as investment advice or as a recommendation regarding any particular investment vehicle or course of action. All statements herein are statements of subjective opinion and for information and entertainment purposes only. Seek a duly licensed professional for investment advice.</p>
</blockquote>
<p>While writing yesterday’s post on <a href="/2017/01/sell-inauguration.html">the idea of selling the inauguration</a>, I noticed that the behavior of the S&P 500 around President Obama’s election in 2008 was highly positively correlated with its behavior around the first time President Bush was elected in 2000. This is not something I had expected <em>a priori</em>, so I decided to take a closer look.</p>
<p>Sticking with the basic features of the <a href="http://www.marketwatch.com/story/how-long-post-election-rallies-last-after-inauguration-day-in-one-sp-chart-2017-01-13">MarketWatch article</a>, I’ll restrict myself to the presidential elections since 1950 where the party holding the White House changed: 1952 (Eisenhower), 1960 (Kennedy), 1968 (Nixon), 1976 (Carter), 1980 (Reagan), 1992 (Clinton), 2000 (Bush), and 2008 (Obama). I will be looking at S&P 500 levels relative to election day performance over the period 100 days before the election to 100 days after the election.</p>
<p>To refresh your memory, here’s what things look like if you plot each 201-point series on the same chart:</p>
<div class="thumb"><a href="/2017/01/sp500-party-change-elections.png"><img src="https://www.nu42.com/2017/01/sp500-party-change-elections.png" width="600" title="SP 500 behavior around election date, change elections"></a></div>
<p>Clearly, there is a lot of clutter there. One can “identify” some patterns by squinting hard this or that way, but that’s rather subjective and error-prone. So, let’s try to reduce the variability to its essential components.</p>
<p>Take out your favorite stats tool, and run a simple principal components analysis on the eight variables (I’ll be using <a href="https://www.stata.com/">Stata</a>):</p>
<pre><code>. pca obama bush clinton reagan carter nixon kennedy eisenhower
Principal components/correlation Number of obs = 201
Number of comp. = 8
Trace = 8
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 4.52593 2.92298 0.5657 0.5657
Comp2 | 1.60295 .477661 0.2004 0.7661
Comp3 | 1.12529 .798646 0.1407 0.9068
Comp4 | .326645 .14957 0.0408 0.9476
Comp5 | .177076 .0697737 0.0221 0.9697
Comp6 | .107302 .0200748 0.0134 0.9831
Comp7 | .0872271 .0396505 0.0109 0.9941
Comp8 | .0475765 . 0.0059 1.0000
--------------------------------------------------------------------------</code></pre>
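<p>The <code>Proportion</code> and <code>Cumulative</code> columns follow directly from the eigenvalues: in a correlation-based PCA each standardized variable contributes a variance of one, so the eigenvalues sum to the number of variables (the <code>Trace</code> of 8), and each component’s share is its eigenvalue divided by that sum. A small sketch in C of that arithmetic, where <code>explained_variance</code> and <code>election_eigenvalues</code> are names invented for this illustration:</p>

```c
#include <stddef.h>

/* Eigenvalues copied from the Stata output above */
const double election_eigenvalues[8] = {
    4.52593, 1.60295, 1.12529, 0.326645,
    0.177076, 0.107302, 0.0872271, 0.0475765
};

/*
 * Hypothetical helper: prop[i] receives eigenvalue i's share of the total
 * variance, and cum[i] the running total of those shares.
 */
void explained_variance(const double *eig, size_t n, double *prop, double *cum)
{
    double trace = 0.0, running = 0.0;
    size_t i;

    /* The trace is the sum of the eigenvalues */
    for (i = 0; i < n; ++i)
        trace += eig[i];

    for (i = 0; i < n; ++i) {
        prop[i] = eig[i] / trace;
        running += prop[i];
        cum[i] = running;
    }
}
```

<p>With the eigenvalues above, the first component’s share comes out at about 0.566 and the first three together at about 0.907, matching the table up to rounding.</p>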
<p>Let’s look at what those components look like:</p>
<pre><code>. predict pc1-pc8, score
Scoring coefficients
sum of squares(column-loading) = 1
----------------------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 Comp6 Comp7 Comp8
-------------+--------------------------------------------------------------------------------
obama | -0.4340 -0.1862 0.1284 0.1272 0.2261 0.1264 0.7981 0.2009
bush | -0.4345 0.0403 0.1023 0.3964 0.5330 0.1474 -0.5363 0.2205
clinton | 0.4327 -0.1531 0.1615 0.1364 0.0452 0.8585 0.0211 -0.0717
reagan | 0.3500 0.4276 0.0707 -0.4628 0.6420 -0.0594 0.1275 0.2086
carter | -0.1901 0.0409 0.8296 -0.3474 -0.2991 0.0386 -0.1445 0.2034
nixon | 0.1406 0.7037 0.0618 0.5238 -0.3036 -0.0032 0.1584 0.2992
kennedy | 0.3629 -0.4763 -0.0587 0.1013 -0.0546 -0.2049 -0.0640 0.7607
eisenhower | 0.3543 -0.1835 0.4965 0.4342 0.2569 -0.4222 0.0933 -0.3909
----------------------------------------------------------------------------------------------</code></pre>
<p>We can plot the first three principal components:</p>
<h3 id="pc1"><code>pc1</code></h3>
<p>This component represents a steady rise in the index over the period 100 days before the election to 100 days after the election.</p>
<div class="thumb"><a href="/2017/01/sp500-election-pc1.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc1.png" width="600" title="First Principal Component"></a></div>
<h3 id="pc2"><code>pc2</code></h3>
<p>This component represents a volatile increase leading up to the day of the election, a dip on election day with a bump following the election and then a steady decline to pre-election levels, ending on a slight up note.</p>
<div class="thumb"><a href="/2017/01/sp500-election-pc2.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc2.png" width="600" title="Second Principal Component"></a></div>
<h3 id="pc3"><code>pc3</code></h3>
<p>The third component represents a rocky road of generally higher index levels leading up to the election with an interesting local maximum on election day, followed by a nice post-election bump which disappears over the course of the following days and ends on a down note.</p>
<div class="thumb"><a href="/2017/01/sp500-election-pc3.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc3.png" width="600" title="Third Principal Component"></a></div>
<p>A combination of the patterns exhibited by the second and third principal components seems to be what most analysts have in mind when they talk about selling the inauguration. If we are in a world that corresponds to such a pattern, then the “Trump bump” has probably gone as high as it could, and will soon disappear.</p>
<p>The focus of this post is on identifying which elections exhibit similarity in the behavior of the stock market around them. To that end, let’s restrict our attention only to the first three principal components:</p>
<pre><code>-----------------------------------------
Variable | Comp1 Comp2 Comp3
-----------+-----------------------------
obama | -0.4340 -0.1862 0.1284
bush | -0.4345 0.0403 0.1023
clinton | 0.4327 -0.1531 0.1615
reagan | 0.3500 0.4276 0.0707
carter | -0.1901 0.0409 0.8296
nixon | 0.1406 0.7037 0.0618
kennedy | 0.3629 -0.4763 -0.0587
eisenhower | 0.3543 -0.1835 0.4965
-----------------------------------------</code></pre>
<p>What do those numbers mean? Let’s, for the sake of example, look at the coefficients for S&P 500 around President Obama’s election. The coefficients imply that we can approximate its behavior by combining approximately 3.4 parts steady decline (<code>pc1</code>), 1.5 parts <code>pc2</code> decline with bump on election day and recovery, and one part <code>pc3</code> increase with a crash around election day followed by a bump and decline.</p>
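<p>The “parts” quoted here are just the coefficient magnitudes rescaled so that the smallest one equals one; a negative sign means the component enters flipped (a steady rise becomes a steady decline). A minimal sketch of that rescaling in C, with <code>relative_parts</code> being a name made up for this example:</p>

```c
#include <stddef.h>

/*
 * Hypothetical helper: express each coefficient's magnitude relative to
 * the smallest nonzero magnitude in the vector. Signs are dropped; they
 * only say whether a component enters as drawn or flipped.
 */
void relative_parts(const double *coef, size_t n, double *parts)
{
    double smallest = 0.0;
    size_t i;

    /* Find the smallest nonzero magnitude */
    for (i = 0; i < n; ++i) {
        double m = coef[i] < 0.0 ? -coef[i] : coef[i];
        if (m > 0.0 && (smallest == 0.0 || m < smallest))
            smallest = m;
    }

    /* Scale every magnitude by it; an all-zero vector yields all zeros */
    for (i = 0; i < n; ++i) {
        double m = coef[i] < 0.0 ? -coef[i] : coef[i];
        parts[i] = smallest > 0.0 ? m / smallest : 0.0;
    }
}
```

<p>For the Obama coefficients (-0.4340, -0.1862, 0.1284) this gives roughly 3.4, 1.5 and 1, which is where the figures in the paragraph above come from.</p>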
<p>Just eyeballing these coefficient vectors, it looks like the behavior of the stock market around the 2008 election is most similar to its behavior around the 2000 election. We don’t have to trust our eyes, though. We can calculate how similar these vectors are using a simple Perl script:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v5.<span class="dv">24</span>;</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> feature <span class="ot">'</span><span class="ss">signatures</span><span class="ot">'</span>;</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="fu">no</span> <span class="kw">warnings</span> <span class="ot">'</span><span class="ss">experimental::signatures</span><span class="ot">'</span>;</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">List::Util</span> <span class="ot">qw(</span> sum <span class="ot">)</span>;</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">Text::Table</span>::<span class="fu">Tiny</span> ();</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">%pc</span> = (</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> obama => [ -<span class="fl">0.4340</span>, -<span class="fl">0.1862</span>, <span class="fl">0.1284</span>],</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> bush => [ -<span class="fl">0.4345</span>, <span class="fl">0.0403</span>, <span class="fl">0.1023</span>],</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> clinton => [ <span class="fl">0.4327</span>, -<span class="fl">0.1531</span>, <span class="fl">0.1615</span>],</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> reagan => [ <span class="fl">0.3500</span>, <span class="fl">0.4276</span>, <span class="fl">0.0707</span>],</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> carter => [ -<span class="fl">0.1901</span>, <span class="fl">0.0409</span>, <span class="fl">0.8296</span>],</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a> nixon => [ <span class="fl">0.1406</span>, <span class="fl">0.7037</span>, <span class="fl">0.0618</span>],</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a> kennedy => [ <span class="fl">0.3629</span>, -<span class="fl">0.4763</span>, -<span class="fl">0.0587</span>],</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a> eisenhower => [ <span class="fl">0.3543</span>, -<span class="fl">0.1835</span>, <span class="fl">0.4965</span>],</span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a>);</span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a><span class="kw">my</span> <span class="dt">@table</span> = ([<span class="ot">''</span>, <span class="ot">qw(</span></span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a> obama</span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a> bush</span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a> clinton</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a> reagan</span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a> carter</span>
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a> nixon</span>
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a> kennedy</span>
<span id="cb4-31"><a href="#cb4-31" aria-hidden="true" tabindex="-1"></a> eisenhower</span>
<span id="cb4-32"><a href="#cb4-32" aria-hidden="true" tabindex="-1"></a><span class="ot">)</span> ]);</span>
<span id="cb4-33"><a href="#cb4-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-34"><a href="#cb4-34" aria-hidden="true" tabindex="-1"></a><span class="kw">for</span> <span class="kw">my</span> <span class="dt">$i</span> (<span class="dv">1</span> .. <span class="wa">$#</span>{ <span class="dt">$table</span>[<span class="dv">0</span>] }) {</span>
<span id="cb4-35"><a href="#cb4-35" aria-hidden="true" tabindex="-1"></a> <span class="fu">push</span> <span class="dt">@table</span>, [</span>
<span id="cb4-36"><a href="#cb4-36" aria-hidden="true" tabindex="-1"></a> <span class="dt">$table</span>[<span class="dv">0</span>][<span class="dt">$i</span>],</span>
<span id="cb4-37"><a href="#cb4-37" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> <span class="fu">sprintf</span>(<span class="ot">'</span><span class="ss">%6.3f</span><span class="ot">'</span>, <span class="wa">$_</span>),</span>
<span id="cb4-38"><a href="#cb4-38" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> similarity(<span class="dt">$pc</span>{<span class="dt">$table</span>[<span class="dv">0</span>][<span class="dt">$i</span>]}, <span class="dt">$pc</span>{<span class="dt">$table</span>[<span class="dv">0</span>][<span class="wa">$_</span>]}), <span class="dv">1</span> .. <span class="dt">$i</span></span>
<span id="cb4-39"><a href="#cb4-39" aria-hidden="true" tabindex="-1"></a> ];</span>
<span id="cb4-40"><a href="#cb4-40" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-41"><a href="#cb4-41" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-42"><a href="#cb4-42" aria-hidden="true" tabindex="-1"></a><span class="fu">say</span> <span class="fu">Text::Table</span>::<span class="fu">Tiny</span>::<span class="fu">table</span>(rows => \<span class="dt">@table</span>, header_row => <span class="dv">1</span>);</span>
<span id="cb4-43"><a href="#cb4-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-44"><a href="#cb4-44" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">similarity</span> (<span class="dt">$</span>x, <span class="dt">$y</span>) {</span>
<span id="cb4-45"><a href="#cb4-45" aria-hidden="true" tabindex="-1"></a> dot(<span class="dt">$x</span>, <span class="dt">$y</span>)/(norm(<span class="dt">$x</span>)<span class="dt">*norm</span>(<span class="dt">$y</span>));</span>
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">norm</span> (<span class="dt">$</span>x) {</span>
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a> <span class="fu">sqrt</span>(sum( <span class="fu">map</span> <span class="wa">$_</span>**<span class="dv">2</span>, <span class="dt">$x</span>-><span class="dt">@</span><span class="ot">*</span> ));</span>
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-51"><a href="#cb4-51" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-52"><a href="#cb4-52" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">dot</span>(<span class="dt">$</span>x, <span class="dt">$y</span>) {</span>
<span id="cb4-53"><a href="#cb4-53" aria-hidden="true" tabindex="-1"></a> sum(<span class="fu">map</span> <span class="dt">$x</span>->[<span class="wa">$_</span>] <span class="ot">*</span> <span class="dt">$y</span>->[<span class="wa">$_</span>], <span class="dv">0</span> .. <span class="wa">$#</span><span class="dt">$x</span>);</span>
<span id="cb4-54"><a href="#cb4-54" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>which produces the output:</p>
<pre><code>+------------+--------+--------+---------+--------+--------+--------+---------+------------+
| | obama | bush | clinton | reagan | carter | nixon | kennedy | eisenhower |
+------------+--------+--------+---------+--------+--------+--------+---------+------------+
| obama | 1.000 | | | | | | | |
| bush | 0.885 | 1.000 | | | | | | |
| clinton | -0.582 | -0.815 | 1.000 | | | | | |
| reagan | -0.816 | -0.511 | 0.359 | 1.000 | | | | |
| carter | 0.435 | 0.443 | 0.110 | 0.020 | 1.000 | | | |
| nixon | -0.522 | -0.082 | -0.105 | 0.883 | 0.087 | 1.000 | | |
| kennedy | -0.259 | -0.678 | 0.753 | -0.241 | -0.268 | -0.664 | 1.000 | |
| eisenhower | -0.179 | -0.387 | 0.844 | 0.227 | 0.621 | -0.106 | 0.488 | 1.000 |
+------------+--------+--------+---------+--------+--------+--------+---------+------------+</code></pre>
<p>In this context, numbers close to 1 indicate high similarity and numbers close to -1 indicate that the two vectors point in nearly opposite directions. The similarity figure is the cosine of the angle between the coefficient vectors. That is, a similarity of 0.885 indicates that the coefficient vectors for Presidents Obama and Bush are approximately 28° apart.</p>
<p>Indeed, our first impression that the behavior of the stock market around President Obama’s election is most similar to that around President Bush’s is somewhat confirmed (depending on how much stock you put in the notion that two series are similar if these coefficients are cosine-similar in 3 dimensional space).</p>
<p>Does this say anything about how the S&P 500 is going to behave in the near future? Not really. But we can try to figure out what happens when we restrict our attention to the 100 trading days before and 45 trading days after the election (which is all we have at the time I am writing this):</p>
<pre><code>. pca trump obama bush clinton reagan carter nixon kennedy eisenhower
Principal components/correlation            Number of obs    =        146
                                            Number of comp.  =          9
                                            Trace            =          9
Rotation: (unrotated = principal)           Rho              =     1.0000
--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      4.85596      2.61317             0.5396       0.5396
       Comp2 |      2.24279      1.35168             0.2492       0.7887
       Comp3 |      .891104       .48339             0.0990       0.8878
       Comp4 |      .407714      .156724             0.0453       0.9331
       Comp5 |       .25099     .0996227             0.0279       0.9610
       Comp6 |      .151367     .0637735             0.0168       0.9778
       Comp7 |     .0875935    .00941584             0.0097       0.9875
       Comp8 |     .0781777     .0438715             0.0087       0.9962
       Comp9 |     .0343061            .             0.0038       1.0000
--------------------------------------------------------------------------</code></pre>
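<p>The <code>Proportion</code> and <code>Cumulative</code> columns follow directly from the eigenvalues: because the analysis is run on the correlation matrix, the trace equals the number of variables (9), and each proportion is an eigenvalue divided by that total. Here is a minimal sketch that recomputes them, with the eigenvalues typed in by hand from the output above (the variable names are my own for this illustration):</p>
<pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use v5.24;
use warnings;

use List::Util qw( sum );

# Eigenvalues copied from the Stata output above.
my @eigenvalue = (
    4.85596, 2.24279, .891104, .407714, .25099,
    .151367, .0875935, .0781777, .0343061,
);

# Sums to (approximately) 9: one unit of variance per standardized variable.
my $trace = sum @eigenvalue;

my $cumulative = 0;
my @cumulative;

for my $i ( 0 .. $#eigenvalue ) {
    my $proportion = $eigenvalue[$i] / $trace;
    $cumulative += $proportion;
    push @cumulative, $cumulative;
    printf "Comp%d  %.4f  %.4f\n", $i + 1, $proportion, $cumulative;
}</code></pre>
<p>Up to rounding, the third line of output should match the table: the first three components together account for roughly 89% of the variance.</p>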
<p>For the sake of consistency, let’s again retain the first three principal components. Here is how they look:</p>
<h3 id="pc1-1"><code>pc1</code></h3>
<div class="thumb"><a href="/2017/01/sp500-election-pc1-2016.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc1-2016.png" width="600" title="First Principal Component"></a></div>
<h3 id="pc2-1"><code>pc2</code></h3>
<div class="thumb"><a href="/2017/01/sp500-election-pc2-2016.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc2-2016.png" width="600" title="Second Principal Component"></a></div>
<h3 id="pc3-1"><code>pc3</code></h3>
<div class="thumb"><a href="/2017/01/sp500-election-pc3-2016.png"><img src="https://www.nu42.com/2017/01/sp500-election-pc3-2016.png" width="600" title="Third Principal Component"></a></div>
<p>No huge surprises there.</p>
<p>We have the following coefficients:</p>
<pre><code>-----------------------------------------
  Variable |    Comp1     Comp2     Comp3
-----------+-----------------------------
     trump |   0.3841    0.2511    0.0595
     obama |  -0.3914    0.2904    0.0689
      bush |  -0.3948    0.1512    0.0205
   clinton |   0.3936    0.2034    0.0704
    reagan |   0.3910   -0.2151    0.0849
    carter |   0.0260    0.4494    0.7297
     nixon |   0.3469   -0.3498    0.0593
   kennedy |   0.1034    0.4743   -0.6629
eisenhower |   0.3223    0.4393   -0.0612
-----------------------------------------</code></pre>
<p>The coefficients are different because we are using fewer observations and we have added one more variable. Let’s again look at the similarities using the same Perl script:</p>
<pre><code>+------------+--------+--------+--------+---------+--------+--------+--------+---------+------------+
|            | trump  | obama  | bush   | clinton | reagan | carter | nixon  | kennedy | eisenhower |
+------------+--------+--------+--------+---------+--------+--------+--------+---------+------------+
| trump      | 1.000  |        |        |         |        |        |        |         |            |
| obama      | -0.322 | 1.000  |        |         |        |        |        |         |            |
| bush       | -0.574 | 0.959  | 1.000  |         |        |        |        |         |            |
| clinton    | 0.994  | -0.408 | -0.649 | 1.000   |        |        |        |         |            |
| reagan     | 0.482  | -0.938 | -0.963 | 0.570   | 1.000  |        |        |         |            |
| carter     | 0.419  | 0.404  | 0.200  | 0.398   | -0.063 | 1.000  |        |         |            |
| nixon      | 0.213  | -0.955 | -0.898 | 0.313   | 0.958  | -0.247 | 1.000  |         |            |
| kennedy    | 0.314  | 0.128  | 0.050  | 0.246   | -0.316 | -0.380 | -0.415 | 1.000   |            |
| eisenhower | 0.908  | -0.010 | -0.267 | 0.862   | 0.106  | 0.343  | -0.167 | 0.627   | 1.000      |
+------------+--------+--------+--------+---------+--------+--------+--------+---------+------------+</code></pre>
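<p>As a sanity check, a few entries in the <code>trump</code> column can be recomputed by hand from the three retained coefficients. The snippet below is a minimal, self-contained sketch: the <code>cosine</code> helper and the <code>%coef</code> hash are names made up for this illustration, and the coefficients are the rounded values from the table of coefficients, so the results agree with the similarity table only up to rounding.</p>
<pre class="sourceCode perl"><code class="sourceCode perl">#!/usr/bin/env perl

use v5.24;
use warnings;

use List::Util qw( sum );

# Coefficients on the first three components, copied from the table above.
my %coef = (
    trump      => [  0.3841, 0.2511,  0.0595 ],
    clinton    => [  0.3936, 0.2034,  0.0704 ],
    eisenhower => [  0.3223, 0.4393, -0.0612 ],
    bush       => [ -0.3948, 0.1512,  0.0205 ],
);

# Dot product of two equal-length vectors given as array references.
sub dot {
    my ($x, $y) = @_;
    return sum( map $x->[$_] * $y->[$_], 0 .. $#$x );
}

# Cosine similarity: dot product divided by the product of the vector lengths.
sub cosine {
    my ($x, $y) = @_;
    return dot($x, $y) / sqrt( dot($x, $x) * dot($y, $y) );
}

printf "%-10s %6.3f\n", $_, cosine( $coef{trump}, $coef{$_} )
    for qw( clinton eisenhower bush );</code></pre>
<p>The three values should come out close to the 0.994, 0.908 and -0.574 reported in the first column of the table.</p>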
<p>The behavior of the S&P 500 around the 2016 election <em>so far</em> seems to be most similar to its behavior around the Clinton and Eisenhower elections in 1992 and 1952, respectively.</p>
<p>The similarity between the Obama and Bush elections over this truncated period is also noteworthy.</p>
<p>So what does the near future hold? Let’s look at how the Clinton and Trump elections compare:</p>
<div class="thumb"><a href="/2017/01/sp-500-trump-clinton.png"><img src="https://www.nu42.com/2017/01/sp-500-trump-clinton.png" width="600" title="SP500 around Trump's election compared to same period around Clinton's election"></a></div>
<p>If you believe the future is going to be similar to what happened in 1993, it looks like the S&P 500 is going to keep on climbing.</p>
<div class="thumb"><a href="/2017/01/sp-500-trump-eisenhower.png"><img src="https://www.nu42.com/2017/01/sp-500-trump-eisenhower.png" width="600" title="SP500 around Trump's election compared to same period around Eisenhower's election"></a></div>
<p>If you believe the future is going to resemble what happened after Eisenhower’s election, it looks like the S&P 500 is going to start falling soon, but will still be up by about 3% 100 days after the election.</p>
<p>Finally, let’s compare what’s been happening around the 2016 election to what happened around the election of G.W. Bush in 2000, which is the series most dissimilar to the stock market behavior so far:</p>
<div class="thumb"><a href="/2017/01/sp-500-trump-bush.png"><img src="https://www.nu42.com/2017/01/sp-500-trump-bush.png" width="600" title="SP500 around Trump's election compared to same period around Bush's election"></a></div>
<p>And, for completeness, here is the comparison to President Obama’s election:</p>
<div class="thumb"><a href="/2017/01/sp-500-trump-obama.png"><img src="https://www.nu42.com/2017/01/sp-500-trump-obama.png" width="600" title="SP500 around Trump's election compared to same period around Obama's election"></a></div>
<p>We’ll see what the future holds.</p>
<p>Here’s hoping that there is no 20% drop in the S&P 500 in the near future.</p>
</div>
</article>
Sinan UnurDoes past behavior of S&P500 indicate 'selling the inauguration' is a good idea?tag:www.nu42.com,2017-01-16:/2017/01/sell-inauguration.html2017-01-16T21:30:00+00:00
<article class="h-entry article">
<div class="article-header-container">
<header>
<h1 class="p-name article-title">Does past behavior of S&P500 indicate 'selling the inauguration' is a good idea?</h1><h2 class="p-author article-author">A. Sinan Unur</h2><h3 class="article-published"><time datetime="2017-01-16T21:30:00+00:00" class="dt-published">January 16, 2017</time></h3>
</header>
</div>
<div class="article-content"><blockquote>
<p>This post is an exercise in chartsmanship. Nothing discussed herein should be taken as investment advice or as a recommendation regarding any particular investment vehicle or course of action. All statements herein are statements of subjective opinion and for information and entertainment purposes only. Seek a duly licensed professional for investment advice.</p>
</blockquote>
<p>A lot has happened since the last time I discussed a supposed crystal ball <a href="/2016/04/djia-crystal-ball.html">predicting an impending crash of the U.S. stock market</a>. Today, I saw another chart which is being <a href="http://www.marketwatch.com/story/how-long-post-election-rallies-last-after-inauguration-day-in-one-sp-chart-2017-01-13">used to support</a> the claim that:</p>
<blockquote>
<p>Stocks tend to strengthen in the two weeks after Inauguration Day, but one-month returns are typically negative</p>
</blockquote>
<p>Here is the chart for reference:</p>
<div class="thumb"><a href="/2017/01/mw-sp500-post-election.jpg"><img src="https://www.nu42.com/2017/01/mw-sp500-post-election.jpg" width="600" alt="S&P500 post election advance" title="S&P500 post election advance"></a></div>
<p>This got me curious: What detail is being obscured by the use of the median per trading day across years? Separately, I also wanted to see if there were any interesting dynamics in the period leading up to each election.</p>
<p>To that end, I downloaded the <a href="https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC">daily levels of S&P500</a> from Yahoo! Finance. When you do that, you get a data file that is sorted in reverse chronological order. I wrote a quick Perl script to extract S&P500 levels from 100 trading days before to 100 trading days after the day of the U.S. Presidential Election for each presidential election year since 1950:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode perl"><code class="sourceCode perl"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">#!/usr/bin/env perl</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> v5.<span class="dv">24</span>;</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="kw">warnings</span>;</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> feature <span class="ot">'</span><span class="ss">signatures</span><span class="ot">'</span>;</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="fu">no</span> <span class="kw">warnings</span> <span class="ot">'</span><span class="ss">experimental::signatures</span><span class="ot">'</span>;</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> autouse <span class="ot">'</span><span class="ss">Carp</span><span class="ot">'</span> => <span class="ot">qw(</span> croak <span class="ot">)</span>;</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> autouse <span class="ot">'</span><span class="ss">YAML::XS</span><span class="ot">'</span> => <span class="ot">qw(</span> Dump <span class="ot">)</span>;</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="fu">use</span> <span class="fu">Const::Fast</span>;</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">$SP500_FILE</span> => <span class="ot">'</span><span class="ss">SP500-daily-upto-20170113.csv</span><span class="ot">'</span>;</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">%I</span> => (</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> election => <span class="dv">0</span>,</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> inauguration => <span class="dv">1</span>,</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>);</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">%INTERVAL</span> => (</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="dv">1952</span> => [ <span class="ot">'</span><span class="ss">1952-11-04</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1953-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> <span class="dv">1956</span> => [ <span class="ot">'</span><span class="ss">1956-11-06</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1957-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a> <span class="dv">1960</span> => [ <span class="ot">'</span><span class="ss">1960-11-08</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1961-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a> <span class="dv">1964</span> => [ <span class="ot">'</span><span class="ss">1964-11-03</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1965-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a> <span class="dv">1968</span> => [ <span class="ot">'</span><span class="ss">1968-11-05</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1969-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a> <span class="dv">1972</span> => [ <span class="ot">'</span><span class="ss">1972-11-07</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1973-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a> <span class="dv">1976</span> => [ <span class="ot">'</span><span class="ss">1976-11-02</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1977-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> <span class="dv">1980</span> => [ <span class="ot">'</span><span class="ss">1980-11-04</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1981-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> <span class="dv">1984</span> => [ <span class="ot">'</span><span class="ss">1984-11-06</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1985-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> <span class="dv">1988</span> => [ <span class="ot">'</span><span class="ss">1988-11-08</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1989-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> <span class="dv">1992</span> => [ <span class="ot">'</span><span class="ss">1992-11-03</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1993-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> <span class="dv">1996</span> => [ <span class="ot">'</span><span class="ss">1996-11-05</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">1997-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a> <span class="dv">2000</span> => [ <span class="ot">'</span><span class="ss">2000-11-07</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">2001-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a> <span class="dv">2004</span> => [ <span class="ot">'</span><span class="ss">2004-11-02</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">2005-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a> <span class="dv">2008</span> => [ <span class="ot">'</span><span class="ss">2008-11-04</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">2009-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a> <span class="dv">2012</span> => [ <span class="ot">'</span><span class="ss">2012-11-06</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">2013-01-20</span><span class="ot">'</span> ],</span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a> <span class="dv">2016</span> => [ <span class="ot">'</span><span class="ss">2016-11-08</span><span class="ot">'</span>, <span class="ot">'</span><span class="ss">2017-01-13</span><span class="ot">'</span> ],</span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a>);</span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">%YEARS</span> => (</span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"></a> all => [ <span class="fu">reverse</span> <span class="fu">map</span> <span class="dv">1952</span> + <span class="dv">4</span> <span class="ot">*</span> <span class="wa">$_</span>, <span class="dv">0</span> .. <span class="dv">16</span> ],</span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"></a> change => [ <span class="fu">reverse</span> <span class="dv">1952</span>, <span class="dv">1960</span>, <span class="dv">1968</span>, <span class="dv">1976</span>, <span class="dv">1980</span>, <span class="dv">1992</span>, <span class="dv">2000</span>, <span class="dv">2008</span>, <span class="dv">2016</span> ]</span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a>);</span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"></a>const <span class="kw">my</span> <span class="dt">%TRADING_DAYS</span> => (</span>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> BEFORE => <span class="dv">100</span>,</span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a> AFTER => <span class="dv">100</span>,</span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a>);</span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a>run( <span class="wa">@ARGV</span> ? <span class="wa">$ARGV</span>[<span class="dv">0</span>] : <span class="dt">$SP500_FILE</span> );</span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">run</span>( <span class="dt">$sp500_file</span> ) {</span>
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$data</span> = read_data( <span class="dt">$sp500_file</span>, <span class="dt">$YEARS</span>{all} );</span>
<span id="cb1-54"><a href="#cb1-54" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-55"><a href="#cb1-55" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span> Dump <span class="dt">$data</span>;</span>
<span id="cb1-56"><a href="#cb1-56" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-57"><a href="#cb1-57" aria-hidden="true" tabindex="-1"></a> export_data(</span>
<span id="cb1-58"><a href="#cb1-58" aria-hidden="true" tabindex="-1"></a> <span class="dt">$data</span>,</span>
<span id="cb1-59"><a href="#cb1-59" aria-hidden="true" tabindex="-1"></a> <span class="dt">$YEARS</span>{all},</span>
<span id="cb1-60"><a href="#cb1-60" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">sp500-all-election-</span><span class="dt">$TRADING_DAYS</span><span class="st">{BEFORE}-</span><span class="dt">$TRADING_DAYS</span><span class="st">{AFTER}.tsv</span><span class="ot">"</span>,</span>
<span id="cb1-61"><a href="#cb1-61" aria-hidden="true" tabindex="-1"></a> [ -<span class="dt">$TRADING_DAYS</span>{BEFORE}, <span class="dt">$TRADING_DAYS</span>{AFTER} ],</span>
<span id="cb1-62"><a href="#cb1-62" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb1-63"><a href="#cb1-63" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-64"><a href="#cb1-64" aria-hidden="true" tabindex="-1"></a> export_data(</span>
<span id="cb1-65"><a href="#cb1-65" aria-hidden="true" tabindex="-1"></a> <span class="dt">$data</span>,</span>
<span id="cb1-66"><a href="#cb1-66" aria-hidden="true" tabindex="-1"></a> <span class="dt">$YEARS</span>{change},</span>
<span id="cb1-67"><a href="#cb1-67" aria-hidden="true" tabindex="-1"></a> <span class="ot">"</span><span class="st">sp500-change-election-</span><span class="dt">$TRADING_DAYS</span><span class="st">{BEFORE}-</span><span class="dt">$TRADING_DAYS</span><span class="st">{AFTER}.tsv</span><span class="ot">"</span>,</span>
<span id="cb1-68"><a href="#cb1-68" aria-hidden="true" tabindex="-1"></a> [ -<span class="dt">$TRADING_DAYS</span>{BEFORE}, <span class="dt">$TRADING_DAYS</span>{AFTER} ],</span>
<span id="cb1-69"><a href="#cb1-69" aria-hidden="true" tabindex="-1"></a> );</span>
<span id="cb1-70"><a href="#cb1-70" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-71"><a href="#cb1-71" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb1-72"><a href="#cb1-72" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-73"><a href="#cb1-73" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-74"><a href="#cb1-74" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">export_data</span>(<span class="dt">$</span>data, <span class="dt">$years</span>, <span class="dt">$filename</span>, <span class="dt">$period</span>) {</span>
<span id="cb1-75"><a href="#cb1-75" aria-hidden="true" tabindex="-1"></a> <span class="fu">open</span> <span class="kw">my</span> <span class="dt">$fh</span>, <span class="ot">'</span><span class="ss">></span><span class="ot">'</span>, <span class="dt">$filename</span></span>
<span id="cb1-76"><a href="#cb1-76" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Failed to open '</span><span class="dt">$filename</span><span class="ot">'</span><span class="st">: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb1-77"><a href="#cb1-77" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-78"><a href="#cb1-78" aria-hidden="true" tabindex="-1"></a> <span class="co"># header</span></span>
<span id="cb1-79"><a href="#cb1-79" aria-hidden="true" tabindex="-1"></a> <span class="fu">say</span> <span class="dt">$fh</span> <span class="fu">join</span>(<span class="ot">"</span><span class="ch">\t</span><span class="ot">"</span>, t => <span class="dt">$years</span>-><span class="dt">@</span><span class="ot">*</span>);</span>
<span id="cb1-80"><a href="#cb1-80" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-81"><a href="#cb1-81" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> <span class="kw">my</span> <span class="dt">$i</span> ( <span class="dt">$period</span>->[<span class="dv">0</span>] .. <span class="dt">$period</span>->[<span class="dv">1</span>] ) {</span>
<span id="cb1-82"><a href="#cb1-82" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@obs</span> = <span class="fu">map</span> {</span>
<span id="cb1-83"><a href="#cb1-83" aria-hidden="true" tabindex="-1"></a> <span class="fu">defined</span> ? <span class="fu">sprintf</span>(<span class="ot">'</span><span class="ss">%.4f</span><span class="ot">'</span>, <span class="wa">$_</span>) : <span class="ot">''</span></span>
<span id="cb1-84"><a href="#cb1-84" aria-hidden="true" tabindex="-1"></a> } <span class="fu">map</span> sp500_of(<span class="dt">$data</span>->{<span class="wa">$_</span>}[<span class="dt">$i</span> - <span class="dt">$period</span>->[<span class="dv">0</span>]]), <span class="dt">$years</span>-><span class="dt">@</span><span class="wa">*;</span></span>
<span id="cb1-85"><a href="#cb1-85" aria-hidden="true" tabindex="-1"></a> <span class="fu">say</span> <span class="dt">$fh</span> <span class="fu">join</span>(<span class="ot">"</span><span class="ch">\t</span><span class="ot">"</span>, <span class="dt">$i</span>, <span class="dt">@obs</span>);</span>
<span id="cb1-86"><a href="#cb1-86" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-87"><a href="#cb1-87" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-88"><a href="#cb1-88" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$fh</span></span>
<span id="cb1-89"><a href="#cb1-89" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Cannot close '</span><span class="dt">$filename</span><span class="ot">'</span><span class="st">: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb1-90"><a href="#cb1-90" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-91"><a href="#cb1-91" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb1-92"><a href="#cb1-92" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-93"><a href="#cb1-93" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-94"><a href="#cb1-94" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">read_data</span> (<span class="dt">$</span>sp500_file, <span class="dt">$years</span> ) {</span>
<span id="cb1-95"><a href="#cb1-95" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">%data</span>;</span>
<span id="cb1-96"><a href="#cb1-96" aria-hidden="true" tabindex="-1"></a> <span class="fu">open</span> <span class="kw">my</span> <span class="dt">$fh</span>, <span class="ot">'</span><span class="ss"><</span><span class="ot">'</span>, <span class="dt">$sp500_file</span></span>
<span id="cb1-97"><a href="#cb1-97" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Cannot open '</span><span class="dt">$sp500_file</span><span class="ot">'</span><span class="st">: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb1-98"><a href="#cb1-98" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-99"><a href="#cb1-99" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$extractor</span> = make_extractor( <span class="fu">scalar</span> <<span class="dt">$fh</span>> );</span>
<span id="cb1-100"><a href="#cb1-100" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-101"><a href="#cb1-101" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> <span class="kw">my</span> <span class="dt">$year</span> ( <span class="dt">$years</span>-><span class="dt">@</span><span class="ot">*</span> ) {</span>
<span id="cb1-102"><a href="#cb1-102" aria-hidden="true" tabindex="-1"></a> <span class="dt">$data</span>{<span class="dt">$year</span>} = extract_sp500( <span class="dt">$INTERVAL</span>{<span class="dt">$year</span>}[<span class="dt">$I</span>{election}], <span class="dt">$fh</span>, <span class="dt">$extractor</span> );</span>
<span id="cb1-103"><a href="#cb1-103" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$base</span> = sp500_of(<span class="dt">$data</span>{<span class="dt">$year</span>}->[<span class="dt">$TRADING_DAYS</span>{BEFORE}]); <span class="co"># election day</span></span>
<span id="cb1-104"><a href="#cb1-104" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> <span class="kw">my</span> <span class="dt">$obs</span> ( <span class="dt">$data</span>{<span class="dt">$year</span>}-><span class="dt">@</span><span class="ot">*</span> ) {</span>
<span id="cb1-105"><a href="#cb1-105" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> (<span class="fu">defined</span>(<span class="kw">my</span> <span class="dt">$sp500</span> = sp500_of(<span class="dt">$obs</span>))) {</span>
<span id="cb1-106"><a href="#cb1-106" aria-hidden="true" tabindex="-1"></a> sp500_of(<span class="dt">$obs</span>, <span class="dt">$sp500</span>/<span class="dt">$base</span>);</span>
<span id="cb1-107"><a href="#cb1-107" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-108"><a href="#cb1-108" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-109"><a href="#cb1-109" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-110"><a href="#cb1-110" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-111"><a href="#cb1-111" aria-hidden="true" tabindex="-1"></a> <span class="fu">close</span> <span class="dt">$fh</span></span>
<span id="cb1-112"><a href="#cb1-112" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Cannot close '</span><span class="dt">$sp500_file</span><span class="ot">'</span><span class="st">: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb1-113"><a href="#cb1-113" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-114"><a href="#cb1-114" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> \<span class="dt">%data</span>;</span>
<span id="cb1-115"><a href="#cb1-115" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-116"><a href="#cb1-116" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-117"><a href="#cb1-117" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">make_extractor</span>( <span class="dt">$header</span> ) {</span>
<span id="cb1-118"><a href="#cb1-118" aria-hidden="true" tabindex="-1"></a> trim_inplace( <span class="dt">$header</span> );</span>
<span id="cb1-119"><a href="#cb1-119" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-120"><a href="#cb1-120" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@header</span> = <span class="fu">split</span> <span class="ot">/,/</span>, <span class="dt">$header</span>;</span>
<span id="cb1-121"><a href="#cb1-121" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-122"><a href="#cb1-122" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@i</span> = <span class="fu">grep</span> <span class="dt">$header</span>[<span class="wa">$_</span>] <span class="ot">eq</span> <span class="ot">'</span><span class="ss">Date</span><span class="ot">'</span>, <span class="dv">0</span> .. <span class="dt">$#header</span>;</span>
<span id="cb1-123"><a href="#cb1-123" aria-hidden="true" tabindex="-1"></a> <span class="fu">push</span> <span class="dt">@i</span>, <span class="fu">grep</span> <span class="dt">$header</span>[<span class="wa">$_</span>] <span class="ot">eq</span> <span class="ot">'</span><span class="ss">Close</span><span class="ot">'</span>, (<span class="dt">$i</span>[-<span class="dv">1</span>] + <span class="dv">1</span>) .. <span class="dt">$#header</span>;</span>
<span id="cb1-124"><a href="#cb1-124" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-125"><a href="#cb1-125" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="kw">sub </span>(<span class="dt">$</span>row) {</span>
<span id="cb1-126"><a href="#cb1-126" aria-hidden="true" tabindex="-1"></a> trim_inplace( <span class="dt">$row</span> );</span>
<span id="cb1-127"><a href="#cb1-127" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> [ (<span class="fu">split</span> <span class="ot">/,/</span>, <span class="dt">$row</span>)[<span class="dt">@i</span>] ];</span>
<span id="cb1-128"><a href="#cb1-128" aria-hidden="true" tabindex="-1"></a> };</span>
<span id="cb1-129"><a href="#cb1-129" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-130"><a href="#cb1-130" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-131"><a href="#cb1-131" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">trim_inplace</span> {</span>
<span id="cb1-132"><a href="#cb1-132" aria-hidden="true" tabindex="-1"></a> <span class="wa">$_</span>[<span class="dv">0</span>] =~ <span class="ot">s/</span><span class="ch">^</span><span class="bn">\s</span><span class="ch">+</span><span class="ot">//</span>;</span>
<span id="cb1-133"><a href="#cb1-133" aria-hidden="true" tabindex="-1"></a> <span class="wa">$_</span>[<span class="dv">0</span>] =~ <span class="ot">s/</span><span class="bn">\s</span><span class="ch">+\z</span><span class="ot">//</span>;</span>
<span id="cb1-134"><a href="#cb1-134" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb1-135"><a href="#cb1-135" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-136"><a href="#cb1-136" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-137"><a href="#cb1-137" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">date_of</span> { <span class="wa">$_</span>[<span class="dv">0</span>]->[<span class="dv">0</span>] }</span>
<span id="cb1-138"><a href="#cb1-138" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-139"><a href="#cb1-139" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">sp500_of</span> {</span>
<span id="cb1-140"><a href="#cb1-140" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> ( <span class="dt">@_</span> == <span class="dv">1</span> ) {</span>
<span id="cb1-141"><a href="#cb1-141" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> <span class="wa">$_</span>[<span class="dv">0</span>]->[<span class="dv">1</span>];</span>
<span id="cb1-142"><a href="#cb1-142" aria-hidden="true" tabindex="-1"></a> } <span class="kw">elsif</span> ( <span class="dt">@_</span> == <span class="dv">2</span> ) {</span>
<span id="cb1-143"><a href="#cb1-143" aria-hidden="true" tabindex="-1"></a> <span class="wa">$_</span>[<span class="dv">0</span>]->[<span class="dv">1</span>] = <span class="wa">$_</span>[<span class="dv">1</span>];</span>
<span id="cb1-144"><a href="#cb1-144" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span>;</span>
<span id="cb1-145"><a href="#cb1-145" aria-hidden="true" tabindex="-1"></a> } <span class="kw">else</span> {</span>
<span id="cb1-146"><a href="#cb1-146" aria-hidden="true" tabindex="-1"></a> croak <span class="ot">"</span><span class="st">Wrong number of arguments: '</span><span class="dt">@_</span><span class="ot">'"</span>;</span>
<span id="cb1-147"><a href="#cb1-147" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-148"><a href="#cb1-148" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-149"><a href="#cb1-149" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-150"><a href="#cb1-150" aria-hidden="true" tabindex="-1"></a><span class="kw">sub </span><span class="fu">extract_sp500</span>( <span class="dt">$election_date</span>, <span class="dt">$fh</span>, <span class="dt">$extractor</span> ) {</span>
<span id="cb1-151"><a href="#cb1-151" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> (<span class="dt">$year</span>) = <span class="fu">split</span> <span class="ot">/-/</span>, <span class="dt">$election_date</span>, <span class="dv">2</span>;</span>
<span id="cb1-152"><a href="#cb1-152" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-153"><a href="#cb1-153" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$lo_date</span> = <span class="fu">sprintf</span> <span class="ot">'</span><span class="ss">%d-06-01</span><span class="ot">'</span>, <span class="dt">$year</span>;</span>
<span id="cb1-154"><a href="#cb1-154" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$hi_date</span> = <span class="fu">sprintf</span> <span class="ot">'</span><span class="ss">%d-04-30</span><span class="ot">'</span>, <span class="dt">$year</span> + <span class="dv">1</span>;</span>
<span id="cb1-155"><a href="#cb1-155" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-156"><a href="#cb1-156" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">@data</span>;</span>
<span id="cb1-157"><a href="#cb1-157" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-158"><a href="#cb1-158" aria-hidden="true" tabindex="-1"></a> <span class="kw">while</span> (<span class="kw">my</span> <span class="dt">$row</span> = <<span class="dt">$fh</span>>) {</span>
<span id="cb1-159"><a href="#cb1-159" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$obs</span> = <span class="dt">$extractor</span>->(<span class="dt">$row</span>);</span>
<span id="cb1-160"><a href="#cb1-160" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> ( date_of(<span class="dt">$obs</span>) <span class="ot">le</span> <span class="dt">$hi_date</span> ) {</span>
<span id="cb1-161"><a href="#cb1-161" aria-hidden="true" tabindex="-1"></a> <span class="fu">unshift</span> <span class="dt">@data</span>, <span class="dt">$obs</span>;</span>
<span id="cb1-162"><a href="#cb1-162" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-163"><a href="#cb1-163" aria-hidden="true" tabindex="-1"></a> <span class="co"># Markets were closed on November 2, 1976</span></span>
<span id="cb1-164"><a href="#cb1-164" aria-hidden="true" tabindex="-1"></a> <span class="co"># http://archives.chicagotribune.com/1976/11/02/page/53/article/wall-street-pre-election-gains-small</span></span>
<span id="cb1-165"><a href="#cb1-165" aria-hidden="true" tabindex="-1"></a> <span class="kw">last</span> <span class="kw">if</span> date_of(<span class="dt">$obs</span>) <span class="ot">le</span> <span class="dt">$election_date</span>;</span>
<span id="cb1-166"><a href="#cb1-166" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-167"><a href="#cb1-167" aria-hidden="true" tabindex="-1"></a> <span class="dt">@data</span> = <span class="dt">@data</span>[<span class="dv">0</span> .. <span class="dt">$TRADING_DAYS</span>{AFTER}];</span>
<span id="cb1-168"><a href="#cb1-168" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-169"><a href="#cb1-169" aria-hidden="true" tabindex="-1"></a> <span class="kw">for</span> (<span class="dv">1</span> .. <span class="dt">$TRADING_DAYS</span>{BEFORE}) {</span>
<span id="cb1-170"><a href="#cb1-170" aria-hidden="true" tabindex="-1"></a> <span class="kw">my</span> <span class="dt">$row</span> = <<span class="dt">$fh</span>>;</span>
<span id="cb1-171"><a href="#cb1-171" aria-hidden="true" tabindex="-1"></a> <span class="fu">defined</span>(<span class="dt">$row</span>)</span>
<span id="cb1-172"><a href="#cb1-172" aria-hidden="true" tabindex="-1"></a> <span class="ot">or</span> croak <span class="ot">"</span><span class="st">Failed to read from data file: </span><span class="wa">$!</span><span class="ot">"</span>;</span>
<span id="cb1-173"><a href="#cb1-173" aria-hidden="true" tabindex="-1"></a> <span class="fu">unshift</span> <span class="dt">@data</span>, <span class="dt">$extractor</span>->(<span class="dt">$row</span>);</span>
<span id="cb1-174"><a href="#cb1-174" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-175"><a href="#cb1-175" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-176"><a href="#cb1-176" aria-hidden="true" tabindex="-1"></a> <span class="kw">return</span> \<span class="dt">@data</span>;</span>
<span id="cb1-177"><a href="#cb1-177" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Note that I normalized the S&P500 levels relative to the level on election day (or the trading day before if markets were closed on election day that year). First, I plotted the series for every election:</p>
<div class="thumb"><a href="/2017/01/sp500-all-elections.png"><img src="https://www.nu42.com/2017/01/sp500-all-elections.png" width="600" alt="S&P500 100 trading days before/after election, all years" title="S&P500 100 trading days before/after election, all years"></a></div>
<p>Clearly, there is considerable variability, but most of it is being suppressed by the huge movement in the series for 2008. On June 13, 2008, 100 trading days before the election of President Obama, S&P500 was at 1,341.8. On election day that year, S&P500 was down to 971.3, and 100 trading days after the election, on March 31, 2009, it closed at 790.9 points, a 41% drop over the course of the 201 trading days.</p>
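<p>As a rough illustration of the normalization, here is a minimal Perl sketch applied to the three 2008 levels quoted above. The helper <code>normalize_to_base</code> is hypothetical and not part of the script shown earlier, and scaling the election day level to 100 is an assumption about the convention; the charts may use a different base.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): rescale a series of
# index levels so that the level at position $base_idx becomes 100.
# The base of 100 is an assumption made for this illustration.
sub normalize_to_base {
    my ($levels, $base_idx) = @_;
    my $base = $levels->[$base_idx];
    return [ map { 100 * $_ / $base } @$levels ];
}

# The three 2008 levels quoted in the text: 100 trading days before the
# election, election day, and 100 trading days after the election.
my @levels_2008 = (1341.8, 971.3, 790.9);
my $normalized  = normalize_to_base(\@levels_2008, 1);

printf "%.1f %.1f %.1f\n", @$normalized;
```

<p>Once every series is expressed relative to its own election day level, series from years with very different index levels can be drawn on the same axes.</p>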
<p>To get a little bit more clarity, I decided to only plot the series for the elections where the party holding the presidency changed: 1952, 1960, 1968, 1976, 1980, 1992, 2000, 2008, and 2016 (thus, using the same years as those used in the MarketWatch story):</p>
<div class="thumb"><a href="/2017/01/sp500-party-change-elections.png"><img src="https://www.nu42.com/2017/01/sp500-party-change-elections.png" width="600" alt="S&P500 100 trading days before/after election, party change years" title="S&P500 100 trading days before/after election, party change years"></a></div>
<p>If you squint hard, three series stand out: 1960, 2000, and 2008. 1960 stands out because it is the only series that looks like it is still on an increasing trajectory 100 days after the election. Both 2000 and 2008 stand out because those are the two cases where S&P500 was down by about 20% 100 days out from the election.</p>
<p>Kennedy’s election was a couple of years after the recession of 1957–58 whereas both Bush and Obama took office right around the times the technology and housing bubbles, respectively, were bursting. Another president who took office <em>after</em> a recession was Clinton whose election happened in 1992 following the July 1990–March 1991 recession. We may want to keep him in mind as well while trying to reduce the visual clutter in the chart.</p>
<p>Here is the final iteration of the graph, focusing solely on the 1960, 1992, 2000, 2008, and 2016 elections:</p>
<div class="thumb"><a href="/2017/01/sp500-1960-1992-2000-2008-2016-elections.png"><img src="https://www.nu42.com/2017/01/sp500-1960-1992-2000-2008-2016-elections.png" width="600" alt="S&P500 100 days before/after election, 1960, 1992, 2000, 2008, 2016" title="S&P500 100 days before/after election, 1960, 1992, 2000, 2008, 2016"></a></div>
<p>So, if you look at the chart from just the right™ angle, it kinda sorta looks like S&P500’s behavior during the time period surrounding the 2016 election is more similar to its behavior in 1960 and 1992 than it is to its behavior in 2000 and 2008.</p>
<p>Let’s take a look at correlations among years, ignoring 2016:</p>
<pre class="text"><code>. corr(y2008-y1952)
(obs=201)
| y2008 y2000 y1992 y1980 y1976 y1968 y1960 y1952
-------------+------------------------------------------------------------------------
y2008 | 1.0000
y2000 | 0.8606 1.0000
y1992 | -0.7609 -0.8083 1.0000
y1980 | -0.7884 -0.6565 0.5719 1.0000
y1976 | 0.4471 0.4083 -0.2469 -0.1885 1.0000
y1968 | -0.4539 -0.1891 0.1338 0.6011 -0.0596 1.0000
y1960 | -0.5771 -0.7355 0.7994 0.2305 -0.3997 -0.2801 1.0000
y1952 | -0.5441 -0.5861 0.8131 0.4384 0.0769 0.1094 0.6958 1.0000</code></pre>
<p>So, S&P500’s behavior during the period surrounding President Obama’s election is positively correlated with its behavior during the same period around the elections of Presidents Bush and Carter, and negatively correlated with its behavior during the elections of Clinton, Reagan, Nixon, Kennedy, and Eisenhower. This, in and of itself, is probably an interesting observation.</p>
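<p>For readers who want to reproduce this kind of number without a statistics package, here is a minimal Perl sketch of the Pearson correlation coefficient between two equally long series. The helper <code>pearson</code> and the toy data are hypothetical; this is not how the table above was produced.</p>

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Pearson correlation of two equally long series,
# each passed as an array reference. Dies on mismatched or empty input.
# A constant series has zero variance and would cause a division by zero.
sub pearson {
    my ($x, $y) = @_;
    my $n = @$x;
    die "Series must have the same, non-zero length\n"
        unless $n && $n == @$y;

    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;

    # Accumulate the cross product and the two sums of squared deviations.
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return $sxy / sqrt($sxx * $syy);
}

# Made-up toy series, NOT actual S&P500 data.
my @rising  = (100, 101, 103, 104);
my @falling = (100,  98,  97,  95);

printf "%.4f\n", pearson(\@rising, \@falling);
```

<p>Two series that move in opposite directions, like the toy ones here, produce a coefficient close to -1, which is the pattern behind the negative entries in the table.</p>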
<p>The future is unknowable. However, we can ask how the behavior of S&P500 we have observed so far around Trump’s election relates to its behavior during the corresponding periods around the presidents mentioned above. Here are the correlations:</p>
<pre class="text"><code>. corr(y2016-y1952)
(obs=146)
| y2016 y2008 y2000 y1992 y1980 y1976 y1968 y1960 y1952
-------------+---------------------------------------------------------------------------------
y2016 | 1.0000
y2008 | -0.5251 1.0000
y2000 | -0.5629 0.8895 1.0000
y1992 | 0.8445 -0.5840 -0.5974 1.0000
y1980 | 0.6254 -0.8726 -0.7949 0.6306 1.0000
y1976 | 0.2781 0.2398 0.0344 0.2442 -0.1151 1.0000
y1968 | 0.4590 -0.8561 -0.7343 0.5256 0.7329 -0.2825 1.0000
y1960 | 0.3772 0.0333 -0.1057 0.3305 -0.0739 0.1179 -0.2437 1.0000
y1952 | 0.8402 -0.3043 -0.4568 0.8050 0.3733 0.4203 0.2343 0.6367 1.0000</code></pre>
<p>So far, S&P500’s behavior during the period surrounding Trump’s election is negatively correlated with its behavior during the same period surrounding the elections of Presidents Obama and Bush. It is positively correlated with its behavior during the elections of Presidents Clinton, Reagan, Carter, Nixon, Kennedy, and Eisenhower. The strongest correlations are between the current period and the corresponding periods surrounding the elections of Clinton, Reagan, and Eisenhower.</p>
<p>Limiting our chart only to those series, here’s what we have:</p>
<div class="thumb"><a href="/2017/01/sp500-1952-1980-1992-2016-elections.png"><img src="https://www.nu42.com/2017/01/sp500-1952-1980-1992-2016-elections.png" width="600" title="S&P500 100 trading days before/after election, 1952, 1980, 1992, 2016" alt="S&P500 100 trading days before/after election, 1952, 1980, 1992, 2016"></a></div>
<p>A hundred days from Eisenhower’s election, S&P500 was 2.6% higher. It was 4.1% higher 100 days after Reagan’s election, and up a whopping 7.4% 100 days after Clinton’s election.</p>
<p>Does this chart mean S&P500 on April 4, 2017 will be somewhere between 2,186 and 2,287?</p>
<p>Of course not … remember the future has not been written yet. For all anyone knows, S&P500 may crash back down to 900 or reach for the skies at 4,000.</p>
<p>The motivation of this post was a chart that was constructed to support a specific <a href="http://www.cnbc.com/2017/01/03/morgan-stanley-buy-the-election-sell-the-inauguration.html">point of view</a> that</p>
<blockquote>
<p>U.S. stocks have rallied since the election, but it’s time for investors to start thinking about getting out, possibly timed for President-elect Donald Trump’s inauguration, Morgan Stanley said.</p>
</blockquote>
<p>In fact, there may be other reasons to expect the future is not going to be very rosy, but the chart I showed in the beginning of this post is not the reason. Aggregating the information contained in individual series with different dynamics mushes around all the variability and hides interesting patterns and relationships as can be seen in the following chart:</p>
<div class="thumb"><a href="/2017/01/sp500-party-change-elections-median.png"><img src="https://www.nu42.com/2017/01/sp500-party-change-elections-median.png" width="600" title="S&P500 100 trading days before/after election, change elections with median" alt="S&P500 100 trading days before/after election, change elections with median"></a></div>
<p>Yup, if you look only at the median, the conclusion is <code><sarcasm></code>obvious<code></sarcasm></code>:</p>
<p>S&P500 100 trading days out from election day will be 0.57% higher than its level on election day, at around 2,142 points, which is the point being made in the MarketWatch article. Too bad there is no change election 100 days out of which the S&P500 was virtually unchanged. Outcomes ranged from a loss of almost 23% (after Bush’s election in 2000) to a gain of more than 19% (after Kennedy’s election), with the smallest loss being 1.5% (after Nixon’s election) and the smallest gain being 2.6% (after Eisenhower’s election).</p>
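<p>To make the aggregation problem concrete, here is a minimal Perl sketch of a pointwise median across several normalized series. The series are made up for illustration (election day scaled to 100, which is an assumption) and the helper <code>median</code> is hypothetical; this is not the code behind the chart.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: median of a list of numbers. With an even count,
# it returns the average of the two middle values.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Made-up series, NOT actual S&P500 data. Each holds three points:
# before the election, election day (scaled to 100), and after.
my %series = (
    A => [  95, 100, 119 ],
    B => [  98, 100, 103 ],
    C => [ 104, 100,  98 ],
    D => [ 110, 100,  77 ],
);

# Take the median across all series at each position.
my @median_path = map {
    my $i = $_;
    median(map { $_->[$i] } values %series);
} 0 .. 2;

print "@median_path\n";
```

<p>In this toy data the median ends half a point above the election day level even though none of the four series ends there, which is the same effect the median line produces in the chart.</p>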
</div>
</article>
Sinan Unur