Free go faster stripes for your Perl programs

We all know benchmarks don’t mean anything, but, let’s admit, the Perl entry in the n-body benchmark is pretty slow. The fastest program (written in C++) solves the problem in less than 10 seconds whereas it takes Perl eighteen minutes, fully 115 times longer, to finish.

The published measurements were taken on a quad-core 2.4 GHz Intel® Q6600®. The Q6600 is somewhat similar to the T7600 I put in my trusty old laptop, except that it supports a faster FSB, has a slightly faster clock speed, and incorporates a larger cache. The number of cores doesn’t matter much for this particular problem.

Lately, I have been playing with the compiler optimization settings in my Perl builds for Windows. The stock Windows Makefile that ships with Perl has the following:

# -O1 yields smaller code, which turns out to be faster than -O2 on x86 and x64
OPTIMIZE        = -O1 -MD -Zi -DNDEBUG

I had just built 5.22.1-RC3 on this system, so I decided to check if there was a difference in performance between -O1 and -Ox.

It turns out, there isn’t much, at least when it comes to the n-body problem:

TimeThis :  Command Line :  c:\opt\perl-ox\5.22.1\bin\perl nbody.pl 50000000
TimeThis :  Elapsed Time :  00:20:44.797

versus

TimeThis :  Command Line :  c:\opt\perl-o1\5.22.1\bin\perl nbody.pl 50000000
TimeThis :  Elapsed Time :  00:21:07.724

That’s an improvement of about 23 seconds, or 1.8%, and the performance difference between this and the figure listed in the benchmarks game leaderboard can be attributed to the hardware differences I listed above.
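For anyone who wants to check the arithmetic, here is a minimal sketch that converts the two TimeThis elapsed times quoted above into seconds and compares them. The helper name to_seconds is made up for illustration; only the two timestamps come from the runs above.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS.mmm' elapsed-time string to seconds."""
    hours, minutes, secs = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)

# Elapsed times copied from the two TimeThis runs above.
ox = to_seconds("00:20:44.797")  # perl built with -Ox
o1 = to_seconds("00:21:07.724")  # perl built with -O1

saved = o1 - ox                  # seconds gained by the -Ox build
percent = 100 * saved / o1       # as a share of the -O1 run time

print(f"-Ox saves {saved:.1f} s ({percent:.1f}% of the -O1 run time)")
```

A single pair of runs says nothing about run-to-run noise, so a difference this small should be read with some caution.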

I also have a Windows 10 laptop with a Core i3-5010U 2.1 GHz Broadwell CPU. On this laptop, I had already built 5.23.5 with the -Ox and -arch:AVX2 settings. So, I tried the n-body problem with that perl.

It only took nine minutes and seventeen seconds.

If you are keeping score, that’s about twice as fast as the published results.

Of course, I cannot easily attribute this difference to any one specific factor: Is the improved performance due to the lower-clocked but newer CPU? Is it due to the availability of AVX2 instructions? Or is it due to differences between perl versions?

So, I built a few more perls.

Here are the results:

perl version    Optimization settings
                O1       O1 AVX2   Ox AVX2
5.23.5          9:53     -         9:17
5.22.1 RC3      13:57    13:44     14:47
5.20.2          16:16    -         -

Well, as you can imagine, it takes a while to fill in those boxes in this grid, and, even though I can do other things while the builds and the benchmarks are running, I don’t think one needs a lot more to be able to make a reasonable guess as to the sources of performance improvements.

First, with the stock Makefile settings, the benchmark finishes in about 14% less time with 5.22.1 RC3 than with 5.20.2, and in about 29% less time with 5.23.5 than with 5.22.1 RC3.

Overall, 5.23.5 takes 39% less time than 5.20.2.
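The percentages can be recomputed from the table as reductions in run time. This is a rough sketch that assumes the 16:16, 13:57 and 9:53 timings all come from the stock -O1 builds; the helper names seconds and reduction are made up for illustration.

```python
def seconds(mmss):
    """Convert an 'M:SS' timing from the table to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

def reduction(old, new):
    """Percentage of run time saved going from timing old to timing new."""
    return 100 * (seconds(old) - seconds(new)) / seconds(old)

# Assumption: these are the stock -O1 timings for each perl version.
o1_times = {"5.20.2": "16:16", "5.22.1 RC3": "13:57", "5.23.5": "9:53"}

pairs = [("5.20.2", "5.22.1 RC3"), ("5.22.1 RC3", "5.23.5"), ("5.20.2", "5.23.5")]
for old, new in pairs:
    saved = reduction(o1_times[old], o1_times[new])
    print(f"{old} -> {new}: {saved:.0f}% less time")
```

Measured this way, the three steps come out at roughly 14%, 29% and 39%. "Percent faster" can also be computed as a ratio of speeds, which gives larger numbers for the same timings, so it is worth stating which measure is meant.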

perldelta for 5.23.5 contains this innocent sounding blurb:

Faster addition, subtraction and multiplication.

Since 5.8.0, arithmetic became slower due to the need to support 64-bit integers. To deal with 64-bit integers, a lot more corner cases need to be checked, which adds time. We now detect common cases where there is no need to check for those corner cases, and special-case them.

I don’t think this does justice to the real serious improvement that was achieved here.

If you look at the commit message, you get a better idea. Dave Mitchell says:

On my platform (x86_64), it (along with the previous commit) reduces the execution time of the nbody benchmark (lots of floating-point vector arithmetic) by a third and in fact makes it 10% faster than 5.6.1.

I don’t know about the comparison to 5.6.1, but I can verify that the improvement can also be observed using Visual Studio 2013 builds on Windows 10.

On my old laptop, 5.23.5 completes the n-body benchmark in 15:02, which represents roughly a 27% improvement over the 5.22.1-RC3 timings above.

Thanks for the go faster stripes, Dave.

PS: You can discuss this post on /r/perl.

PPS: By the way, it appears the stock Makefile settings are good enough for compiling perl, but there may be XS modules for which the extra optimizations may matter.

PPPS: Keep in mind that 5.23.5 is a development release, and these improvements are going to be included in the upcoming 5.24.0 production release.