We all know benchmarks don’t mean anything, but, let’s admit, the Perl entry in the n-body benchmark is pretty slow. The fastest program (written in C++) solves the problem in less than 10 seconds whereas it takes Perl eighteen minutes, fully 115 times longer, to finish.
These measurements are done using a quad-core 2.4 GHz Intel® Q6600®. The Q6600 is somewhat similar to the T7600 I put in my trusty old laptop, except that it supports a faster FSB, has a slightly faster clock speed, and incorporates a larger cache. The number of cores doesn’t matter much for this particular problem.
Lately, I have been playing with the compiler optimization settings in my Perl builds for Windows. The stock Windows Makefile that ships with Perl has the following:
    # -O1 yields smaller code, which turns out to be faster than -O2 on x86 and x64
    OPTIMIZE = -O1 -MD -Zi -DNDEBUG
I had just built 5.22.1-RC3 on this system, so I decided to check if there was a difference in performance between
It turns out, there isn’t much, at least when it comes to the n-body problem:
    TimeThis : Command Line : c:\opt\perl-ox\5.22.1\bin\perl nbody.pl 50000000
    TimeThis : Elapsed Time : 00:20:44.797

    TimeThis : Command Line : c:\opt\perl-o1\5.22.1\bin\perl nbody.pl 50000000
    TimeThis : Elapsed Time : 00:21:07.724
That’s less than a 30-second improvement, i.e. under 2.4%, and the difference between these timings and the one listed on the benchmarks game leaderboard can be attributed to the hardware differences I listed above.
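If you want to check the arithmetic yourself, here is a minimal sketch in Python that converts the two TimeThis elapsed times above to seconds and compares them. The helper names `elapsed_seconds` and `improvement` are made up for illustration, and the labels on the two timings are my reading of the install paths (`perl-o1` and `perl-ox`).

```python
# Compare two TimeThis-style "HH:MM:SS.mmm" elapsed times.
# Helper names are hypothetical; only the two timings come from the runs above.

def elapsed_seconds(stamp: str) -> float:
    """Parse an 'HH:MM:SS.mmm' string into a number of seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def improvement(baseline: str, candidate: str) -> tuple:
    """Return (seconds saved, percent saved) of candidate relative to baseline."""
    base = elapsed_seconds(baseline)
    cand = elapsed_seconds(candidate)
    saved = base - cand
    return saved, 100.0 * saved / base

# Elapsed times copied from the TimeThis output above.
o1 = "00:21:07.724"  # build installed under perl-o1
ox = "00:20:44.797"  # build installed under perl-ox

saved, percent = improvement(o1, ox)
print(f"{saved:.1f} s saved, {percent:.1f}% faster")
```

With these two timings the gap works out to roughly 23 seconds, or about 1.8% of the slower run, which is small enough that run-to-run noise could account for a good part of it.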
It only took nine minutes and seventeen seconds.
If you are keeping score, that’s about twice as fast as the published results.
Of course, I cannot easily attribute this difference to any one specific factor: Is the improved performance due to the slower but better CPU? Is it due to the availability of AVX2 instructions? Or, is it due to differences between
So, I built a few more
Here are the results:
| perl version / optimization settings | O1 | O1 AVX2 | Ox AVX2 |
| --- | --- | --- | --- |
Well, as you can imagine, it takes a while to fill in those boxes in this grid, and, even though I can do other things while the builds and the benchmarks are running, I don’t think one needs a lot more to be able to make a reasonable guess as to the sources of performance improvements.
First, with the stock Makefile settings, the benchmark runs about 15% faster with 5.22.1 RC3 than with 5.20.2, and about 36% faster with 5.23.5 than with 5.22.1 RC3.
And, 5.23.5 is 39% faster than 5.20.2.
perldelta for 5.23.5 contains this innocent sounding blurb:
Faster addition, subtraction and multiplication.
Since 5.8.0, arithmetic became slower due to the need to support 64-bit integers. To deal with 64-bit integers, a lot more corner cases need to be checked, which adds time. We now detect common cases where there is no need to check for those corner cases, and special-case them.
I don’t think this does justice to the real serious improvement that was achieved here.
If you look at the commit message, you get a better idea. Dave Mitchell says:
On my platform (x86_64), it (along with the previous commit) reduces the execution time of the nbody benchmark (lots of floating-point vector arithmetic) by a third and in fact makes it 10% faster than 5.6.1.
I don’t know about the comparison to 5.6.1, but I can verify that the improvement can also be observed using Visual Studio 2013 builds on Windows 10.
On my old laptop, 5.23.5 completes the n-body benchmark in 15:02 which represents roughly a 27% improvement.
Thanks for the go faster stripes, Dave.
PS: You can discuss this post on /r/perl.
PPS: By the way, it appears the stock Makefile settings are good enough for compiling perl, but there may be XS modules for which the extra optimizations matter.
PPPS: Keep in mind that 5.23.5 is a development release, and these improvements are going to be included in the upcoming 5.24.0 production release.