A discrepancy between P-square algorithm implementation in Boost.Accumulators and the original paper

At the end of my previous post, I highlighted a discrepancy between the output produced by the Boost.Accumulators implementation of the P2 algorithm. It turns out the discrepancy was due to a typo in the original paper and not in the Boost.Accumulators implementation as I had originally suspected.

All my objections listed in that post to the testing method used by the library are still valid, but I thought it was important to emphasize that my suspicions about an implementation error were misguided. Of course, had the authors tested their implementation against figures provided in the original paper, they may have discovered the typo in that paper a long time ago. That in and of itself serves to emphasize the importance of testing one's code against known results.

I consider my more serious blog entries to be the equivalent of working papers: This is where I hash out ideas so others with similar interests can learn from my experience and/or point out my errors. This is a process of discovery: Explaining my process of how I discovered an interesting problem does not necessarily mean I have a solution at the time of discovery. It means I have reached a point where it is useful to step back and summarize what I have done.

When I do this, without fail, some individuals demand bug reports and pull requests. A pull request is more like submitting a draft to a journal for publication. That comes after I outline the basics in a working paper, present it and get some feedback.

Back to the Boost.Accumulators implementation of the P2 algorithm. At the end of my previous post, I showed the following output:

calculated= 0.74        expected= 0.74
calculated= 0.74        expected= 0.74
calculated= 2.06167     expected= 2.18
calculated= 4.55176     expected= 4.75
calculated= 4.55176     expected= 4.75
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 4.24624     expected= 4.44
calculated= 4.24624     expected= 4.44

In this output, calculated values are produced by the Boost.Accumulators implementation and the expected values come from Table I in the paper. According to the paper, these values were produced from the following sequence of observations:

0.02, 0.5, 0.74, 3.39, 0.83,

22.37, 10.15, 15.43, 38.62, 15.92, 34.60, 10.28, 1.47, 0.40, 0.05, 11.39, 0.27, 0.42, 0.09, 11.37

where the first five observations are listed on page 1078:

[ 0.02, 0.5, 0.74, 3.39, 0.83 ]

These observations are used to set up the algorithm. Then, Table I in the paper summarizes the mechanics of the algorithm.

In the absence of evidence to the contrary, my initial instinct was to assume that the figures in a peer-reviewed paper published 30 years ago were correct. After all, I have been calculating percentiles for large data sets (where large means something larger every year) for more than 2/3 of that period, and, therefore, have been directly or indirectly using the algorithm outlined in this paper all that time.

That meant trying to figure out what subtle error might be lurking in the implementation. I probably re-wrote the calculation of the parabolic interpolation in that code six and a half different ways. Used different combinations of intermediate calculations, compared output from gcc, VC++, and clang, compared output using floats versus doubles, single stepped through code, printfed every intermediate step etc etc and could not get the output to match what was shown in the paper.

[ Table 1 in original paper shows a worked out example of P^2 algorithm ]

I decided it was time to verify the calculations by hand. And, I do mean by hand. Handling x6 = 22.37 was easy. With x7 = 10.15, I had to calculate only one of the marker heights, q4 = 4.465. I noticed this was listed as 4.47 in the paper, and I briefly wondered if the discrepancy could be due to someone using the rounded figure in subsequent calculations rather than the exact number. Then, it was time to handle x8 = 15.43. This meant that q3 (that, is our approximation to the median of the sample {0.02, 0.5, 0.74, 3.39, 0.83, 22.37, 10.15}) needed to be updated. In this case, the parabolic interpolation formula on page 1078 of the paper reduces to:

0.74 + (1/3) * ( 2 * (4.465 - 0.74) / 2 + 1 * ( 0.74 - 0.5) / 1 )

= 0.74 + (1/3) * ( 3.725 + 0.24 )

= 2.06166666666667

and this matches the output from the Boost.Accumulators implementation. On the other hand, Table I in the original paper says the new value q3 after this step should be 2.18.

This made me stop in my tracks and start staring helplessly at the paper. For q3 to take on the value, the last parenthesized expression had to add up to 4.32 rather than 3.97. That is, q3 - q2 = 0.74 - 0.5 = 0.24 was too small by approximately 0.35.

That's when I saw it.

If you look carefully at the top row of Table I in the paper, you'll notice it, too if I zoom in:

[ Table 1 shows second observation is 0.15 ]

While the text says the second observation is 0.5, Table I shows that the computations were based on the second observation being 0.15. It was easy to verify that this would fix the calculation of q3 after the arrival of x8. To make sure this typo was indeed the source of the discrepancy between the Boost.Accumulators implementation and the data shown in the original paper, I updated my original demo program, and ran it:

#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>

namespace bacc = boost::accumulators;

int main(void)
{
    bacc::accumulator_set<double,
        bacc::stats<bacc::tag::median(bacc::with_p_square_quantile)> > acc;

    // See http://www.cse.wustl.edu/~jain/papers/psqr.htm

    // First five observations
    acc(0.02);
    acc(0.15);
    acc(0.74);
    acc(3.39);
    acc(0.83);

    const std::vector<std::pair<double, double> > jain_chlamtac {
        {22.37, 0.74},
        {10.15, 0.74},
        {15.43, 2.18},
        {38.62, 4.75},
        {15.92, 4.75},
        {34.60, 9.28},
        {10.28, 9.28},
        {1.47,  9.28},
        {0.40,  9.28},
        {0.05,  6.30},
        {11.39, 6.30},
        {0.27,  6.30},
        {0.42,  6.30},
        {0.09,  4.44},
        {11.37, 4.44},
    };

    for (auto p: jain_chlamtac)
    {
        acc(p.first);
        std::printf("calculated= %.3f\texpected= %.2f\n", bacc::median(acc), p.second);
    }

    return 0;
}

when run, this program produced the output:

calculated= 0.740       expected= 0.74
calculated= 0.740       expected= 0.74
calculated= 2.178       expected= 2.18
calculated= 4.753       expected= 4.75
calculated= 4.753       expected= 4.75
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 4.441       expected= 4.44
calculated= 4.441       expected= 4.44

That is, the output from the Boost.Accumulators implementation of the algorithm would exactly match the output in the original paper if the code was run in "round to nearest" mode.

I fired off a quick email to Dr. Jain explaining my findings. He immediately annotated the online version of the paper to note the typo:

[ Corrigendum in original paper ]

Now that we have that cleared it up, we can use the example worked out in the original paper to put together a test of the implementation of the algorithm. My work increased my confidence in the implementation of the P2 algorithm in Boost.Accumulators. Obviously, this simple typo does not invalidate any of the conclusions of Jain & Chlamtac (1985).

I am still puzzled by some output I am getting from the other two median algorithms in Boost.Accumulators, and I'll turn my attention to those next.

PS: You can discuss this post on r/cpp or HN.