Can programmers do statistics?

I have a feeling that most programmer types assume Stats is trivial because the arithmetic involved in most of the basic concepts they have occasion to use is trivial.

In reality, people who assume Stats is trivial share the same lack of understanding pretty much everyone else has about basic Stats concepts. For example, many of them seem to assume that a significant test statistic, by itself, means something about the truth of a hypothesis, or that a high correlation coefficient between two variables means one causes the other, or that one can reliably predict future values of one using the future values of the other.

In the same vein, ever since “data mining” ceased to be a dirty activity, they seem more amenable to shady practices. For example, a rather prominent programmer once told me at a conference that they run their A/B tests until they achieve significance.

If you do not understand why that’s a problem, then you are part of the problem.
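For anyone who wants to see the problem rather than take my word for it, here is a rough simulation sketch. It is my own illustration, not anything that programmer ran: both arms of the test share the same true conversion rate, so every "significant" result is a false positive. The sample sizes, the look interval, the rate and the seed are all arbitrary assumptions, and the test statistic is a plain two-proportion z statistic.

```python
import math
import random

def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 0.0
    return (conv_a / n - conv_b / n) / se

def simulate(n_sims=500, n_max=1000, look_every=50, rate=0.2, seed=1):
    """Return (test-once, stop-at-first-significance) false positive rates."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        significant_at_some_look = False
        z = 0.0
        for n in range(1, n_max + 1):
            # Both arms have the same true rate, so any "winner" is spurious.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % look_every == 0:
                z = z_stat(conv_a, conv_b, n)
                if abs(z) > 1.96:
                    significant_at_some_look = True
        fixed_hits += abs(z) > 1.96  # z here comes from the final look only
        peeking_hits += significant_at_some_look
    return fixed_hits / n_sims, peeking_hits / n_sims

fixed_rate, peeking_rate = simulate()
print(f"test once at the end:       {fixed_rate:.3f}")
print(f"stop at first significance: {peeking_rate:.3f}")
```

Testing once at the planned sample size should keep the false positive rate near the nominal 5%. Stopping the first time any look crosses the threshold cannot do better than that, and with twenty looks it typically does several times worse.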

But, I am not going to discuss A/B tests today. No, this post was motivated by a recent topic getting a lot of attention on /r/programming: Why can’t programmers program? Is “Print 100 to 1” (with one tiny catch) too hard an interview question for programming positions?

The author of that article has moved beyond FizzBuzz, and decided to ask programmers interviewing for a position to solve a simple problem:

Print 100 to 1.

…

You need to start with “for(int i=0;” and continue from there — you cannot write anything before “for(int i=0;” and you can’t use two loops.

Of course, there is a lot that is wrong with asking a question like this. First of all, if someone asked me this question, I would get hung up on the constraints that are irrelevant: Why would he tell me that I can’t use two loops when there is no reason on God’s green earth to use two loops?

for (int i = 0; i < 100; i += 1) {
    printf("%d\n", 100 - i);
}

Now that that is out of the way, let’s get to why this article got my blood boiling.

It is not because I don’t think there are a lot of fakers pretending to be programmers out there: On the contrary, I believe at least every other person making a living as a programmer is faking it. It is relatively easy to get other people to believe you’re a l33t coder when they cannot even imagine being able to cope with “paste special”.

It is not because I don’t like stupid puzzles in interviews: I don’t. And, I don’t do well on them. Contrary to, say, Atwood, I do believe typing speed is irrelevant to the quality of a programmer.

It is not because I do not believe it is necessary to actually memorize tons of APIs to be a good programmer. I do not. And, I cannot, for the life of me, memorize these things. Over the course of a given day, I may have to switch among writing Perl, C with smatterings of C++, Java, PHP, JavaScript, batch files, shell scripts, SQL statements, SAS macros, and Stata do files. Understanding the actual problem I am trying to solve is more important than remembering the order of arguments for a particular call.

No, I was annoyed by this post because the author, Rajiv Popat, decided there was a relationship between how much you whine about your current employment situation, and how badly you performed in the print numbers from 100 to 1 test.

He claims that the candidates who whined more about their current situation did worse on the programming task.

As a staunch advocate of the continuous whining philosophy (CWP), this got my attention.

As proof of his proposition, Rajiv offers the following chart:

Right off the bat, he commits a cardinal sin: He compares the two variables by using a line graph where the X-axis has no meaning. Such line graphs are not appropriate for illustrating an association between two variables. In Stats, in the broadest terms, two variables are said to be associated with each other when they move in tandem (either in the same direction or opposite directions). Two variables may be completely unrelated, but have line graphs that look very close. As an example, consider the following simple data set:

The correlation coefficient between these two variables is 0.064. The fact that there was no association between them would have been apparent had I shown you the scatter plot — which, coincidentally, happens to be the right way to illustrate any association between two variables:

One might actually go ahead and fit a regression to these data, fake the various specification tests, and claim that every extra unit of CO₂ causes 7m of sea level increase over a century or some such nonsense, but I am not going to go there. This just illustrates that you can have two variables which look kinda sorta close on a line graph, but have no association.

I understand Rajiv has recently written an ebook. He probably thinks writing a controversial post like this may be a good way to drum up business.

Luckily, the /r/programming audience has some observant people who note: “Why did he use a line graph there?”

Still, I was curious to find out what the data really looked like, so I digitized it using GIMP, and I came up with this:

Whining    Programming
1   2
1   3
1   1
1   1
3   2
5   4
5   3
5   8
6   4
6   5
6   7
7   8
7   10
7   6
7   10
8   5
8   10
9   8
10  5
10  4
10  8

I realized too late that other people had already done this. Oh well!

On the surface, the correlation coefficient is around a respectable 0.65.
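If you want to check that figure, here is a minimal sketch that computes the Pearson correlation coefficient straight from the table above. The `pearson` helper and the variable names are mine. The table holds rounded values, so the last digit may differ a little from what a finer digitization would give.

```python
import math

# (whining, programming) pairs copied from the digitized table above.
data = [
    (1, 2), (1, 3), (1, 1), (1, 1), (3, 2), (5, 4), (5, 3),
    (5, 8), (6, 4), (6, 5), (6, 7), (7, 8), (7, 10), (7, 6),
    (7, 10), (8, 5), (8, 10), (9, 8), (10, 5), (10, 4), (10, 8),
]

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r_all = pearson(data)
print(f"n = {len(data)}, r = {r_all:.3f}")
```

With these rounded values the coefficient comes out at about 0.66, which is in line with the figure quoted above.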

But, you must look at the scatter-plot, Luke!

It should be clear from the plot that there are two groups of candidates: A small group close to the origin, who complain just enough about their current jobs so as not to offend the interviewer, and a larger group who complain for anywhere from five to ten minutes about their current jobs.

That’s just eyeballing the data. Running K-means with two clusters actually puts some of the five minute whiners in that group as well, as you can see below:

Most importantly, among the whiners, there is no association between whining and performance on the stupid exercise. Yes, the whiners do worse than the non-whiners, but, among the whiners, longer whining time does not go with worse performance on the stupid exercise. The correlation between whining time and time spent on the problem in this group is a measly, and negative, -0.128.
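Here is a rough sketch of that grouping step, written from scratch rather than taken from whatever package produced the plot. It runs plain Lloyd's algorithm with two clusters on the digitized table and then computes the correlation inside the larger group. The starting centroids (the two extreme points) are my assumption. K-means can settle on a different split from a different start, so a borderline candidate such as (6, 4) can end up in the other group.

```python
import math

# (whining, programming) pairs copied from the digitized table above.
data = [
    (1, 2), (1, 3), (1, 1), (1, 1), (3, 2), (5, 4), (5, 3),
    (5, 8), (6, 4), (6, 5), (6, 7), (7, 8), (7, 10), (7, 6),
    (7, 10), (8, 5), (8, 10), (9, 8), (10, 5), (10, 4), (10, 8),
]

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def two_means(points, max_iter=100):
    """Lloyd's algorithm with k = 2; returns a 0/1 label for each point."""
    # Assumed start: the smallest and largest points in tuple order.
    centroids = [min(points), max(points)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (squared distance).
        new_labels = [
            min((0, 1), key=lambda k: (x - centroids[k][0]) ** 2
                                      + (y - centroids[k][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels

labels = two_means(data)
non_whiners = [p for p, lab in zip(data, labels) if lab == 0]
whiners = [p for p, lab in zip(data, labels) if lab == 1]
r_whiners = pearson(whiners)
print(f"{len(non_whiners)} non-whiners, {len(whiners)} whiners")
print(f"correlation among whiners: {r_whiners:.3f}")
```

From this start, two of the five minute whiners land in the small group, and the correlation inside the larger group is slightly negative.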

As a person who enjoys whining to people with whom I have an existing relationship as a way to just vent frustrations that build up, I can see two red flags in this situation. First, an interviewer who asks me to complain about my current situation is a red flag: A job interview is not a good place to badmouth your current employer, and interviewers should know that. This may be a signal that one must look for another place to interview with.

Second, I can envision situations where eager and stressed candidates think they must badmouth their current employer for the interviewer to like them, and end up believing they have a better chance if they go on for too long. Such a person may not be a good co-worker to have around either.

Coming back to Stats, when you mix data from two separate groups, and compute a single correlation coefficient from them, you run into a situation where the coefficient might be inflated because what you are doing becomes similar to fitting a line to two points. That will always give you a correlation coefficient of 1 or -1: Statistics begins with n + 1 observations that don’t lie on a line (where n is the number of variables).
In this case, the points in the group of whiners are not tightly clustered, so we merely have an inflated correlation coefficient for the overall data set rather than one that is very close to unity.
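A toy example makes the mechanism easier to see. The points below are made up for illustration and have nothing to do with the interview data: two tight squares, far apart, with no trend inside either one. The `pearson` helper is my own minimal implementation.

```python
import math

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: two separate groups, neither with any internal association.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

r_a = pearson(group_a)
r_b = pearson(group_b)
r_pooled = pearson(group_a + group_b)

# The two group centers on their own: a line through two points.
r_two_points = pearson([(0.5, 0.5), (10.5, 10.5)])

print(f"within group A: {r_a:.3f}")
print(f"within group B: {r_b:.3f}")
print(f"pooled:         {r_pooled:.3f}")
print(f"two points:     {r_two_points:.3f}")
```

Each group has a correlation of zero on its own, yet pooling them gives a coefficient of about 0.99, because the pooled data set behaves almost like the two group centers. The looser the groups, the less extreme the inflation.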

You can also observe the effect if you just look at the non-whiners group: Without the five minute whiners, there is almost no variability in the “whining” variable, and a correlation coefficient of 0.17. With the two five minute whiners, we get a correlation of 0.71 in that group.

The upshot of all this is that whatever relationship seems to exist between whining and doing badly on a trivial programming problem seems to be due to unobserved differences between groups rather than a real relationship. Those differences may well have been created by asking stressed people to complain about their current situations, and them sheepishly following the interviewer’s wishes.

It looks to me like, among the 22, there are only two acceptable candidates: the two people who knew enough to realize that complaining about your current situation in a job interview is not proper and who, at the same time, recognized the stupid question for what it was and did not overthink it.

On the other hand, if these are the criteria by which you judge candidates, you do not deserve those two.


A. Sinan Unur

April 11, 2015