Every time I look at another climate-related data set, I am reminded once again of this amazing achievement and how much those climate scientists would have benefited from choosing the right tools for the job.
After all, by Human Genome standards, they are working with minuscule amounts of data. For example, the GHCN v2 global mean temperature data set is about 40 MB uncompressed. Yet, if you want to extract data for a particular location, you need to deal with silly Fortran code. This is the kind of stuff Perl excels at.
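For illustration, here is roughly what "extract data for a particular location" amounts to as a one-pass text filter. It is shown in Python rather than Perl, and the fixed-width offsets follow the commonly described v2.mean record layout (11-character station ID, duplicate flag, 4-digit year, twelve 5-character monthly values in tenths of a degree, -9999 for missing); treat the exact offsets and the sample record as assumptions, not the authoritative format.

```python
def station_records(lines, station_id):
    """Yield (year, monthly_temps) for every record matching station_id.

    Assumed v2.mean layout: cols 0-10 station ID, col 11 duplicate flag,
    cols 12-15 year, then twelve 5-char monthly values in tenths of a
    degree C, with -9999 meaning missing.
    """
    for line in lines:
        if line.startswith(station_id):
            year = int(line[12:16])
            months = [int(line[16 + 5 * i : 21 + 5 * i]) for i in range(12)]
            temps = [m / 10.0 if m != -9999 else None for m in months]
            yield year, temps

# Hypothetical sample record, built to match the assumed layout above:
values = [-12, 5, 68, 112, 160, 198, 214, 205, 168, 110, 42, -3]
sample = "42572658000" + "0" + "1995" + "".join("%5d" % v for v in values)

recs = list(station_records([sample], "42572658000"))
for year, temps in recs:
    print(year, temps)
```

The equivalent Perl would be a few lines of `substr` and `unpack` in a `while (<>)` loop, which is exactly the point: this is routine text munging, not a job for hundreds of lines of Fortran.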
I think there was a time, say about 10 to 15 years ago, when people at NASA and CRU and NOAA could have said “Do we have to keep programming like we are connecting to building-size computers using 300-baud teletype terminals?”
I mean, 363 lines for a program whose purpose is:
    c This program reads GHCN v2 data and writes out a subset
    c of the data along with a subset of the metadata
    c The options to subset are based on lat/lon, continent,
    c country, or time
All that work for what ought to be a single SQL join on two tables!