Identifier strings consisting solely of digits are not numbers

The last time I saw a punch card was in 1984. The last time I handled an 8-inch floppy was in April 1987. So, why am I still seeing people treat ZIP codes and TINs as numbers?

Here are some hints that you might be handling your data wrong:

  1. You have ZIP codes consisting of four digits in your database.
  2. You have an identifier described as “10-character alphanumeric string” in the documentation, and your database specification for the id is NUMERIC(10).
  3. You also import the data using a home-built C program that uses atoi liberally.
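
To make the damage concrete, here is a minimal Python sketch of what a numeric column effectively does to an identifier: parse it as a number on the way in, print it as a number on the way out. The helper name roundtrip_as_number and the sample ID "A1B2C3D4E5" are made up for illustration.

```python
def roundtrip_as_number(identifier: str) -> str:
    """Mimic a numeric column: parse the text as an integer, then print it back."""
    return str(int(identifier))


zip_code = "06020"
stored = roundtrip_as_number(zip_code)
print(stored)              # the leading zero does not survive
print(stored == zip_code)  # False: the stored value is no longer the ZIP code

# A hypothetical 10-character alphanumeric ID cannot be stored as a number at all.
try:
    roundtrip_as_number("A1B2C3D4E5")
    alphanumeric_ok = True
except ValueError:
    alphanumeric_ok = False
print(alphanumeric_ok)
```

The first case fails silently, which is how four-digit ZIP codes end up in a database without anyone noticing.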

Incidentally, Excel just makes it too easy to create nonsense values in columns containing ID strings consisting solely of digits. It’s not just the obvious cases where the ZIP code 06020 becomes the number 6020 as soon as a file passes through Excel, but also the NetID schemes that are in use at places like Cornell and Columbia where the ID consists of a user’s initials plus up to three digits.

Guess what happens when someone sends you a class list with student NetIDs in the first column and you have a student called Diana E. Christiansen who’s the 25th user with the initials d, e, and c.
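
On the import side, one way to stay out of trouble is to never let a tool guess the column types. The sketch below uses Python's standard csv module, which returns every field as text. The sample data and the helper name read_ids are assumptions for illustration, not a real class list.

```python
import csv
import io

# Hypothetical class list: NetID in the first column, ZIP code in the second.
SAMPLE = "netid,zip\ndec25,06020\n"


def read_ids(text: str) -> list:
    """Read CSV text into a list of dicts; csv.DictReader keeps every field as a string."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_ids(SAMPLE)
for row in rows:
    # Both identifiers come back exactly as they were written.
    print(row["netid"], row["zip"])
```

Nothing here tries to interpret the values, so the ID and the ZIP code stay strings until you explicitly decide otherwise.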