In OCaml, how can I get a list of directories in my PATH?

I have been getting my feet wet with OCaml, using Real World OCaml. The book content is freely available on their web site, but I have bought the ebook from O’Reilly, and I thoroughly recommend it.

I have to admit, it hasn’t been a quick task. I find that I am too used to the luxury of having documentation at my fingertips with perldoc. Reading the book and doing the exercises does breed familiarity, but I am still far from being able to write an image gallery generator (which was my first ever Perl program).

I like the implicit type checking: types are inferred and checked without my having to annotate anything. In fact, that is an idea that appears in Perl as well (not as strict, but, still). For example, let f x = x + 1 defines a function that takes an integer and returns the next integer. Because + works only on integers, OCaml infers the type int -> int. Yes, OCaml does distinguish between types of numbers. No, I haven’t yet gotten used to it.

Now, f 5 returns 6, but f 0.5 is rejected with the error This expression has type float but an expression was expected of type int.
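A minimal sketch of the int/float distinction, using only the standard library. The names g and h are mine, added purely for illustration; f is the function from the text.

```ocaml
(* Integer and float arithmetic use different operators in OCaml. *)
let f x = x + 1                   (* inferred type: int -> int *)
let g x = x +. 1.0                (* inferred type: float -> float *)

(* Mixing the two requires an explicit conversion. *)
let h x = float_of_int x +. 0.5   (* inferred type: int -> float *)

let () =
  (* prints "6 1.5 1.5" *)
  Printf.printf "%d %.1f %.1f\n" (f 5) (g 0.5) (h 1)
```

Writing f 0.5 in this file would stop compilation with the type error quoted above, so the mistake never makes it to run time.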

In Perl, if you defined my $f = sub { $_[0] + 1 } and invoked it with a string argument, the interpreter would notice it at run time (and even tell you about it if you ask nicely):

$ perl -w -e 'my $f = sub { $_[0] + 1 }; $f->("test")'
Argument "test" isn't numeric in addition (+) at -e line 1.

Strict type checking is useful. The OCaml kind is not the same as the C or Java sort of type checking. Here is an example that had me scratching my head for a while until I studied it further.

Real World OCaml has the following example:

# let path = "/usr/bin:/usr/local/bin:/bin:/sbin";;
val path : string = "/usr/bin:/usr/local/bin:/bin:/sbin"
# String.split ~on:':' path
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;
/bin
/sbin
/usr/bin
/usr/local/bin
- : unit = ()

Now, if you squint enough, this is kind of like:

use feature 'say';
use List::AllUtils qw(uniq);
my $path = "/usr/bin:/usr/local/bin:/bin:/sbin";
say for uniq(split /:/, $path);

although I do like the syntactic sugar of |>.

In Perl, I would have just used $ENV{PATH}. My thoughts immediately went to how to do that in OCaml. Luckily, utop has code-completion, so it didn’t take me a long time to figure out I could use Sys.getenv to get the value of my $PATH.

utop # Sys.getenv("PATH");;
- : string option =
Some
 "/Users/xyz/.opam/system/bin:/Users/xyz/.opam/system/bin: \
/Users/xyz/bin:/Users/xyz/perl/5.20.0/bin: \
/opt/local/bin:/opt/local/sbin:/usr/bin: \
/bin:/usr/sbin:/sbin:/usr/local/bin: \
/opt/X11/bin:/usr/local/MacGPG2/bin"

Hmmmm … Why is ~/.opam/system/bin in there twice?

Anyway, first, note that naively replacing path with (Sys.getenv "PATH") does not “work”:

utop # String.split ~on:':' (Sys.getenv "PATH");;
Error: This expression has type string option
but an expression was expected of type string

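Note the Some there. Sys.getenv takes a string and possibly returns a string. In other words, its type is string -> string option. (This is the Sys.getenv from Core, which Real World OCaml uses. The standard library’s Sys.getenv returns a plain string and raises Not_found when the variable is unset, while Sys.getenv_opt returns an option.)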

We know why: The environment variable may or may not be defined. In Perl, we would get an undefined value in that case. Perl can then convert that value to 0 or "" as needed. In OCaml, you need to explicitly account for that possibility.

Observe the following:

utop # match Sys.getenv "PATH" with
| None -> ""
| Some x -> x
;;
- : string =
  "/Users/xyz/.opam/ ...

Here, we decided that if Sys.getenv "PATH" does not return a value, we will consider our path to be empty. The type of the return value changed from string option to simply string, and it is no longer prefixed with Some.
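The same "fall back to an empty string" decision can also be written without an explicit match. This is a sketch under two assumptions: OCaml 4.08 or later (for the Option module) and the standard library's Sys.getenv_opt rather than Core's Sys.getenv. The helper name value_or_empty is hypothetical.

```ocaml
(* Standard library only; assumes OCaml >= 4.08 for the Option module. *)

(* Turn a string option into a string, treating None as "". *)
let value_or_empty (v : string option) : string =
  Option.value v ~default:""

(* Sys.getenv_opt returns None when the variable is unset. *)
let path : string = value_or_empty (Sys.getenv_opt "PATH")

let () = print_endline path
```

Either way, the type goes from string option to string, and the choice of default is spelled out in the code.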

If you are doing something real rather than working on small modifications to textbook exercises, you might not want to proceed if the path is not defined. But, for my immediate purpose of actually using the value of my path rather than manually typing in a string, the following was sufficient:
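For the "something real" case, one option is to refuse to continue when the variable is missing. A rough sketch, again using the standard library's Sys.getenv_opt instead of Core. The function name require_env and the error message are made up for this example.

```ocaml
(* Return the value of an environment variable, or raise Failure
   with a descriptive message when it is not set. *)
let require_env (name : string) : string =
  match Sys.getenv_opt name with
  | Some v -> v
  | None -> failwith (Printf.sprintf "environment variable %s is not set" name)

let () =
  match require_env "PATH" with
  | path -> Printf.printf "PATH has %d characters\n" (String.length path)
  | exception Failure msg ->
    (* This demo only reports the problem; a real program would likely stop here. *)
    prerr_endline msg
```

The caller now has to decide what to do about a missing variable instead of silently working with an empty path.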

utop # String.split ~on:':'
(match Sys.getenv "PATH" with | None -> "" | Some x -> x)
|> List.dedup ~compare:String.compare
|> List.iter ~f:print_endline
;;

Phewww!

Pattern matching like this is actually quite valuable.
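If Core is not installed, the same pipeline can be approximated with the standard library alone. This is a sketch, not a drop-in replacement: String.split_on_char and List.sort_uniq stand in for Core's String.split and List.dedup, and Sys.getenv_opt stands in for Core's Sys.getenv. The function name unique_dirs is my own.

```ocaml
(* Split a PATH-style string on ':' and return the sorted, de-duplicated entries. *)
let unique_dirs (path : string) : string list =
  String.split_on_char ':' path
  |> List.sort_uniq String.compare

let () =
  (* Same pattern match as in the text: an unset PATH is treated as "". *)
  (match Sys.getenv_opt "PATH" with
   | None -> ""
   | Some x -> x)
  |> unique_dirs
  |> List.iter print_endline
```

For example, unique_dirs "/usr/bin:/bin:/usr/bin" evaluates to ["/bin"; "/usr/bin"]. Note that splitting an empty string yields a single empty entry, so an unset PATH prints one blank line here.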

There is still a gaping hole in this construction. What if you type Sys.getenv "PTHA"? You’ll end up propagating an empty path throughout a program. In Perl, I tried to avoid that kind of problem by using Const::Fast. As a simple example, I might have:

use Const::Fast;

const my %VAR => (
    HOME => 'HOME',
    PATH => 'PATH',
    TMP  => 'TMP',
);

say $ENV{ $VAR{PTHA} };

which immediately gives me Attempt to access disallowed key ‘PTHA’ in a restricted hash …. It also serves as documentation of which environment variables my script actually depends on.

This idea corresponds to the principle of making illegal states unrepresentable, which fellow Cornellian Yaron Minsky explains in his guest lecture at CMU.
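In OCaml, one way to get the same effect as the restricted hash is a variant type that lists the variables the program is allowed to read. This is only a sketch of the idea, using the standard library. The type env_var and the functions name_of and getenv are hypothetical names I chose for the example.

```ocaml
(* The only environment variables this program may look up. *)
type env_var = Home | Path | Tmp

(* Map each constructor to the actual variable name. *)
let name_of = function
  | Home -> "HOME"
  | Path -> "PATH"
  | Tmp -> "TMP"

(* Look up a variable by constructor rather than by free-form string. *)
let getenv (v : env_var) : string option = Sys.getenv_opt (name_of v)

let () =
  match getenv Path with
  | Some p -> print_endline p
  | None -> prerr_endline "PATH is not set"

(* Writing getenv Ptha does not compile: the compiler reports an
   unbound constructor, so the typo is caught before the program runs. *)
```

As with the Perl hash, the type definition doubles as a list of the environment variables the program depends on.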

PS: Why OCaml? Well, for one, I loved Higher Order Perl, and decided I should add another camel to my herd ;-)