A Perl script to make printing ClimateAudit threads easier

The theme used by ClimateAudit does not have a print stylesheet. This makes printing an interesting thread in its entirety much messier than it needs to be. One solution is a quick Perl script that removes the stylesheet and sets the display property of a bunch of elements to none. The resulting HTML page renders very plainly, which might not be to everyone’s taste, but that can be adjusted.

For example, one can create a print stylesheet and then, instead of removing the <link rel="stylesheet"> tag from the document head entirely, set its href to point to that stylesheet.

#!/usr/bin/perl
# Copyright © 2010 A. Sinan Unur http://www.unur.com/sinan/
# Licensed under GPLv3 http://www.gnu.org/licenses/gpl.html
use strict; use warnings;

use LWP::Simple;
use HTML::TokeParser::Simple;

my ($url) = @ARGV;
die "Missing URL\n" unless defined $url;

my $doc = get $url;
die "Cannot retrieve '$url'\n" unless defined $doc;

# id values of the div elements to hide in the printed page
my %remove = map { $_ => undef } qw(
    access
    nav-above
    nav-below
    wpl-likebox
    primary
    secondary
    footer
);

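# Derive a local file name from the URL: drop the scheme, then turn every
# run of characters other than letters and digits into an underscore.
my $out_fn = $url;
$out_fn =~ s{ ^https?:// }{}x;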
$out_fn =~ s{ [^A-Za-z0-9]+ }{_}xg;
$out_fn .= '.html';

open my $out, '>:utf8', $out_fn
    or die "Cannot open '$out_fn' for writing: $!";

my $parser = HTML::TokeParser::Simple->new( \$doc );

# Copy the page token by token, adjusting the tags of interest on the way.
while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag('div') ) {
        my $id = $token->get_attr('id');
        $id = '' unless defined $id;
        if ( exists $remove{$id} ) {
            $token->set_attr(style => 'display:none');
        }
        elsif ( $id =~ /^post/ ) {
            # divs whose id starts with 'post': use the full page width
            # and drop the theme's class
            $token->set_attr(style => 'width:100%' );
            $token->delete_attr('class');
        }
        elsif ( $id eq 'content' ) {
            $token->set_attr(style => 'margin:1em auto');
        }
    }
    elsif ( $token->is_start_tag('link') ) {
        my $rel = $token->get_attr('rel');
        # Skip the theme's stylesheet links so they never reach the output
        next if $rel and 'stylesheet' eq lc $rel;
    }

    # Normalize tags: lower-case names, double-quoted attribute values
    $token->rewrite_tag;
    print $out $token->as_is
        or die "Cannot print to '$out_fn': $!";
}

close $out
    or die "Cannot close '$out_fn': $!";

# The following line is for Windows only. Replace it with the appropriate
# command on your system, or open the file manually.

system start => $out_fn;
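The alternative mentioned at the start, keeping the stylesheet link but pointing it at a print stylesheet, could look roughly like the sketch below. It uses the same HTML::TokeParser::Simple calls as the script. The function name swap_stylesheet and the file name print.css are hypothetical, and keeping only the first stylesheet link while dropping any others is an assumption about how one might handle pages that link several.

```perl
use strict; use warnings;

use HTML::TokeParser::Simple;

# Hypothetical helper: return $html with the first stylesheet link
# pointed at $href and any further stylesheet links removed.
sub swap_stylesheet {
    my ($html, $href) = @_;
    my $parser   = HTML::TokeParser::Simple->new( \$html );
    my $out      = '';
    my $replaced = 0;

    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('link') ) {
            my $rel = $token->get_attr('rel');
            if ( $rel and 'stylesheet' eq lc $rel ) {
                # Drop every stylesheet link after the first one ...
                next if $replaced++;
                # ... and point the first one at the print stylesheet.
                # set_attr also rewrites the tag text returned by as_is.
                $token->set_attr( href => $href );
            }
        }
        $out .= $token->as_is;
    }
    return $out;
}

my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="style.css">
<link rel="stylesheet" href="extra.css">
</head><body><p>Hello</p></body></html>
HTML

print swap_stylesheet( $html, 'print.css' );
```

Assuming the page has no <base> element, a relative href such as print.css is resolved against the saved file, so the stylesheet would sit in the same directory. In the script itself, the same idea amounts to calling set_attr(href => ...) on the first stylesheet link instead of skipping it with next.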
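The last line of the script is Windows-specific. One way to pick a viewer per platform is to branch on Perl's $^O variable. The sketch below assumes open (macOS) and xdg-open (Linux and most other Unix desktops) are the usual commands; open_command is a hypothetical helper name.

```perl
use strict; use warnings;

# Hypothetical helper: return the command, as a list suitable for
# system, that opens $file with the platform's default application.
# The per-platform commands are assumptions about typical setups.
sub open_command {
    my ($os, $file) = @_;
    return ( 'start', $file )    if $os eq 'MSWin32';  # as in the script
    return ( 'open', $file )     if $os eq 'darwin';   # macOS
    return ( 'xdg-open', $file );                      # Linux, BSD desktops
}

my @cmd = open_command( $^O, 'page.html' );
print "Would run: @cmd\n";
# system @cmd;   # uncomment to actually launch the viewer
```

In the script, the final line would then become `my @cmd = open_command( $^O, $out_fn ); system @cmd;`, with the subroutine defined near the top of the file.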