Tuesday, June 30, 2015

Using HTML::Tree to create magic

Dada Mail v8.3 was recently released. Dada Mail itself is a mailing list manager written in Perl. It's a project I've been working on since 1999(!) - lots of blood/sweat/tears. It's somewhat of a mix between Mailman (or Majordomo, if you remember that app, which I inadvertently set out to replace!) and one of the big email marketing services, like Constant Contact or MailChimp.

The difference from those services is that you run it on your own server/hosting account. The system requirements are minimal (really), so a cheap cPanel-based shared hosting account on Bluehost or whatever will work just fine.

It can hook up with Amazon SES for mail sending, giving you a very competitive platform for sending mass mailings to your announcement list or discussion list at a really good price. If you want more performance, you can run it as a PSGI app: there's full support for running it under Plack/PSGI, as a FastCGI script without Plack (mod_fcgi?), or even as a plain CGI script. Dada Mail tries to be flexible. You won't need root access to install it, since it comes with its own web-based installer. Just upload the distro and a helper script, and away you go.

One of the newer features of the app is called Magic User Templates. This feature allows someone with very little experience manipulating HTML code to integrate Dada Mail into their already existing site. It does this by basing its own layout and design off of an already created page - say, the homepage of your own website.

Magic User Templates fetch that page via its URL, then place Dada Mail's own content inside a tag found via a CSS selector that you specify. Pretty easy for the user, and it solves a problem I've been having: people want to brand the app to match their site, but don't want to dive into their site's HTML and the HTML of Dada Mail's templates, and then learn how Dada Mail's templating language works (HTML::Template, basically). That's a big barrier to entry when you just want something branded.

"Integration with Wordpress" is maybe my #1 feature request, but it's a nonsensical thing to me - what exactly does that mean? If it means: "I want Dada Mail to look like the rest of my site", then Magic Templates are the answer.

To do all this manipulation, I've taken advantage of the excellent HTML::Tree CPAN distribution, which can read in the HTML pretty well, and give a tree-based interface to the HTML the page is made up of. Then you can fairly successfully output the HTML back.
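As a rough illustration of that read/modify/write cycle, here's a minimal sketch. It is not Dada Mail's actual code. The sub name `wrap_in_page` is hypothetical, and it finds the target element by its `id` attribute using `look_down`, as a stand-in for a real CSS selector lookup (which HTML::Tree doesn't provide by itself). It also takes the page as a string rather than fetching it from a URL.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: put $content inside the element of $page_html
# whose id attribute is $id, and return the whole page as a string.
# Returns undef if no such element exists.
sub wrap_in_page {
    my ( $page_html, $id, $content ) = @_;

    my $root = HTML::TreeBuilder->new;
    $root->ignore_unknown(0);         # keep tags the parser doesn't know
    $root->no_space_compacting(1);    # leave whitespace alone
    $root->store_comments(1);         # keep HTML comments in the tree
    $root->parse($page_html);
    $root->eof;

    # Stand-in for the CSS selector step: match on the id attribute.
    my $target = $root->look_down( id => $id );
    if ( !defined $target ) {
        $root->delete;
        return;
    }

    # Throw away what was in the element and drop our own markup in.
    # A '~literal' pseudo-element is written out verbatim by as_HTML.
    $target->delete_content;
    $target->push_content(
        HTML::Element->new( '~literal', text => $content ) );

    # The empty hashref asks as_HTML to emit every closing tag.
    my $html = $root->as_HTML( '<>&', undef, {} );
    $root->delete;
    return $html;
}

my $page = '<html><head><title>My Site</title></head>'
    . '<body><div id="content">Welcome!</div></body></html>';
my $out = wrap_in_page( $page, 'content', '<p>Subscribe to my list</p>' );
print defined $out ? "$out\n" : "No element with that id\n";
```

The rest of the page (title, surrounding markup) comes back out as it went in; only the inside of the matched element changes.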

It does have some shortcomings, as it has a hard time with HTML5 tags. There's a bug open about it (which I now can't track down - I believe it has to do with HTML::Parser?), but there's not been a lot of work to fix it, as the maintainer wants a correct fix, and not a hack.

One solution is to use the HTML::TreeBuilder::LibXML module instead of HTML::TreeBuilder, but it comes with its own silent dragons - the docs are so minimal that it's difficult to gauge what the heck it actually does and which methods are now unsupported.

I gather the lack of docs is in direct proportion to the brilliance of the module author, and that's a weakness of being just a regular ol' joe (like me). Regardless, that is where I'll be going with this in the future, but for now, HTML::Tree is working great.

Here's the code I came up with. Hopefully the code is very straightforward, readable, and holds no tricks whatsoever.
My coding is fairly straightforward for many reasons: I like no-nonsense stuff, and this is actively maintained code - it has been a living project for the last 15 years! One demerit I'll give it, though, is that it is a long, long sub. Someone much cleverer than I could take this down a bit.

HTML::Tree (and friends) provide just the right amount of API to do the job and do it well (although it took a little bit to figure out the right incantation). To work around some of the HTML 5 weaknesses of HTML::TreeBuilder, I call it with the following arguments:

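my $root = HTML::TreeBuilder->new(
    ignore_unknown      => 0,    # keep tags the parser doesn't recognize
    no_space_compacting => 1,    # leave whitespace as it is in the source
    store_comments      => 1,    # keep HTML comments in the tree
);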

For the majority of cases, that works well. You may notice that the code is also written very defensively. Since Dada Mail is released as an app you're supposed to simply upload and run (basically), I can't promise that HTML::Tree and friends are going to be available, so this entire feature has to be optional, and must not cause the entire app to break. Rather, I give directions on how to install the missing CPAN modules, so that a motivated user can do that part themselves.
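One common way to make a feature optional like this is to load its modules at runtime inside an `eval`, instead of with a compile-time `use`. The sketch below shows that pattern. It's an assumption about how such a check could look, not a copy of what Dada Mail does, and `missing_modules` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load each named module at runtime and
# return the names of the ones that could not be loaded.
sub missing_modules {
    my @wanted = @_;
    my @missing;
    for my $module (@wanted) {
        # require wants a file path: Foo::Bar becomes Foo/Bar.pm
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        my $ok = eval { require $file; 1 };
        push @missing, $module if !$ok;
    }
    return @missing;
}

my @missing = missing_modules( 'HTML::TreeBuilder', 'HTML::Element' );
if (@missing) {
    # Don't die: switch the feature off and tell the user what to install.
    print "Magic Templates disabled. Install from CPAN: @missing\n";
}
else {
    print "Magic Templates available\n";
}
```

Because the `require` happens inside `eval`, a missing module only turns into an entry in the list, and the rest of the app carries on.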

I've also gotten into the habit of having such defensive subs return three values: a status saying whether it worked, a hashref of errors describing what didn't work, and the returned string, which is set to undef if things didn't pan out right. With that, I can then fall back to something else if the sub didn't return what I need. The idea is that something needs to work, since a template has to be returned to be used, and it's not acceptable to have something like this break, which would break the entire app. This is a great example of why: since we're relying on an outside resource (the URL to the webpage), that resource could be gone, or the details of it (the CSS selector in the HTML) could have changed, rendering our configuration outdated.
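Here's a small sketch of that three-value convention together with a fallback. Everything in it is illustrative: `magic_template` and `template_or_default` are hypothetical names, the error keys are made up, and a plain marker string stands in for the real CSS selector lookup so the example has no CPAN dependencies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real sub. Returns the three values
# described above: ($status, $errors, $template).
sub magic_template {
    my ( $page_html, $marker, $content ) = @_;
    my %errors;

    if ( !defined $page_html || $page_html eq '' ) {
        $errors{page_unavailable} = 1;      # the outside resource is gone
    }
    elsif ( index( $page_html, $marker ) < 0 ) {
        $errors{selector_not_found} = 1;    # the page changed under us
    }
    return ( 0, \%errors, undef ) if %errors;

    ( my $template = $page_html ) =~ s/\Q$marker\E/$content/;
    return ( 1, {}, $template );
}

# The caller never dies: if the magic fails, report why and fall back
# to a plain built-in layout so a template is always returned.
sub template_or_default {
    my ( $page_html, $marker, $content ) = @_;
    my ( $status, $errors, $template )
        = magic_template( $page_html, $marker, $content );
    return $template if $status;

    warn 'Magic Template failed: '
        . join( ', ', sort keys %$errors ) . "\n";
    return "<html><body>$content</body></html>";
}

print template_or_default(
    '<div id="main"><!-- dada --></div>', '<!-- dada -->', 'Hello' ), "\n";
```

The errors hashref is what lets the app explain the problem to the user later, instead of just silently using the fallback.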

What do you do? Throw an error and break the app, or return what the problem is and try to describe what could be wrong, so that you can reapply the Magic Template in a more correct manner later? For me, the correct answer is the latter. This is a consumer-focused app, and people just want to get the job done, without having the app break in surprising ways.
