Monday, September 27, 2010

On making a Simple Web-based installer in Perl

Recently and after 10 years of avoiding the idea, I made a web-based installer for Dada Mail.

My original thought was, something like this isn't necessary, that the installation process had been twiddled down to changing, oh, 3 variables in a heavily documented config file and additional "power user" fancy things to do, if You Knew What You Were Doing. What could possibly be easier?

Oh, how wrong I was - and I never knew how wrong, until I got some feedback on the alternative of having an installer.

You can get a gist of how the installer works in this video:




Basically, you pop up a tar.gz distro and a helper script that just gets things all set, you run the helper script and fill in a few things in a form - most of which are pre-filled with Best Guesses. I am not breaking new ground, here.

BUT what I am doing is giving my users the patented PHP Xperience - and that's what they want:

You throw up some files, you visit some stuff via your browser, you set some params and you go zoom.

The problem with, "But, there's all these CPAN dependencies!!!" was solved a long time ago in this project by only using Pure Perl modules (gets limiting in some instances) and shipping the app with an already pre-filled app-specific perllib. This also is a huge headache and the shipped-perllib is always behind, but it's better than nothing. The cpan deps service has been heaven-sent for roll-your-own-manually folks. The, "But, there's Cool Things in CPAN to do Cool Things in your App!" wish is solved by just making these cool things optional. cPanel, which is what a lot of Very Cheap Web Hosts use as their platform, has a web-based cpan installer, so it's even within a mortal's reach to use CPAN to do fun things.

Anyways: here is, after a month of having the installer, some reflections:

The support boards aren't devoid of installation questions, but they sure are a lot quieter.

There are users who are intimidated by the installer. I don't know how else I can make things easier and these users would also be intimidated by doing things manually (which is still an option). I used to be intimidated with oiling the chain on my bicycle, but it takes... I dunno, 5 minutes to learn what you need to do. I don't ever say, "Uh, Google-fu the base level how-to to, (for example) FTP a file", but I don't quite know what I need to say.

But the users who were intimidated by doing things manually, yet can figure out the installer, are the majority. Which is great - solved the problem. Now, the questions are simply bizarre edge cases dealing with weird MySQL setups on questionable hosting accounts. Happy to help.

Since more people are having an easier time installing the program, I get the feeling that more people are installing the program, successfully, which is great. I don't have any hard facts for that one, sadly. There are a small amount of people that tell me they like the Hard Way of doing things, and Why would I ever change? You're ruining my flawed work flow!, and I don't really know what to tell those folks, except to try the new way, that there is a manual way and it's progress - babe. And also, the previous version is still available if you want Hell, again. I had shipped the installer as the only new feature of the current version, so there's nothing that was missing.

Another hunch I have is that my own services to install the program are dramatically down. I can certainly graph how many installs I do per day/month/whatever, but these numbers are affected by a number of things: the economy, other competing programs, if Burning Man was particularly amazing and how long it takes someone after Labor Day to decompress, etc. This is also fine for me, because I loathe having to install the program for people: it takes a lot of time. I would rather they do it, themselves.

When I do install the program? Guess what, I use the web-based installer too. It's that much better/faster/spiffier. I really wish I had created the installer earlier, as the time it took to make the installer would have been paid back sooner by the time it's saving me. I would have like 10 years of that time back, instead of just a month, but whatchagonna do.

But I've now raised the bar on Easiness...es and have a new problem: people who found the installer easy to use want to upgrade just as easily. 10 years of development of a program, especially when Year 0 I knew not of the ways of Perl and Year 10 I'm still very much an amateur with a Right Hemispherical controlled mind.... things are messy. The config file format sucks, for example. And I have ten years of versions people want to upgrade from.

Should be interesting.

Some notes on the design of the installer:

I basically decided to make the installer as separate from the main program as possible, meaning it has its own library files, its own template files, its own testing suite (coming soon, I promise!). It does not use a framework, but uses the modulino approach. Things are fairly tidy.
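For anyone who hasn't met the term: a modulino is a single file that works as a script when you run it and as a module when you load it (from a test, say). Here's a minimal sketch of the idea. The package name and the step names are made up for illustration and are not the installer's actual code:

```perl
#!/usr/bin/perl
package My::Installer;    # hypothetical name, not Dada Mail's actual package
use strict;
use warnings;

# The ordered steps a run walks through (illustrative only).
sub steps { return qw(check_environment unpack_distribution write_config) }

# Entry point: walk the steps in order and report which ones ran.
sub run {
    my ($class) = @_;
    my @done;
    for my $step ( $class->steps ) {
        push @done, $step;    # a real installer would dispatch to a sub here
    }
    return @done;
}

# The modulino switch: caller() is false when the file is run directly,
# and true when it is loaded with require/use (for example, from a test).
print join( "\n", __PACKAGE__->run ), "\n" unless caller;

1;
```

Run directly, the file prints its steps; loaded from a test file, nothing runs until the test calls My::Installer->run itself.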

It's written in a procedural design, because installations and configurations have steps. See? I just listed two of them. I made wrappers around things that deal with system calls - copying/removing files/directories, for example. I was thinking there could/would/should be edge cases for different OS's, but I really haven't found any. I guess it still gives breathing room for someone much smarter than me to move in and make a better version of whatever I've botched. I did start out just using backticks for system calls, and then replaced those with perlish alternatives.
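A rough sketch of what such wrappers can look like, using core modules (File::Copy, File::Path) instead of backticks. The names installer_cp and installer_rmdir are hypothetical, picked for this example, and the demo works in a throwaway temp directory:

```perl
use strict;
use warnings;
use File::Copy ();                           # core module: copy()
use File::Path qw(make_path remove_tree);    # core module
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical wrappers: one place to add per-OS edge cases later.
sub installer_cp {
    my ( $from, $to ) = @_;
    File::Copy::copy( $from, $to ) or die "copy $from -> $to failed: $!";
    return 1;
}

sub installer_rmdir {
    my ($dir) = @_;
    remove_tree($dir);
    return !-e $dir;    # true if the directory is really gone
}

# Demo in a throwaway directory that cleans itself up.
my $tmp      = tempdir( CLEANUP => 1 );
my $src      = File::Spec->catfile( $tmp, 'source.txt' );
my $dest_dir = File::Spec->catdir( $tmp, 'installed' );
my $dest     = File::Spec->catfile( $dest_dir, 'source.txt' );

open my $fh, '>', $src or die $!;
print {$fh} "hello\n";
close $fh;

make_path($dest_dir);
installer_cp( $src, $dest );
my $copied  = -e $dest                   ? 1 : 0;
my $removed = installer_rmdir($dest_dir) ? 1 : 0;
```

The point of the wrapper layer is only that the rest of the installer never calls the system directly, so a fix for some odd platform goes in one sub.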


So. I'm not sure what stops all of us from making similar web-based installers for our web-apps. I am coming from a field (Art) and a preference for visual things, so I've never really gotten a handle on system-admin type of tasks. The trend, it seems, in software is less, "Give me as many options to roll my own doo-dad as I can have" and more, "Just freakin' work. And On my phone. When my 2-y/o uses it." I'm a little embarrassed at how much of the code I have is simply crap boilerplate to look up boring things about someone's environment to install. It seems that can all be swept under the rug and turned into some shared code. But then again, there's the rub, huh? If it's shared code it's on CPAN, and then, how do you get the CPAN module, without a major compromise? And someone will be brilliant and write it in Moose (which is also, sincerely, brilliant) and there goes the baby, with the bathwater.

Sunday, March 7, 2010

Dada Mail Four Point Zero Point Three Released. Super Duper UTF-8/Unicode Support


Download:

http://dadamailproject.com

What's New

We've been working really really hard to get the UTF-8/Unicode support working well in Dada Mail and this release, we really - no foolin' this time, think we've nailed it.

If you require character set support that's a little more than what's usually found in Latin-based languages, well, this is the release for you. We're going to build localization/internationalization support into Dada Mail, starting with this release. That means, we're going to start translating Dada Mail into multiple languages.

This release also has some pleasant bug-fixes. We couldn't have done it without your feedback.

Pro Dada Four - Ever: $44

There's only one more week to take advantage of this deal. After that? Gone.

Purchase Pro Dada for a special price of Forty-Four Dollars and your subscription to download Pro Dada and the Dada Mail Manual lasts Forever.

Pro Dada subscriptions are usually for a year. This offer extends your subscription for the entire life of the Dada Mail Project.

Purchase at:

http://dadamailproject.com/purchase/pro.html

This irrational offer lasts until pi day (3/14/2010)

Pro Dada Four Installed - $88

Get Pro Dada installed by us or get any current Dada Mail upgraded to Pro Dada Mail Four, for $88 (regularly $100). We'll keep upgrading it at your request for a year, but you'll have access to the Pro Dada Download and Dada Mail Manual for the life of the Dada Mail project.

Request an Install or Upgrade:

http://dadamailproject.com/installation/request.html

This irrational offer also only lasts until pi day (3/14/2010) - you have a week left to submit that installation request.

Dada Mail Turns Ten in 2010

The Dada Mail Project started ten years ago in December of 1999 as a small curiosity and has gradually evolved and developed into an extremely popular programming and conceptual art project. Happy Birthday, Dada Mail.

Dada Mail Four is our latest release. Thanks for everyone's thoughtful feedback in this year-long development effort.

We couldn't have done it without you.

We're looking forward to receiving your feedback on Dada Mail Four.

Good luck!

Justin J
Lead Dadaist
http://dadamailproject.com

Dada Mail Change Log for version 4.0.3


Unicode/UTF-8 Work

We have worked very, very hard to get Dada Mail working with UTF-8/Unicode.

We think we did a pretty good job and you'll have a most amazing experience when comparing this version to any previous version of Dada Mail (ever), but there may be tiny things still to work out.

We need to know about them, don't be shy!

SQL table schema changes!

People who upgrade to 4.0.3 (and any version afterwards, until things change!) should note that the MySQL and PostgreSQL Table Schemas have changed!

You may need to update your own tables, to support UTF-8 (if they aren't already in that encoding).

See Also:

If you're upgrading, please read over the updated UTF-8/Unicode FAQ:

http://dadamailproject.com/support/documentation/features-UTF-8.pod.html

If you're doing a new install, there's nothing you need to know, Dada Mail should work well out of the box in re: to UTF-8/Unicode stuff.

Changes to Default List Settings
We've changed a few of the default list settings, hopefully so that everyone has a more pleasant experience, right off the bat:

Activate Black List

We've enabled the setting to activate the Black List, by default.

We're also enabling the below settings:

  • Move Unsubscribed Subscribers Automatically to the Black List
  • Continue to Allow Subscriptions From Subscribers of Black Listed Addresses

You still have the option to change new lists to the previous behavior and already created lists will have their previous behavior, if Black List Settings have already been edited.

Print List-Specific Headers option Removed

The option, Print List-Specific Headers, has been removed from Mail Sending - Advanced Sending Preferences, but the functionality has not. All mailing list messages will have these headers.

Send Unsubscription Confirmation Emails (Closed-Loop Opt-Out) - disabled by default

Send Unsubscription Confirmation Emails (Closed-Loop Opt-Out) has been disabled by default (you can still enable it)

This option, when enabled, requires that when someone wants to unsubscribe, they have to confirm this unsubscription by clicking on the unsubscription confirmation link in an email sent to their subscribed address.

When disabled (the new default), they simply have to fill out the subscribe/unsubscribe form.

Subscription and Unsubscription links now include an Email Address

When available, both the Subscription and Unsubscription links will have the potential subscriber's (or unsubscriber's) email address in the link itself, so that the user does not have to do the two-step of first following the link and then typing in their email address.

These links are created per-subscriber (or potential sub/unsub), when you use the:

http://dadamailproject.com/cgi-bin/dada/mail.cgi/s/dada_announce/example/example.com/

or,

http://dadamailproject.com/cgi-bin/dada/mail.cgi/u/dada_announce/example/example.com/

tags. Previously, these tags only provided a link to the subscription/unsubscription form, without the email address embedded within the link itself. There is no way to revert this behaviour, but you can still roll your own links, like this:

Subscription Link:

/s/

Unsubscription Link:

/u/
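Going by the two example URLs above, the per-subscriber link is the program URL, then s or u, then the list short name, then the email address split at the @ into name and domain. A small sketch of building one by hand. The helper name sub_unsub_link is made up for this example, and it does no URL-escaping, which real addresses may need:

```perl
use strict;
use warnings;

# Hypothetical helper: builds .../mail.cgi/<flavor>/<list>/<name>/<domain>/
# where <flavor> is "s" (subscribe) or "u" (unsubscribe).
sub sub_unsub_link {
    my ( $program_url, $flavor, $list, $email ) = @_;
    die "flavor must be 's' or 'u'" unless $flavor =~ /\A[su]\z/;
    my ( $name, $domain ) = split /\@/, $email, 2;
    die "not an email address: $email"
      unless defined $domain && length $domain;
    return join( '/', $program_url, $flavor, $list, $name, $domain ) . '/';
}

my $base   = 'http://dadamailproject.com/cgi-bin/dada/mail.cgi';
my $s_link = sub_unsub_link( $base, 's', 'dada_announce', 'example@example.com' );
my $u_link = sub_unsub_link( $base, 'u', 'dada_announce', 'example@example.com' );
print "$s_link\n$u_link\n";
```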

Unsubscription Links Now Mandatory for Mass Mailing Messages

Dada Mail will now do a quick check to make sure that there is a Dada Mail Unsubscription link in your mass mailing messages, before sending them out.

If one is not found, one will be automatically appended to the end of your message.

It will not be very fancy.

We suggest that you make sure that you have a real, valid, Dada Mail unsubscription link in your Mailing List Messages.
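The check-then-append behavior can be pictured roughly like this. It is only an illustration of the idea, not Dada Mail's actual code: the helper name ensure_unsub_link, the placeholder URL and the plain substring match are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical check: does the message already contain a
# <program url>/u/<list> link? If not, append a plain one.
sub ensure_unsub_link {
    my ( $message, $program_url, $list ) = @_;
    my $needle = "$program_url/u/$list";
    return $message if index( $message, $needle ) >= 0;
    return $message . "\n\nUnsubscribe: $needle/\n";
}

my $url     = 'http://example.com/cgi-bin/dada/mail.cgi';    # placeholder URL
my $with    = ensure_unsub_link( "Hi!\n$url/u/mylist/\n", $url, 'mylist' );
my $without = ensure_unsub_link( "Hi!\n", $url, 'mylist' );
```

A message that already carries the link comes back untouched; one without it gets a bare link tacked on the end, which is why putting a proper one in yourself is the nicer option.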

Bug Fixes 4.0.3

  • Send newest archived message may have outdated header information

http://github.com/justingit/dada-mail/issues/issue/30

  • pop3 username/password not saved when "Save, Then Test..." button pressed in Sending Preferences

http://github.com/justingit/dada-mail/issues/issue/29

  • Beatitude: Months are listed out of order

http://github.com/justingit/dada-mail/issues/issue/28

  • profile field names can contain more than just ascii letters, numbers and underscores

http://github.com/justingit/dada-mail/issues/issue/27

  • list short names can contain more than just ascii letters, numbers and underscores

http://github.com/justingit/dada-mail/issues/issue/26

  • Beatitude: Scheduled List Not in Any Useable Order?

http://github.com/justingit/dada-mail/issues/issue/16

  • Dada Bridge: Spam Assassin Level Picker isn't available

http://github.com/justingit/dada-mail/issues/issue/21

  • Sending Preferences don't correctly state if you can use Use Secure Sockets Layer (SSL) for POP-before-SMTP

http://github.com/justingit/dada-mail/issues/issue/24

  • Double Subscriptions when using List Invitation

http://github.com/justingit/dada-mail/issues/issue/23

  • Archived messages not templated out in publicly displayed archives

http://github.com/justingit/dada-mail/issues/issue/20

  • Link to edit subscriber information broken when using the search

http://github.com/justingit/dada-mail/issues/issue/19

  • Unsubscription Notice to List Owner doesn't have subscriber (profile) fields

http://github.com/justingit/dada-mail/issues/issue/18

  • Disabled Menu items return server error when using the, "Classic" session type

http://github.com/justingit/dada-mail/issues/issue/15

Tuesday, February 23, 2010

Experimental Dada Mail w/unicode ¡Support! Released

(this is a repost from here, 'cause I'm pretty stoked on it)


This is the first step in the localization project, since we can't very well translate Dada Mail if Dada Mail can't use the translations available.

I have to let this project rest for a little bit (and collect my wits - it was a very difficult step!) but any and all feedback is welcome, if you'd like to give this a spin - bug reports/problems of any kind are very much appreciated.

This version of Dada Mail should basically be able to support any language that can be written in the unicode character set and UTF-8 encoding. Which, should be, well, a lot of them. In practice, Dada Mail surely doesn't support all of them yet, but where does it fail? I don't know - so it's a good time to test and see where it's wrong.

For simple Euro-centric stuff, like this:

Je peux manger du verre, ça ne me fait pas mal.

It should be fine. For something a little more wild:

أنا قادر على أكل الزجاج و هذا لا يؤلمني.

(which should be Arabic)

Well, I can only go on if something visually looks correct :) Even this email is sort of a test - I don't know if it's going to work, or not - so, fingers crossed! If it does - we're on a good track, since Dada Bridge taking a random email, having it go through the system that's mostly tested using a very specific way of creating emails and coming out readable on the other side is a great big step - not even talking about the online archive, rss/atom feeds, twitter thingie, etc, etc, etc.

Here's the download to the version I'm now running at the Dada Mail support site:

http://github.com/downloads/justingit/dada-mail/dada-4_0_2-unicode.zip

http://github.com/downloads/justingit/dada-mail/dada-4_0_2-unicode.tar.gz

If you want to check it out via github, the branch is at:

http://github.com/justingit/dada-mail/tree/charset_work

To grab it with git, you have to do this:


git clone git://github.com/justingit/dada-mail.git
cd dada-mail
git fetch
git checkout --track -b your_local_branch_name origin/charset_work


Here's the explanation of all that:

http://groups.google.com/group/github/browse_thread/thread/71f944b925467ab6

There's a guide of what to expect with Dada Mail and unicode/UTF-8 you can read here:

http://dadamailproject.com/support/documentation-4_0_2-unicode/features-UTF-8.pod.html

Which I'll paste the contents of at the end of this message - but you may also want to compare it to the version of this doc for 4.0.2 STABLE:

http://dadamailproject.com/support/documentation-4_0_2/features-UTF-8.pod.html

(Long story short: 4.0.2 UTF-8/Unicode Support: "uhh...")

And, that's about it. This was a hard part of the project, since this is a 10+ y/o codebase - it very much pre-dates even unicode/UTF-8 support in Perl itself, so there's a reason, I guess, why the program was in such bad shape when it came to support it. Many, many many bugs showed themselves, once this feature was asked for. I think a great majority of them have been solved.

Give it a spin if this interests you and if I can help out with anything, let me know,


--
Introduction

Dada Mail can speak UTF-8 and almost expects that everything else around it does, too.

That means:

• It treats everything it handles as UTF-8
• Everything it returns is in UTF-8

How To Have a Pleasant Experience

If you're installing Dada Mail for the first time, there's nothing you'll need to do, but below are some great guidelines on how to keep your lists configured, so you continue to have a good experience.

If you're upgrading, make sure your configuration reflects the advice below.

It's heavily advised to keep everything in Dada Mail speaking UTF-8 without any real exceptions.

Config Variable: $HTML_CHARSET

By default, the config variable, $HTML_CHARSET is set to, UTF-8

Keep it that way, same case (UTF-8) - same everything.

Dada Mail is only tested with the charset set this way.

Advanced Sending Preferences

Default Character Set

Set this as, UTF-8

Default Plain Text/HTML Message Encoding

There's really only a few choices recommended for Dada Mail.

• 8bit
Should work.

• quoted-printable

If you have any trouble with 8bit, try quoted-printable. Because of the number of times that Dada Mail creates, tweaks, formats and templates out email messages, the encoding can potentially get mucked up.

This potential mucking-up is mitigated when Dada Mail uses quoted-printable encoding internally. This should be the default for email messages.

Encode Message Headers

Have this option checked.

SQL Backends

Database

PostgreSQL

Encoding for PostgreSQL databases is done when the database is created - make sure to create your database with a, UTF-8 encoding, like so:

CREATE DATABASE dadamail WITH ENCODING 'UTF-8'

MySQL

Nothing you'll have to do.

SQLite

Nothing you'll have to do.

DBM Files

DBM Files have no encoding support, but Dada Mail knows this and compensates.

Schema

MySQL

The MySQL schemas are set to create tables with an encoding of, UTF-8

PostgreSQL

Nothing has changed.

SQLite

Nothing has changed.

Drivers

The currently supported SQL backends, mysql (MySQL), Pg (PostgreSQL) and SQLite, all have different ways to somewhat "enable" their UTF-8 support.

• MySQL

mysql_enable_utf8 => 1,

has been added to the $DBI_PARAMS hashref.

• PostgreSQL

pg_enable_utf8 => 1,

has been added to the $DBI_PARAMS hashref.

• SQLite

sqlite_unicode => 1

has been added to the $DBI_PARAMS hashref.

No explicit encoding/decoding is done in Dada Mail when saving/retrieving data. Hopefully, the drivers are UTF-8-aware enough.
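To make the three driver switches above concrete, here's a small sketch that picks the right attribute for a driver and merges it into a params hashref. The helper name utf8_dbi_params is made up for this example and it never touches a database. The commented-out connect line shows where such a hashref would normally go:

```perl
use strict;
use warnings;

# Driver name => the UTF-8 attribute named in the list above.
my %UTF8_ATTR = (
    mysql  => 'mysql_enable_utf8',
    Pg     => 'pg_enable_utf8',
    SQLite => 'sqlite_unicode',
);

# Hypothetical helper: returns a copy of the attribute hashref with the
# driver's UTF-8 switch turned on.
sub utf8_dbi_params {
    my ( $driver, $params ) = @_;
    my $attr = $UTF8_ATTR{$driver} or die "unknown driver: $driver";
    return { %{ $params || {} }, $attr => 1 };
}

my $DBI_PARAMS = utf8_dbi_params( 'mysql', { RaiseError => 1 } );

# With DBI and DBD::mysql installed, the hashref would be the 4th argument:
#   my $dbh = DBI->connect( $dsn, $user, $pass, $DBI_PARAMS );
```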

Plugins/Extensions

The Plugins and Extensions that come with Dada Mail have not been as thoroughly tested as the main program. There's still warts.

Dada Bridge

Dada Bridge has a unique position, needing to handle a lot of different stuff thrown at it and deal with it gracefully. Dada Mail does, in fact, handle any realistic character set/encoding you throw at it, but Dada Mail will convert messages it receives to its internal format, before it resends it out to your list.

This means the encoding of your choice (8bit or quoted-printable) and the charset of your choice (as long as your charset is, UTF-8)

Upgrading

You are potentially going to have problems.

It's possible that, since List Settings were never decoded/encoded correctly in past versions, they'll show up in the program (once you've upgraded) incorrectly. The easiest thing to do is to edit the mistakes and resave the information. For most of the program, you're going to have to manually export the information and re-import it with the correct encoding, sadly. Dada Mail will probably fail gracefully with old information, but it's possible that you'll see squiggly characters, instead of what you want to see. There's nothing in Dada Mail that will stop this from happening. If you experience it (from old information), we're not going to count it as a bug, but rather a known issue.

Problems?

Please let us know via the Support Boards:

http://dadamailproject.com/support/boards/

Or the developer mailing list:

http://dadamailproject.com/cgi-bin/dada/mail.cgi/list/dadadev/

Thanks!

See Also:

• The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

http://www.joelonsoftware.com/articles/Unicode.html

• perlunitut - Perl Unicode Tutorial

http://perldoc.perl.org/perlunitut.html

• perlunifaq - Perl Unicode FAQ

http://perldoc.perl.org/perlunifaq.html




Saturday, February 20, 2010

Perl HTML::Template and UTF-8 Unicode

HTML::Template does not support file encoding:


#!/usr/bin/perl -w
use strict;
use Encode;
use HTML::Template;
my $template = HTML::Template->new(
filehandle => *DATA,
);
print Encode::encode('UTF-8', $template->output);
__DATA__
¡™£¢∞§¶•ªº

prints, ¡™£¢∞§¶•ªº (or something like that!)

In the example above, this makes sense, since we're printing on an open filehandle (even if it's only to our magical, DATA) that we didn't put a file layer filter thingy to. That's easy to fix:



#!/usr/bin/perl -w
use strict;
use Encode;

binmode DATA, ':encoding(UTF-8)';

use HTML::Template;
my $template = HTML::Template->new(
filehandle => *DATA,
);
print Encode::encode('UTF-8', $template->output);
__DATA__
¡™£¢∞§¶•ªº

prints, ¡™£¢∞§¶•ªº, yay!


This also works if we want to just pass a reference to a scalar to HTML::Template:


#!/usr/bin/perl -w
use strict;
use Encode;
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";
use HTML::Template;
my $template = HTML::Template->new(
scalarref => \$content,
);
print Encode::encode('UTF-8', $template->output);

prints, ¡™£¢∞§¶•ªº, yay!

This doesn't work, if we want to just give it the name of a template file. Giving it a filename is really useful, since HTML::Template has a feature to allow you to search through a file structure (or at least an array of directories), looking for the file.

And this is where encoding madness begins.

Cause I know what you're thinking, just treat HTML::Template's output like information that's coming from outside your program (since, if you're using a template *file*, it kinda is).

So, all you need to do is decode (this is the WRONG WAY to solve the problem, but let's just make that mistake...) the return value of ->output, like this:


#!/usr/bin/perl -w
use strict;
use Encode;
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";

my $filename = 'utf8string.tmpl';

open my $fh, '>:encoding(UTF-8)', $filename or die $!;
print $fh $content;
close $fh;

use HTML::Template;
my $template = HTML::Template->new(
filename => $filename,
);

my $output = $template->output;
$output = Encode::decode('UTF-8', $output);

print Encode::encode('UTF-8', $output);


prints, ¡™£¢∞§¶•ªº. Yes.

But... what if you have a variable (it is a templating system) and the variable in the param() you pass has UTF-8 strings? MUAHAHA!


#!/usr/bin/perl -w
use strict;
use Encode;
my $content = "
<!-- tmpl_var one -->
\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}
";

my $filename = 'utf8string.tmpl';

open my $fh, '>:encoding(UTF-8)', $filename or die $!;
print $fh $content;
close $fh;


use HTML::Template;
my $template = HTML::Template->new(
filename => $filename,
);
$template->param(
one => "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}",
);

my $output = $template->output;
$output = Encode::decode('UTF-8', $output);

print Encode::encode('UTF-8', $output);

Cannot decode string with wide characters at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.


Bahahaha!

Take out those decode/encode lines (I know it looks strange to have one right after the other) and you'll still get a weird output:


¡™£¢∞§¶•ªº
¡™£¢∞§¶•ªº


Darned if you do/don't. Those two lines should have the same string. They don't. No amount of encoding/decoding is going to help.


The trick, other than tweaking HTML::Template's source to include file filter layer thingamabobs, is to decode the contents of the file it opens up.

How to do that.

Trolling through the HTML::Template mailing list archives leads to the idea of using a HTML::Template filter that matches everything, that then does our decoding:



#!/usr/bin/perl -w
use strict;
use Encode;
my $content = "
<!-- tmpl_var one -->
\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}
";

my $filename = 'utf8string.tmpl';

open my $fh, '>:encoding(UTF-8)', $filename or die $!;
print $fh $content;
close $fh;


use HTML::Template;
my $template = HTML::Template->new(
filename => $filename,
filter => [
{ sub => \&decode_str, format => 'scalar' },
],
);
$template->param(
one => "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}",
);

my $output = $template->output;


print Encode::encode('UTF-8', $output);



sub decode_str {
my $ref = shift;
${$ref} = Encode::decode('UTF-8', ${$ref});
}

This sort of lines up all the data to be UTF-8 encoded and aware and all that stuff that the unicodefaqthingy perldoc tells you to do.

But, oh, it gets better.

DON'T use that filter trick thing if you're using a scalarref, or a properly encoded file handle! You'll get a nice error, like this:

HTML::Template->new() : fatal error occured during filter call: Cannot decode string with wide characters at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.
at /Library/Perl/5.10.0/HTML/Template.pm line 1697
HTML::Template::_init_template('HTML::Template=HASH(0x1008aafb8)') called at /Library/Perl/5.10.0/HTML/Template.pm line 1238
HTML::Template::_init('HTML::Template=HASH(0x1008aafb8)') called at /Library/Perl/5.10.0/HTML/Template.pm line 1124



Brilliant.


So I don't know what the best advice is to give. If you're passing the template as a scalarref, DON'T use that filter, unless you want to, perhaps encode your template beforehand (which makes little sense?)

If it's a filename, use that filter trick perhaps (or edit the sourcecode of HTML::Template).
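One more option, sketched here under assumptions: skip the filter and read the template file yourself through an :encoding(UTF-8) layer, then hand the decoded text to HTML::Template as a scalarref (the scalarref case is the one that behaved in the examples above). read_template_utf8 is a made-up helper name. The catch is that you give up HTML::Template's own searching through directories for the file. The HTML::Template part only runs if the module is installed:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a template file as decoded characters.
sub read_template_utf8 {
    my ($filename) = @_;
    open my $fh, '<:encoding(UTF-8)', $filename or die "$filename: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Write a small UTF-8 template to a temp file for the demo.
my ( $out, $filename ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(UTF-8)';
print {$out} "<!-- tmpl_var one -->\n\x{a1}\x{2122}\x{a3}\n";
close $out;

my $template_text = read_template_utf8($filename);

# Only runs if HTML::Template is installed.
if ( eval { require HTML::Template; 1 } ) {
    my $template = HTML::Template->new( scalarref => \$template_text );
    $template->param( one => "\x{221e}" );
    binmode STDOUT, ':encoding(UTF-8)';
    print $template->output;
}
```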

Monday, February 15, 2010

Perl, UTF-8 Email Messages, MIME::Entity and QuotedPrintable encoding

Some findings after much bashing of head:


#!/usr/bin/perl -w
use strict;

use MIME::Entity;
use Encode;

# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";

# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.

my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
Data => Encode::encode('UTF-8', $content),
Encoding => 'quoted-printable',
);



# MAKE SURE TO ALWAYS decode('UTF-8', 'string') BEFORE WORKING WITH STRING
# Always.
my $new_content = $pt_entity->bodyhandle->as_string;
$new_content = Encode::decode('UTF-8', $new_content);

# For example, we're just going to reverse it:
$new_content = reverse($new_content);




my $io = $pt_entity->bodyhandle->open('w');

# YES. You will need to encode content using the bodyhandle. Always.
# Always.

$new_content = Encode::encode('UTF-8', $new_content);
$io->print($new_content);
$io->close;
$pt_entity->sync_headers(
'Length' => 'COMPUTE',
'Nonstandard' => 'ERASE'
);


# And, that's it.



# Before using the content, decode
# Always.
my $result = $pt_entity->bodyhandle->as_string;
$result = Encode::decode('UTF-8', $result);


# Always encode, before printing.
# Always.
#
# prints, ºª•¶§∞¢£™¡
print Encode::encode('UTF-8', $result);


The trick is to always, always, always encode your data, before creating any sort of entity using MIME::Entity and to always, always, always decode the data you get using bodyhandle().

This workflow is strange, since you're told not to encode data, until you're ready to print it. I suspect there's some weird IO::File stuff going on with MIME::Entity (and friends), or that it wants to think in terms of saving binary data, instead of characters, when creating MIME stuff. I don't know.

If you do not encode before, MIME::Entity will barf, when using the quoted-printable encoding, but will probably be just fine with, "8bit" encoding.

This was a huge headache to figure out.
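The encode-going-in, decode-coming-out rule can be seen with nothing but the core Encode module. This sketch leaves MIME::Entity out entirely and just shows the byte/character boundary, plus the error you get from decoding something that is already characters:

```perl
use strict;
use warnings;
use Encode ();

# The same ten-character test string used in the examples here.
my $chars = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";

# Encode on the way out: characters -> UTF-8 bytes.
my $bytes = Encode::encode( 'UTF-8', $chars );

# Decode on the way in: UTF-8 bytes -> characters.
my $round_trip = Encode::decode( 'UTF-8', $bytes );

# Decoding something that is already characters is the failing case.
my $copy  = $chars;    # work on a copy, just to be safe
my $ok    = eval { Encode::decode( 'UTF-8', $copy ); 1 };
my $error = $ok ? '' : $@;

printf "%d characters, %d bytes\n", length($chars), length($bytes);
```

Ten characters become twenty-three bytes, and $error holds the same "Cannot decode string with wide characters" complaint that shows up when the first encode is skipped.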

Here is what happens if you don't do that first encode:


#!/usr/bin/perl -w
use strict;

use lib qw(/Users/justin/Documents/DadaMail/git/dada-mail/dada/DADA/perllib);

use MIME::Entity;
use Encode;

# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";

# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.

my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
# Data => Encode::encode('UTF-8', $content),
Data => $content,
Encoding => 'quoted-printable',
);

my $s = $pt_entity->bodyhandle->as_string;

$s = Encode::decode('UTF-8', $s);

# Let's do a little string manip:
$s = reverse($s);

$s = Encode::encode('UTF-8', $s);

print $s;


Cannot decode string with wide characters at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.

exception_handler::die in Encode.pm at line 162
Encode::decode in test7.pl at line 28






So do that first encode, please. If you don't follow this formula, your prog may work, until that last encode:


 
#!/usr/bin/perl -w
use strict;

use lib qw(/Users/justin/Documents/DadaMail/git/dada-mail/dada/DADA/perllib);

use MIME::Entity;
use Encode;

# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";

# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.

my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
# Data => Encode::encode('UTF-8', $content),
Data => $content,
Encoding => 'quoted-printable',
);

my $s = $pt_entity->bodyhandle->as_string;

# NAW, we don't need that
# $s = Encode::decode('UTF-8', $s);

# Let's do a little string manip:
$s = reverse($s);

# Well, that's silly! We don't need that one, either!
# $s = Encode::encode('UTF-8', $s);

print $s;



Wide character in print at /Users/justin/Desktop/test7.pl line 37.


And, you will do what I do, and bang your head, some more.

I couldn't find any info on how to handle things like MIME::Entity and UTF-8 encoding, in the excellent articles available, such as this one:

http://ahinea.com/en/tech/perl-unicode-struggle.html

and,

http://perlgeek.de/en/article/encodings-and-unicode

and,

http://juerd.nl/site.plp/perluniadvice


I have this article labeled as, "do not trust"

http://kbinstuff.googlepages.com/perl,unicodeutf8,cgi.pm,apache,mod_perla

Because it states,

6.1. Encode::encode/decode

For start, you should avoid using Encode::encode/decode/from_to to the greatest possible extent in your scripts. This will only lead to great confusion later. You may think you have gotten everything to work, but then a week later, you shall only add a little more functionality to your work and suddenly, everything falls apart and doodles will appear on your web pages.



I guess I understand what they mean - but you'll need to encode your UTF-8 stuff before it exits your program. Always. And, you have to decode UTF-8 info that comes into your program. Always. How do you do this? Uh-huh, the Encode module.

Like it says in the perldoc for unicode. So, I don't know what this page is yabbering about. I'm sure, behind the scenes, Encode is used when opening files with a specific encoding:

http://perldoc.perl.org/perluniintro.html#Unicode-I/O

Which, by the way of features, is a pretty rad one.
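A quick way to convince yourself of that: write characters through an :encoding(UTF-8) layer, read the file back raw, and compare the bytes to what Encode::encode gives you for the same string. This is only a sketch using core modules and a temp file:

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

my $chars = "\x{a1}\x{2122}\x{a3}";

# Write characters through an encoding layer; no explicit encode() call.
my ( $out, $filename ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(UTF-8)';
print {$out} $chars;
close $out;

# Read the raw bytes back to see what the layer actually wrote.
open my $raw, '<:raw', $filename or die $!;
my $bytes = do { local $/; <$raw> };
close $raw;

# Read again through a decoding layer to get characters back.
open my $in, '<:encoding(UTF-8)', $filename or die $!;
my $decoded = do { local $/; <$in> };
close $in;

# The layer and the explicit call should agree byte for byte.
my $same_as_encode = $bytes eq Encode::encode( 'UTF-8', $chars ) ? 1 : 0;
```

So the layer is doing the encode/decode step at the edge of the program, which is where it belongs.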