Some findings after much bashing of head:
#!/usr/bin/perl -w
use strict;
use MIME::Entity;
use Encode;
# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";
# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.
my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
Data => Encode::encode('UTF-8', $content),
Encoding => 'quoted-printable',
);
# MAKE SURE TO ALWAYS decode('UTF-8', 'string') BEFORE WORKING WITH STRING
# Always.
my $new_content = $pt_entity->bodyhandle->as_string;
$new_content = Encode::decode('UTF-8', $new_content);
# For example, we're just going to reverse it:
$new_content = reverse($new_content);
my $io = $pt_entity->bodyhandle->open('w');
# YES. You will need to encode the content before writing it through the bodyhandle.
# Always.
$new_content = Encode::encode('UTF-8', $new_content);
$io->print($new_content);
$io->close;
$pt_entity->sync_headers(
'Length' => 'COMPUTE',
'Nonstandard' => 'ERASE'
);
# And, that's it.
# Before using the content, decode
# Always.
my $result = $pt_entity->bodyhandle->as_string;
$result = Encode::decode('UTF-8', $result);
# Always encode, before printing.
# Always.
#
# prints, ºª•¶§∞¢£™¡
print Encode::encode('UTF-8', $result);
The trick is to always, always, always encode your data before creating any sort of entity using MIME::Entity, and to always, always, always decode the data you get back using bodyhandle().
This workflow is strange, since the usual advice is not to encode data until you're ready to print it. I suspect there's some weird IO::File stuff going on inside MIME::Entity (and friends), or that it wants to think of a body as binary data to be saved, rather than as characters. I don't know.
If you do not encode first, MIME::Entity will barf when using the quoted-printable encoding, but will probably be just fine with the "8bit" encoding.
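Here's a small sketch of why I think quoted-printable is the picky one. It assumes MIME::Entity's quoted-printable handling ends up in the core MIME::QuotedPrint module (I haven't traced this, so treat it as a guess) and calls encode_qp() directly, once with characters and once with UTF-8 bytes:

```perl
use strict;
use warnings;
use Encode ();
use MIME::QuotedPrint ();

# Two characters from the test string: U+00A1 and U+2122.
my $chars = "\x{a1}\x{2122}";

# Characters straight in: U+2122 does not fit in a single byte,
# so encode_qp() dies. We trap the error with eval.
my $qp_from_chars = eval { MIME::QuotedPrint::encode_qp($chars) };
my $chars_error   = $@;

# UTF-8 bytes in: quoted-printable is defined for bytes, so this works.
my $bytes         = Encode::encode('UTF-8', $chars);
my $qp_from_bytes = MIME::QuotedPrint::encode_qp($bytes);

# Decoding the quoted-printable text gives the original bytes back.
my $round_trip = MIME::QuotedPrint::decode_qp($qp_from_bytes);

print "characters: ", ($chars_error ? "died" : "ok"), "\n";
print "bytes:      ", ($round_trip eq $bytes ? "round trip ok" : "mismatch"), "\n";
```

Quoted-printable is only defined for bytes, which would explain why the first encode is not optional here.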
This was a huge headache to figure out.
This will all seem to work out if you don't do that first encode, right up until the decode:
#!/usr/bin/perl -w
use strict;
use lib qw(/Users/justin/Documents/DadaMail/git/dada-mail/dada/DADA/perllib);
use MIME::Entity;
use Encode;
# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";
# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.
my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
# Data => Encode::encode('UTF-8', $content),
Data => $content,
Encoding => 'quoted-printable',
);
my $s = $pt_entity->bodyhandle->as_string;
$s = Encode::decode('UTF-8', $s);
# Let's do a little string manip:
$s = reverse($s);
$s = Encode::encode('UTF-8', $s);
print $s;
Cannot decode string with wide characters at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.
exception_handler::die in Encode.pm at line 162
Encode::decode in test7.pl at line 28
So do that first encode, please. If you don't follow this formula, your program may seem to work, right up to the print where that last encode should have been:
#!/usr/bin/perl -w
use strict;
use lib qw(/Users/justin/Documents/DadaMail/git/dada-mail/dada/DADA/perllib);
use MIME::Entity;
use Encode;
# My UTF-8 string -
# ¡™£¢∞§¶•ªº
# Basically using Mac OS X, just hold down the alt/option key and hit the 1 through 0 keys, in succession:
#
my $content = "\x{a1}\x{2122}\x{a3}\x{a2}\x{221e}\x{a7}\x{b6}\x{2022}\x{aa}\x{ba}";
# Build the message, using MIME::Entity.
# MAKE SURE TO ALWAYS encode('UTF-8', 'string') BEFORE ADDING
# Always.
my $pt_entity = MIME::Entity->build(
Type => 'text/plain',
# Data => Encode::encode('UTF-8', $content),
Data => $content,
Encoding => 'quoted-printable',
);
my $s = $pt_entity->bodyhandle->as_string;
# NAW, we don't need that
# $s = Encode::decode('UTF-8', $s);
# Let's do a little string manip:
$s = reverse($s);
# Well, that's silly! We don't need that one, either!
# $s = Encode::encode('UTF-8', $s);
print $s;
Wide character in print at /Users/justin/Desktop/test7.pl line 37.
And, you will do what I do, and bang your head, some more.
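To see that warning without MIME::Entity in the picture, here is a minimal sketch using only core Perl. The in-memory filehandle and the `$SIG{__WARN__}` trap are my own scaffolding for the demonstration, not something from the scripts above:

```perl
use strict;
use warnings;
use Encode ();

my $chars = "\x{a1}\x{2122}";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# An in-memory filehandle with no encoding layer, standing in for STDOUT.
my $sink = '';
open(my $fh, '>', \$sink) or die "open failed: $!";

# Printing characters above 0xFF to a handle with no encoding layer warns.
print {$fh} $chars;
my $after_chars = scalar @warnings;

# Printing the encoded bytes does not.
print {$fh} Encode::encode('UTF-8', $chars);
my $after_bytes = scalar @warnings;

close($fh) or die "close failed: $!";

print "warnings after printing characters: $after_chars\n";
print "warnings after printing bytes:      $after_bytes\n";
```

The warning comes from the print itself, which is why encoding right before printing makes it go away.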
I couldn't find any info on how to handle things like MIME::Entity and UTF-8 encoding in the excellent articles available, such as this one:
http://ahinea.com/en/tech/perl-unicode-struggle.html
and,
http://perlgeek.de/en/article/encodings-and-unicode
and,
http://juerd.nl/site.plp/perluniadvice
I have this article labeled as, "do not trust"
http://kbinstuff.googlepages.com/perl,unicodeutf8,cgi.pm,apache,mod_perla
Because it states,
6.1. Encode::encode/decode
For start, you should avoid using Encode::encode/decode/from_to to the greatest possible extent in your scripts. This will only lead to great confusion later. You may think you have gotten everything to work, but then a week later, you shall only add a little more functionality to your work and suddenly, everything falls apart and doodles will appear on your web pages.
I guess I understand what they mean - but you'll need to encode your strings as UTF-8 before they leave your program. Always. And, you have to decode UTF-8 data that comes into your program. Always. How do you do this? Uh-huh, the Encode module.
Like it says in the perldoc for Unicode. So, I don't know what this page is yabbering about. I'm sure, behind the scenes, Encode is used when opening files with a specific encoding:
http://perldoc.perl.org/perluniintro.html#Unicode-I/O
Which, as features go, is a pretty rad one.
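A quick sketch of the two routes side by side, encoding by hand with Encode versus letting an `:encoding(UTF-8)` layer do it. I'm using in-memory filehandles here so the example doesn't touch the disk; with a real file you'd pass a filename instead of a scalar reference:

```perl
use strict;
use warnings;
use Encode ();

my $chars = "\x{a1}\x{2122}\x{a3}";

# Route 1: encode by hand.
my $by_hand = Encode::encode('UTF-8', $chars);

# Route 2: let an :encoding(UTF-8) layer encode on the way out.
my $by_layer = '';
open(my $out, '>:encoding(UTF-8)', \$by_layer)
    or die "open for writing failed: $!";
print {$out} $chars;
close($out) or die "close failed: $!";

# Reading back through the same kind of layer decodes on the way in.
open(my $in, '<:encoding(UTF-8)', \$by_layer)
    or die "open for reading failed: $!";
my $read_back = do { local $/; <$in> };
close($in) or die "close failed: $!";

print "same bytes either way: ",
    ($by_hand eq $by_layer ? "yes" : "no"), "\n";
print "characters survived the round trip: ",
    ($read_back eq $chars ? "yes" : "no"), "\n";
```

Either way, bytes live outside the program and characters live inside it. The layer just moves the encode and decode calls to the edge of the filehandle.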