Over here at Dada Mail, we do a lot of verification of email addresses when confirming subscriptions. Is the form of the email correct? Has this address been used to try to subscribe before? Has the user themselves confirmed their subscription? What happens when there's an error? There's a lot to it (sadly), and if something in one of these steps goes wrong, a subscription doesn't happen, or worse yet, a mailing list is tarnished with not-so-valid addresses. This little tale is about addresses that are invalid in form getting onto people's mailing lists.
A user posted on the
Dada Mail support forums:
"There's an address with a space in it, that's entered my list! It's causing problems with delivery of the ENTIRE list? How did that happen?"
A lot of the time developing a very complicated piece of software is supporting it (I think, anyways). When a question like this comes down the tubes, I have no great answer, except it must have been an outside thing. So I ask,
"Was the address added directly into the backend?" That's something that happens from time to time, although there are many pitfalls: importing data can be done incorrectly - and even if it's done correctly once, the design of the backend can change, and then it's not done correctly anymore.
That's why, if it's an option, it's a good idea to use the API that's presented to the developer, rather than trying to re-tinker it yourself.
When the user replied back with,
"Nope!" I knew I was really stuck.
"But I have tests!" I protested to my Macbook Pro.
"Coverage!" "I didn't roll my own email validator!" "I did everything you're supposed to do!"
So, current tests be-damned, I had to start from square one: Attempt to recreate the problem, in the simplest way I could. So, I tried to join a list, with an address with a space in it, like this:
myaddress@example .com
And, wouldn't you know it, the app took the address like nothing was wrong. Now, totally believing my user, but being in total
disbelief myself, it was time to hit the code. We use
Email::Valid in Dada Mail, to test the validity of an email address, like so:
#!/usr/bin/perl
use Email::Valid;
my $address = 'someone@example.com';
if (Email::Valid->address($address)) {
    print "Valid!";
}
else {
    print "NOPE!";
}
# prints, Valid!
This is wrong. Observe:
#!/usr/bin/perl
use Email::Valid;
my $address = 'someone@example .com'; # notice the space!
if (Email::Valid->address($address)) {
    print "Valid!";
}
else {
    print "NOPE!";
}
# Prints, Valid!
And, there's my problem, in a pretty simple case.
So, what's going on? Basically my code had
assumed that
Email::Valid's address method returned a boolean value. It does not. From Email::Valid's docs (
emphasis is mine):
This is the primary method which determines whether an email address
is valid. It's behavior is modified by the values of mxcheck(),
tldcheck(), local_rules(), fqdn(), and fudge(). If the address passes
all checks, the (possibly modified) address is returned as a string.
Otherwise, the undefined value is returned. In a list context, the
method also returns an instance of the Mail::Address class representing
the email address.
Big difference! So let's try that code snippet, again:
#!/usr/bin/perl
use Email::Valid;
my $address = 'someone@example .com'; # notice the space!
my $validated_address = undef;
if ($validated_address = Email::Valid->address($address)) {
    print "Valid!: $validated_address";
}
else {
    print "NOPE!";
}
# prints, Valid!: someone@example.com
Ah-ha:
Email::Valid will take it upon itself to modify the email address for you - but only in certain circumstances - spaces in
certain places of the address. So, this still returns
undef:
#!/usr/bin/perl
use Email::Valid;
my $address = 'some one@example.com'; # notice the space!
my $validated_address = undef;
if ($validated_address = Email::Valid->address($address)) {
    print "Valid!: $validated_address";
}
else {
    print "NOPE!";
}
# prints, NOPE!
This certainly makes things confusing. The reason the email address gets modified is that, under the hood, Email::Valid turns the address you give it into a Mail::Address object (it'll also accept a Mail::Address object directly), and Mail::Address itself takes out that pesky space in the domain part of our email address when its own address() method returns a string:
#!/usr/bin/perl
use Mail::Address;
my $address = 'someone@example .com'; # notice the space!
my $modified = ( Mail::Address->parse($address) )[0]->address;
print $modified;
# prints, someone@example.com
This makes it difficult to patch up
Email::Valid, since then I'd have to delve into
Mail::Address - and perhaps
Mail::Address has a perfectly good reason to do this. Since both
Email::Valid and
Mail::Address have been around for so long, and many programs have been written with their API the way it is, it's a pretty good bet they'll stay this way.
So, what to do? Since my app's validation system does a little more than ask, "is the form correct?", it's difficult in the workflow to take into account the changes Email::Valid makes to the address I pass. I'm not sure I even like the idea: getting a boolean back from a validation method seems better to me. Most of my own validation methods in the app follow the pattern of returning a boolean status, plus a hashref saying what may be wrong:
# In a perfect world:
my ($status, $errors) = $validater->check_out_this_address($address);
if ($status == 1) {
    # Good to go!
}
else {
    print "Dude, PROBLEMS:\n";
    for (keys %$errors) {
        print '* ' . $_ . "\n";
    }
}
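That pattern can be wrapped around Email::Valid itself. What follows is only a sketch: the sub name check_address_form and the error keys invalid_form and modified_by_validator are made up for illustration, and it assumes the policy that an address Email::Valid had to rewrite should count as invalid. The one Email::Valid feature it leans on beyond address() is details(), which reports which check failed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::Valid;

# Hypothetical wrapper (not part of Email::Valid or Dada Mail):
# returns ($status, \%errors), where $status is 1 or 0.
sub check_address_form {
    my ($address) = @_;
    my %errors;

    # In scalar context, address() returns the (possibly modified)
    # address as a string, or undef if a check failed.
    my $cleaned = Email::Valid->address($address);

    if ( !defined $cleaned ) {
        # details() names the failed check, e.g. 'rfc822' or 'fqdn'.
        $errors{invalid_form} = Email::Valid->details // 'unknown';
    }
    elsif ( $cleaned ne $address ) {
        # Assumed policy: an address that only passes after being
        # rewritten is treated as invalid. Keep the rewritten form
        # so the caller can show what changed.
        $errors{modified_by_validator} = $cleaned;
    }

    my $status = %errors ? 0 : 1;
    return ( $status, \%errors );
}

for my $address ( 'someone@example.com', 'someone@example .com', 'blech' ) {
    my ( $status, $errors ) = check_address_form($address);
    if ( $status == 1 ) {
        printf "%s: good to go\n", $address;
    }
    else {
        printf "%s: PROBLEMS\n", $address;
        print "* $_ ($errors->{$_})\n" for sort keys %$errors;
    }
}
```

Comparing the returned string against the input is what turns the "possibly modified" return value back into a plain yes or no, without having to know which characters Mail::Address decides to drop.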
So, for example, say I'm importing a whole address book of addresses - thousands of addresses! And before I do that, I sort out the duplicate addresses, since the validation steps are many, and CPU-intensive. Using
Email::Valid the correct way could introduce subtle bugs into the system. Simplifying things:
#!/usr/bin/perl
use Email::Valid;
# Say, I've already sorted out dupes:
my @check_these = (
    'one@foobar.com',
    'another@foobar.com',
    'another@foobar .com',
    'two@foobar.com',
    'blech'
);
my @validated = ();
my @invalid = ();
for (@check_these) {
    my $address;
    if ($address = Email::Valid->address($_)) {
        push(@validated, $address);
    }
    else {
        push(@invalid, $_);
    }
}
print "Good! Addresses:\n";
print "* $_\n" for @validated;
print "BAD! Addresses:\n";
print "* $_\n" for @invalid;
This prints out,
Good! Addresses:
* one@foobar.com
* another@foobar.com
* another@foobar.com
* two@foobar.com
BAD! Addresses:
* blech
As you can see, some weird things happen:
- The number of addresses total that are returned isn't the same as what gets passed
- Not all of the invalid addresses are reported
- My once unique list isn't so unique anymore.
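If the cleaned-up addresses are kept rather than rejected, one way to stop these surprises from going unnoticed is to dedupe a second time on what Email::Valid returns, and to record every rewrite. This is a rough sketch under that assumption; sort_addresses and the bucket names in the returned hashref are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::Valid;

# Hypothetical helper: validate a list while tracking what
# Email::Valid changed, so nothing silently vanishes or doubles up.
sub sort_addresses {
    my @candidates = @_;

    my %seen;          # cleaned addresses already accepted
    my @validated;     # unique, cleaned addresses
    my @invalid;       # originals that failed validation outright
    my @modified;      # [ original, cleaned ] pairs that were rewritten
    my @duplicates;    # originals that collapsed into an accepted address

    for my $original (@candidates) {
        my $cleaned = Email::Valid->address($original);

        if ( !defined $cleaned ) {
            push @invalid, $original;
            next;
        }

        # Remember every rewrite so it can be reported later.
        push @modified, [ $original, $cleaned ] if $cleaned ne $original;

        # Dedupe on the returned address, not on the input.
        if ( $seen{$cleaned}++ ) {
            push @duplicates, $original;
            next;
        }
        push @validated, $cleaned;
    }

    return {
        validated  => \@validated,
        invalid    => \@invalid,
        modified   => \@modified,
        duplicates => \@duplicates,
    };
}

my $report = sort_addresses(
    'one@foobar.com', 'another@foobar.com', 'another@foobar .com',
    'two@foobar.com', 'blech',
);

print "Good! Addresses:\n";
print "* $_\n" for @{ $report->{validated} };
print "BAD! Addresses:\n";
print "* $_\n" for @{ $report->{invalid} };
print "Rewritten by the validator:\n";
print "* '$_->[0]' became '$_->[1]'\n" for @{ $report->{modified} };
print "Dropped as duplicates after rewriting:\n";
print "* $_\n" for @{ $report->{duplicates} };
```

Every input lands in exactly one of validated, invalid, or duplicates, so the totals always add up to what was passed in. The comparison is an exact string match; a real importer would probably also want to normalize case in the domain part before deduping.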
I'm not sure yet what the best thing to do is. For now, sadly, I have to do something hacky in my own email (in form) validator - I just look for whitespace, and if there's any, the address is automatically invalid:
#!/usr/bin/perl
use Email::Valid;
my $address = 'someone@example. com'; # notice the space!
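if (is_valid_email($address)) {
    print "Valid!";
}
else {
    print "Nope!";
}
sub is_valid_email {
    # Not named $a: $a and $b are reserved for sort().
    my $address = shift;
    # Any whitespace at all means the address is invalid in form.
    return 0 if $address =~ m/\s/;
    # address() returns a string or undef; turn that into a real boolean.
    return defined( Email::Valid->address($address) ) ? 1 : 0;
}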
# prints, Nope!
So many miscalculations on my part, which means there's much to learn from my own dumb self:
Firstly, I couldn't believe it was a bug, since the way things have worked hadn't been changed in eons.
Next: way back when I first put this validation stuff in the app, years ago, I was thinking that
Email::Valid was nothing but a nice wrapper around a very, very complex regex created by
Jeffrey Friedl back in the day for a book on Regular Expressions that he wrote.
It looks a little like this. It's a doozy.
Another miscalculation I had made was,
"I have tests! How could this fail!" and I made the assumption that the user was in fact in the wrong. So, I looked at my tests, and whatayaknow:
TODO: {
    local $TODO = 'This test fails.';
    ok(valid_email_sub('test@example .com') == 0);
};
So I actually knew about this problem before, and forgot - there's actually a link to a bug tracker in the actual test, which means I had handed off the problem to someone else to fix.
Lazy!
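With a whitespace check sitting in front of Email::Valid, that TODO test can become an ordinary regression test. Here is a sketch using Test::More; the valid_email_sub defined here is only a compact stand-in for the app's real sub, and the list of cases is illustrative rather than exhaustive.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use Email::Valid;

# Stand-in for the app's validator: reject any whitespace up front,
# then let Email::Valid judge the rest. Returns 1 or 0.
sub valid_email_sub {
    my ($address) = @_;
    return 0 if !defined $address || $address =~ /\s/;
    return defined( Email::Valid->address($address) ) ? 1 : 0;
}

# Each case: [ address, expected result, description ]
my @cases = (
    [ 'test@example.com',  1, 'plain address is accepted' ],
    [ 'test@example .com', 0, 'space in the domain is rejected' ],
    [ 'test@example. com', 0, 'space after the dot is rejected' ],
    [ 'te st@example.com', 0, 'space in the local part is rejected' ],
    [ 'blech',             0, 'string with no @ is rejected' ],
);

for my $case (@cases) {
    my ( $address, $expected, $description ) = @$case;
    is( valid_email_sub($address), $expected, $description );
}

done_testing();
```

Keeping the cases in a table makes it cheap to add the next odd address a user reports, instead of parking it in a TODO block.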
Things to learn:
- Read the docs! This very peculiarity is documented in Email::Valid's own docs. Don't assume!
Let's see an example of how the address may be modified:
$addr = Email::Valid->address('Alfred Neuman <Neuman @ foo.bar>');
print "$addr\n"; # prints Neuman@foo.bar
- Just because you think everything's working, doesn't mean there aren't weird little edge cases - it's probably a given.
- Tests are good! Skipping tests that don't pass doesn't always help you.
- Reporting bugs in other modules is good! Trying to fix them? Probably better.
My thanks to
rjbs for being patient as I stumble around a module he has been nice enough to adopt, warts and all.