May 24, 2013

List Hygiene 101: More Than You Think

By James Carner, founder, eHygienics.com

What goes into list hygiene besides removing spamtraps, bounces, complainers and litigators? A lot more than you might think. Many publishers purchase, lease or trade email lists, and each week my company scrubs anywhere from 5 to 10 GB of new data from around the world: roughly 200 million email addresses every seven days, if not more. With this much data changing hands, it's imperative to keep it clean.

Have you ever scrolled through 1 million email addresses, line after line? If the list isn't sorted properly, it can look very intimidating. Sorting it alphabetically makes it much easier to manage if you are removing bad addresses by hand.

If you alphabetize the list and start from the top, you will see that the first entries are mostly alphanumeric addresses, many of them beginning with digits. Most of these look suspicious, so it's probably a good idea to remove them. However, once you get down to "aaa@…" and the rest of the good-looking addresses, spotting and removing suspicious ones by eye becomes impossible.
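As a minimal sketch of that first pass (the function name `review_order` is hypothetical), the snippet below trims and sorts a list case-insensitively, then splits off the addresses that begin with a digit. Python compares strings by code point, so digits land ahead of letters at the top of the sorted list, which is where the suspicious alphanumeric entries collect.

```python
def review_order(addresses: list[str]) -> tuple[list[str], list[str]]:
    """Sort addresses alphabetically and split off the ones that begin with a digit."""
    # Trim whitespace, skip blank lines, and sort case-insensitively.
    ordered = sorted((a.strip() for a in addresses if a.strip()), key=str.lower)
    # Digits ("0"-"9") have lower code points than letters, so they sort first.
    digit_led = [a for a in ordered if a[:1].isdigit()]
    rest = [a for a in ordered if not a[:1].isdigit()]
    return digit_led, rest
```

The digit-led group is small enough to review or drop in bulk; the rest still needs the rule-based checks described below.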

Whether you work with a list hygiene company or plan to do it yourself, make sure you address the following issues:

Syntax errors
When one thinks of syntax errors, the first thing that comes to mind is a weird-looking address with characters like "abc^@domain.com." These are really common and easy to remove, but syntax errors go beyond this type of issue. Opening and saving a file in different programs can also corrupt it. For example, a text file is commonly passed along for editing from TextEdit on a Mac to Microsoft Excel 2007 on a PC and then to Excel 2010. These programs read and write files differently and can easily corrupt a file before it gets scrubbed or uploaded to an ESP. The file may look normal when you open it in each program, but it's not.
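A syntax check can be sketched with a single regular expression. The pattern below is a hypothetical, conservative house rule, not the full standard: RFC 5322 technically permits characters such as `^` in the local part, but many lists are scrubbed against a stricter rule like this one, which allows only letters, digits and `. _ % + -`, rejects empty dot-separated segments, and requires a dotted domain ending in an alphabetic top-level label.

```python
import re

# Hypothetical conservative rule, stricter than RFC 5322 on purpose.
LOCAL = r"[A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*"   # no leading, trailing or double dots
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"  # a domain label cannot start/end with "-"
EMAIL_RE = re.compile(rf"{LOCAL}@(?:{LABEL}\.)+[A-Za-z]{{2,}}")

def has_valid_syntax(address: str) -> bool:
    """Return True if the whole address matches the conservative pattern."""
    return EMAIL_RE.fullmatch(address) is not None
```

Using `fullmatch` rather than `search` matters here: a partial match inside a longer, broken line would otherwise pass.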

Departmentalized emails
These are your typical "info@…" or "webmaster@…" addresses. They are simple to remove too, but there are many more departmentalized addresses out there than you might think, and new ones are invented daily because of the threat of dictionary attacks. Spamtraps hide in these more often than you'd expect, especially in "info@…" and "webmaster@…". These are commonly removed, and spam advisories realize this, so they are now condensing or rephrasing the words: instead of "webmaster@…", they now use "wbmr@…" and similar variants.
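One hypothetical way to catch condensed variants like "wbmr@…" is to treat a local part as departmental if it either equals a known role word or looks like an abbreviation of one: same first and last letter, at least four characters, with every letter drawn in order from the role word. The `ROLE_WORDS` set and the length threshold are assumptions for illustration; a heuristic like this will produce false positives, so its hits are better treated as review candidates than automatic removals.

```python
# Hypothetical starter list; a production list would be far longer and refreshed often.
ROLE_WORDS = {"info", "webmaster", "admin", "sales", "support", "postmaster", "abuse"}

def is_subsequence(short: str, long: str) -> bool:
    """True if the characters of short appear in long, in the same order."""
    chars = iter(long)
    return all(ch in chars for ch in short)  # each "in" test advances the iterator

def looks_departmental(address: str, min_len: int = 4) -> bool:
    local = address.split("@", 1)[0].lower()
    for word in ROLE_WORDS:
        if local == word:
            return True
        # Condensed variant such as "wbmr" for "webmaster": same first and last
        # letter, at least min_len characters, letters drawn in order from the word.
        if (len(local) >= min_len and local[0] == word[0] and local[-1] == word[-1]
                and is_subsequence(local, word)):
            return True
    return False
```

Requiring the last letter to match keeps short first names such as "wes" from being read as an abbreviation of "webmaster".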

Seed removal
You can easily spot seeds at the beginning of an alphabetized list. They look something like "4dr1p2v3m3t2d@domain.com." Most are computer-generated to track, for whatever reason, when someone sends mail to them. The hard ones to spot and remove have names attached, such as "moonstar92n3g6a0b@domain.com." These are tough to remove unless you know what you are doing. You need a smart algorithm with conditions and rules written as regular expressions that can pinpoint these addresses.
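A single rule of that kind can be sketched as follows. It is one hypothetical condition, not a complete seed detector: split the local part into maximal runs of letters or digits and flag it when there are more than four runs. Both examples above alternate far more often than a real name followed by a birth year or counter would.

```python
import re

def looks_like_seed(address: str, max_runs: int = 4) -> bool:
    """Flag local parts that alternate between letters and digits too often."""
    local = address.split("@", 1)[0].lower()
    # Maximal runs of letters or digits; separators such as "." are ignored.
    runs = re.findall(r"[a-z]+|[0-9]+", local)
    return len(runs) > max_runs
```

"4dr1p2v3m3t2d" splits into 12 runs and "moonstar92n3g6a0b" into 9, while "jamescarner1" has only 2. The threshold of four is an assumption to tune against real data.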

Operating-system errors
Transferring files back and forth between operating systems (Mac/Windows/Linux/Unix/IBM) can easily corrupt them; in fact, it's quite common. A text file can look normal and uncorrupted while its underlying bytes carry problems, for example mismatched line endings, a byte-order mark or a different character encoding, which can render the file useless or halt a scrubbing program.
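A defensive loader can neutralize the most common of these byte-level problems before scrubbing starts. The sketch below is an assumption about what "defensive" means, not a description of any particular product: it decodes UTF-16 files that carry a byte-order mark, drops a UTF-8 byte-order mark, falls back to Latin-1 (which maps every byte to some character) when the data isn't valid UTF-8, and treats Windows, Unix and classic Mac line endings alike.

```python
import codecs

def decode_list_file(data: bytes) -> list[str]:
    """Decode raw file bytes into clean, non-empty lines."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Some spreadsheet "Unicode text" exports are UTF-16 with a BOM.
        text = data.decode("utf-16")
    else:
        try:
            # "utf-8-sig" silently removes a leading UTF-8 BOM if one is present.
            text = data.decode("utf-8-sig")
        except UnicodeDecodeError:
            # Assumed fallback: Latin-1 never fails, though accents may need review.
            text = data.decode("latin-1")
    # splitlines() handles \r\n (Windows), \n (Unix/Linux) and \r (classic Mac).
    # Stray NUL bytes (e.g. from UTF-16 saved without a BOM) are dropped.
    lines = (line.replace("\x00", "").strip() for line in text.splitlines())
    return [line for line in lines if line]
```

Reading the file in binary mode (`open(path, "rb")`) and passing the bytes in keeps the operating system's own newline translation out of the picture.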

Fake emails
When filling out forms, consumers will either give you their real email or a fake one. Fake emails can look like seeds, scribble or a real address. Differentiating between these can be tough unless you understand verification and mail exchange (MX) record checking.

Temporary, disposable or time bomb emails
If consumers do not want to reveal their identity, they can create temporary addresses that expire through a handful of services. GuerrillaMail, 10 Minute Mail, Mailinator, YOPmail and thousands of other free programs give consumers anonymity. The list of these domains needs to be updated weekly so the addresses can be removed.
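A minimal sketch of that weekly-refreshed suppression, assuming the domain list arrives as plain text with one domain per line and `#` comments (a hypothetical format): load it into a set, then check each address's domain and every parent domain, so subdomains of a disposable service are caught too. The four sample domains are an assumed starting point only.

```python
from collections.abc import Container, Iterable

# Assumed sample entries; the real list would be reloaded weekly.
DISPOSABLE_DOMAINS = {"guerrillamail.com", "10minutemail.com", "mailinator.com", "yopmail.com"}

def load_domains(lines: Iterable[str]) -> set[str]:
    """Parse one domain per line, ignoring blank lines and # comments."""
    return {ln.strip().lower() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")}

def is_disposable(address: str, blocked: Container[str] = DISPOSABLE_DOMAINS) -> bool:
    domain = address.rsplit("@", 1)[-1].lower().rstrip(".")
    labels = domain.split(".")
    # Check the domain and each parent, so "sub.mailinator.com" also matches.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

Matching whole labels instead of using a substring test avoids flagging an unrelated domain such as "notmailinator.com".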

Mail exchange verification (MX scrubbing)
Just because an email address has a working domain doesn't mean it's valid. By scrubbing against the domain's mail exchange records, you can find out a lot about an address and whether it's OK to mail to without consequence. A good hygiene company should check MX records for a working domain, verify reverse DNS (RDNS), perform open relay checks, judge response-time performance, confirm that the domain serves a working HTML/XHTML website, and check whether the domain is parked or returns other warnings. MX scrubbing also does a good job of removing hard bounces.
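The first of those checks, deciding whether a domain can receive mail at all, can be sketched as a pure function over DNS results. To keep the example self-contained, the lookups themselves are left out: MX records are passed in as `(preference, exchange)` pairs, which you might obtain from a DNS library such as dnspython's resolver. The function follows two published rules: lower preference values are tried first and a domain with no MX record falls back to its own address record (RFC 5321), while a "null MX" whose exchange is "." means the domain accepts no mail (RFC 7505). RDNS, open-relay, response-time and website checks are outside this sketch.

```python
from typing import NamedTuple

class MXRecord(NamedTuple):
    preference: int
    exchange: str

def mail_hosts(domain: str, mx_records: list[MXRecord], has_address_record: bool) -> list[str]:
    """Return the hosts a sender would try, in order; an empty list means undeliverable."""
    # RFC 7505 null MX: an exchange of "." declares that the domain takes no mail.
    if any(r.exchange.rstrip(".") == "" for r in mx_records):
        return []
    if mx_records:
        # Lower preference values are tried first (RFC 5321, section 5.1).
        return [r.exchange.rstrip(".").lower() for r in sorted(mx_records)]
    # No MX at all: fall back to the domain's own A/AAAA record, if it has one.
    return [domain.lower()] if has_address_record else []
```

An empty result is a strong signal for removal before mailing, since any message to that domain would hard-bounce.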

Spam advisory IP blocking
Between them, the hundreds of spam advisories currently swallowing spam own approximately 1 million IP addresses.
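The article doesn't spell out how the blocking is done. One assumed approach is to resolve each domain's mail hosts to IP addresses and check them against a blocklist of CIDR ranges. At the scale of a million addresses, a linear scan per lookup is slow, so the hypothetical `IPBlocklist` below merges the ranges with the standard `ipaddress` module and answers each lookup with a binary search. The ranges in the usage test are reserved documentation networks, placeholders rather than real advisory data.

```python
import bisect
import ipaddress

class IPBlocklist:
    """IPv4 blocklist built from CIDR strings, with O(log n) membership tests."""

    def __init__(self, cidrs: list[str]) -> None:
        # Merge overlapping or adjacent networks so the ranges don't overlap.
        nets = ipaddress.collapse_addresses(ipaddress.IPv4Network(c) for c in cidrs)
        ranges = sorted((int(n.network_address), int(n.broadcast_address)) for n in nets)
        self._starts = [start for start, _ in ranges]
        self._ends = [end for _, end in ranges]

    def __contains__(self, ip: str) -> bool:
        value = int(ipaddress.IPv4Address(ip))
        # Locate the last range that starts at or before this address.
        i = bisect.bisect_right(self._starts, value) - 1
        return i >= 0 and value <= self._ends[i]
```

Usage is a plain membership test, for example `"192.0.2.55" in blocklist`.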

Miscellaneous
Empty spaces, including leading and trailing spaces before and after addresses, are common and should be fixed. Addresses longer than 45 characters are suspicious. Domains that start with numbers, and addresses made up of numbers only, are questionable. Duplicates should be removed. High-risk endings like .gov should be tossed. And finally, foreign addresses inside a U.S. list should be removed.
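Most of these rules fit into one short cleaning pass. The sketch below uses the 45-character threshold and the .gov ending from the article; the `RISKY_TLDS` set is a hypothetical starting point, and lowercasing whole addresses is an assumption (most mailbox providers treat local parts case-insensitively, though the standard doesn't require it). The foreign-address rule is left out because it depends on how a given list defines "foreign".

```python
RISKY_TLDS = {"gov"}  # hypothetical starter set; the article names .gov
MAX_LENGTH = 45       # length threshold suggested in the article

def misc_problems(address: str) -> list[str]:
    """Return the reasons an already trimmed address looks questionable."""
    local, _, domain = address.rpartition("@")
    reasons = []
    if len(address) > MAX_LENGTH:
        reasons.append("too long")
    if domain[:1].isdigit():
        reasons.append("domain starts with a digit")
    if local.isdigit():
        reasons.append("numbers only")
    if domain.rsplit(".", 1)[-1] in RISKY_TLDS:
        reasons.append("risky ending")
    return reasons

def clean(raw_lines: list[str]) -> list[str]:
    """Trim, lowercase and de-duplicate, then drop addresses that break a rule."""
    seen: set[str] = set()
    kept = []
    for line in raw_lines:
        address = line.strip().lower()
        if not address or address in seen:
            continue
        seen.add(address)
        if not misc_problems(address):
            kept.append(address)
    return kept
```

Returning reasons instead of a bare true/false makes it easy to log why each address was dropped.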

All of the above processes should be mandatory and should include a string-finding algorithm that adds suspicious strings to a suppression list. Such strings often turn up in traps and seeds, because some bots use the same alphanumeric characters to track emails. For example, "3e4r5t6y@…" and "3e4r5t6z@…" are identical except for the last character, so the shared string "3e4r5t6" should be pulled and added to all suppression lists. You will need smart technology to differentiate between names and gibberish: if the system finds common addresses like "jamescarner1" and "jamescarner2," it should not add the string "jamescarner."
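A simple version of that string finder can be sketched in a few lines, with the caveat that it only catches strings shared as a prefix. Sorting the local parts puts ones with a common prefix next to each other; each adjacent pair's shared prefix is kept only if it is long enough and passes a hypothetical gibberish test (more than four alternating letter and digit runs), which is what keeps a name like "jamescarner" off the list. The minimum length of six characters is also an assumption.

```python
import os
import re

def is_gibberish(s: str, max_runs: int = 4) -> bool:
    """Hypothetical rule: many letter/digit runs suggests machine-generated text."""
    return len(re.findall(r"[a-z]+|[0-9]+", s)) > max_runs

def find_suppression_strings(addresses: list[str], min_len: int = 6) -> set[str]:
    """Collect long gibberish prefixes shared by neighbouring local parts."""
    local_parts = sorted({a.split("@", 1)[0].lower() for a in addresses})
    found = set()
    # After sorting, local parts that share a long prefix sit side by side.
    for a, b in zip(local_parts, local_parts[1:]):
        prefix = os.path.commonprefix([a, b])  # character-by-character prefix
        if len(prefix) >= min_len and is_gibberish(prefix):
            found.add(prefix)
    return found

def is_suppressed(address: str, strings: set[str]) -> bool:
    """True if the address's local part contains any suppressed string."""
    local = address.split("@", 1)[0].lower()
    return any(s in local for s in strings)
```

Once "3e4r5t6" is on the list, a future "3e4r5t6q@…" is suppressed even though that exact address has never been seen.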

Any list, whether it's double opt-in or harvested straight from the Web, is going to have all of the aforementioned problems. Ten years ago, you could get away with mailing without cleaning your lists, but nowadays you really need to get it done professionally. If you're a publisher and do it yourself, keep in mind that you don't personally eat, breathe and sleep list hygiene; your own hygiene will have limitations compared with a company that does it 24/7. At the very least, make sure you address the issues above so your lists carry as few threats as possible.


James Carner founded eHygienics.com (formerly Quickie Marketing) in 2003 and created The Viral Spiral, an Internet marketing newsletter for affiliates and publishers. After mailing offers for a few years, eHygienics.com evolved from a publisher into a professional email list hygiene company. James went on to investigate spam advisory organizations worldwide, studying and interpreting the black-hole trap servers used to capture spam. That knowledge now feeds eHygienics.com's self-scrubbing, real-time API platform, which its subscribers use.

eHygienics.com removes threats from all types of email marketing lists and caters to hundreds of list owners and publishers.
