http://en.wikipedia.org/wiki/Anti-spam_techniques
To prevent
email spam (aka unsolicited bulk email), both end users and
administrators of email systems use various
anti-spam techniques.
Some of these techniques have been embedded in products, services and
software to ease the burden on users and administrators. No one
technique is a complete solution to the spam problem, and each has
trade-offs between incorrectly rejecting legitimate email vs. not
rejecting all spam, and the associated costs in time and effort.
Anti-spam techniques can be broken into four broad categories: those
that require actions by individuals, those that can be automated by
email administrators, those that can be automated by email senders and
those employed by researchers and law enforcement officials.
Detecting spam
Checking words: false positives
People tend to be much less bothered by spam slipping through filters into their mailbox
(false negatives) than by having desired email ("ham") blocked (false positives).
Trying to balance false negatives (missed spams) vs false positives
(rejecting good email) is critical for a successful anti-spam system.
Some systems let individual users have some control over this balance by
setting "spam score" limits, etc. Most techniques have both kinds of
serious errors, to varying degrees. So, for example, anti-spam systems
may use techniques that have a high false negative rate (miss a lot of
spam), in order to reduce the number of false positives (rejecting good
email).
Detecting spam based on the content of the email, either by detecting
keywords such as "viagra" or by statistical means (content or
non-content based), is very popular. Content based statistical means or
detecting keywords can be very accurate when they are correctly tuned to
the types of legitimate email that an individual gets, but they can
also make mistakes such as detecting the keyword "
cialis" in the word "specialist" (see also
Internet censorship: Over- and under-blocking).
The content also doesn't determine whether the email was either
unsolicited or bulk, the two key features of spam. So, if a friend sends
you a joke that mentions "viagra", content filters can easily mark it
as being spam even though it is neither unsolicited nor sent in bulk.
Non-content-based statistical methods can help lower false positives
because they judge a message on statistical properties rather than
blocking on content or keywords. You would therefore still receive the
joke from a friend that mentions "viagra".
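The "cialis" in "specialist" mistake can be illustrated with a short sketch. The function names are hypothetical, and real filters use far more than a single keyword; the point is only that a substring test and a whole-word test disagree on this input.

```python
import re

def contains_substring(text, keyword):
    """Naive check: matches the keyword anywhere, even inside another word."""
    return keyword.lower() in text.lower()

def contains_word(text, keyword):
    """Stricter check: matches the keyword only as a whole word."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

message = "Please see a specialist about this."
print(contains_substring(message, "cialis"))  # True: a false positive
print(contains_word(message, "cialis"))       # False
```

Whole-word matching removes this particular false positive, but it still cannot tell whether a message was unsolicited or sent in bulk.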
Lists of sites
The most popular
DNSBLs
(DNS Blacklists) are lists of IP addresses of known spammers, known
open relays, known proxy servers, compromised “zombie” spammers, as well
as hosts on the internet that shouldn’t be sending external emails,
such as the end-user address space of a consumer ISP. These are known as
“Dial Up Lists”, from the time when end users had to dial up to the
internet with a modem and a phone line.
Spamtraps
are often email addresses that were never valid or have been invalid
for a long time that are used to collect spam. An effective spamtrap is
not announced and is only found by
dictionary attacks
or by pulling addresses off hidden webpages. For a spamtrap to remain
effective the address must never be given to anyone. Some black lists,
such as
SpamCop, use spamtraps to catch spammers and blacklist them.
Enforcing technical requirements of the
Simple Mail Transfer Protocol (SMTP) can be used to block mail coming from systems that are not compliant with the
RFC standards.
A lot of spammers use poorly written software or are unable to comply
with the standards because they do not have legitimate control of the
computer sending spam (
zombie computer). So by setting restrictions on the
mail transfer agent
(MTA) a mail administrator can reduce spam significantly, such as by
enforcing the correct fall back of Mail eXchange (MX) records in the
Domain Name System, or the correct handling of delays (
Teergrube).
End-user techniques
There are a number of techniques that individuals can use to restrict
the availability of their email addresses, reducing the chance that
the addresses attract spam.
Discretion
Sharing an email address only among a limited group of correspondents
is one way to limit spam. This method relies on the discretion of all
members of the group, as disclosing email addresses outside the group
circumvents the trust relationship of the group. For this reason,
forwarding messages to recipients who don't know one another should be
avoided. When it is absolutely necessary to forward messages to
recipients who don't know one another, it is good practice to list all
the recipient names after "bcc:" instead of after "to:". This practice
avoids the scenario where unscrupulous recipients might compile a list
of email addresses for spamming purposes. This practice also reduces the
risk of the address being distributed by computers infected with email
address harvesting malware. However, once the privacy of the email
address is lost by divulgence, it cannot be regained.
Address munging
Posting anonymously, or with a fake name and address, is one way to avoid
email address harvesting,
but users should ensure that the fake address is not valid. Users who
want to receive legitimate email regarding their posts or Web sites can
alter their addresses in a way that humans can undo but harvesting
software cannot. For instance,
joe@example.com might post as
joeNOS@PAM.invalid.example.com.
Address munging, however, can cause legitimate replies to be lost. If
it's not the user's valid address, it has to be truly invalid, otherwise
someone or some server will still get the spam for it.
[1] Other ways use
transparent address munging
to avoid this by allowing users to see the actual address but obfuscate
it from automated email harvesters with methods such as displaying all
or part of the email address on a web page as an image, a text logo
shrunken to normal size using in-line
CSS, or as jumbled text with the order of characters restored using CSS.
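The "jumbled text restored using CSS" idea can be sketched as follows. This is one possible variant, assuming the page stores the address reversed and uses the CSS properties `unicode-bidi` and `direction` to display it in the right order; the function name is hypothetical.

```python
import html

def munge_address(address):
    """Return an HTML span whose source holds the address reversed.

    The inline CSS asks the browser to render the characters right to left,
    so a human reads the original address while a harvester that scans the
    page source sees only the reversed string.
    """
    reversed_text = html.escape(address[::-1])
    style = "unicode-bidi: bidi-override; direction: rtl;"
    return f'<span style="{style}">{reversed_text}</span>'

print(munge_address("joe@example.com"))
# <span style="unicode-bidi: bidi-override; direction: rtl;">moc.elpmaxe@eoj</span>
```

A likely trade-off is that copying the displayed text may yield the reversed string, so readers might have to retype the address.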
Avoid responding to spam
Spammers often regard responses to their messages—even responses like
"Don't spam me"—as confirmation that an email address is valid.
Likewise, many spam messages contain Web links or addresses which the
user is directed to follow to be removed from the spammer's mailing
list. In several cases, spam-fighters have tested these links,
confirming that they do not lead to the recipient address's removal; if
anything, they lead to more spam. At best, a removal request or a
complaint may get the address "list washed": the spammer drops the
addresses of people who complain in order to lower the complaint rate
and stay active longer before having to acquire new accounts or a new
internet provider.
Sender addresses are often forged in spam messages, including using
the recipient's own address as the forged sender address, so that
responding to spam may result in failed deliveries or may reach innocent
email users whose addresses have been abused.
In
Usenet,
it is widely considered even more important to avoid responding to
spam. Many ISPs have software that seeks and destroys duplicate messages.
Someone may see a spam and respond to it before it is cancelled by their
server, which can have the effect of reposting the spam for them; since
it is not a duplicate, the reposted copy will last longer. Replying may
also cause the poster to be falsely linked to the spam
message.
Contact forms
Contact forms allow users to send email by filling out forms in a web
browser. The web server takes the form data, forwarding it to an email
address. Users never see the email address. Such forms, however, are
sometimes inconvenient to users, as they are not able to use their
preferred email client, risk entering a faulty reply address, and are
typically not notified about delivery problems. Further, contact forms
have the drawback that they require a website that supports server side
scripts. If the software used to run the contact forms is badly
designed, it can become a spam tool in its own right. In addition,
some spammers have begun to send spam using the contact form.
[citation needed]
Disable HTML in email
Many modern mail programs incorporate
Web browser functionality, such as the display of
HTML, URLs, and images. This can easily expose the user to offensive images in spam. In addition, spam written in HTML can contain
web bugs, which allow spammers to see that the email address is valid and that the message has not been caught in spam filters.
JavaScript
programs can be used to direct the user's Web browser to an advertised
page, or to make the spam message difficult to close or delete. Spam
messages have contained attacks upon security vulnerabilities in the
HTML renderer, using these holes to install
spyware. (Some
computer viruses are borne by the same mechanisms.)
Mail clients that do not automatically download and display HTML,
images or attachments have fewer risks, as do clients that have been
configured not to display these by default.
Disposable email addresses
An email user may sometimes need to give an address to a site without
complete assurance that the site owner will not use it for sending
spam. One way to mitigate the risk is to provide a
disposable
email address: a temporary address that forwards email to a real
account and that the user can disable or abandon. A number of services provide
disposable address forwarding. Addresses can be manually disabled, can
expire after a given time interval, or can expire after a certain number
of messages have been forwarded. Disposable email addresses can be used
by users to track whether a site owner has disclosed an address. This
capability has resulted in legal jeopardy for sites that disclose
confidential addresses without permission.
[2]
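The three ways a disposable address can stop working (manual disabling, a time limit, a message limit) can be sketched as a small data structure. The class and field names are hypothetical, and no particular forwarding service is being modeled.

```python
from dataclasses import dataclass

@dataclass
class DisposableAddress:
    alias: str             # the address handed out to a site
    forward_to: str        # the real account behind it
    expires_at: float      # timestamp after which the alias is dead
    max_messages: int      # how many messages may be forwarded in total
    forwarded: int = 0
    disabled: bool = False

    def accept(self, now):
        """Return the forwarding target, or None if the alias is inactive."""
        expired = now >= self.expires_at
        exhausted = self.forwarded >= self.max_messages
        if self.disabled or expired or exhausted:
            return None
        self.forwarded += 1
        return self.forward_to
```

Giving each site its own alias is also what makes it possible to tell which site disclosed an address: spam arriving at an alias points back to the one site that was given it.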
Ham passwords
Systems that use ham passwords ask unrecognised senders to include in
their email a password that demonstrates that the email message is a
"ham" (not spam) message. Typically the email address and ham password
would be described on a web page, and the ham password would be included
in the "subject" line of an email message. Ham passwords are often
combined with filtering systems, to counter the risk that a filtering
system will accidentally identify a ham message as a spam message.
[3]
The "plus addressing" technique appends a password to the "username" part of the email address.
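A minimal sketch of how a receiving system might check a ham password, either in the plus-addressed part of the recipient address or in the subject line. The function names and the example password are hypothetical.

```python
def extract_plus_tag(address):
    """Split 'user+tag@domain' into (user, tag, domain); tag is None if absent."""
    local, _, domain = address.rpartition("@")
    user, plus, tag = local.partition("+")
    return user, (tag if plus else None), domain

def carries_ham_password(recipient, subject, password):
    """True if the password is the plus tag or appears in the subject line."""
    _, tag, _ = extract_plus_tag(recipient)
    return tag == password or password in subject

print(extract_plus_tag("joe+s3cret@example.com"))  # ('joe', 's3cret', 'example.com')
```

Messages that carry the password could then bypass the content filter, while all other mail is filtered as usual.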
Reporting spam
Tracking down a spammer's ISP and reporting the offense can lead to
the spammer's service being terminated. Unfortunately, it can be
difficult to track down the spammer—and while there are some online
tools to assist, they are not always accurate. Occasionally, spammers
employ their own netblocks. In this case, the abuse contact for the
netblock can be the spammer itself, and a complaint merely confirms your address.
Examples of these online tools are
SpamCop and
Network Abuse Clearinghouse.
They provide automated or semi-automated means to report spam to ISPs.
Some spam-fighters regard them as inaccurate compared to what an expert
in the email system can do; however, most email users are not experts.
A free tool called Complainterator may be used in the reporting of
spam. The Complainterator will send an automatically generated complaint
to the registrar of the spamming domain and the registrar of its name
servers.
Historically, reporting spam in this way has not seriously abated
spam, since the spammers simply move their operation to another URL, ISP
or network of IP addresses.
Consumers may also forward "unwanted or deceptive spam" to an email address (
spam@uce.gov) maintained by the FTC. The database collected is used to prosecute perpetrators of scam or deceptive advertising.
An alternative to contacting ISPs is to contact the registrar of a
domain name that has been used in spam email. Registrars, as ICANN-accredited
administrative organizations, are obliged to uphold certain rules and
regulations, and have the resources necessary for dealing with abuse
complaints.
Responding to spam
Some advocate responding aggressively to spam—in other words, "spamming the spammer".
The basic idea is to make spamming less attractive to the spammer, by
increasing the spammer's overhead. There are several ways to reach a
spammer, but in addition to the
caveats mentioned above, each of them may lead to retaliation by the spammer.
- Replying directly to the spammer's email address[4]
  Just clicking "reply" will not work in the vast majority of cases,
  since most of the sender addresses are forged or made up. In some cases,
  however, spammers do provide valid addresses, as in the case of Nigerian scams.[5]
- Targeting the computers used to send out spam
  In 2005, IBM announced a service to bounce spam directly to the computers that send out spam.[6]
  Because the IP addresses are identified in the headers of every
  message, it would be possible to target those computers directly,
  sidestepping the problem of forged email addresses. In most cases,
  however, those computers do not belong to the real spammer, but to
  unsuspecting users with unsecured or outdated systems, hijacked through
  malware and controlled remotely by the spammer; these are known as zombie computers.
  However, in most legal jurisdictions, ignorance is no defense, and many
  victims of spam regard the owners of zombie computers as willfully
  compliant accomplices of spammers.
- Leaving messages on the spamvertised site
  Spammers selling their wares need a tangible point of contact so
  that customers can reach them. Sometimes it is a telephone number, but
  most often it is a web site containing web forms
  through which customers can fill out orders or inquiries, or even
  "unsubscribe" requests. Since positive response to spam is probably much
  less than 1/10,000,[original research?]
  if just a tiny percentage of users visit spam sites just to leave
  negative messages, the negative messages could easily outnumber positive
  ones, incurring costs for spammers to sort them out, not to mention the
  cost in bandwidth. An automated system, designed to respond in just
  such a way, was Blue Frog.
Unfortunately, in doing so, you risk arousing the ire of criminals who
may respond with threats or 'target' your address with even more spam.[citation needed]
Automated techniques for email administrators
There are a number of appliances, services, and software systems that
email administrators can use to reduce the load of spam on their
systems and mailboxes. Some of these depend upon rejecting email from
Internet sites known or likely to send spam. Other more advanced
techniques analyze message patterns in real time to detect spam-like
behavior and then compare it to global databases of spam. Those methods
are capable of detecting spam in real time even when there is no
content (common to image based spam) and in any language. Another method
relies on automatically analyzing the content of email messages and
weeding out those which resemble spam. These three approaches are
sometimes termed
blocking,
pattern detection, and
filtering.
There is an increasing trend of integration of anti-spam techniques into
MTAs
whereby the mail systems themselves also perform various measures that
are generally referred to as filtering, ultimately resulting in spam
messages being rejected before delivery (or
blocked).
Many filtering systems take advantage of
machine learning
techniques, which improve their accuracy over manual methods. However,
some people find filtering intrusive to privacy, and many email
administrators prefer blocking to deny access to their systems from
sites tolerant of spammers.
Authentication and reputation
A number of systems have been proposed to allow acceptance of email
from servers which have authenticated in some fashion as senders of only
legitimate email. Many of these systems use the DNS, as do
DNSBLs;
but rather than being used to list nonconformant sites, the DNS is used
to list sites authorized to send email, and (sometimes) to determine
the reputation of those sites. Other methods of identifying ham
(non-spam email) and spam are still used.
Authentication systems cannot detect whether a message is spam.
Rather, they allow a site to express trust that an authenticated site
will not send spam. Thus, a recipient site may choose to skip expensive
spam-filtering methods for messages from authenticated sites.
Challenge/response systems
Another method which may be used by internet service providers, by
specialized services or enterprises to combat spam is to require unknown
senders to pass various tests before their messages are delivered.
These strategies are termed
challenge/response systems or
C/R.
Some view their use as being as bad as spam since they place the burden
of spam fighting on legitimate email senders, who will often give up
at the slightest hindrance. A newer
implementation of this idea is
Channel email.
Checksum-based filtering
Checksum-based filters exploit the fact that spam messages are
sent in bulk, that is, that they will be identical apart from small variations.
Checksum-based filters strip out everything that might vary between
messages, reduce what remains to a
checksum,
and look that checksum up in a database which collects the checksums of
messages that email recipients consider to be spam (some people have a
button on their email client which they can click to nominate a message
as being spam); if the checksum is in the database, the message is
likely to be spam.
The advantage of this type of filtering is that it lets ordinary
users help identify spam, and not just administrators, thus vastly
increasing the pool of spam fighters. The disadvantage is that spammers
can insert unique invisible gibberish (known as
hashbusters) into the middle of each of their messages, so that each message is unique and has a different checksum. This leads to an
arms race between the developers of the checksum software and the developers of the spam-generating software.
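The strip, checksum and look-up steps can be sketched as below. This is a toy illustration rather than the algorithm of any real checksum service: the normalization rules, the use of SHA-256 and the in-memory `reported` set are all assumptions made for the example.

```python
import hashlib
import re

def normalize(body):
    """Strip parts that commonly vary between copies of a bulk message."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)        # drop digits such as tracking numbers
    text = re.sub(r"[^a-z]+", " ", text)   # collapse punctuation and whitespace
    return text.strip()

def checksum(body):
    """Reduce what remains after normalization to a fixed-size digest."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()

reported = set()  # stands in for the shared database of reported spam

def report_spam(body):
    reported.add(checksum(body))

def looks_like_spam(body):
    return checksum(body) in reported

report_spam("Buy now!!! Offer #12345")
print(looks_like_spam("buy NOW... offer #99"))          # True
print(looks_like_spam("Buy now xqzv offer #12345"))     # False: a hashbuster word
```

The last line shows why hashbusters work against an exact digest: a single inserted word changes the checksum, which is what pushes such systems toward more tolerant, fuzzy checksums.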
Checksum based filtering methods include:
Country-based filtering
Some email servers expect to never communicate with particular
countries from which they receive a great deal of spam. Therefore, they
use country-based filtering - a technique that blocks email from certain
countries. This technique is based on country of origin determined by
the sender's IP address rather than any trait of the sender.
DNS-based blacklists
DNS-based Blacklists, or
DNSBLs, are used for
heuristic filtering and blocking. A site publishes lists (typically of IP addresses) via the
DNS,
in such a way that mail servers can easily be set to reject mail from
those sources. There are literally scores of DNSBLs, each of which
reflects different policies: some list sites known to emit spam; others
list
open mail relays or proxies; others list ISPs known to support spam.
Other DNS-based anti-spam systems list known good ("white") or bad ("black") IPs, domains or URLs, including RHSBLs and URIBLs.
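A DNSBL lookup for an IPv4 address is conventionally an ordinary DNS query for a name built from the reversed octets of the address followed by the list's zone. The sketch below uses only the Python standard library; the zone `dnsbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """True if the name resolves, which by convention means the IP is listed.

    Any resolution failure is treated as "not listed" here, so a network
    outage also yields False; a real deployment would distinguish the two.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

Because the check is a plain DNS query, a mail server can run it during the SMTP conversation and reject the connection before any message data is transferred.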
Enforcing RFC standards
Analysis of an email's conformance to RFC standards for the
Simple Mail Transfer Protocol
(SMTP) can be used to judge the likelihood of the message being spam. A
lot of spammers use poorly written software or are unable to comply
with the standards because they do not have legitimate control of the
computer they are using to send spam (
zombie computer). By setting limits on the deviation from RFC standards that the
MTA will accept, a mail administrator can reduce spam significantly.
Greeting delay
A greeting delay is a deliberate pause introduced by an SMTP server
before it sends the SMTP greeting banner to the client. The client is
required to wait until it has received this banner before it sends any
data to the server (per
RFC 5321
section 3.2). Many spam-sending applications do not wait to receive this
banner, and instead start sending data as soon as the TCP connection is
established. The server can detect this, and drop the connection.
There are some legitimate sites that play "fast and loose" with the
SMTP specifications, and may be caught by this mechanism. It also has a
tendency to interact badly with sites that perform
callback verification, as common callback verification systems have timeouts that are much shorter than those mandated by
RFC 5321 4.5.3.2.
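A greeting delay can be sketched with the standard `select` and `socket` modules: the server waits for the delay period and checks whether the client has already sent something. The function names, the default delay and the host name in the banner are assumptions for illustration.

```python
import select
import socket

def client_talks_early(conn, delay_seconds):
    """Wait up to delay_seconds; report whether the client sent data first.

    A connection closed by the client during the wait also counts as
    readable and is therefore treated like an early talker.
    """
    readable, _, _ = select.select([conn], [], [], delay_seconds)
    return bool(readable)

def greet(conn, delay_seconds=5.0, hostname="mx.example.org"):
    """Send the 220 banner only to clients that waited for it."""
    if client_talks_early(conn, delay_seconds):
        conn.close()          # drop clients that did not wait
        return False
    conn.sendall(f"220 {hostname} ESMTP\r\n".encode("ascii"))
    return True
```

A longer delay catches more impatient senders but slows every legitimate connection and, as noted above, may exceed the timeouts of callback verification systems.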
Greylisting
The
SMTP
protocol allows for temporary rejection of incoming messages.
Greylisting is the technique of temporarily rejecting messages from unknown
sender mail servers. A temporary rejection is designated with a 4xx
error code that is recognized by all normal MTAs, which then proceed to
retry delivery later.
Greylisting is based on the premise that spammers and spambots will
not retry their messages but instead will move on to the next message
and next address in their list. Since a retry attempt means the message
and state of the process must be stored, it inherently increases the
cost incurred by the spammer. The assumption is that, for the spammer,
it's a better use of resources to try a new address than waste time
re-sending to an address that's already exhibited a problem. For a
legitimate message this delay is not an issue since retrying is a
standard component of any legitimate sender's server.
The downside of greylisting is that all legitimate messages from
first time senders will experience a delay in delivery, with the delay
period before a new message is accepted from an unknown sender usually
being configurable in the software. There also exists the possibility
that some legitimate messages won't be delivered, which can happen if a
poorly configured (but legitimate) mail server interprets the temporary
rejection as a permanent rejection and sends a bounce message to the
original sender, instead of trying to resend the message later, as it
should.
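The defer-then-accept behaviour can be sketched as a small class. Keying on the triplet of client IP, envelope sender and envelope recipient is a common choice but an assumption here, as are the class name, the 300-second default and the specific reply code 451 (one of the 4xx temporary codes).

```python
class Greylist:
    """Defer the first attempt from an unknown triplet; accept later retries."""

    def __init__(self, min_delay=300.0):
        self.min_delay = min_delay   # seconds a sender must wait before retrying
        self.first_seen = {}         # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient, now):
        """Return an SMTP reply code: 451 to defer, 250 to accept."""
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return 451   # temporary failure: a compliant MTA will retry later
        return 250

greylist = Greylist(min_delay=300.0)
print(greylist.check("192.0.2.1", "a@example.org", "b@example.com", now=0.0))    # 451
print(greylist.check("192.0.2.1", "a@example.org", "b@example.com", now=400.0))  # 250
```

The `min_delay` parameter corresponds to the configurable delay period mentioned above; a production implementation would also expire old entries.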
HELO/EHLO checking
Some
spamware can be detected by a number of simple checks confirming compliance with standard addressing and MTA operation.
RFC 5321
section 4.1.4 says that "An SMTP server MAY verify that the domain name
argument in the EHLO command actually corresponds to the IP address of
the client. However, if the verification fails, the server MUST NOT
refuse to accept a message on that basis.", so to be in compliance with
the RFCs, rejecting connections must be based on additional
information/policies.
- Refusing connections from hosts that give an invalid HELO - for example, a HELO that is not an FQDN or is an IP address not surrounded by square brackets
Invalid HELO localhost
Invalid HELO 127.0.0.1
Valid HELO domain.tld
Valid HELO [127.0.0.1]
- Refusing connections from hosts that give an obviously fraudulent HELO
Fraudulent HELO friend
Fraudulent HELO -232975332
- Refusing to accept email claiming to be from a hosted domain when the sending host has not authenticated
- Refusing to accept email whose HELO/EHLO argument does not resolve
in DNS. Unfortunately, some email system administrators ignore section
2.3.5 of RFC 5321 and administer the MTA to use a nonresolvable argument to the HELO/EHLO command.
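The syntactic part of these checks can be sketched as follows, reproducing the valid and invalid examples listed above. The function names are hypothetical, the name check is deliberately simple, and IPv6 address literals with the "IPv6:" prefix are not handled.

```python
import ipaddress
import re

LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_ip(text):
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False

def helo_is_valid(arg):
    """Accept a bracketed IP literal or a dotted host name; reject the rest."""
    # Address literal: an IP address surrounded by square brackets.
    if arg.startswith("[") and arg.endswith("]"):
        return is_ip(arg[1:-1])
    # A bare IP address is not acceptable.
    if is_ip(arg):
        return False
    # Otherwise require at least two well-formed labels (a crude FQDN test).
    labels = arg.split(".")
    return len(labels) >= 2 and all(LABEL.match(label) for label in labels)

for arg in ["localhost", "127.0.0.1", "domain.tld", "[127.0.0.1]", "friend", "-232975332"]:
    print(arg, helo_is_valid(arg))
```

In line with the RFC 5321 passage quoted above, a result like this is better used as one input to a policy or score than as the sole reason to refuse a message.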
Invalid pipelining
The SMTP protocol can allow several SMTP commands to be placed in one
network packet and "pipelined". For example, if an email is sent with a
CC: header, several SMTP "RCPT TO" commands might be placed in a single
packet instead of one packet per "RCPT TO" command. The SMTP protocol,
however, requires that errors be checked and everything is synchronized
at certain points. Many spammers will send everything in a single packet
since they do not care about errors and it is more efficient. Some MTAs
will detect this invalid pipelining and reject email sent this way.
Nolisting
The
SMTP protocol requires that email servers for any given domain be provided in a prioritized list (namely,
MX records),
and further specifies mandatory error-handling behavior when servers in
that list cannot be contacted. Nolisting is a technique of purposely
creating unreachable MX records, so that only senders who have
implemented this error-handling behavior can successfully deliver mail.
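The error-handling behaviour that nolisting relies on can be sketched from the sender's side: try the MX hosts in order of preference and fall back when one cannot be contacted. The function name, the host names and the `try_connect` callback are hypothetical.

```python
def deliver(mx_records, try_connect):
    """Try MX hosts from lowest preference value upward; return the first that works.

    mx_records is a list of (preference, host) pairs and try_connect is a
    callable that returns True when a host accepts the connection.
    """
    for _, host in sorted(mx_records):
        if try_connect(host):
            return host
    return None

# With nolisting, the most-preferred MX is deliberately unreachable.
records = [(20, "mail.example.org"), (10, "unreachable.example.org")]
reachable = {"mail.example.org"}
print(deliver(records, lambda host: host in reachable))  # mail.example.org
```

A sender that only ever tries the first MX record never reaches the working server, which is the behaviour nolisting counts on from poorly written spam software.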
Quit detection
The
SMTP protocol requires connections to be closed with a QUIT command. (
RFC 5321
section 4.1.4) Many spammers skip this step because their spam has
already been sent and properly closing the connection
costs time and bandwidth. Some MTAs like
Exim
are capable of detecting whether or not the connection is closed with
the quit command and can track patterns of use for the purpose of
building
DNSBLs.
Honeypots
Another approach is simply an imitation MTA which gives the
appearance of being an open mail relay, or an imitation TCP/IP proxy
server which gives the appearance of being an open proxy. Spammers who
probe systems for open relays/proxies will find such a host and attempt
to send mail through it, wasting their time and resources and
potentially revealing information about themselves and the origin of the
spam they're sending to the entity that operates the honeypot. Such a
system may simply discard the spam attempts, submit them to
DNSBLs, or store them for analysis.
Hybrid filtering
Hybrid filtering, such as is implemented in the open source programs
SpamAssassin and
Policyd-weight,
uses some or all of the various tests for spam, and assigns a numerical
score to each test. Each message is scanned for these patterns, and the
applicable scores tallied up. If the total is above a fixed value, the
message is rejected or flagged as spam. By ensuring that no single spam
test by itself can flag a message as spam, the false positive rate can
be greatly reduced.
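The scoring scheme can be sketched in a few lines. The test names, weights and threshold below are invented for illustration and are not taken from SpamAssassin or Policyd-weight.

```python
# Hypothetical tests and weights; every weight is below the threshold,
# so no single test can flag a message on its own.
WEIGHTS = {
    "listed_in_dnsbl": 2.5,
    "invalid_helo": 1.5,
    "keyword_match": 2.0,
    "no_reverse_dns": 1.0,
}
THRESHOLD = 5.0

def spam_score(results, weights=WEIGHTS):
    """Sum the weights of all tests that fired (results maps test name -> bool)."""
    return sum(weights[name] for name, fired in results.items() if fired)

def is_spam(results, threshold=THRESHOLD):
    return spam_score(results) >= threshold

print(is_spam({"keyword_match": True}))                                  # False
print(is_spam({"listed_in_dnsbl": True, "invalid_helo": True,
               "keyword_match": True}))                                  # True
```

A joke from a friend that trips only the keyword test stays below the threshold, while a message that fails several independent tests is flagged.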
Outbound spam protection
Outbound spam protection involves scanning email traffic as it
exits a network, identifying spam messages and then taking an action
such as blocking the message or shutting off the source of the traffic.
Outbound spam protection can be implemented on a network-wide level
(using
policy-based routing or similar techniques to route
SMTP
messages to a filtering service), or it can be implemented within a
standard SMTP gateway. While the primary economic impact of
spam
is on spam recipients, sending networks also experience financial
costs, such as wasted bandwidth, and the risk of having IP addresses
blocked by receiving networks.
The advantage of outbound spam protection is that it stops spam
before it leaves the sending network, protecting receiving networks
globally from the damage and costs that would otherwise be caused by the
spam. Further it lets system administrators track down spam sources on
the network and remediate them – for example, providing free anti-virus
tools to customers whose machines have become infected with a
virus or are participating in a
botnet.
Given an appropriately designed spam filtering algorithm, outbound spam
filtering can be implemented with a near zero false positive rate,
which keeps customer related issues with blocked legitimate email down
to a minimum.
When dealing with outbound spam, it's important to not only analyze
the content of individual messages, but also to keep track of the
behaviour of email senders over time. Senders exhibiting suspicious
behaviour should be rate limited to reduce the likelihood that they will
send spam, which may get past even a good filter.
There are several commercial software vendors who offer specialized outbound spam protection products, including
MailChannels and
Commtouch. Open source options such as
SpamAssassin may also be effective.
Pattern detection
Pattern detection is an approach to stop spam in real time
before it gets to the end user. This technology monitors a large
database of messages worldwide to detect spam patterns. Many spam
messages have no content or may contain attachments which this method of
detection can catch. The approach was pioneered by
Commtouch,
a developer of anti-spam software, whose Recurrent Pattern Detection
(RPD) software can be integrated into other appliances and applications.
This method is more automated than most because the service provider
maintains the comparative spam database instead of the system
administrator.
PTR/reverse DNS checks
The PTR DNS records in the reverse DNS can be used for a number of things, including:
- Most mail transfer agents (mail servers) use a forward-confirmed reverse DNS (FCrDNS) verification and, if there is a valid domain name, put it into the "Received:" trace header field.
- Some mail transfer agents will perform FCrDNS verification on
the domain name given in the SMTP HELO and EHLO commands. See HELO/EHLO checking.
- Checking the domain names in the rDNS to see if they are likely from
dial-up users, dynamically assigned addresses, or home-based broadband
customers. Since the vast majority, but by no means all, of email that
originates from these computers is spam, many mail servers also refuse
email with missing or "generic" rDNS names.[7]
- A Forward Confirmed reverse DNS verification can create a form of
authentication that there is a valid relationship between the owner of a
domain name and the owner of the network that has been given an IP
address. While reliant on the DNS infrastructure, which has known
vulnerabilities, this authentication is strong enough that it can be
used for whitelisting purposes because spammers and phishers cannot usually bypass this verification when they use zombie computers to forge the domains.
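A forward-confirmed reverse DNS check has two steps: look up the PTR name for the connecting IP, then confirm that this name resolves back to the same IP. The sketch below uses the standard `socket` module; the function names are hypothetical, and the resolvers are passed as parameters so the logic can be exercised without network access.

```python
import socket

def reverse_lookup(ip):
    """Return the PTR host name for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def forward_lookup(name):
    """Return the set of addresses that name resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except OSError:
        return set()

def fcrdns_ok(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True if the PTR name of ip resolves back to ip."""
    name = reverse(ip)
    return name is not None and ip in forward(name)
```

The comparison is on address strings, so differently formatted but equivalent IPv6 addresses would need to be normalized first.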
Rule-based filtering
Content filtering techniques rely on the specification of lists of words or
regular expressions
disallowed in mail messages. Thus, if a site receives spam advertising
"herbal Viagra", the administrator might place this phrase in the filter
configuration. The mail server would then reject any message containing
the phrase.
Header filtering is the means of inspecting the header of the email,
the part of the message that contains information about the origin,
destination and content of the message. Spammers will often
spoof
fields in the header in order to hide their identity, or to try to make
the email look more legitimate than it is; many of these spoofing
methods can be detected. Also, a violation of the
RFC 5322 standard on how the header is to be formed can serve as a basis for rejecting the message.
Disadvantages of filtering are threefold: First, filtering can be
time-consuming to maintain. Second, it is prone to false positives.
Third, these false positives are not equally distributed: content
filtering is prone to reject legitimate messages on topics related to
products frequently advertised in spam. For example, a system administrator who
attempts to reject spam messages which advertise mortgage refinancing,
credit or debt may inadvertently block legitimate email on the same
subject.
Spammers frequently change the phrases and spellings they use. This
can mean more work for the administrator. However, it also has some
advantages for the spam fighter. If the spammer starts spelling "Viagra"
as "V1agra" (see
leet)
or "Via_gra", it makes it harder for the spammer's intended audience to
read their messages. If they try to trip up the phrase detector, by,
for example, inserting an invisible-to-the-user HTML
comment
in the middle of a word ("Via<!---->gra"), this sleight of hand
is itself easily detectable, and is a good indication that the message
is spam. And if they send spam that consists entirely of images, so that
anti-spam software can't analyze the words and phrases in the message,
the fact that there
is no readable text in the body can be detected, marking that message as more likely to be spam.
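Detecting an HTML comment hidden inside a word can be sketched with a regular expression. The pattern and function names are illustrative; a real filter would parse the HTML rather than rely on a regex.

```python
import re

# An HTML comment with a word character immediately before and after it.
COMMENT_IN_WORD = re.compile(r"\w<!--.*?-->\w", re.DOTALL)

def has_comment_inside_word(html_body):
    """True if a comment splits a word, which ordinary mail rarely does."""
    return COMMENT_IN_WORD.search(html_body) is not None

def strip_comments(html_body):
    """Remove comments so that keyword rules see the text the reader sees."""
    return re.sub(r"<!--.*?-->", "", html_body, flags=re.DOTALL)

print(has_comment_inside_word("Via<!---->gra"))  # True
print(strip_comments("Via<!---->gra"))           # Viagra
```

The trick therefore counts against the spammer twice: its presence is itself a signal, and stripping it exposes the original keyword.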
Content filtering can also be implemented to examine the
URLs present (i.e.
spamvertising)
in an email message. This form of content filtering is much harder to
evade, as the URLs must resolve to a valid domain name. Extracting a
list of such links and comparing them to published sources of
spamvertised domains is a simple and reliable way to eliminate a large
percentage of spam via content analysis.
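Extracting links and comparing their domains with a list of spamvertised domains can be sketched as below. The URL pattern is deliberately simple, and the blocklist is a plain set supplied by the caller rather than any published source.

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def linked_domains(body):
    """Return the set of host names that appear in http(s) URLs in the body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:      # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            domains.add(host.lower())
    return domains

def mentions_blocked_domain(body, blocklist):
    """True if any linked host is a listed domain or a subdomain of one."""
    return any(
        domain == blocked or domain.endswith("." + blocked)
        for domain in linked_domains(body)
        for blocked in blocklist
    )

body = 'Click <a href="http://shop.pills.example/buy?id=1">here</a>'
print(mentions_blocked_domain(body, {"pills.example"}))  # True
```

Matching subdomains as well as the listed domain itself keeps a spammer from evading the list by prefixing a random host name.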
Sender-supported whitelists and tags
There are a small number of organizations which offer IP whitelisting
and/or licensed tags that can be placed in email (for a fee) to assure
recipients' systems that the messages thus tagged are not spam. This
system relies on legal enforcement of the tag. The intent is for email
administrators to whitelist messages bearing the licensed tag.
A potential difficulty with such systems is that the licensing
organization makes its money by licensing more senders to use the tag,
not by strictly enforcing the rules upon licensees. A concern exists
that senders whose messages are more likely to be considered spam would
accrue a greater benefit by using such a tag. Together, these factors
could form a perverse incentive for licensing organizations to be
lenient with licensees who have offended. However, the value of a
license would drop if it were not strictly enforced, and financial gains
due to enforcement of a license itself can provide an additional
incentive for strict enforcement.
SMTP callback verification
Since a large percentage of spam has forged and invalid sender
("from") addresses, some spam can be detected by checking that this
"from" address is valid. A mail server can try to verify the sender
address by making an SMTP connection back to the mail exchanger for the
address, as if it were creating a bounce, but stopping just before any
email is sent.
Callback verification can be compliant with SMTP RFCs, but it has
various drawbacks. Since nearly all spam has forged return addresses,
nearly all callbacks are to innocent third-party mail servers that are
unrelated to the spam. At the same time, there will be numerous false
negatives due to spammers abusing real addresses, and some false
positives.
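The callback itself can be sketched with Python's standard smtplib module. This is only an illustration under several assumptions: the caller has already looked up the mail exchanger (mx_host), the function name and the smtp_factory parameter are invented for the example, and any non-2xx reply, including a temporary 4xx failure, is treated as "not verified". Given the drawbacks above, it should not be read as a recommendation to deploy callbacks.

```python
import smtplib


def callback_verify(address: str, mx_host: str, helo_name: str,
                    smtp_factory=smtplib.SMTP, timeout: float = 10.0) -> bool:
    """Return True if mx_host accepts RCPT TO for address, False otherwise."""
    # smtp_factory is a hypothetical hook so the function can be tested
    # without a network; by default it opens a real SMTP connection.
    server = smtp_factory(mx_host, 25, timeout=timeout)
    try:
        server.helo(helo_name)
        # Use the null sender, as a bounce message would.
        server.mail("")
        code, _ = server.rcpt(address)
        return 200 <= code < 300
    finally:
        # Disconnect before DATA, so no message is ever transmitted.
        try:
            server.quit()
        except smtplib.SMTPException:
            pass
```

Connection errors are deliberately left to propagate, so the caller can decide whether an unreachable mail exchanger counts as a failed verification.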
SMTP proxy
SMTP proxies allow spam to be combated in real time: they combine
controls on the sender's behavior with immediate feedback to legitimate
users, which eliminates the need for a quarantine.
Spamtrapping
Spamtrapping is the seeding of an email address so that spammers can
find it, but normal users cannot. If the email address is used, then the
sender must be a spammer and is blacklisted.
As an example, consider the email address "spamtrap@example.org". If
this email address were placed in the source HTML of our web site in a
way that it isn't displayed on the web page, normal humans would not see
it. Spammers, on the other hand, use web page scrapers and bots to
harvest email addresses from HTML source code so they would find this
address.
When the spammer sends mail with the destination address of
"spamtrap@example.org" the SpamTrap knows this is highly likely to be a
spammer and can take appropriate action.
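A minimal Python sketch of the receiving side might look like this. It assumes that "appropriate action" means adding the sending IP address to a blacklist, which is only one possible choice; the function name and the set-based blacklist are hypothetical.

```python
from email.utils import parseaddr

# Addresses seeded where only harvesters will find them (from the example above).
SPAMTRAP_ADDRESSES = {"spamtrap@example.org"}


def check_recipient(recipient: str, sender_ip: str, blacklist: set[str]) -> bool:
    """Return True, and blacklist the sending IP, if the recipient is a spamtrap."""
    _, addr = parseaddr(recipient)
    if addr.lower() in SPAMTRAP_ADDRESSES:
        blacklist.add(sender_ip)
        return True
    return False
```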
Statistical content filtering
Statistical (or Bayesian) filtering, once set up, requires no
administrative maintenance per se: instead, users mark messages as spam
or nonspam and the filtering software learns from these judgements.
Thus, a statistical filter does not reflect the software author's or
administrator's biases as to content, but rather the user's biases. For
example, a biochemist who is researching Viagra won't have messages
containing the word "Viagra" automatically flagged as spam, because
"Viagra" will show up often in his or her legitimate messages. Still,
spam emails containing the word "Viagra" do get filtered because the
content of the rest of the spam messages differs significantly from the
content of legitimate messages. A statistical filter can also respond
quickly to changes in spam content, without administrative intervention,
as long as users consistently designate false negative messages as spam
when received in their email. Statistical filters can also look at
message headers, thereby considering not just the content but also
peculiarities of the transport mechanism of the email.
Typical statistical filtering uses single words in the calculations
to decide if a message should be classified as spam or not. A more
powerful calculation can be made using groups of two or more words taken
together, so that random "noise" words cannot be used as successfully to
fool the filter.
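The idea can be illustrated with a deliberately simplified naive Bayes classifier in Python. It is a sketch, not the algorithm used by any of the programs named in this article: the class and function names are invented, smoothing is plain add-one, and only the tokens present in a message are scored. Setting use_pairs=True adds adjacent word pairs as features, in the spirit of the multi-word calculation described above.

```python
import math
import re
from collections import Counter


def tokenize(text: str, use_pairs: bool = False) -> list[str]:
    """Split text into lowercase words, optionally adding adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not use_pairs:
        return words
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class BayesFilter:
    def __init__(self, use_pairs: bool = False):
        self.use_pairs = use_pairs
        # For each label, how many training messages contained each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record a user's judgement; label is "spam" or "ham"."""
        self.counts[label].update(set(tokenize(text, self.use_pairs)))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        # Start from the prior odds, then add one log-likelihood ratio per token.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(tokenize(text, self.use_pairs)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep math.exp within floating-point range.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))
```

Trained on the biochemist's mailbox, such a filter gives "Viagra" little weight on its own, because the word appears in both classes, and the verdict is decided by the remaining words in the message.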
Software programs that implement statistical filtering include
Bogofilter, DSPAM, SpamBayes, ASSP, the email programs Mozilla and
Mozilla Thunderbird, Mailwasher, and later revisions of SpamAssassin.
Another interesting project is CRM114, which hashes phrases and does
Bayesian classification on the phrases. There is also the free mail
filter POPFile, which uses Bayesian filtering to sort mail into as many
categories as the user wants (family, friends, co-workers, spam, and so
on).
Tarpits
A tarpit is any server software which intentionally responds
pathologically slowly to client commands. By running a tarpit which
treats acceptable mail normally and known spam slowly or which appears
to be an open mail relay, a site can slow down the rate at which
spammers can inject messages into the mail facility. Many systems will
simply disconnect if the server doesn't respond quickly, which will
eliminate the spam. However, a few legitimate email systems will also
not deal correctly with these delays.
Automated techniques for email senders
There are a variety of techniques that email senders use to try to
make sure that they do not send spam. Failure to control the amount of
spam sent, as judged by email receivers, can often cause even legitimate
email to be blocked and the sender to be put on DNSBLs.
Background checks on new users and customers
Since spammers' accounts are frequently disabled due to violations of
abuse policies, they are constantly trying to create new accounts. Due
to the damage done to an ISP's reputation when it is the source of spam,
many ISPs and web email providers use CAPTCHAs on new accounts to verify
that it is a real human registering the account, and not an automated
spamming system. They can also verify that credit cards are not stolen
before accepting new customers, check the Spamhaus Project ROKSO list,
and do other background checks.
Confirmed opt-in for mailing lists
Main article: Opt in email
One difficulty in implementing opt-in mailing lists is that many
means of gathering user email addresses remain susceptible to forgery.
For instance, if a company puts up a Web form to allow users to
subscribe to a mailing list about its products, a malicious person can
enter other people's email addresses, either to harass them or to make
the company appear to be spamming. (To most anti-spammers, if the
company sends email to these forgery victims, it is spamming, albeit
inadvertently.)
To prevent this abuse, MAPS and other anti-spam organizations
encourage all mailing lists to use confirmed opt-in (also known as
verified opt-in or double opt-in). That is, whenever an email address is
presented for subscription to the list, the list software should send a
confirmation message to that address. The confirmation message contains
no advertising content, so it is not construed to be spam itself, and
the address is not added to the live mail list unless the recipient
responds to the confirmation message. See also the Spamhaus Mailing
Lists vs. Spam Lists page.
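The confirmation flow can be sketched in a few lines of Python. The class, its method names and the wording of the confirmation message are assumptions made for this example; real list managers also expire pending requests and actually send the message rather than returning it.

```python
import secrets


class MailingList:
    def __init__(self):
        self.subscribers: set[str] = set()
        # Confirmation token -> address still awaiting confirmation.
        self.pending: dict[str, str] = {}

    def request_subscription(self, address: str) -> str:
        """Build the confirmation message; the address is NOT subscribed yet."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        # No advertising content, only the confirmation instructions.
        return (f"Someone asked to subscribe {address} to this list.\n"
                f"To confirm, reply with this code: {token}\n"
                "If this was not you, ignore this message.")

    def confirm(self, token: str) -> bool:
        """Add the address to the live list only if the token is valid."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True
```

Because the token is sent only to the address being subscribed, a third party who typed someone else's address into the form cannot complete the subscription.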
All modern mailing list management programs (such as GNU Mailman,
LISTSERV, Majordomo, and qmail's ezmlm) support confirmed opt-in by
default.
Egress spam filtering
Email senders can do the same type of anti-spam checks on email
coming from their users and customers as can be done for email coming
from the rest of the Internet.
Limit email backscatter
If any sort of bounce message or anti-virus warning gets sent to a
forged email address, the result will be backscatter.
Problems with sending challenges to forged email addresses can be
greatly reduced by not creating a new message that contains the
challenge. Instead, the challenge can be placed in the bounce message
when the receiving mail system gives a rejection code during the SMTP
session. When the receiving mail system rejects an email this way, it
is the sending system that actually creates the bounce message. As a
result, the bounce message will almost always be sent to the real
sender, and it will be in a format and language that the sender will
usually recognize.
Port 25 blocking
Firewalls and routers can be programmed to not allow SMTP traffic (TCP
port 25) from machines on the network that are not supposed to run Mail
Transfer Agents or send email. [8]
This practice is somewhat controversial when ISPs block home users,
especially if the ISPs do not allow the blocking to be turned off upon
request. Email can still be sent from these computers to designated
smart hosts via port 25 and to other smart hosts via the email submission port 587.
Port 25 interception
Network address translation can be used to intercept all port 25
(SMTP) traffic and direct it to a mail server that enforces rate
limiting and egress spam filtering. This is commonly done in hotels, [9]
but it can cause email privacy problems, as well as making it impossible
to use STARTTLS and SMTP-AUTH if the port 587 submission port isn't
used.
Rate limiting
Machines that suddenly start sending lots of email may well have
become zombie computers. By limiting the rate at which email can be sent
to roughly what is typical for the computer in question, legitimate
email can still be sent, but large spam runs can be slowed down until
manual investigation can be done. [10]
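One simple way to enforce such a limit is a sliding-window counter per sending machine, sketched below in Python. The class name, the per-host keying and the idea of passing the current time in as a parameter are choices made for this illustration; the limit and window would be tuned to what is typical for each machine.

```python
from collections import defaultdict, deque


class SendRateLimiter:
    def __init__(self, max_messages: int, window_seconds: float):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        # Host -> timestamps of messages accepted within the current window.
        self.sent = defaultdict(deque)

    def allow(self, host: str, now: float) -> bool:
        """Return True if host may send a message at time now (in seconds)."""
        times = self.sent[host]
        # Forget messages that have aged out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_messages:
            return False  # over the limit: defer and flag for investigation
        times.append(now)
        return True
```

A machine that stays under its usual volume is never delayed, while a sudden spam run is throttled to max_messages per window.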
Spam report feedback loops
By monitoring spam reports from places such as SpamCop, AOL's feedback
loop, Network Abuse Clearinghouse, and the domain's abuse@ mailbox, ISPs
can often learn of problems before they seriously damage the ISP's
reputation and get its mail servers blacklisted.
FROM field control
Both malicious software and human spam senders often use forged FROM
addresses when sending spam messages. Control may be enforced on SMTP
servers to ensure senders can only use their correct email address in
the FROM field of outgoing messages. In the email user database, each
user has a record with an email address. The SMTP server must check
whether the email address in the FROM field of an outgoing message is
the same address that belongs to the user's credentials, supplied for
SMTP authentication. If the FROM field is forged, an SMTP error will be
returned to the email client (e.g. "You do not own the email address you
are trying to send from").
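The check described above can be sketched in Python as follows. The in-memory user table, the function name and the choice of reply code 550 are assumptions for the example; the source text specifies only that an SMTP error is returned.

```python
from email.utils import parseaddr

# Hypothetical user database: SMTP AUTH login -> the address that login owns.
USER_ADDRESSES = {"alice": "alice@example.org", "bob": "bob@example.org"}


def check_from(auth_user: str, from_header: str) -> tuple[int, str]:
    """Return an SMTP-style (code, text) reply for an outgoing message."""
    _, claimed = parseaddr(from_header)
    owned = USER_ADDRESSES.get(auth_user)
    if owned is not None and claimed.lower() == owned.lower():
        return 250, "OK"
    return 550, "You do not own the email address you are trying to send from"
```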
Strong AUP and TOS agreements
Most ISPs and webmail providers have either an Acceptable Use Policy
(AUP) or a Terms of Service (TOS) agreement that discourages spammers
from using their system and allows a spammer's account to be terminated
quickly for violations.
Techniques for researchers & law enforcement
Increasingly, anti-spam efforts have led to co-ordination between law
enforcement, researchers, major consumer financial service companies
and Internet service providers in monitoring and tracking email spam,
identity theft and phishing activities and gathering evidence for
criminal cases. [11]
Legislation and enforcement
Appropriate legislation and enforcement can have a significant impact
on spamming activity.
The penalty provisions of the Australian Spam Act 2003 dropped
Australia's ranking in the list of spam-relaying countries for email
spam from tenth to twenty-eighth. [12]
Legislation that provides mandates that bulk emailers must follow makes
compliant spam easier to identify and filter out.
Analysis of spamvertisements
Analysis of sites being spamvertised by a given piece of spam often
leads to questionable registrations of Internet domain names. Since
registrars are required to maintain trustworthy WHOIS databases, digging
into the registration details and complaining at the proper locations
often results in site shutdowns. Uncoordinated activity may not be
effective, given today's volume of spam and the rate at which criminal
organizations register new domains. However, a coordinated effort,
implemented with adequate infrastructure, can obtain good results. [13]
New solutions and ongoing research
Several approaches have been proposed to improve the email system.
Cost-based systems
Since spamming is facilitated by the fact that large volumes of email
are very inexpensive to send, one proposed set of solutions would
require that senders pay some cost in order to send email, making it
prohibitively expensive for spammers. Anti-spam activist Daniel Balsam
attempts to make spamming less profitable by bringing lawsuits against
spammers. [14]
Other techniques
There are a number of proposals for sideband protocols that will
assist SMTP operation. The Anti-Spam Research Group (ASRG) of the
Internet Research Task Force (IRTF) is working on a number of email
authentication and other proposals for providing simple source
authentication that is flexible, lightweight, and scalable. Recent
Internet Engineering Task Force (IETF) activities include MARID (2004),
leading to two approved IETF experiments in 2005, and DomainKeys
Identified Mail in 2006.
DMARC, which stands for "Domain-based Message Authentication, Reporting
& Conformance", standardizes how email receivers perform email
authentication using the well-known Sender Policy Framework (SPF) and
DKIM mechanisms. [15]
Channel email is a new proposal for sending email that attempts to
distribute anti-spam activities by forcing verification (probably using
bounce messages so back-scatter doesn't occur) when the first email is
sent for new contacts.
Research conferences
Spam is the subject of several research conferences, including: