Introduction

This is an analysis of spam received at a set of spam traps during ISO week 51 of 2003 (Monday December 15th 2003 to Sunday December 21st 2003 inclusive). Most of the spam traps are expired or non-existent accounts; the remaining few have been unused for a long time and receive no legitimate email traffic. All the expired addresses have been bouncing mail for over a year, most for more than three. None of the spam traps have been actively seeded to receive spam. For the purpose of this report, spam is defined as any message delivered to these spam traps. All times and dates are in UTC.
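
The reporting period follows directly from the ISO week number. The following is a minimal sketch in Python (3.8 or later), not the code used to produce this report; the function name `iso_week_range` is made up for illustration.

```python
from datetime import date, timedelta

def iso_week_range(year, week):
    """Return the Monday and the Sunday that bound the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)  # 1 = Monday
    return monday, monday + timedelta(days=6)

start, end = iso_week_range(2003, 51)
print(start, end)  # 2003-12-15 2003-12-21
```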

While it is impossible to say with any level of certainty how representative these numbers are of spam in general, I believe they are close enough to provide useful information to anyone interested in spam or spam prevention.

Summary

A short summary of this week's numbers.

Total number of messages:
1578
Unique message bodies:
1489
Unique sender IP addresses:
1392
Average message size:
4253

Client DNS

Reverse DNS

The client IP address is looked up in DNS at the time of delivery. The following shows the number of clients with and without published reverse DNS information, and the number of clients with matching forward and reverse DNS information.

Clients without reverse DNS:
317 (20.1%)
Clients with reverse DNS:
1261 (79.9%)
Clients with matching forward and reverse DNS:
1154 (73.1%)
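
The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code behind this report: it handles IPv4 only, the function name `classify_client` is hypothetical, and the lookup functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

def classify_client(ip, reverse_lookup=socket.gethostbyaddr,
                    forward_lookup=socket.gethostbyname_ex):
    """Return 'none', 'reverse' or 'matching' for a client IP address."""
    try:
        hostname = reverse_lookup(ip)[0]          # PTR lookup
    except OSError:
        return "none"                             # no reverse DNS published
    try:
        addresses = forward_lookup(hostname)[2]   # addresses the PTR name resolves to
    except OSError:
        return "reverse"                          # reverse DNS only
    return "matching" if ip in addresses else "reverse"
```

Note that the categories returned here are mutually exclusive, while the "with reverse DNS" figure above also includes the clients with matching forward and reverse DNS.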

Originating Domain

The 10 most frequent domains, where the domain is the client's reverse DNS name with the host name (the leftmost label) removed. Only clients with matching forward and reverse DNS information are considered. Count is the number of messages delivered from clients within the given domain. Percentage is relative to the number of messages delivered from clients with matching forward and reverse DNS information.

Domain Count Percentage
client.comcast.net 65 5.6%
stocksnut.com 33 2.9%
dip.t-dialin.net 32 2.8%
dyn.optonline.net 30 2.6%
ne.client2.attbi.com 18 1.6%
telia.com 16 1.4%
ipt.aol.com 14 1.2%
prod-infinitum.com.mx 14 1.2%
forestsavers.com 13 1.1%
dsl.telesp.net.br 12 1%
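
As an illustration of how the domain is derived and counted, here is a small sketch. The host names in `names` are invented, and `originating_domain` is a hypothetical helper rather than part of the reporting code.

```python
from collections import Counter

def originating_domain(hostname):
    """Drop the leftmost label (the host name) from a reverse DNS name."""
    _host, _, domain = hostname.rstrip(".").partition(".")
    return domain

# Invented reverse DNS names, one per delivered message.
names = ["h1.client.comcast.net", "h2.client.comcast.net", "mail.stocksnut.com"]
counts = Counter(originating_domain(n) for n in names)
for domain, count in counts.most_common(10):
    print(domain, count, f"{100 * count / len(names):.1f}%")
```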

Client Countries

Client Country Distribution

Client country is determined using the WebHosting.Info ip-to-country data. Data is updated monthly if there are updates available. This shows the top 15 countries for delivering clients. Count is the number of messages delivered from clients in the given country. Percentage is relative to the total number of messages received in the reporting period.

Chart@http://db.org/media/2003/12/22/oos-contry.png

Country Count Percentage
United States 839 53.2%
Canada 102 6.5%
Republic Of Korea 91 5.8%
China 75 4.8%
Brazil 44 2.8%
France 42 2.7%
Germany 42 2.7%
Netherlands 26 1.6%
Mexico 22 1.4%
Spain 22 1.4%
Poland 20 1.3%
United Kingdom 20 1.3%
Sweden 19 1.2%
Venezuela 17 1.1%
Japan 17 1.1%
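
A lookup of this kind can be sketched as a binary search over address ranges. The sketch assumes the ip-to-country data has been loaded as sorted, non-overlapping ranges of numeric IPv4 addresses; the two sample ranges and the country labels below are placeholders, not real data.

```python
import bisect
import ipaddress

def _num(ip):
    """Convert a dotted IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(ip))

# Placeholder rows: (first address, last address, country), sorted by first address.
RANGES = [
    (_num("192.0.2.0"), _num("192.0.2.255"), "Country A"),
    (_num("198.51.100.0"), _num("198.51.100.255"), "Country B"),
]
STARTS = [first for first, _last, _country in RANGES]

def country_for(ip):
    """Return the country whose range contains ip, or None if no range matches."""
    value = _num(ip)
    i = bisect.bisect_right(STARTS, value) - 1   # last range starting at or before ip
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```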

Senders

Claimed Sender Domains

Domain names claimed in envelope sender addresses. It is commonly accepted that a significant share of all spam is sent with a forged sender address, so the following list is useless for identifying message origin. It is included here to show spammers' domain preferences when forging sender addresses.

Domain Count Percentage
yahoo.com 141 8.9%
msn.com 61 3.9%
hotmail.com 42 2.7%
aol.com 32 2%
thebiggreenbox.com 19 1.2%
upper-web-side.com 17 1.1%
canada.com 16 1%
stocksnut.com 16 1%
qo.forestsavers.com 13 0.8%
SB1.trbrgns.com 12 0.8%
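
Extracting the claimed domain from an envelope sender is straightforward. The helper below is a hypothetical sketch; it keeps the domain exactly as given, since the table above preserves case.

```python
def sender_domain(envelope_sender):
    """Return the domain part of an envelope sender, or None for the null sender."""
    address = envelope_sender.strip().strip("<>")
    if "@" not in address:
        return None                      # e.g. the null sender "<>" used by bounces
    return address.rsplit("@", 1)[1]
```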

Clients

A port scan is performed on all clients delivering messages to spam traps. The scan is started as soon as possible after message delivery, usually within a few seconds.

Please note that where the client is located behind a router or firewall doing NAT, the target of the scan may be the router or firewall rather than the sending client.

Services On Unfiltered Ports

Services are identified by nmap's version detection feature. The following lists the 15 most common services found on unfiltered ports. The count is the number of distinct IP addresses; the percentage is relative to the total number of distinct IP addresses seen within the reporting period.

Service Count Percentage
Microsoft Windows msrpc 714 51.3%
Microsoft Windows UPnP 483 34.7%
Microsoft Windows XP microsoft-ds 227 16.3%
Microsoft mstask 123 8.8%
OpenSSH 98 7%
Microsoft IIS webserver 79 5.7%
Apache httpd 72 5.2%
Microsoft Distributed Transaction Coordinator 65 4.7%
Microsoft Terminal Service 60 4.3%
KaZaA client 58 4.2%
Microsoft ftpd 52 3.7%
Microsoft ESMTP 50 3.6%
Microsoft DNS 48 3.4%
MySQL 46 3.3%
ISC Bind 34 2.4%

Client OS Vendor

Client operating systems are identified by nmap's TCP/IP fingerprinting feature. The following lists the 10 most common client operating system vendors. Where TCP/IP fingerprinting did not identify a client's operating system, the client is listed as *unknown*.

Chart@http://db.org/media/2003/12/22/oos-os-vendor.png

Vendor Count Percentage
Microsoft 902 64.8%
*unknown* 254 18.2%
Linux 121 8.7%
FreeBSD 36 2.6%
Turtle 30 2.2%
Cisco 11 0.8%
IBM 9 0.6%
Cnet 5 0.4%
Smoothwall 4 0.3%
Tektronix 3 0.2%
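
The exact nmap options used for these scans are not stated here. As a rough sketch, a scan combining version detection (`-sV`) and OS fingerprinting (`-O`) with XML output on stdout (`-oX -`) could be started like this; the function names and the timeout are assumptions.

```python
import subprocess

def nmap_command(ip):
    """Build an nmap invocation with version detection and OS fingerprinting."""
    return ["nmap", "-sV", "-O", "-oX", "-", ip]   # XML report on stdout

def scan(ip, timeout=600):
    """Run the scan and return nmap's XML output (OS detection usually needs root)."""
    result = subprocess.run(nmap_command(ip), capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout
```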

Time Distribution

Day of Week

Distribution of messages over days of the week. The count shows the number of messages received on each day; the percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-day-of-week.png

Day Count Percentage
Mon 213 13.5%
Tue 212 13.4%
Wed 230 14.6%
Thu 234 14.8%
Fri 240 15.2%
Sat 250 15.8%
Sun 199 12.6%

Time of Day

Distribution of messages over time of day. 00 describes the hour from 00:00 to 01:00. The count shows the number of messages received within each hour; the percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-time-of-day.png

Hour Count Percentage
00 56 3.5%
01 83 5.3%
02 61 3.9%
03 73 4.6%
04 70 4.4%
05 65 4.1%
06 54 3.4%
07 75 4.8%
08 61 3.9%
09 79 5%
10 60 3.8%
11 52 3.3%
12 59 3.7%
13 72 4.6%
14 59 3.7%
15 55 3.5%
16 70 4.4%
17 77 4.9%
18 59 3.7%
19 64 4.1%
20 60 3.8%
21 82 5.2%
22 72 4.6%
23 60 3.8%
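
Both the day-of-week and the time-of-day counts can be derived from delivery timestamps. The sketch below assumes the timestamps are available as Unix times; `distributions` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def distributions(timestamps):
    """Count messages per UTC weekday and per UTC hour from Unix timestamps."""
    days, hours = Counter(), Counter()
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        days[DAYS[t.weekday()]] += 1     # weekday() is 0 for Monday
        hours[f"{t.hour:02d}"] += 1      # "00" covers 00:00 up to 01:00
    return days, hours
```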

Size Distribution

Message Size Distribution

Message size distribution over all messages. Count is the number of messages within the given size range. The percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-size.png

Size Count Percentage
1.5KiB > size >= 0.5KiB 405 25.7%
2.5KiB > size >= 1.5KiB 342 21.7%
3.5KiB > size >= 2.5KiB 285 18.1%
4.5KiB > size >= 3.5KiB 164 10.4%
5.5KiB > size >= 4.5KiB 132 8.4%
7.5KiB > size >= 6.5KiB 62 3.9%
6.5KiB > size >= 5.5KiB 47 3%
8.5KiB > size >= 7.5KiB 27 1.7%
34.5KiB > size >= 33.5KiB 24 1.5%
9.5KiB > size >= 8.5KiB 15 1%
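
The size ranges above are 1 KiB wide and centred on whole KiB values. Assuming message sizes are measured in bytes (the unit is not stated explicitly), the range for a message could be computed like this; `size_bucket` is a hypothetical helper.

```python
def size_bucket(size_bytes):
    """Return the label of the 1 KiB wide range, centred on a whole KiB, containing size_bytes."""
    k = (size_bytes + 512) // 1024          # nearest whole KiB, halves round up
    lower = max(k - 0.5, 0)                 # the first range starts at 0
    return f"{k + 0.5}KiB > size >= {lower}KiB"
```

Under the same assumption, a message of 4253 bytes, the average size reported in the summary, falls in the range 4.5KiB > size >= 3.5KiB.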

DNS Block Lists

These numbers show the presence of the sending client in a DNS block list at the time of message delivery. The selection of DNS block lists may be updated on a monthly basis.

Senders In DNS Block Lists

Sending clients present in a DNS block list at the time of delivery. The count is the total number of messages delivered from clients listed in the respective block list. The percentage is relative to the total number of messages received within the reporting period.

DNS block list Count Percentage
bl.spamcop.net 1235 78.3%
dul.dnsbl.sorbs.net 764 48.4%
cbl.abuseat.org 729 46.2%
sbl.spamhaus.org 262 16.6%
socks.dnsbl.sorbs.net 233 14.8%
http.dnsbl.sorbs.net 192 12.2%
relays.visi.com 158 10%
spam.dnsbl.sorbs.net 141 8.9%
misc.dnsbl.sorbs.net 23 1.5%
zombie.dnsbl.sorbs.net 16 1%
relays.ordb.org 9 0.6%
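
A DNS block list is queried by reversing the octets of the client's IPv4 address and appending the list's zone; a listed address resolves, an unlisted one does not. The following is a minimal sketch of such a check, not the code used for this report. The resolver is a parameter so the logic can be tried offline, and lookup failures are simply treated as "not listed".

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the block list publishes an A record for the client address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:          # NXDOMAIN or lookup failure: treated as not listed
        return False
```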

DNS Block List Groups

To investigate the overlap between the different DNS block lists and the effectiveness of combinations of block lists, the following shows the number of messages delivered from clients listed in at least one of the group's block lists. The set of groups may be updated on a monthly basis.

all:
1465 of 1578 (92.8%)
bl.spamcop.net, cbl.abuseat.org and dul.dnsbl.sorbs.net:
1388 of 1578 (88%)
bl.spamcop.net and dul.dnsbl.sorbs.net:
1373 of 1578 (87%)
bl.spamcop.net and cbl.abuseat.org:
1259 of 1578 (79.8%)
*.dnsbl.sorbs.net:
1044 of 1578 (66.2%)
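
The group figures count each message once, no matter how many of the group's lists its client appears in. A sketch of that counting, using invented per-message listings and a hypothetical helper `group_hits`:

```python
def group_hits(listings, group):
    """Count messages whose client is listed in at least one list of the group."""
    wanted = set(group)
    return sum(1 for lists in listings if wanted & set(lists))

# Invented data: the block lists each message's client was found in.
listings = [
    {"bl.spamcop.net", "cbl.abuseat.org"},
    {"dul.dnsbl.sorbs.net"},
    set(),
]
print(group_hits(listings, ["bl.spamcop.net", "cbl.abuseat.org"]))  # 1
```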

SpamAssassin

All messages are filtered through SpamAssassin, and reports are generated on hit rates including and excluding Bayesian classification, as well as on the scores from the Bayesian classifier itself. As the spam traps receive no legitimate email, the learning for the SpamAssassin Bayesian classifier is based on mail received at a different address. This can be expected to reduce the accuracy of the numbers that include the result of the Bayesian classifier.

SpamAssassin Without Bayesian Classifier

SpamAssassin hits when disregarding the result of the Bayesian classifier. The count is the number of messages for the given score range. The percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-sa-only.png

Hits Count Percentage
0 > hits 2 0.1%
5 > hits >= 0 513 32.5%
10 > hits >= 5 356 22.6%
15 > hits >= 10 298 18.9%
20 > hits >= 15 226 14.3%
hits >= 20 183 11.6%
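
The hit ranges used in the table above can be assigned with a small helper. This is a sketch only; `hits_bucket` is a hypothetical name and the input is assumed to be the numeric SpamAssassin score of a message.

```python
def hits_bucket(hits):
    """Return the label of the range, as used in the table, that contains hits."""
    if hits < 0:
        return "0 > hits"
    if hits >= 20:
        return "hits >= 20"
    lower = int(hits // 5) * 5           # ranges are 5 points wide
    return f"{lower + 5} > hits >= {lower}"
```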

SpamAssassin Bayesian Classifier Scores

The score of the SpamAssassin Bayesian classifier. The count is the number of messages for the given score range. The percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-sa-bayes.png

Score Count Percentage
1.00 > score >= 0.95 1406 89.1%
0.95 > score >= 0.85 42 2.7%
0.85 > score >= 0.75 18 1.1%
0.75 > score >= 0.65 14 0.9%
0.65 > score >= 0.55 16 1%
0.55 > score >= 0.45 74 4.7%
0.45 > score >= 0.35 1 0.1%
0.35 > score >= 0.25 2 0.1%
0.25 > score >= 0.15 1 0.1%
0.15 > score >= 0.05 1 0.1%
0.05 > score >= 0 3 0.2%

SpamAssassin Including Bayesian Classifier

SpamAssassin hits including the result of the Bayesian classifier. The count is the number of messages for the given hit range. The percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-sa-combined.png

Hits Count Percentage
0 > hits 7 0.4%
5 > hits >= 0 106 6.7%
10 > hits >= 5 428 27.1%
15 > hits >= 10 316 20%
20 > hits >= 15 270 17.1%
hits >= 20 451 28.6%

Distributed Checksum Clearinghouse

DCC Matches

Number of matches for the three matching algorithms used by DCC. The DCC servers are queried at the time of delivery. The Body, Fuz1 and Fuz2 columns show the number of messages matched in the count range for their respective algorithm. The Highest column shows the number of messages in each range when every message is counted by the algorithm returning its highest match count. The percentage is relative to the total number of messages received within the reporting period.

Chart@http://db.org/media/2003/12/22/oos-dcc.png

Range               Body           Fuz1           Fuz2           Highest
25 >= count         1462 (92.6%)   1142 (72.4%)   401 (25.4%)    398 (25.2%)
50 >= count > 25    5 (0.3%)       2 (0.1%)       15 (1%)        11 (0.7%)
75 >= count > 50    5 (0.3%)       17 (1.1%)      25 (1.6%)      24 (1.5%)
100 >= count > 75   7 (0.4%)       1 (0.1%)       19 (1.2%)      17 (1.1%)
count > 100         99 (6.3%)      416 (26.4%)    1118 (70.8%)   1128 (71.5%)
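
The Highest column can be understood as placing each message in a range by the largest of its three counts. A sketch, assuming the three counts are already available as integers for each message; the function names are hypothetical.

```python
def dcc_bucket(count):
    """Return the label of the count range, as used in the table, that contains count."""
    if count > 100:
        return "count > 100"
    if count <= 25:
        return "25 >= count"
    upper = -(-count // 25) * 25         # round up to a multiple of 25
    return f"{upper} >= count > {upper - 25}"

def highest_bucket(body, fuz1, fuz2):
    """Range of the algorithm that returned the highest match count."""
    return dcc_bucket(max(body, fuz1, fuz2))
```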

About

On the Origin Of Spam is published as weekly and monthly reports by B. Johannessen in the hope that it will be useful to the anti-spam community.