Introduction

Following a discussion on the Norwegian newsgroup no.it.tjenester.www.design about language tags in different browsers and their default values, I configured db.org to log the contents of the Accept-*: headers. The following is an analysis of this log, based on traffic from Tuesday, February 4th to Tuesday, February 11th, 2003.

Accept-Language

The list below shows the different language tags seen during the week in question. Everything looks mostly as expected, with two notable exceptions. The first is the tag no-bm, which I assume signifies Norwegian Bokmål. According to the User-Agent: header, it is sent by Opera 5.12 running under Windows 95. There is no no-nn tag in the list, but it is reasonable to assume that a browser using no-bm for Norwegian Bokmål also uses no-nn for Norwegian Nynorsk.

The second is the language tag pdf, coming from Mozilla 4.79 running under Windows 98. If someone has an explanation for this, or even a plausible theory, I would like to hear it.

The list was produced with the following command:

cat negotiation.log \
| sed -e 's/^"\(.*\)" "\(.*\)" "\(.*\)" "\(.*\)"/\4/' \
| sed -e 's/,/\
/g' | sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
  • bg
  • da
  • de
  • de-at
  • en
  • en-au
  • en-bz
  • en-ca
  • en-gb
  • en-ie
  • en-jm
  • en-nz
  • en-ph
  • en-tt
  • en-us
  • en-za
  • en-zw
  • es
  • es-mx
  • es-pr
  • fr
  • he
  • ie-ee
  • it
  • ja
  • lt
  • nb
  • nb-no
  • nl
  • nn
  • nn-no
  • no
  • no-bm
  • no-bok
  • no-nyn
  • pdf
  • pl
  • pt-br
  • ru
  • sk
  • sr
  • sv
  • tr
  • zh-cn
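
Tracing an odd value such as no-bm or pdf back to the browser that sent it means looking up the User-Agent field of the matching log lines. The function below is a minimal sketch of one way to do that, not the command I actually used; the name agents_for_value and its arguments are made up for illustration, and it assumes the four-field log format described under Apache Configuration, with fields separated by exactly one space.

```shell
# Hypothetical helper: agents_for_value FIELD VALUE LOGFILE
# Prints the User-Agent of every log line whose FIELD contains VALUE.
#   FIELD: 2 = Accept-Charset, 3 = Accept-Encoding, 4 = Accept-Language
agents_for_value() {
    field=$1
    value=$2
    log=$3
    # Splitting on the three characters `" "` leaves a leading quote on
    # the first field and a trailing quote on the last one.
    awk -F'" "' -v field="$field" -v value="$value" '
    {
        ua = $1;       sub(/^"/, "", ua)
        list = $field; sub(/"$/, "", list)
        n = split(list, items, ",")
        for (i = 1; i <= n; i++) {
            item = items[i]
            sub(/;.*/, "", item)         # drop parameters such as ;q=0.8
            gsub(/^ +| +$/, "", item)    # trim surrounding spaces
            if (tolower(item) == tolower(value)) { print ua; break }
        }
    }' "$log" | sort -u
}
```

Called as `agents_for_value 4 pdf negotiation.log`, this would list each distinct User-Agent that sent pdf as a language tag.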

Accept-Encoding

The selection of encodings is noticeably smaller and mostly as expected. I noticed that Internet Explorer, Opera and Konqueror all send both gzip and x-gzip. x-compress, on the other hand, is used only by robots/spiders, in this case RPT-HTTPClient and NPBot. For those of you who, like me, have never heard of the identity encoding, RFC 2616 offers the following explanation:

The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.

The list was produced with the following command:

cat negotiation.log \
| sed -e 's/^"\(.*\)" "\(.*\)" "\(.*\)" "\(.*\)"/\3/' \
| sed -e 's/,/\
/g' | sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
  • compress
  • deflate
  • gzip
  • identity
  • x-compress
  • x-gzip

Accept-Charset

Another narrow selection. The only interesting thing to note is that Opera, on both Windows and Linux, is the only browser asking for documents in the windows-1252 character set.

The list was produced with the following command:

cat negotiation.log \
| sed -e 's/^"\(.*\)" "\(.*\)" "\(.*\)" "\(.*\)"/\2/' \
| sed -e 's/,/\
/g' | sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
  • ISO-8859-1
  • ISO-8859-15
  • utf-16
  • utf-8
  • windows-1252
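
The three commands used in this analysis differ only in the backreference that selects the header field. As a rough sketch, they could be folded into a single shell function. The name list_values and its arguments are my own invention for illustration, and the function assumes the same four-field log format as the commands above.

```shell
# Hypothetical helper: list_values FIELD LOGFILE
# Lists the distinct values seen in one of the logged header fields.
#   FIELD: 2 = Accept-Charset, 3 = Accept-Encoding, 4 = Accept-Language
list_values() {
    field=$1
    log=$2
    # Keep only the requested field, put each comma-separated value on
    # its own line, then strip parameters (;q=...) and surrounding spaces.
    sed -e 's/^"\(.*\)" "\(.*\)" "\(.*\)" "\(.*\)"$/\'"$field"'/' "$log" \
    | tr ',' '\n' \
    | sed -e 's/;.*$//' -e 's/^ *//' -e 's/ *$//' \
    | grep -v '^$' \
    | sort -f | uniq -i
}
```

With this in place, `list_values 4 negotiation.log` would stand in for the Accept-Language command, and fields 3 and 2 for the other two.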

Apache Configuration

If you would like to perform a similar analysis of your own traffic, the log file can be generated with the following directives in Apache’s httpd.conf.

LogFormat "\"%{User-agent}i\" \"%{Accept-charset}i\" \"%{Accept-encoding}i\" \"%{Accept-language}i\"" negotiation
CustomLog /path/to/logdir/negotiation.log negotiation
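
Because sed passes lines that do not match its pattern through unchanged, a malformed log line would end up in the lists as if it were a header value. Before running the analysis, it may be worth checking that every line really consists of four quoted fields separated by single spaces. The following is only a sketch; the function name check_log is hypothetical.

```shell
# Hypothetical check: check_log LOGFILE
# Prints (with line numbers) every line that is not exactly four
# double-quoted fields separated by single spaces. Lines whose header
# values contain a quote character are reported as well.
check_log() {
    grep -vn '^"[^"]*" "[^"]*" "[^"]*" "[^"]*"$' "$1"
}
```

If `check_log negotiation.log` prints nothing, every line has the expected shape.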