Introduction
Following a discussion on the Norwegian news group no.it.tjenester.www.design about language tags in different browsers and their default values, I configured db.org to log the contents of Accept-*: headers. The following is an analysis of this log, based on traffic from Tuesday February 4th to Tuesday February 11th 2003.
Accept-Language
The list below shows the different language tags seen during the week in question. Most of it is as expected, with two notable exceptions. The first is the tag no-bm, which I assume signifies Norwegian Bokmål. According to the User-Agent: header, this is sent by Opera 5.12 running under Windows 95. There is no no-nn tag in the list, but it would be reasonable to assume that a browser using no-bm for Norwegian Bokmål also uses no-nn for Norwegian Nynorsk.
The second issue is the use of the language tag pdf, coming from Mozilla 4.79 running under Windows 98. If someone has an explanation for this, or even a plausible theory, I would like to know about it.
The list was produced with the following command:
cat negotiation.log \
| sed -e 's/^\"\(.*\)\" \"\(.*\)\" \"\(.*\)\" \"\(.*\)\"/\4/' \
| sed -e 's/,/\
/g'| sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
- bg
- da
- de
- de-at
- en
- en-au
- en-bz
- en-ca
- en-gb
- en-ie
- en-jm
- en-nz
- en-ph
- en-tt
- en-us
- en-za
- en-zw
- es
- es-mx
- es-pr
- fr
- he
- ie-ee
- it
- ja
- lt
- nb
- nb-no
- nl
- nn
- nn-no
- no
- no-bm
- no-bok
- no-nyn
- pdf
- pl
- pt-br
- ru
- sk
- sr
- sv
- tr
- zh-cn
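To trace an unusual tag such as no-bm or pdf back to the browser that sent it, the log can be filtered on the Accept-Language field and the matching User-Agent printed. The following is only a sketch: the function name ua_for_tag is my own invention, and it assumes every log line has exactly the four quoted fields described under Apache Configuration, with no double quotes inside the header values.

```shell
# ua_for_tag TAG [LOGFILE]   (hypothetical helper, not part of the original analysis)
# Print the distinct User-Agent strings whose Accept-Language header
# contains TAG as a complete language tag, compared case-insensitively.
# Assumed line layout: "User-Agent" "Accept-Charset" "Accept-Encoding" "Accept-Language"
ua_for_tag() {
    tag=$1
    file=${2:-negotiation.log}
    awk -F'" "' -v tag="$tag" '
    {
        # $1 still carries the opening quote, $4 the closing quote.
        ua = substr($1, 2)
        langs = tolower(substr($4, 1, length($4) - 1))
        n = split(langs, parts, ",")
        for (i = 1; i <= n; i++) {
            item = parts[i]
            sub(/;.*/, "", item)         # drop any ;q=... parameter
            gsub(/^ +| +$/, "", item)    # trim surrounding spaces
            if (item == tolower(tag)) { print ua; break }
        }
    }' "$file" | sort -u
}
```

Calling ua_for_tag pdf would then list every agent that sent that tag. Matching on the whole tag means a search for no does not also return agents that only sent no-bm.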
Accept-Encoding
The selection of encodings is noticeably smaller and mostly as expected. I noticed that Internet Explorer, Opera and Konqueror all send both gzip and x-gzip. x-compress, on the other hand, is used only by robots/spiders, in this case RPT-HTTPClient and NPBot. For those of you who, like me, have never heard of the identity encoding, RFC 2616 offers the following explanation:
The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.
The list was produced with the following command:
cat negotiation.log \
| sed -e 's/^\"\(.*\)\" \"\(.*\)\" \"\(.*\)\" \"\(.*\)\"/\3/' \
| sed -e 's/,/\
/g'| sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
- compress
- deflate
- gzip
- identity
- x-compress
- x-gzip
Accept-Charset
Another narrow selection. The only interesting thing to note is that Opera, on both Windows and Linux, is the only browser asking for documents in the windows-1252 character set.
The list was produced with the following command:
cat negotiation.log \
| sed -e 's/^\"\(.*\)\" \"\(.*\)\" \"\(.*\)\" \"\(.*\)\"/\2/' \
| sed -e 's/,/\
/g'| sed -e 's/ *\(.*\)/\1/' \
| sed -e 's/^\(.*\);.*/\1/' \
| sort -f | uniq -i
- ISO-8859-1
- ISO-8859-15
- utf-16
- utf-8
- windows-1252
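The three commands used in this analysis differ only in which field of the log line they keep. A possible generalisation is sketched below. It is not the command I actually ran: the function name header_values is made up, tr replaces the sed newline substitution, values are folded to lower case instead of using sort -f and uniq -i, and a count per value is added. It assumes the same four-field quoted layout, with no double quotes inside the header values.

```shell
# header_values FIELD [LOGFILE]   (hypothetical helper)
# FIELD: 2 = Accept-Charset, 3 = Accept-Encoding, 4 = Accept-Language
# Prints "COUNT VALUE" lines, most frequent value first.
header_values() {
    field=$1
    file=${2:-negotiation.log}
    awk -F'" "' -v f="$field" '
    {
        value = $f
        gsub(/"/, "", value)    # outer fields keep one stray quote; remove it
        print value
    }' "$file" |
        tr ',' '\n' |                                     # one value per line
        sed -e 's/;.*//' -e 's/^ *//' -e 's/ *$//' |      # drop ;q=..., trim spaces
        tr '[:upper:]' '[:lower:]' |                      # fold case before counting
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'                            # normalise uniq -c padding
}
```

For example, header_values 4 gives the language tags and header_values 2 the character sets. A header that was not sent at all is logged by Apache as a single hyphen, so a line for "-" in the output counts requests without that header.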
Apache Configuration
If you would like to make a similar analysis on your own traffic, the log file can be generated using the following directives in Apache’s httpd.conf.
LogFormat "\"%{User-agent}i\" \
\"%{Accept-charset}i\" \
\"%{Accept-encoding}i\" \
\"%{Accept-language}i\"" \
negotiation
CustomLog /path/to/logdir/negotiation.log negotiation
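Since the sed expressions used earlier rely on each line consisting of exactly four double-quoted fields separated by single spaces, it may be worth checking a fresh log before analysing it. The sketch below counts the lines that do not fit that shape. The function name count_malformed is hypothetical, and the check is deliberately strict: a header value that itself contains a double quote would be counted as malformed.

```shell
# count_malformed [LOGFILE]   (hypothetical helper)
# Print the number of lines that are NOT of the form
#   "..." "..." "..." "..."
# that is, four quoted fields separated by single spaces.
count_malformed() {
    file=${1:-negotiation.log}
    awk '!/^"[^"]*" "[^"]*" "[^"]*" "[^"]*"$/ { bad++ }
         END { print bad + 0 }' "$file"
}
```

A result of 0 suggests the field extraction will behave as intended; anything else is a hint to look at the offending lines before trusting the lists.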