Collection of DKIM statistics

From: Murray S. Kucherawy <>
Date: Fri, 30 Jul 2010 01:15:58 -0700 (PDT)

The DKIM working group at the IETF has taken up the task of collecting
data about DKIM deployment, including not only what features you use (hence
my previous queries) but also success/failure and deployment rates. The
recent overhaul of the statistics code of OpenDKIM was done with this in
mind.

The stats feature includes the infrastructure to collect this data locally,
in a hash table on your disk. You can use the opendkim-stats tool to query
this information for your own interest, but you can also use it together
with the opendkim-importstats script to move the data from the hash table
into an SQL database, so that your hash table stays a reasonable size and
the data can be analyzed more easily for your own use.
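The import step is essentially a bulk load of per-message records into a
relational table, after which the data can be queried with ordinary SQL.
A minimal sketch using Python's standard sqlite3 module, assuming the
records have already been decoded from the local hash table into
dictionaries. The table name "messages", its columns, and the helper
import_records are hypothetical; they do not reflect the actual schema
used by opendkim-importstats.

```python
import sqlite3

def import_records(conn, records):
    """Load decoded per-message records into a SQL table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               from_domain TEXT,
               client_ip   TEXT,
               sigcount    INTEGER,
               passed      INTEGER
           )"""
    )
    # Named placeholders let each dict be inserted directly.
    conn.executemany(
        "INSERT INTO messages VALUES (:from_domain, :client_ip, :sigcount, :passed)",
        records,
    )
    conn.commit()

# Example records, as they might look after being read out of the hash table.
records = [
    {"from_domain": "example.com", "client_ip": "192.0.2.1", "sigcount": 1, "passed": 1},
    {"from_domain": "example.net", "client_ip": "198.51.100.7", "sigcount": 0, "passed": 0},
    {"from_domain": "example.com", "client_ip": "192.0.2.1", "sigcount": 2, "passed": 1},
]

conn = sqlite3.connect(":memory:")
import_records(conn, records)

# Once the data are in SQL, per-domain questions become one query.
rows = conn.execute(
    "SELECT from_domain, COUNT(*) FROM messages GROUP BY from_domain ORDER BY from_domain"
).fetchall()
print(rows)  # [('example.com', 2), ('example.net', 1)]
```

After the rows are safely in SQL, the corresponding entries can be pruned
from the local hash table, which is what keeps it from growing without
bound.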

What would help the working group's effort is collecting that data from
various OpenDKIM installations whose operators are willing to share it.
The infrastructure to do this exists now and is ready to receive data from
willing participants.

Note that the database does record, for your own use, the domain name in
the From: field of the message (but never the userid) and the IP address
from which the connection originated. In version 2.1.3 and earlier, the
opendkim-stats tool renders this information as-is, meaning those data
are revealed to whoever queries them. The "-a" command-line flag will
instead hash the data so that it cannot be decoded, but two records with
the same data can still be correlated. Although unencoded data are more
useful, we'd be happy to accept your data in either form. There's nothing
funny going on here, and you have the source code anyway to verify all of
this.
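To make the privacy trade-off concrete, here is a minimal Python sketch of
the two ideas in that paragraph: keep only the domain part of the From:
address, and optionally replace a value with a one-way hash so that equal
values still correlate across records. SHA-256 and the helper names
from_domain and anonymize are assumptions for illustration; this is not
necessarily the hashing scheme opendkim-stats itself uses.

```python
import hashlib
from email.utils import parseaddr

def from_domain(from_header):
    """Return only the domain of a From: address; the userid is discarded."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()

def anonymize(value):
    """One-way hash: equal inputs give equal digests, but a digest can't be inverted directly."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

records = [
    ("Alice <alice@Example.COM>", "192.0.2.1"),
    ("bob@example.com", "192.0.2.1"),
    ("carol@example.org", "198.51.100.7"),
]

for header, ip in records:
    domain = from_domain(header)
    # The first two records share a domain and an IP, so their digests match.
    print(domain, anonymize(domain)[:16], anonymize(ip)[:16])
```

One design note: a plain unsalted hash of a small input space, such as the
set of IPv4 addresses, can be reversed by brute force, so a keyed hash
(for example Python's hmac module with a secret key) is the stronger
choice when that matters.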

If you are interested in participating, ensure your OpenDKIM installation
is fully upgraded, is compiled with "--enable-stats", and is configured to
record those statistics. If, after receiving some mail, you can run
opendkim-stats on your database and see data, run this command:

         opendkim-stats -a -c -m

This dumps your database contents, anonymized, into a message and sends
it to me. (Run it without "-m" to see what it will send.) Once I confirm
the data are being collected properly, I'll give you the submission
address, and you can add the command (with a slight adjustment) to your
crontab to upload your data to us at regular intervals. (Note: As of
v2.2.0, the "-a" logic is flipped: you will add it only if you want to
submit your unencoded data, and the default will be to submit anonymized
information.)

Among the information this will allow us to produce:

- how many domains are not signing
- how many domains are signing
- how many domains are signing with multiple signatures
- how many signatures are surviving
- which canonicalizations are popular
- whether failures can be associated with header changes or body changes
- use of various signature features (t=, l=, x=, z=)
- how many messages that use "l=" had content appended beyond the signed length
- use of key features (g=, t=)
- count of keys that had syntax errors
- count of syntactically invalid signatures
- count of third-party signatures
- count of Received: header fields on signed mail
- how many messages you get that appear to come via lists
- signature survival on list mail vs. regular mail
- how many domains advertise ADSP records, which ones, and pass/fail rates
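Most of these figures reduce to simple aggregation over per-message
records. A minimal Python sketch, assuming a hypothetical record layout: a
From: domain plus a list of signatures, each carrying its d= domain, its
c= canonicalization, a pass/fail result, and whether the body hash matched.
It uses a common heuristic: a failed signature whose body hash still
matched points to a header change rather than a body change. Treating a
signature as third-party whenever its d= domain differs from the From:
domain is also a simplification. The classes and summarize are
illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Signature:
    domain: str          # d= tag
    canon: str           # c= tag, e.g. "relaxed/simple"
    passed: bool         # overall verification result
    body_hash_ok: bool   # did the computed body hash match bh=?

@dataclass
class Message:
    from_domain: str
    signatures: list = field(default_factory=list)

def summarize(messages):
    all_domains, signing, multi = set(), set(), set()
    canon, failures = Counter(), Counter()
    survived = total = third_party = 0
    for msg in messages:
        all_domains.add(msg.from_domain)
        if msg.signatures:
            signing.add(msg.from_domain)  # domain counts as signing if any message was signed
        if len(msg.signatures) > 1:
            multi.add(msg.from_domain)
        for sig in msg.signatures:
            total += 1
            canon[sig.canon] += 1
            if sig.domain != msg.from_domain:
                third_party += 1
            if sig.passed:
                survived += 1
            elif sig.body_hash_ok:
                failures["header"] += 1  # body intact, so a header change likely broke it
            else:
                failures["body"] += 1
    return {
        "signing_domains": len(signing),
        "non_signing_domains": len(all_domains - signing),
        "multi_signature_domains": len(multi),
        "signatures_survived": survived,
        "signatures_total": total,
        "third_party_signatures": third_party,
        "canonicalizations": canon.most_common(),
        "failure_causes": dict(failures),
    }

msgs = [
    Message("example.com", [Signature("example.com", "relaxed/relaxed", True, True)]),
    Message("example.com", [
        Signature("example.com", "relaxed/relaxed", False, True),
        Signature("lists.example.org", "simple/simple", True, True),
    ]),
    Message("example.net", []),
    Message("example.org", [Signature("example.org", "simple/simple", False, False)]),
]

stats = summarize(msgs)
for key, value in stats.items():
    print(key, value)
```

The same loop extends naturally to the other items, for example by
counting Received: fields or tags such as l= and x= per signature.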

Looking forward to getting more data than just our own!

Received on Fri Jul 30 2010 - 08:16:20 PST

This archive was generated by hypermail 2.3.0 : Mon Oct 29 2012 - 23:19:47 PST