HTTP Content Signature Header
-----------------------------

Version:
2008-06-29

Status:
Brainstorming.

Overview:

In a world where some ISPs are starting to fiddle with the contents
of data downloaded by their customers[1], it'd be useful to be able to
detect any manipulation of data and to alert the user.

Currently there is a Content-MD5 header to verify that data is delivered
unchanged, but it's easy for a malicious party with access to the
network to also alter the MD5 sum when altering content data. To prevent
this it's necessary to be able to verify the origin of the data.

The only current way to prevent this is to use HTTPS for all transfers,
but that requires more resources on the server side, especially for sites
with lots of traffic.

Using OpenPGP[2] signatures it's possible to use plain old HTTP and still
be able to detect altered data. There is some initial network overhead
since public keys will need to be downloaded, but some keys could be
distributed with the browser (for example keys for Google, MSN and a
few more high traffic sites) and once a key has been downloaded it stays
in the local cache.

Using this approach, it's possible for caches along the way to store
data, and it's possible to use pools of web servers with different IP
addresses. If the signatures are pre-computed and stored alongside the
content on the origin servers, these won't even have to have access to
the private key.

Basic structure of header:

Content-Signature: [base64 encoded signature];[base64 encoded signature checksum]

Example header:

Content-Signature: iD8DBQFIYQCSi0P7OS4VvkwRAm7nAKC1Ra4RmhtgPFEIckxu0uACoVWVIwCg0u2B5u2gS2tSO7LXagplAF+AwI0=;=FfiF

Compare to a normal PGP signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

<html><body></body></html>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iD8DBQFIYQCSi0P7OS4VvkwRAm7nAKC1Ra4RmhtgPFEIckxu0uACoVWVIwCg0u2B
5u2gS2tSO7LXagplAF+AwI0=
=FfiF
-----END PGP SIGNATURE-----
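
The mapping between the header and the armored signature can be sketched in
Python as below. This is only an illustration: the function names are made up
for this example, and it assumes that the checksum part is the CRC-24 that
OpenPGP ASCII armor appends to a signature block, which the example above
suggests but this draft does not yet spell out.

```python
import base64

# CRC-24 parameters used by OpenPGP ASCII armor (RFC 4880, section 6.1).
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB


def crc24(data: bytes) -> int:
    """Compute the OpenPGP armor CRC-24 of data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def parse_content_signature(value: str) -> tuple[str, str]:
    """Split a Content-Signature value into (signature, checksum)."""
    signature_b64, sep, checksum = value.strip().partition(";")
    if not sep or not signature_b64 or not checksum.startswith("="):
        raise ValueError("malformed Content-Signature value")
    return signature_b64, checksum


def checksum_matches(signature_b64: str, checksum: str) -> bool:
    """Check the checksum against the decoded signature (assumed CRC-24)."""
    raw = base64.b64decode(signature_b64)
    expected = base64.b64encode(crc24(raw).to_bytes(3, "big")).decode("ascii")
    return checksum == "=" + expected


def to_armored(signature_b64: str, checksum: str) -> str:
    """Rebuild an armored signature block that OpenPGP tools can read."""
    lines = [signature_b64[i:i + 64] for i in range(0, len(signature_b64), 64)]
    return "\n".join([
        "-----BEGIN PGP SIGNATURE-----",
        "",
        *lines,
        checksum,
        "-----END PGP SIGNATURE-----",
        "",
    ])
```

A client could pass the rebuilt block, together with the response body, to any
OpenPGP implementation for the actual verification. The checksum only guards
against transmission errors; it adds no security on its own.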

Verifying that a key is allowed to sign content:

In the signature the signing key is identified using a key id. Before trusting
the signature, the browser needs to verify that the key is indeed associated with
the web site in question. This is fairly tricky, since an unencrypted network
connection cannot be used to verify it (remember, we're assuming an evil ISP).

To get around this problem, we add an additional HTTP header Signature-Keys
that points to a list of trusted keys that can be downloaded over an HTTPS
connection. By doing this, we can be sure that we get a valid list (assuming we
trust HTTPS connections).

E.g.: Signature-Keys: https://keys.someserver.com/trusted_keys.txt

The list of trusted keys has to be located on the same host as the content to be
verified, or one of its super domains. This means that google.com could be used
for any *.google.com domain, but not for microsoft.com.
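
The location rule could be checked roughly as follows. This is a sketch, not a
definitive implementation: the function name is invented here, and the check
does not guard against overly broad super domains such as a bare "com", which a
real client would probably have to reject.

```python
from urllib.parse import urlsplit


def key_list_url_allowed(key_list_url: str, content_host: str) -> bool:
    """Return True if the key list may vouch for content on content_host.

    The list must be served over HTTPS from the content host itself or from
    one of its super domains.
    """
    parts = urlsplit(key_list_url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    list_host = parts.hostname.lower().rstrip(".")
    host = content_host.lower().rstrip(".")
    return host == list_host or host.endswith("." + list_host)
```

With this check, a list at https://google.com/ is accepted for www.google.com
but rejected for microsoft.com, and any list served over plain HTTP is
rejected.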

To allow users without access to HTTPS-enabled servers to sign content, it might
be necessary to allow certain exceptions to this rule so that a third party can
maintain key lists for non-HTTPS domains. Only a small number of such exceptions can
be allowed, and the exceptions must be clearly defined.

The structure of the key list is simple: each line consists of a trust level, a key id,
and a domain pattern. Allowed trust levels are 'trusted' and 'untrusted'. Listing a key
as untrusted can signify that it was once trusted but no longer is, or it can be used to
assign a set of keys to different parts of a domain.

E.g. https://someserver.com/trusted_keys.txt:
untrusted AABBCC11 *.someserver.com # old key that got leaked
trusted BBCC33DD www.someserver.com # web key
trusted CC9988EE *.someserver.com # generic key
untrusted CC9988EE images.someserver.com # generic key isn't trusted to sign images

Text after a # sign is ignored as a comment. The domain pattern is very basic: it's
either a specific domain, or a domain prefixed with "*.", meaning that it's valid for
that domain and all sub-domains.
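
A minimal Python sketch of parsing and evaluating such a list is shown below.
The names are illustrative only. The draft does not say how conflicting lines
are resolved, so the sketch assumes that any matching 'untrusted' line
overrides matching 'trusted' lines, which is consistent with the example list
above.

```python
from typing import NamedTuple


class KeyRule(NamedTuple):
    level: str    # 'trusted' or 'untrusted'
    key_id: str   # key id, normalised to upper case
    pattern: str  # domain pattern, normalised to lower case


def parse_key_list(text: str) -> list[KeyRule]:
    """Parse a key list, ignoring blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] not in ("trusted", "untrusted"):
            raise ValueError(f"malformed key list line: {line!r}")
        level, key_id, pattern = fields
        rules.append(KeyRule(level, key_id.upper(), pattern.lower()))
    return rules


def pattern_matches(pattern: str, host: str) -> bool:
    """'*.example.com' matches example.com and all of its sub-domains."""
    host = host.lower()
    if pattern.startswith("*."):
        base = pattern[2:]
        return host == base or host.endswith("." + base)
    return host == pattern


def key_is_trusted(rules: list[KeyRule], key_id: str, host: str) -> bool:
    """Assumption: a matching 'untrusted' line wins over 'trusted' lines."""
    levels = [
        rule.level
        for rule in rules
        if rule.key_id == key_id.upper() and pattern_matches(rule.pattern, host)
    ]
    return "trusted" in levels and "untrusted" not in levels
```

For the example list, key CC9988EE would be accepted for www.someserver.com
but not for images.someserver.com, and AABBCC11 would not be accepted anywhere.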

Once a list of trusted keys has been downloaded for a domain, it can be cached for as
long as allowed by standard HTTP cache rules. Future downloads from the same domain
can use the cached version of the list, assuming that the new content uses one of
the keys in the list.

Alternatively, the Signature-Keys header can be sent in HTML content using a meta tag.

E.g. contents of URL http://www.someserver.com/index.html:
<html>
<head>
<meta http-equiv="Signature-Keys"
content="https://someserver.com/trusted_keys.txt" />
</head>
...
</html>
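
A client could pick up the meta tag along these lines. This is a sketch using
Python's standard html.parser module; the class and function names are
hypothetical, and it simply takes the first matching tag it finds.

```python
from html.parser import HTMLParser
from typing import Optional


class SignatureKeysFinder(HTMLParser):
    """Remember the content of the first Signature-Keys meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us; self-closing
        # tags such as <meta ... /> also end up here by default.
        if tag != "meta" or self.url is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "signature-keys":
            self.url = attributes.get("content")


def find_signature_keys(html: str) -> Optional[str]:
    """Return the key list URL named in the HTML, or None if there is none."""
    finder = SignatureKeysFinder()
    finder.feed(html)
    return finder.url
```

The URL found this way would still be subject to the same HTTPS and location
rules as one received in the Signature-Keys header.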

Note that the Content-Signature header cannot be sent as a meta tag, since that would
require the signature to sign itself. It could be argued that a signature could be
calculated with the content attribute of the meta tag set to "", but that would be
somewhat complicated in practice. It might be worth the effort though, to allow
signing HTML content stored on servers that lack Content-Signature support.

Implementation:

The client part of the system can be implemented as a browser plugin in browsers
that support plugins, or of course natively in the browser. The main thing to think
about here is how to alert the user if there is a signature mismatch.

On the server side the best course of action depends a bit on the size and nature
of the content to be signed. For static files it's trivial to calculate and store
the signatures as files are retrieved. For dynamic content it's not as simple, but
for small to medium size content it's possible to store the output of a script in
RAM and generate the signature before data is sent to the client. For larger content
it will be up to the script to generate and add the signature.
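
For the buffered case, the server-side step might look like the sketch below.
The `sign` callable is a placeholder for whatever produces an ASCII-armored
signature for the buffered output (for instance a wrapper around
`gpg --armor --detach-sign`); both it and the function names are assumptions
made for this example, not part of the proposal.

```python
from typing import Callable

BEGIN = "-----BEGIN PGP SIGNATURE-----"
END = "-----END PGP SIGNATURE-----"


def armored_to_header_value(armored: str) -> str:
    """Turn an armored signature into a Content-Signature header value."""
    lines = [line.strip() for line in armored.strip().splitlines()]
    block = lines[lines.index(BEGIN) + 1:lines.index(END)]
    # Armor headers such as "Version: ..." end at the first blank line.
    if "" in block:
        block = block[block.index("") + 1:]
    *data_lines, checksum = block
    if not data_lines or len(checksum) != 5 or not checksum.startswith("="):
        raise ValueError("armored signature has no checksum line")
    return "".join(data_lines) + ";" + checksum


def content_signature_header(
    body: bytes, sign: Callable[[bytes], str]
) -> tuple[str, str]:
    """Sign a fully buffered response body and build the header for it."""
    return "Content-Signature", armored_to_header_value(sign(body))
```

For static files the same conversion could be run once per file and the
resulting header value stored next to the content, so that the web servers
themselves never need the private key.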

References:

[1] http://www.theregister.co.uk/2008/06/23/topolski_takes_on_nebuad/
[2] http://www.openpgp.org/

Document maintainer:

mikael@eiman.tv