HTTP Content Signature Header
-----------------------------

Version:
    2008-06-29


Status:
    Brainstorming.


Overview:

    In a world where some ISPs are starting to fiddle with the contents
    of data downloaded by their customers[1], it'd be useful to be able to
    detect any manipulation of data and to alert the user.
    
    Currently there is a Content-MD5 header to verify that data is delivered
    unchanged, but it's easy for a malicious party with access to the
    network to also alter the MD5 sum when altering content data. To prevent
    this it's necessary to be able to verify the origin of the data.
    
    The only current way to prevent this is to use HTTPS for all transfers,
    but that requires more resources on the server side, especially for sites
    with lots of traffic.
    
    Using OpenPGP[2] signatures it's possible to use plain old HTTP and still
    be able to detect altered data. There is some initial network overhead
    since public keys will need to be downloaded, but some keys could be
    distributed with the browser (for example keys for Google, MSN and a
    few more high traffic sites) and once a key has been downloaded it stays
    in the local cache.
    
    Using this approach, it's possible for caches along the way to store
    data, and it's possible to use pools of web servers with different IP
    addresses. If the signatures are pre-computed and stored alongside the
    content on the origin servers, those servers never need access to the
    private key.


Basic structure of header:

    Content-Signature: [base64 encoded signature];[base64 encoded signature checksum]
    
    Example header:
    
        Content-Signature: iD8DBQFIYQCSi0P7OS4VvkwRAm7nAKC1Ra4RmhtgPFEIckxu0uACoVWVIwCg0u2B5u2gS2tSO7LXagplAF+AwI0=;=FfiF
    
    Compare to a normal PGP signature:

        -----BEGIN PGP SIGNED MESSAGE-----
        Hash: SHA1

        <html><body></body></html>
        -----BEGIN PGP SIGNATURE-----
        Version: GnuPG v1.4.7 (Darwin)

        iD8DBQFIYQCSi0P7OS4VvkwRAm7nAKC1Ra4RmhtgPFEIckxu0uACoVWVIwCg0u2B
        5u2gS2tSO7LXagplAF+AwI0=
        =FfiF
        -----END PGP SIGNATURE-----
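
    The mapping between the two forms can be sketched in a few lines of Python.
    This is only an illustration, assuming the header value is simply the
    armored signature lines joined together, followed by ";" and the armor
    checksum line. The function names are made up for this example and are
    not part of the proposal.

```python
BEGIN = "-----BEGIN PGP SIGNATURE-----"
END = "-----END PGP SIGNATURE-----"


def armor_to_header_value(armored: str) -> str:
    """Turn an ASCII-armored signature into a Content-Signature value."""
    lines = [line.strip() for line in armored.strip().splitlines()]
    block = lines[lines.index(BEGIN) + 1 : lines.index(END)]
    # Armor headers such as "Version:" end at the first blank line.
    if "" in block:
        block = block[block.index("") + 1 :]
    # Base64 data lines never start with "=", the checksum line always does.
    data = "".join(l for l in block if l and not l.startswith("="))
    checksum = next(l for l in block if l.startswith("="))
    return f"{data};{checksum}"


def header_value_to_armor(value: str) -> str:
    """Rebuild an armored signature block from a Content-Signature value."""
    data, checksum = value.strip().split(";", 1)
    # Re-wrap the base64 data at 64 characters, as in the example above.
    wrapped = [data[i : i + 64] for i in range(0, len(data), 64)]
    return "\n".join([BEGIN, "", *wrapped, checksum, END])
```

    A client would rebuild the armored block from the header with
    header_value_to_armor() and hand it, together with the response body,
    to an OpenPGP implementation for the actual verification.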


Verifying that a key is allowed to sign content:

    In the signature the signing key is identified using a key id. Before trusting
    the signature, the browser needs to verify that the key is indeed associated with
    the web site in question. This is fairly tricky, since an unencrypted network
    connection cannot be used to verify it (remember, we're assuming an evil ISP).
    
    To get around this problem, we add an additional HTTP header, Signature-Keys,
    that points to a list of trusted keys that can be downloaded over an HTTPS
    connection. By doing this, we can be sure that we get a valid list (assuming we
    trust HTTPS connections).
    
    E.g.: Signature-Keys: https://someserver.com/trusted_keys.txt
    
    The list of trusted keys has to be located on the same host as the content to be 
    verified, or one of its super domains. This means that google.com could be used 
    for any *.google.com domain, but not for microsoft.com.
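
    The location rule can be sketched as a small check on the client side. This
    is a rough illustration only: the function name is made up, and it does not
    deal with the exceptions discussed below or with overly broad super domains
    such as "com", which a real client would have to reject.

```python
from urllib.parse import urlsplit


def key_list_url_allowed(content_host: str, key_list_url: str) -> bool:
    """Check that a key list lives on the content host or a super domain."""
    parts = urlsplit(key_list_url)
    # The list must be fetched over HTTPS, otherwise it can't be trusted.
    if parts.scheme != "https" or not parts.hostname:
        return False
    content = content_host.lower().rstrip(".")
    keys = parts.hostname.lower().rstrip(".")
    # Same host, or the key list host is a parent domain of the content host.
    # Assumption: no public-suffix check is done in this sketch.
    return content == keys or content.endswith("." + keys)
```

    With this check, content from www.google.com may point at
    https://google.com/trusted_keys.txt, while content from www.microsoft.com
    may not.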
    
    To allow users without access to HTTPS-enabled servers to sign content, it might
    be necessary to allow certain exceptions to this rule so that a third party can
    maintain key lists for non-HTTPS domains. Only a small number of such exceptions can
    be allowed, and the exceptions must be clearly defined.
    
    The structure of the key list is simple: Each line consists of a trust level, a key id,
    and a domain pattern. Allowed trust levels are 'trusted' and 'untrusted'. Listing a key
    as untrusted can signify that it was once trusted but no longer is, or it can be used to
    assign a set of keys to different parts of a domain.
    
    E.g. https://someserver.com/trusted_keys.txt:
      untrusted AABBCC11 *.someserver.com       # old key that got leaked
      trusted   BBCC33DD www.someserver.com     # web key
      trusted   CC9988EE *.someserver.com       # generic key
      untrusted CC9988EE images.someserver.com  # generic key isn't trusted to sign images
    
    Text after a # sign is ignored as a comment. The domain pattern is very basic: it's
    either a specific domain, or a domain prefixed with "*.", meaning that it's valid for
    that domain and all sub-domains.
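
    A parser for this format could look roughly like the sketch below. Note
    that this document doesn't say how conflicting lines are resolved, so the
    sketch makes an assumption: a key is accepted for a host only if at least
    one line matches and none of the matching lines says 'untrusted'. Skipping
    malformed lines is also an assumption, and all names are made up for the
    example.

```python
from typing import List, NamedTuple


class KeyRule(NamedTuple):
    trusted: bool
    key_id: str
    pattern: str


def parse_key_list(text: str) -> List[KeyRule]:
    """Parse 'level key-id pattern' lines, ignoring # comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] not in ("trusted", "untrusted"):
            continue  # assumption: malformed lines are skipped
        level, key_id, pattern = fields
        rules.append(KeyRule(level == "trusted", key_id.upper(), pattern.lower()))
    return rules


def pattern_matches(pattern: str, host: str) -> bool:
    """'*.' patterns match the domain itself and all sub-domains."""
    host = host.lower()
    if pattern.startswith("*."):
        base = pattern[2:]
        return host == base or host.endswith("." + base)
    return host == pattern


def key_is_trusted(rules: List[KeyRule], key_id: str, host: str) -> bool:
    """Assumed precedence: any matching 'untrusted' line wins."""
    matching = [
        r for r in rules
        if r.key_id == key_id.upper() and pattern_matches(r.pattern, host)
    ]
    return bool(matching) and all(r.trusted for r in matching)
```

    Applied to the example list, key CC9988EE is accepted for
    www.someserver.com but rejected for images.someserver.com, and AABBCC11
    is rejected everywhere.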
    
    Once a list of trusted keys has been downloaded for a domain, it can be cached for as
    long as allowed by standard HTTP cache rules. Future downloads from the same domain 
    can use the cached version of the list, assuming that the new content uses one of 
    the keys in the list.
    
    Alternatively, the Signature-Keys header can be sent in HTML content using a meta tag.
    
    E.g. contents of URL http://www.someserver.com/index.html:
      <html>
        <head>
          <meta http-equiv="Signature-Keys" 
                content="https://someserver.com/trusted_keys.txt" />
        </head>
        ...
      </html>
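
    On the client side, picking up the meta tag could be done along these
    lines. This is just a sketch using Python's standard html.parser module;
    the class and function names are made up, and using only the first
    matching tag is an assumption.

```python
from html.parser import HTMLParser
from typing import Optional


class SignatureKeysFinder(HTMLParser):
    """Remember the content of the first Signature-Keys meta tag."""

    def __init__(self):
        super().__init__()
        self.url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.url is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "signature-keys":
            self.url = attributes.get("content")


def find_signature_keys(html: str) -> Optional[str]:
    """Return the key list URL from the HTML, or None if there is none."""
    finder = SignatureKeysFinder()
    finder.feed(html)
    finder.close()
    return finder.url
```

    The URL found this way still has to pass the same HTTPS and domain checks
    as one delivered in a real HTTP header.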
    
    Note that the Content-Signature header cannot be sent as a meta tag, since that would
    require the signature to sign itself. It could be argued that a signature could be
    calculated with the content attribute of the meta tag set to "", but that would be
    somewhat complicated in practice. It might be worth the effort though, to allow
    signing HTML content stored on servers that lack Content-Signature support.
    

Implementation:
    
    The client part of the system can be implemented as a browser plugin in browsers
    that support plugins, or of course natively in the browser. The main thing to think
    about here is how to alert the user if there is a signature mismatch.
    
    On the server side the best course of action depends a bit on the size and nature
    of the content to be signed. For static files it's trivial to calculate and store
    the signatures as files are retrieved. For dynamic content it's not as simple, but
    for small to medium size content it's possible to store the output of a script in
    RAM and generate the signature before data is sent to the client. For larger content
    it will be up to the script to generate and add the signature.
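
    For the small to medium size case, buffering and signing could be done in a
    thin layer around the script. The sketch below assumes a Python WSGI
    application, which this document doesn't otherwise prescribe. The sign
    argument is a hypothetical callable that takes the complete body and
    returns the Content-Signature value, for example by calling out to an
    OpenPGP implementation.

```python
from typing import Callable


def signing_middleware(app, sign: Callable[[bytes], str]):
    """Wrap a WSGI app: buffer its output in RAM, then add the signature."""

    def wrapped(environ, start_response):
        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            # Hold back the headers until the whole body is known.
            captured["status"] = status
            captured["headers"] = list(headers)
            return chunks.append  # WSGI's legacy write() callable

        result = app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        body = b"".join(chunks)
        headers = [
            (name, value) for name, value in captured["headers"]
            if name.lower() != "content-length"
        ]
        headers.append(("Content-Length", str(len(body))))
        headers.append(("Content-Signature", sign(body)))
        start_response(captured["status"], headers)
        return [body]

    return wrapped
```

    The obvious drawback is that nothing is sent to the client until the
    script has finished, which is why this only makes sense for content that
    comfortably fits in memory.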


References:

    [1] http://www.theregister.co.uk/2008/06/23/topolski_takes_on_nebuad/
    [2] http://www.openpgp.org/

Document maintainer:

    mikael@eiman.tv