It is interesting to examine the HTTP headers of requests and
responses flowing through the caches. To get this information, I
temporarily modified Squid to write a short binary record that
indicates which headers are present. I also tracked the
Cache-control directives.
The headers log file does not include URLs, so I cannot eliminate the popularity effects. There is one entry for each request from and each response to a client, so this data is from the client’s point of view.
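The exact record layout written by the modified Squid is not given here. As a rough sketch of the idea, assuming one fixed-size bitmask per message and a hypothetical, shortened header list, such a log could be written and tallied like this (the names `HEADERS`, `encode_record`, and `tally` are illustrative, not Squid's):

```python
import struct
from collections import Counter

# Hypothetical header list; the real instrumentation's set and bit order are unknown.
HEADERS = ["Host", "User-Agent", "Via", "Accept", "Cache-Control", "X-Forwarded-For"]
BIT = {name.lower(): i for i, name in enumerate(HEADERS)}

def encode_record(header_names):
    """Pack the set of headers present in one message into a 4-byte bitmask."""
    mask = 0
    for name in header_names:
        bit = BIT.get(name.lower())  # header names are case-insensitive
        if bit is not None:
            mask |= 1 << bit
    return struct.pack("<I", mask)

def tally(log_bytes):
    """Return the percent occurrence of each header over all records."""
    total = len(log_bytes) // 4
    if total == 0:
        return {}
    counts = Counter()
    for (mask,) in struct.iter_unpack("<I", log_bytes):
        for i, name in enumerate(HEADERS):
            if mask & (1 << i):
                counts[name] += 1
    return {name: 100.0 * counts[name] / total for name in HEADERS}

log = encode_record(["Host", "Via"]) + encode_record(["host"])
percentages = tally(log)
```

Because each record carries only presence bits and no URL, a log like this can report how often a header occurs but cannot say which objects the requests were for.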
Table A-2 lists the request headers and
their frequency of occurrence. It’s important to keep in mind
that most of these requests come from child caches, not from
web browsers. Furthermore, most of the child caches are also
running Squid. Evidence of this is seen in the occurrence of
Via and X-Forwarded-For
headers. Both of these are added by proxies, and the latter
is an extension header used by Squid. According to this
data, around 99% of all requests come from child caches.
Table A-2. Client Request Headers (IRCache Data)
Header | % Occurrence | Header | % Occurrence |
---|---|---|---|
Host | 99.91 | Range | 0.46 |
User-Agent | 99.21 | Connection | 0.26 |
Via | 98.90 | From | 0.24 |
Accept | 98.84 | Date | 0.18 |
Cache-Control | 98.34 | Proxy-Authorization | 0.07 |
X-Forwarded-For | 98.19 | Request-Range | 0.06 |
Accept-Language | 91.33 | If-Range | 0.05 |
Referer | 85.00 | Expires | 0.02 |
Accept-Encoding | 82.60 | Mime-Version | 0.01 |
Proxy-Connection | 78.46 | Content-Encoding | 0.00 |
Cookie | 39.18 | Location | 0.00 |
Accept-Charset | 28.77 | If-Match | 0.00 |
If-Modified-Since | 24.83 | X-Cache | 0.00 |
Pragma | 13.18 | Age | 0.00 |
Other | 5.82 | Last-Modified | 0.00 |
Authorization | 1.41 | Server | 0.00 |
Content-Type | 1.00 | ETag | 0.00 |
Content-Length | 0.84 | Accept-Ranges | 0.00 |
If-None-Match | 0.61 | Set-Cookie | 0.00 |
The Referer and From
headers are interesting for their privacy implications.
Fortunately, very few requests include the
From header. Referer is
quite common, but it is less of a threat to privacy.
The data indicates that about 25% of all requests are cache
validations. Most of these are
If-Modified-Since requests, and a small
number are If-None-Match requests. Note that Squid
does not support ETag-based validation at this time.
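As an illustration of how a request might be counted as a cache validation, here is a minimal sketch that checks for the two conditional headers named above. The function names are hypothetical, and other conditional headers such as If-Match and If-Range are deliberately not counted:

```python
# Conditional request headers treated as cache validations in this sketch.
VALIDATION_HEADERS = ("if-modified-since", "if-none-match")

def validators_requested(header_names):
    """Return the validation headers present in a request, in a fixed order."""
    present = {name.lower() for name in header_names}
    return [h for h in VALIDATION_HEADERS if h in present]

def is_validation(header_names):
    """True if the request carries at least one validation header."""
    return bool(validators_requested(header_names))
```

A request that carries both headers is counted once by `is_validation`, whereas a per-header tally such as Table A-2 counts it under each header.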
Table A-3 lists the Cache-control
directives found in the same set of
requests. The max-age directive occurs
often because Squid always adds this header when forwarding a
request to a neighbor cache. The
only-if-cached directives come from caches
configured in a sibling relationship. (The
only-if-cached directive instructs the
sibling not to forward the request if it is a cache miss.)
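To make the directive handling concrete, the following sketch splits a Cache-control request header into its directives and applies the only-if-cached rule. It is a simplification: it does not handle quoted arguments that contain commas, the function names are hypothetical, and the max-age value in the example is made up for illustration.

```python
def parse_cache_control(value):
    """Split a Cache-control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; a bare directive has no argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives

def may_forward_on_miss(directives):
    """A cache must not forward a miss when the client sent only-if-cached."""
    return "only-if-cached" not in directives

# Illustrative header value, not taken from the traces.
request_directives = parse_cache_control("max-age=259200, only-if-cached")
```

Under RFC 2616, a cache that cannot satisfy an only-if-cached request from its own store answers with a 504 (Gateway Timeout) status rather than contacting another server.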
Table A-4 lists the HTTP reply
headers. X-Cache is an extension header
that Squid uses for debugging. Its value is either
HIT or MISS to
indicate whether the reply came from a cached response.
Table A-4. Client Reply Headers (IRCache Data)
Header | % Occurrence | Header | % Occurrence |
---|---|---|---|
X-Cache | 100.00 | Warning | 0.03 |
Proxy-Connection | 99.88 | Content-Language | 0.02 |
Date | 95.20 | WWW-Authenticate | 0.02 |
Content-Type | 84.94 | Title | 0.01 |
Server | 82.49 | Content-Base | 0.01 |
Content-Length | 65.67 | Location | 0.00 |
Last-Modified | 65.61 | Referer | 0.00 |
ETag | 53.07 | Content-MD5 | 0.00 |
Accept-Ranges | 48.06 | From | 0.00 |
Age | 24.28 | Host | 0.00 |
Cache-Control | 10.36 | Public | 0.00 |
Expires | 10.30 | Upgrade | 0.00 |
Pragma | 3.13 | X-Request-URI | 0.00 |
Set-Cookie | 3.04 | Cookie | 0.00 |
Other | 2.99 | Accept-Charset | 0.00 |
Mime-Version | 1.62 | User-Agent | 0.00 |
Via | 0.80 | Retry-After | 0.00 |
Vary | 0.66 | Accept-Language | 0.00 |
Link | 0.53 | Authorization | 0.00 |
Content-Location | 0.28 | Range | 0.00 |
Content-Encoding | 0.28 | Accept-Encoding | 0.00 |
Allow | 0.19 | X-Forwarded-For | 0.00 |
Connection | 0.12 | If-Modified-Since | 0.00 |
Accept | 0.04 | Content-Range | 0.00 |
The Date header is important for caching.
RFC 2616 says that every response must have a Date header,
with few exceptions.
Here we see it in about 95% of replies, which is pretty good.
Content-length occurs in only about two-thirds of responses
(65.67%). This is
unfortunate, because when a client (including proxies) doesn’t
know how long the message should be, it’s difficult to detect
partial responses due to network problems. The missing
Content-length header also prevents a connection from being
persistent, unless the agents use chunked encoding.
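The persistence rule can be sketched as a small check on the reply headers. This is a simplified illustration, not Squid's logic: the function name is hypothetical, and it ignores responses that never carry a body (such as 204 and 304), replies to HEAD requests, and an explicit Connection: close.

```python
def length_is_delimited(headers):
    """Return True if the recipient can tell where the response body ends
    without waiting for the connection to close.

    headers: dict mapping response header names to values.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Chunked transfer coding marks the end of the body explicitly.
    if "chunked" in lowered.get("transfer-encoding", "").lower():
        return True
    # Otherwise the body length must be announced up front.
    return "content-length" in lowered
```

For example, `length_is_delimited({"Content-Type": "text/html"})` is false: the body ends only when the server closes the connection, so the connection cannot be reused and a truncated reply looks the same as a complete one.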
Table A-5 lists the Cache-control
reply directives present in the
responses sent to cache clients. As you can see,
no-cache and private are
the most popular directives. The fact that both occur in 4.6%
of responses leads me to believe they probably always occur
together. max-age is the only other
directive that occurs in more than 1% of responses. The
“Other” entry refers to unknown or nonstandard directives.
Table A-5. Cache-control Reply Directives (IRCache Data)
Directive | % Occurrence | Directive | % Occurrence |
---|---|---|---|
no-cache | 4.60 | no-store | 0.06 |
private | 4.60 | no-transform | 0.02 |
max-age | 2.69 | s-maxage | 0.00 |
must-revalidate | 0.23 | proxy-revalidate | 0.00 |
Other | 0.11 | only-if-cached | 0.00 |
public | 0.09 | | |
If we want to find the percentage of responses that
have an expiration time, we need to know how often the
Expires header and max-age directive
occur separately and together.
Table A-6 shows the percentage of
responses that have one, the other, neither, and both of
these headers. In these traces, 89.65% of responses have
neither header, which means that only 10.35% have an expiration
value.
You can see that the Expires header is still
more popular than the max-age directive, and
that max-age almost never appears alone.
Table A-6. Responses with Explicit Expiration Times (IRCache Data)
Header/Directive | % Occurrence |
---|---|
Neither Expires nor max-age | 89.65 |
Expires only | 7.65 |
Both | 2.64 |
max-age only | 0.05 |
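The arithmetic behind Table A-6 can be checked in a few lines. The values are copied from the table; the variable names are only for illustration.

```python
# Percentages copied from Table A-6.
neither = 89.65
expires_only = 7.65
both = 2.64
max_age_only = 0.05

# Responses with any explicit expiration information.
any_expiration = expires_only + both + max_age_only

# Totals for each mechanism, counting the overlap in both.
expires_total = expires_only + both
max_age_total = max_age_only + both

print(f"any expiration: {any_expiration:.2f}%")
print(f"Expires total:  {expires_total:.2f}%")
print(f"max-age total:  {max_age_total:.2f}%")
```

The three rows sum to 10.34%, which differs from 100 − 89.65 = 10.35% only by rounding. The derived totals, 10.29% for Expires and 2.69% for max-age, agree with the 10.30% and 2.69% reported in Tables A-4 and A-5.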
The analysis is similar for cache validators, although the
results in Table A-7 are more
encouraging: 77.04% of all responses
sent to clients have a cache validator.
Last-modified is still more popular than the
ETag header, although a significant percentage
(11.43%) of responses carry only the ETag validator.
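The remaining validator breakdown can be derived by inclusion-exclusion from figures already quoted. This is a consistency check rather than a copy of Table A-7: it assumes that the Last-Modified and ETag totals in Table A-4 and the two percentages in the text describe the same set of responses.

```python
# Totals from Table A-4 and figures quoted in the text.
last_modified_total = 65.61
etag_total = 53.07
etag_only = 11.43
any_validator = 77.04

# Responses carrying both validators: ETag total minus the ETag-only share.
both = etag_total - etag_only

# Responses carrying only Last-Modified.
last_modified_only = last_modified_total - both

# The three disjoint groups should add up to the quoted overall figure.
derived_any = last_modified_only + both + etag_only

print(f"both validators:    {both:.2f}%")
print(f"Last-Modified only: {last_modified_only:.2f}%")
print(f"any validator:      {derived_any:.2f}%")
```

Under that assumption, 41.64% of responses carry both validators and 23.97% carry only Last-Modified, and the three groups add up to the quoted 77.04%.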