[clamav-users] No good deed goes unpunished, or, why CVD files don't work
clamav-users at iment.com
Fri Dec 14 21:58:17 EST 2018
The Good Deed
When we started using ClamAV, we wanted to distribute the database
to the several machines on our LAN in order to reduce the load on the
volunteer servers and minimize the load on our old DSL (now gone). The
best way to do this, it seemed, was to set up a trivial HTTP server to
mirror and deliver the new files. And, of course, they had to be cvd
files which, according to the FAQ, precluded "Scripted Updates" and the
much smaller cdiff files.
This all worked quite well until ClamAV switched to distributing the
updates via Cloudflare: then The Delays started. The Delays first
showed up when freshclam itself(!) found that the DNS TXT record
advertised a new daily.cvd but then failed to retrieve it, complaining
about network problems. This would eventually cause all the mirrors to
be disabled.
After much investigation (documented at length in previous posts) I
noticed that the daily.cvd from the BOS Cloudflare server was often far
behind that from the IAD Cloudflare server (which always seemed to
match the DNS TXT advertisement). I began to suspect that this was
perhaps caused by a caching web proxy, probably a transparent one
"helpfully" interposed by Comcast.
While all this was going on, Joel stated that nobody else was having
(or at least reporting) these Delay problems.
Now I think I know why.
Most everybody (I would guess) uses the Scripted Update feature, which
is enabled by default. So, I ran an experiment. On one machine I
bypassed local mirroring, enabled Scripted Update *and* captured the
HTTP traffic to/from Cloudflare via dumpcap. What I found was that
Scripted Update does HTTP GETs for one or more daily-12345.cdiff
files in sequence, each, presumably, updating "daily" from the
numerically previous version.
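
As a rough sketch of that request pattern (inferred only from the
captured traffic, not from the freshclam source), the list of cdiff
names fetched to get from a local version to the advertised one would
look something like this. The function name and the version numbers are
made up for illustration.

```python
def cdiff_names(local_version: int, advertised_version: int,
                db: str = "daily") -> list[str]:
    """Names of the cdiffs needed to step from local to advertised version.

    Assumes (as the capture suggests) that daily-N.cdiff turns version
    N-1 into version N, so one file is needed per missing version.
    """
    return [f"{db}-{v}.cdiff"
            for v in range(local_version + 1, advertised_version + 1)]


# A client at 12343 that sees 12345 advertised would ask for two files.
print(cdiff_names(12343, 12345))
```

A client that is already current gets an empty list, i.e. no GETs at
all.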
Now it became clear! Each daily-12345.cdiff *always* has the same
content, no matter when it is retrieved. The content of daily.cvd, on
the other hand, varies over time. That makes *any* caching of daily.cvd
files liable to cause versioning problems, whereas the cdiff files
(such as daily-12345.cdiff) are totally invulnerable to any caching
whatsoever: web caches work according to file *name*, not file content.
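
The difference can be shown with a toy model. This is only a sketch: the
NameKeyedCache class is hypothetical and stands in for a proxy that
never revalidates, which is the worst case, and the "file contents" are
placeholder strings.

```python
class NameKeyedCache:
    """Toy caching proxy: stores bodies by file name and never revalidates."""

    def __init__(self, origin: dict[str, str]):
        self.origin = origin              # what the origin currently serves
        self.store: dict[str, str] = {}   # cached bodies, keyed by name only

    def get(self, name: str) -> str:
        if name not in self.store:        # first request: fetch from origin
            self.store[name] = self.origin[name]
        return self.store[name]           # afterwards: serve the cached copy


origin = {"daily.cvd": "signatures v12344",
          "daily-12344.cdiff": "diff 12343->12344"}
proxy = NameKeyedCache(origin)
proxy.get("daily.cvd")                    # proxy now holds version 12344

# The origin publishes an update: same cvd name, brand-new cdiff name.
origin["daily.cvd"] = "signatures v12345"
origin["daily-12345.cdiff"] = "diff 12344->12345"

stale = proxy.get("daily.cvd")            # same name: old body comes back
fresh = proxy.get("daily-12345.cdiff")    # new name: must come from origin
print(stale)   # signatures v12344
print(fresh)   # diff 12344->12345
```

The cvd request returns the old version because its name was already
cached, while the cdiff request cannot be stale because nothing was ever
stored under that name.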
This problem is exacerbated by the fact that the Cloudflare servers
seem to add a "Cache-Control:" HTTP header that does NOT specify
"no-cache". (I don't know what the old "volunteer" servers did in this
regard.)
The upshot of this is that the Scripted Update mechanism will *never*
get out-of-date cdiff files, although it may experience a short delay
if it's the first requester of a new cdiff.
The local mirror mechanism, on the other hand, is almost guaranteed to
fail on occasion -- or at least suffer arbitrary delays -- if there is a
caching proxy in its path to Cloudflare. Even if the Cloudflare servers
used a "Cache-Control: no-cache" header, there might be a rogue proxy
in the way that ignores this header, and caches anyway. (AFAIK, there
is no way to enforce "no-cache".)
So what could be done to avoid the problem?
One possibility is to give up on local mirrors. But that might increase
the load on the Cloudflare servers, as some installations might have
more local ClamAV clients than the ratio of the size of a full cvd to
the size of a typical cdiff.
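
To make that break-even point concrete, here is a small calculation. The
file sizes are invented round numbers, not measurements, and the model
assumes one cdiff per client per update versus one full cvd per update
for the mirror.

```python
def break_even_clients(cvd_bytes: int, cdiff_bytes: int) -> float:
    """Number of clients at which N cdiff downloads equal one cvd download."""
    return cvd_bytes / cdiff_bytes


def mirror_saves_upstream(clients: int, cvd_bytes: int,
                          cdiff_bytes: int) -> bool:
    """True if one mirrored cvd costs upstream less than a cdiff per client."""
    return cvd_bytes < clients * cdiff_bytes


# Hypothetical sizes: a 50 MB daily.cvd and a 100 kB cdiff.
CVD = 50_000_000
CDIFF = 100_000

print(break_even_clients(CVD, CDIFF))          # 500.0
print(mirror_saves_upstream(10, CVD, CDIFF))   # False
print(mirror_saves_upstream(600, CVD, CDIFF))  # True
```

With these made-up sizes, a site would need more than 500 clients
before a cvd mirror put less load on the upstream servers than letting
every client fetch cdiffs directly.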
A solution to that would be to use a local HTTP proxy to distribute the
cdiff files to all the ClamAV installations on the LAN. (But that would
require rather complicated setup.)
A third approach would be to do the mirroring using the cdiff-generated
cld files rather than with cvd files. I don't know what changes to
freshclam this would require. One possible obstacle to doing this is
whether the cld files are or could be cryptographically signed like the
cvds are. Something like that would likely be necessary for enterprise
security. (Presumably, generating Talos-signed cvds locally from the
clds would be a really bad idea, while setting up private PKI for local
signing would be a really big pain.)
A fourth, and I think very simple, approach would be to name cvds like
the cdiffs are named. In other words, instead of having daily.cvd, one
would have daily-12345.cvd, followed by daily-12346.cvd as the next
update. This would be impervious to the vagaries of caching. I also
think it would require only fairly trivial code changes to freshclam
and whatever component of ClamAV it is that (re)loads the database.
(All that would be necessary would be to always use the cvd with the
highest version number.)
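
The selection step might look roughly like the following. This is a
sketch of the idea only, not existing freshclam behaviour; the
daily-NNNNN.cvd naming is the proposal above, and the function name is
hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches the proposed naming scheme, e.g. "daily-12346.cvd".
_VERSIONED_CVD = re.compile(r"^(?P<db>[a-z]+)-(?P<version>\d+)\.cvd$")


def newest_cvd(names: Iterable[str], db: str = "daily") -> Optional[str]:
    """Return the versioned cvd for `db` with the highest version number."""
    best = None  # (version, name) of the best candidate seen so far
    for name in names:
        m = _VERSIONED_CVD.match(name)
        if m and m.group("db") == db:
            version = int(m.group("version"))  # numeric, not string, compare
            if best is None or version > best[0]:
                best = (version, name)
    return best[1] if best else None


files = ["daily-12345.cvd", "daily-12346.cvd", "daily-9999.cvd",
         "main-58.cvd", "daily.cvd"]
print(newest_cvd(files))   # daily-12346.cvd
```

The versions are compared as integers so that daily-9999.cvd does not
beat daily-12346.cvd, which it would under a plain string sort.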
Any thoughts on all this? Is local mirroring still possible?
P.S. I would have thought that since the clds are much bigger than the
corresponding cvds, loading a cld into memory would be slower than
loading the equivalent cvd, but this seems not to be the case.
To measure the load time I ran clamscan on one tiny file using the
daily.cvd version of the signatures and then using the much bigger
daily.cld. (Main.cvd was left as is.) This was done on a fairly old
machine, and before each run I, of course, did:
echo 1 > /proc/sys/vm/drop_caches
The result was that total real (and 'user') times were slightly less
for the cld, although the 'system' time was slightly more. I wonder what
eats up the extra time in the cvd case. (I thought disks were always
supposed to be the bottleneck for simple computations like
decompression and crypto.)
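
For anyone who wants to repeat the measurement, a minimal timing helper
might look like this. It is a sketch, not the procedure actually used
above: the helper name is made up, it relies on the Unix-only resource
module, and it does not drop the page cache (that still has to be done
by hand, as root, before each run).

```python
import resource
import subprocess
import sys
import time


def time_command(argv: list[str]) -> dict[str, float]:
    """Run a command and return its real, user and system time in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so the
    # difference isolates the command that was just run.
    return {"real": real,
            "user": after.ru_utime - before.ru_utime,
            "sys": after.ru_stime - before.ru_stime}


# Harmless demonstration: time a Python process that does nothing.
print(time_command([sys.executable, "-c", "pass"]))
```

For the comparison above, the command would be something like
["clamscan", "--database=/path/to/dbdir", "tiny.txt"], run once with a
directory holding daily.cvd and once with one holding daily.cld; both
paths here are placeholders.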