[clamav-users] No good deed goes unpunished, or, why CVD files don't work

Dennis Peterson dennispe at inetnw.com
Sat Dec 15 06:42:40 UTC 2018

From a best-practices perspective, use freshclam when talking to ClamAV
resources. Once you have what you need from them you can do anything you
like internally, since nothing you do at that point touches their servers. I
had a couple hundred RedHat servers to manage, and they all required scanning
software because of the industry I was in and because of the rules we were
bound to: HIPAA, and rules covering credit card numbers, social security
numbers, phone numbers and other personal information. I created a lot of
locally generated signatures to look for this information. This was before
smart file systems that would do this for us.

When I built the local private mirror I used the cdiff files (scripted downloads 
were permitted) to create local patched .cld files. These had to be distributed 
to the hundreds of other machines, and for that I initially used rsync because it 
is just bulletproof. Later I moved it all to CFengine (a predecessor to 
Puppet and Chef).

The CFengine master server received the cld files from a snapshot file system 
(freshclam triggered the snapshot before and after an update) so new updates 
would not corrupt existing signature files, and it then immediately informed all 
the clients they had work to do to become conformal (in CFengine terms). 
CFengine is smart enough to transmit only the differences between local and 
remote files on the fly (as rsync does), so net traffic is minimized because the 
daily and bytecode files don't change much. The transfer builds hidden files 
until the differences are resolved and then renames them to the original names, 
which is very close to an atomic operation, so problems from working with files 
still in transit were prevented. The CFengine client would notify the local 
clamd instance when the files were ready. Clamd has to be told not to reload on 
its own when it detects a signature change. It was all very clean, fast, and 
secure, owing to the use of secure processes at each step, and hands-free on my 
part. It also passed federal government security audits, which was the best part.
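
The hidden-file-then-rename step can be sketched in a few lines of Python. This 
is only an illustration of the general technique, not what CFengine or rsync do 
internally; the function name, the temp-file naming and the 0644 mode are all 
assumptions made for the example.

```python
import os
import tempfile


def publish_atomically(data: bytes, dest_path: str) -> None:
    """Write data to a hidden temp file next to dest_path, then rename it into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file is created in the destination directory so the final
    # rename never crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(prefix=".", suffix=".tmp", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        # mkstemp creates the file as 0600; 0644 is an assumed mode so that a
        # scanner running as another user could read the result.
        os.chmod(tmp_path, 0o644)
        # On POSIX the rename replaces the old file in one step, so a reader
        # sees either the complete old file or the complete new one.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Do not leave a half-written hidden file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A scanner that opens the destination path during a transfer therefore never 
reads a partially written signature file.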

Short answer: don't use freshclam to get the signature files from your mirror 
to your clients. Then it won't matter whether they are cld, cvd, cud, etc., and 
you won't burden the ClamAV servers by pulling full copies of CVD files.

As for the cdiff files not changing, that is by design because each cdiff file 
brings the local cld file to the cdiff version, and because it can't be known 
how many cdiffs have been created between user updates, they are retained for a 
period of time and freshclam applies them in order until the final cdiff matches 
the current DNS TXT record.
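
As a rough sketch of that ordering, assuming the local version and the version 
advertised in the DNS TXT record are already known as integers, the list of 
cdiff files to fetch and apply could be computed like this. The function name 
is hypothetical and this is not freshclam's actual code; the file naming 
follows the daily-12345.cdiff pattern quoted below.

```python
from typing import List


def pending_cdiffs(db_name: str, local_version: int, advertised_version: int) -> List[str]:
    """Return the cdiff file names to apply, oldest first.

    Each cdiff brings the local cld up to that cdiff's own version, so every
    version after the local one, up to and including the advertised one, is
    needed exactly once and in order.
    """
    return [
        f"{db_name}-{version}.cdiff"
        for version in range(local_version + 1, advertised_version + 1)
    ]


# Version numbers here are made up for illustration.
print(pending_cdiffs("daily", 12345, 12347))
```

When the local version already matches the advertised one the list is empty 
and there is nothing to download.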


On 12/14/18 6:58 PM, Paul Kosinski wrote:
> The Good Deed
> When we started using ClamAV, we wanted to distribute the database
> to the several machines on our LAN in order to reduce the load on the
> volunteer servers and minimize the load on our old DSL (now gone). The
> best way to do this, it seemed, was to set up a trivial HTTP server to
> mirror and deliver the new files. And, of course, they had to be cvd
> files which, according to the FAQ, precluded "Scripted Updates" and the
> much smaller cdiff files.
> The Punishment
> This all worked quite well until ClamAV switched to distributing the
> updates via Cloudflare: then The Delays started. The Delays initially
> exhibited themselves when freshclam itself(!) found that the DNS TXT
> record said that a new daily.cvd was available but upon trying to
> retrieve it freshclam failed, complaining about network problems. This
> eventually would cause all the mirrors to be disabled.
> After much investigation (documented at length in previous posts) I
> noticed that the daily.cvd from the BOS Cloudflare server was often far
> behind that from the IAD Cloudflare server (which always seemed to
> match the DNS TXT advertisement). I began to suspect that this was
> perhaps caused by a caching web proxy, probably a transparent one
> "helpfully" interposed by Comcast.
> While all this was going on, Joel stated that nobody else was having
> (or at least reporting) these Delay problems.
> Now I think I know why.
> The Explanation
> Most everybody (I would guess) uses the Scripted Update feature, which
> is enabled by default. So, I ran an experiment. On one machine I
> bypassed local mirroring, enabled Scripted Update *and* captured the
> HTTP traffic to/from Cloudflare via dumpcap. What I found was that
> Scripted Update does HTTP GETs for one or more daily-12345.cdiff
> files in sequence, each, presumably, updating "daily" from the
> numerically previous version.
> Now it became clear! Each daily-12345.cdiff *always* has the same
> content, no matter when it is retrieved. The content of daily.cvd, on
> the other hand varies over time. That makes *any* caching of daily.cvd
> files susceptible to cause versioning problems, whereas the cdiff files
> (such as daily-12345.cdiff) are totally invulnerable to any caching
> whatsoever: web caches work according to file *name*, not file content.
> This problem is exacerbated by the fact that the Cloudflare servers
> seem to add a "Cache-Control:" HTTP header that does NOT specify
> "no-cache". (I don't know what the old "volunteer" servers did in this
> regard.)
> The upshot of this is that the Scripted Update mechanism will *never*
> get out-of-date cdiff files, although it may experience a short delay
> if it's the first requester of a new cdiff.
> The local mirror mechanism, on the other hand is almost guaranteed to
> fail on occasion -- or at least suffer arbitrary delays -- if there is a
> caching proxy in its path to Cloudflare. Even if the Cloudflare servers
> used a "Cache-Control: no-cache" header, there might be a rogue proxy
> in the way that ignores this header, and caches anyway. (AFAIK, there
> is no way to enforce "no-cache".)
> So what could be done to avoid the problem?
> One possibility is to give up on local mirrors. But that might increase
> the load on the Cloudflare servers, as some installations might have
> more local ClamAV clients than the ratio of the size of a full cvd to
> the size of a typical cdiff.
> A solution to that would be to use a local HTTP proxy to distribute the
> cdiff files to all the ClamAV installations on the LAN. (But that would
> require rather complicated setup.)
> A third approach would be to do the mirroring using the cdiff-generated
> cld files rather than with cvd files. I don't know what changes to
> freshclam this would require. One possible obstacle to doing this is
> whether the cld files are or could be cryptographically signed like the
> cvds are. Something like that would likely be necessary for enterprise
> security. (Presumably, generating Talos-signed cvds locally from the
> clds would be a really bad idea, while setting up private PKI for local
> signing would be a really big pain.)
> A fourth, and I think very simple, approach would be to name cvds like
> the cdiffs are named. In other words, instead of having daily.cvd, one
> would have daily-12345.cvd, followed by daily-12346.cvd as the next
> update. This would be impervious to the vagaries of caching. I also
> think it would require only fairly trivial code changes to freshclam
> and whatever component of ClamAV it is that (re)loads the database.
> (All that would be necessary would be to always use the cvd with the
> highest version number.)
> Any thoughts on all this? Is local mirroring still possible?
> Paul
> P.S. I would have thought that since the clds are much bigger than the
> corresponding cvds, loading a cld into memory would be slower than
> loading the equivalent cvd, but this seems not to be the case.
> To measure the load time I ran clamscan on one tiny file using the
> daily.cvd version of the signatures and then using the much bigger
> daily.cld. (Main.cvd remained as itself.) This was done on a fairly old
> machine, and before each run I, of course, did:
>    echo 1 > /proc/sys/vm/drop_caches
> The result was that total real (and 'user') times were slightly less
> for the cld, although the 'system' time was slightly more. I wonder what
> eats up the extra time. (I thought disks were always supposed to be the
> bottleneck for simple computations like decompression and crypto.)
