[clamav-users] No good deed goes unpunished, or, why CVD files don't work

Paul Kosinski clamav-users at iment.com
Sat Dec 15 13:01:43 EST 2018

Automated configuration management sounds interesting, but we have
only a few machines running ClamAV, rather than a couple hundred, so I
doubt the effort would pay off.

We could, I suppose, have a "master" ClamAV do the Scripted Update and
then distribute updated clds to the other ClamAVs (using rsync), but
that would mean I would have to worry about synchronizing running
clamds, and perhaps our HAVP daemons.
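
The synchronization worry mostly comes down to a running clamd never
seeing a half-copied cld. A minimal Python sketch of the
hidden-file-then-rename approach described in the quoted message below,
assuming the temporary file lives in the same directory (and so the same
filesystem) as the database. The function name install_atomically and
the 0644 default mode are made up for illustration; this is not what
rsync or CFengine literally run:

```python
import os
import tempfile


def install_atomically(data: bytes, dest_path: str, mode: int = 0o644) -> None:
    """Write data to a hidden temp file beside dest_path, then rename it into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file must be in the destination directory: a rename is only
    # atomic when source and target are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the bytes onto disk before renaming
        # mkstemp creates the file as 0600; assumption: the scanner needs to read it.
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, dest_path)  # readers see the old file or the new one
    except BaseException:
        # Remove the partial temp file; the existing database is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Telling clamd to reload would still be a separate step after the rename.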

I now conclude that with only a few ClamAV machines, the very large size
of current cvds (they used to be *much* smaller), and especially given
the vulnerability of cvds to caching, the most reasonable approach is
simply to turn on Scripted Update everywhere and have each ClamAV
machine obtain the cdiffs directly from Cloudflare. (A local HTTP proxy
would save only a trivial amount of external bandwidth, but would be a
pain to set up and maintain, since it isn't otherwise needed.)
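
As a rough illustration of what each machine would then be fetching,
here is a Python sketch that lists the cdiff names needed to get from a
local version to the advertised one. It assumes one cdiff per version
step, named the way they appeared in the capture (daily-12345.cdiff).
The function name cdiff_names and the version numbers are hypothetical,
and freshclam's real logic may well differ:

```python
from typing import List


def cdiff_names(db: str, local_version: int, advertised_version: int) -> List[str]:
    """Names of the cdiffs needed to go from local_version to advertised_version."""
    if advertised_version < local_version:
        raise ValueError("advertised version is older than the local database")
    # One cdiff per version step, to be applied in ascending order.
    return [
        f"{db}-{version}.cdiff"
        for version in range(local_version + 1, advertised_version + 1)
    ]


# Hypothetical versions: local database at 12345, DNS TXT advertising 12348.
for name in cdiff_names("daily", 12345, 12348):
    print(name)
```

Every name in that list refers to content that never changes, which is
what makes this path safe behind a cache.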


P.S. I figured the cdiffs were designed to be time invariant, but I
never saw any documentation -- hence my dumpcap experiment.
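
To make the time-invariance point concrete, here is a toy Python model
of a proxy that caches purely by file name and never revalidates. That
is a deliberately pessimistic assumption, and the class name and file
contents are invented for the example:

```python
class NameKeyedCache:
    """Toy caching proxy: remembers responses by file name and never revalidates."""

    def __init__(self, origin):
        self.origin = origin  # dict: file name -> current content at the origin
        self.store = {}       # dict: file name -> content as first fetched

    def get(self, name):
        if name not in self.store:
            # Cache miss: the first request for a name goes to the origin.
            self.store[name] = self.origin[name]
        return self.store[name]


origin = {"daily.cvd": "signatures v12345"}
proxy = NameKeyedCache(origin)
first = proxy.get("daily.cvd")

# The origin publishes version 12346: daily.cvd is overwritten in place,
# while the cdiff appears under a brand-new name.
origin["daily.cvd"] = "signatures v12346"
origin["daily-12346.cdiff"] = "patch 12345 -> 12346"

stale = proxy.get("daily.cvd")          # cache hit: still the old content
patch = proxy.get("daily-12346.cdiff")  # new name, so necessarily a miss

print(stale)  # signatures v12345
print(patch)  # patch 12345 -> 12346
```

The fixed name daily.cvd comes back stale, while the versioned cdiff
name cannot, since no earlier content ever existed under that name.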

On Fri, 14 Dec 2018 22:42:40 -0800
Dennis Peterson <dennispe at inetnw.com> wrote:

> From a best-practices perspective it is best to use freshclam when
> talking to ClamAV resources. Once you have what you need from them
> you can do anything you like internally. You don't have to be nice to
> them at this point. I had a couple hundred Red Hat servers to manage
> and they all required scanning software because of the industry I was
> in and because of the HIPAA, credit card, Social Security number,
> phone number and other personal-information rules we were bound to. I
> created a lot of locally generated signatures to look for this
> information. This was before smart file systems that would do this
> for us.
>
> When I built the local private mirror I used the cdiff files
> (scripted downloads were permitted) to create local patched .cld
> files. These had to be distributed to the hundreds of other machines
> and for that I initially used rsync because it is just bulletproof,
> and later I moved it all to CFengine (a predecessor of Puppet and Chef).
> The CFengine master server received the cld files from a snapshot
> file system (freshclam triggered the snapshot before and after an
> update) so new updates would not corrupt existing signature files,
> and it then immediately informed all the clients they had work to do
> to become conformal (in CFengine terms). CFengine is smart enough to
> know to transmit only the differences between local and remote files
> on the fly (as rsync does), so net traffic is minimized, as the daily
> and bytecode files don't change much. And because of the way the
> process works (it writes hidden files until the differences are
> resolved, then renames them to the original names, which is very
> close to an atomic operation), problems with files being read while
> still in transit were prevented. The CFengine client would notify the
> local clamd instance when the files were ready. Clamd has to be told
> not to reload on its own when it detects a signature change. All very
> clean, fast, and secure, owing to using secure processes at each step,
> and hands-free on my part. It also passed federal government security
> audits, which was the best part.
>
> Short answer - don't use freshclam to get the signature files from
> your mirror to your clients and it won't matter if they are cld, cvd,
> cud, etc., and it doesn't burden the ClamAV servers by pulling full
> copies of CVD files.
>
> As for the cdiff files not changing, that is by design: each cdiff
> file brings the local cld file up to that cdiff's version. Because it
> can't be known how many cdiffs have been created between a user's
> updates, they are retained for a period of time, and freshclam
> applies them in order until the final cdiff matches the version in
> the current DNS TXT record.
>
> dp
> On 12/14/18 6:58 PM, Paul Kosinski wrote:
> > The Good Deed
> >
> > When we started using ClamAV, we wanted to distribute the database
> > to the several machines on our LAN in order to reduce the load on
> > the volunteer servers and minimize the load on our old DSL (now
> > gone). The best way to do this, it seemed, was to set up a trivial
> > HTTP server to mirror and deliver the new files. And, of course,
> > they had to be cvd files which, according to the FAQ, precluded
> > "Scripted Updates" and the much smaller cdiff files.
> >
> >
> > The Punishment
> >
> > This all worked quite well until ClamAV switched to distributing the
> > updates via Cloudflare: then The Delays started. The Delays
> > initially exhibited themselves when freshclam itself(!) found that
> > the DNS TXT record said that a new daily.cvd was available but upon
> > trying to retrieve it freshclam failed, complaining about network
> > problems. This eventually would cause all the mirrors to be
> > disabled.
> >
> > After much investigation (documented at length in previous posts) I
> > noticed that the daily.cvd from the BOS Cloudflare server was often
> > far behind that from the IAD Cloudflare server (which always seemed
> > to match the DNS TXT advertisement). I began to suspect that this
> > was perhaps caused by a caching web proxy, probably a transparent
> > one "helpfully" interposed by Comcast.
> >
> > While all this was going on, Joel stated that nobody else was having
> > (or at least reporting) these Delay problems.
> >
> > Now I think I know why.
> >
> >
> > The Explanation
> >
> > Most everybody (I would guess) uses the Scripted Update feature,
> > which is enabled by default. So, I ran an experiment. On one
> > machine I bypassed local mirroring, enabled Scripted Update *and*
> > captured the HTTP traffic to/from Cloudflare via dumpcap. What I
> > found was that Scripted Update does HTTP GETs for one or more
> > daily-12345.cdiff files in sequence, each, presumably, updating
> > "daily" from the numerically previous version.
> >
> > Now it became clear! Each daily-12345.cdiff *always* has the same
> > content, no matter when it is retrieved. The content of daily.cvd,
> > on the other hand, varies over time. That makes *any* caching of
> > daily.cvd files liable to cause versioning problems, whereas the
> > cdiff files (such as daily-12345.cdiff) are totally invulnerable
> > to any caching whatsoever: web caches work according to file
> > *name*, not file content.
> >
> > This problem is exacerbated by the fact that the Cloudflare servers
> > seem to add a "Cache-Control:" HTTP header that does NOT specify
> > "no-cache". (I don't know what the old "volunteer" servers did in
> > this regard.)
> >
> > The upshot of this is that the Scripted Update mechanism will
> > *never* get out-of-date cdiff files, although it may experience a
> > short delay if it's the first requester of a new cdiff.
> >
> > The local mirror mechanism, on the other hand, is almost guaranteed
> > to fail on occasion -- or at least suffer arbitrary delays -- if
> > there is a caching proxy in its path to Cloudflare. Even if the
> > Cloudflare servers used a "Cache-Control: no-cache" header, there
> > might be a rogue proxy in the way that ignores this header, and
> > caches anyway. (AFAIK, there is no way to enforce "no-cache".)
> >
> > So what could be done to avoid the problem?
> >
> > One possibility is to give up on local mirrors. But that might
> > increase the load on the Cloudflare servers, as some installations
> > might have more local ClamAV clients than the ratio of the size of
> > a full cvd to the size of a typical cdiff.
> >
> > A solution to that would be to use a local HTTP proxy to distribute
> > the cdiff files to all the ClamAV installations on the LAN. (But
> > that would require rather complicated setup.)
> >
> > A third approach would be to do the mirroring using the
> > cdiff-generated cld files rather than with cvd files. I don't know
> > what changes to freshclam this would require. One possible obstacle
> > to doing this is whether the cld files are or could be
> > cryptographically signed like the cvds are. Something like that
> > would likely be necessary for enterprise security. (Presumably,
> > generating Talos-signed cvds locally from the clds would be a
> > really bad idea, while setting up private PKI for local signing
> > would be a really big pain.)
> >
> > A fourth, and I think very simple, approach would be to name cvds
> > like the cdiffs are named. In other words, instead of having
> > daily.cvd, one would have daily-12345.cvd, followed by
> > daily-12346.cvd as the next update. This would be impervious to the
> > vagaries of caching. I also think it would require only fairly
> > trivial code changes to freshclam and whatever component of ClamAV
> > it is that (re)loads the database. (All that would be necessary
> > would be to always use the cvd with the highest version number.)
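
A minimal Python sketch of that "use the highest version number" rule.
Note that the daily-12345.cvd naming is only the proposal above, not
something ClamAV does, and the function name newest_versioned is
hypothetical. The one subtlety is that the comparison has to be numeric
rather than alphabetical:

```python
import re
from typing import Iterable, Optional


def newest_versioned(names: Iterable[str], db: str = "daily",
                     ext: str = "cvd") -> Optional[str]:
    """Return the name of the form <db>-<N>.<ext> with the largest N, or None."""
    pattern = re.compile(rf"^{re.escape(db)}-(\d+)\.{re.escape(ext)}$")
    best_name = None
    best_version = -1
    for name in names:
        match = pattern.match(name)
        if match is None:
            continue  # unversioned files and other databases are ignored
        version = int(match.group(1))  # compare as numbers, not as strings
        if version > best_version:
            best_version = version
            best_name = name
    return best_name


# Hypothetical directory listing.
files = ["daily-9999.cvd", "daily-12345.cvd", "daily-12346.cvd",
         "daily.cvd", "main-58.cvd"]
print(newest_versioned(files))
```

A plain alphabetical sort would wrongly put daily-9999.cvd last, which
is why the version is parsed as an integer.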
> >
> >
> > Any thoughts on all this? Is local mirroring still possible?
> >
> > Paul
> >
> >
> > P.S. I would have thought that since the clds are much bigger than
> > the corresponding cvds, loading a cld into memory would be slower
> > than loading the equivalent cvd, but this seems not to be the case.
> >
> > To measure the load time I ran clamscan on one tiny file using the
> > daily.cvd version of the signatures and then using the much bigger
> > daily.cld. (main.cvd was left as it was.) This was done on a
> > fairly old machine, and before each run I, of course, did:
> >
> >    echo 1 > /proc/sys/vm/drop_caches
> >
> > The result was that total real (and 'user') times were slightly less
> > for the cld, although the 'system' time was slightly more. I wonder
> > what eats up the extra time. (I thought disks were always supposed
> > to be the bottleneck for simple computations like decompression and
> > crypto.)
