On Tue, Mar 15, 2022 at 1:53 PM G.W. Haywood via clamav-users <clamav-users@lists.clamav.net> wrote:
Hi there,

On Tue, 15 Mar 2022, Laurent S. via clamav-users wrote:
>> using Yara's engine in clamav directly is something that has been
>> brought up time and again. It is possible. My understanding is that
>> the reason ClamAV's yara support isn't done this way is that it
>> would require a second pass over the file with Yara's pattern
>> matcher, after ClamAV's pattern matcher, and that the performance
>> concern made it make more sense to try and load yara rules into
>> ClamAV's matcher instead.

Speaking selfishly I wouldn't be greatly inconvenienced by an increase
in the scan times (even if it doubles) caused by separating the Yara
engine from the ClamAV engine.  That's because I only scan mail, and
the clamd server is well on top of it.  I can understand that people
who scan filesystems might have a different point of view; maybe both
could be accommodated with a config option.

Anything that increases scan times would be prohibitive for me. We use ClamAV to scan around a billion files per day, and the primary thing stopping us from using Yara is the increase in scan times.
 
>> I honestly don't have any numbers to back up this argument. It
>> sounds reasonable, but I'd love to see the numbers.

I occasionally run more than one clamd instance and I've seriously
considered running a separate one purely so that the Yara rules are
kept separate from the rest.  I always log scan times.  It will be a
bit fiddly, but when I get a minute I'll set something up to try to
give you some numbers.


We run multiple clamd instances specifically to load different sets of signatures for different purposes.

For example, suppose instance 1 holds very specific signatures, instance 2 holds more general signatures, and instance 3 holds the ClamAV / 3rd-party signatures. We scan against instance 1 first; if we don't get a match we scan against instance 2, and if there is still no match, against instance 3.
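
To make the tiered approach concrete, here is a minimal Python sketch of that kind of cascade. It is not our production code. It assumes each clamd instance listens on TCP and is reached with clamd's INSTREAM command; the function names, tier labels, and timeout are made up for illustration.

```python
import socket
import struct
from typing import Callable, Iterable, Optional, Tuple

# A scanner takes file contents and returns a signature name, or None if clean.
Scanner = Callable[[bytes], Optional[str]]


def parse_reply(reply: str) -> Optional[str]:
    """Turn a clamd reply such as 'stream: Some.Name FOUND' into a signature name."""
    reply = reply.strip("\0\n ")
    if reply.endswith(" FOUND"):
        return reply[: -len(" FOUND")].split(": ", 1)[-1]
    if reply.endswith(" ERROR"):
        raise RuntimeError(reply)
    return None  # e.g. 'stream: OK'


def clamd_instream(host: str, port: int, chunk_size: int = 65536) -> Scanner:
    """Build a scanner that streams data to one clamd instance via INSTREAM."""
    def scan(data: bytes) -> Optional[str]:
        with socket.create_connection((host, port), timeout=30) as sock:
            sock.sendall(b"zINSTREAM\0")
            for i in range(0, len(data), chunk_size):
                chunk = data[i:i + chunk_size]
                # Each chunk is prefixed with its length as a 4-byte big-endian integer.
                sock.sendall(struct.pack("!I", len(chunk)) + chunk)
            sock.sendall(struct.pack("!I", 0))  # zero-length chunk ends the stream
            reply = b""
            while not reply.endswith(b"\0"):
                part = sock.recv(4096)
                if not part:
                    break
                reply += part
        return parse_reply(reply.decode("utf-8", "replace"))
    return scan


def cascade_scan(
    data: bytes, tiers: Iterable[Tuple[str, Scanner]]
) -> Optional[Tuple[str, str]]:
    """Try each tier in order; stop at the first match and report which tier hit."""
    for label, scanner in tiers:
        hit = scanner(data)
        if hit is not None:
            return label, hit
    return None
```

With three instances on, say, ports 3310, 3311 and 3312 (hypothetical values), the tiers would be built as [("specific", clamd_instream("127.0.0.1", 3310)), ("general", clamd_instream("127.0.0.1", 3311)), ("third-party", clamd_instream("127.0.0.1", 3312))]. Because the cascade stops at the first hit, the large third-party set is only consulted for files the smaller sets did not classify.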
 
> One big reason I like to use ClamAV is that it's possible to add
> other sources of signatures. Lots of people use the sanesecurity
> ones. I add a lot of my own.

+1


For us, the attraction is the ease of creating our own signatures more than the 3rd-party signatures, though 3rd-party signatures are a definite plus.
 
Finally, unashamed repetition:

(1) a plea for a way to test rules before they go live;

This is relatively straightforward to do on your own (save the signatures in a temp location, create a file with something that you know will match, and scan to make sure it is detected), so the fact that it's not built-in is a bit confusing. 
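
As a rough sketch of that manual check in Python: it assumes clamscan is on the PATH and relies on its documented exit codes (0 = nothing found, 1 = match, 2 = error). The helper names and the candidate.ndb file name are invented for illustration.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_command(db_path, sample_path) -> List[str]:
    """clamscan invocation that loads ONLY the candidate database."""
    return ["clamscan", "--no-summary", "-d", str(db_path), str(sample_path)]


def interpret(returncode: int) -> str:
    """Map clamscan's exit code to a verdict; anything unexpected counts as an error."""
    return {0: "clean", 1: "match", 2: "error"}.get(returncode, "error")


def check_candidate(signature_text: str, sample: bytes,
                    db_name: str = "candidate.ndb") -> str:
    """Write the candidate signatures and a known-matching sample to a temp dir and scan."""
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / db_name
        sample_file = Path(tmp) / "sample.bin"
        db.write_text(signature_text)
        sample_file.write_bytes(sample)
        proc = subprocess.run(
            build_command(db, sample_file), capture_output=True, text=True
        )
        return interpret(proc.returncode)
```

For example, check_candidate("Test.Candidate:0:*:48656c6c6f20776f726c64\n", b"Hello world") scans a sample containing the bytes the (made-up) .ndb signature looks for. Any verdict other than "match" means the rule either failed to load or did not fire, and it should not be copied into the live database directory. For a Yara rule, pass a db_name ending in .yar or .yara.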
 
(2) another plea for a parser which is good at its job;

(3) a way to specify that a rule is to match in
    (a) mail headers only or
    (b) mail body only or
    (c) both;
 
This would be awesome for mail, but also for any file that has differentiated parts. It would be great to have a better macro style that would allow you to combine multiple signatures to produce a different classification (sort of like logical signatures, but with the ability for each sub-signature to hit different filetypes).

and lastly

(4) it would be great to have a way to reload rulesets separately so
it isn't necessary to reload ten million signatures when you've only
added one Yara rule, only then to find clamd crashes the first time it
tries to scan anything because you broke that rule.  I understand this
might be asking a lot, and a decent parser which prevents attempts to
load garbage rules (point 2) would do a lot to alleviate this pain.
 
100% this. Having the ability to load a diff rather than the complete database would be an enormous boon.
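
Until something like that exists, the crash-on-broken-rule part of the pain can at least be reduced by gating the reload on a check. The following is only a sketch of that idea: the validate callback stands for whatever pre-flight test you trust, promote is a made-up name, and the default host and port are assumptions. clamd still reloads the complete database; this does not give you a diff load.

```python
import os
import shutil
import socket
import tempfile
from pathlib import Path
from typing import Callable


def clamd_reload(host: str = "127.0.0.1", port: int = 3310) -> str:
    """Ask a TCP-listening clamd to reload its databases; returns clamd's reply."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(b"zRELOAD\0")
        return sock.recv(4096).rstrip(b"\0").decode("utf-8", "replace")


def promote(candidate: Path, live_dir: Path,
            validate: Callable[[Path], bool],
            reload: Callable[[], object]) -> bool:
    """Install a rule file and trigger a reload only if validation passes."""
    if not validate(candidate):
        return False  # live database directory is left untouched
    # Copy into the live directory under a temporary name, then rename, so
    # clamd never sees a half-written file under its final name.
    fd, tmp_name = tempfile.mkstemp(dir=live_dir, prefix=".incoming-")
    os.close(fd)
    shutil.copyfile(candidate, tmp_name)
    os.chmod(tmp_name, 0o644)  # mkstemp creates 0600; clamd may run as another user
    os.replace(tmp_name, live_dir / candidate.name)
    reload()
    return True
```

A call would look like promote(Path("staging/local.yar"), Path("/var/lib/clamav"), validate=my_check, reload=clamd_reload), where both paths and my_check are placeholders for your own setup.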

--Maarten