[clamav-users] [External] Re: Scan very slow
Paul Kosinski
clamav-users at iment.com
Thu Apr 11 20:09:29 UTC 2019
Does clamd use multi-threading for the various "engines" within a
single scan, or only to handle multiple requests from different sources?
On Tue, 9 Apr 2019 21:29:43 +0000
"Micah Snyder (micasnyd) via clamav-users"
<clamav-users at lists.clamav.net> wrote:
> Maarten,
>
> Your test results are pretty great. I really like your breakdown of
> the signatures by category. I will caution that scan times will vary
> quite heavily depending on what you’re scanning, based on Target type
> (https://www.clamav.net/documents/clamav-file-types).
>
> In addition, it’s important to distinguish between load and scan
> times. The time reported by clamscan is load time plus scan time. If
> you just want scan time, you will want to load the database with clamd
> and then test the scan time with clamdscan.
>
> Regarding load time vs scan time, all of the signatures must be
> loaded, but depending on the target type of the file being scanned,
> not all of the signatures will be matched against the file. That is,
> daily_Win.ldb might take the longest to load due to the number or
> complexity of its signatures, but when scanning a PDF they probably
> won’t affect scan time, as Win signatures are probably mostly target
> type 1 (PE file).
>
> I’ve spent a bit of time today investigating what I believe is
> responsible for slow load and scan times for the Phishtank sigs. I had
> a hunch, based on a conversation we saw a while back on the mailing
> list, that the identical beginning of the URL-based signatures results
> in an unbalanced and inefficient tree for matching. That is, some 3000
> signatures each began with one of:
>
>
> 1. href="http:// (687265663d22687474703a2f2f)
> 2. HYPERLINK "http:// (48595045524c494e4b2022687474703a2f2f)
> 3. S/URI/URI(http:// (532f5552492f55524928687474703a2f2f)
>
> Looking at a few of the Phish.Phishing signatures, these appear to
> have the same issue (href="http:// prefix). In testing with a scan of
> a PDF document, I was able to reduce the scan time from 31.987 sec
> down to 2.632 sec simply by changing the start of the Phishtank
> signatures as follows:
>
>
> 1. href="http://
>    * from: 687265663d22687474703a2f2f
>    * to:   687265663d2268747470{3-4}
> 2. HYPERLINK "http://
>    * from: 48595045524c494e4b2022687474703a2f2f
>    * to:   48595045524c494e4b202268747470{3-4}
> 3. S/URI/URI(http://
>    * from: 532f5552492f55524928687474703a2f2f
>    * to:   532f5552492f5552492868747470{3-4}
>
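
The prefix change listed above can be sketched in a few lines of Python. This is only an illustration, not the tooling used to produce the Phishtank signatures: the function names `rewrite_prefix` and `decode_literal` are hypothetical, and the sketch assumes the hex is lowercase and that the literal prefix sits at the very start of the subsignature. In ClamAV body signatures, `{3-4}` matches any 3 to 4 bytes, so it covers both `://` and `s://`.

```python
# Literal hex prefixes from the message, mapped to the wildcard form proposed there.
PREFIX_REWRITES = {
    "687265663d22687474703a2f2f": "687265663d2268747470{3-4}",
    "48595045524c494e4b2022687474703a2f2f": "48595045524c494e4b202268747470{3-4}",
    "532f5552492f55524928687474703a2f2f": "532f5552492f5552492868747470{3-4}",
}


def rewrite_prefix(subsig: str) -> str:
    """Replace a known literal URL prefix with its wildcard form.

    Assumption: subsig is lowercase hex and the prefix is at position 0.
    Subsignatures that start with anything else are returned unchanged.
    """
    for old, new in PREFIX_REWRITES.items():
        if subsig.startswith(old):
            return new + subsig[len(old):]
    return subsig


def decode_literal(hex_prefix: str) -> str:
    """Decode a purely literal hex string (no wildcards) to ASCII for inspection."""
    return bytes.fromhex(hex_prefix).decode("ascii")


if __name__ == "__main__":
    # Hypothetical subsignature: href="http:// followed by the bytes of "example".
    sample = "687265663d22687474703a2f2f" + "6578616d706c65"
    print(decode_literal("687265663d22687474703a2f2f"))
    print(rewrite_prefix(sample))
```

Decoding the three "from" strings this way is also a quick check that each hex prefix matches the text label next to it.
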
> This should get the same detection with a faster load and scan time,
> and will also match https for better coverage. To turn lemonade
> into really good lemonade, we may be able to take the above
> optimization and apply it to the Phish.Phishing signatures identified
> by Maarten to reduce scan times further to levels below those before
> the addition of the Phishtank signatures.
>
> As noted by Maarten as well, the Phish.Phishing sigs are Target type
> 0, whereas we’d split the Phishtank.Phishing signatures up by target
> type to reduce scan times of files where the signatures won’t apply.
> It should also speed things up quite a bit for other file types to
> split those up by Target types.
>
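
As a rough illustration of splitting signatures up by Target type, the sketch below groups logical (.ldb) signature lines by the `Target:` value in their target description block. It assumes the documented layout `Name;TargetDescriptionBlock;LogicalExpression;Subsig0;...`, and it treats a missing `Target` entry as 0, which is an assumption made for the example. The function names and the sample signature lines are hypothetical, not real database entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def target_type(ldb_line: str) -> int:
    """Return the Target value from an .ldb line's target description block."""
    fields = ldb_line.strip().split(";")
    tdb = fields[1]  # e.g. "Engine:51-255,Target:1"
    for item in tdb.split(","):
        key, _, value = item.partition(":")
        if key == "Target":
            return int(value)
    return 0  # assumption: no Target entry is treated as target type 0


def group_by_target(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each target type to the names of the signatures that use it."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.split(";", 1)[0]
        groups[target_type(line)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical signature lines, for illustration only.
    sample = [
        "Example.Phish-1;Engine:51-255,Target:0;0;687265663d2268747470{3-4}6578",
        "Example.Win-1;Engine:51-255,Target:1;0;4d5a9000",
        "Example.Phish-2;Target:0;0;532f5552492f5552492868747470{3-4}6578",
    ]
    for target, names in sorted(group_by_target(sample).items()):
        print(target, names)
```

Counting how many signatures land in target type 0 gives a quick sense of how many are candidates for every scanned file regardless of its type.
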
> Further research into scan time optimization is definitely welcome
> and appreciated.
>
> Regards,
> Micah