[clamav-users] ClamAV UnOfficial Database
Henrik K
hege at hege.li
Thu May 4 21:30:15 UTC 2017
On Thu, May 04, 2017 at 08:36:00PM +0300, Henrik K wrote:
> On Thu, May 04, 2017 at 02:57:51PM +0200, Reindl Harald wrote:
> >
> > it's unacceptable having a clamd process which wastes nearly 1 GB of RAM
> > hanging around when he don't catch anything
>
> For once I have to agree..
>
> My stats:
> ClamAV - 10 million sigs (includes most sanesecurity stuff)
> Sophos - 13 million sigs
>
> # /usr/bin/time -f '\t%E real, \t%M kBmem' /usr/local/clamav/bin/clamscan /etc/hosts
> <snip>
> 0:28.18 real, 1096492 kBmem
>
> # /usr/bin/time -f '\t%E real, \t%M kBmem' /opt/sophos-av/bin/savscan /etc/hosts
> <snip>
> 0:05.99 real, 231504 kBmem
>
> Perhaps ClamAV devs should start innovating a little on how to handle all
> the sigs, instead of keeping bloating a glorified in-memory hash-database.
> ;-D Jeez one could probably simply precompile a CDB database from all the
> hashes and dramatically reduce memory usage, probably wouldn't even slow
> down much..
Just playing around a bit. First with the full signature set:
# /usr/bin/time -f '\t%E real, \t%M kBmem' /usr/local/clamav/bin/clamscan -d /tmp/testsigs /etc/hosts
Known viruses: 10448710
0:25.76 real, 1164396 kBmem
Take out all the "complete file hashes" and not many sigs are left. Memory
usage drops dramatically, though it's still very high considering how few
sigs remain:
# /usr/bin/time -f '\t%E real, \t%M kBmem' /usr/local/clamav/bin/clamscan -d /tmp/testsigs /etc/hosts
Known viruses: 298188
0:10.67 real, 215048 kBmem
These are the hash files that were separated out (line counts per file):
# wc -l *
447753 daily.hdb
54 daily.hdu
1531075 daily.hsb
1 daily.hsu
75620 daily.mdb
1083 daily.mdu
1 daily.msb
1 daily.msu
58464 main.hdb
1 main.hsb
4059433 main.mdb
1 main.msb
428 porcupine.hsb
9636 rfxn.hdb
114 rogue.hdb
3730415 securiteinfo.hdb
94786 securiteinfoandroid.hdb
96084 securiteinfoascii.hdb
36319 securiteinfohtml.hdb
14 spamattach.hdb
71 spamimg.hdb
5894 winnow.attachments.hdb
825 winnow_extended_malware.hdb
3751 winnow_malware.hdb
10151824 total
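The split above goes by file extension. A minimal sketch of that separation, assuming the databases are already unpacked into one directory (for example with sigtool --unpack); the helper name is_hash_db is hypothetical, and the extension list is just the set that appears in the listing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the file name ends in one of the hash-database
# extensions seen in the listing (.hdb .hdu .hsb .hsu .mdb .mdu .msb .msu).
sub is_hash_db {
    my ($name) = @_;
    return $name =~ /\.(?:hd|hs|md|ms)[bu]\z/ ? 1 : 0;
}

# Partition the file names given on the command line.
my (@hash, @other);
for my $file (@ARGV) {
    if (is_hash_db($file)) { push @hash, $file } else { push @other, $file }
}
print "hash-only: @hash\n"        if @hash;
print "everything else: @other\n" if @other;
```

Run it as `perl split.pl *` inside the unpacked directory (the script name is arbitrary); whatever lands in the second group is what stays in the -d test directory.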
Chew them into a cdb with some lamo perl:
===
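#!/usr/bin/perl
use strict;
use warnings;
use CDB_File;

# Build into a temp file; finish() renames it to the final name.
my $cdb = CDB_File->new('/tmp/sigs.cdb', "/tmp/sigs.cdb.$$")
    or die "CDB_File->new failed: $!\n";
my $keys = 0;
while (<STDIN>) {
    chomp;
    my ($hash, $size, $sig);
    # hash:size:name
    if (/^([a-f0-9]{32,64}):(\d+|\*):([^:]+)/i) {
        ($hash, $size, $sig) = (lc($1), $2, $3);
    }
    # size:hash:name
    elsif (/^(\d+):([a-f0-9]{32,64}):([^:]+)/i) {
        ($size, $hash, $sig) = ($1, lc($2), $3);
    }
    else { die "Barf? $_\n"; }
    # Key is the raw binary digest, value is "size:signame".
    $cdb->insert(pack("H*", $hash), "$size:$sig");
    $keys++;
}
$cdb->finish;
print "$keys keys inserted\n";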
===
# cat * | /usr/bin/time -f '\t%E real, \t%M kBmem' /tmp/clamcdb.pl
10151824 keys inserted
0:31.09 real, 160144 kBmem
# du -h /tmp/sigs.cdb
781M /tmp/sigs.cdb
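For completeness, a rough sketch of what the lookup side could look like against that file. This is not how clamscan works, just an illustration under assumptions: the key is the raw digest and the value is "size:signame" as written by the script above, only whole-file digests are tried (so the section-hash entries would never match), and only the first record per key is consulted. The sub names file_digests and lookup are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use Digest::SHA;

# Read the file once and return (size, raw MD5, raw SHA-1, raw SHA-256).
sub file_digests {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my @ctx = (Digest::MD5->new, Digest::SHA->new(1), Digest::SHA->new(256));
    my $size = 0;
    while (my $n = read($fh, my $buf, 65536)) {
        $size += $n;
        $_->add($buf) for @ctx;
    }
    close $fh;
    return ($size, map { $_->digest } @ctx);
}

# $db is any hash ref mapping raw digest => "size:signame",
# e.g. a hash tied to CDB_File. Returns the signature name or nothing.
sub lookup {
    my ($db, $size, @digests) = @_;
    for my $d (@digests) {
        my $val = $db->{$d};
        next unless defined $val;
        my ($sigsize, $name) = split /:/, $val, 2;
        return $name if $sigsize eq '*' || $sigsize == $size;
    }
    return;
}

# Usage: lookup.pl /tmp/sigs.cdb /path/to/file
if (@ARGV == 2) {
    my ($cdbfile, $target) = @ARGV;
    require CDB_File;    # loaded only when a real cdb is queried
    tie my %sigs, 'CDB_File', $cdbfile or die "tie $cdbfile failed: $!\n";
    my $hit = lookup(\%sigs, file_digests($target));
    print defined $hit ? "$target: $hit FOUND\n" : "$target: OK\n";
}
```

The point of the tied hash is that the cdb file is read on demand for each of the three lookups, so the process never holds the full hash set in memory.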
So we traded memory for roughly equal disk space. No surprise there; those
bazillion hashes need their space. I guess someone should just serve them up
in the cloud somewhere, like... Immunet? ^_^
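To put numbers on "equal", here is a back-of-the-envelope per-hash cost using only the figures above. The assumptions: du's "781M" means MiB, %M from /usr/bin/time is in units of 1024 bytes, and the memory drop between the two clamscan runs is attributed entirely to the hash sigs. The sub name bytes_per_key is just for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average bytes per key for a given total size.
sub bytes_per_key {
    my ($bytes, $keys) = @_;
    return $bytes / $keys;
}

my $keys      = 10_151_824;                    # "keys inserted" above
my $cdb_bytes = 781 * 1024 * 1024;             # du -h "781M", assumed MiB
my $mem_bytes = (1_164_396 - 215_048) * 1024;  # drop in peak RSS, assumed KiB

printf "cdb on disk:   %.1f bytes/key\n", bytes_per_key($cdb_bytes, $keys);
printf "clamscan RSS:  %.1f bytes/key\n", bytes_per_key($mem_bytes, $keys);
```

Both figures come out under 100 bytes per hash, so the cdb is somewhat smaller per entry but in the same ballpark.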