[clamav-users] Scanning very large files in chunks

sapientdust+clamav at gmail.com sapientdust+clamav at gmail.com
Fri Aug 12 23:03:09 EDT 2016

On Fri, Aug 12, 2016 at 2:51 PM, TR Shaw <tshaw at oitc.com> wrote:
> Actually there is always a probability that a detection will not occur if you break a file apart into pieces. This is due to the following:
> 1) MD5 signatures, for any file type, are applied to the whole file and match against the MD5 hash of that file AND the file's size. If you break apart a file, neither the hash nor the file size will match the signature.

Thanks for the info! I don't quite understand this part, though. In
an earlier message, Andy Singer explained that the
"WIN.Trojan.DarkKomet:1:*:..." signature would match the bytes
anywhere in the file, so that one is definitely not taking the
whole-file hash into account.

It seems extraordinarily brittle to take the whole-file digest into
account, because then a single bit flip anywhere in the file is enough
to evade ClamAV altogether: if detection depends on the size and
digest of the full file, it would be very easy to make every file
unique.
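To make the brittleness concrete, here is a minimal sketch (not ClamAV's actual implementation) of a hash signature modeled as the pair (whole-file MD5, file size), showing that neither chunking nor a single bit flip can ever reproduce the original pair. The sample bytes and marker are hypothetical:

```python
import hashlib

def hash_signature(data: bytes) -> tuple[str, int]:
    """A ClamAV-style hash signature pairs the MD5 of the whole file with its size."""
    return hashlib.md5(data).hexdigest(), len(data)

# Hypothetical "file" that a hash signature was written against.
sample = b"A" * 10 + b"MALICIOUS-MARKER" + b"B" * 10
signature = hash_signature(sample)

# Split into two chunks: neither chunk reproduces the whole-file digest
# or the whole-file size, so the hash signature cannot match either chunk.
chunks = [sample[:18], sample[18:]]
for chunk in chunks:
    assert hash_signature(chunk) != signature

# A single bit flip likewise changes the digest, which is the brittleness
# described above (and one reason body-based pattern signatures also exist).
flipped = bytes([sample[0] ^ 1]) + sample[1:]
assert hash_signature(flipped) != signature
```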

> 2) Complex signatures that are a logical grouping of the results of multiple other signature detections are the other type that can break if you break a file into pieces.
> This question of breaking files apart and checking them comes up regularly from folks who need to support high-data-rate inputs and still be NIST FISMA compliant, and the answer is always no, you can't do that.
> Tom
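Point 2 can be sketched the same way: a logical signature requires several sub-patterns to match within the same scanned object, so splitting the file can put the sub-patterns into different chunks and lose the detection. The sub-patterns below are hypothetical, not real signature bytes:

```python
# Hypothetical logical signature: detection requires sub-pattern A AND B
# to both appear in the same scanned object.
SUB_A, SUB_B = b"PART-A", b"PART-B"

def logical_match(blob: bytes) -> bool:
    return SUB_A in blob and SUB_B in blob

data = SUB_A + b"\x00" * 50 + SUB_B
assert logical_match(data)  # whole file: both sub-patterns present, detected

# Split so each sub-pattern lands in a different chunk: neither chunk
# satisfies the AND condition on its own, so the detection is lost.
mid = len(data) // 2
assert not any(logical_match(c) for c in (data[:mid], data[mid:]))
```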

I see. Is that something you would expect to make a difference in
practice if the chunks were large (say 1 GB each)?
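Even for the body-pattern signatures that match bytes anywhere, large chunks still leave a boundary risk: a pattern that happens to straddle a chunk boundary is split and missed. A common mitigation (a sketch, not something ClamAV does for you) is to overlap consecutive chunks by at least the longest pattern length minus one; the pattern and sizes below are made up for illustration:

```python
PATTERN = b"EVIL-PATTERN"  # hypothetical signature bytes
data = b"x" * 100 + PATTERN + b"y" * 100
CHUNK = 106  # chosen so the pattern straddles the first chunk boundary

# Naive chunking: the pattern is split across the boundary and missed.
naive = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
assert not any(PATTERN in c for c in naive)

# Overlapping chunks by (pattern_length - 1) bytes guarantees any pattern
# up to that length appears whole in some chunk, restoring the match.
overlap = len(PATTERN) - 1
step = CHUNK - overlap
overlapped = [data[i:i + CHUNK] for i in range(0, len(data), step)]
assert any(PATTERN in c for c in overlapped)
```

This only addresses the boundary problem; it does nothing for the whole-file hash and logical signatures discussed above.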

Thanks for your thoughts.
