[clamav-users] Scanning very large files in chunks

Paul Kosinski clamav-users at iment.com
Fri Aug 5 02:32:08 UTC 2016

Really large files like this would likely either be video files or
disk images (incl. DVD and Blu-Ray). Both kinds could, in principle,
have malware embedded.

Disk images often contain whole file systems and thus many, many files.
The alternative is to scan the entire FS after it is "mounted". (Of
course disk images these days might be 6 TB rather than a mere 6 GB.)

Video files could have malicious sequences of bytes which break codecs
(e.g., via buffer overflows) and possibly result in code execution.
(Adobe Flash comes to mind, with its monthly critical updates, but I'm
sure other codecs can also have similar problems.)

On Thu, 04 Aug 2016 19:14:05 -0700
Al Varnell <alvarnell at mac.com> wrote:

> Does anybody have any evidence of malware that exceeds 4GB? Although
> I can certainly see the utility of the proposed capability as a hedge
> for the future, it would seem to be a waste of time and compute power
> to scan such large files today.
> With the ever increasing malware issues we face today, it’s important
> to consider this:
> Risk = threat x vulnerability x consequence
> <http://fortune.com/2016/05/14/cybersecurity-risk-calculation/>
> We all need to focus on fixing the high risk items first.
> Sent from Janet's iPad
> -Al-
> On Aug 4, 2016, at 4:40 PM, sapientdust+clamav at gmail.com wrote:
> > 
> > I've recently run into the issue of clamd not being able to scan
> > files that are larger than a small number of GB, and I have seen the
> > warnings in `man clamd.conf` that say specified limits above 4GB are
> > ignored.
> > 
> > Could developers or other folks familiar with the clamd codebase
> > comment on the feasibility of scanning large files in multiple
> > pieces as a way of handling larger files?
> > 
> > For example, given a file that is 6GB, does using multiple INSTREAM
> > calls (that's how I'm interacting with clamd currently) to check the
> > full 6GB seem like it should work reliably?
> > 
> > INSTREAM: bytes 0-1000MB
> > INSTREAM: bytes 900MB-1.9GB
> > INSTREAM: bytes 1.8GB-2.8GB
> > INSTREAM: bytes 2.7GB-3.7GB
> > INSTREAM: bytes 3.6GB-4.6GB
> > INSTREAM: bytes 4.5GB-5.5GB
> > INSTREAM: bytes 5.4GB-6.0GB
> > 
> > There is overlap above, wherein the 100MB of data that starts at the
> > 900MB position is scanned twice, once in the first call (as the last
> > 100MB of that stream) and once in the second call (as the first
> > 100MB of that stream), to reduce the possibility of a virus being
> > split into two pieces and therefore not recognized.
> > 
> > If ClamAV needs the first bytes in order to know
> > what kind of file it is scanning and trigger filetype-specific
> > heuristics, then something like the above could be adapted so that
> > the first N bytes of the first chunk are prepended to each
> > subsequent chunk that is checked for that file.
> > 
> > Thanks for any guidance or feedback you can provide.
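
For reference, the overlapping-chunk scheme quoted above can be sketched
roughly as follows. This is only an illustration, not anything taken from
the clamd sources: the function names (overlapping_ranges, instream_frames)
are made up for this example, and it assumes decimal units (1 MB = 10^6
bytes). The framing follows the INSTREAM description in the clamd man page
(a 4-byte length in network byte order before each piece, and a zero-length
piece to end the stream). It says nothing about whether signatures or
filetype heuristics would actually match reliably across chunk boundaries.

```python
import struct
from typing import Iterable, Iterator, List, Tuple

MB = 1000 * 1000  # assumption: decimal megabytes, as in the quoted example


def overlapping_ranges(total: int, chunk: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges that cover [0, total).

    Each range is at most `chunk` bytes long and begins `overlap` bytes
    before the end of the previous one.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be >= 0 and smaller than chunk")
    ranges = []
    start = 0
    while True:
        end = min(start + chunk, total)
        ranges.append((start, end))
        if end >= total:
            break
        start = end - overlap  # step back so the boundary is scanned twice
    return ranges


def instream_frames(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the byte strings to write to a clamd socket for one INSTREAM scan.

    `pieces` is the payload for a single scan, split into smaller writes.
    """
    yield b"zINSTREAM\0"  # null-terminated form of the command
    for piece in pieces:
        if piece:  # a zero-length piece would end the stream early
            yield struct.pack("!I", len(piece)) + piece
    yield struct.pack("!I", 0)  # zero-length chunk terminates the stream


if __name__ == "__main__":
    # Reproduces the seven ranges listed in the quoted message for a 6 GB file.
    for start, end in overlapping_ranges(6000 * MB, 1000 * MB, 100 * MB):
        print(f"INSTREAM: bytes {start // MB}MB-{end // MB}MB")
```

Each range would be sent as its own INSTREAM session, and each session
still has to fit within the StreamMaxLength setting in clamd.conf. The
"prepend the first N bytes" variant would just mean putting those bytes in
front of the pieces passed to instream_frames for every range after the
first.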
