[clamav-users] Scanning very large files in chunks

Paul Kosinski clamav-users at iment.com
Thu Aug 11 16:19:44 EDT 2016

After posting a while ago about scanning (extremely) large disk images,
I realized that files need not be contiguous in a disk image. It all
depends on the block allocation algorithm of the file system and, in
many cases, on the fragmentation that occurs as the disk is used.

So, even if you could scan a terabyte+ disk image as one long stream,
a virus signature might escape detection because the infected file's
blocks, and so the signature's bytes, are not adjacent in the image.
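
To make that concrete, here is a toy sketch in Python. Everything in it
is made up for illustration: the 4-byte block size, the "EVILCODE"
pattern (not a real ClamAV signature) and the write_image helper. It
only shows that a plain byte search over the raw image finds the pattern
when the file's blocks are stored in order, and misses it when one block
has been allocated elsewhere.

```python
# Illustrative only: a toy "disk image" with 4-byte blocks.
SIGNATURE = b"EVILCODE"  # hypothetical 8-byte pattern, not a real signature
BLOCK = 4                # hypothetical block size


def write_image(file_bytes, block_order, total_blocks):
    """Place the file's blocks into an image at the given block indices."""
    image = bytearray(b"." * (BLOCK * total_blocks))
    chunks = [file_bytes[i:i + BLOCK] for i in range(0, len(file_bytes), BLOCK)]
    for chunk, index in zip(chunks, block_order):
        image[index * BLOCK:index * BLOCK + len(chunk)] = chunk
    return bytes(image)


# A 16-byte "infected file": the pattern spans the 2nd and 3rd blocks.
infected = b"head" + SIGNATURE + b"tail"

# Same file, two layouts: blocks in order, and third block stored elsewhere.
contiguous = write_image(infected, [0, 1, 2, 3], 8)
fragmented = write_image(infected, [0, 1, 5, 3], 8)

found_contiguous = SIGNATURE in contiguous
found_fragmented = SIGNATURE in fragmented
print(found_contiguous, found_fragmented)
```

In the fragmented layout the two halves of the pattern are both still in
the image, just not next to each other, so a linear scan of the image
does not see them as one match.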

On Thu, 11 Aug 2016 18:15:08 +0100 (BST)
"G.W. Haywood" <clamav at jubileegroup.co.uk> wrote:

> Hello once again,
> On Thu, 11 Aug 2016, sapientdust+clamav at gmail.com wrote:
> > I scan a 4.5 GB file in multiple instream calls, by scanning the
> > first 3 GB in one call, and then making a second instream call that
> > provides the first N MB followed by the last 2 GB of the file.
> > Would clamav be expected to work similarly in the two cases in terms
> > of identifying a virus, assuming the virus is the same in the two
> > scenarios and it's in ClamAV's database? Or are there technical
> > reasons why ClamAV wouldn't detect the virus in the second scenario
> > but would in the first, even though the virus bytes are identical?
> There's a possibility of failing to find it in the second scenario.
> It's anybody's guess what the probability will be; my guess would be
> that the probability of that failure would be small compared with the
> relatively large probability of not finding it at all in both cases.
> > This is a question for clamav developers or those who understand the
> > codebase sufficiently to know the impact of scanning a partial file.
> I don't think so.  Just think about it a bit:
> Much of ClamAV's operation is looking for pattern matches.
> Suppose you scan a 4.5GB file in two chunks.
> Suppose half this mysterious 'huge file virus' is in the first chunk.
> Presumably the other half is in the second chunk.
> What happens if the pattern is designed to match the entire virus?
> > Should I have asked this question on the developer list?
> No.  You're a user, the developers' list is for working on ClamAV.
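
The split-pattern case described above can be sketched with a plain byte
search. This is not how ClamAV works internally and it uses no ClamAV
API; scan_in_chunks, its parameters and the "EVILCODE" pattern are all
hypothetical. The idea is just that carrying the tail of each chunk into
the next one lets a fixed byte pattern that straddles a chunk boundary
still be seen, as long as the overlap is at least one byte shorter than
the pattern or longer.

```python
import io


def scan_in_chunks(stream, pattern, chunk_size, overlap):
    """Return True if pattern occurs in stream, reading chunk_size bytes at a time.

    The last `overlap` bytes of each window are prepended to the next chunk,
    so a match straddling a boundary is found when overlap >= len(pattern) - 1.
    """
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if pattern in window:
            return True
        tail = window[-overlap:] if overlap > 0 else b""


# The 8-byte pattern sits across the boundary between two 14-byte chunks.
data = b"A" * 10 + b"EVILCODE" + b"B" * 10

missed = scan_in_chunks(io.BytesIO(data), b"EVILCODE", chunk_size=14, overlap=0)
caught = scan_in_chunks(io.BytesIO(data), b"EVILCODE", chunk_size=14, overlap=7)
print(missed, caught)
```

Overlap only helps for fixed patterns shorter than the overlap. Anything
that depends on the whole file, such as a hash of the entire file, would
still not match when only part of the file is scanned.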
