[clamav-users] Scanning very large files in chunks

G.W. Haywood clamav at jubileegroup.co.uk
Thu Aug 11 13:15:08 EDT 2016

Hello once again,

On Thu, 11 Aug 2016, sapientdust+clamav at gmail.com wrote:

> I scan a 4.5 GB file in multiple instream calls, by scanning the first
> 3 GB in one call, and then making a second instream call that provides
> the first N MB followed by the last 2 GB of the file.

> Would clamav be expected to work similarly in the two cases in terms
> of identifying a virus, assuming the virus is the same in the two
> scenarios and it's in ClamAV's database? Or are there technical
> reasons why ClamAV wouldn't detect the virus in the second scenario
> but would in the first, even though the virus bytes are identical?

Yes, it's possible that ClamAV would fail to find it in the second
scenario.  It's anybody's guess what the probability is; my guess is
that it's small compared with the relatively large probability of not
finding the virus in either case.

> This is a question for clamav developers or those who understand the
> codebase sufficiently to know the impact of scanning a partial file.

I don't think so.  Just think about it a bit:

Much of ClamAV's operation is looking for pattern matches.
Suppose you scan a 4.5GB file in two chunks.
Suppose half this mysterious 'huge file virus' is in the first chunk.
Presumably the other half is in the second chunk.
What happens if the pattern is designed to match the entire virus?
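
To make that concrete, here is a minimal sketch in Python.  It is not
ClamAV's engine or API: the scan() function is a hypothetical stand-in
that does a plain substring search, and the signature, data and chunk
sizes are invented for illustration.  It only shows what happens when
a contiguous pattern straddles a chunk boundary.

```python
def scan(data: bytes, signature: bytes) -> bool:
    """Toy matcher (an assumption, not ClamAV): plain substring search."""
    return signature in data


def scan_in_chunks(data: bytes, signature: bytes,
                   chunk_size: int, overlap: int = 0) -> bool:
    """Scan data in fixed-size chunks, each scanned independently.

    With overlap > 0, each chunk starts `overlap` bytes before the
    previous chunk ended, so consecutive chunks share that many bytes.
    """
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    for start in range(0, len(data), step):
        if scan(data[start:start + chunk_size], signature):
            return True
    return False


# Hypothetical 12-byte signature placed across the 100-byte boundary:
# it occupies bytes 94..105 of a 200-byte buffer.
signature = b"EVIL-PATTERN"
data = b"A" * 94 + signature + b"B" * 94

whole = scan(data, signature)                      # one pass over everything
split = scan_in_chunks(data, signature, 100)       # two disjoint chunks
overlapped = scan_in_chunks(data, signature, 100,
                            overlap=len(signature) - 1)

print(whole, split, overlapped)
```

Overlapping the chunks by one byte less than the pattern length rescues
this toy case, but only because the pattern is a single contiguous run
of bytes.  A signature based on a hash of the whole file, for example,
cannot match a partial file however the chunks are arranged.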

> Should I have asked this question on the developer list?

No.  You're a user; the developers' list is for work on ClamAV itself.


