[Clamav-devel] Pull request to add parallel scanning to clamscan

Michal Marek mmarek at suse.com
Tue Jun 20 05:59:45 EDT 2017


On 2017-06-20 11:33, Mark Allan wrote:
> From the commit message you said "build a list of files first and
> then spawn N children to scan the files in parallel."
> 
> Does this actually iterate *all* the files and directories before
> starting the first scan?

Yes.


> If you're scanning a large directory tree,
> how much overhead does this add prior to scanning the first file?

Unless you are scanning a really slow NFS or CIFS mount, building the
list takes negligible time compared to processing the file contents.
What does take noticeable time on startup is initializing the database.


> Alternatively, as clamscan already iterates through directories, does
> it maintain a count of the number of concurrent calls to 'scanfile()'
> and fire off another one at that point as necessary?

That would of course be an option, but it would require incrementally
passing paths to the children / threads. With the current approach, I
only need a pipe, which is the simplest synchronization primitive one
can think of :).

Michal


