[Clamav-devel] Pull request to add parallel scanning to clamscan
Michal Marek
mmarek at suse.com
Tue Jun 20 05:59:45 EDT 2017
On 2017-06-20 11:33, Mark Allan wrote:
> From the commit message you said "build a list of files first and
> then spawn N children to scan the files in parallel."
>
> Does this actually iterate *all* the files and directories before
> starting the first scan?
Yes.
> If you're scanning a large directory tree,
> how much overhead does this add prior to scanning the first file?
Unless you are scanning a really slow NFS or CIFS mount, the directory
walk is negligible compared to the time it takes to process the content.
Initializing the signature database, on the other hand, does take
noticeable time on startup.
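For illustration, the "build a list of files first, then spawn N children" design could look roughly like the sketch below. It is not the code from the pull request. The names build_file_list(), scan_parallel() and scan_stub() are hypothetical, scan_stub() stands in for clamscan's real per-file scan, and the static round-robin split (worker w takes files w, w+N, w+2N, ...) is an assumption about how the list gets divided.

```c
#define _XOPEN_SOURCE 700       /* for nftw() and mkdtemp() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Phase 1 result: every regular file under the root, collected up front. */
static char **g_files;
static size_t g_count, g_cap;

/* nftw() callback: append regular files to g_files. */
static int collect(const char *path, const struct stat *sb, int type,
                   struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (type != FTW_F || !S_ISREG(sb->st_mode))
        return 0;                       /* skip dirs, symlinks, devices, errors */
    if (g_count == g_cap) {
        size_t ncap = g_cap ? 2 * g_cap : 64;
        char **grown = realloc(g_files, ncap * sizeof *grown);
        if (grown == NULL)
            return -1;                  /* non-zero return stops the walk */
        g_files = grown;
        g_cap = ncap;
    }
    if ((g_files[g_count] = strdup(path)) == NULL)
        return -1;
    g_count++;
    return 0;
}

/* Walk the whole tree before any scanning starts (FTW_PHYS: no symlink following). */
static int build_file_list(const char *root)
{
    return nftw(root, collect, 16, FTW_PHYS);
}

/* Hypothetical stand-in for the real per-file scan: flags names containing "bad". */
static int scan_stub(const char *path)
{
    return strstr(path, "bad") != NULL;
}

/* Phase 2: fork n workers; worker w scans files w, w+n, w+2n, ...
   Returns 1 if any worker reported a hit, 0 if none, -1 if a fork failed. */
static int scan_parallel(size_t nworkers, int (*scan)(const char *))
{
    int any_hit = 0, fork_failed = 0, status;

    assert(nworkers > 0);
    for (size_t w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;            /* remaining slices would go unscanned */
            break;
        }
        if (pid == 0) {                 /* child: inherits g_files through fork() */
            int hits = 0;
            for (size_t i = w; i < g_count; i += nworkers)
                hits += scan(g_files[i]);
            _exit(hits > 0 ? 1 : 0);
        }
    }
    while (wait(&status) > 0)           /* reap every child that was started */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
            any_hit = 1;
    return fork_failed ? -1 : any_hit;
}
```

Usage: call build_file_list(root) once, then scan_parallel(n, scan_stub). Because the list is complete before fork(), every child already has it in memory and nothing has to be sent to the children afterwards.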
> Alternatively, as clamscan already iterates through directories, does
> it maintain a count of the number of concurrent calls to 'scanfile()'
> and fire off another one at that point as necessary?
That would of course be an option, but it would require incrementally
passing paths to the children / threads. With the current approach, I
only need a pipe, which is the simplest synchronization primitive one
can think of :).
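The message does not say what the pipe carries, so the following is only one plausible reading, not the patch itself: each child writes a single fixed-size result record into a shared pipe, and the parent reads until EOF. EOF arrives only after every child has closed its write end, so the pipe doubles as the "all workers are done" signal. The names scan_with_pipe(), struct result and scan_stub() are hypothetical. Records no larger than PIPE_BUF are written atomically, so records from different children cannot interleave.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One fixed-size record per child. */
struct result {
    int worker;     /* which child sent it */
    int scanned;    /* files it processed */
    int infected;   /* files its scan callback flagged */
};
#ifdef PIPE_BUF
_Static_assert(sizeof(struct result) <= PIPE_BUF, "record must be written atomically");
#endif

/* Hypothetical stand-in for the real per-file scan. */
static int scan_stub(const char *path) { return strstr(path, "bad") != NULL; }

/* Read exactly len bytes unless EOF or an error intervenes; returns bytes read. */
static size_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;                      /* EOF: every write end is closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        got += (size_t)n;
    }
    return got;
}

/* Returns the total infected count (and total scanned via *scanned_out), or -1. */
static int scan_with_pipe(char *const files[], int nfiles, int nworkers,
                          int (*scan)(const char *), int *scanned_out)
{
    int fds[2], started = 0, records = 0, infected = 0, scanned = 0;
    struct result r;

    assert(nworkers > 0);
    if (pipe(fds) != 0)
        return -1;
    for (int w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* reported as an error below */
        if (pid == 0) {
            struct result mine = { w, 0, 0 };
            close(fds[0]);              /* the child only writes */
            for (int i = w; i < nfiles; i += nworkers) {
                mine.scanned++;
                mine.infected += scan(files[i]);
            }
            _exit(write(fds[1], &mine, sizeof mine) == (ssize_t)sizeof mine ? 0 : 1);
        }
        started++;
    }
    close(fds[1]);                      /* otherwise the parent never sees EOF */
    while (read_full(fds[0], &r, sizeof r) == sizeof r) {
        records++;
        scanned += r.scanned;
        infected += r.infected;
    }
    close(fds[0]);
    while (wait(NULL) > 0)
        ;                               /* reap all children */
    if (started != nworkers || records != started)
        return -1;
    *scanned_out = scanned;
    return infected;
}
```

The design point is that the parent needs no locks, condition variables or per-child bookkeeping: the kernel closes each child's write end when it exits, and read() returning 0 means all of them have finished.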
Michal