[Clamav-devel] AC algorithm optimization
smorgan at sourcefire.com
Wed Mar 26 12:45:47 EDT 2014
That sounds interesting. Is there a reference implementation of AC trie
using double arrays for us?
On Wed, Mar 26, 2014 at 7:34 AM, 刘申 <liu31380 at gmail.com> wrote:
> I have been studying the AC algorithm of ClamAV. Here are some ideas.
> Now AC:
> 1.Use of the Trie structure,to loading large-scale
> pattern string,memory expansion serious;
> 2.The max_ac_depth is set to 3,although can reduce the memory usage;but
> that could lead to the ac_find_match function is frequently called.
> If a algorithm that can compressing the Trie structure, that could increase
> the max_ac_depth value.
> There is just right a double array trie algorithm,using two arrays,very
> clever compression trie structure.
> I did a experiment on my PC,the AC algorithm is separated from the ClamAV
> code,which can be run independently.I implement the double array trie as a
> contrast at the same time.Extracted the original AC algorithm rules,as
> the experimental rules(General:8030PE:17241substrings).
> Scanning the files samples(1465 files,filetype :
> HTML,PE,ASCII,PDF,etc.. the total size: 273MB).
> Test code flow, initialization of AC structure, -> load the rule, -> AC
> build - > scan directory for each file, -> read file content into the
> memory buffer, - > call gettimeofday to record the start time, -> call
> ac scan buff, -> call gettimeofday to record end time, - > calculate
> throughput (Mbps) according to the file size and scanning time
> Hardware and Software environment:
> CPU: Intel i3-3220 @ 3.30 GHz CPU
> Memory: DDR3 1333 8 g
> OS version: Ubuntu 13.10
> GCC version: 4.8.1
> The result:
> Most of the file scanning performance will be improved, some file pretty
> obvious rise up to 10 times.
> Whether can use dat algorithm to replace the original ac algorithm to
> improve the scanning performance?
> Please submit your patches to our Bugzilla: http://bugs.clamav.net
More information about the clamav-devel