[Clamav-devel] AC algorithm optimization

刘申 liu31380 at gmail.com
Wed Mar 26 07:34:14 EDT 2014


I have been studying the AC algorithm of ClamAV. Here are some ideas.

Now AC:
1.Use of the Trie structure,to loading large-scale
pattern string,memory expansion serious;
2.The max_ac_depth is set to 3,although can reduce the memory usage;but
that could lead to the ac_find_match function is frequently called.

If a algorithm that can compressing the Trie structure, that could increase
the max_ac_depth value.

There is just right a double array trie algorithm,using two arrays,very
clever compression trie structure.

I did a experiment on my PC,the AC algorithm is separated from the ClamAV
code,which can be run independently.I implement the double array trie as a
contrast at the same time.Extracted the original AC algorithm rules,as
the experimental rules(General:8030PE:17241substrings).

Scanning the files samples(1465 files,filetype :
HTML,PE,ASCII,PDF,etc.. the total size: 273MB).

Test code flow, initialization of AC structure, -> load the rule, -> AC
build - > scan directory for each file, -> read file content into the
memory buffer, - > call gettimeofday to record the start time, -> call
ac scan buff, -> call gettimeofday to record end time, - > calculate
throughput (Mbps) according to the file size and scanning time

Hardware and Software environment:

CPU: Intel i3-3220 @ 3.30 GHz CPU

Memory: DDR3 1333 8 g

OS version: Ubuntu 13.10

GCC version: 4.8.1

The result:

Most of the file scanning performance will be improved, some file pretty
obvious rise up to 10 times.

Whether can use dat algorithm to replace the original ac algorithm to
improve the scanning performance?

More information about the clamav-devel mailing list