[clamav-users] Clamav cannot detect a malware using a signature based on html comment

Dennis Peterson dennispe at inetnw.com
Tue Jan 26 14:50:42 EST 2016


test.html
<html>
<body>
THIS IS     A MALWARE
<!-- THIS      IS A MALWARE -->
</html>

Test signatures:
<!-- This is a malware -->
<!-- this is a malware -->
  this is a malware
  This is a malware

test.ndb
test1:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
test2:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
test3:3:*:20746869732069732061206d616c7761726520
test4:3:*:20546869732069732061206d616c7761726520
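
For anyone wanting to reproduce these lines, here is a minimal sketch of how the hex bodies can be generated from the plain-text strings. It assumes the usual .ndb layout (Name:TargetType:Offset:HexSignature) with target type 3 meaning normalized HTML; the helper name ndb_signature is only illustrative and is not part of ClamAV.

```python
def ndb_signature(name: str, target_type: int, text: str, offset: str = "*") -> str:
    """Format one .ndb line as Name:TargetType:Offset:HexSignature."""
    # The signature body is simply the ASCII bytes of the text, hex-encoded.
    hex_body = text.encode("ascii").hex()
    return f"{name}:{target_type}:{offset}:{hex_body}"


# The four test strings, in the same order as test1..test4 in test.ndb.
candidates = [
    ("test1", "<!-- This is a malware -->"),
    ("test2", "<!-- this is a malware -->"),
    ("test3", " this is a malware "),
    ("test4", " This is a malware "),
]

# Target type 3 = normalized HTML (assumed from the ClamAV signature docs).
lines = [ndb_signature(name, 3, text) for name, text in candidates]

if __name__ == "__main__":
    print("\n".join(lines))
```

Note that test3 and test4 include the leading and trailing space (0x20), which is why their hex bodies start and end with 20.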

Results:
  clamscan -id test.ndb test.html
test.html: test3.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 4
Engine version: 0.98.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.005 sec (0 m 0 s)

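Analysis: Only test3 matches, the lowercase signature without comment tags.
Clamscan is therefore collapsing runs of spaces, removing comments, and
converting the text to lower case before the signatures are matched.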
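
The behaviour can be modelled roughly as below. This is only a sketch of what the scan result implies, not ClamAV's actual normalizer: rough_normalize and matching are made-up names, and treating every run of whitespace (including newlines) as one space is an assumption that happens to agree with the result above.

```python
import re


def rough_normalize(html: str) -> bytes:
    """Approximate the observed normalization (an assumption, not ClamAV code)."""
    no_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)  # drop comments
    lowered = no_comments.lower()                                   # lower-case
    collapsed = re.sub(r"\s+", " ", lowered)                        # squeeze whitespace
    return collapsed.encode("ascii")


TEST_HTML = (
    "<html>\n<body>\nTHIS IS     A MALWARE\n"
    "<!-- THIS      IS A MALWARE -->\n</html>\n"
)

# Hex bodies copied from test.ndb.
SIGNATURES = {
    "test1": "3c212d2d20546869732069732061206d616c77617265202d2d3e",
    "test2": "3c212d2d20746869732069732061206d616c77617265202d2d3e",
    "test3": "20746869732069732061206d616c7761726520",
    "test4": "20546869732069732061206d616c7761726520",
}


def matching(html: str) -> list[str]:
    """Return the names of the signatures found in the normalized buffer."""
    data = rough_normalize(html)
    return [name for name, sig in SIGNATURES.items() if bytes.fromhex(sig) in data]


if __name__ == "__main__":
    print(matching(TEST_HTML))
```

Under this model only test3 survives: the comment signatures lose their target when comments are stripped, and test4 fails because of the lower-casing.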

dp



On 1/26/16 2:49 AM, Arnaud Jacques / SecuriteInfo.com wrote:
> Hello Clamav Team,
>
> To detect some JS includers, I need to create a signature based on an HTML
> comment. Here is an example:
>
> # cat test.html
> <html>
> <body>
> <!-- This is a malware -->
> </body>
> </html>
>
> I *need* to include the comment tags to avoid false positives. I tried several
> signatures:
> # cat test.ndb
> test:7:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
> test:7:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
> test:3:*:3c212d2d20546869732069732061206d616c77617265202d2d3e
> test:3:*:3c212d2d20746869732069732061206d616c77617265202d2d3e
>
> None of them matches.
>
> # clamscan -id test.ndb test.html
>
> ----------- SCAN SUMMARY -----------
> Known viruses: 4
> Engine version: 0.98.7
> Scanned directories: 0
> Scanned files: 1
> Infected files: 0
> Data scanned: 0.00 MB
> Data read: 0.00 MB (ratio 0.00:1)
> Time: 0.007 sec (0 m 0 s)
>
> (I have also tested with the latest GitHub snapshot of clamav-devel, with no
> more success.)
>
>
> Why doesn't it match? Let's run the scan with debug information:
>
> # clamscan -id test.ndb test.html --debug
> (... snip ...)
> LibClamAV debug: Recognized ASCII text
> LibClamAV debug: Matched signature for file type HTML data at 0
> LibClamAV debug: cache_check: e7a3239dc6d11597df1a03a6a8a55854 is negative
> LibClamAV debug: in cli_scanhtml()
> LibClamAV debug: cli_scanhtml: using tempdir /tmp/clamav-a13a0761052e94cf406a02db25f7c324.tmp
> LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
> LibClamAV debug: hashtab: Freeing hashset, elements: 0, capacity: 0
> LibClamAV debug: cli_magic_scandesc: returning 0  at line 2334
> LibClamAV debug: cache_add: e7a3239dc6d11597df1a03a6a8a55854 (level 0)
> LibClamAV debug: Cleaning up phishcheck
> LibClamAV debug: Freeing phishcheck struct
> LibClamAV debug: Phishcheck cleaned up
>
> The file is first detected as ASCII, but at that stage it is neither
> normalized nor scanned by the engine. It is then detected as HTML, and only
> then is it normalized and scanned.
>
> The HTML normalization removes HTML comments from the original file.
> That's why the signature does not match.
>
> There are 2 solutions to resolve this:
>
> 1/ When an ASCII file is detected, normalize it and scan it before ClamAV
> tries to detect whether it is an HTML file.
>
> or
>
> 2/ When HTML is detected, ClamAV generates 2 temp files: "nocomment.html" and
> "notags.html". I suggest adding a third temp file, "withcomment.html".
> "withcomment.html" should be normalized (removing spaces and carriage returns,
> lower-casing ASCII, etc.) but should keep the HTML comments.
>
> On my side, a signature is ready that detects hundreds of thousands of
> JS.Includer. I'm ready to publish it in the official ClamAV database when this
> new engine feature is available. This could greatly improve ClamAV's detection
> ratio.
>
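
To illustrate the second proposal, here is a hypothetical sketch of what a "withcomment.html" buffer could contain: the same lower-casing and whitespace squeezing, but with comments left in place. This is not ClamAV code and no such temp file exists in the engine versions discussed here; normalize_keep_comments is an invented name, and the whitespace handling is an assumption.

```python
import re


def normalize_keep_comments(html: str) -> bytes:
    """Hypothetical 'withcomment.html' normalization: comments are preserved."""
    lowered = html.lower()                    # lower-case ASCII
    collapsed = re.sub(r"\s+", " ", lowered)  # squeeze spaces and line breaks
    return collapsed.encode("ascii")


# The test.html from the original report.
SAMPLE = "<html>\n<body>\n<!-- This is a malware -->\n</body>\n</html>\n"

# Hex bodies taken from the quoted test.ndb (mixed-case and lowercase variants).
MIXEDCASE_SIG = bytes.fromhex("3c212d2d20546869732069732061206d616c77617265202d2d3e")
LOWERCASE_SIG = bytes.fromhex("3c212d2d20746869732069732061206d616c77617265202d2d3e")

normalized = normalize_keep_comments(SAMPLE)

if __name__ == "__main__":
    print("lowercase signature matches:", LOWERCASE_SIG in normalized)
    print("mixed-case signature matches:", MIXEDCASE_SIG in normalized)
```

With a buffer like this, a signature that includes the comment tags would have something to match against, provided it is written in lower case.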



