[clamav-users] Understanding 'Heuristics.Phishing.Email.SpoofedDomain' debug output
Mickey Williams
M.Williams at kent.ac.uk
Tue Nov 17 13:28:32 UTC 2020
I hadn't actually thought to trace this back through the source on GitHub. My ability to read C isn't great, but I don't think any of the debug output is "wrong" (misbehaving). The function get_char_at_pos_with_skip is successfully getting the domain, and the output down to
LibClamAV debug: Lookup result: in regex list
is doing what it is supposed to do. The problem is that nothing is then displayed apart from
LibClamAV debug: Phishcheck: Phishing scan result: URLs are way too different
which is the very last part of the phishingScan function. Without understanding the entire phishing-related codebase, all I can see is that it has gone from 'does hsbc.co.uk exist on a regex list?' -> Yes -> ??? -> 'URLs are too different'.
I'll submit this as a ClamAV bug report. Even if it turns out not to be a bug and there is something questionable about the way the HTML is written, the debug output needs to be improved to show what is actually happening.
Regards
Mickey
________________________________
From: clamav-users <clamav-users-bounces at lists.clamav.net> on behalf of G.W. Haywood via clamav-users <clamav-users at lists.clamav.net>
Sent: 11 November 2020 12:53
To: Mickey Williams via clamav-users <clamav-users at lists.clamav.net>
Cc: G.W. Haywood <clamav at jubileegroup.co.uk>
Subject: Re: [clamav-users] Understanding 'Heuristics.Phishing.Email.SpoofedDomain' debug output
Hi there,
On Wed, 11 Nov 2020, Mickey Williams via clamav-users wrote:
> I'm trying and failing to understand the debug output ...
You're not alone. Perhaps this extract from .../libclamav/regex_list.c
will shed some light. The last paragraph is particularly amusing. :/
...
reverse_string(bufrev);
// TODO Add this back in once we improve the regex parsing code that finds
// suffixes to add to the filter.
//
// Reviewing Coverity bug reports we found that the return value to this
// filter_search call was effectively being ignored, causing no filtering
// to occur. Fixing this issue resulted in a unit test that uses the
// following match list regex to fail when searching for `ebay.com`.:
//
// .+\\.paypal\\.(com|de|fr|it)([/?].*)?:.+\\.ebay\\.(at|be|ca|ch|co\\.uk|de|es|fr|ie|in|it|nl|ph|pl|com(\\.(au|cn|hk|my|sg))?)/
//
// After investigating further, this is because the regex_list_add_pattern
// call, which parses the regex for suffixes and attempts to add these to
// the filter, can't handle the `com(\\.(au|cn|hk|my|sg))?` portion of
// the regex. As a result, it only adds `ebay.at`, `ebay.be`, `ebay.ca`, up
// through `ebay.pl` into the filter). With the commented out code below
// uncommented, these suffixes not existing in the filter are treated as
// there not being a corresponding regex for ebay.com, causing no regex
// rules to be evaluated against the URL.
//
// We should get the regex parsing code working (and ensure it handles any
// other complex cases in daily.cdb) before re-enabling this code. The code
// has had no effect for 12+ years at this point, though, so it's probably
// safe to wait a bit longer without it.
//
//filter_search_rc = filter_search(&matcher->filter, (const unsigned char *)bufrev, buffer_len);
//if (filter_search_rc == -1) {
// free(buffer);
// free(bufrev);
// /* filter says this suffix doesn't match.
// * The filter has false positives, but no false
// * negatives */
// return CL_SUCCESS;
//}
...
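To make the failure mode in that comment concrete, here is a minimal sketch of a suffix prefilter sitting in front of a regex list. It is not ClamAV code: the names filter_suffixes, suffix_filter_contains and regexes_would_run are hypothetical, the suffix list is hard-coded for illustration, and the real filter is a more compact structure that works on reversed strings. The sketch only assumes what the comment states, namely that `ebay.com` never made it into the filter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the suffixes extracted from the regex.
 * "ebay.com" is deliberately absent, mimicking a parser that cannot
 * handle the `com(\.(au|cn|hk|my|sg))?` alternative. */
static const char *const filter_suffixes[] = {
    "ebay.at", "ebay.be", "ebay.ca", "ebay.pl"
};

/* Returns 1 if host ends with one of the known suffixes, 0 otherwise. */
int suffix_filter_contains(const char *host)
{
    size_t hlen, i;

    assert(host != NULL);
    hlen = strlen(host);
    for (i = 0; i < sizeof(filter_suffixes) / sizeof(filter_suffixes[0]); i++) {
        size_t slen = strlen(filter_suffixes[i]);
        if (hlen >= slen && strcmp(host + hlen - slen, filter_suffixes[i]) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the regex list would be evaluated for host,
 * 0 if the prefilter short-circuits the lookup first. */
int regexes_would_run(const char *host, int use_filter)
{
    if (use_filter && !suffix_filter_contains(host))
        return 0; /* filter reports "no such suffix": regexes are skipped */
    return 1;
}
```

With the filter enabled, regexes_would_run("www.ebay.com", 1) returns 0, so the paypal/ebay rule is never tried even though the regex itself covers ebay.com. With the filter disabled (second argument 0), it returns 1, which corresponds to the behaviour of the code as it stands with the filter_search call commented out.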
Incidentally your debug message claims "calc_pos_with_skip:" but the
function which emits it is actually called "get_char_at_pos_with_skip"
so I guess that at some point it's been renamed a little carelessly.
--
73,
Ged.