[clamav-users] Win.Trojan.URLspoof-2 signature and WARC files
Christopher Marczewski
cmarczewski at sourcefire.com
Wed Dec 21 00:45:07 UTC 2016
Hello Jay,
Al is correct. Signature drop requests can come in the form of an FP
submission <http://www.clamav.net/reports/fp>. Signature submissions or
suggestions for modifications should be sent to our community-sigs
<http://lists.clamav.net/cgi-bin/mailman/listinfo/community-sigs> mailing
list.
This signature was published in 2005. It's only looking for one variation
of such a spoofing attempt. For now, we'll modify Win.Trojan.URLspoof-2 to
include your suggestion. At the very least, we'll be significantly
narrowing the detection scope & reducing the risk of additional FPs.
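
To illustrate why the current pattern is so loose, here is a rough Python
model of the signature body quoted further down in this thread. It is only a
sketch: Python's re module stands in for ClamAV's matcher, the helper names
and MAX_GAP are made up for illustration, and the 4000-byte bound is simply
the figure Jay floated, not necessarily what will be published. In a real
signature the bound would be expressed with one of the bounded wildcards
described in the signatures manual (for example {-n}) rather than with a
regular expression.

```python
import re

# Hex body of Win.Trojan.URLspoof-2 as quoted in this thread.
SIG_BODY = "20687265663d22*0125303040*223e*3c2f"

# Hypothetical bound, for illustration only (Jay's "say 4000 bytes").
MAX_GAP = 4000


def decode_groups(sig_body):
    """Split the body on '*' wildcards and decode each hex group to bytes."""
    return [bytes.fromhex(part) for part in sig_body.split("*")]


def build_pattern(groups, max_gap=None):
    """Model the signature as a bytes regex.

    max_gap=None mimics '*' (any number of bytes between groups);
    an integer mimics a gap of at most max_gap bytes.
    """
    gap = b".*?" if max_gap is None else b".{0,%d}?" % max_gap
    return re.compile(gap.join(re.escape(g) for g in groups), re.DOTALL)


groups = decode_groups(SIG_BODY)
unbounded = build_pattern(groups)
bounded = build_pattern(groups, MAX_GAP)

# A single spoofed link: all four groups sit close together.
single = b'<a href="http://good.example\x01%00@evil.example/">x</a>'

# An aggregate: the groups are scattered across unrelated payloads.
aggregate = (
    b'<a href="/index.html">home</a>' + b"\x00" * 10000
    + b"\x01%00@" + b"\x00" * 10000
    + b'<p class="x"></p>'
)

print(groups)
print(bool(unbounded.search(single)), bool(unbounded.search(aggregate)))
print(bool(bounded.search(single)), bool(bounded.search(aggregate)))
```

In this model the unbounded pattern matches both buffers, while the bounded
one matches only the single link, which is the behaviour the narrower
signature is meant to approximate.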
As for resources concerning best practices for signatures, the signatures
manual
<https://github.com/vrtadmin/clamav-devel/blob/master/docs/signatures.pdf>
would be the best place to start. Material specifically covering malware
analysis or DFIR topics is also worth consulting, as much of it covers
detection strategies & automation relying on pattern matching.
Looking through our public repository & bug tracker, I don't see any
reference to the WARC specification. ClamAV is certainly extensible when it
comes to parsing select formats & scanning file artifacts thereafter, but
the WARC format would probably require its own parser. Your best option is
to submit a feature request
<https://bugzilla.clamav.net/enter_bug.cgi?product=ClamAV> to our bug
tracker.
Finally, deserializing the WARC files would allow for better coverage
through ClamAV as it could then identify common file formats & process them
properly. It's also worth mentioning that signatures carry a target type.
If you opt to scan the WARC files "as is", they would only be eligible for
alerts from signatures with a target type of 0 (any file) or 7 (normalized
ASCII text file), depending on how ClamAV ultimately processes & scans the
file.
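
As a rough idea of what deserializing before scanning could look like, here
is a minimal Python sketch that splits an uncompressed WARC stream into its
records. It is not part of ClamAV. It assumes well-formed WARC/1.x records
(version line, header fields, blank line, Content-Length bytes of block,
then two CRLFs), it ignores the per-record gzip compression that many WARC
files use, and its HTTP handling is deliberately naive (no chunked or
compressed transfer encodings). All function names are hypothetical, and
the sample records omit fields that a real WARC would carry.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("not at a WARC record boundary: %r" % line[:20])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break                   # blank line ends the header block
            name, _, value = line.partition(b":")
            key = name.strip().lower().decode("ascii")
            headers[key] = value.strip().decode("utf-8")
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def http_body(block):
    """Drop the HTTP header block from a 'response' record, if present."""
    head, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else block


def scannable_payloads(stream):
    """Yield (headers, payload), stripping HTTP headers from responses only."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") == "response":
            block = http_body(block)
        yield headers, block


def make_record(warc_type, block):
    """Build a simplified record for the demo below (not a complete WARC record)."""
    head = "WARC/1.0\r\nWARC-Type: %s\r\nContent-Length: %d\r\n\r\n" % (
        warc_type, len(block))
    return head.encode("ascii") + block + b"\r\n\r\n"


sample = (
    make_record("response",
                b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>")
    + make_record("resource", b"\x01%00@")
)
payloads = list(scannable_payloads(io.BytesIO(sample)))
for headers, payload in payloads:
    print(headers["warc-type"], len(payload))
```

Each payload could then be written out and scanned on its own, so that
ClamAV sees one harvested file at a time instead of the whole aggregate.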
Alongside the signature change, I'll also be looking for a few sample WARC
files to confirm how they're handled by ClamAV. If you have any samples in
mind, please provide the hashes or upload the samples through our Report
Malware <http://www.clamav.net/reports/malware> form. If they're clean
samples, I can mark them clean upon submission.
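
If you go the hash route, something along these lines should work for
large WARC files without loading them into memory. This is only a sketch:
SHA-256 is an assumption on my part (no hash type is specified above), and
file_sha256 is a made-up helper name.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_sha256 on a sample's path returns the hex string to quote
when referring to that sample.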
---------- Forwarded message ----------
Al Varnell alvarnell at mac.com
Mon Dec 19 23:24:01 EST 2016
One correction to the Group 2 signature, it's just '%00@'.
The only available method for having a signature removed or modified is to
submit one or more False Positives at
<http://www.clamav.net/reports/fp> and include the details you have covered
below. If you would like to be notified of changes in the virus database,
you will need to join the clamav-virusdb mailing-list
<http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-virusdb>.
You can submit any suggested revised signature through the ClamAV Community
Signatures program
<http://blog.clamav.net/2014/02/introducing-clamav-community-signatures.html>.
I'm not a signature expert by any means, but I would have to agree
that both the art and ClamAV engine capabilities have improved since this
one was apparently written, and it should be easy to improve.
-Al-
On Mon, Dec 19, 2016 at 05:40 PM, Jay Gattuso wrote:
>
> Win.Trojan.URLspoof-2
> We’re encountering some issues with this particular “virus”, and having
worked through what we’re seeing, I wanted to ask a couple of questions.
> The signature is pretty weak.
>
> [main.ndb] Win.Trojan.URLspoof-2:0:*:20687265663d22*0125303040*223e*3c2f
>
>
> We’ve seen hits against this signature 14 times in 8 years (I’m not sure
how long it’s been in the defs, but we’ve been checking our ~20 million files
against ClamAV for 8 years).
> Every hit for Win.Trojan.URLspoof-2 we’ve seen is a false positive.
> Breaking the signature sequence into parts reveals the weakness of this
particular signature:
>
> Group 1: 20687265663d22 = ‘ href="’
> Group 2: 0125303040 = ‘\x01%00@’
> Group 3: 223e = ‘">’
> Group 4: 3c2f = ‘</’
>
> This false positive is appearing in WARC files (
http://iipc.github.io/warc-specifications/), and its earlier variant ARC (
http://archive.org/web/researcher/ArcFileFormat.php)
> I’ve been pulling these containers apart, and can see that we only get a
hit when the signature parts are found across the content container. For
us, that means group 1 appearing in any piece of HTML and group 2 appearing
in a variety of file formats including PDF, MP3, MP4 and JPG. Groups 3 and 4
are trivial and appear everywhere. The point here is that it is never caused
by a single file as would be found in the wild, only through the aggregation
we undertake ourselves when creating these WARC files.
>
> We run a slightly non-standard conf:
>
> # MaxScanSize
> # Default: 100M
> MaxScanSize 2048M
>
> And
>
> # MaxFileSize
> # Default: 25M
> MaxFileSize 2048M
>
> Questions:
>
> 1) How would I go about getting this signature either removed or
hardened? For example, if the signature is specifically hunting for a URL,
perhaps it could be confined to the max URL length * 2 or some such (
http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers)
say 4000 bytes. I’ve never seen a positive hit against this signature,
and I have no idea how common it is or what it’s actually looking for, so
removing it might not be a great idea.
>
> Are there any resources that might help me to work on a stronger signature
for this particular threat, and what’s the process for suggesting a
revision/removal?
>
> 2) These hits all happen in the W/ARC container. These containers
are simple serialisations of arbitrary files harvested from websites, and
their associated HTTP transaction. These are used to “replay” web harvests
(like the wayback machine etc). Is there any way we can handle these
particular file types differently? As these files are aggregations of any
number of binary items we are much more likely to encounter false
positives, especially for weak signatures. We’ve only seen false positives
for the Trojan URL signature, but I anticipate seeing more when we process
the 80 TB of WARCs we have waiting to come in – these will translate into
~2 billion files housed in several hundred thousand WARC files.
>
> Ideally we ought to be ripping the (W)ARC into its binary parts – by
parsing an arbitrary aggregation of many files as a single coherent file with
one payload, I think we’re doing ourselves a disservice. I wondered if there was
a method within the ClamAV architecture that would support the construction
of a WARC parser. This might allow WARC files to be “properly” consumed as
a series of disconnected binary items, reducing the likelihood of false
positives.
>
> We are also looking at what it would mean for our workflow to explode the
W/ARCs into their parts before they are presented for scanning, and that’s
a viable option. For now I’m mainly interested in knowing what we
could/could not do.
>
>
> Jay Gattuso | Digital Preservation Analyst | Preservation, Research and
Consultancy
> National Library of New Zealand | Te Puna Mātauranga o Aotearoa
> PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
> jay.gattuso at dia.govt.nz<mailto:jay.gattuso at natlib.govt.nz>
>
-Al-
--
Al Varnell
Mountain View, CA
--
Christopher Marczewski
Research Engineer
Talos Group
cmarczewski at sourcefire.com
Phone: 443.430.7118