[clamav-users] Google safebrowsing types and usage questions
iulian stan
iulian at sphere.ro
Mon Oct 19 21:54:39 UTC 2020
Dear Ged/All,
After a beer, things started to look clearer :)
You were right about one thing: ClamAV does look for something before
it starts looking for URLs, namely what should be the start of the
email headers. In short, it is looking for:
"From someone".
Basically the test can be:
echo -e "From test\n\n http://www.google.com/" | clamscan -d bla.gdb -
or
echo -e "From test\n\n<a href=http://www.google.com/>test</a>" |
clamscan -d bla.gdb -
with the following result:
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.102.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.051 sec (0 m 0 s)
I totally agree with you that "Known viruses" should be 1, but that is
another story for another time.
Now comes the funny part, which explains why I didn't find the sha256
hash in my MySQL database and also why the above test fails if you
don't create the hash correctly.
If you read https://developers.google.com/safe-browsing/v4/urls-hashing
(very carefully, not like I did in the beginning) you will see that
several hashes are created for the same URL, one per host-suffix and
path-prefix combination, but you first need to strip http[s]://
The same can be seen in the ClamAV debug output.
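As a rough illustration of the hashing page linked above, the sketch
below strips the scheme, builds the host-suffix/path-prefix
combinations and prints a SHA-256 for each one. The function names are
hypothetical, Google's full canonicalization (percent-escaping, IP
address hosts and so on) is skipped, and it is only an assumption that
ClamAV hashes exactly these strings, so treat it as a starting point.

```python
import hashlib
from urllib.parse import urlsplit


def host_suffixes(host):
    """Exact host plus up to four shorter suffixes (never the bare TLD)."""
    labels = host.split(".")
    suffixes = [host]
    # Start from the last five components and drop the leading one each time.
    start = max(len(labels) - 5, 1)
    for i in range(start, len(labels) - 1):
        suffixes.append(".".join(labels[i:]))
    return suffixes


def path_prefixes(path, query):
    """Path with query, path without query, then "/" plus up to 3 directories."""
    prefixes = []
    if query:
        prefixes.append(path + "?" + query)
    prefixes.append(path)
    current = "/"
    directories = [current]
    for segment in path.split("/")[1:-1][:3]:
        current += segment + "/"
        directories.append(current)
    for directory in directories:
        if directory not in prefixes:
            prefixes.append(directory)
    return prefixes


def lookup_expressions(url):
    """Return scheme-less host/path strings to hash (no full canonicalization)."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return [h + p
            for h in host_suffixes(host)
            for p in path_prefixes(path, parts.query)]


def sha256_hex(expression):
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    url = "http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt"
    for expr in lookup_expressions(url):
        print(sha256_hex(expr).upper(), expr)
```

For the example URL this yields six expressions (two hosts times three
paths), which is the same number of lookups as in the debug output
below.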
Taking the URL "http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt"
as an example, the debug output is:
LibClamAV debug: getHrefs: html_normalise_mem returned
LibClamAV debug: Phishcheck:Checking url
http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>->
LibClamAV debug: Looking up hash
DDEF6ACD0DF553A77CBC6B3537BDAA766E0CD819733D0B712AFD9A41B5888AB5 for
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash
B8047D0B3763184FF29E17D4F649BA05E469538C40018FBB901437822F0066C6 for
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash
6D92531661EBF105F3C03BE8EA6C7E585F2A1603B5FF4D501BC0846755355018 for
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash
DA983C0FAA7401A96BBBF6068F29762557B63F0811A0418BC046D95795999AFB for
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash
88981E6263BE34A6C0B53ADA73D168B68828DD643723D34A812E9F8A6ABB5EE9 for
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: Looking up hash
BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5 for
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: This hash matched:
BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5
LibClamAV debug: Hash matched for:
http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>
LibClamAV debug: Phishcheck: Phishing scan result: Blacklisted
LibClamAV debug: blobDestroy
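The hash that matched above belongs to the bare "www.google.com/"
expression (path length 0). A minimal sketch for producing test
entries for such an expression follows. It assumes the two line forms
seen in the quoted bla.gdb (S1:F: plus the full SHA-256, S1:P: plus its
first 8 hex digits); gdb_entries is a hypothetical helper, and this
layout has not been checked against the ClamAV signature documentation.

```python
import hashlib


def gdb_entries(expression):
    """Build S1:F / S1:P lines for a scheme-less host/path expression."""
    digest = hashlib.sha256(expression.encode("utf-8")).hexdigest()
    # Full-hash line, then a prefix line with the first 8 hex digits,
    # mirroring the sample file in the quoted message (an assumption).
    return ["S1:F:" + digest, "S1:P:" + digest[:8]]


if __name__ == "__main__":
    # Note: the scheme is already stripped from the expression.
    print("\n".join(gdb_entries("www.google.com/")))
```

Redirecting the output into a file such as bla.gdb gives a database
that can be passed to clamscan with -d, as in the test commands
earlier in this message.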
Long story short, safebrowsing is working OK, but there are no hits,
which is quite surprising given the size of the database and the amount
of scam/phishing flowing through email nowadays.
---
Best regards,
Iulian Stan
On 2020-10-19 20:01, G.W. Haywood via clamav-users wrote:
> Hi there,
>
> Just some thoughts, as you asked. Sorry it isn't more helpful.
>
> On Mon, 19 Oct 2020, iulian stan via clamav-users wrote:
>
>> #cat bla.gdb
>> S1:F:dd014af5ed6b38d9130e3f466f850e46d21b951199d53a18ef29ee9341614eaf
>> S1:P:dd014af5
>>
>> Creating file to be tested:
>> #cat /tmp/clam.txt
>> http://www.google.com/
>> www.google.com
>> http://www.google.com/asdasdasd
>
> I repeated your tests with 0.103-rc2 and got the same results. I
> looked for obvious things like line terminators being included by
> accident, but I didn't find anything.
>
>> Running scanner: clamscan --debug -d bla.gdb /tmp/clam.txt
>> LibClamAV debug: Module <....> On
>
> I wondered if there's a module that should be being loaded and isn't.
>
>> LibClamAV debug: Recognized ASCII text
>
> I wondered does it need to recognize the file as HTML, and also if
> there's some length limit below which the scanner won't bother doing
> the scan (I've seen mention of something like that when I've been
> reading the code looking for something else) but I tried wrapping your
> text in some html tags, and added some padding, and it made no
> difference. This is incidentally one of those cases where the values
> printed in the output for "Data scanned" and "Data read" could be more
> useful...
>
> 8<----------------------------------------------------------------------
> ...
> LibClamAV debug: Recognized ASCII text
> LibClamAV debug: Matched signature for file type HTML data at 0
> ...
> ----------- SCAN SUMMARY -----------
> Known viruses: 2
> Engine version: 0.103.0-rc2
> Scanned directories: 0
> Scanned files: 1
> Infected files: 0
> Data scanned: 0.20 MB
> Data read: 0.10 MB (ratio 2.00:1)
> 8<----------------------------------------------------------------------
>
> Lastly
>
>> ----------- SCAN SUMMARY -----------
>> Known viruses: 2
>
> This doesn't seem right to me. There's really only one signature.
>
> Basically I haven't seen anything here which might make me think the
> problem is you, but I don't use the safebrowsing stuff so I don't have
> the experience (and I don't have the time right now) to investigate it
> further. It seems to me that even if there isn't something wrong with
> clamd (which I guess means that it's faulty documentation) it really
> shouldn't be this difficult - that alone would make it worth a report
> to the ClamAV Bugzilla.
>
> --
>
> 73,
> Ged.