[clamav-users] Google safebrowsing types and usage questions

iulian stan iulian at sphere.ro
Mon Oct 19 21:54:39 UTC 2020


Dear Ged/All,

After a beer, things started to look clearer :)

You were right about one thing: ClamAV does look for something before 
it starts looking for URLs, but what it is actually looking for is the 
start of email headers. In short, it is looking for: 
"From someone".
Basically the test can be:
echo -e "From test\n\n http://www.google.com/" | clamscan  -d bla.gdb  -
or
echo -e "From test\n\n<a href=http://www.google.com/>test</a>" | 
clamscan  -d bla.gdb  -

with the following result:
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.102.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.051 sec (0 m 0 s)

I totally agree with you that "Known viruses" should be 1, but that is 
another story for another time.


Now comes the funny part, which explains why I didn't find the SHA-256 
hash in my MySQL database and also why the above test will fail if you 
don't create the hash correctly.

If you read https://developers.google.com/safe-browsing/v4/urls-hashing 
(very carefully, not like I did in the beginning) you will see that 
you can create multiple hashes for the same URL (one per host-suffix / 
path-prefix combination), but you first need to strip the http[s]:// 
prefix.
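
A minimal sketch of that hashing step, using the example URL from the 
debug output further down. It assumes sha256sum from GNU coreutils is 
available, skips all of the canonicalization the Google page describes, 
and writes the host suffixes and path prefixes out by hand for this one 
URL. The helper names strip_scheme and hash_expr are made up for 
illustration.

```shell
#!/bin/sh
# Sketch only: no canonicalization (percent-decoding, "..", ports, ...).
# Assumes sha256sum (GNU coreutils) is on PATH.

# strip_scheme URL -> URL without a leading http:// or https://
strip_scheme() {
    printf '%s\n' "$1" | sed -e 's|^https\{0,1\}://||'
}

# hash_expr EXPR -> lowercase hex SHA-256 of EXPR (no newline is hashed)
hash_expr() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

url='http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt'
echo "stripped: $(strip_scheme "$url")"

# Host suffixes and path prefixes for this URL, listed by hand.
for host in www.google.com google.com; do
    for path in /jhgfedwsqasdfgh/234tewdas.txt /jhgfedwsqasdfgh/ /; do
        printf '%s  %s\n' "$(hash_expr "$host$path")" "$host$path"
    done
done
```

The six lines printed can be compared (ignoring letter case) with the 
six "Looking up hash" lines ClamAV prints with --debug. Whether they 
agree exactly has not been verified here.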

The same can be seen in the ClamAV debug output.
If we take for example the URL 
"http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt" the debug output 
will be:

LibClamAV debug: getHrefs: html_normalise_mem returned
LibClamAV debug: Phishcheck:Checking url 
http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>->
LibClamAV debug: Looking up hash 
DDEF6ACD0DF553A77CBC6B3537BDAA766E0CD819733D0B712AFD9A41B5888AB5 for 
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash 
B8047D0B3763184FF29E17D4F649BA05E469538C40018FBB901437822F0066C6 for 
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash 
6D92531661EBF105F3C03BE8EA6C7E585F2A1603B5FF4D501BC0846755355018 for 
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash 
DA983C0FAA7401A96BBBF6068F29762557B63F0811A0418BC046D95795999AFB for 
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash 
88981E6263BE34A6C0B53ADA73D168B68828DD643723D34A812E9F8A6ABB5EE9 for 
google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: Looking up hash 
BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5 for 
www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: This hash matched: 
BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5
LibClamAV debug: Hash matched for: 
http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>
LibClamAV debug: Phishcheck: Phishing scan result: Blacklisted
LibClamAV debug: blobDestroy
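
To try a match like the one above, a test database in the same shape as 
the bla.gdb quoted below can be generated from a scheme-less URL 
expression. This is only a sketch: the S1:F: / S1:P: line layout is 
copied from that quoted file, taking the P line to be the first 8 hex 
digits of the F line is an assumption based on it, and make_gdb is a 
made-up helper name. It assumes sha256sum from GNU coreutils.

```shell
#!/bin/sh
# Sketch only. Line format copied from the bla.gdb in the quoted message:
#   S1:F:<64 hex digits>   and   S1:P:<first 8 hex digits of the same hash>
# Assumes sha256sum (GNU coreutils) is on PATH.

# make_gdb EXPR -> prints the two database lines for one URL expression
make_gdb() {
    full=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
    prefix=$(printf '%s' "$full" | cut -c1-8)
    printf 'S1:F:%s\nS1:P:%s\n' "$full" "$prefix"
}

# Expression is host plus path, with http[s]:// already stripped.
make_gdb 'www.google.com/'
```

Redirecting the output into bla.gdb and rerunning the 
echo ... | clamscan -d bla.gdb - test from the top of this message 
should then show whether the entry is hit.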



Long story short, safebrowsing is working OK, but there are no hits, 
which is quite surprising given the size of the database and the amount 
of scam/phishing flowing through email nowadays.

---
Best regards,
Iulian Stan


On 2020-10-19 20:01, G.W. Haywood via clamav-users wrote:
> Hi there,
> 
> Just some thoughts, as you asked.  Sorry it isn't more helpful.
> 
> On Mon, 19 Oct 2020, iulian stan via clamav-users wrote:
> 
>> #cat bla.gdb
>> S1:F:dd014af5ed6b38d9130e3f466f850e46d21b951199d53a18ef29ee9341614eaf
>> S1:P:dd014af5
>>
>> Creating file to be tested: #cat /tmp/clam.txt
>> http://www.google.com/
>> www.google.com
>> http://www.google.com/asdasdasd
> 
> I repeated your tests with 0.103-rc2 and got the same results.  I
> looked for obvious things like line terminators being included by
> accident, but I didn't find anything.
> 
>> Running scanner: clamscan --debug -d bla.gdb /tmp/clam.txt
>> LibClamAV debug: Module <....> On
> 
> I wondered if there's a module that should be being loaded and isn't.
> 
>> LibClamAV debug: Recognized ASCII text
> 
> I wondered does it need to recognize the file as HTML, and also if
> there's some length limit below which the scanner won't bother doing
> the scan (I've seen mention of something like that when I've been
> reading the code looking for something else) but I tried wrapping your
> text in some html tags, and added some padding, and it made no
> difference.  This is incidentally one of those cases where the values
> printed in the output for "Data scanned" and "Data read" could be more
> useful...
> 
> 8<----------------------------------------------------------------------
> ...
> LibClamAV debug: Recognized ASCII text
> LibClamAV debug: Matched signature for file type HTML data at 0
> ...
> ----------- SCAN SUMMARY -----------
> Known viruses: 2
> Engine version: 0.103.0-rc2
> Scanned directories: 0
> Scanned files: 1
> Infected files: 0
> Data scanned: 0.20 MB
> Data read: 0.10 MB (ratio 2.00:1)
> 8<----------------------------------------------------------------------
> 
> Lastly
> 
>> ----------- SCAN SUMMARY -----------
>> Known viruses: 2
> 
> This doesn't seem right to me.  There's really only one signature.
> 
> Basically I haven't seen anything here which might make me think the
> problem is you, but I don't use the safebrowsing stuff so I don't have
> the experience (and I don't have the time right now) to investigate it
> further.  It seems to me that even if there isn't something wrong with
> clamd (which I guess means that it's faulty documentation) it really
> shouldn't be this difficult - that alone would make it worth a report
> to the ClamAV Bugzilla.
> 
> --
> 
> 73,
> Ged.
> 


