I have 48472 and 48473. The 48474 I looked at was the gdb file that was downloaded as part of the cdiff update, although the freshclam process hung after the download. The entry order in the 48474 gdb file is no different from the order in the 48473 file.

Freshclam gets this far before hanging after the download. The gdb file listed there has the same format.
Wed Mar  6 16:50:46 2019 -> *main.cvd version from DNS: 58
Wed Mar  6 16:50:46 2019 -> main.cvd is up to date (version: 58, sigs: 4566249, f-level: 60, builder: sigmgr)
Wed Mar  6 16:50:46 2019 -> *daily.cvd version from DNS: 25380
Wed Mar  6 16:50:46 2019 -> daily.cvd is up to date (version: 25380, sigs: 1503528, f-level: 63, builder: raynman)
Wed Mar  6 16:50:46 2019 -> *safebrowsing.cvd version from DNS: 48474
LibClamAV debug: in cli_untgz()
LibClamAV debug: cli_untgz: Unpacking /home/logins/mbroekman/analysis/tmp/clamav-317041d4b9d853e83b60005464dd098c.tmp/clamav-b4a94beaae2191e11c7805c6e49be7e6.tmp/COPYING
LibClamAV debug: cli_untgz: Unpacking /home/logins/mbroekman/analysis/tmp/clamav-317041d4b9d853e83b60005464dd098c.tmp/clamav-b4a94beaae2191e11c7805c6e49be7e6.tmp/safebrowsing.info
LibClamAV debug: cli_untgz: Unpacking /home/logins/mbroekman/analysis/tmp/clamav-317041d4b9d853e83b60005464dd098c.tmp/clamav-b4a94beaae2191e11c7805c6e49be7e6.tmp/safebrowsing.gdb
LibClamAV debug: in cli_untgz_cleanup()
Wed Mar  6 16:50:49 2019 -> *Retrieving http://db.US.clamav.net/safebrowsing-48474.cdiff
Wed Mar  6 16:50:49 2019 -> nonblock_connect: connect(): fd=4 errno=101: Network is unreachable
Wed Mar  6 16:50:49 2019 -> Can't connect to port 80 of host db.US.clamav.net (IP: 2606:4700::6810:da54)
Wed Mar  6 16:50:49 2019 -> *Trying to download http://db.US.clamav.net/safebrowsing-48474.cdiff (IP: 104.16.219.84)
Wed Mar  6 16:50:49 2019 -> Downloading safebrowsing-48474.cdiff [100%]

The last time I ran freshclam, it sat at 100% on the download for 40 minutes before I killed the process.

The info file in the tmpdir shows:
ClamAV-VDB:06 Mar 2019 13-24 -0500:48474:3232286:63:X:X:google:1551896655
safebrowsing.gdb:132636452:7f6645b8d865de3992be1ad5de215afd848acee4c021eed4818fdb760f76b57e
DSIG:NxsTJGIb7EQ9e71CjIH2QJYzp+BhrH0qK1Mb0Ef5BQfO5WZnm8qZSqj/y6vstcjAOUfWwLG8ba3RemesF+KxIuk/HMkDgRCJep+shVvz8nAccajvbBN1ZnmpTkf1T0QgTsDbuBK9cTItdlQWupKfuiV1aKKdF1jSLvtRJU4zoZl+B3/qgIAPi7sqmkh8W5qKplYdsICdfmDLxK5dDwCkGmdtXZol5pHHXTQb1/LJqml8SORrFydkYizuVl07/uuc332dk5Uk1NfZrDj94wG0dIIloWiwfPzj563Vl5e7GvCvCdMR1Gfq3EGYZGSPftR7a/K7TashvsoWP2Uma0Fq/
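
To cross-check that the gdb in the tmpdir is the file this info entry describes, something like the following Python sketch could be used. It assumes the entry layout is name:size:SHA-256, which is only inferred from the safebrowsing.gdb line above, and the function names are made up for illustration.

```python
import hashlib
import os


def parse_info_entry(line):
    """Split an entry like 'safebrowsing.gdb:132636452:<sha256>' into its parts.

    Assumption: the three colon-separated fields are name, size in bytes,
    and a hex SHA-256 digest.
    """
    name, size, digest = line.strip().split(":")
    return name, int(size), digest.lower()


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large gdb does not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_info(path, info_line):
    """Return True if the file's size and SHA-256 both match the info entry."""
    _name, size, digest = parse_info_entry(info_line)
    return os.path.getsize(path) == size and sha256_of(path) == digest
```

Calling matches_info() with the path to the unpacked safebrowsing.gdb and the second line of the info file would confirm whether two people are really looking at the same 48474 content.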




On Wed, Mar 6, 2019 at 5:47 PM David Raynor <draynor@sourcefire.com> wrote:
That's strange. The 48474 I have should have the sorting changed, and it has the improved loading time we're talking about.

$ sigtool --info safebrowsing.cvd
File: safebrowsing.cvd
Build time: 06 Mar 2019 13:24 -0500
Version: 48474
Signatures: 3232286
Functionality level: 63
Builder: google
MD5: 70c61f41e52b5a2134ff7e272f5a6df1

SHA256 (safebrowsing.gdb) = 7f6645b8d865de3992be1ad5de215afd848acee4c021eed4818fdb760f76b57e

Something must be different.

Dave R.

On Wed, Mar 6, 2019 at 5:39 PM Maarten Broekman via clamav-users <clamav-users@lists.clamav.net> wrote:
The new safebrowsing cvd (starting with version 48473) seems to be sorted in a way that increases the load time of that file by roughly an order of magnitude.

I have a previous version from February where the entries in the gdb section are sorted like this:
S2:F:0000917787cff7b0993917209809ff3d94bec7e1de7188b323d9b88e0273cb71
S2:F:000149794d90dc5bce4f685deed6076d00c9209bd81cef4cbdf8a4e41f0a2153
S2:F:00042c895c912fd567afa35450cfe5d321d0d68eb3833156925c4e27d2c29aa2
S2:F:0006d4dcb0d939d725e676a9e68aaeb303e04478e6861d2a77469d1b6a0a0f7d
S2:F:0007bf7c1808d12177f0ae90d336d60c5a7a3d89703806955b75c56f898dd919
...
S2:P:00009177
S2:P:00014979
S2:P:00042c89
S2:P:0006d4dc
S2:P:0007bf7c
...
S:F:00000860493997b798861956e06d3d3606f82384259b971bb922f94f886a4b55
S:F:00000bddafae162a7a2f1249b3b38c8e4b6d3cb8bf0c30c26cc354ebcba16b37
S:F:000046cad35fbecbcc8dd4ebb244bd08aa6dbf1078279115c82f8e21b2cf8478
S:F:0000684200da7b11f38a6f4719bda4ec6c6ae8b2be1f7e12a16605b2d3a5d490
S:F:000072f3f33e47a2f97b8711d240267462aa3f0a5f8130845b119a2ad3798292
...
S:P:00000860
S:P:00000bdd
S:P:000046ca
S:P:00006842
S:P:000072f3

That loads into clamd (and clamscan) in under 5 seconds for the 3041760 entries in it.

Versions 48473 and 48474 are sorted like this:
S2:P:00009177
S2:F:0000917787cff7b0993917209809ff3d94bec7e1de7188b323d9b88e0273cb71
S2:P:00014979
S2:F:000149794d90dc5bce4f685deed6076d00c9209bd81cef4cbdf8a4e41f0a2153
...

That version loads in 50+ seconds for the 3229612 entries in it.

If I flip the order of the entries so that each :F: entry comes before its corresponding :P: entry, it loads the same number of entries in 5 to 10 seconds.
If I reorder the entire file so that _all_ the :F: entries for each section (S or S2) come before the :P: entries for that section, it loads in under 5 seconds again.
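
For reference, the full reordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script I actually ran: the function name is hypothetical, and it assumes every entry looks like SECTION:F:hash or SECTION:P:prefix, as in the samples above. Anything that does not match is kept and appended at the end.

```python
def reorder_gdb_lines(lines):
    """Group entries by section, with all :F: entries before the :P: entries.

    Sections (for example S2 and S) are emitted in the order they first
    appear, and the relative order within each F or P group is preserved.
    """
    sections = {}  # section name -> {"F": [...], "P": [...]}
    other = []     # lines that do not look like SECTION:F:... or SECTION:P:...
    for line in lines:
        parts = line.rstrip("\n").split(":", 2)
        if len(parts) == 3 and parts[1] in ("F", "P"):
            groups = sections.setdefault(parts[0], {"F": [], "P": []})
            groups[parts[1]].append(line)
        else:
            other.append(line)

    out = []
    for groups in sections.values():
        out.extend(groups["F"])
        out.extend(groups["P"])
    out.extend(other)
    return out
```

Reading an unpacked test copy of the gdb with readlines(), passing the list through reorder_gdb_lines(), and writing the result out with writelines() gives a file in the same layout as the February version.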

Earlier today it was mentioned that 'the next version of the CVD' would fix it (when 48473 was the current version). That seems not to have been the case, since 48474 didn't fix it. Is there a plan to fix it? Or will we have to live with the enormous load times for this database?

--Maarten



_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml


--
---
Dave Raynor
Talos Security Intelligence and Research Group
draynor@sourcefire.com

