[clamav-users] ClamAV Scan - Data Read vs Data Scanned
Paul Kosinski
clamav-users at iment.com
Tue Nov 3 05:00:17 UTC 2020
"(don't you love C?)"
I have never understood why the originators of C didn't give integers
explicit widths in bits: their scheme made C code often non-portable.
When I wrote code in the mid 1990s for the DEC Alpha, ints were 32 bits
while longs were 64 (unlike "standard" C). This made Alpha C code not
portable to lesser CPUs. On the other hand, when I wrote C on DOS for
the IBM PC in the late 1980s, ints were only 16 bits! It took some time
to figure out why my C-compliant code failed so badly. In spite of all
that, having started programming before C was invented, I can safely
say that C is better than its predecessors for software like ClamAV.
P.S. Good code these days tends to use typedefs defining things like
int32, uint64 etc. A shame the original ClamAV coders didn't do that.
On Tue, 3 Nov 2020 01:53:33 +0000
"Micah Snyder (micasnyd)" <micasnyd at cisco.com> wrote:
> I hadn't really looked at the code. You raise a good point.
>
> Changing it isn't super simple. The info.blocks variable is passed through cli_scandesc_callback() and scan_common() where it's placed into the scan context. When data is scanned, the amount scanned is divided by CL_COUNT_PRECISION (also found in clamav.h), which is what you multiply the number by to get the value in bytes. Provided that all downstream applications use CL_COUNT_PRECISION as clamscan does, we could shrink the count precision from 4k to something lower, but that would also decrease the max amount of data which could be scanned.
>
> If the variable were a uint64_t, that'd probably be fine... but it's an unsigned long int... aka maybe 4 bytes or maybe 8 bytes (don't you love C?). On systems where an unsigned long is 4 bytes, that'd cap the scan limit at 4GB. Changing the variable to be a uint64_t would be "best", but it would be a non-backwards compatible change to the API which is very much not worth it.
>
> Sigh :-/
>
> > -----Original Message-----
> > From: clamav-users <clamav-users-bounces at lists.clamav.net> On Behalf Of
> > Paul Kosinski via clamav-users
> > Sent: Monday, November 2, 2020 5:23 PM
> > To: clamav-users at lists.clamav.net
> > Cc: Paul Kosinski <clamav-users at iment.com>
> > Subject: Re: [clamav-users] ClamAV Scan - Data Read vs Data Scanned
> >
> > Can this really be done? I was looking at the code referred to by G.W.
> > Haywood, and I see that it uses "info.blocks" and "info.rblocks".
> > Looking at the definitions in "clamav-0.103.0/clamscan/", I see the
> > following:
> >
> > struct s_info {
> >     unsigned int sigs; /* number of signatures */
> >     unsigned int dirs; /* number of scanned directories */
> >     unsigned int files; /* number of scanned files */
> >     unsigned int ifiles; /* number of infected files */
> >     unsigned int errors; /* number of errors */
> >     unsigned long int blocks; /* number of *scanned* 16kb blocks */
> >     unsigned long int rblocks; /* number of *read* 16kb blocks */
> > };
> >
> > This suggests that the counts for "scanned" and "read" are not really byte
> > counts, and EICAR's 68 bytes would always be recorded as 0 (if normal
> > rounding rules are applied).
> >
> >
> >
> > On Mon, 2 Nov 2020 23:59:20 +0000
> > "Micah Snyder \(micasnyd\) via clamav-users" <clamav-users at lists.clamav.net>
> > wrote:
> >
> > > I agree. We already have some logic in freshclam to convert bytes to human
> > > readable B / KiB / MiB / GiB format. It should be pretty much a copypaste
> > > effort to improve the data scanned/read output.
> > >
> > > -Micah
> > >
> > > On 11/2/20, 9:47 AM, "clamav-users on behalf of G.W. Haywood via clamav-users" <clamav-users-bounces at lists.clamav.net on behalf of clamav-users at lists.clamav.net> wrote:
> > >
> > > Hi there,
> > >
> > > On Mon, 2 Nov 2020, Paul Kosinski via clamav-users wrote:
> > >
> > > > ... I still think it is a bad message that should be fixed.
> > >
> > > +1
> > >
> > > If you want to try a very quick and dirty tweak to get more precise
> > > numbers, change the value of
> > >
> > > 1) CL_COUNT_PRECISION in .../libclamav/clamav.h from 4096 to 1
> > >
> > > 2) replace '1024' with '1' in four places in clamscan/clamscan.c
> > >
> > > 3) change 'MB' to 'Bytes' in two places in clamscan/clamscan.c and
> > >
> > > 4) rebuild.
> > >
> > > 8<----------------------------------------------------------------------
> > > ~/clamav-0.103.0-rc2: $ grep -C3 -r CL_COUNT_PRECISION clamscan libclamav | ...
> > > ...
> > > ...
> > > clamscan/clamscan.c: mb = info.blocks * (CL_COUNT_PRECISION / 1024) / 1024.0;
> > > clamscan/clamscan.c: logg("Data scanned: %2.2lf MB\n", mb);
> > > clamscan/clamscan.c: rmb = info.rblocks * (CL_COUNT_PRECISION / 1024) / 1024.0;
> > > clamscan/clamscan.c: logg("Data read: %2.2lf MB (ratio %.2f:1)\n", rmb, info.rblocks ? (double)info.blocks / (double)info.rblocks : 0);
> > > ...
> > > ...
> > > libclamav/clamav.h:#define CL_COUNT_PRECISION 4096
> > > ...
> > > ...
> > >
> > > 8<----------------------------------------------------------------------
> > >
> > > This is untested, YMMV. Obviously, if you're skilled in the art, this
> > > can be done better. Note that 'MB' should in any case be 'MiB' as the
> > > values printed are the counts divided by 2^20 and not by 10^6.
> > >
> > > --
> > >
> > > 73,
> > > Ged.