[clamav-users] unexplainable tar behaviour

Micah Snyder (micasnyd) micasnyd at cisco.com
Thu Oct 31 12:57:13 EDT 2019

Yessir, it does indeed scan the raw file, and if nothing is found (or you're running in allmatch mode), it will unpack the archive and scan the files within.  ClamAV has a default archive recursion depth of 16, so it will go pretty deep.

I don’t think it's been explicitly stated yet: tar files are not compressed; they are just a bundle of files in one file.  A compressed tarball (tar.gz or tar.bz2) is less likely to have the issue described by Steffen, where a signature matches various parts of different files within an archive.
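To make the difference concrete, here is a minimal Python sketch (not anything ClamAV itself does) that builds a tar in memory and checks whether a member's bytes show up verbatim in the raw archive.  The payload and the member name "sample.bin" are made up for illustration.

```python
import gzip
import io
import tarfile

# Hypothetical payload standing in for bytes a signature might match.
payload = b"EXAMPLE-PATTERN-" * 4


def build_tar(data):
    """Return the bytes of an uncompressed tar holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="sample.bin")  # made-up member name
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


raw_tar = build_tar(payload)
compressed = gzip.compress(raw_tar)  # byte-level equivalent of a .tar.gz

# A plain tar stores member data as-is, so the payload is visible in the raw file.
print("payload visible in raw .tar:   ", payload in raw_tar)
# After gzip the raw bytes no longer contain the payload verbatim.
print("payload visible in raw .tar.gz:", payload in compressed)
# It only reappears once the archive is decompressed.
print("payload visible after gunzip:  ", payload in gzip.decompress(compressed))
```

In other words, a raw-file scan of a plain tar sees every member's bytes back to back, while a raw-file scan of a compressed tarball only sees the compressed stream.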

If you want to see how ClamAV extracts files or other buffers for scanning, try out clamscan's --leave-temps and --tempdir options.  I would also recommend trying the --gen-json option, if your ClamAV build was linked with libjson-c.  

The --leave-temps option will force clamscan to write extracted files and other buffers (like PDF streams) to disk, and --tempdir will direct them to a location of your choosing.  I will admit, it's a bit of a bear to analyze because the file names (including the JSON metadata file created by --gen-json) are randomly generated and there's only some limited structure.  We're working on making the output more readable / more valuable to analysts, but for now it is a bit of work to interpret.  The output from clamscan's --debug option may also help.
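As a rough sketch of how those options fit together, the Python below builds a clamscan command line and lists whatever ends up in the temp directory afterwards.  The file name "sample.tar", the directory "/tmp/clamav-temps", and both helper functions are assumptions for illustration; only the clamscan options themselves come from the description above.

```python
import shutil
import subprocess
from pathlib import Path


def build_clamscan_cmd(target, tempdir, gen_json=True):
    """Assemble a clamscan command that keeps its temp files for inspection."""
    cmd = ["clamscan", "--leave-temps", "--tempdir=" + str(tempdir), "--debug"]
    if gen_json:
        # Only useful if your ClamAV build was linked with libjson-c.
        cmd.append("--gen-json")
    cmd.append(str(target))
    return cmd


def scan_and_list_temps(target, tempdir):
    """Run the scan if clamscan is installed, then list the files left behind."""
    Path(tempdir).mkdir(parents=True, exist_ok=True)
    if shutil.which("clamscan") is None:
        return []  # clamscan not on PATH; nothing to inspect
    # check=False: a detection makes clamscan exit non-zero, which is not an error here.
    subprocess.run(build_clamscan_cmd(target, tempdir), check=False)
    return sorted(p for p in Path(tempdir).rglob("*") if p.is_file())


# Hypothetical target and temp directory; this only prints the command.
print(" ".join(build_clamscan_cmd("sample.tar", "/tmp/clamav-temps")))
```

Calling scan_and_list_temps("sample.tar", "/tmp/clamav-temps") would then return the randomly named extracted files, which you can open one by one to see what was actually scanned.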


On 10/31/19, 10:46 AM, "clamav-users on behalf of J.R. via clamav-users"

    > I thought ClamAV unpacked TARs (and other archives) and looked at the
    > contents. If it doesn't, it wouldn't be very effective in detecting
    > viruses in compressed files.
    I've been wondering about this too during this particular discussion.
    Is ClamAV scanning the archive as-is, then additionally (hopefully)
    decompressing it and scanning individual files? Is there a way to
    debug with more info to see exactly what is going on with the process?
