[clamav-users] ERROR: VirusEvent: fork failed.
Micah Snyder (micasnyd)
micasnyd at cisco.com
Tue Feb 18 04:34:22 UTC 2020
Ángel,
Thanks! I've added a link to your email (and attached example) to our Jira task for improving VirusEvent. We'll definitely try it out.
-Micah
Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
On 2/16/20, 9:34 PM, "clamav-users on behalf of Ángel via clamav-users" <clamav-users-bounces at lists.clamav.net on behalf of clamav-users at lists.clamav.net> wrote:
On 2020-02-11 at 16:57 +0000, Mickey Sola (micksola) via clamav-users
wrote:
> Wanted to add a bit of insight to this convo from the dev side of
> things:
>
> VirusEvent currently works by forking the existing clamd process into
> a new, short-lived process that handles execution of the user's
> script.
>
> This is a legacy design choice and is problematic for a number of
> reasons--most relevant here is that you will need at minimum 2x the
> amount of resources clamd is already using to execute the VirusEvent.
> It was this resource drain, combined with the threaded nature of the
> old on access code, which led to us disabling the feature (only for on
> access scanning, not clamd/clamdscan).
>
> From what I can tell, your problem is that the fork system command is
> failing (code path for that error requires a negative return for
> fork())--very likely due to lack of resources on the server.
>
> Ideally, we would fix this resource consumption issue on its own, or
> better, as part of a larger redesign of clamd, but for
> now--like Ged, I would also recommend increasing memory resources and
> seeing if that solves the issue.
>
> -Mickey
This can be easily solved by changing the fork() to vfork()
--- ./clamd/others.c 2020-02-04 15:59:26.000000000 +0100
+++ ./clamd/others.c.new 2020-02-17 02:25:10.404123000 +0100
@@ -157,9 +157,9 @@
pthread_mutex_lock(&virusaction_lock);
/* We can only call async-signal-safe functions after fork(). */
- pid = fork();
+ pid = vfork();
if (pid == 0) { /* child */
- exit(execle("/bin/sh", "sh", "-c", buffer_cmd, NULL, env));
+ _exit(execle("/bin/sh", "sh", "-c", buffer_cmd, NULL, env));
} else if (pid > 0) { /* parent */
pthread_mutex_unlock(&virusaction_lock);
while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) continue;
You will surely find text saying how nowadays there is no need for
vfork(), since fork() uses copy-on-write and it has practically no
penalty.
Well, I have found years ago that there *is* a difference (at least on
Linux, which is used by the OP). When the memory usage of the process
goes over a point¹ (in the default overcommit_memory=0) fork() will
start failing, while vfork() still works.
The context around that code means that it is safe to exchange the
fork() with vfork(), as it only modifies a pid_t variable "before
successfully calling _exit(2) or one of the exec(3) family of
functions".
Note that I am changing to _exit(2) the original exit(3) call used if
execle fails. Usage of exit(3) seems like a bug even when in the fork()
version, since atexit() handlers would be called at the children
process, which could have undesired effects.
It had been about 10 years since I last tested this, but I have made a
cheap program showing the issue. Hope it doesn't get stripped by the
mailing list.
Compiling with
gcc -o vfork-test vfork-test.c -O3
will -depending on the host memory- quickly stop after a few iterations
under the normal overcommit_memory=0, fail much earlier for
non-overcomitting (2), and "never" for always overcommit (1).
If instead of fork() you use vfork():
gcc -o vfork-test -Dfork=vfork vfork-test.c -O3
you get the "always forking" behavior also under overcommit_memory=0,
without changing the overcommit behavior systemwide (and a slight
increase under non-overcomitting mode).
Kind regards
Note: use -O0 if you want the program to force the memory to be
allocated.
¹ https://unix.stackexchange.com/questions/378172/what-does-heuristics-in-overcommit-memory-0-mean
More information about the clamav-users
mailing list