Lots of tests run fsstress in the background and then have to kill
it and/or provide special cleanup functions to kill the background
fsstress processes. They typically use $KILLALL_PROG for this.
Use of killall is problematic for running multiple tests in parallel
in that one test can kill other tests' processes. However, because
fsstress itself forks and runs children, there are very few avenues
for shell scripts to ensure all the fsstress processes actually die.
With bash, it is especially nasty, because sending SIGTERM will
result in bash outputting error messages ("Killed: ..." that will
cause golden output mismatches and hence test failures. Hence we
also need to be able to tell the main fstress process to die without
triggering these messages.
To avoid the process tracking problems, we change to use pkill
rather than killall (more options for process selection) and we
stop using the $here/ltp/fsstress binary. Instead, we copy the
$here/ltp/fsstress to $TEST_DIR/$seq.fsstress so that the test has
a unique fsstress binary name. This allows the pkill filter to
select just the fsstress processes the test has run. The fsstress
binary name is held in _FSSTRESS_NAME, and the program to run is
_FSSTRESS_PROG.
We also track the primary fsstress process ID, and store that in
_FSSTRESS_PID. We do this so that we have a PID to wait against so
that we don't return before the fsstress processes are dead. To this
end, we add a SIGPIPE handler to the primary process so that it
dying doesn't trigger bash 'killed' message output. We can
send 'pkill -PIPE $_FSSTRESS_NAME' to all the fsstress processes and
the primary process will then enter the "wait for children to die"
processing loop before it exits. In this way, we can wait for the
primary fsstress process and when it exits we know that all it's
children have also finished and gone away. This makes killing
fsstress invocations reliable and noise free.
This is accomplished by the helpers added to common/rc:
This also means that all fsstress invocations now obey
FSSTRESS_AVOID environment restrictions, many of which didn't.
We add a call to _kill_fstress into the generic _cleanup() function.
This means that tests using fsstress don't need to add a special
local _cleanup function just to call _kill_fsstress() so that
background fsstress processes are killed when the user interrupts
the tests with ctrl-c.
Further, killall in the _cleanup() function is often used to attempt
to expedite killing of foreground execution fsstress processes. This
doesn't actually work because of the way bash processes interupt
signals. That is, it waits for the currently executing process to
finish execution, then runs the trap function. Hence a foreground
fsstress won't ever be interrupted by ctrl-c. By implementing
_run_fsstress() as a background process and a wait call, the wait()
call is interrupted by the signal and the cleanup trap is run
immediately. Hence the fsstress processes are killed immediately and
the test exits cleanly almost immediately.
The result of all this is common, clean handling of fsstress
execution and termination. There are a few exceptions for special
cases, but the vast majority of tests that run fsstress use the
above four wrapper functions exclusively.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Zorro lang <zlang@redhat.com> Signed-off-by: Zorro Lang <zlang@kernel.org>