Thursday 6 June 2019

synchronous pipe based task monitoring

I want a process to monitor another and know when it quits.

If I control the launch, an obvious way is to tie the two processes together with a pipe, so that the monitor can detect when the pipe closes.

But the other process might launch many further processes, all inheriting the pipe-fd and keeping the pipe open after it has quit, which I want to avoid.

So clearly the launcher needs to hold the pipe-fd without sharing it (close-on-exec, and no stray forks that would copy it), wait in the usual way for the launched process to quit, and then close the pipe.
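Written out longhand, that sequence might look like the sketch below. It uses a named descriptor variable, and the names fd, scratch and the sh -c "exit 3" stand-in task are illustrative assumptions only. The FIFO is there just so the script can collect the watcher's message before it ends.

```bash
#!/usr/bin/env bash
# Longhand sketch of the launcher pattern. `fd`, `scratch` and the
# stand-in task are illustrative names, not part of the recipe itself.

scratch=$(mktemp -d) && mkfifo "$scratch/done"   # only used to collect the watcher's message

# 1. Open the write end of a pipe. The watcher holds the read end and
#    blocks in `cat` until every copy of the write end has been closed.
exec {fd}> >( cat >/dev/null; echo "task finished" >"$scratch/done" )

# 2. Run the task with the descriptor closed, so that neither it nor
#    anything it launches can keep the pipe open.
status=0
eval 'sh -c "exit 3"' "$fd>&-" || status=$?

# 3. The task has exited: close our own copy, which gives the watcher EOF.
exec {fd}>&-

read -r message <"$scratch/done"
rm -rf "$scratch"
echo "$message (status $status)"
```

The watcher only reports once step 3 has run, and the task's exit status survives the tidying up.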

Here's a bash incantation lifetime_fd which runs a simple list of arguments and can share an fd with another process through process substitution.

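# just run a simple list, without affecting $?
just() {
  set -- $? "$@"   # stash the incoming $? in front of the arguments
  "${@:2}"         # run the list itself
  return $1        # hand back the stashed status, not the list's own
}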
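As a small check of what just does, the sketch below runs a tidy-up command after a failing step and looks at the status that comes back. The function name task_then_tidy and the ( exit 3 ) stand-in are assumptions for illustration.

```bash
#!/usr/bin/env bash
# `just` as defined above, repeated so the sketch runs on its own.
just() {
  set -- $? "$@"
  "${@:2}"
  return $1
}

# Hypothetical wrapper: a failing step followed by some tidying up.
task_then_tidy() {
  ( exit 3 )               # stand-in for a task that fails with status 3
  just echo "tidying up"   # runs, but leaves $? as the task set it
}

status=0
task_then_tidy || status=$?
echo "status: $status"
```

With a plain echo in place of just echo, the function would return 0 and the failure would be lost.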

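lifetime_fd() {
  set -- $_ "$@"             # save the fd number from $_ as $1
  eval "$1>&-" '"${@:2}"'    # run the command with that fd closed
  just eval exec "$1>&-"     # close our own copy, keeping the command's $?
}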

So if you want to run the command fooly barly bazly and link it to the lame monitor read || echo done (read fails once the pipe closes, so the echo then runs), this will do the trick:

lifetime_fd fooly barly bazly {_}> >( read || echo done )

So to attach a pipe descriptor to a sub-process, use the trailing invocation {_}> >( sub-process ). This connects stdin of the sub-process to a new file descriptor whose number is stored in $_, and lifetime_fd manages that descriptor from there.

The variable $_ is used to avoid messing with any other variables. bash resets $_ after every command (to the last argument of that command), so it should do no harm if we abuse it; but because it is constantly adjusted, the first thing we do in lifetime_fd is to save it.

We don't use a local variable, in case its name clashes with something else, so we store the value as $1.

We then run "$@" (or "${@:2}" as it now would be) but with the fd closed, so that it is not inherited.

We then close the fd while preserving the exit code.
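To see that the N>&- prefix really does keep the descriptor away from the launched command, a check along these lines can help. It is only a sketch: fd and child_sees are illustrative names, /dev/null stands in for the pipe, and /dev/fd is assumed to exist (as it does on Linux, macOS and the BSDs).

```bash
#!/usr/bin/env bash
# Sketch: does a child process see a descriptor, with and without the
# "N>&-" prefix? `fd` and `child_sees` are illustrative names.

exec {fd}>/dev/null        # any open descriptor will do for the check

# A separate sh process reports whether descriptor number $1 is open in it.
child_sees() { sh -c '[ -e "/dev/fd/$1" ]' sh "$1"; }

# Plain launch: the child inherits the descriptor.
if child_sees "$fd"; then plain=inherited; else plain=absent; fi

# Same child, launched the way lifetime_fd launches its command.
if eval "$fd>&-" 'child_sees "$fd"'; then closed=inherited; else closed=absent; fi

exec {fd}>&-               # finally close our own copy
echo "plain launch: $plain, with the $fd>&- prefix: $closed"
```

The prefix only affects the launched command; the shell's own copy stays open until the final exec closes it.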

You can invoke it in a pipeline like this:

get_report_request | lifetime_fd get_report {_}> >( monitor ) | send_report

An illustrative example of monitor (which reads until eof) might be:

monitor() {
  while read -t 1 || test $? -gt 128 # read -t exits above 128 on timeout (142 in practice)
  do echo -n '*'
  done
}

which displays a star every second until stdin closes. Each read waits up to 1 second and times out if nothing arrives, which keeps the loop going and prints a star; once stdin closes, read fails with a different exit code and the loop ends.

Of course it might read other data too.... if you can send it...
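A monitor that also reports what it is sent might look like the sketch below. The name chatty_monitor is made up for illustration, and an ordinary pipe stands in for whatever ends up holding the write end; how data would really be sent depends on who keeps that descriptor open.

```bash
#!/usr/bin/env bash
# Sketch of a monitor that also reports any lines it is sent.
# `chatty_monitor` is an illustrative name.
chatty_monitor() {
  local line rc
  while true; do
    rc=0
    IFS= read -r -t 1 line || rc=$?
    if   [ "$rc" -eq 0 ];   then echo "got: $line"   # a complete line arrived
    elif [ "$rc" -gt 128 ]; then echo -n '*'         # timed out, pipe still open
    else break                                       # end of file: pipe closed
    fi
  done
  echo "closed"
}

printf 'starting\nhalfway\n' | chatty_monitor
```

The three cases are told apart purely by read's exit status: zero for a line, above 128 for a timeout, anything else for end of file.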