Stopping systemd workloads in OpenShift

Are you running systemd-based workloads? Then this article could be of interest: we are going to see how workloads based on systemd can be stopped gracefully on OpenShift.

This is a hands-on walkthrough built around a simple systemd workload that runs an nginx service. We will compare how the workload behaves under Podman and under OpenShift, and finally see how to overcome the OpenShift limitation by using container lifecycle hooks.

Prerequisites

You can install a single node OpenShift using kcli or CodeReady Containers.

Defining the workload

We are going to build our workload from the following Dockerfile, saved as Dockerfile.stopsignal-systemd:

FROM quay.io/fedora/fedora:35
RUN dnf -y install procps nginx \
    && dnf clean all \
    && systemctl enable nginx
EXPOSE 80
# https://docs.docker.com/engine/reference/builder/#stopsignal
# https://www.freedesktop.org/software/systemd/man/systemd.html#SIGRTMIN+3
STOPSIGNAL SIGRTMIN+3
ENTRYPOINT ["/sbin/init"]

Podman does not actually need the STOPSIGNAL instruction: because the container process is systemd, it already detects that podman stop should send SIGRTMIN+3.

Now we build:

export IMG="quay.io/avisied0/demos:stopsignal-systemd"
podman build -t "${IMG}" -f Dockerfile.stopsignal-systemd .

Running the container with podman

First, let's see what happens with the workload when running it with podman or docker:

CONTAINER_ID=$( podman run -it -d "${IMG}" )
podman logs --follow "${CONTAINER_ID}" &
podman stop "${CONTAINER_ID}"

And we get output like the following:

[  OK  ] Removed slice Slice /system/getty.
[  OK  ] Removed slice Slice /system/modprobe.
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped target Login Prompts.
[  OK  ] Stopped target Timer Units.
[  OK  ] Stopped dnf makecache --timer.
[  OK  ] Stopped Daily rotation of log files.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
.
.
.
[  OK  ] Stopped target Swaps.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Reached target Late Shutdown Services.
         Starting System Halt...
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
All filesystems, swaps, loop devices, MD devices and DM devices detached.
Halting system.
Exiting container.

[1]+  Done                    podman logs --follow "${CONTAINER_ID}"

What about OpenShift?

Now let's try our workload on OpenShift. You will need an OpenShift cluster or a single node OpenShift (you can get one by using kcli or CodeReady Containers, as noted in the prerequisites).

Ensure the repository is public so that the cluster can pull the image.
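The original deployment steps are not reproduced here; a minimal pod manifest consistent with the log output below (the pod name systemd-nginx and the image built earlier) might look like the following sketch. The exact manifest is an assumption:

```yaml
# Hypothetical manifest for the systemd workload; the pod name matches
# the "systemd-nginx" seen in the log output below.
apiVersion: v1
kind: Pod
metadata:
  name: systemd-nginx
  labels:
    app: stopsignals
spec:
  automountServiceAccountToken: false
  containers:
  - name: main
    image: quay.io/avisied0/demos:stopsignal-systemd
    imagePullPolicy: Always
    tty: true
```

After pushing the image with podman push "${IMG}", the pod would be created, deleted, and its logs followed with oc, similarly to the commands used later in this article.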

When we delete the pod, we get something like the below in the log output, but systemd and the pod keep running:

pod "systemd-nginx" deleted
systemd-nginx login: systemd v249.7-2.fc35 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

We can see that systemd does not begin the stop sequence as it did with podman. This is because OpenShift did not translate the STOPSIGNAL instruction specified in the Dockerfile (this is fixed in OpenShift 4.10). To work around this situation we will use container lifecycle hooks to explicitly send SIGRTMIN+3 to PID 1 (systemd).

Trying a more isolated case

Let's check whether this happens only for SIGRTMIN+3 or for any signal specified via the STOPSIGNAL instruction. To investigate, we will use the following Dockerfile, saved as Dockerfile.stopsignal-demo:

FROM quay.io/fedora/fedora:35
COPY demo-signal.sh /demo-signal.sh
RUN chmod a+x /demo-signal.sh
STOPSIGNAL SIGINT
CMD ["/demo-signal.sh"]

The demo-signal.sh script must have execute permission (the Dockerfile takes care of that with chmod). Its content is:

#!/bin/bash

function trap_signal {
  local signal="$1"
  echo -e "\nExiting by ${signal}" >&2
  exit 0
}

for signal in SIGINT SIGTERM SIGUSR1 "SIGRTMIN+3"
do
  trap "trap_signal '${signal}'" "${signal}"
done

while true; do
    echo -n "."
    sleep 1
done
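To see the trap loop in action without any container at all, we can reproduce the same logic locally and deliver one of the trapped signals. This is a minimal sketch (no container runtime involved, output simplified to stdout):

```shell
#!/bin/bash
# Local sketch of the same trap logic as demo-signal.sh: install the
# traps, run the loop in the background, and send one trapped signal.
demo() {
  for signal in SIGINT SIGTERM SIGUSR1 "SIGRTMIN+3"; do
    trap "echo \"Exiting by ${signal}\"; exit 0" "${signal}"
  done
  while true; do sleep 1; done
}

demo &
pid=$!
sleep 0.2            # give the background job time to install its traps
kill -USR1 "${pid}"  # deliver one of the trapped signals
wait "${pid}"        # the job prints "Exiting by SIGUSR1" and exits 0
```

The same mechanism is what lets the container react to whatever signal the runtime delivers on stop.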

Update: Script updated based on PR at: https://github.com/avisiedo/freeipa-kustomize/blob/idmocp-331-stopping-with-kind-and-podman/incubator/013-signalstop/demo-signal.sh

Finally we define a workload with the following pod-stopsignal-demo.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: stopsignal-demo
  labels:
    app: stopsignals
spec:
  automountServiceAccountToken: false
  containers:
  - name: main
    image: quay.io/avisied0/demos:stopsignal-demo
    imagePullPolicy: Always
    command:
    - /demo-signal.sh
    tty: true
    securityContext:
      privileged: false

Build the image and push:

export IMG="quay.io/avisied0/demos:stopsignal-demo"
podman build -t "${IMG}" -f Dockerfile.stopsignal-demo .
podman push "${IMG}"

And we exercise the scenario:

oc create -f pod-stopsignal-demo.yaml --as system:serviceaccount:stopsignal:runasanyuid
oc logs pod/stopsignal-demo -f --as system:serviceaccount:stopsignal:runasanyuid &
oc delete -f pod-stopsignal-demo.yaml --as system:serviceaccount:stopsignal:runasanyuid

We get the output below:

pod "stopsignal-demo" deleted
............
Exiting by SIGINT

When SIGINT is specified in the STOPSIGNAL instruction of the Dockerfile, OpenShift sends SIGINT to the pod when we delete the resource.

However, when STOPSIGNAL 37 (SIGRTMIN+3) is specified as a numeric value, OpenShift sends SIGTERM instead of the SIGRTMIN+3 indicated in the Dockerfile.
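As a side note, the numeric value 37 can be checked from bash itself, since the shell resolves real-time signal names at runtime (on Linux with glibc, SIGRTMIN is 34 after the C library reserves the first few real-time signals for its own use):

```shell
#!/bin/bash
# SIGRTMIN+3 resolves to signal number 37 on Linux/glibc (SIGRTMIN is 34)
kill -l RTMIN+3   # prints 37
```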

Update:

Another test was made on an OpenShift 4.10 CI build on Wed Jan 5, 2022, and it worked as expected, sending SIGRTMIN+3 to the container workload. So this is fixed in upcoming releases.

Solution: container lifecycle hooks
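The original hook definition is not reproduced here; a preStop lifecycle hook along these lines (a sketch, with the exact fields assumed) makes the kubelet run a command inside the container before stopping it, which lets us send SIGRTMIN+3 to PID 1 explicitly:

```yaml
# Hypothetical pod spec fragment: the preStop hook sends SIGRTMIN+3 to
# PID 1 (systemd) before the container is stopped.
spec:
  containers:
  - name: main
    image: quay.io/avisied0/demos:stopsignal-systemd
    lifecycle:
      preStop:
        exec:
          command: ["/bin/bash", "-c", "kill -s RTMIN+3 1"]
```

With this hook, systemd receives its shutdown signal regardless of whether the runtime honors STOPSIGNAL.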

With the hook in place, deleting the pod makes the log output immediately show the shutdown sequence below:

pod "systemd-nginx" deleted
systemd-nginx login: [  OK  ] Removed slice Slice /system/getty.
[  OK  ] Removed slice Slice /system/modprobe.
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped target Login Prompts.
[  OK  ] Stopped target Timer Units.
[  OK  ] Stopped dnf makecache --timer.
[  OK  ] Stopped Daily rotation of log files.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Closed Process Core Dump Socket.
         Stopping Console Getty...
         Stopping The nginx HTTP and reverse proxy server...
         Stopping User Login Management...
[  OK  ] Stopped Console Getty.
         Stopping Permit User Sessions...
[  OK  ] Stopped User Login Management.
[  OK  ] Stopped Permit User Sessions.
systemd v249.7-2.fc35 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.
[  OK  ] Stopped The nginx HTTP and reverse proxy server.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped target Host and Network Name Lookups.
[  OK  ] Stopped target Remote File Systems.
         Stopping Home Area Activation...
         Stopping Network Name Resolution...
[  OK  ] Stopped Network Name Resolution.
[  OK  ] Stopped Home Area Activation.
         Stopping Home Area Manager...
[  OK  ] Stopped Home Area Manager.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Path Units.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Stopped target Slice Units.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Socket Units.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped target Local Verity Protected Volumes.
[  OK  ] Stopped Update is Completed.
[  OK  ] Stopped Rebuild Dynamic Linker Cache.
[  OK  ] Stopped Rebuild Journal Catalog.
         Stopping Record System Boot/Shutdown in UTMP...
[  OK  ] Stopped Record System Boot/Shutdown in UTMP.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped target Local File Systems.
         Unmounting /etc/hostname...
         Unmounting /etc/hosts...
         Unmounting /etc/resolv.conf...
         Unmounting /run/lock...
         Unmounting /run/secrets/kubernetes.io/serviceaccount...
         Unmounting Temporary Directory /tmp...
         Unmounting /var/log/journal...
[  OK  ] Stopped Create System Users.
[FAILED] Failed unmounting /etc/hosts.
[FAILED] Failed unmounting /run/lock.
[FAILED] Failed unmounting /run/sec…/kubernetes.io/serviceaccount.
         Unmounting /run/secrets...
[FAILED] Failed unmounting /etc/resolv.conf.
[FAILED] Failed unmounting Temporary Directory /tmp.
[FAILED] Failed unmounting /var/log/journal.
[FAILED] Failed unmounting /etc/hostname.
[FAILED] Failed unmounting /run/secrets.
[  OK  ] Stopped target Swaps.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Reached target Late Shutdown Services.
         Starting System Halt...
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
All filesystems, swaps, loop devices, MD devices and DM devices detached.
Halting system.
Exiting container.

Wrap up

In this article we have seen that podman detects a systemd payload and sends SIGRTMIN+3 on podman stop without needing STOPSIGNAL, that OpenShift versions prior to 4.10 do not honor the STOPSIGNAL instruction for signals such as SIGRTMIN+3 (sending SIGTERM instead), and that container lifecycle hooks let us work around this by sending the signal explicitly from a preStop hook.
