Configuring OCP for user namespaces
Preparing an OpenShift cluster to use user namespaces involves several manual steps and hand-overs. To simplify the process we use some configurations from freeipa-kustomize that make the task easier.
Pre-requisites:
- A 4.9 or 4.10 OpenShift cluster.
- You are logged in to the cluster and have cluster-admin privileges.
Out of scope: building the runc and cri-o RPM packages.
Setting up the node configuration
- Clone the freeipa-kustomize repository:
git clone https://github.com/freeipa/freeipa-kustomize.git
- Retrieve the machine config pools:
oc get mcp
- Set the POOL environment variable with the names of the machine config pools that are going to be configured:
export POOL="worker"
If you want to specify more than one:
export POOL="worker master"
- Install some custom RPMs:
export RPM_PACKAGES="https://ftweedal.fedorapeople.org/runc-1.0.3-992.rhaos4.10.el8.x86_64.rpm https://ftweedal.fedorapeople.org/cri-o-1.23.0-990.rhaos4.10.git8c7713a.el8.x86_64.rpm"
The RPMs above are experimental and will become obsolete; they simply show how easily you can customize the OCP node environment with this configuration. Keep in mind that if an RPM's version is lower than the version that ships in the cluster release, the package will not be installed (a quick way to check the shipped versions is sketched below). Credits and thanks to Fraser Tweedale.
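As a quick sanity check (an assumption about how you might verify this, not something the repository provides), you can compare the custom versions with what is currently shipped on a node before overriding it:
# Query the runc and cri-o versions currently installed on a node
oc debug node/NODE -- chroot /host rpm -q runc cri-o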
- Now we just run:
make -C config/static/nodes/userns configure
kustomize build config/static/nodes/userns | oc create -f -
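If the commands succeed, the overlay generates MachineConfig objects for the selected pools and the Machine Config Operator starts rolling them out. You can verify that they were created (the exact names depend on the overlay and the pools you selected):
# List the MachineConfig objects; the ones created by the overlay should appear here
oc get machineconfig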
- Finally, wait until the node state is updated (repeat for each pool in POOL):
oc wait mcp/worker --for condition=updated --timeout=-1s
It will take a few minutes (5-10 minutes), because the configuration is applied node by node: each node is drained, rebooted, and made available again. This process is repeated for every impacted node, and eventually all nodes reach the Ready state and can be used. You can follow the progress as shown below.
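For example (a convenience, not something the repository requires), you can follow the rollout from a second terminal:
# Watch the machine config pools until UPDATED becomes True
oc get mcp -w
# Watch the nodes being cordoned, rebooted and coming back Ready
oc get nodes -w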
How is it structured?
The main overlay at config/static/nodes/userns is a composition of smaller ones, divided into:
- config/static/nodes/cgroup-v2: Configures cgroup v2 on the node, enabling the cgroup v2 filesystem to be mounted on the node.
- config/static/nodes/userns-subid: Configures the subordinate IDs needed for user namespaces. Different files can be found at config/static/nodes/userns-subid/files to specify the subuid and subgid information:
  - 99-crio-userns.conf: Enables the io.kubernetes.cri-o.userns-mode annotation in the PodSpec.
  - subuid and subgid: Configure the subordinate IDs to be used by the user namespace.
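If you want to see exactly what the composition produces before applying anything, you can render the overlay locally (an optional inspection step, not part of the procedure):
# Render the composed overlay without creating anything in the cluster
kustomize build config/static/nodes/userns
# Quick summary of the generated resources
kustomize build config/static/nodes/userns | grep -E '^(kind|  name):'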
- config/static/nodes/rpm-overrides: This configuration handles the RPM package installation. It creates a systemd unit that executes the command installing the RPM package; one resource is generated for each RPM and pool. The installation is checked before running the install command, so that future reboots do not try to install the RPM package again. Here it is used for custom runc and cri-o RPM packages, but the same configuration works for any RPM that we want to quickly test on our OCP development cluster.
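A minimal sketch of that idea, assuming a script like the one below is wired into a systemd oneshot unit per RPM and pool (the name, paths and the use of rpm-ostree override replace are assumptions for illustration, not the exact content generated by the overlay):
#!/bin/bash
# install-runc.sh -- illustrative sketch only
set -euo pipefail
RPM_URL="https://ftweedal.fedorapeople.org/runc-1.0.3-992.rhaos4.10.el8.x86_64.rpm"
RPM_NVRA="$(basename "${RPM_URL}" .rpm)"
# If the package is already installed, do nothing, so that later
# reboots do not try to install it again.
if rpm -q "${RPM_NVRA}" >/dev/null 2>&1; then
    echo "${RPM_NVRA} already installed"
    exit 0
fi
# Download the package and stage it with rpm-ostree; the replacement
# becomes active after the next reboot.
curl -sSfL -o "/tmp/${RPM_NVRA}.rpm" "${RPM_URL}"
rpm-ostree override replace "/tmp/${RPM_NVRA}.rpm"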
Checking that the configuration was applied
Here you will find several commands that are executed from the node. If you are using CodeReady Containers you can ssh directly into the node, for example:
ssh -i ~/.crc/machines/crc/id_ecdsa core@192.168.130.11
This can be helpful when communication with the kube-apiserver (KAS) is not available.
Or you can just open a terminal on the node and run the commands there:
# Retrieve the node list
oc get nodes
# Open a terminal on the node
oc debug node/NODE
chroot /host
# Now run your commands here
- For the RPM packages, check from the node:
runc --version
runc version 1.0.3
spec: 1.0.2-dev
go: go1.17.2
libseccomp: 2.5.1
# If you are using CodeReady Containers, you can directly do the below
ssh -i ~/.crc/machines/crc/id_ecdsa core@192.168.130.11 journalctl -u install-runc.service
# Or using the oc adm command
oc adm node-logs -u install-runc.service NODE
-- Logs begin at Sat 2021-12-11 13:38:56 UTC, end at Wed 2022-01-26 07:12:23 UTC. --
Jan 26 06:50:13 crc-hsl9k-master-0 bash[1658]: package runc-1.0.3-992.rhaos4.10.el8.x86_64 is not installed
Jan 26 06:50:13 crc-hsl9k-master-0 systemd[1]: Started Install custom runc.
Jan 26 06:50:14 crc-hsl9k-master-0 bash[1658]: Downloading 'https://ftweedal.fedorapeople.org/runc-1.0.3-992.rhaos4.10.el8.x86_64.rpm'... done!
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Checking out tree 26d80bc...done
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: No enabled rpm-md repositories.
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Importing rpm-md...done
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Resolving dependencies...done
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Applying 1 override and 5 overlays
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Processing packages...done
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Running pre scripts...done
Jan 26 06:50:16 crc-hsl9k-master-0 bash[1658]: Running post scripts...done
Jan 26 06:50:17 crc-hsl9k-master-0 bash[1658]: Running posttrans scripts...done
Jan 26 06:50:17 crc-hsl9k-master-0 bash[1658]: Writing rpmdb...done
Jan 26 06:50:18 crc-hsl9k-master-0 bash[1658]: Writing OSTree commit...done
Jan 26 06:50:19 crc-hsl9k-master-0 bash[1658]: Staging deployment...done
Jan 26 06:50:20 crc-hsl9k-master-0 systemd[1]: Stopping Install custom runc...
Jan 26 06:50:20 crc-hsl9k-master-0 systemd[1]: install-runc.service: Succeeded.
Jan 26 06:50:20 crc-hsl9k-master-0 systemd[1]: Stopped Install custom runc.
Jan 26 06:50:20 crc-hsl9k-master-0 systemd[1]: install-runc.service: Consumed 94ms CPU time
-- Reboot --
Jan 26 06:51:10 crc-hsl9k-master-0 bash[1656]: runc-1.0.3-992.rhaos4.10.el8.x86_64
Jan 26 06:51:09 crc-hsl9k-master-0 systemd[1]: Started Install custom runc.
Jan 26 06:51:09 crc-hsl9k-master-0 systemd[1]: install-runc.service: Succeeded.
Jan 26 06:51:09 crc-hsl9k-master-0 systemd[1]: install-runc.service: Consumed 11ms CPU time
- For cgroup v2, run the below from the node (an alternative check is sketched after the output):
mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel)
cgroup on /var/lib/containers/storage/overlay/1ec73edf3e99a0772aaab2ba0f27110bb879a9fe86f607acc9de822489a4a9e1/merged/sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel)
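An alternative check (just another way to confirm it, not from the repository) is to query the filesystem type directly:
# Prints "cgroup2fs" when the unified cgroup v2 hierarchy is mounted
stat -fc %T /sys/fs/cgroup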
- For the kernel arguments, run the below from the node:
# Check the kernel args the node booted with
cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-36fd944867b0e491991a65f6f3b7209c937fe3bd7cdbd855c7c5d5a7070ce570/vmlinuz-4.18.0-305.28.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/36fd944867b0e491991a65f6f3b7209c937fe3bd7cdbd855c7c5d5a7070ce570/0 root=UUID=91ba4914-fd2b-4a7c-b498-28585a80a40e rw rootflags=prjquota systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1
- For the subid configuration, we run the below from the node:
cat /etc/subuid
cat /etc/subgid
core:100000:65536 containers:200000:268435456
core:100000:65536 containers:200000:268435456
And we can observe that entries for the containers user and group exist too:
getent passwd containers
getent group containers
containers:x:1001:995:User for housing the sub ID range for containers:/var/home/containers:/sbin/nologin
containers:x:995:
- For the cri-o configuration, we run the below from the node:
cat /etc/crio/crio.conf.d/99-crio-userns.conf
# https://github.com/cri-o/cri-o/blob/main/docs/crio.conf.5.md#crioruntimeruntimes-table
[crio.runtime.runtimes.runc]
allowed_annotations=["io.kubernetes.cri-o.userns-mode"]
Now we can use the annotation below to enable user namespaces for a particular Pod:
apiVersion: v1
kind: Pod
metadata:
  name: test-userns
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  serviceAccountName: test-userns
  containers:
    - name: userns-test
      image: quay.io/fedora/fedora:35
      command: ["sleep", "3600"]
Let's try it quickly (saving the manifest above as pod.yaml):
# Create a namespace
oc new-project test-userns
# Create the 'test-userns' service account to be used
oc create sa test-userns
# Add the edit role to the service account
oc adm policy add-role-to-user edit -z test-userns
# Add the anyuid security context constraint to the service account
oc adm policy add-scc-to-user anyuid -z test-userns
# Create the pod as the service account
oc create -f pod.yaml --as system:serviceaccount:$( oc project -q ):test-userns
When the pod is ready, we check the user namespace by:
oc exec pod/test-userns -- cat /proc/1/uid_map
0 200000 65536
This means that the UIDs [0..65535] inside the container are mapped to [200000..265535] in the parent user namespace on the host.
When the user namespace is not used, the content of this file will be:
0 0 4294967295
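As an additional, optional check (an assumption about how you might verify this, not part of the original procedure), you can confirm that the containerized process really runs under a UID in the subordinate range on the host; the exact PID and host UID will differ:
# From your workstation: inside the pod the process still sees itself as root (uid 0)
oc exec pod/test-userns -- id -u
# From the node (oc debug node/NODE, chroot /host): the same process runs
# under a host UID in the 200000+ range
ps -eo user,uid,pid,comm | grep sleep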
Wrap-up
With this configuration we can quickly set up our OCP cluster to experiment with and investigate user namespaces.
Acknowledgements
- Thanks to Fraser Tweedale for his sessions that helped us better understand user namespaces.