Linux mount namespace hopping considered harmful?

In an application, I need to temporarily "switch" into a specific mount namespace to check some things inside it in /proc, then switch back to the mount namespace my application was started with, then switch to another mount namespace, etc, etc.

The application is started with the "root" mount namespace, and under the root user (two different root concepts here!).

Under the hood, setns() is used to switch back and forth. On top of that I use Zalando's nsenter Python library. This library lets you "enter" a specific namespace: it first opens an fd to /proc/self/ns/[nstype], to be used later for switching back. Then it takes the path to a namespace in the filesystem, opens an fd from that, and joins it via setns(fd, 0). Afterwards, the first fd is used to rejoin the original namespace, using setns() again. This works beautifully for, say, network namespaces.
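The enter-and-return sequence described above can be sketched roughly as follows. This is only an illustration that calls setns() through ctypes, not the nsenter library's actual code; the names `ns_path` and `joined_namespace` are made up for this example, and actually joining a namespace needs root or the relevant capabilities.

```python
import ctypes
import os
from contextlib import contextmanager


def ns_path(pid, nstype):
    """Build the /proc path of a namespace, e.g. /proc/1234/ns/net."""
    return "/proc/{}/ns/{}".format(pid, nstype)


def _setns(fd):
    # Load libc lazily; use_errno=True lets us read errno after the call.
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


@contextmanager
def joined_namespace(pid, nstype):
    """Temporarily join the nstype namespace of pid, then switch back."""
    # Open the "return ticket" first, while still in the original namespace.
    home_fd = os.open(ns_path("self", nstype), os.O_RDONLY)
    try:
        target_fd = os.open(ns_path(pid, nstype), os.O_RDONLY)
        try:
            _setns(target_fd)
        finally:
            os.close(target_fd)
        try:
            yield
        finally:
            # Switching back is just another setns(), on the saved fd.
            _setns(home_fd)
    finally:
        os.close(home_fd)
```

Usage would look like `with joined_namespace(1234, "net"): ...`, where 1234 is a placeholder PID.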

But when hopping between mount namespaces, it fails when trying to re-enter a mount namespace after having left it before. Hopping here means: my application enters one mount namespace, does some work, returns to its original mount namespace, switches into a mount namespace again, switches back, etc.

For what it is worth: the trouble seems to arise with containers inside containers.

Is there some restriction on switching mount namespaces? Possibly related to user namespaces? The mount namespace man page mentions some relation to user namespaces, but I don't understand how a different user namespace, active when the mount namespace for a container was created, affects my application, which runs in the root user namespace with root rights, when it switches to and away from those container mount namespaces. Does switching into such a mount namespace make my application lose rights, so that it fails later?

So, with a nod to the giants: is mount namespace hopping considered harmful?

asked Jun 27 at 16:18

TheDiveO

22310

what is the error (errno) returned by setns() in your case?
– sourcejedi
Jun 27 at 16:33

please name at least one version of the python-nsenter library which exhibited this problem
– sourcejedi
Jun 27 at 16:35

Forgot to mention: I'm getting thrown out before the setns(), because /proc/[PID]/ns/mnt for PIDs belonging to those containers is no longer accessible, although I'm considered to be in the "root" mount namespace. Up to that point all setns() calls succeed with 0, even switching back from the container mount namespace into the root mount namespace...
– TheDiveO
Jun 27 at 16:35

i.e. the attempt to open() it fails with ENOENT ("No such file or directory")? Or is it a different errno?
– sourcejedi
Jun 27 at 16:37
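One way to answer the errno question is to probe the open() directly. The following is a minimal sketch; the helper name `probe_open` is hypothetical and it only reports which errno the failed open() produced.

```python
import errno
import os


def probe_open(path):
    """Try to open path read-only; return None on success, else the errno name."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. 2 -> "ENOENT".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

Calling it on `/proc/<PID>/ns/mnt` (with a real container PID filled in) right after switching back would show whether the failure is ENOENT, EACCES, or something else.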

It's nsenter 0.2 from PyPI: pypi.org/project/nsenter/0.2 ... this is basically the same as from github.com/zalando/python-nsenter, where the GitHub version only has a newer commit removing some unnecessary brackets. I don't think that it is an issue with the nsenter library; rather, it has to do with how Linux mount namespaces work?
– TheDiveO
Jun 27 at 16:40


1 Answer

This fine print near the end of the man page for setns(2) is probably the key to my woes:

Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the target mount namespace.
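To see whether the calling process currently holds the two capabilities named there, one can inspect the `CapEff` line of /proc/self/status. This is a rough sketch with made-up helper names; it only covers the caller's own effective set and says nothing about CAP_SYS_ADMIN relative to the target mount namespace's owning user namespace.

```python
CAP_SYS_CHROOT = 18  # bit numbers as defined in <linux/capability.h>
CAP_SYS_ADMIN = 21


def effective_caps(status_text):
    """Extract the effective capability bitmask from /proc/[pid]/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_mount_ns_caps(capmask):
    """True if both CAP_SYS_CHROOT and CAP_SYS_ADMIN are set in capmask."""
    needed = (1 << CAP_SYS_CHROOT) | (1 << CAP_SYS_ADMIN)
    return (capmask & needed) == needed
```

Feeding it the contents of /proc/self/status before and after each hop would show whether the effective set really changes along the way.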

I suspect that my application/process loses some capabilities after having entered a container mount namespace inside another container, so it is locked inside that mount namespace. What still makes me wonder: there is no error/exception when trying to reassociate with the root mount namespace...

The typical use case for switching is probably to switch into a target namespace and then die, never switching back. It looks as if there is no lifeline for coming back in the case of mount namespaces.
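The "switch and then die" pattern can be sketched with a forked child that does the namespace work and reports back over a pipe, so the parent never leaves its own namespaces. This is an assumption-laden illustration: `run_in_child` is a hypothetical helper, and the `work` callable is where a setns() call and the /proc inspection would go.

```python
import json
import os


def run_in_child(work):
    """Run work() in a forked child and return its JSON-serializable result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the work (which may call setns()), report, then die.
        status = 1
        try:
            os.close(read_fd)
            payload = json.dumps(work()).encode()
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
            status = 0
        finally:
            os._exit(status)
    # Parent: stays in its own namespaces and just collects the result.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as src:
        data = src.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not data:
        raise RuntimeError("child failed")
    return json.loads(data)
```

The cost is one fork per hop, but there is no switch back that could silently go wrong.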

answered Jun 28 at 20:00

TheDiveO

22310

system does create a new child; that always works, because there is no way back except death. This issue only appears when trying to "switch back" the mnt and net namespaces in the same process, using the still-open return namespace fds. As there is no "namespace pop" operation, the return is the same operation as the switch into, so there is interaction with the owning user namespaces of the namespaces switched from and to.
– TheDiveO
Jul 17 at 6:05

I forgot to mention: an unsuccessful switch back without any errno/exception causes /sys to be in some inconsistent state which isn't obvious at first, but shows network devices which aren't there.
– TheDiveO
Jul 17 at 6:07
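Since a failed switch back may not raise anything, one possible safeguard is to compare namespace identities explicitly: two /proc/[pid]/ns/* entries refer to the same namespace when their device and inode numbers match. The helper names below are hypothetical, and the check assumes /proc is still usable from wherever the process ended up.

```python
import os


def ns_identity(path):
    """Return the (device, inode) pair that identifies a namespace file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def still_in(expected_identity, nstype="mnt"):
    """Check that the process's current namespace matches the expected one."""
    return ns_identity("/proc/self/ns/" + nstype) == expected_identity
```

Recording `home = ns_identity("/proc/self/ns/mnt")` before the first hop and checking `still_in(home)` after each return would turn a silent mismatch into a detectable one.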

It happens in a complex setting which I can't describe here, but there are containers in containers and Chromium sandboxes. I only notice something's rotten after several hops between the root ns for net and mnt (in this sequence, so mnt comes last as it pulls my /proc), when I'm getting inconsistent RTNETLINK data. A flat container layout doesn't seem to trigger the issue. I see incomplete RTNETLINK data when switching into the target net ns, then the mnt ns. I'm not switching user ns, and the container in container is in the root user ns. Strange. Don't ask for the reason for container in container...
– TheDiveO
Jul 17 at 19:53

still waiting for someone to explain to me what is happening when switching mount namespaces in terms of capabilities. the man page isn't enough to explain this behavior.
– TheDiveO
Jul 17 at 19:57

then please check man7.org/linux/man-pages/man2/setns.2.html: "Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the target mount namespace. See user_namespaces(7) for details on the interaction of user namespaces and mount namespaces."
– TheDiveO
Jul 17 at 20:01
