nf_conntrack_sip does not work SOMETIMES, restarting iptables USUALLY fixes it

I'm trying to use nf_conntrack_sip on a box that runs Asterisk itself; it is not routing traffic for another VoIP box. The setup works until I reboot. After a reboot nf_conntrack_sip ALMOST always stops working and media traffic is dropped.

conntrack --dump | grep -E 'sip|helper'
# No output matching 'sip' nor 'helper' while a call is in progress (albeit no audio)

The iptables rules are loaded correctly, as confirmed by iptables-save.

Then I do systemctl restart iptables and 9/10 times that fixes it. If it does not, I repeat the iptables restart.

conntrack --dump | grep -E 'sip|helper'
conntrack v1.4.4 (conntrack-tools): 9 flow entries have been shown.
udp 17 3597 src=10.7.0.38 dst=10.47.1.11 sport=5063 dport=5060 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=5063 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 helper=sip use=2

Simply reloading the rules with iptables-restore < /etc/sysconfig/iptables does not help. I suspect unloading/loading conntrack or some modules does the trick.

Occasionally it does work at boot, but that is very rare. Asterisk starts quickly. Giving it more time to "finish starting something" does not help.

Update: restarting iptables while nf_conntrack_sip is working as expected can, rarely, break it.

The setup:

Update: Initially the problem was described as occurring on a VM, but since then I have reinstalled onto real hardware (i5-6500 CPU @ 3.20GHz with 8 GB RAM) and exactly the same problem still occurs. All packages are identical to the initial VM (same provision script).

The OS is CentOS-7.4 Minimal + updates, kernel 3.10.0-693.21.1.el7.x86_64. It is all installed from RPMs, with no custom kernels or modules. Update: I also did yum update to the latest stable packages and kernel available from CentOS as of 2018-08-10. The problem persists.

I did yum autoremove firewalld and yum install iptables-services.

Diffs to /etc/sysconfig/iptables-config (other values are the RPM defaults):

-IPTABLES_MODULES=""
+IPTABLES_MODULES="nf_conntrack_sip"

Added file /etc/modprobe.d/nf_conntrack.conf:

options nf_conntrack nf_conntrack_helper=0

The entire /etc/sysconfig/iptables is very simple:

*raw
-A PREROUTING -p udp --dport 5060 -j CT --helper sip
COMMIT

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 5060 -j ACCEPT
-A INPUT -j LOG --log-level 7 --log-prefix "REJECT in filter.INPUT:"
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

Update: Setting the modprobe line "options nf_conntrack nf_conntrack_helper=1" and NOT using the iptables rule "... -j CT --helper sip" does NOT fix it; the behavior remains non-deterministic.

Not relevant to the problem itself, but to confirm that packets are dropped (as opposed to a NAT issue), I added /etc/rsyslog.d/kern-debug.conf:

kern.=debug /var/log/kernel-debug

I am testing with a Cisco SPA504G phone that dials into the PBX and gets hold music. I am not trying to do anything complicated with media. SIP signalling and media are exchanged with the same IPv4 address. The test call is only between the phone and the PBX; no other parties are involved.

My attempt to diagnose it:

I've made a short script that captures the state of various things before and after the fix (restarting iptables), so that the two outputs can be compared with diff. The script:

for f in $( find /proc/sys/net/netfilter -type f ); do
 echo "f=$f"
 cat "$f"
done

echo cat /sys/module/nf_conntrack/parameters/*
cat /sys/module/nf_conntrack/parameters/*

echo ls /sys/module/nf_conntrack/holders/
ls /sys/module/nf_conntrack/holders/

echo cat /sys/module/nf_conntrack_sip/parameters/*
cat /sys/module/nf_conntrack_sip/parameters/*
echo ls /sys/module/nf_conntrack_sip/holders/
ls /sys/module/nf_conntrack_sip/holders/

echo 'ls /sys/module/{ip,nf_}*/holders/'
ls /sys/module/{ip,nf_}*/holders/

echo sysctl -a
sysctl -a

echo lsmod
lsmod

echo iptables-save
iptables-save
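The before/after comparison described above could be wrapped roughly as follows. This is only a sketch: compare_state is a hypothetical helper name, and it assumes the dump script above has been saved to a file of your choosing. Expect some noise in the diff, since counters and timeouts change between any two runs.

```shell
#!/bin/sh
# compare_state DUMP_CMD FIX_CMD
# Runs DUMP_CMD, then FIX_CMD, then DUMP_CMD again, and prints a unified
# diff of the two dumps. Returns the exit status of diff
# (0 = identical, 1 = differences found).
compare_state() {
    dump_cmd=$1
    fix_cmd=$2
    before=$(mktemp) || return 2
    after=$(mktemp) || return 2
    sh -c "$dump_cmd" > "$before" 2>&1
    sh -c "$fix_cmd" > /dev/null 2>&1
    sh -c "$dump_cmd" > "$after" 2>&1
    diff -u "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
}
```

Example use, as root, with dump-state.sh standing in for wherever the script was saved: compare_state 'sh ./dump-state.sh' 'systemctl restart iptables'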

The only thing I notice is that OFTEN module nf_conntrack_netlink IS listed as loaded after the boot, while there is a problem. Sometimes it is NOT listed by lsmod AFTER the boot but there is still the problem. After restarting iptables it is, to the best of my knowledge, never listed as loaded. I suspect it is unrelated because there is no direct link between it being loaded and the problem manifesting.

edited Aug 12 at 3:35

asked Aug 8 at 16:34

AndrDevEK

1558

Have a look at your system logs. Do you have a message about exhausted nf_conntrack connections?
– Rui F Ribeiro
Aug 8 at 19:32

@Rui F Ribeiro I was running tail -f /var/log/messages and nothing like that came up. I suspect what you suggest would happen on a busy system or under a DoS attack. This is an isolated test network, and the fault appears right after boot, when conntrack --dump lists 12 entries on average. I will check again and post the result. Thanks.
– AndrDevEK
Aug 8 at 22:38

@Rui F Ribeiro - no, there is nothing in the logs about reaching nf_conntrack connection limits. I'm reaching conntrack connection counts in the 20-40 range, with the max being 65536.
– AndrDevEK
Aug 9 at 1:23

Have you installed vmware tools?
– Rui F Ribeiro
Aug 9 at 1:25

Instead of restarting iptables to fix it when it fails, you should try clearing conntrack entries: if that fixes it, it would pinpoint an issue related to a conflict of ports in conntrack. Check conntrack -L and conntrack -D -p udp --dport ... etc. Not that it would solve anything, but it would be a step in the right direction. The restart does have this effect (iptables.init in CentOS has: # try to unload remaining netfilter modules used by ipv4 and ipv6 )
– A.B
Aug 20 at 18:35

iptables sip ip-conntrack


2 Answers

accepted

Solution

The solution was simply to assign the conntrack SIP helper to outgoing packets as well:

iptables -t raw -A OUTPUT -p udp -m udp --sport 5060 -j CT --helper sip
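To keep the rule across restarts of the iptables service, the raw table section of /etc/sysconfig/iptables could be extended along these lines. This is a sketch that assumes the same file layout as shown in the question; only the OUTPUT line is new.

```
*raw
-A PREROUTING -p udp --dport 5060 -j CT --helper sip
-A OUTPUT -p udp --sport 5060 -j CT --helper sip
COMMIT
```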

Cause

The problem was that the firewall rule assigned the conntrack SIP helper only to incoming packets:

iptables -t raw -A PREROUTING -p udp -m udp --dport 5060 -j CT --helper sip

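When the PBX was the one to send the first packet toward the phone, it would establish a conntrack entry without the SIP helper. Locally generated packets traverse the raw OUTPUT chain, not PREROUTING, so the rule above never saw them.
That entry then kept matching the SIP conversation, so the SIP helper never got involved.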

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 13 flow entries have been shown.
udp 17 159 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When the phone was the one to send the first packet toward the PBX, it would hit the rule with "-j CT --helper sip" listed above and the conntrack entry would be created with the SIP helper.

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 9 flow entries have been shown.
udp 17 3588 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 helper=sip use=2

Note the "helper=sip" towards the end of the entry; compare with the lack of it in the first sample.
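The comparison between the two samples can be automated with a small filter. This is a minimal sketch, not something from the original troubleshooting: sip_flows_without_helper is a hypothetical name, and it assumes the conntrack -L line format shown above with SIP on UDP port 5060.

```shell
#!/bin/sh
# Reads `conntrack -L` style output on stdin and prints every UDP entry
# that involves port 5060 but has no SIP helper attached.
sip_flows_without_helper() {
    awk '/^udp / && /(sport|dport)=5060( |$)/ && !/helper=sip/'
}
```

Example use: conntrack -L 2>/dev/null | sip_flows_without_helper
Any line it prints is a SIP flow whose media will not be tracked as RELATED.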

The PBX and the phone both send SIP packets to each other to confirm the other is there, so the timing made the problem appear non-deterministic.

Asterisk preserves the state of peers across reboots and probes them after a reboot, so it is far more likely to send the first packet itself and thereby create the conntrack entry without the SIP helper.
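An entry created without the helper keeps matching until it is removed or times out. Following the suggestion from the comments, such entries can be deleted with conntrack -D instead of restarting iptables. The sketch below uses a hypothetical function name, flush_sip_flows, and assumes SIP on UDP port 5060. Whether this alone restores audio for a call already in progress was not verified here.

```shell
#!/bin/sh
# flush_sip_flows [run]
# Without an argument, only prints the conntrack commands (dry run).
# With "run", deletes UDP entries whose original direction has
# destination or source port 5060 (needs root).
flush_sip_flows() {
    for opt in --dport --sport; do
        if [ "${1:-}" = "run" ]; then
            conntrack -D -p udp "$opt" 5060
        else
            echo "conntrack -D -p udp $opt 5060"
        fi
    done
}
```

Once both CT rules are loaded, the next SIP packet in either direction creates a fresh entry with the helper attached.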

Big thank you to user A.B for pointing me in the right direction in the comments.

Remaining mystery

What I cannot explain is why, when I had the modprobe option

options nf_conntrack nf_conntrack_helper=0

it was still getting broken and "fixed" in the same way.
I did not spend much time on automatic helper assignment, so maybe I did something wrong.
I might update this answer if I find out more.
I am not planning to enable automatic helper assignment.

answered Aug 23 at 15:16

AndrDevEK

1558

You are running that test environment on top of an environment that is severely constraining your performance.

Besides running on top of Windows, whilst you have a semi-decent amount of RAM, the CPU is on the low side. On top of that, you do not have VMware Tools installed to paravirtualize your NICs.

I would:

run it on a dedicated VMware ESXi host, or at least not on top of Windows, to get more network throughput;

give it at least 2 or 3 CPUs; you want some concurrency there, to handle things faster;

install open-vm-tools in the guest OS; the difference in network performance between paravirtualized NICs and forcing the host to emulate "fake" NICs can be more than tenfold;

if running Asterisk on top of MySQL, I would prefer at least 4 GB of RAM.

edited Aug 9 at 2:27

answered Aug 9 at 1:33

Rui F Ribeiro

36.5k1271116

It is possible I did not make my comment clear. The Windows host has 4 HT cores (8 virtual CPUs) and 32 GB RAM. The Linux guest is allocated 1 CPU and 2 GB RAM. I disagree that 1 CPU and 2 GB RAM is low for this VM. It does not come anywhere near the limits: CPU load average is 0.02 and less than half the RAM is used. It is failing to handle ONE SIP connection with 30ish conntrack entries. The comments about Windows and VMware Tools sound more plausible. I'm looking to repeat the test on real hardware.
– AndrDevEK
Aug 9 at 1:42

Be aware that load is not the whole story. I do not know your setup well enough to say something definite about RAM, however I can tell you it will never scale with such constraints on the CPU side. You are severely constraining it with only 1 CPU, namely for handling interrupt requests.
– Rui F Ribeiro
Aug 9 at 1:44

This setup is purely to test the nf_conntrack_sip configuration: what settings to put where. It is not expected to scale nor to be used in production of any sort. It is on an isolated test network.
– AndrDevEK
Aug 9 at 1:47

Are you interested in doing a more reliable load test or not? It is up to you. As a matter of curiosity, I have seen setups on commercial Linux appliances that do CPU pinning, setting aside a whole CPU so that OS/disk I/O handling runs on separate CPUs from the network output. Nevertheless, the biggest shortcoming in your setup for now is not using VMware Tools in a VMware-based environment.
– Rui F Ribeiro
Aug 9 at 1:50

I have also witnessed some nasty network-related bugs in VirtualBox and Hyper-V on Windows... in fact I gave up using Windows entirely and switched to a Linux host at work. Some were clearly Windows' fault. Can't say much about bugs in VMware under Windows.
– Rui F Ribeiro
Aug 9 at 1:56

Â |Â
show 10 more comments

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f461320%2fnf-conntrack-sip-does-not-work-sometimes-restarting-iptables-usually-fixes-it%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

Solution

The solution was to simply mark the outgoing packets to be handled by conntrack sip helper too:

iptables -t raw -A OUTPUT -p udp -m udp --sport 5060 -j CT --helper sip

Cause

The problem was the firewall rule was marking only incoming packets for conntrack sip helper.

iptables -t raw -A PREROUTING -p udp -m udp --dport 5060 -j CT --helper sip

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 13 flow entries have been shown.
udp 17 159 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When the phone was the one to send the first packet toward the PBX, it would hit the rule with "-j CT --helper sip" listed above and the conntrack entry would get created with sip helper.

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 9 flow entries have been shown.
udp 17 3588 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 helper=sip use=2

Note the "helper=sip" towards the end of the entry, compare with lack of it in first sample.

The PBX and Phone send SIP packets at each other to confirm the other is there, so the timing made it appear non-deterministic.

Asterisk preserves the state of peers across reboots, and probes them after reboot, thus being far more likely to send outgoing packet first and causing the non-SIP-helper entry in the conntrack.

Big thank you to user A.B for pointing me in the right direction in the comments.

Remaining mystery

What I can not explain is why when I had modprobe option

options nf_conntrack nf_conntrack_helper=0

answered Aug 23 at 15:16

AndrDevEK

1558

add a commentÂ |Â

up vote
0
down vote

accepted

Solution

The solution was to simply mark the outgoing packets to be handled by conntrack sip helper too:

iptables -t raw -A OUTPUT -p udp -m udp --sport 5060 -j CT --helper sip

Cause

The problem was the firewall rule was marking only incoming packets for conntrack sip helper.

iptables -t raw -A PREROUTING -p udp -m udp --dport 5060 -j CT --helper sip

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 13 flow entries have been shown.
udp 17 159 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When the phone was the one to send the first packet toward the PBX, it would hit the rule with "-j CT --helper sip" listed above and the conntrack entry would get created with sip helper.

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 9 flow entries have been shown.
udp 17 3588 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 helper=sip use=2

Note the "helper=sip" towards the end of the entry, compare with lack of it in first sample.

The PBX and Phone send SIP packets at each other to confirm the other is there, so the timing made it appear non-deterministic.

Asterisk preserves the state of peers across reboots, and probes them after reboot, thus being far more likely to send outgoing packet first and causing the non-SIP-helper entry in the conntrack.

Big thank you to user A.B for pointing me in the right direction in the comments.

Remaining mystery

What I can not explain is why when I had modprobe option

options nf_conntrack nf_conntrack_helper=0

answered Aug 23 at 15:16

AndrDevEK

1558

add a commentÂ |Â

up vote
0
down vote

accepted

Solution

The solution was to simply mark the outgoing packets to be handled by conntrack sip helper too:

iptables -t raw -A OUTPUT -p udp -m udp --sport 5060 -j CT --helper sip

Cause

The problem was the firewall rule was marking only incoming packets for conntrack sip helper.

iptables -t raw -A PREROUTING -p udp -m udp --dport 5060 -j CT --helper sip

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 13 flow entries have been shown.
udp 17 159 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When the phone was the one to send the first packet toward the PBX, it would hit the rule with "-j CT --helper sip" listed above and the conntrack entry would get created with sip helper.

[root@test ~]# conntrack -L | grep -E '5060|sip'
conntrack v1.4.4 (conntrack-tools): 9 flow entries have been shown.
udp 17 3588 src=10.7.0.38 dst=10.47.1.11 sport=1024 dport=5060 src=10.47.1.11 dst=10.7.0.38 sport=5060 dport=1024 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 helper=sip use=2

Note the "helper=sip" towards the end of the entry, compare with lack of it in first sample.

The PBX and the phone each send SIP packets to the other to confirm it is still there, so which side sent the first packet depended on timing, and the failure appeared non-deterministic.

Asterisk preserves the state of its peers across reboots and probes them after a reboot, so it is far more likely to send the first (outgoing) packet and thereby create the conntrack entry without the SIP helper.

Big thank you to user A.B for pointing me in the right direction in the comments.

Remaining mystery

What I cannot explain is why, when I had the modprobe option

options nf_conntrack nf_conntrack_helper=0

answered Aug 23 at 15:16

AndrDevEK


Answer (score -1)

You are running that testing environment on top of an environment that is severely constraining your performance.

Besides running on Windows, while you have a semi-decent amount of RAM, the CPU is on the low side. On top of that, you do not have VMware Tools installed to paravirtualize your NICs.

I would:

run it on a dedicated VMware ESXi host, or at least not on top of Windows, to get more network throughput;

give it at least 2 or 3 CPUs, since you want some concurrency there and faster handling;

install Open VM Tools in the guest OS; with paravirtualized NICs, network performance can be more than ten times better than when the host has to emulate every bit and byte of a "fake" NIC;

if running Asterisk on top of MySQL, I would prefer at least 4 GB of RAM.

edited Aug 9 at 2:27

answered Aug 9 at 1:33

Rui F Ribeiro


It is possible I did not make my comment clear. The Windows host has 4 HT cores (8 virtual CPUs) and 32 GB RAM. The Linux guest is allocated 1 CPU and 2 GB RAM. I disagree that 1 CPU and 2 GB RAM is low for this VM. It does not come anywhere near the limits: the CPU load average is 0.02 and less than half the RAM is used. It is failing to handle ONE SIP connection with 30ish conntrack entries. The comments about Windows and VMware Tools sound more plausible. I'm looking to repeat the test on real hardware.
– AndrDevEK
Aug 9 at 1:42

Be aware that load is not the whole story. I do not know your setup well enough to say anything definite about RAM, but I can tell you it will never scale with such constraints on the CPU side. You are severely constraining it with only 1 CPU, particularly for handling interrupt requests.
– Rui F Ribeiro
Aug 9 at 1:44

This setup is purely to test the nf_conntrack_sip configuration: what settings to put where. It is not expected to scale, nor to be used in production of any sort. It is on an isolated test network.
– AndrDevEK
Aug 9 at 1:47

Are you interested in doing a more reliable load test or not? It is up to you. As a matter of curiosity, I have seen commercial Linux appliances that use CPU pinning, setting aside a whole CPU for the network output so that OS and disk I/O handling run on separate CPUs. Nevertheless, the biggest shortcoming in your setup for now is not using VMware Tools in a VMware-based environment.
– Rui F Ribeiro
Aug 9 at 1:50

I have also witnessed some nasty network-related bugs in VirtualBox and Hyper-V on Windows... in fact I gave up using Windows entirely and switched to a Linux host at work. Some were clearly Windows' fault. I can't say much about bugs in VMware under Windows.
– Rui F Ribeiro
Aug 9 at 1:56