How do I make PPP reliable over lossy radio modems using pppd and TCP kernel settings on Debian?
I am having a lot of trouble establishing a reliable PPP / TCP/IP link between two Debian systems over radio modems.
My hardware topology is a little complex.
The system uses:
- Raspberry Pi 3B at each end of the radio link, running Raspbian Stretch (RPI)
- RFDesign RFD900x radio modems, connected to the RPis via FTDI USB cables (RFD900)
- A Linksys Wi-Fi router NATing (WIFI) to a satellite service (SkyMuster, Australia), to an unknown POP in Australia, and on to the Internet (SAT)
- A VPN (vpnc) over the SAT link to another Australian ISP's static IP, terminated by a router, which is the default route for the RPi 3Bs (VPN)
- The VPN endpoint is connected to the net with a static IP (END)
I believe the problem to be over the RFD900x modems, related to the TCP congestion back-off that occurs when the radio drops packets, though I provide the other details for context and in case I'm missing something silly.
The issues are reproducible between the RPis over the RFD900.
From the end-point (with the most trouble) the link to the Internet is as follows:
RPI -> RFD900 -> RFD900 -> RPI -> VPN -> WIFI -> SAT -> END.
Again the above for context.
The RFD900s drop a lot of packets given the distance and obstacles involved. I have tried all sorts of aerial configurations to no avail (omni, Yagi, direct vs bouncing off granite cliffs). I have tried tuning all sorts of parameters in the modems, the MTU, PPP settings, etc. to achieve TCP/IP reliability, also to no avail.
Air speed is 64 kb/s. Serial speed is 57.6 kb/s.
Diag notes:
- On simple serial-to-serial comms over the RFD900 at various distances, a radio MTU of 131 or 151 bytes gives the best throughput.
- The throughput is reliable, though "bursty": burst, burst, burst, rather than a continuous flow.
- I suspect this burstiness is a function of TCP seeing the radio packet dropouts as congestion, which progresses to an inevitable retry saturation.
- When it saturates, sessions (ssh, scp, apt, etc.) just seem to freeze for variable, extended amounts of time (seconds, often 2-3 minutes, sometimes > 10 minutes).
- apt will generally fail. scp and ssh tend to keep going and get there eventually, though usually with multiple stalls and crazy delay times.
- Interactively over ssh, the link is usable, provided no long responses are involved (e.g. a long ls -la).
- Flow control to the modems (none, RTS/CTS, XON/XOFF) seems inconsequential in my tests.
- Different forms of PPP payload compression seem inconsequential (BSD, Predictor, deflate, etc.).
- Van Jacobson header compression increases the throughput per burst, but exacerbates the stalls and delays.
- I've searched extensively for solutions (even going back and reading the RFCs).
- VJ header compression was identified as problematic for lossy links, and there have been RFC advances in compression techniques, e.g. ROHC (RObust Header Compression), including a ROHC working group from which various proprietary compression protocols seem to have emerged that are not available in open source.
- The problem seems well solved for cellular links (with both PPP and RLP), which rely on proprietary protocols.
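The stall behaviour can be sanity-checked with a quick bandwidth-delay product estimate. The figures below are illustrative, not measured: the 64 kbit/s air rate comes from the modem setup above, but the ~2 s loaded RTT is an assumption.

```shell
# Rough bandwidth-delay product (BDP) of the radio link.
# 64 kbit/s air rate is from the modem config; the 2 s RTT is an assumption.
rate_bps=$((64 * 1000 / 8))    # link rate in bytes per second
rtt_s=2                        # assumed worst-case round-trip time, seconds
bdp=$((rate_bps * rtt_s))      # bytes in flight that exactly fill the pipe
echo "BDP: $bdp bytes"
echo "131-byte frames in flight: $((bdp / 131))"
```

Anything queued beyond roughly this many bytes (in the kernel, in pppd, or in the modem's own buffers) only adds latency, which is consistent with a loss-based congestion control overfilling those buffers and producing multi-minute stalls.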
I also post here my current script that runs pppd, including the various options I've tried (see the #commented lines):
# set up the pppd to allow the remote unit to connect as an IP peer
# additional options for remote end: usepeerdns defaultroute replacedefaultroute
pppd /dev/ttyUSB0 57600 mru 131 mtu 131 noauth crtscts nocdtrcts lock passive 192.168.10.1:192.168.10.2 local maxfail 0 persist proxyarp updetach
#pppd /dev/ttyUSB0 57600 novj novjccomp mru 131 mtu 131 noauth crtscts nocdtrcts lock passive 192.168.10.1:192.168.10.2 local maxfail 0 persist proxyarp updetach
#pppd /dev/ttyUSB0 57600 192.168.10.1:192.168.10.2 mru 131 mtu 131 proxyarp noauth crtscts nocdtrcts noaccomp nobsdcomp nodeflate nopcomp nopredictor1 novj novjccomp lock mru 131 mtu 131 passive local maxfail 0 persist updetach
#debug ktune bsdcomp 12 xonxoff nocrtscts mru 296 mtu 296
#pppd /dev/ttyUSB0 57600 debug mru 131 mtu 131 noauth crtscts nocdtrcts lock passive 192.168.10.1:192.168.10.2 local maxfail 0 persist updetach proxyarp
#pppd /dev/ttyUSB0 57600 noaccomp nobsdcomp nodeflate nopcomp nopredictor1 novj novjccomp mru 131 mtu 131 noauth crtscts nocdtrcts lock passive 192.168.10.1:192.168.10.2 local maxfail 0 persist proxyarp updetach
#pppd /dev/ttyUSB0 57600 novjccomp mru 131 mtu 131 noauth crtscts nocdtrcts lock passive 192.168.10.1:192.168.10.2 local maxfail 0 persist proxyarp updetach
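As a tidiness aside, pppd can also read the same options from a peers file, which makes variants like the above easier to switch between. A sketch, mirroring the active command line (the file name `radio` is my choice, not canonical):

```
# /etc/ppp/peers/radio  (name is arbitrary; start with: pppd call radio)
/dev/ttyUSB0
57600
mru 131
mtu 131
noauth
crtscts
nocdtrcts
lock
passive
192.168.10.1:192.168.10.2
local
maxfail 0
persist
proxyarp
updetach
```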
Has anyone solved this with open source pppd? Are there other options or technologies which would be an alternative?
Are kernel TCP congestion settings worth looking into?
tcp modem ppp pppd
You seem to have tested ppp options thoroughly. The "saturation" behavior suggests bufferbloat (google), so have a look at tc and Linux Traffic Control tutorials. - dirkt, May 9 at 10:21
Were you looking over my shoulder :) Yes, deep in that now. I've had some success with tcp_bbr congestion control and the qdisc set to fair queuing. Will report back once I've done some more testing. - BrendanMcL, May 9 at 11:23
asked May 9 at 1:51 by BrendanMcL, edited May 9 at 8:23
1 Answer
As a partial answer to my own question, I have made significant reliability improvements through the following.
Changing the TCP congestion control algorithm from the default cubic to bbr has significantly addressed the bufferbloat that, along with the lossy radio connection, is at the core of the issue.
This required loading the tcp_bbr kernel module and also changing the default queueing discipline to fair queueing (fq), which provides the pacing that bbr relies on.
On the RPIs the defaults are:
net.ipv4.tcp_congestion_control=cubic
net.core.default_qdisc=pfifo_fast
The commands to change this at run-time are:
modprobe tcp_bbr
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
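One caveat worth noting: `net.core.default_qdisc` only applies to interfaces created after it is set, so a ppp link that is already up keeps its old queue. The qdisc can be swapped on a live interface with tc (the interface name `ppp0` is an assumption; check with `ip link`):

```
# Replace the root qdisc on an already-up ppp link with fq (needs root)
tc qdisc replace dev ppp0 root fq
tc qdisc show dev ppp0
```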
I presently run these from a script called from /etc/rc.local. They can easily be made permanent with modifications to modprobe.d and sysctl.conf, or as a file in sysctl.d.
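A sketch of the persistent version (the file names are my choice, not canonical), written to a staging directory here so it can be inspected before being copied into /etc as root:

```shell
# Stage the persistent config; file names below are assumptions.
staging=$(mktemp -d)
mkdir -p "$staging/etc/sysctl.d" "$staging/etc/modules-load.d"

# Load tcp_bbr at boot (systemd reads /etc/modules-load.d/*.conf)
printf 'tcp_bbr\n' > "$staging/etc/modules-load.d/bbr.conf"

# Apply the sysctls at boot
cat > "$staging/etc/sysctl.d/90-bbr.conf" <<'EOF'
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
EOF

cat "$staging/etc/sysctl.d/90-bbr.conf"
# Then, as root:  cp -r "$staging/etc/." /etc/  &&  sysctl --system
```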
The result is a far smoother response over ssh and far more reliable bulk transfers, which, while still stalling, recover quickly and complete reliably, returning to the command prompt immediately (rather than pausing at 100% for extended periods, e.g. 1-3 minutes, as was the case with the cubic congestion control).
The trade-off is overall speed; however, reliability is more important. For example, transferring a 283 kB file across the radio link using scp now results in:
100% 283KB 2.5KB/s 01:51
This is a compromise I am happy enough with for now.
However, long-running bulk transfers are still problematic; left alone, they eventually stall and never complete.
For example, apt-get update, running for over an hour (stalling on the 11.7 MB file), requires occasional carriage returns through the ssh terminal to keep running, and eventually bogs down to very extended latency, though it does not fail altogether.
In the following screen scrape, the process ran for over an hour, with a few CRs every 10-15 minutes and a delay of approximately 5 minutes between when ^C was sent and when the terminal responded:
root@priotdev2:~# apt-get update
Get:1 http://mirror.internode.on.net/pub/raspbian/raspbian stretch InRelease [15.0 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:4 http://archive.raspberrypi.org/debian stretch/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org/debian stretch/ui armhf Packages [30.7 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
23% [3 Packages 270 kB/11.7 MB 2%] 2,864 B/s 1h 6min 15s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
29% [3 Packages 1,263 kB/11.7 MB 11%] 131 B/s 22h 2min 19s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
60% [3 Packages 5,883 kB/11.7 MB 50%] 16 B/s 4d 4h 13min 58s
60% [3 Packages 5,902 kB/11.7 MB 51%] 1,531 B/s 1h 2min 38s
66% [3 Packages 6,704 kB/11.7 MB 58%]
66% [3 Packages 6,704 kB/11.7 MB 58%]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,746 kB/11.7 MB 58%] 230 B/s 5h 55min 46s
66% [3 Packages 6,747 kB/11.7 MB 58%] 148 B/s 9h 12min 47s^C
root@priotdev2:~# ^C
root@priotdev2:~#
root@priotdev2:~#
Scrolling to the right in the dump above, the abysmal throughput can be seen (down to 16 and 32 bytes per second in some cases).
To remove the end-to-end variables that the satellite link would introduce, apt is actually using the upstream RPI as an apt cache (which is up to date), so the transfers represent traffic over the radio link only.
Any insights from the community on further improvements would be most welcome.
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
As a partial answer to my own question, I have made significant reliability improvements through the following:
Changing the tcp_congestion_control kernel plugin from the default cubic to bbr has significantly addressed bufferbloat that along with a lossy radio connection is at the core of the issue.
This required loading of the tcp_bbr kernel module and also changing the net.core queuing discipline model to fair queuing to provide pacing for the bbr module.
On the RPIs the defaults are:
net.ipv4.tcp_congestion_control=cubic
net.core.default_qdisc=pfifo_fast
The commands to change this at run-time are:
modprobe tcp_bbr
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
I presently run these from a script called from /etc/rc.local. They can easily be made permanent with modifications to modprode.d and sysctl.conf or as a file in sysctl.d.
The result is a far more smooth response over ssh and far more reliable bulk transfer, which while still stalling manages to recover quickly and complete reliably, returning to the command prompt immediately (rather than pausing at 100% for extended periods - eg 1 - 3 minutes before returning as was the case with the cubic congestion control).
The trade-off is overall speed, however reliability is more important. For example the transfer of a 283k file across the radio link using scp now results in:
100% 283KB 2.5KB/s 01:51
This is a compromise I am happy enough with for now.
However, long running bulk transfer processes are still problematic and eventually stall and never complete.
For example, apt-get update, running for over an hour (stalling on the 11.7MB file), requires occasional carriage returns through the ssh terminal to continue running, and eventually do bog down to a very extended latency, though not failing all together.
In the following screen scrape, the process was 1 hour plus, with a few CRs every 10-15 mins and a delay of approximately 5 mins between when ^C was sent vs the terminal responding:
root@priotdev2:~# apt-get update
Get:1 http://mirror.internode.on.net/pub/raspbian/raspbian stretch InRelease [15.0 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:4 http://archive.raspberrypi.org/debian stretch/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org/debian stretch/ui armhf Packages [30.7 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
23% [3 Packages 270 kB/11.7 MB 2%] 2,864 B/s 1h 6min 15s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
29% [3 Packages 1,263 kB/11.7 MB 11%] 131 B/s 22h 2min 19s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
60% [3 Packages 5,883 kB/11.7 MB 50%] 16 B/s 4d 4h 13min 58s
60% [3 Packages 5,902 kB/11.7 MB 51%] 1,531 B/s 1h 2min 38s
66% [3 Packages 6,704 kB/11.7 MB 58%]
66% [3 Packages 6,704 kB/11.7 MB 58%]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,746 kB/11.7 MB 58%] 230 B/s 5h 55min 46s
66% [3 Packages 6,747 kB/11.7 MB 58%] 148 B/s 9h 12min 47s^C
root@priotdev2:~# ^C
root@priotdev2:~#
root@priotdev2:~#
Scrolling to the right on the above dump can be seen the abysmal through put (down to 16 and 32 bytes per second in some cases).
To remove the end-to-end variables that also involve the satellite link, the apt process is actually using the upstream RPI as an apt cache (which is up to date), the transfers only represent traffic over the radio link.
Any insights from the community on further improvements would be most welcome.
add a comment |Â
up vote
1
down vote
As a partial answer to my own question, I have made significant reliability improvements through the following:
Changing the tcp_congestion_control kernel plugin from the default cubic to bbr has significantly addressed bufferbloat that along with a lossy radio connection is at the core of the issue.
This required loading of the tcp_bbr kernel module and also changing the net.core queuing discipline model to fair queuing to provide pacing for the bbr module.
On the RPIs the defaults are:
net.ipv4.tcp_congestion_control=cubic
net.core.default_qdisc=pfifo_fast
The commands to change this at run-time are:
modprobe tcp_bbr
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
I presently run these from a script called from /etc/rc.local. They can easily be made permanent with modifications to modprode.d and sysctl.conf or as a file in sysctl.d.
The result is a far more smooth response over ssh and far more reliable bulk transfer, which while still stalling manages to recover quickly and complete reliably, returning to the command prompt immediately (rather than pausing at 100% for extended periods - eg 1 - 3 minutes before returning as was the case with the cubic congestion control).
The trade-off is overall speed, however reliability is more important. For example the transfer of a 283k file across the radio link using scp now results in:
100% 283KB 2.5KB/s 01:51
This is a compromise I am happy enough with for now.
However, long running bulk transfer processes are still problematic and eventually stall and never complete.
For example, apt-get update, running for over an hour (stalling on the 11.7MB file), requires occasional carriage returns through the ssh terminal to continue running, and eventually do bog down to a very extended latency, though not failing all together.
In the following screen scrape, the process ran for over an hour, with a few CRs every 10-15 minutes and a delay of approximately 5 minutes between sending ^C and the terminal responding:
root@priotdev2:~# apt-get update
Get:1 http://mirror.internode.on.net/pub/raspbian/raspbian stretch InRelease [15.0 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:2 http://archive.raspberrypi.org/debian stretch InRelease [25.3 kB]
Get:4 http://archive.raspberrypi.org/debian stretch/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org/debian stretch/ui armhf Packages [30.7 kB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
23% [3 Packages 270 kB/11.7 MB 2%] 2,864 B/s 1h 6min 15s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
29% [3 Packages 1,263 kB/11.7 MB 11%] 131 B/s 22h 2min 19s
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
60% [3 Packages 5,883 kB/11.7 MB 50%] 16 B/s 4d 4h 13min 58s
60% [3 Packages 5,902 kB/11.7 MB 51%] 1,531 B/s 1h 2min 38s
66% [3 Packages 6,704 kB/11.7 MB 58%]
66% [3 Packages 6,704 kB/11.7 MB 58%]
Get:3 http://mirror.internode.on.net/pub/raspbian/raspbian stretch/main armhf Packages [11.7 MB]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,735 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%] 32 B/s 1d 18h 37min 55s
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,745 kB/11.7 MB 58%]
66% [3 Packages 6,746 kB/11.7 MB 58%] 230 B/s 5h 55min 46s
66% [3 Packages 6,747 kB/11.7 MB 58%] 148 B/s 9h 12min 47s^C
root@priotdev2:~# ^C
root@priotdev2:~#
root@priotdev2:~#
Scrolling to the right in the dump above shows the abysmal throughput (down to 16 and 32 bytes per second in places).
To remove the end-to-end variables introduced by the satellite link, the apt process uses the upstream RPI as an apt cache (which is up to date), so the transfers represent traffic over the radio link only.
Any insights from the community on further improvements would be most welcome.
edited May 10 at 1:58
answered May 10 at 1:49
BrendanMcL
214
You seem to have tested ppp options thoroughly. The "saturation" behavior suggests bufferbloat (google), so have a look at tc and Linux Traffic Control tutorials. – dirkt May 9 at 10:21
Were you looking over my shoulder :) Yes, deep in that now. I've had some success with tcp_bbr congestion control and qdisc set to fair queuing. Will report back once I've done some more testing. – BrendanMcL May 9 at 11:23