Root Causing and Fixing NIC Buffer Overruns for 10Gb interfaces on Linux (SCTP)

I'm seeing a high rate of packet errors (almost all of them overruns) on both 10Gb NICs attached to my Linux server.
The system is handling high volumes of SCTP network traffic (very little TCP), so this is likely a Linux kernel tuning problem.

However, none of the tuning parameters I've tried so far seem to have much effect, and I'm still seeing high volumes of packet overruns.
Any pointers on other things I could try to get the system handling packets efficiently would be much appreciated!



:~# ifconfig ens4f1

ens4f1 Link encap:Ethernet HWaddr 5c:b9:01:de:0d:4c
UP BROADCAST RUNNING PROMISC MULTICAST MTU:9000 Metric:1
RX packets:22313514162 errors:17598241316 dropped:68 overruns:17598241316 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:31767480894219 (31.7 TB) TX bytes:0 (0.0 B)
Interrupt:17 Memory:c9800000-c9ffffff


System details:



OS : Ubuntu Linux (4.11.0-14-generic #20~16.04.1-Ubuntu SMP x86_64 )
CPU Cores : 72
NIC Model : NetXtreme II BCM57810 10 Gigabit Ethernet
RAM : 240 GiB



NIC sample stats showing packet error rate:



 for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f0| egrep "RX"| egrep overruns;sleep 5);done

1) Thu Oct 12 19:50:40 SGT 2017 - RX packets:8364065830 errors:2594507718 dropped:215 overruns:2594507718 frame:0
2) Thu Oct 12 19:50:45 SGT 2017 - RX packets:8365336060 errors:2596662672 dropped:215 overruns:2596662672 frame:0
3) Thu Oct 12 19:50:50 SGT 2017 - RX packets:8366602087 errors:2598840959 dropped:215 overruns:2598840959 frame:0
4) Thu Oct 12 19:50:55 SGT 2017 - RX packets:8367881271 errors:2600989229 dropped:215 overruns:2600989229 frame:0
5) Thu Oct 12 19:51:01 SGT 2017 - RX packets:8369147536 errors:2603157030 dropped:215 overruns:2603157030 frame:0
6) Thu Oct 12 19:51:06 SGT 2017 - RX packets:8370149567 errors:2604904183 dropped:215 overruns:2604904183 frame:0
7) Thu Oct 12 19:51:11 SGT 2017 - RX packets:8371298018 errors:2607183939 dropped:215 overruns:2607183939 frame:0
8) Thu Oct 12 19:51:16 SGT 2017 - RX packets:8372455587 errors:2609411186 dropped:215 overruns:2609411186 frame:0
9) Thu Oct 12 19:51:21 SGT 2017 - RX packets:8373585102 errors:2611680597 dropped:215 overruns:2611680597 frame:0
10) Thu Oct 12 19:51:26 SGT 2017 - RX packets:8374678508 errors:2614053000 dropped:215 overruns:2614053000 frame:0
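
The per-second rate implied by samples like these can also be computed straight from the kernel's counters. This is a minimal sketch, assuming that the "overruns" figure printed by ifconfig corresponds to the rx_fifo_errors counter under /sys/class/net/<interface>/statistics (the helper names rate, read_counter and watch_overruns are made up for illustration):

```shell
#!/bin/sh
# Per-second rate between two samples of a monotonically increasing counter.
# Integer arithmetic only, so the result is truncated.
rate() {
    # $1 = earlier sample, $2 = later sample, $3 = seconds between them
    echo $(( ($2 - $1) / $3 ))
}

# Read one counter for an interface from sysfs.
# $1 = interface, $2 = counter file name (e.g. rx_fifo_errors)
read_counter() {
    cat "/sys/class/net/$1/statistics/$2"
}

# Print RX overrun and RX packet rates for an interface.
# $1 = interface, $2 = sampling interval in seconds (default 5)
watch_overruns() {
    ifc=$1
    secs=${2:-5}
    o1=$(read_counter "$ifc" rx_fifo_errors)
    p1=$(read_counter "$ifc" rx_packets)
    sleep "$secs"
    o2=$(read_counter "$ifc" rx_fifo_errors)
    p2=$(read_counter "$ifc" rx_packets)
    echo "$ifc: $(rate "$o1" "$o2" "$secs") overruns/s, $(rate "$p1" "$p2" "$secs") packets/s"
}

# Usage: watch_overruns ens4f0 5
```

Applied to the first two samples above (5 seconds apart), rate 2594507718 2596662672 5 gives 430990 overruns per second, while the RX packets column grows by only 254046 per second over the same interval.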


However, checking with tc shows no drops at the qdisc layer (these counters cover the queueing disciplines, not the NIC's RX ring buffer):



tc -s qdisc show dev ens4f0|egrep drop

Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
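
The qdisc counters are kept by the kernel's queueing layer, so the driver's own statistics from ethtool -S are another place to look for where packets are being lost. The following is only a sketch: the counter-name pattern is an assumption, since names differ between drivers and versions, and nonzero_loss_counters is a hypothetical helper name.

```shell
#!/bin/sh
# Filter `ethtool -S` output (read from stdin) down to counters whose name
# suggests packet loss and whose value is non-zero.
# The name pattern is an assumption; counter names are driver-specific.
nonzero_loss_counters() {
    awk '
    {
        n = split($0, f, ":")        # the value is the last ":"-separated field
        v = f[n]
        gsub(/[ \t]/, "", v)         # trim whitespace around the value
        name = $0
        sub(/:[^:]*$/, "", name)     # drop the trailing ": value" part
        sub(/^[ \t]+/, "", name)     # drop the leading indentation
        if (tolower(name) ~ /discard|drop|no_buff|fifo|miss/ && v + 0 > 0)
            print name ": " v
    }'
}

# Usage (typically needs root):
#   ethtool -S ens4f0 | nonzero_loss_counters
#   ethtool -g ens4f0    # current and maximum ring sizes
```

Per-queue lines of the form "[0]: counter: value" are kept with their queue prefix, so an imbalance between queues would be visible in the output.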


Checking TCP retransmits, the rate is low:

 for i in `seq 1 10`;do echo "`date`" - $(netstat -s | grep -i retransmited;sleep 2);done

Thu Oct 12 20:04:29 SGT 2017 - 10633 segments retransmited
Thu Oct 12 20:04:31 SGT 2017 - 10634 segments retransmited
Thu Oct 12 20:04:33 SGT 2017 - 10636 segments retransmited
Thu Oct 12 20:04:35 SGT 2017 - 10636 segments retransmited
Thu Oct 12 20:04:37 SGT 2017 - 10638 segments retransmited
Thu Oct 12 20:04:39 SGT 2017 - 10639 segments retransmited
Thu Oct 12 20:04:41 SGT 2017 - 10640 segments retransmited
Thu Oct 12 20:04:43 SGT 2017 - 10640 segments retransmited
Thu Oct 12 20:04:45 SGT 2017 - 10643 segments retransmited


What I've tried so far:

  • Tuning the NIC parameters (packet coalesce, offloading, upping NIC ring buffers, etc.):

    ethtool -L ens4f0 combined 30
    ethtool -K ens4f0 gso on rx on tx on sg on tso on
    ethtool -C ens4f0 rx-usecs 96
    ethtool -C ens4f0 adaptive-rx on
    ethtool -G ens4f0 rx 4078 tx 4078

  • sysctl tunables for the kernel (mainly increasing kernel TCP buffers):

    sysctl -w net.ipv4.tcp_low_latency=1
    sysctl -w net.ipv4.tcp_max_syn_backlog=16384
    sysctl -w net.core.optmem_max=20480000
    sysctl -w net.core.netdev_max_backlog=5000000
    sysctl -w net.ipv4.tcp_rmem="65536 1747600 83886080"
    sysctl -w net.core.somaxconn=1280
    sysctl -w kernel.sched_min_granularity_ns=10000000
    sysctl -w kernel.sched_wakeup_granularity_ns=15000000
    sysctl -w net.ipv4.tcp_wmem="65536 1747600 83886080"
    sysctl -w net.core.wmem_max=2147483647
    sysctl -w net.core.wmem_default=2147483647
    sysctl -w net.core.rmem_max=2147483647
    sysctl -w net.core.rmem_default=2147483647
    sysctl -w net.ipv4.tcp_congestion_control=cubic
    sysctl -w net.ipv4.tcp_rmem="163840 3495200 268754560"
    sysctl -w net.ipv4.tcp_wmem="163840 3495200 268754560"
    sysctl -w net.ipv4.udp_rmem_min="163840 3495200 268754560"
    sysctl -w net.ipv4.udp_wmem_min="163840 3495200 268754560"
    sysctl -w net.ipv4.tcp_mem="268754560 268754560 268754560"
    sysctl -w net.ipv4.udp_mem="268754560 268754560 268754560"
    sysctl -w net.ipv4.tcp_mtu_probing=1
    sysctl -w net.ipv4.tcp_slow_start_after_idle=0



Results after these changes (no noticeable improvement):



 :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

1) Thu Oct 12 20:42:56 SGT 2017 - RX packets:16260617113 errors:10964865836 dropped:68 overruns:10964865836 frame:0
2) Thu Oct 12 20:43:01 SGT 2017 - RX packets:16263268608 errors:10969589847 dropped:68 overruns:10969589847 frame:0
3) Thu Oct 12 20:43:06 SGT 2017 - RX packets:16265869693 errors:10974489639 dropped:68 overruns:10974489639 frame:0
4) Thu Oct 12 20:43:11 SGT 2017 - RX packets:16268487078 errors:10979323070 dropped:68 overruns:10979323070 frame:0
5) Thu Oct 12 20:43:16 SGT 2017 - RX packets:16271098501 errors:10984193349 dropped:68 overruns:10984193349 frame:0
6) Thu Oct 12 20:43:21 SGT 2017 - RX packets:16273804004 errors:10988857622 dropped:68 overruns:10988857622 frame:0
7) Thu Oct 12 20:43:26 SGT 2017 - RX packets:16276493470 errors:10993340211 dropped:68 overruns:10993340211 frame:0
8) Thu Oct 12 20:43:31 SGT 2017 - RX packets:16278612090 errors:10997152436 dropped:68 overruns:10997152436 frame:0
9) Thu Oct 12 20:43:36 SGT 2017 - RX packets:16281253727 errors:11001834579 dropped:68 overruns:11001834579 frame:0
10) Thu Oct 12 20:43:41 SGT 2017 - RX packets:16283972622 errors:11006374277 dropped:68 overruns:11006374277 frame:0


Setting the CPU frequency governor to performance:



cpufreq-set -r -g performance


Results (nothing significant):



 :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

1) Thu Oct 12 21:53:07 SGT 2017 - RX packets:18506492788 errors:14622639426 dropped:68 overruns:14622639426 frame:0
2) Thu Oct 12 21:53:12 SGT 2017 - RX packets:18509314581 errors:14626750641 dropped:68 overruns:14626750641 frame:0
3) Thu Oct 12 21:53:17 SGT 2017 - RX packets:18511485458 errors:14630268859 dropped:68 overruns:14630268859 frame:0
4) Thu Oct 12 21:53:22 SGT 2017 - RX packets:18514223562 errors:14634547845 dropped:68 overruns:14634547845 frame:0
5) Thu Oct 12 21:53:27 SGT 2017 - RX packets:18516926578 errors:14638745143 dropped:68 overruns:14638745143 frame:0
6) Thu Oct 12 21:53:32 SGT 2017 - RX packets:18519605412 errors:14642929021 dropped:68 overruns:14642929021 frame:0
7) Thu Oct 12 21:53:37 SGT 2017 - RX packets:18522523560 errors:14647108982 dropped:68 overruns:14647108982 frame:0
8) Thu Oct 12 21:53:42 SGT 2017 - RX packets:18525185869 errors:14651577286 dropped:68 overruns:14651577286 frame:0
9) Thu Oct 12 21:53:47 SGT 2017 - RX packets:18527947266 errors:14655961847 dropped:68 overruns:14655961847 frame:0
10) Thu Oct 12 21:53:52 SGT 2017 - RX packets:18530703288 errors:14659988398 dropped:68 overruns:14659988398 frame:0


Results using sar:




:~# sar -n EDEV 5 3| egrep "(ens4f1|IFACE)"

11:17:43 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
11:17:48 PM ens4f1 360809.40 0.00 0.00 0.00 0.00 0.00 0.00 360809.40 0.00
11:17:53 PM ens4f1 382500.40 0.00 0.00 0.00 0.00 0.00 0.00 382500.40 0.00
11:17:58 PM ens4f1 353717.00 0.00 0.00 0.00 0.00 0.00 0.00 353717.00 0.00
Average: ens4f1 365675.60 0.00 0.00 0.00 0.00 0.00 0.00 365675.60 0.00
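
The rxfifo/s figure is reported by the driver, so on its own it does not show whether the kernel's softirq processing is keeping up with the receive queues. One way to look at that side is /proc/net/softnet_stat, which is also where drops against net.core.netdev_max_backlog are counted. This is a sketch of what could be checked, not a known cause; softnet_summary is a hypothetical helper name, and the line index is assumed to match the CPU number.

```shell
#!/bin/sh
# Decode /proc/net/softnet_stat-style input (hex columns, one line per CPU).
# Column 1 = packets processed, column 2 = packets dropped because the
# per-CPU backlog queue was full, column 3 = time_squeeze (the softirq
# stopped with work remaining because it ran out of budget or time).
# Only lines with a non-zero drop or squeeze count are printed.
softnet_summary() {
    cpu=0
    while read -r processed dropped squeezed _rest; do
        d=$((0x$dropped))       # hex to decimal
        s=$((0x$squeezed))
        if [ "$d" -gt 0 ] || [ "$s" -gt 0 ]; then
            echo "cpu$cpu dropped=$d time_squeeze=$s"
        fi
        cpu=$((cpu + 1))
    done
}

# Usage: softnet_summary < /proc/net/softnet_stat
```

Running it twice a few seconds apart and comparing the output shows whether either counter is still increasing while the overruns accumulate.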



I've also tuned a few SCTP-specific parameters, again without any effect:



sysctl -w net.core.rmem_max=900000000
sysctl -w net.core.wmem_max=900000000

sysctl -w net.sctp.sctp_mem="2100000000 2100000000 2100000000"
sysctl -w net.sctp.sctp_rmem="2100000000 2100000000 2100000000"
sysctl -w net.sctp.sctp_wmem="2100000000 2100000000 2100000000"

sysctl -w net.ipv4.udp_mem="5000000000 5000000000 5000000000"
sysctl -w net.ipv4.udp_mem="10000000000 10000000000 10000000000"

for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done
1) Sat Oct 14 21:55:55 SGT 2017 - RX packets:84379103241 errors:56372972367 dropped:58 overruns:56372972367 frame:0
2) Sat Oct 14 21:56:00 SGT 2017 - RX packets:84381451420 errors:56377777944 dropped:58 overruns:56377777944 frame:0
3) Sat Oct 14 21:56:05 SGT 2017 - RX packets:84383737427 errors:56382434478 dropped:58 overruns:56382434478 frame:0
4) Sat Oct 14 21:56:10 SGT 2017 - RX packets:84386524128 errors:56386618268 dropped:58 overruns:56386618268 frame:0
5) Sat Oct 14 21:56:15 SGT 2017 - RX packets:84389578203 errors:56390512483 dropped:58 overruns:56390512483 frame:0
6) Sat Oct 14 21:56:20 SGT 2017 - RX packets:84392673120 errors:56394472475 dropped:58 overruns:56394472475 frame:0
7) Sat Oct 14 21:56:25 SGT 2017 - RX packets:84395714973 errors:56398573221 dropped:58 overruns:56398573221 frame:0
8) Sat Oct 14 21:56:30 SGT 2017 - RX packets:84398951451 errors:56402297479 dropped:58 overruns:56402297479 frame:0
9) Sat Oct 14 21:56:35 SGT 2017 - RX packets:84401039177 errors:56406013473 dropped:58 overruns:56406013473 frame:0
10) Sat Oct 14 21:56:40 SGT 2017 - RX packets:84403558097 errors:56410804379 dropped:58 overruns:56410804379 frame:0






share|improve this question
























    up vote
    0
    down vote

    favorite
    1












    I'm seeing a high rate of packet errors (almost all overruns) on both 10gb NICs attached to my linux server.
    The system is handling high volumes of SCTP network traffic (very little TCP), so this is likely a linux kernel tuning problem.



    However all the tuning parameters I've tried thus far seems to be having little effect and I'm still seeing high volumes of packet overruns.
    Any pointers on other things I could try to get the system handling packets efficiently would be much appreciated!



    :~# ifconfig ens4f1

    ens4f1 Link encap:Ethernet HWaddr 5c:b9:01:de:0d:4c
    UP BROADCAST RUNNING PROMISC MULTICAST MTU:9000 Metric:1
    RX packets:22313514162 errors:17598241316 dropped:68
    overruns:17598241316 frame:0
    TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:31767480894219 (31.7 TB) TX bytes:0 (0.0 B)
    Interrupt:17 Memory:c9800000-c9ffffff


    System details:



    OS : Ubuntu Linux (4.11.0-14-generic #20~16.04.1-Ubuntu SMP x86_64 )
    CPU Cores : 72
    NIC Model : NetXtreme II BCM57810 10 Gigabit Ethernet
    RAM : 240 GiB



    NIC sample stats showing packet error rate:



     for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f0| egrep "RX"| egrep overruns;sleep 5);done

    1) Thu Oct 12 19:50:40 SGT 2017 - RX packets:8364065830 errors:2594507718 dropped:215 overruns:2594507718 frame:0
    2) Thu Oct 12 19:50:45 SGT 2017 - RX packets:8365336060 errors:2596662672 dropped:215 overruns:2596662672 frame:0
    3) Thu Oct 12 19:50:50 SGT 2017 - RX packets:8366602087 errors:2598840959 dropped:215 overruns:2598840959 frame:0
    4) Thu Oct 12 19:50:55 SGT 2017 - RX packets:8367881271 errors:2600989229 dropped:215 overruns:2600989229 frame:0
    5) Thu Oct 12 19:51:01 SGT 2017 - RX packets:8369147536 errors:2603157030 dropped:215 overruns:2603157030 frame:0
    6) Thu Oct 12 19:51:06 SGT 2017 - RX packets:8370149567 errors:2604904183 dropped:215 overruns:2604904183 frame:0
    7) Thu Oct 12 19:51:11 SGT 2017 - RX packets:8371298018 errors:2607183939 dropped:215 overruns:2607183939 frame:0
    8) Thu Oct 12 19:51:16 SGT 2017 - RX packets:8372455587 errors:2609411186 dropped:215 overruns:2609411186 frame:0
    9) Thu Oct 12 19:51:21 SGT 2017 - RX packets:8373585102 errors:2611680597 dropped:215 overruns:2611680597 frame:0
    10) Thu Oct 12 19:51:26 SGT 2017 - RX packets:8374678508 errors:2614053000 dropped:215 overruns:2614053000 frame:0


    However, checking (with tc) shows no ring buffer overruns on NIC:



    tc -s qdisc show dev ens4f0|egrep drop

    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)


    Checking tcp retransmits, the rate is low:



     for i in `seq 1 10`;do echo "`date`" - $(netstat -s | grep - i retransmited;sleep 2);done

    Thu Oct 12 20:04:29 SGT 2017 - 10633 segments retransmited
    Thu Oct 12 20:04:31 SGT 2017 - 10634 segments retransmited
    Thu Oct 12 20:04:33 SGT 2017 - 10636 segments retransmited
    Thu Oct 12 20:04:35 SGT 2017 - 10636 segments retransmited
    Thu Oct 12 20:04:37 SGT 2017 - 10638 segments retransmited
    Thu Oct 12 20:04:39 SGT 2017 - 10639 segments retransmited
    Thu Oct 12 20:04:41 SGT 2017 - 10640 segments retransmited
    Thu Oct 12 20:04:43 SGT 2017 - 10640 segments retransmited
    Thu Oct 12 20:04:45 SGT 2017 - 10643 segments retransmited


    What I've tried so far:




    • Tuning the NIC parameters (packet coalesce, offloading, upping NIC ring buffers etc ...):



      ethtool -L ens4f0 combined 30



      ethtool -K ens4f0 gso on rx on tx on sg on tso on



      ethtool -C ens4f0 rx-usecs 96



      ethtool -C ens4f0 adaptive-rx on



      ethtool -G ens4f0 rx 4078 tx 4078




    • sysctl tunables for the kernel (mainly increasing kernel tcp buffers):



      sysctl -w net.ipv4.tcp_low_latency=1



      sysctl -w net.ipv4.tcp_max_syn_backlog=16384



      sysctl -w net.core.optmem_max=20480000



      sysctl -w net.core.netdev_max_backlog=5000000



      sysctl -w net.ipv4.tcp_rmem="65536 1747600 83886080"



      sysctl -w net.core.somaxconn=1280



      sysctl -w kernel.sched_min_granularity_ns=10000000



      sysctl -w kernel.sched_wakeup_granularity_ns=15000000



      sysctl -w net.ipv4.tcp_wmem="65536 1747600 83886080"



      sysctl -w net.core.wmem_max=2147483647



      sysctl -w net.core.wmem_default=2147483647



      sysctl -w net.core.rmem_max=2147483647



      sysctl -w net.core.rmem_default=2147483647



      sysctl -w net.ipv4.tcp_congestion_control=cubic



      sysctl -w net.ipv4.tcp_rmem="163840 3495200 268754560"



      sysctl -w net.ipv4.tcp_wmem="163840 3495200 268754560"



      sysctl -w net.ipv4.udp_rmem_min="163840 3495200 268754560"



      sysctl -w net.ipv4.udp_wmem_min="163840 3495200 268754560"



      sysctl -w net.ipv4.tcp_mem="268754560 268754560 268754560"



      sysctl -w net.ipv4.udp_mem="268754560 268754560 268754560"



      sysctl -w net.ipv4.tcp_mtu_probing=1



      sysctl -w net.ipv4.tcp_slow_start_after_idle=0



    Results after this (apparently not much):



     :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

    1) Thu Oct 12 20:42:56 SGT 2017 - RX packets:16260617113 errors:10964865836 dropped:68 overruns:10964865836 frame:0
    2) Thu Oct 12 20:43:01 SGT 2017 - RX packets:16263268608 errors:10969589847 dropped:68 overruns:10969589847 frame:0
    3) Thu Oct 12 20:43:06 SGT 2017 - RX packets:16265869693 errors:10974489639 dropped:68 overruns:10974489639 frame:0
    4) Thu Oct 12 20:43:11 SGT 2017 - RX packets:16268487078 errors:10979323070 dropped:68 overruns:10979323070 frame:0
    5) Thu Oct 12 20:43:16 SGT 2017 - RX packets:16271098501 errors:10984193349 dropped:68 overruns:10984193349 frame:0
    6) Thu Oct 12 20:43:21 SGT 2017 - RX packets:16273804004 errors:10988857622 dropped:68 overruns:10988857622 frame:0
    7) Thu Oct 12 20:43:26 SGT 2017 - RX packets:16276493470 errors:10993340211 dropped:68 overruns:10993340211 frame:0
    8) Thu Oct 12 20:43:31 SGT 2017 - RX packets:16278612090 errors:10997152436 dropped:68 overruns:10997152436 frame:0
    9) Thu Oct 12 20:43:36 SGT 2017 - RX packets:16281253727 errors:11001834579 dropped:68 overruns:11001834579 frame:0
    10) Thu Oct 12 20:43:41 SGT 2017 - RX packets:16283972622 errors:11006374277 dropped:68 overruns:11006374277 frame:0


    Freak the CPU for better performance:



    cpufreq-set -r -g performance


    Results (nothing significant):



     :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

    1) Thu Oct 12 21:53:07 SGT 2017 - RX packets:18506492788 errors:14622639426 dropped:68 overruns:14622639426 frame:0
    2) Thu Oct 12 21:53:12 SGT 2017 - RX packets:18509314581 errors:14626750641 dropped:68 overruns:14626750641 frame:0
    3) Thu Oct 12 21:53:17 SGT 2017 - RX packets:18511485458 errors:14630268859 dropped:68 overruns:14630268859 frame:0
    4) Thu Oct 12 21:53:22 SGT 2017 - RX packets:18514223562 errors:14634547845 dropped:68 overruns:14634547845 frame:0
    5) Thu Oct 12 21:53:27 SGT 2017 - RX packets:18516926578 errors:14638745143 dropped:68 overruns:14638745143 frame:0
    6) Thu Oct 12 21:53:32 SGT 2017 - RX packets:18519605412 errors:14642929021 dropped:68 overruns:14642929021 frame:0
    7) Thu Oct 12 21:53:37 SGT 2017 - RX packets:18522523560 errors:14647108982 dropped:68 overruns:14647108982 frame:0
    8) Thu Oct 12 21:53:42 SGT 2017 - RX packets:18525185869 errors:14651577286 dropped:68 overruns:14651577286 frame:0
    9) Thu Oct 12 21:53:47 SGT 2017 - RX packets:18527947266 errors:14655961847 dropped:68 overruns:14655961847 frame:0
    10) Thu Oct 12 21:53:52 SGT 2017 - RX packets:18530703288 errors:14659988398 dropped:68 overruns:14659988398 frame:0


    Results using sar:




    :~# sar -n EDEV 5 3| egrep "(ens4f1|IFACE)"

    11:17:43 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
    11:17:48 PM ens4f1 360809.40 0.00 0.00 0.00 0.00 0.00 0.00 360809.40 0.00
    11:17:53 PM ens4f1 382500.40 0.00 0.00 0.00 0.00
    0.00 0.00 382500.40 0.00
    11:17:58 PM ens4f1 353717.00 0.00 0.00 0.00 0.00
    0.00 0.00 353717.00 0.00
    Average: ens4f1 365675.60 0.00 0.00 0.00 0.00 0.00 0.00 365675.60 0.00



    I've also tuned a few SCTP specific parameters, however without results as well:



    sysctl -w net.core.rmem_max=900000000
    sysctl -w net.core.wmem_max=900000000

    sysctl -w net.sctp.sctp_mem="2100000000 2100000000 2100000000"
    sysctl -w net.sctp.sctp_rmem="2100000000 2100000000 2100000000"
    sysctl -w net.sctp.sctp_wmem="2100000000 2100000000 2100000000"

    sysctl -w net.ipv4.udp_mem="5000000000 5000000000 5000000000"
    sysctl -w net.ipv4.udp_mem="10000000000 10000000000 10000000000"

    for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done
    1) Sat Oct 14 21:55:55 SGT 2017 - RX packets:84379103241 errors:56372972367 dropped:58 overruns:56372972367 frame:0
    2) Sat Oct 14 21:56:00 SGT 2017 - RX packets:84381451420 errors:56377777944 dropped:58 overruns:56377777944 frame:0
    3) Sat Oct 14 21:56:05 SGT 2017 - RX packets:84383737427 errors:56382434478 dropped:58 overruns:56382434478 frame:0
    4) Sat Oct 14 21:56:10 SGT 2017 - RX packets:84386524128 errors:56386618268 dropped:58 overruns:56386618268 frame:0
    5) Sat Oct 14 21:56:15 SGT 2017 - RX packets:84389578203 errors:56390512483 dropped:58 overruns:56390512483 frame:0
    6) Sat Oct 14 21:56:20 SGT 2017 - RX packets:84392673120 errors:56394472475 dropped:58 overruns:56394472475 frame:0
    7) Sat Oct 14 21:56:25 SGT 2017 - RX packets:84395714973 errors:56398573221 dropped:58 overruns:56398573221 frame:0
    8) Sat Oct 14 21:56:30 SGT 2017 - RX packets:84398951451 errors:56402297479 dropped:58 overruns:56402297479 frame:0
    9) Sat Oct 14 21:56:35 SGT 2017 - RX packets:84401039177 errors:56406013473 dropped:58 overruns:56406013473 frame:0
    10) Sat Oct 14 21:56:40 SGT 2017 - RX packets:84403558097 errors:56410804379 dropped:58 overruns:56410804379 frame:0






    share|improve this question






















      up vote
      0
      down vote

      favorite
      1









      up vote
      0
      down vote

      favorite
      1






      1





      I'm seeing a high rate of packet errors (almost all overruns) on both 10gb NICs attached to my linux server.
      The system is handling high volumes of SCTP network traffic (very little TCP), so this is likely a linux kernel tuning problem.



      However all the tuning parameters I've tried thus far seems to be having little effect and I'm still seeing high volumes of packet overruns.
      Any pointers on other things I could try to get the system handling packets efficiently would be much appreciated!



      :~# ifconfig ens4f1

      ens4f1 Link encap:Ethernet HWaddr 5c:b9:01:de:0d:4c
      UP BROADCAST RUNNING PROMISC MULTICAST MTU:9000 Metric:1
      RX packets:22313514162 errors:17598241316 dropped:68
      overruns:17598241316 frame:0
      TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:31767480894219 (31.7 TB) TX bytes:0 (0.0 B)
      Interrupt:17 Memory:c9800000-c9ffffff


      System details:



      OS : Ubuntu Linux (4.11.0-14-generic #20~16.04.1-Ubuntu SMP x86_64 )
      CPU Cores : 72
      NIC Model : NetXtreme II BCM57810 10 Gigabit Ethernet
      RAM : 240 GiB



      NIC sample stats showing packet error rate:



       for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f0| egrep "RX"| egrep overruns;sleep 5);done

      1) Thu Oct 12 19:50:40 SGT 2017 - RX packets:8364065830 errors:2594507718 dropped:215 overruns:2594507718 frame:0
      2) Thu Oct 12 19:50:45 SGT 2017 - RX packets:8365336060 errors:2596662672 dropped:215 overruns:2596662672 frame:0
      3) Thu Oct 12 19:50:50 SGT 2017 - RX packets:8366602087 errors:2598840959 dropped:215 overruns:2598840959 frame:0
      4) Thu Oct 12 19:50:55 SGT 2017 - RX packets:8367881271 errors:2600989229 dropped:215 overruns:2600989229 frame:0
      5) Thu Oct 12 19:51:01 SGT 2017 - RX packets:8369147536 errors:2603157030 dropped:215 overruns:2603157030 frame:0
      6) Thu Oct 12 19:51:06 SGT 2017 - RX packets:8370149567 errors:2604904183 dropped:215 overruns:2604904183 frame:0
      7) Thu Oct 12 19:51:11 SGT 2017 - RX packets:8371298018 errors:2607183939 dropped:215 overruns:2607183939 frame:0
      8) Thu Oct 12 19:51:16 SGT 2017 - RX packets:8372455587 errors:2609411186 dropped:215 overruns:2609411186 frame:0
      9) Thu Oct 12 19:51:21 SGT 2017 - RX packets:8373585102 errors:2611680597 dropped:215 overruns:2611680597 frame:0
      10) Thu Oct 12 19:51:26 SGT 2017 - RX packets:8374678508 errors:2614053000 dropped:215 overruns:2614053000 frame:0


      However, checking (with tc) shows no ring buffer overruns on NIC:



      tc -s qdisc show dev ens4f0|egrep drop

      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)


      Checking tcp retransmits, the rate is low:



       for i in `seq 1 10`;do echo "`date`" - $(netstat -s | grep - i retransmited;sleep 2);done

      Thu Oct 12 20:04:29 SGT 2017 - 10633 segments retransmited
      Thu Oct 12 20:04:31 SGT 2017 - 10634 segments retransmited
      Thu Oct 12 20:04:33 SGT 2017 - 10636 segments retransmited
      Thu Oct 12 20:04:35 SGT 2017 - 10636 segments retransmited
      Thu Oct 12 20:04:37 SGT 2017 - 10638 segments retransmited
      Thu Oct 12 20:04:39 SGT 2017 - 10639 segments retransmited
      Thu Oct 12 20:04:41 SGT 2017 - 10640 segments retransmited
      Thu Oct 12 20:04:43 SGT 2017 - 10640 segments retransmited
      Thu Oct 12 20:04:45 SGT 2017 - 10643 segments retransmited


      What I've tried so far:




      • Tuning the NIC parameters (packet coalesce, offloading, upping NIC ring buffers etc ...):



        ethtool -L ens4f0 combined 30



        ethtool -K ens4f0 gso on rx on tx on sg on tso on



        ethtool -C ens4f0 rx-usecs 96



        ethtool -C ens4f0 adaptive-rx on



        ethtool -G ens4f0 rx 4078 tx 4078




      • sysctl tunables for the kernel (mainly increasing kernel tcp buffers):



        sysctl -w net.ipv4.tcp_low_latency=1



        sysctl -w net.ipv4.tcp_max_syn_backlog=16384



        sysctl -w net.core.optmem_max=20480000



        sysctl -w net.core.netdev_max_backlog=5000000



        sysctl -w net.ipv4.tcp_rmem="65536 1747600 83886080"



        sysctl -w net.core.somaxconn=1280



        sysctl -w kernel.sched_min_granularity_ns=10000000



        sysctl -w kernel.sched_wakeup_granularity_ns=15000000



        sysctl -w net.ipv4.tcp_wmem="65536 1747600 83886080"



        sysctl -w net.core.wmem_max=2147483647



        sysctl -w net.core.wmem_default=2147483647



        sysctl -w net.core.rmem_max=2147483647



        sysctl -w net.core.rmem_default=2147483647



        sysctl -w net.ipv4.tcp_congestion_control=cubic



        sysctl -w net.ipv4.tcp_rmem="163840 3495200 268754560"



        sysctl -w net.ipv4.tcp_wmem="163840 3495200 268754560"



        sysctl -w net.ipv4.udp_rmem_min="163840 3495200 268754560"



        sysctl -w net.ipv4.udp_wmem_min="163840 3495200 268754560"



        sysctl -w net.ipv4.tcp_mem="268754560 268754560 268754560"



        sysctl -w net.ipv4.udp_mem="268754560 268754560 268754560"



        sysctl -w net.ipv4.tcp_mtu_probing=1



        sysctl -w net.ipv4.tcp_slow_start_after_idle=0



      Results after this (apparently not much):



       :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

      1) Thu Oct 12 20:42:56 SGT 2017 - RX packets:16260617113 errors:10964865836 dropped:68 overruns:10964865836 frame:0
      2) Thu Oct 12 20:43:01 SGT 2017 - RX packets:16263268608 errors:10969589847 dropped:68 overruns:10969589847 frame:0
      3) Thu Oct 12 20:43:06 SGT 2017 - RX packets:16265869693 errors:10974489639 dropped:68 overruns:10974489639 frame:0
      4) Thu Oct 12 20:43:11 SGT 2017 - RX packets:16268487078 errors:10979323070 dropped:68 overruns:10979323070 frame:0
      5) Thu Oct 12 20:43:16 SGT 2017 - RX packets:16271098501 errors:10984193349 dropped:68 overruns:10984193349 frame:0
      6) Thu Oct 12 20:43:21 SGT 2017 - RX packets:16273804004 errors:10988857622 dropped:68 overruns:10988857622 frame:0
      7) Thu Oct 12 20:43:26 SGT 2017 - RX packets:16276493470 errors:10993340211 dropped:68 overruns:10993340211 frame:0
      8) Thu Oct 12 20:43:31 SGT 2017 - RX packets:16278612090 errors:10997152436 dropped:68 overruns:10997152436 frame:0
      9) Thu Oct 12 20:43:36 SGT 2017 - RX packets:16281253727 errors:11001834579 dropped:68 overruns:11001834579 frame:0
      10) Thu Oct 12 20:43:41 SGT 2017 - RX packets:16283972622 errors:11006374277 dropped:68 overruns:11006374277 frame:0


      Freak the CPU for better performance:



      cpufreq-set -r -g performance


      Results (nothing significant):



       :~# for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done

      1) Thu Oct 12 21:53:07 SGT 2017 - RX packets:18506492788 errors:14622639426 dropped:68 overruns:14622639426 frame:0
      2) Thu Oct 12 21:53:12 SGT 2017 - RX packets:18509314581 errors:14626750641 dropped:68 overruns:14626750641 frame:0
      3) Thu Oct 12 21:53:17 SGT 2017 - RX packets:18511485458 errors:14630268859 dropped:68 overruns:14630268859 frame:0
      4) Thu Oct 12 21:53:22 SGT 2017 - RX packets:18514223562 errors:14634547845 dropped:68 overruns:14634547845 frame:0
      5) Thu Oct 12 21:53:27 SGT 2017 - RX packets:18516926578 errors:14638745143 dropped:68 overruns:14638745143 frame:0
      6) Thu Oct 12 21:53:32 SGT 2017 - RX packets:18519605412 errors:14642929021 dropped:68 overruns:14642929021 frame:0
      7) Thu Oct 12 21:53:37 SGT 2017 - RX packets:18522523560 errors:14647108982 dropped:68 overruns:14647108982 frame:0
      8) Thu Oct 12 21:53:42 SGT 2017 - RX packets:18525185869 errors:14651577286 dropped:68 overruns:14651577286 frame:0
      9) Thu Oct 12 21:53:47 SGT 2017 - RX packets:18527947266 errors:14655961847 dropped:68 overruns:14655961847 frame:0
      10) Thu Oct 12 21:53:52 SGT 2017 - RX packets:18530703288 errors:14659988398 dropped:68 overruns:14659988398 frame:0


      Results using sar:




      :~# sar -n EDEV 5 3| egrep "(ens4f1|IFACE)"

      11:17:43 PM  IFACE   rxerr/s    txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s   txfifo/s
      11:17:48 PM  ens4f1  360809.40  0.00     0.00    0.00      0.00      0.00      0.00      360809.40  0.00
      11:17:53 PM  ens4f1  382500.40  0.00     0.00    0.00      0.00      0.00      0.00      382500.40  0.00
      11:17:58 PM  ens4f1  353717.00  0.00     0.00    0.00      0.00      0.00      0.00      353717.00  0.00
      Average:     ens4f1  365675.60  0.00     0.00    0.00      0.00      0.00      0.00      365675.60  0.00
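
The rxfifo/s column can also be cross-checked without sar by sampling the kernel's own counter for the interface. This is a rough sketch, assuming the standard sysfs layout `/sys/class/net/<iface>/statistics/rx_fifo_errors`. The helper names `read_counter` and `counter_rate` and the optional root-directory argument are hypothetical and exist only to keep the example self-contained.

```shell
# Read one statistics counter for an interface from sysfs.
# The optional third argument overrides the sysfs root (illustrative only).
read_counter() {
    rc_iface=$1; rc_counter=$2; rc_root=${3:-/sys/class/net}
    cat "$rc_root/$rc_iface/statistics/$rc_counter"
}

# Print the per-second increase of a counter over an interval
# (integer result, truncated; interval defaults to 5 seconds).
counter_rate() {
    cr_iface=$1; cr_counter=$2; cr_interval=${3:-5}; cr_root=${4:-/sys/class/net}
    cr_before=$(read_counter "$cr_iface" "$cr_counter" "$cr_root") || return 1
    sleep "$cr_interval"
    cr_after=$(read_counter "$cr_iface" "$cr_counter" "$cr_root") || return 1
    echo $(( (cr_after - cr_before) / cr_interval ))
}
```

For example, `counter_rate ens4f1 rx_fifo_errors 5` should print a figure comparable to the rxfifo/s values above.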



      I've also tuned a few SCTP-specific parameters, again without any effect:



      sysctl -w net.core.rmem_max=900000000
      sysctl -w net.core.wmem_max=900000000

      sysctl -w net.sctp.sctp_mem="2100000000 2100000000 2100000000"
      sysctl -w net.sctp.sctp_rmem="2100000000 2100000000 2100000000"
      sysctl -w net.sctp.sctp_wmem="2100000000 2100000000 2100000000"

      sysctl -w net.ipv4.udp_mem="5000000000 5000000000 5000000000"
      sysctl -w net.ipv4.udp_mem="10000000000 10000000000 10000000000"

      for i in `seq 1 10`;do echo "$i) `date`" - $(ifconfig ens4f1| egrep "RX"| egrep overruns;sleep 5);done
      1) Sat Oct 14 21:55:55 SGT 2017 - RX packets:84379103241 errors:56372972367 dropped:58 overruns:56372972367 frame:0
      2) Sat Oct 14 21:56:00 SGT 2017 - RX packets:84381451420 errors:56377777944 dropped:58 overruns:56377777944 frame:0
      3) Sat Oct 14 21:56:05 SGT 2017 - RX packets:84383737427 errors:56382434478 dropped:58 overruns:56382434478 frame:0
      4) Sat Oct 14 21:56:10 SGT 2017 - RX packets:84386524128 errors:56386618268 dropped:58 overruns:56386618268 frame:0
      5) Sat Oct 14 21:56:15 SGT 2017 - RX packets:84389578203 errors:56390512483 dropped:58 overruns:56390512483 frame:0
      6) Sat Oct 14 21:56:20 SGT 2017 - RX packets:84392673120 errors:56394472475 dropped:58 overruns:56394472475 frame:0
      7) Sat Oct 14 21:56:25 SGT 2017 - RX packets:84395714973 errors:56398573221 dropped:58 overruns:56398573221 frame:0
      8) Sat Oct 14 21:56:30 SGT 2017 - RX packets:84398951451 errors:56402297479 dropped:58 overruns:56402297479 frame:0
      9) Sat Oct 14 21:56:35 SGT 2017 - RX packets:84401039177 errors:56406013473 dropped:58 overruns:56406013473 frame:0
      10) Sat Oct 14 21:56:40 SGT 2017 - RX packets:84403558097 errors:56410804379 dropped:58 overruns:56410804379 frame:0
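
One detail about the values above may be worth checking. According to the kernel's ip-sysctl documentation, `net.sctp.sctp_mem` and `net.ipv4.udp_mem` are counted in memory pages, not bytes (`sctp_rmem` and `sctp_wmem` are in bytes). The sketch below converts a page count to GiB to show the scale of what was requested. The helper name `pages_to_gib` is made up for illustration, and the 4096-byte page size in the calls is an assumption that is typical for x86_64.

```shell
# Convert a page count to whole GiB (truncated).
# The page size defaults to the running system's value.
pages_to_gib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1073741824 ))
}

pages_to_gib 2100000000 4096    # the sctp_mem value set above
pages_to_gib 10000000000 4096   # the last udp_mem value set above
```

With 4096-byte pages, both limits come out far above the 240 GiB of RAM in this machine, so they effectively remove the cap rather than size a buffer.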
















      asked Oct 15 '17 at 8:04









      Traiano Welcome
