Linux network troubleshooting and debugging
























From time to time, Linux and Unix users face various network problems. Many of these problems are described here and on other troubleshooting forums, but those posts are very specific and contain a lot of additional technical detail, so it is sometimes difficult to see the main point and the real cause of the faulty system behavior.



By asking this question, I intend to start a community wiki page that generalizes our network troubleshooting and debugging experience. I hope this page helps Linux and Unix users recognize and solve ("divide and conquer") their network problems more easily.



The parent of this page should be Best practise to diagnose problems, but here we should focus on troubleshooting network problems from user space and kernel space.



I suppose that if you:

  1. Share information about a great network diagnostic tool, with concrete usage examples and examples of the network bugs it helps to catch

  2. Share a link to a great network tutorial on this subject

  3. Describe a general method or recipe for tackling some class of network problems

  4. Share information about your tool set for network debugging and troubleshooting

it would fit this topic perfectly.




I'll begin by sharing a link to various diagnostic tools and a simple 12-year-old tutorial. The archlinux tutorial also seems to have up-to-date information on the subject. And to dive into Linux networking, we definitely need to visit the Linux Networking-HOWTO.

































  • This Q&A has one other thing to consider, 2 machines on the network configured with the same IP address: unix.stackexchange.com/questions/85887/….
    – slm♦
    Oct 12 '13 at 1:41










  • Another useful network troubleshooting guide: cisco.com/en/US/docs/internetworking/troubleshooting/guide/…
    – Ryne Everett
    Jul 19 '16 at 1:31





















linux networking debugging troubleshooting














edited Aug 10 at 17:50 by Community♦

asked Oct 6 '12 at 13:55 by dr.





















3 Answers

















up vote
93
down vote













I think the general principles of network troubleshooting are:

  1. Find out at which level of the TCP/IP stack (or some other stack) the problem occurs.

  2. Understand what the correct system behavior is, and how the current state deviates from it.

  3. Try to express the problem in one sentence or in a few words.

  4. Using the information obtained from the faulty system, your own experience, and the experience of other people (Google, various forums, etc.), try to solve the problem until you succeed (or fail).

  5. If you fail, ask other people for help or advice.

As for me, I usually gather all the required information with whatever tools are needed and try to match it against my experience. Deciding which level of the network stack contains the bug helps to rule out unlikely causes. Using other people's experience helps to solve problems quickly, but it often leads to a situation where I solve a problem without understanding it, and if the problem occurs again, I cannot tackle it without the Internet.
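The "find the failing layer first" idea can be sketched as a small shell helper. This is only an illustration and not part of the original answer: run_layers is a made-up name, and the interface, address, and host names in the commented example are placeholders you would replace with your own.

```shell
#!/bin/sh
# run_layers: run "label=command" checks in order (lowest layer first)
# and stop at the first one that fails. Hypothetical helper name.
run_layers() {
    for check in "$@"; do
        label=${check%%=*}   # text before the first "="
        cmd=${check#*=}      # text after the first "="
        if sh -c "$cmd" >/dev/null 2>&1; then
            echo "ok   $label"
        else
            echo "FAIL $label"
            return 1
        fi
    done
}

# Example wiring (eth0, 192.0.2.1 and example.com are placeholders):
# run_layers \
#     "link=ip link show eth0 | grep -q 'state UP'" \
#     "ip=ping -n -c 1 -W 2 192.0.2.1" \
#     "dns=getent hosts example.com" \
#     "tcp=nc -z -w 2 example.com 80"
```

The first FAIL line tells you which layer to look at; everything below it has already passed, so it can be ruled out for now.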



And in general, I don't know how I solve network problems. It seems that there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which sometimes returns exactly the right answer and sometimes fails (like here: TCP dies on a Linux laptop).



I usually use utilities from this set for network debugging:




  • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces


  • ping - to check whether the target host is reachable from my machine. ping can also be used for basic DNS diagnostics: ping a host by IP address and then by hostname to decide whether DNS works at all. Then use traceroute, tracepath, or mtr to see what is going on along the way.


  • dig - diagnose everything DNS


  • dmesg | less, dmesg | tail, or dmesg | grep -i error - to understand what the Linux kernel thinks about a problem.


  • netstat -antp | grep smth - my most frequent use of the netstat command, which shows information about TCP connections; I often filter the output with grep. See also the newer ss command (from iproute2, the new standard suite of Linux networking tools) and lsof, as in lsof -ai tcp -c some-cmd.


  • telnet <host> <port> - very useful for talking to various TCP services (e.g. over the SMTP or HTTP protocols); it also lets us check whether we can connect to a given TCP port at all.


  • iptables-save (on Linux) - to dump the full iptables tables


  • ethtool - get all the network interface card parameters (status of the link, speed, offload parameters...)


  • socat - the Swiss Army knife for testing all network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.


  • iperf - to test bandwidth availability


  • openssl (s_client, ocsp, x509...) to debug all SSL/TLS/PKI issues.


  • wireshark - a powerful tool for capturing and analyzing network traffic, which helps you catch many network bugs.


  • iftop - show big users on the network/router.


  • iptstate (on Linux) - current view of the firewall's connection tracking.


  • arp (or the new (Linux) ip neigh) - show the ARP-table status.


  • route or the newer (on Linux) ip route - show the routing table status.


  • strace (or truss, dtrace, or tusc, depending on the system) - a useful tool that shows which system calls the problem process makes; it also shows the error codes (errno) when system calls fail. This information is often enough to understand the system behavior and solve the problem. Alternatively, setting breakpoints on networking functions in gdb lets you find out when they are called and with which arguments.

  • To investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z zeroes the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To go further, NFLOG (together with ulogd) will log the full packet.
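As one concrete example of the "filter the connection list" step, here is a rough sketch that counts sockets per TCP state instead of grepping by eye. The function name tcp_state_summary is my own invention; it assumes the usual column layout of ss -tan (state in the first column) or netstat -ant (state in the sixth column).

```shell
#!/bin/sh
# tcp_state_summary: read `ss -tan` or `netstat -ant` output on stdin
# and print "<count> <state>" lines, most frequent state first.
# Hypothetical helper name; column positions are an assumption.
tcp_state_summary() {
    awk '
        # netstat rows start with the protocol; the state is column 6
        $1 == "tcp" || $1 == "tcp6" { count[$6]++; next }
        # skip header lines and blank lines
        $1 == "State" || $1 == "Active" || $1 == "Proto" || NF == 0 { next }
        # ss rows start with the state
        { count[$1]++ }
        END { for (s in count) print count[s], s }
    ' | sort -k1,1nr -k2,2
}
```

Typical use would be ss -tan | tcp_state_summary (or netstat -ant | tcp_state_summary). A large pile of sockets stuck in one state is usually a good hint about where to dig next.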



























  • Geez, talk about thorough!
    – mVChr
    Apr 1 '17 at 1:02






  • I'd add nmap. The profile of open ports on a machine can quickly give you hints as to whether you are looking at a Linux or Windows server, for example.
    – Adam Monsen
    Apr 21 '17 at 18:54






  • I'd add tcpdump, as it's the standard command-line packet analyzer.
    – jhvaras
    May 23 at 13:39

















up vote
12
down vote













A surprising number of "network problems" boil down to DNS problems of one kind or another. Initial troubleshooting should use ping -n w.x.y.z in order to leave out DNS resolution of a hostname, and just check IP connectivity. After that, use route -n to check the default IP route without DNS resolution.



After verifying IP connectivity and routing, nslookup, host, and dig can yield information. Remember that "locking up" can indicate that DNS timeouts are occurring.



Don't forget to check the existence and contents of /etc/resolv.conf. DHCP clients rewrite that file with every lease, and sometimes they get it wrong; if disk space is tight, an update might not happen at all.
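The resolv.conf check is easy to script. The sketch below is my own illustration (list_nameservers is a hypothetical name): it fails when the file is missing or lists no nameserver, and otherwise prints the configured addresses.

```shell
#!/bin/sh
# list_nameservers: print the nameserver addresses from a resolv.conf
# style file (default /etc/resolv.conf); return non-zero if the file is
# unreadable or has no nameserver entries. Hypothetical helper name.
list_nameservers() {
    file=${1:-/etc/resolv.conf}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    servers=$(awk '$1 == "nameserver" { print $2 }' "$file")
    if [ -z "$servers" ]; then
        echo "no nameserver entries in $file" >&2
        return 1
    fi
    echo "$servers"
}
```

With the addresses in hand, you can ping -n each one to check plain IP connectivity and then query it directly with dig @<address> <name> to see whether that particular resolver answers.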

































    up vote
    7
    down vote













    Cabling problems can exist. If you have access to the hardware, ensure that the cables are all plugged in and mechanically engaged. If you can see routers or ethernet interfaces, ensure that the link lights are on.



    Remotely, you have to depend on ethtool and mii-tool.



    [root@flask ~]# ethtool eth0
    Settings for eth0:
            Supported ports: [ TP MII ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Supported pause frame use: No
            Supports auto-negotiation: Yes
            Advertised link modes:  10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
            Advertised pause frame use: Symmetric
            Advertised auto-negotiation: Yes
            Speed: 10Mb/s
            Duplex: Half
            Port: MII
            PHYAD: 24
            Transceiver: internal
            Auto-negotiation: on
            Supports Wake-on: g
            Wake-on: d
            Current message level: 0x00000001 (1)
                                   drv
            Link detected: yes


    "Link detected: yes" is good, but 10Mb/s and Half duplex are not good, as the NIC on that computer can do better. I need to figure out if the NIC is goofed up or the cable is. Another computer plugged into the same router says 100Mb/s, Full duplex.






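If you have to check many machines remotely, the three fields that matter in the ethtool output can be pulled out with a few lines of shell. This is a rough sketch under my own assumptions: link_summary is a made-up name, and it expects the "Speed:", "Duplex:" and "Link detected:" labels exactly as ethtool prints them in the listing from this answer.

```shell
#!/bin/sh
# link_summary: read `ethtool <iface>` output on stdin and print one
# line with speed, duplex and link state. Hypothetical helper name.
link_summary() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }            # drop leading indentation
        $1 == "Speed"         { speed = $2 }
        $1 == "Duplex"        { duplex = $2 }
        $1 == "Link detected" { link = $2 }
        END { printf "speed=%s duplex=%s link=%s\n", speed, duplex, link }
    '
}
```

Something like ethtool eth0 | link_summary (eth0 being a placeholder interface name) then gives a single line per host that is easy to compare across machines plugged into the same router.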


















































      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      93
      down vote













      I think, general principles of network troubleshooting are:



      1. Find out at what level of TCP/IP stack(or some other stack) occurs the problem.

      2. Understand what is the correct system behavior, and what is deviation from normal system state

      3. Try to express the problem in one sentence or in several words

      4. Using obtained information from buggy system, your own experience and experience of other people(google, various forum, etc.), try to solve the problem until success(or failure)

      5. If you fail, ask other people about help or some advice

      As for me, I usually obtain all required information using all needed tools, and try to match this information to my experience. Deciding what level of network stack contains the bug helps to cut off unlikely variants. Using experience of other people helps to solve the problems quickly, but often it leads to situation, that I can solve some problem without its understanding and if this problem occurs again, it's impossible for me to tackle it again without the Internet.



      And in general, I don't know how I solve network problems. It seems that there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which could sometimes return exactly the right answer, and also could sometimes fail(like here TCP dies on a Linux laptop).



      I usually use utils from this set for network debugging:




      • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces


      • ping - for validating, if target host is accessible from my machine. ping is also could be used for basic DNS diagnostics - we could ping host by IP-address or by its hostname and then decide if DNS works at all. And then traceroute or tracepath or mtr to look what's going on on the way there.


      • dig - diagnose everything DNS


      • dmesg | less or dmesg | tail or dmesg | grep -i error - for understanding what the Linux kernel thinks about some trouble.


      • netstat -antp + | grep smth - my most popular usage of netstat command, which shows information about TCP connections. Often I perform some filtering using grep. See also the new ss command (from iproute2 the new standard suite of Linux networking tools) and lsof as in lsof -ai tcp -c some-cmd.


      • telnet <host> <port> - is very useful for communicating with various TCP-services(e.g. on SMTP, HTTP protocols), also we could check general opportunity to connect to some TCP port.


      • iptables-save (on Linux) - to dump the full iptables tables


      • ethtool - get all the network interface card parameters (status of the link, speed, offload parameters...)


      • socat - the swiss army tool to test all network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.


      • iperf - to test bandwidth availability


      • openssl (s_client, ocsp, x509...) to debug all SSL/TLS/PKI issues.


      • wireshark - the powerful tool for capturing and analyzing network traffic, which allows you to analyze and catch many network bugs.


      • iftop - show big users on the network/router.


      • iptstate (on Linux) - current view of the firewall's connection tracking.


      • arp (or the new (Linux) ip neigh) - show the ARP-table status.


      • route or the newer (on Linux) ip route - show the routing table status.


      • strace (or truss, dtrace or tusc depending on the system) - is useful tool which shows what system calls does the problem process, it also shows error codes(errno) when system calls fails. This information often says enough for understanding the system behavior and solving a problem. Alternatively, using breakpoints on some networking functions in gdb can let you find out when they are made and with which arguments.

      • to investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z to zero the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To get further NFLOG (associated with ulogd) will log the full packet.





      share|improve this answer






















      • Geez, talk about thorough!
        – mVChr
        Apr 1 '17 at 1:02






      • 3




        I'd add nmap. The profile of open ports on a machine can quickly give you hints as to whether you are looking at a Linux or Windows server, for example.
        – Adam Monsen
        Apr 21 '17 at 18:54






      • 2




        I'd add tcpdump. As its the standard packet analyzer for TCP.
        – jhvaras
        May 23 at 13:39














      up vote
      93
      down vote













      I think, general principles of network troubleshooting are:



      1. Find out at what level of TCP/IP stack(or some other stack) occurs the problem.

      2. Understand what is the correct system behavior, and what is deviation from normal system state

      3. Try to express the problem in one sentence or in several words

      4. Using obtained information from buggy system, your own experience and experience of other people(google, various forum, etc.), try to solve the problem until success(or failure)

      5. If you fail, ask other people about help or some advice

      As for me, I usually obtain all required information using all needed tools, and try to match this information to my experience. Deciding what level of network stack contains the bug helps to cut off unlikely variants. Using experience of other people helps to solve the problems quickly, but often it leads to situation, that I can solve some problem without its understanding and if this problem occurs again, it's impossible for me to tackle it again without the Internet.



      And in general, I don't know how I solve network problems. It seems that there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which could sometimes return exactly the right answer, and also could sometimes fail(like here TCP dies on a Linux laptop).



      I usually use utils from this set for network debugging:




      • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces


      • ping - for validating, if target host is accessible from my machine. ping is also could be used for basic DNS diagnostics - we could ping host by IP-address or by its hostname and then decide if DNS works at all. And then traceroute or tracepath or mtr to look what's going on on the way there.


      • dig - diagnose everything DNS


      • dmesg | less or dmesg | tail or dmesg | grep -i error - for understanding what the Linux kernel thinks about some trouble.


      • netstat -antp + | grep smth - my most popular usage of netstat command, which shows information about TCP connections. Often I perform some filtering using grep. See also the new ss command (from iproute2 the new standard suite of Linux networking tools) and lsof as in lsof -ai tcp -c some-cmd.


      • telnet <host> <port> - is very useful for communicating with various TCP-services(e.g. on SMTP, HTTP protocols), also we could check general opportunity to connect to some TCP port.


      • iptables-save (on Linux) - to dump the full iptables tables


      • ethtool - get all the network interface card parameters (status of the link, speed, offload parameters...)


      • socat - the swiss army tool to test all network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.


      • iperf - to test bandwidth availability


      • openssl (s_client, ocsp, x509...) to debug all SSL/TLS/PKI issues.


      • wireshark - the powerful tool for capturing and analyzing network traffic, which allows you to analyze and catch many network bugs.


      • iftop - show big users on the network/router.


      • iptstate (on Linux) - current view of the firewall's connection tracking.


      • arp (or the new (Linux) ip neigh) - show the ARP-table status.


      • route or the newer (on Linux) ip route - show the routing table status.


      • strace (or truss, dtrace or tusc depending on the system) - is useful tool which shows what system calls does the problem process, it also shows error codes(errno) when system calls fails. This information often says enough for understanding the system behavior and solving a problem. Alternatively, using breakpoints on some networking functions in gdb can let you find out when they are made and with which arguments.

      • to investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z to zero the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To get further NFLOG (associated with ulogd) will log the full packet.





      share|improve this answer






















      • Geez, talk about thorough!
        – mVChr
        Apr 1 '17 at 1:02






      • 3




        I'd add nmap. The profile of open ports on a machine can quickly give you hints as to whether you are looking at a Linux or Windows server, for example.
        – Adam Monsen
        Apr 21 '17 at 18:54






      • 2




        I'd add tcpdump. As its the standard packet analyzer for TCP.
        – jhvaras
        May 23 at 13:39












      up vote
      93
      down vote










      up vote
      93
      down vote









      I think, general principles of network troubleshooting are:



      1. Find out at what level of TCP/IP stack(or some other stack) occurs the problem.

      2. Understand what is the correct system behavior, and what is deviation from normal system state

      3. Try to express the problem in one sentence or in several words

      4. Using obtained information from buggy system, your own experience and experience of other people(google, various forum, etc.), try to solve the problem until success(or failure)

      5. If you fail, ask other people about help or some advice

      As for me, I usually obtain all required information using all needed tools, and try to match this information to my experience. Deciding what level of network stack contains the bug helps to cut off unlikely variants. Using experience of other people helps to solve the problems quickly, but often it leads to situation, that I can solve some problem without its understanding and if this problem occurs again, it's impossible for me to tackle it again without the Internet.



      And in general, I don't know how I solve network problems. It seems that there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which could sometimes return exactly the right answer, and also could sometimes fail(like here TCP dies on a Linux laptop).



      I usually use utils from this set for network debugging:




      • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces


      • ping - for validating, if target host is accessible from my machine. ping is also could be used for basic DNS diagnostics - we could ping host by IP-address or by its hostname and then decide if DNS works at all. And then traceroute or tracepath or mtr to look what's going on on the way there.


      • dig - diagnose everything DNS


      • dmesg | less or dmesg | tail or dmesg | grep -i error - for understanding what the Linux kernel thinks about some trouble.


      • netstat -antp + | grep smth - my most popular usage of netstat command, which shows information about TCP connections. Often I perform some filtering using grep. See also the new ss command (from iproute2 the new standard suite of Linux networking tools) and lsof as in lsof -ai tcp -c some-cmd.


      • telnet <host> <port> - is very useful for communicating with various TCP-services(e.g. on SMTP, HTTP protocols), also we could check general opportunity to connect to some TCP port.


      • iptables-save (on Linux) - to dump the full iptables tables


      • ethtool - get all the network interface card parameters (status of the link, speed, offload parameters...)


      • socat - the swiss army tool to test all network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.


      • iperf - to test bandwidth availability


      • openssl (s_client, ocsp, x509...) to debug all SSL/TLS/PKI issues.


      • wireshark - the powerful tool for capturing and analyzing network traffic, which allows you to analyze and catch many network bugs.


      • iftop - show big users on the network/router.


      • iptstate (on Linux) - current view of the firewall's connection tracking.


      • arp (or the new (Linux) ip neigh) - show the ARP-table status.


      • route or the newer (on Linux) ip route - show the routing table status.


      • strace (or truss, dtrace or tusc depending on the system) - is useful tool which shows what system calls does the problem process, it also shows error codes(errno) when system calls fails. This information often says enough for understanding the system behavior and solving a problem. Alternatively, using breakpoints on some networking functions in gdb can let you find out when they are made and with which arguments.

      • to investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z to zero the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To get further NFLOG (associated with ulogd) will log the full packet.





      share|improve this answer














      I think the general principles of network troubleshooting are:



      1. Find out at which level of the TCP/IP stack (or some other stack) the problem occurs.

      2. Understand what the correct system behavior is, and how the current state deviates from it.

      3. Try to express the problem in one sentence or in a few words.

      4. Using the information obtained from the buggy system, your own experience and the experience of other people (Google, various forums, etc.), try to solve the problem until you succeed (or fail).

      5. If you fail, ask other people for help or advice.

      As for me, I usually gather all the required information with whatever tools are needed and try to match it against my experience. Deciding which level of the network stack contains the bug helps to rule out unlikely variants. Using other people's experience helps to solve problems quickly, but it often leads to a situation where I solve a problem without understanding it, and if the problem occurs again, I cannot tackle it without the Internet.



      And in general, I don't know how I solve network problems. It seems there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which sometimes returns exactly the right answer and sometimes fails (like here: TCP dies on a Linux laptop).



      I usually use tools from this set for network debugging:




      • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces.


      • ping - for checking whether the target host is reachable from my machine. ping can also be used for basic DNS diagnostics: ping the host by IP address and by hostname, then decide whether DNS works at all. Then use traceroute, tracepath or mtr to see what is going on along the way there.


      • dig - to diagnose anything DNS-related.


      • dmesg | less or dmesg | tail or dmesg | grep -i error - to see what the Linux kernel has to say about the problem.


      • netstat -antp, often piped into grep smth - my most frequent use of the netstat command, which shows information about TCP connections; I often filter its output with grep. See also the newer ss command (from iproute2, the new standard suite of Linux networking tools) and lsof, as in lsof -ai tcp -c some-cmd.


      • telnet <host> <port> - very useful for talking to various TCP services (e.g. SMTP or HTTP), and for checking whether it is possible to connect to a given TCP port at all.


      • iptables-save (on Linux) - to dump the full iptables tables.


      • ethtool - to get all the network interface card parameters (status of the link, speed, offload parameters...).


      • socat - the Swiss Army knife for testing all kinds of network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.


      • iperf - to test available bandwidth.


      • openssl (s_client, ocsp, x509...) - to debug all SSL/TLS/PKI issues.


      • wireshark - a powerful tool for capturing and analyzing network traffic, which helps catch many network bugs.


      • iftop - shows the biggest bandwidth users on the network/router.


      • iptstate (on Linux) - current view of the firewall's connection tracking.


      • arp (or the newer ip neigh on Linux) - shows the ARP table.


      • route (or the newer ip route on Linux) - shows the routing table.


      • strace (or truss, dtrace or tusc depending on the system) - a useful tool that shows which system calls the problem process makes, along with the error codes (errno) when system calls fail. This information is often enough to understand the system's behavior and solve the problem. Alternatively, setting breakpoints on some networking functions in gdb lets you find out when they are called and with which arguments.

      • to investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z zeroes the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To go further, the NFLOG target (together with ulogd) will log the full packet.
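The telnet-style port check from the list above can also be scripted when telnet is not installed. Below is a minimal Python sketch, not a replacement for the real tools: the function name probe_tcp and the result labels are made up for this illustration, and the interpretation in the comments is only the usual case (a firewall that rejects instead of dropping will also show up as "refused").

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome, roughly like `telnet host port`.

    probe_tcp and its return labels are hypothetical names for this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # handshake completed, something is listening
    except ConnectionRefusedError:
        return "refused"           # usually: host reachable, nothing listens on that port
    except socket.timeout:
        return "timeout"           # no answer at all, often a firewall dropping packets
    except socket.gaierror:
        return "dns-failure"       # the hostname could not be resolved
    except OSError as exc:
        # e.g. ENETUNREACH or EHOSTUNREACH: a routing or lower-layer problem
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

Calling probe_tcp with an IP address first and with the hostname second separates DNS trouble from connectivity trouble, in the same spirit as the ping advice above.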














      edited Feb 4 at 16:20 by Jeff Schaller
      answered Oct 6 '12 at 13:55 by dr.











      • Geez, talk about thorough!
        – mVChr
        Apr 1 '17 at 1:02


      • I'd add nmap. The profile of open ports on a machine can quickly give you hints as to whether you are looking at a Linux or Windows server, for example.
        – Adam Monsen
        Apr 21 '17 at 18:54


      • I'd add tcpdump, as it's the standard packet analyzer for TCP.
        – jhvaras
        May 23 at 13:39




























      up vote
      12
      down vote













      A surprising number of "network problems" boil down to DNS problems of one kind or another. Initial troubleshooting should use ping -n w.x.y.z in order to leave out DNS resolution of a hostname, and just check IP connectivity. After that, use route -n to check the default IP route without DNS resolution.



      After verifying IP connectivity and routing, nslookup, host and dig can yield information about name resolution. Remember that a command that appears to "lock up" may actually be waiting on DNS timeouts.



      Don't forget to check the existence and contents of /etc/resolv.conf. DHCP clients rewrite that file with every lease, and sometimes they get it wrong, or, if disk space is tight, an update might not happen.
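Both checks in this answer (what is in /etc/resolv.conf, and whether lookups are slow) are easy to script. Here is a rough Python sketch; parse_nameservers and timed_lookup are names invented for this example, and the parser only looks at plain "nameserver" lines, which is an assumption about what you care about in that file.

```python
import socket
import time
from typing import List, Optional, Tuple


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses from 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never
        # equals the bare keyword and they are skipped here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def timed_lookup(hostname: str) -> Tuple[Optional[str], float]:
    """Resolve hostname; return (first address or None, seconds taken)."""
    start = time.monotonic()
    try:
        address = socket.getaddrinfo(hostname, None)[0][4][0]
    except socket.gaierror:
        address = None             # resolution failed outright
    return address, time.monotonic() - start
```

An empty list from parse_nameservers(open("/etc/resolv.conf").read()), or a timed_lookup that takes several seconds, both point at DNS rather than at IP connectivity.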
















          answered Oct 6 '12 at 16:25 by Bruce Ediger




















              up vote
              7
              down vote













              Cabling problems can exist. If you have access to the hardware, ensure that the cables are all plugged in and mechanically engaged. If you can see routers or ethernet interfaces, ensure that the link lights are on.



              Remotely, you have to depend on ethtool and mii-tool.



              [root@flask ~]# ethtool eth0
              Settings for eth0:
                  Supported ports: [ TP MII ]
                  Supported link modes:   10baseT/Half 10baseT/Full
                                          100baseT/Half 100baseT/Full
                  Supported pause frame use: No
                  Supports auto-negotiation: Yes
                  Advertised link modes:  10baseT/Half 10baseT/Full
                                          100baseT/Half 100baseT/Full
                  Advertised pause frame use: Symmetric
                  Advertised auto-negotiation: Yes
                  Speed: 10Mb/s
                  Duplex: Half
                  Port: MII
                  PHYAD: 24
                  Transceiver: internal
                  Auto-negotiation: on
                  Supports Wake-on: g
                  Wake-on: d
                  Current message level: 0x00000001 (1)
                                         drv
                  Link detected: yes


              "Link detected: yes" is good, but 10Mb/s and half duplex are not, since the NIC on that computer advertises up to 100baseT/Full. I need to figure out whether the NIC or the cable is at fault; another computer plugged into the same router reports 100Mb/s, full duplex.
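If you have to check many machines, the same eyeball test can be automated. The following Python sketch parses text shaped like the ethtool output above and flags a link that negotiated below what the NIC supports. The function names are invented for this example, and comparing against "Supported link modes" is just a heuristic chosen here, not something ethtool itself reports as a fault.

```python
import re
from typing import Dict, List


def parse_ethtool(output: str) -> Dict[str, str]:
    """Collect 'Key: value' fields from `ethtool <iface>` style output.

    Lines without a colon (extra link modes wrapped onto their own line)
    are appended to the most recent key.
    """
    fields: Dict[str, str] = {}
    last_key = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields


def link_warnings(fields: Dict[str, str]) -> List[str]:
    """Return human-readable complaints about the negotiated link."""
    warnings = []
    if fields.get("Link detected") != "yes":
        warnings.append("no link detected")
    supported = [int(n) for n in
                 re.findall(r"(\d+)base", fields.get("Supported link modes", ""))]
    speed = re.match(r"(\d+)Mb/s", fields.get("Speed", ""))
    if speed and supported and int(speed.group(1)) < max(supported):
        warnings.append("speed %sMb/s is below the supported %dMb/s"
                        % (speed.group(1), max(supported)))
    if fields.get("Duplex") == "Half":
        warnings.append("half duplex")
    return warnings
```

Feeding it the output of subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout is one way to use it; a non-empty result is a hint to go look at the cable, the switch port or the NIC.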
















                  answered Oct 6 '12 at 23:46 by Bruce Ediger



























                       
