Bash script: way to ignore a hung server

I wrote a script that runs commands on 1000+ servers in the background. Sometimes the script hangs on one of the servers. If a server is hung (due to a high load average) while the script is running, the command may hang on that server as well. Is there a way to skip that host so the script can move on to the next host and keep running?



Below are the two main functions of my script. I have had no luck with the ConnectTimeout option or the wait keyword.



exec_ssh()
{
    for i in `cat $file`
    do
        ssh -q -o "StrictHostKeyChecking no" -o "NumberOfPasswordPrompts 0" -o ConnectTimeout=2 $i $command 2>>/dev/null &
        if wait $!; then
            echo "" >> /dev/null
        else
            echo "$i is not reachable over SSH or passwordless authentication is not setup on the server" >> /tmp/not_reachable
        fi
    done >/tmp/output.csv &
}

run_command()
{
    export -f exec_ssh
    export command
    nohup bash -c exec_ssh &>>$log_file &
}







Tags: bash






edited Jan 23 at 15:19 by DopeGhoti
asked Jan 23 at 15:09 by Sin15












  • This might be a good time to learn about ssh -o BatchMode=yes (:

    – DopeGhoti
    Jan 23 at 15:22











  • Also on SO: How to skip/ignore Hung host

    – glenn jackman
    Jan 23 at 15:23

















2 Answers
Your script would run all of the remote commands concurrently were it not for wait, which explicitly blocks until the backgrounded task completes. In the high-load case you describe, the ssh command is not timing out; it is simply taking a long time to finish, so the script is doing exactly what you asked of it. ConnectTimeout only limits how long ssh waits to establish the connection, so it is moot once the connection has succeeded.
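The effect of calling wait straight after each background job can be seen without any servers. The sketch below is illustrative only: slow_job is a hypothetical stand-in for an ssh call, and the timings rely on date +%s, which has one-second resolution.

```shell
# Hypothetical stand-in for a slow remote command; a real script would run ssh here.
slow_job() { sleep 1; }

# Pattern from the question: start a job, then wait for it before starting the next.
serial_run() {
    for n in 1 2 3; do
        slow_job &
        wait "$!"
    done
}

# Start every job first, then wait once for all of them.
parallel_run() {
    for n in 1 2 3; do
        slow_job &
    done
    wait
}

start=$(date +%s); serial_run;   serial_secs=$(( $(date +%s) - start ))
start=$(date +%s); parallel_run; parallel_secs=$(( $(date +%s) - start ))
echo "serial: ${serial_secs}s, parallel: ${parallel_secs}s"
```

The serial variant takes roughly the sum of the job times, while the parallel variant takes roughly as long as the slowest job.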



If you do want to keep this sort of script rather than use a tool designed for distributed remote execution, such as Ansible, I might modify it as follows:



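exec_ssh() {
    while read -r host; do
        {
            # -n stops ssh from reading the rest of the host list from stdin
            ssh -n -q -o BatchMode=yes -o ConnectTimeout=2 "$host" "$command" 2>>/dev/null
            rc=$?
            if [ "$rc" -ne 0 ]; then
                echo "$host is not reachable via non-interactive SSH or remote command threw error - exit code $rc" >> /tmp/not_reachable
            fi
        } &   # background the whole check so the loop moves straight to the next host
    done < "$file" > /tmp/output.csv &
}

run_command() {
    export -f exec_ssh
    export command
    nohup bash -c exec_ssh &>> "$log_file" &
}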
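The script above does not bound how long a connected session may run, so a host that accepts the connection and then stalls still ties up a process. One option that is not part of this answer is the timeout utility from GNU coreutils, which kills a command after a limit and exits with status 124. A minimal sketch, assuming timeout is installed; run_with_limit and the 30-second figure are illustrative names and values:

```shell
# Run a command with a wall-clock limit in seconds; requires GNU coreutils `timeout`.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    local limit rc
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    # timeout exits with 124 when it had to kill the command.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Intended use inside the host loop (not executed here):
#   run_with_limit 30 ssh -n -q -o BatchMode=yes -o ConnectTimeout=2 "$host" "$command"
```

Status 124 then identifies hosts that connected but did not finish in time, separately from ssh's own failures.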



It might also be worth separating your "can I SSH to the host" test from your "can I complete the job" test:



if ssh -q -o BatchMode=yes -o ConnectTimeout=2 "$host" true; then
    # connection succeeded; run the real command in the foreground so its status can be read
    ssh -q -o BatchMode=yes -o ConnectTimeout=2 "$host" "$command"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Remote command threw $rc"
    fi
else
    echo "SSH threw $?"
fi
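The split above matters because ssh folds two kinds of failure into one exit status: per the ssh manual, it returns the remote command's status, or 255 if ssh itself hit an error. A small helper can turn the status into a log-friendly label. This is a sketch; the function name and label strings are made up for illustration, and a remote command that itself exits with 255 cannot be told apart from a connection failure.

```shell
# Map an ssh exit status to a short label (labels are illustrative, not standard).
describe_ssh_status() {
    case "$1" in
        0)   echo "ok" ;;
        255) echo "ssh-error" ;;        # connection, authentication, or other ssh failure
        *)   echo "remote-exit-$1" ;;   # status returned by the remote command
    esac
}

# Example with fixed statuses instead of a live ssh call:
for status in 0 1 255; do
    echo "$status -> $(describe_ssh_status "$status")"
done
```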






answered Jan 23 at 15:28 by DopeGhoti (edited Jan 23 at 15:36)












  • thanks, I will try this out and let you know.

    – Sin15
    Jan 23 at 16:38

















As your local and remote commands get more complex, cramming everything into one coherent script quickly becomes overwhelming, and with hundreds or thousands of backgrounded processes you are likely to run into resource contention even on a beefy local machine.



You can get this under control with xargs -P. I typically break tasks like this into two scripts.

local.sh

Generally this script takes a single argument, the hostname, and performs any necessary validations, pre-flight tasks, logging, etc. For example:



#!/bin/bash
hostname=$1
# The invocations below are alternatives; enable only the one you need.
# simple
ssh "user@$hostname" bash -s < remote.sh
# sudo the whole thing (assumes passwordless sudo on the remote host)
#   ssh "user@$hostname" sudo bash -s < remote.sh
# log to files
#   ssh "user@$hostname" bash -s < remote.sh &> "logs/$hostname.log"
# or log to stdout with the hostname prefixed
#   ssh "user@$hostname" bash -s < remote.sh 2>&1 | sed "s/^/$hostname:/"
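The validations mentioned above could be as small as rejecting an empty or malformed argument before any connection is attempted. A sketch under assumptions: validate_host and its character whitelist are hypothetical, and real host naming rules may be stricter or looser.

```shell
# Accept only non-empty names made of letters, digits, dots, and hyphens (assumed policy).
validate_host() {
    case "$1" in
        "")                 return 1 ;;  # missing argument
        *[!A-Za-z0-9.-]*)   return 1 ;;  # contains a character outside the whitelist
        *)                  return 0 ;;
    esac
}

# Example checks with made-up names:
for name in web01.example.com "bad host" ""; do
    if validate_host "$name"; then
        echo "accept: '$name'"
    else
        echo "reject: '$name'"
    fi
done
```

Failing early keeps a malformed line in the host list from turning into a confusing ssh error later.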


remote.sh

This is the script you want to run remotely. You no longer have to cram it into a quoted one-liner and deal with quote-escaping hell.

The actual command



cat host_list.txt | xargs -P 16 -n 1 bash local.sh

Where:

  • -P 16 runs up to 16 processes at a time

  • -n 1 passes exactly one argument to each command

  • -I {} would substitute the argument wherever {} appears in the command. It is not necessary here, but may be useful for constructing more complex xargs calls. POSIX describes -I and -n as mutually exclusive, so use one or the other.

This way, even if one of your local or remote scripts gets hung up, you will still have the other 15 chugging along unimpeded.
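The claim that a stuck job does not hold up the rest can be checked locally. In this sketch the inline sh -c worker is a hypothetical stand-in for local.sh, and the name slow simulates a host that hangs for two seconds:

```shell
# Four made-up "hosts", at most two workers at a time.
# The worker sleeps only for the name "slow"; a real run would call local.sh instead.
results=$(printf '%s\n' slow a b c |
    xargs -P 2 -n 1 sh -c 'if [ "$1" = slow ]; then sleep 2; fi; echo "done:$1"' _)

# Lines are printed as workers finish, so the slow one is expected to come last.
echo "$results"
```

The three fast names complete through the second worker slot while the first slot is still occupied by the slow one.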







answered Jan 25 at 0:45 by Sammitch


























