mpd daemon prematurely ending jobs
I am trying to configure mpirun and mpiexec to run software called Materials Studio on a cluster with 1 node, 2 processors, and 12 cores. The submission scheme is PBS. I had everything set up properly (with some help) and could submit jobs that ran well, but after a few days I started getting this sort of error:
mpiexec_server.org: cannot connect to local mpd (/tmp/mpd2.console_user); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option)
It seemed like the daemon for mpd was somehow set up but eventually terminated. I had luck adding this (bold part) to my submission script:
export PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/bin:$PATH
export LD_LIBRARY_PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/lib:/data1/opt/MD/Linux-x86_64/IntelMPI/bin:/data1/opt/MD/Linux-x86_64/IntelMKL/lib
**mpdboot -n 1 -f ~/mpd.hosts**
nohup mpd &
/data1/opt/MD/Linux-x86_64/IntelMPI/bin/mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel
The job now submits and runs properly but times out after 30 minutes or so. I tried adding '-r ssh' (without the quotes) to the end of the mpdboot line, but I am not sure that is the right strategy. Also, I am a little confused about why I need to run this daemon in this script and why I need to supply a hosts file when I run; I thought that PBS creates that when the job starts. Could anyone please give me some advice on where to go next? Basically, how can I prevent a running job from quitting because of something to do with the MPI daemon?
EDIT: Could anyone shed any light on what is involved in running the mpiexec command that I have on the last line? If I properly link to the folder where it is, do I need to run a boot command? I must admit that I am confused about why I need to run mpdboot/mpd when the whole point of mpiexec is to eliminate the need for mpd (at least according to the mpiexec website).
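On the hosts-file point: PBS normally exposes the nodes assigned to a job through the file named in $PBS_NODEFILE, with one line per allocated slot, so the same host can appear several times. A minimal sketch of how a hosts file for mpdboot might be derived from it is below. The helper name make_mpd_hosts is hypothetical, and the commented job-script lines are an untested assumption about how the pieces could fit together, not a confirmed fix for the timeout.

```shell
#!/bin/sh
# Hypothetical helper: build an mpd hosts file from the node list PBS hands the job.
#   $1 = PBS node file (normally "$PBS_NODEFILE")
#   $2 = hosts file to write
make_mpd_hosts() {
    sort -u "$1" > "$2"          # keep one line per distinct host
    wc -l < "$2" | tr -d ' '     # print the number of distinct hosts
}

# Inside a job script this might be used as follows (assumption, not run here):
#   nhosts=$(make_mpd_hosts "$PBS_NODEFILE" "$HOME/mpd.hosts")
#   mpdboot -n "$nhosts" -f "$HOME/mpd.hosts"
#   mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel
#   mpdallexit
```

Since mpdboot starts an mpd on the local host itself, the separate "nohup mpd &" line may be redundant in such a script, though that has not been verified on this setup.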
job-control cluster timeout mpi
I guess I am a little confused about why I need to run mpdboot and mpd in the first place. It seems like only the latest and greatest Intel compiler suggests doing this. Is there a way to revert to the previous functionality that would be present in, say, MPI 3.2, which I am told this code was compiled against? Thanks again!
– sjensen
Jun 10 '13 at 0:44
edited Jan 13 at 22:13
Rui F Ribeiro
asked Jun 8 '13 at 13:37
sjensen
1 Answer
I'm running an MD simulation, but when I try to run it in DL_POLY, the simulation does not start. I used these commands:
$ ps aux | grep mpd
$ nohup mpd > mpd.out 2> mpd.err < /dev/null/ &
$ mpiexec -n 4 DLPOLY.X >> job.out 2> job.err < /dev/null &
$ top
When I use the last command to look at the running processes, DL_POLY does not appear. Meanwhile, using the ll command, I see that mpd.out has a size of zero. I don't know why.
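One detail in the commands above may be worth checking: the mpd line redirects standard input from /dev/null/ with a trailing slash. Because /dev/null is not a directory, the shell cannot open that path, and when a redirection fails the command is not run at all, which would be consistent with an empty mpd.out and no running mpd. A minimal sketch for testing this is below; the helper name can_redirect_from is hypothetical, and whether this is the only problem in that session is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the shell can open the given path as stdin.
# The subshell keeps a failed redirection from aborting the calling script.
can_redirect_from() {
    ( : < "$1" ) 2>/dev/null
}

for path in /dev/null /dev/null/; do
    if can_redirect_from "$path"; then
        echo "$path can be used for input redirection"
    else
        echo "$path is rejected by the shell"
    fi
done
```

If the second path is rejected, dropping the trailing slash from the mpd line would be the first thing to try.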
edited May 29 '14 at 11:43
slm♦
answered May 29 '14 at 11:20
Majid