Save entire process for continuation after reboot
I developed an algorithm for a fairly hard problem in mathematics, and it is likely to need several months to finish. As I only have limited resources, I started it on my Ubuntu 12.04 (x86) laptop. Now I want to install some updates and actually restart the laptop (the "please reboot" message is just annoying).
Is there a way to save an entire process including its allocated memory for continuation beyond a reboot?
Here is some information about the process you might need. Please feel free to ask for further information if needed.
- I started the process in a terminal with the command "./binary > ./somefile &" or "time ./binary > ./somefile &"; I cannot really remember which.
- It prints some debug information to std::cerr (not very often).
- It is currently using roughly 600.0 kiB, and even though this will increase, it is unlikely to increase rapidly.
- The process runs with normal priority.
- The kernel is 3.2.0-26-generic-pae, the CPU is an AMD, and the operating system is Ubuntu 12.04 x86.
- It has been running for 9 days and 14 hours (so too long to cancel it ;-) ).
process reboot
asked Jul 24 '12 at 17:49
stefan
Out of curiosity, what are you calculating?
– Viktor Mellgren
Jul 25 '12 at 9:48
@user1261166: I study the Target Visitation Problem (which is an extension of the Travelling Salesman Problem) with a Branch-and-Cut approach. Thus I need to know as many facets of some special high-dimensional polytope as possible. Basically, it's blowing up a big problem to a gigantic one and then trying to solve just a bit of it to reduce it afterwards.
– stefan
Jul 25 '12 at 21:58
It doesn't quite answer your question, but have you considered running your code on a dedicated cluster in the future? Those are hardly ever shut down, and I'm sure there is some computing grid available to you. Not only are they on all the time, they are also quite a bit faster (especially if you can parallelise your code). You could even have a go at setting one up yourself (look up Oracle Grid Engine).
– Wojtek Rzepala
Aug 2 '12 at 8:25
I never thought of this being such a popular question (at least way more popular than any other question of mine so far). Since the process has finished now (unexpectedly, though without a crash), I will try out each method shortly. Thanks everyone!
– stefan
Aug 8 '12 at 9:20
Just FYI, there's also a Computational Science SE
– Tobias Kienzler
Feb 4 '13 at 8:12
4 Answers
The best/simplest solution is to change your program to save its state to a file and reuse that file to restore the process.
Based upon the Wikipedia page about application snapshots, there are multiple alternatives:
- There is cryopid, but it seems to be unmaintained.
- Linux checkpoint/restart seems to be a good choice, but your kernel needs to have CONFIG_CHECKPOINT_RESTORE enabled.
- criu is probably the most up-to-date project and probably your best shot, but it also depends on some specific kernel options which your distribution probably hasn't set.
- It is already too late for this, but another, more hands-on approach is to start your process in a dedicated VM and just suspend and restore the whole virtual machine. Depending on your hypervisor, you can also move the machine between different hosts.
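To see whether a kernel has CONFIG_CHECKPOINT_RESTORE enabled, one option is to read the kernel build config. The following is a minimal sketch, not a tested recipe: it assumes the Debian/Ubuntu convention of a /boot/config-<release> file, and the helper names are made up for this example. It also builds, without running them, criu dump and restore command lines; --shell-job is included on the assumption that the process was started from an interactive shell, as in the question.

```python
import os
import platform


def kernel_option_enabled(config_text, option):
    """Return True if `option` is set to y (built in) or m (module)."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value in ("y", "m")
    return False


def read_kernel_config():
    """Read the running kernel's build config (assumed Debian/Ubuntu path)."""
    path = os.path.join("/boot", "config-" + platform.release())
    with open(path) as f:
        return f.read()


def criu_commands(pid, image_dir):
    """Build (but do not run) the criu command lines for `pid`."""
    dump = ["criu", "dump", "-t", str(pid), "-D", image_dir, "--shell-job"]
    restore = ["criu", "restore", "-D", image_dir, "--shell-job"]
    return dump, restore
```

Calling kernel_option_enabled(read_kernel_config(), "CONFIG_CHECKPOINT_RESTORE") on the target machine would answer the first question; criu itself usually needs root privileges.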
For the future, think about where you run your long-running processes, how to parallelize them, and how to handle problems such as full disks or the process getting killed.
answered Jul 24 '12 at 18:28
Ulrich Dangel
A fairly "cheap" way to do this would be to do the processing in a VM (e.g., with VirtualBox). Before you shut down, suspend the VM and save its state. After booting, restore the VM and its state.
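With VirtualBox, the suspend and restore steps might be scripted roughly as follows. This is a sketch under assumptions: the VM name used in the usage note is hypothetical, and the first two functions only build the VBoxManage command lines (controlvm ... savestate and startvm) so that they can be inspected before anything is run.

```python
import subprocess


def savestate_command(vm_name):
    """Command that pauses the VM and writes its state to disk."""
    return ["VBoxManage", "controlvm", vm_name, "savestate"]


def resume_command(vm_name):
    """Command that starts the VM again; a saved state is resumed from."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def run_command(command):
    """Run one of the commands above; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

Before the reboot one would call run_command(savestate_command("compute-vm")), and afterwards run_command(resume_command("compute-vm")), where "compute-vm" stands for whatever the VM is actually called.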
This does have the disadvantage of requiring you to kill and restart the job. But if it's actually going to be running for several months, then a nine-day difference becomes trivial (about a 5% increase over 6 months).
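As a rough check of that percentage, the snippet below uses the 9 days and 14 hours of runtime stated in the question and assumes that six months is about 182.5 days (an approximation made for this example).

```python
def restart_overhead_percent(lost_days, total_days):
    """Lost runtime as a percentage of the planned total runtime."""
    return 100.0 * lost_days / total_days


lost_days = 9 + 14 / 24      # 9 days and 14 hours, from the question
six_months = 365 / 2         # roughly 182.5 days (assumed)
print(round(restart_overhead_percent(lost_days, six_months), 2))
```

The result is a little over 5%, in line with the estimate above.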
Edit: I just realized that Ulrich already mentioned this in unnumbered item 4 on his list.
I would still encourage you to consider this as an option, especially since none of the alternatives seem like a robust solution. Each has a reason why it may not work.
I suppose the best thing to do would be to try one of those and, if it doesn't work, restart the job in a VM.
answered Jul 24 '12 at 18:55
bahamat
Take a peek at the tool CryoPID.
From the home page:
"CryoPID allows you to capture the state of a running process in Linux and save it to a file. This file can then be used to resume the process later on, either after a reboot or even on another machine."
edited Nov 26 at 21:33
Joseph Young
answered Jul 24 '12 at 18:29
Tim
Used this before to save the state of a Python script running on a Linux box, moved it to a FreeBSD box and resumed it there. Some arcane magic going on there ;)
– Tim
Jul 24 '12 at 18:30
I didn't know FreeBSD and Linux were binary-compatible; that's something very interesting I just learned. But does that mean they have exactly identical memory models? It seems incredible to me that they have the same syscall conventions, the same libc (I guess FreeBSD uses glibc), the same exact calling conventions at the asm level, etc. The incompatibilities sound to me as if you took a MacOS process and dumped it onto a Windows box; that's really quite amazing.
– cat
May 7 '16 at 15:33
Has anyone tried this recently? The site is gone, I can't find a .deb, building from source fails, etc. I'd like to know if it's possible before spending any longer on it. I'm on Debian if it matters.
– John P
Jan 25 '17 at 21:20
@JohnP It's available on GitHub now: github.com/maaziz/cryopid
– starbeamrainbowlabs
Jun 6 '17 at 9:18
If you end up needing to restart your program, I would encourage you to spend some time adding some features to your code that might save you time in the future.
If the process is going to be run for a long time, being able to save the entire process state when you restart the machine is perhaps not hugely helpful if your process crashes while it is running.
I would encourage you to have your program output to a file "checkpoint" data. This data should be sufficient that your program will be able to resume from the state it was at when the checkpoint file was saved. You need not save the entire process, just a snapshot of the relevant variables being used in your calculation, sufficient for your calculation to resume where it left off. Your code would also need to include some way of reading in the data from this file to obtain it's starting state.
You could set up your code so when you send it a signal, it saves one of these checkpoint files, so you can save the "state" of your calculation at any point.
Additionally, being able to see how the data changes as the calculation progresses might be interesting in itself!
If you end up needing to restart your program, I would encourage you to spend some time adding some features to your code that might save you time in the future.
If the process is going to be run for a long time, being able to save the entire process state when you restart the machine is perhaps not hugely helpful if your process crashes while it is running.
I would encourage you to have your program output to a file "checkpoint" data. This data should be sufficient that your program will be able to resume from the state it was at when the checkpoint file was saved. You need not save the entire process, just a snapshot of the relevant variables being used in your calculation, sufficient for your calculation to resume where it left off. Your code would also need to include some way of reading in the data from this file to obtain it's starting state.
You could set up your code so when you send it a signal, it saves one of these checkpoint files, so you can save the "state" of your calculation at any point.
Additionally, being able to see how the data changes as the calculation progresses might be interesting in itself!
answered Aug 2 '12 at 8:08
James Womack
2
Out of curiosity, what are you calculating?
– Viktor Mellgren
Jul 25 '12 at 9:48
2
@user1261166: I study the Target Visitation Problem (which is an extension of the Travelling Salesman Problem) with a Branch-and-Cut approach. Thus I need to know as many facets of some special high-dimensional polytope as possible. Basically, it's blowing up a big problem to a gigantic one and then trying to solve just a bit to reduce it afterwards.
– stefan
Jul 25 '12 at 21:58
4
It doesn't quite answer your question, but have you considered running your code on a dedicated cluster in the future? Those are rarely shut down, and I'm sure there is some computing grid available to you. Not only are they on all the time, they are also quite a bit faster (especially if you can parallelise your code). You could even have a go at setting one up yourself (look up Oracle Grid Engine).
– Wojtek Rzepala
Aug 2 '12 at 8:25
I never thought this would be such a popular question (at least far more popular than any other question of mine so far). Since the process has now finished (unexpectedly, though without a crash), I will try out each method shortly. Thanks everyone!
– stefan
Aug 8 '12 at 9:20
Just FYI, there's also a Computational Science SE
– Tobias Kienzler
Feb 4 '13 at 8:12