Clustered file service failure causes all nodes to go down? [closed]

I am planning a cluster of 3 active-active nodes running Red Hat Enterprise Linux 6.7, with local and shared file systems, so that all 3 servers can read and write some shared files concurrently. However, my system administrator told me that if the clustered file service goes down, all 3 nodes will go down with it. Is there any clustering approach that overcomes this?
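For concreteness, concurrent read-write access from all nodes on RHEL 6 typically implies a cluster filesystem such as GFS2 (from the Resilient Storage Add-On). A minimal sketch of such a shared mount follows; the device and mount-point names are hypothetical placeholders:

    # Hypothetical /etc/fstab entry, identical on all three nodes: a GFS2
    # filesystem on shared storage that every node mounts read-write at once.
    # Device and mount-point names are placeholders, not from the question.
    /dev/vg_shared/lv_data  /mnt/shared  gfs2  defaults,noatime  0 0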







closed as too broad by maulinglawns, Christopher, Thomas, Jeff Schaller, roaima Mar 23 at 23:38


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.














  • It might depend on the clustering implementation you're using. But you've forgotten to tell us what that is.
    – roaima
    Mar 23 at 23:38











  • One approach would be to have the shared filesystems be NFS filesystems, and the NFS server be a NetApp or some other NAS device with sufficient redundancy built-in. Or you might build your own NFS server cluster using DRBD, and have the worker nodes rely on that for shared filesystems. But still, if your workers need shared storage and it's down, then the workers will be unable to work too. The only way around it is replicating the data instead of sharing it, and that brings with it the problem of keeping the replicas in sync with each other. [...]
    – telcoM
    Mar 24 at 7:42










  • [...] If there is a chance that one of the workers is using a version of shared data that is out-of-sync with the others, then that worker must stop immediately and not pass on the stale data. Essentially, you must be able to treat the worker systems as expendable.
    – telcoM
    Mar 24 at 7:43
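To make telcoM's NFS approach degrade more gracefully on the worker side, the export can be soft-mounted so that an unreachable NFS server produces I/O errors rather than processes hung forever. A minimal sketch, with a hypothetical server name and export path; note that soft mounts trade hangs for possible I/O errors, so applications must be prepared to handle failed reads and writes:

    # Hypothetical /etc/fstab entry on each worker node. "soft" makes NFS
    # calls fail with an error after retrans retries instead of blocking
    # forever; timeo is the per-try timeout in tenths of a second.
    nfsserver:/export/shared  /mnt/shared  nfs  soft,timeo=100,retrans=3  0 0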














asked Mar 23 at 16:40 by Ken Sin, edited Mar 24 at 2:38
1 Answer
Your admin is correct: if the nodes share a single OS image on the clustered filesystem and the clustering fails, all nodes go down, because they can no longer reach files essential for operation.

You are also correct: if each node has its own local OS and the clustering fails, you should only lose access to the files inside the cluster, since the necessary operating system files are local. You do not mention which clustering you are using, so some further mechanism may be needed to ensure a failed cluster does not otherwise hang the system.

Your clustering would still seem to be a single point of failure for the shared files, but if it cannot take out entire hosts, that may reassure your admin.
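One such mechanism is a mount-health probe run from cron or a monitoring agent. A sketch, assuming a hypothetical mount point /mnt/shared: a stat() against a hung clustered filesystem can block indefinitely, so the check runs under coreutils timeout, letting you stop dependent services cleanly instead of letting them hang.

    #!/bin/sh
    # Sketch of a mount-health probe; /mnt/shared is a placeholder.
    # Run the stat under a 5-second timeout so the probe itself cannot
    # block on a hung clustered mount.
    if ! timeout 5 stat -t /mnt/shared >/dev/null 2>&1; then
        logger -t cluster-check "shared filesystem unresponsive on $(hostname)"
        # e.g. stop services that depend on the mount rather than letting
        # them hang:  service myapp stop   (hypothetical service name)
    fi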






answered Mar 23 at 19:15 by GracefulRestart