Big data: which is the right filesystem, ext4 or XFS?

We have Red Hat Linux version 7.2 with the XFS file system.

From /etc/fstab:

/dev/mapper/vgCLU_HDP-root / xfs defaults 0 0
UUID=7de1dc5c-b605-4a6f-bdf1-f1e869f6ffb9 /boot xfs defaults 0 0
/dev/mapper/vgCLU_HDP-var /var xfs defaults 0 0


The machines are used for Hadoop clusters.

I am wondering which file system is best for this purpose.

Which is better, ext4 or XFS, given that the machines are used for a Hadoop cluster?









asked Apr 29 at 15:20 by yael
edited Apr 30 at 4:06 by A. Rawson











  • Both work fine. I'm running a cluster using both (mostly because the provisioning scripts create XFS and we forget to reformat the disks before installing Hadoop)
    – cricket_007
    Apr 30 at 4:11

  • I assume that you mean Red Hat Enterprise Linux 7.2, not Red Hat Linux 7.2, right?
    – mattdm
    Apr 30 at 5:22

  • Yes, we have Red Hat version 7.2.
    – yael
    Apr 30 at 10:04
















2 Answers
This is addressed in this knowledge base article; the main consideration for you will be the supported limits: ext4 is supported up to 50 TB, XFS up to 500 TB. For really big data, you'd probably end up looking at shared storage, which by default means GFS2 on RHEL 7, except that for Hadoop you'd use HDFS or GlusterFS.

For local storage on RHEL the default is XFS, and you should generally use that unless you have specific reasons not to.
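
As a rough illustration of the support limits quoted above, here is a minimal Python sketch that reads fstab-style entries and checks a file system size against those figures. The helper names `parse_fstab` and `within_support_limit` are invented for this example, the sample entries are taken from the question, and treating 1 TB as 10**12 bytes is an assumption (the vendor figures may be stated in binary units).

```python
# Support limits as quoted in the answer (RHEL 7): ext4 50 TB, XFS 500 TB.
TB = 10**12  # assumption: decimal terabytes
SUPPORT_LIMIT_BYTES = {"ext4": 50 * TB, "xfs": 500 * TB}

# Two entries copied from the /etc/fstab excerpt in the question.
SAMPLE_FSTAB = """\
/dev/mapper/vgCLU_HDP-root / xfs defaults 0 0
/dev/mapper/vgCLU_HDP-var /var xfs defaults 0 0
"""


def parse_fstab(text):
    """Return (device, mount point, fs type) for each fstab entry in text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def within_support_limit(fs_type, size_bytes):
    """True or False against the quoted limit, None if no limit is listed."""
    limit = SUPPORT_LIMIT_BYTES.get(fs_type.lower())
    if limit is None:
        return None
    return size_bytes <= limit


# Hypothetical planning question: would a 60 TB volume stay within the limit?
planned_size = 60 * TB
for device, mount_point, fs_type in parse_fstab(SAMPLE_FSTAB):
    print(mount_point, fs_type, within_support_limit(fs_type, planned_size))
```

With these figures, a hypothetical 60 TB volume stays within the XFS limit but would exceed the ext4 one, which is the kind of check the support limits suggest doing before choosing.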







answered Apr 29 at 15:47 by Stephen Kitt (accepted answer), edited Apr 30 at 4:45






















XFS is an excellent filesystem, especially for large files. If your load involves lots of small files, periodically cleaning up any fragmentation may improve performance. I don't worry about it and use XFS for all loads. It is well supported, so there is no reason not to use it.

If you want to find out what is best for your typical workload, set aside a machine and disk for your own testing of various filesystems. Working the test load in steps over the entire disk can tell you something about how the filesystem being tested behaves.

Testing your load on your machine is the only way to be sure.
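
To make the testing suggestion concrete, here is a small Python sketch of a write test, not a real benchmark suite. It times writing a few large files and many small files with the same total size into a directory on the filesystem under test. The function names, file counts and sizes are placeholders chosen for illustration; they are not tuned for Hadoop, and a real test should replay your actual workload.

```python
import os
import tempfile
import time

# Placeholder workloads: (label, number of files, bytes per file), 32 MiB each.
DEFAULT_WORKLOADS = [("large", 4, 8 << 20), ("small", 2048, 16 << 10)]


def time_writes(directory, file_count, file_size, chunk_size=1 << 20):
    """Write file_count files of file_size bytes each; return elapsed seconds."""
    chunk = b"\0" * min(chunk_size, file_size)
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(directory, f"bench_{i:06d}.dat")
        with open(path, "wb") as fh:
            remaining = file_size
            while remaining > 0:
                n = min(len(chunk), remaining)
                fh.write(chunk[:n])
                remaining -= n
            fh.flush()
            os.fsync(fh.fileno())  # push data to disk so caching does not hide the cost
    return time.perf_counter() - start


def run_benchmark(target_dir, workloads=DEFAULT_WORKLOADS):
    """Run each workload in a scratch directory under target_dir; return seconds per label."""
    results = {}
    for label, count, size in workloads:
        with tempfile.TemporaryDirectory(dir=target_dir) as workdir:
            results[label] = time_writes(workdir, count, size)
    return results
```

Calling `run_benchmark("/mnt/test_xfs")` and then `run_benchmark("/mnt/test_ext4")` (both mount points are hypothetical) on the same disk, formatted each way in turn, gives a first rough comparison. Repeat the runs several times, since a single timing is noisy.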







answered Apr 29 at 17:09 by casualunixer






















                       
