Can I configure my Linux system for more aggressive file system caching?












up vote
100
down vote

favorite
60












I am concerned neither about RAM usage (I've got enough) nor about losing data in an accidental shutdown (my power is backed up, the system is reliable and the data are not critical). But I do a lot of file processing and could use a performance boost.



That's why I'd like to set the system up to use more RAM for file system read and write caching, to prefetch files aggressively (e.g. read ahead the whole file accessed by an application if the file is of a sane size, or at least a big chunk of it otherwise) and to flush write buffers less frequently. How can I achieve this, if it is possible?



I use ext3 and ntfs (I use ntfs a lot!) file systems with XUbuntu 11.10 x86.

































  • Do you have a raid-controller or a "normal" disc controller capable of doing write-ahead?
    – Nils
    Mar 18 '12 at 20:37






  • 5




    If you have lots of RAM, care a lot about performance and don't care about data loss, just copy all your data to a RAM disk and serve it from there, discarding all updates on crash/shutdown. If that won't work for you, you may need to qualify "enough" for RAM or how critical the data isn't.
    – James Youngman
    Mar 19 '12 at 0:16






  • 1




    @Nils, the computer is a laptop, so, I believe, the controller is pretty ordinary.
    – Ivan
    Mar 19 '12 at 0:17






  • 1




    One way to improve performance a lot is to skip durability of data: simply disable syncing to disk even if some applications request a sync. This will cause data loss if your storage device ever loses power. If you want to do it anyway, simply execute sudo mount -o remount,nobarrier /path/to/mountpoint or adjust /etc/fstab to include nobarrier for any filesystem that you're willing to sacrifice for improved performance. However, if your storage device has an internal battery, such as the Intel 320 SSD series, using nobarrier causes no data loss.
    – Mikko Rantalainen
    Apr 11 '14 at 8:39






  • 1




    The use of nobarrier is no longer recommended in Red Hat Enterprise Linux 6 as the negative performance impact of write barriers is negligible (approximately 3%). The benefits of write barriers typically outweigh the performance benefits of disabling them. Additionally, the nobarrier option should never be used on storage configured on virtual machines. access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/…
    – Ivailo Bardarov
    Oct 20 '17 at 18:46

































linux filesystems performance fstab sysctl














edited Jun 28 '12 at 19:54









bahamat











asked Jan 29 '12 at 4:22









Ivan






















6 Answers

















up vote
92
down vote













Improving disk cache performance in general takes more than just increasing the file system cache size, unless your whole system fits in RAM. In that case you should use a RAM drive for runtime storage (tmpfs is good because it can fall back to swap on disk if the RAM is needed for something else), and perhaps an initrd script to copy the system from storage to the RAM drive at startup.



You didn't say whether your storage device is an SSD or an HDD. Here's what I've found to work for me (in my case sda is an HDD mounted at /home and sdb is an SSD mounted at /).



First optimize the load-stuff-from-storage-to-cache part:



Here's my setup for HDD (make sure AHCI+NCQ is enabled in BIOS if you have toggles):



echo cfq > /sys/block/sda/queue/scheduler
echo 10000 > /sys/block/sda/queue/iosched/fifo_expire_async
echo 250 > /sys/block/sda/queue/iosched/fifo_expire_sync
echo 80 > /sys/block/sda/queue/iosched/slice_async
echo 1 > /sys/block/sda/queue/iosched/low_latency
echo 6 > /sys/block/sda/queue/iosched/quantum
echo 5 > /sys/block/sda/queue/iosched/slice_async_rq
echo 3 > /sys/block/sda/queue/iosched/slice_idle
echo 100 > /sys/block/sda/queue/iosched/slice_sync
hdparm -q -M 254 /dev/sda


Worth noting for the HDD case are the high fifo_expire_async (usually writes) and the long slice_sync, which allow a single process to get high throughput (set slice_sync to a lower number if you hit situations where multiple processes are waiting for data from the disk in parallel). The slice_idle value is always a compromise for HDDs, but setting it somewhere in the range 3-20 should be okay depending on disk usage and disk firmware. I prefer to aim for low values, but setting it too low will destroy your throughput.

The quantum setting seems to affect throughput a lot, but try to keep it as low as possible to keep latency at a sensible level. Setting quantum too low will destroy throughput. Values in the range 3-8 seem to work well with HDDs. The worst-case latency for a read is (quantum * slice_sync) + (slice_async_rq * slice_async) ms, if I've understood the kernel behavior correctly.

Async requests are mostly writes, and since you're willing to delay writing to disk, set both slice_async_rq and slice_async to very low numbers. However, setting slice_async_rq too low may stall reads, because writes can then no longer be delayed after reads. My config tries to write data to disk at most 10 seconds after it has been passed to the kernel, but since you can tolerate losing data on power loss, you can also set fifo_expire_async to 3600000 to say that a delay of 1 hour is okay. Just keep slice_async low, though, because otherwise you can get high read latency.
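
To make the worst-case formula above concrete, here is a small sketch that plugs in the HDD values from my setup. The function name is made up for illustration, and the result is only as reliable as the formula itself, which rests on my reading of the kernel behavior.

```shell
# Worst-case read latency estimate in ms, using the formula quoted above:
#   (quantum * slice_sync) + (slice_async_rq * slice_async)
# The function name is hypothetical; it is not part of any tool.
worst_case_read_latency_ms() {
    quantum=$1; slice_sync=$2; slice_async_rq=$3; slice_async=$4
    echo $(( quantum * slice_sync + slice_async_rq * slice_async ))
}

# HDD values from the setup above:
# quantum=6, slice_sync=100, slice_async_rq=5, slice_async=80
worst_case_read_latency_ms 6 100 5 80    # prints 1000
```

So with these HDD settings a read may wait up to about one second in the worst case, which is why lowering slice_async matters if you care about read latency.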



The hdparm command is required to prevent AAM from killing much of the performance that AHCI+NCQ allows. If your disk makes too much noise, then skip this.



Here's my setup for SSD (Intel 320 series):



echo cfq > /sys/block/sdb/queue/scheduler
echo 1 > /sys/block/sdb/queue/iosched/back_seek_penalty
echo 10000 > /sys/block/sdb/queue/iosched/fifo_expire_async
echo 20 > /sys/block/sdb/queue/iosched/fifo_expire_sync
echo 1 > /sys/block/sdb/queue/iosched/low_latency
echo 6 > /sys/block/sdb/queue/iosched/quantum
echo 2 > /sys/block/sdb/queue/iosched/slice_async
echo 10 > /sys/block/sdb/queue/iosched/slice_async_rq
echo 1 > /sys/block/sdb/queue/iosched/slice_idle
echo 20 > /sys/block/sdb/queue/iosched/slice_sync


Here it's worth noting the low values for the different slice settings. The most important setting for an SSD is slice_idle, which must be set to 0-1. Setting it to zero moves all ordering decisions to native NCQ, while setting it to 1 allows the kernel to order requests (but if NCQ is active, the hardware may partially override the kernel's ordering). Test both values to see whether you can tell the difference. For the Intel 320 series, it seems that setting slice_idle to 0 gives the best throughput but setting it to 1 gives the best (lowest) overall latency.



For more information about these tunables, see http://www.linux-mag.com/id/7572/.
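
Before applying any of the echo lines above, it may be worth checking which schedulers your kernel actually offers. The scheduler file lists the available ones and marks the active one in brackets, e.g. "noop deadline [cfq]". Newer kernels no longer ship CFQ at all (it was dropped together with the legacy block layer in Linux 5.0), and the iosched tunables only exist for the scheduler that is currently active. The helper names below are my own invention, just a sketch:

```shell
# Print the active scheduler from a sysfs scheduler file such as
# /sys/block/sda/queue/scheduler (the active one is shown in brackets).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Succeed if the named scheduler ($1) is listed in the file ($2),
# whether it is currently active or not.
scheduler_available() {
    sed 's/[][]//g' "$2" | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example (needs root, device name is just the one used in this answer):
#   if scheduler_available cfq /sys/block/sda/queue/scheduler; then
#       echo cfq > /sys/block/sda/queue/scheduler
#   fi
```

If cfq is not listed, the slice_* and quantum settings above simply do not apply to that kernel.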



Now that we have configured kernel to load stuff from disk to cache with sensible performance, it's time to adjust the cache behavior:



According to benchmarks I've done, I wouldn't bother setting read ahead via blockdev at all. Kernel default settings are fine.



Set the system to prefer evicting cached file data over swapping out application code (this does not matter if you have enough RAM to keep the whole filesystem, all the application code and all virtual memory allocated by applications in RAM). This favors low latency when switching between applications over low latency when accessing big files from a single application:



echo 15 > /proc/sys/vm/swappiness


If you prefer to keep applications nearly always in RAM you could set this to 1. If you set this to zero, kernel will not swap at all unless absolutely necessary to avoid OOM. If you were memory limited and working with big files (e.g. HD video editing), then it might make sense to set this close to 100.



I nowadays (2017) prefer to have no swap at all if there is enough RAM. Having no swap usually costs 200-1000 MB of RAM on a long-running desktop machine. I'm willing to sacrifice that much to avoid the worst-case latency (swapping application code in when RAM is full). In practice, this means that I prefer the OOM Killer to swapping. If you allow or need swapping, you might want to increase /proc/sys/vm/watermark_scale_factor, too, to avoid some latency. I would suggest values between 100 and 500. You can think of this setting as trading CPU usage for lower swap latency. The default is 10 and the maximum possible is 1000. A higher value should (according to the kernel documentation) result in higher CPU usage for kswapd processes and lower overall swapping latency.



Next, tell kernel to prefer keeping directory hierarchy in memory over file contents in case some RAM needs to be freed (again, if everything fits in RAM, this setting does nothing):



echo 10 > /proc/sys/vm/vfs_cache_pressure


Setting vfs_cache_pressure to a low value makes sense because in most cases the kernel needs to know the directory structure before it can use file contents from the cache, and flushing the directory cache too soon makes the file cache next to worthless. Consider going all the way down to 1 if you have lots of small files (my system has around 150K 10-megapixel photos and counts as a "lots of small files" system). Never set it to zero, or the directory structure is always kept in memory even when the system is running out of memory. Setting it to a big value is sensible only if you have just a few big files that are constantly being re-read (again, HD video editing without enough RAM would be an example). The official kernel documentation says that "increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact".



Exception: if you have a truly massive number of files and directories and you rarely touch, read or list all of them, setting vfs_cache_pressure higher than 100 may be wise. This only applies if you do not have enough RAM to keep the whole directory structure in RAM while still leaving enough for the normal file cache and processes (e.g. a company-wide file server with lots of archival content). If you feel that you need to increase vfs_cache_pressure above 100, you're running without enough RAM. Increasing vfs_cache_pressure may help, but the only real fix is to get more RAM. Setting vfs_cache_pressure to a high number sacrifices average performance for more stable performance overall (that is, you avoid really bad worst-case behavior but have to accept worse overall performance).



Finally, tell the kernel to allow up to 99% of RAM to be used as cache for writes before the writing process itself is blocked (dirty_ratio), and to start background writeback only once 50% of RAM is dirty (the default for dirty_background_ratio is 10). Warning: I personally would not do this, but you claimed to have enough RAM and are willing to lose the data.



echo 99 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/dirty_background_ratio
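
To get a feeling for what those percentages mean in absolute terms, here is a rough sketch. According to the kernel documentation the ratios are percentages of available memory (free plus reclaimable pages), not of total RAM. I use the MemAvailable field of /proc/meminfo as a stand-in for that, which is an approximation on my part and not the exact number the kernel computes. The function name is made up, and older kernels may not have the MemAvailable field at all.

```shell
# Rough estimate, in kB, of how much dirty data a given ratio allows.
# $1 = percentage (e.g. the dirty_ratio or dirty_background_ratio value)
# $2 = meminfo file to read (defaults to /proc/meminfo)
# Assumption: MemAvailable approximates the kernel's "available memory".
dirty_limit_kb() {
    awk -v pct="$1" '/^MemAvailable:/ { printf "%d\n", $2 * pct / 100 }' \
        "${2:-/proc/meminfo}"
}

# Example:
#   dirty_limit_kb 99    # approximate blocking threshold in kB
#   dirty_limit_kb 50    # approximate background writeback threshold in kB
```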


And tell the kernel that a write delay of 1 hour is okay before it even starts writing to the disk (again, I would not do this):



echo 360000 > /proc/sys/vm/dirty_expire_centisecs
echo 360000 > /proc/sys/vm/dirty_writeback_centisecs
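
The /proc/sys/vm settings above (but not the /sys/block ones) can also be expressed as sysctl keys, since /proc/sys/vm/NAME corresponds to vm.NAME. The following sketch just prints such a fragment with the values from this answer. The function name and the file name in the usage comment are hypothetical, and whether your distribution reads /etc/sysctl.d is something you should check yourself.

```shell
# Print a sysctl configuration fragment with the vm values used above.
print_vm_sysctl_conf() {
    cat <<'EOF'
# Values taken from the answer above; adjust before use.
vm.swappiness = 15
vm.vfs_cache_pressure = 10
vm.dirty_ratio = 99
vm.dirty_background_ratio = 50
vm.dirty_expire_centisecs = 360000
vm.dirty_writeback_centisecs = 360000
EOF
}

# Example (needs root, file name is made up):
#   print_vm_sysctl_conf > /etc/sysctl.d/90-aggressive-cache.conf
#   sysctl --system
```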


If you put all of those into /etc/rc.local and include the following at the end, everything will be in the cache as soon as possible after boot (only do this if your filesystem really fits in RAM):



(nice find / -type f -and -not -path '/sys/*' -and -not -path '/proc/*' -print0 2>/dev/null | nice ionice -c 3 wc -l --files0-from - > /dev/null)&


Or a slightly simpler alternative that might work better (it caches only /home and /usr; only do this if your /home and /usr really fit in RAM):



(nice find /home /usr -type f -print0 | nice ionice -c 3 wc -l --files0-from - > /dev/null)&
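
Since both variants only make sense when the data really fits in RAM, a quick sanity check before enabling them might look like the sketch below. It compares the disk usage of a directory with the MemAvailable value from /proc/meminfo. The function name is made up, MemAvailable is only present on reasonably recent kernels, and the comparison ignores everything else that will want memory later, so treat the result as a hint only.

```shell
# Succeed if the directory's disk usage is below the available memory.
# $1 = directory to check
# $2 = meminfo file to read (defaults to /proc/meminfo)
fits_in_ram() {
    dir_kb=$(du -sk "$1" | awk '{ print $1 }')
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "${2:-/proc/meminfo}")
    [ "$dir_kb" -lt "$avail_kb" ]
}

# Example:
#   fits_in_ram /home && echo "/home should fit in the page cache"
```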























  • 3




    A well-informed and overall much better answer than the accepted one! This one is underrated... I guess most people just want simple instructions without bothering to understand what they really do...
    – Vladimir Panteleev
    Jan 26 '13 at 18:17






  • 2




    @Phpdevpad: In addition, the question said "I am neither concerned about RAM usage [...]"--I don't think any Maemo device qualifies.
    – Mikko Rantalainen
    Jan 28 '13 at 7:11






  • 1




    Isn't noop or deadline a better scheduler for SSDs?
    – rep_movsd
    Aug 14 '13 at 7:58






  • 1




    @rep_movsd I've been using only intel SSD drives but at least these drives are still slow enough to have better overall performance with more intelligent schedulers such as CFQ. I'd guess that if your SSD drive can deal with more than 100K random IOPS, using noop or deadline would make sense even with fast CPU. With "fast CPU" I mean something that has at least multiple 3GHz cores available for IO only.
    – Mikko Rantalainen
    Aug 15 '13 at 8:58











  • You can also read about these vm tunables from the vm kernel docs.
    – joeytwiddle
    3 hours ago

















up vote
15
down vote













Firstly, I DO NOT recommend that you continue using NTFS, as the NTFS implementation in Linux can cause performance and security trouble at any time.



There are several things you can do:



  • use some newer fs such as ext4 or btrfs

  • try to change your io scheduler, for example bfq

  • turn off swap

  • use some automatic preloader like preload

  • use something like systemd to preload while booting

  • ... and something more

Maybe you want to give it a try :-)






















  • 1




    I've already moved entirely away from NTFS to ext4 once, leaving the Windows system partition as the only NTFS partition. But that turned out to cause many inconveniences for me, and I have gone back to NTFS as the file system of my main data partition (where I store all my documents, downloads, projects, source code etc.). I haven't given up on rethinking my partition structure and my workflow (to use Windows less), but right now giving up NTFS doesn't seem a realistic option.
    – Ivan
    Feb 3 '12 at 12:39











  • If you have to use your data inside Windows too, NTFS may be the only option. (many other options available if you can use your Windows just as a VM inside linux)
    – Felix Yan
    Feb 3 '12 at 12:41






  • 1




    A summary of what these supposed problems are of NTFS would have been useful.
    – underscore_d
    Oct 5 '15 at 22:40






  • 1




    NTFS on Linux is pretty much acceptable except for the performance. Considering that the question was specifically about improving file system performance, NTFS should be the first thing to go.
    – Mikko Rantalainen
    Apr 12 at 12:51










  • Even though btrfs is a recently designed file system, I would avoid it if performance is needed. We've been running otherwise identical systems with btrfs and ext4 file systems, and ext4 wins in the real world by a big margin (btrfs seems to require about 4x the CPU time that ext4 needs for the same performance level, and causes more disk operations for a single logical command). Depending on the workload, I would suggest ext4, jfs or xfs for any performance-demanding work.
    – Mikko Rantalainen
    May 15 at 6:00

















up vote
7
down vote













Read ahead:



On 32 bit systems:



blockdev --setra 8388607 /dev/sda


On 64 bit systems:



blockdev --setra 4294967295 /dev/sda


Write behind cache:



echo 100 > /proc/sys/vm/dirty_ratio


This will use up to 100% of your free memory as write cache.



Or you can go all out and use tmpfs. This is only relevant if you have enough RAM. Put this in /etc/fstab, replacing 100G with the amount of physical RAM:



tmpfs /mnt/tmpfs tmpfs size=100G,rw,nosuid,nodev 0 0


Then:



mkdir /mnt/tmpfs; mount -a


Then use /mnt/tmpfs.






















  • 3




    3GB or 2TB readahead? really? Do you even know what these options do?
    – Cobra_Fast
    Dec 25 '13 at 4:11






  • 1




    @Cobra_Fast Do you know what it means? I really have no idea and I am interested now.
    – syss
    Jun 15 '15 at 19:22







  • 2




    @syss the readahead settings are saved as number of memory "blocks", not bytes or bits. The size of one block is determined at kernel compilation time (since readahead-blocks are memory blocks) or filesystem creation time in some cases. Normally though, 1 block contains 512 or 4096 bytes. See linux.die.net/man/8/blockdev
    – Cobra_Fast
    Jun 15 '15 at 22:32


















up vote
6
down vote













You can set the read-ahead size with blockdev --setra sectors /dev/sda1, where sectors is the size you want in 512 byte sectors.
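
Because the value is counted in 512-byte sectors, it is easy to be off by a large factor. A small conversion helper, sketched here with a made-up function name, keeps the arithmetic explicit:

```shell
# Convert a read-ahead size in KiB to the number of 512-byte sectors
# that blockdev --setra expects.
kib_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

kib_to_sectors 4096    # prints 8192, i.e. 4 MiB of read-ahead

# Example (needs root, device name as in the answer above):
#   blockdev --getra /dev/sda1
#   blockdev --setra "$(kib_to_sectors 4096)" /dev/sda1
```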

































    up vote
    1
    down vote













    My killer setting is very simple and very effective:



    echo "2000" > /proc/sys/vm/vfs_cache_pressure


    The explanation from kernel documentation:




    vfs_cache_pressure



    Controls the tendency of the kernel to reclaim the memory which is
    used for caching of directory and inode objects.



    At the default value of vfs_cache_pressure=100 the kernel will attempt
    to reclaim dentries and inodes at a "fair" rate with respect to
    pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
    the kernel to prefer to retain dentry and inode caches. When
    vfs_cache_pressure=0, the kernel will never reclaim dentries and
    inodes due to memory pressure and this can easily lead to
    out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
    causes the kernel to prefer to reclaim dentries and inodes.




    With vfs_cache_pressure at 2000, most of the computing happens in RAM and disk writes happen very late.
























    • 4




      Setting vfs_cache_pressure too high (I would consider 2000 too high) will cause unnecessary disk access even for simple stuff such as directory listings which should easily fit in cache. How much RAM do you have and what are you doing with the system? As I wrote in my answer, using high value for this setting makes sense for e.g. HD video editing with limited RAM.
      – Mikko Rantalainen
      Sep 30 '14 at 10:51






    • 1




      Note that the referenced documentation continues: "Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are."
      – Mikko Rantalainen
      Mar 14 at 7:02

















    up vote
    0
    down vote













    Not related to write caching, but related to writes:




    • For an ext4 system, you could disable journaling entirely



      This will reduce the number of disk writes for any particular update, but may leave the filesystem in an inconsistent state after an unexpected shutdown, requiring an fsck or worse.



    To stop disk reads from triggering disk writes:




    • Mount with the noatime option



      When you read a file, the "last accessed time" metadata for that file is usually updated. The noatime option will disable that behaviour. This reduces unnecessary disk writes, but you will no longer have that metadata. Do you ever use that data? Some distributions are adopting this as default on all partitions (probably to increase the lifespan of earlier model SSDs).
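
To see whether a mount point already uses noatime, you could look at its options in /proc/mounts. The helper below is only a sketch with a made-up name. Note that many systems mount with relatime by default, which already limits how often access times are written, so the gain from noatime may be smaller than you expect.

```shell
# Succeed if the given mount point is mounted with the noatime option.
# $1 = mount point
# $2 = mounts table to read (defaults to /proc/mounts)
has_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example (the remount needs root, /home is just an illustration):
#   has_noatime /home || mount -o remount,noatime /home
```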



    Other options:



    • In the comments above, Mikko shared the possibility of mounting with the nobarrier option. But Ivailo quoted RedHat who caution against it. How badly do you want that extra 3%?

























      Your Answer








      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "106"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f30286%2fcan-i-configure-my-linux-system-for-more-aggressive-file-system-caching%23new-answer', 'question_page');

      );

      Post as a guest






























      up vote
      92
      down vote













      Improving disk cache performance in general takes more than just increasing the file system cache size, unless your whole system fits in RAM. In that case you should use a RAM drive for runtime storage (tmpfs is good because it can fall back to disk, via swap, if you need the RAM for something else), and perhaps an initrd script to copy the system from storage to the RAM drive at startup.



      You didn't say whether your storage device is an SSD or an HDD. Here's what I've found to work for me (in my case sda is an HDD mounted at /home and sdb is an SSD mounted at /).



      First optimize the load-stuff-from-storage-to-cache part:



      Here's my setup for HDD (make sure AHCI+NCQ is enabled in BIOS if you have toggles):



      echo cfq > /sys/block/sda/queue/scheduler
      echo 10000 > /sys/block/sda/queue/iosched/fifo_expire_async
      echo 250 > /sys/block/sda/queue/iosched/fifo_expire_sync
      echo 80 > /sys/block/sda/queue/iosched/slice_async
      echo 1 > /sys/block/sda/queue/iosched/low_latency
      echo 6 > /sys/block/sda/queue/iosched/quantum
      echo 5 > /sys/block/sda/queue/iosched/slice_async_rq
      echo 3 > /sys/block/sda/queue/iosched/slice_idle
      echo 100 > /sys/block/sda/queue/iosched/slice_sync
      hdparm -q -M 254 /dev/sda


      Worth noting for the HDD case are the high fifo_expire_async (usually writes) and the long slice_sync, which allow a single process to get high throughput (set slice_sync to a lower number if you hit situations where multiple processes are waiting for data from the disk in parallel). The slice_idle setting is always a compromise for HDDs, but a value somewhere in the range 3-20 should be okay depending on disk usage and disk firmware. I prefer to target low values, but setting it too low will destroy your throughput. The quantum setting seems to affect throughput a lot; keep it as low as possible to hold latency at a sensible level, but note that setting it too low will also destroy throughput. Values in the range 3-8 seem to work well with HDDs. The worst case latency for a read is (quantum * slice_sync) + (slice_async_rq * slice_async) ms, if I've understood the kernel behavior correctly. The async queue is mostly used by writes and, since you're willing to delay writing to disk, set both slice_async_rq and slice_async to very low numbers. However, setting slice_async_rq too low may stall reads because writes can no longer be delayed until after reads. My config tries to write data to disk at most 10 seconds after it has been passed to the kernel, but since you can tolerate loss of data on power loss, also set fifo_expire_async to 3600000 to tell the kernel that a 1 hour delay to disk is okay. Just keep slice_async low, though, because otherwise you can get high read latency.
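The worst case formula is easy to evaluate for a given set of values. A minimal sketch (the function name worst_case_read_latency_ms is made up here, and the formula itself carries the same caveat as above, namely that it reflects one reading of the kernel behavior):

```shell
# worst_case_read_latency_ms QUANTUM SLICE_SYNC SLICE_ASYNC_RQ SLICE_ASYNC
# Evaluates (quantum * slice_sync) + (slice_async_rq * slice_async) in ms.
worst_case_read_latency_ms() {
    echo $(( $1 * $2 + $3 * $4 ))
}

# HDD values from the setup above:
# quantum=6, slice_sync=100, slice_async_rq=5, slice_async=80
worst_case_read_latency_ms 6 100 5 80    # prints 1000
```

So the HDD configuration accepts roughly one second of worst case read latency in exchange for throughput. Lowering slice_async is the cheapest way to shrink that number.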



      The hdparm command (-M 254 selects the fastest and loudest Automatic Acoustic Management setting) is required to prevent AAM from killing much of the performance that AHCI+NCQ allows. If your disk makes too much noise, then skip this.
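These sysfs files only exist while the CFQ scheduler is available and selected, and newer kernels may not expose them at all (CFQ was removed together with the legacy block layer in Linux 5.0). A defensive wrapper avoids a wall of errors in that case. This is a sketch under that assumption; set_tunable and apply_hdd_tunables are hypothetical helper names, not part of any tool.

```shell
# set_tunable FILE VALUE
# Writes VALUE to FILE only if FILE exists and is writable; otherwise
# prints a note and carries on, so one missing tunable does not abort
# the rest of the script.
set_tunable() {
    if [ -w "$1" ]; then
        echo "$2" > "$1"
    else
        echo "skipping $1 (missing or not writable)" >&2
    fi
}

# apply_hdd_tunables DEVICE
# Applies a few of the HDD values from the setup above to DEVICE (e.g. sda).
# Defined here but not called; run it as root, e.g.: apply_hdd_tunables sda
apply_hdd_tunables() {
    queue="/sys/block/$1/queue"
    set_tunable "$queue/scheduler" cfq
    set_tunable "$queue/iosched/slice_idle" 3
    set_tunable "$queue/iosched/quantum" 6
}
```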



      Here's my setup for SSD (Intel 320 series):



      echo cfq > /sys/block/sdb/queue/scheduler
      echo 1 > /sys/block/sdb/queue/iosched/back_seek_penalty
      echo 10000 > /sys/block/sdb/queue/iosched/fifo_expire_async
      echo 20 > /sys/block/sdb/queue/iosched/fifo_expire_sync
      echo 1 > /sys/block/sdb/queue/iosched/low_latency
      echo 6 > /sys/block/sdb/queue/iosched/quantum
      echo 2 > /sys/block/sdb/queue/iosched/slice_async
      echo 10 > /sys/block/sdb/queue/iosched/slice_async_rq
      echo 1 > /sys/block/sdb/queue/iosched/slice_idle
      echo 20 > /sys/block/sdb/queue/iosched/slice_sync


      Here it's worth noting the low values for the different slice settings. The most important setting for an SSD is slice_idle, which must be set to 0 or 1. Setting it to zero moves all ordering decisions to native NCQ, while setting it to 1 allows the kernel to order requests (but if NCQ is active, the hardware may partially override the kernel ordering). Test both values to see if you can tell the difference. For the Intel 320 series, it seems that setting slice_idle to 0 gives the best throughput but setting it to 1 gives the best (lowest) overall latency.
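Before comparing values it is worth confirming which scheduler is actually active, since the iosched files belong to whichever scheduler is selected. The scheduler file lists the available schedulers and marks the active one with square brackets. A small sketch (active_scheduler is a made-up helper name):

```shell
# active_scheduler FILE
# Extracts the bracketed entry from a scheduler file,
# e.g. "noop deadline [cfq]" gives "cfq".
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Print the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    echo "$f: $(active_scheduler "$f")"
done
```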



      For more information about these tunables, see http://www.linux-mag.com/id/7572/.



      Now that we have configured the kernel to load stuff from disk to cache with sensible performance, it's time to adjust the cache behavior:



      According to benchmarks I've done, I wouldn't bother setting read ahead via blockdev at all. Kernel default settings are fine.



      Set the system to prefer evicting cached file data over swapping out application memory (this does not matter if you have enough RAM to keep the whole filesystem, all the application code and all virtual memory allocated by applications in RAM). This favors low latency when switching between different applications over low latency when accessing big files from a single application:



      echo 15 > /proc/sys/vm/swappiness


      If you prefer to keep applications nearly always in RAM, you could set this to 1. If you set this to zero, the kernel will not swap at all unless absolutely necessary to avoid OOM. If you are memory limited and work with big files (e.g. HD video editing), then it might make sense to set this close to 100.



      Nowadays (2017) I prefer to have no swap at all, provided there is enough RAM. Running without swap usually costs 200-1000 MB of RAM on a long running desktop machine (memory that would otherwise have been swapped out). I'm willing to sacrifice that much to avoid the worst case latency scenario (swapping application code back in when RAM is full). In practice, this means that I prefer the OOM Killer to swapping. If you allow or need swapping, you might want to increase /proc/sys/vm/watermark_scale_factor, too, to avoid some latency. I would suggest values between 100 and 500. You can consider this setting as trading CPU usage for lower swap latency. The default is 10 and the maximum possible is 1000. A higher value should (according to the kernel documentation) result in higher CPU usage for the kswapd processes and lower overall swapping latency.
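To get a feel for what a given watermark_scale_factor means: the kernel documentation describes the unit as fractions of 10,000 of available memory, so the default of 10 corresponds to 0.1%. A rough sketch of the arithmetic (watermark_gap_kb is a made-up name, the 16 GiB figure is an assumed example, and the real calculation is done per memory node, so treat the result as an approximation):

```shell
# watermark_gap_kb MEMORY_KB SCALE_FACTOR
# Approximate distance between watermarks, in kB, for the given amount
# of memory and watermark_scale_factor (unit: fractions of 10,000).
watermark_gap_kb() {
    echo $(( $1 * $2 / 10000 ))
}

# Assumed example machine with 16 GiB (16777216 kB) of memory.
watermark_gap_kb 16777216 10     # default factor
watermark_gap_kb 16777216 200    # a value in the suggested 100-500 range
```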



      Next, tell the kernel to prefer keeping the directory hierarchy in memory over file contents in case some RAM needs to be freed (again, if everything fits in RAM, this setting does nothing):



      echo 10 > /proc/sys/vm/vfs_cache_pressure


      Setting vfs_cache_pressure to a low value makes sense because, in most cases, the kernel needs to know the directory structure before it can use file contents from the cache, and flushing the directory cache too soon will make the file cache next to worthless. Consider going all the way down to 1 with this setting if you have lots of small files (my system has around 150K 10 megapixel photos and counts as a "lots of small files" system). Never set it to zero, or the directory structure is always kept in memory even if the system is running out of memory. Setting this to a big value is sensible only if you have just a few big files that are constantly being re-read (again, HD video editing without enough RAM would be an example case). The official kernel documentation says that "increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact".



      Exception: if you have a truly massive number of files and directories and you rarely touch, read or list all of them, setting vfs_cache_pressure higher than 100 may be wise. This only applies if you do not have enough RAM to keep the whole directory structure in memory and still have enough left for the normal file cache and processes (e.g. a company wide file server with lots of archival content). If you feel that you need to increase vfs_cache_pressure above 100, you're running without enough RAM. Increasing vfs_cache_pressure may help, but the only real fix is to get more RAM. Setting vfs_cache_pressure to a high number sacrifices average performance for more stable performance overall (that is, you can avoid really bad worst case behavior but have to deal with worse overall performance).
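To judge whether the directory cache is being kept or thrown away, you can watch the dentry counters before and after changing vfs_cache_pressure. A sketch, assuming the first two fields of /proc/sys/fs/dentry-state are the total and unused dentry counts as described in the kernel's fs sysctl documentation (dentry_summary is a made-up helper name):

```shell
# dentry_summary FILE
# Prints the total and unused dentry counts from a file in the
# /proc/sys/fs/dentry-state format.
dentry_summary() {
    awk '{ printf "dentries=%d unused=%d\n", $1, $2 }' "$1"
}

if [ -r /proc/sys/fs/dentry-state ]; then
    dentry_summary /proc/sys/fs/dentry-state
fi
```

If the dentry count collapses whenever memory gets tight, directory lookups will hit the disk again, which is the situation a low vfs_cache_pressure is meant to avoid.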



      Finally, tell the kernel to allow up to 99% of RAM to be used as cache for writes before a writing process is forced to write out dirty data itself, and to start background writeback only once dirty data reaches 50% of RAM (the default for dirty_background_ratio is 10). Warning: I personally would not do this, but you claimed to have enough RAM and are willing to lose the data.



      echo 99 > /proc/sys/vm/dirty_ratio
      echo 50 > /proc/sys/vm/dirty_background_ratio
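Note that, according to the kernel documentation, both ratios are percentages of available memory (free plus reclaimable pages) rather than of total installed RAM. A rough way to translate them into kilobytes is sketched below. It uses MemAvailable from /proc/meminfo as an approximation of that base, which is an assumption; meminfo_kb and percent_of are made-up helper names.

```shell
# meminfo_kb FILE KEY
# Prints the kB value for KEY (e.g. MemAvailable) from a /proc/meminfo style file.
meminfo_kb() {
    awk -v key="$2:" '$1 == key { print $2 }' "$1"
}

# percent_of VALUE PERCENT
percent_of() {
    echo $(( $1 * $2 / 100 ))
}

if [ -r /proc/meminfo ]; then
    avail=$(meminfo_kb /proc/meminfo MemAvailable)
    avail=${avail:-0}    # MemAvailable is missing on very old kernels
    echo "dirty_background_ratio=50 -> about $(percent_of "$avail" 50) kB"
    echo "dirty_ratio=99 -> about $(percent_of "$avail" 99) kB"
fi
```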


      And tell the kernel that a 1 hour write delay is OK before it even starts writing data to the disk (again, I would not do this):



      echo 360000 > /proc/sys/vm/dirty_expire_centisecs
      echo 360000 > /proc/sys/vm/dirty_writeback_centisecs
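Both settings are expressed in hundredths of a second, which is easy to get wrong by a factor of ten. A tiny conversion sketch (seconds_to_centisecs is a made-up helper name); for comparison, the usual kernel defaults are 3000 (30 seconds) for dirty_expire_centisecs and 500 (5 seconds) for dirty_writeback_centisecs:

```shell
# seconds_to_centisecs SECONDS
seconds_to_centisecs() {
    echo $(( $1 * 100 ))
}

seconds_to_centisecs 3600    # one hour, prints 360000
```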


      If you put all of those into /etc/rc.local and include the following at the end, everything will be in cache as soon as possible after boot (only do this if your filesystem really fits in RAM):



      (nice find / -type f -and -not -path '/sys/*' -and -not -path '/proc/*' -print0 2>/dev/null | nice ionice -c 3 wc -l --files0-from - > /dev/null)&


      Or a slightly simpler alternative which might work better (it caches only /home and /usr; only do this if your /home and /usr really fit in RAM):



      (nice find /home /usr -type f -print0 | nice ionice -c 3 wc -l --files0-from - > /dev/null)&
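Since pre-caching only pays off when the data fits in RAM, it may be worth checking that first. The sketch below compares the on-disk size of a directory tree with MemAvailable from /proc/meminfo. All three function names are made up for illustration, and du reports disk usage rather than the exact amount of page cache the files will occupy, so treat the answer as an estimate.

```shell
# tree_size_kb DIR: disk usage of DIR in kB (errors from unreadable files ignored)
tree_size_kb() {
    du -sk "$1" 2>/dev/null | awk '{ print $1 }'
}

# fits_in_kb SIZE_KB LIMIT_KB: succeeds if SIZE_KB <= LIMIT_KB
fits_in_kb() {
    [ "$1" -le "$2" ]
}

# report_fit DIR: say whether DIR looks small enough to be cached entirely
report_fit() {
    size=$(tree_size_kb "$1")
    avail=$(awk '$1 == "MemAvailable:" { print $2 }' /proc/meminfo 2>/dev/null)
    if fits_in_kb "${size:-0}" "${avail:-0}"; then
        echo "$1: ${size:-0} kB should fit in ${avail:-0} kB of available RAM"
    else
        echo "$1: ${size:-0} kB exceeds ${avail:-0} kB of available RAM, skip pre-caching"
    fi
}

# Check the directory given as the first argument (current directory by default).
report_fit "${1:-.}"
```

For the commands above you would run it once for /home and once for /usr and add up the sizes before deciding.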























      • 3




        A well-informed and overall much better answer than the accepted one! This one is underrated... I guess most people just want simple instructions without bothering to understand what they really do...
        – Vladimir Panteleev
        Jan 26 '13 at 18:17






      • 2




        @Phpdevpad: In addition, the question said "I am neither concerned about RAM usage [...]"--I don't think any Maemo device qualifies.
        – Mikko Rantalainen
        Jan 28 '13 at 7:11






      • 1




        Isn't noop or deadline a better scheduler for SSDs?
        – rep_movsd
        Aug 14 '13 at 7:58






      • 1




        @rep_movsd I've been using only intel SSD drives but at least these drives are still slow enough to have better overall performance with more intelligent schedulers such as CFQ. I'd guess that if your SSD drive can deal with more than 100K random IOPS, using noop or deadline would make sense even with fast CPU. With "fast CPU" I mean something that has at least multiple 3GHz cores available for IO only.
        – Mikko Rantalainen
        Aug 15 '13 at 8:58











      • You can also read about these vm tunables from the vm kernel docs.
        – joeytwiddle
        3 hours ago














      Improving disk cache performance in general is more than just increasing the file system cache size unless your whole system fits in RAM in which case you should use RAM drive (tmpfs is good because it allows falling back to disk if you need the RAM in some case) for runtime storage (and perhaps an initrd script to copy system from storage to RAM drive at startup).



      You didn't tell if your storage device is SSD or HDD. Here's what I've found to work for me (in my case sda is a HDD mounted at /home and sdb is SSD mounted at /).



      First optimize the load-stuff-from-storage-to-cache part:



      Here's my setup for HDD (make sure AHCI+NCQ is enabled in BIOS if you have toggles):



      echo cfq > /sys/block/sda/queue/scheduler
      echo 10000 > /sys/block/sda/queue/iosched/fifo_expire_async
      echo 250 > /sys/block/sda/queue/iosched/fifo_expire_sync
      echo 80 > /sys/block/sda/queue/iosched/slice_async
      echo 1 > /sys/block/sda/queue/iosched/low_latency
      echo 6 > /sys/block/sda/queue/iosched/quantum
      echo 5 > /sys/block/sda/queue/iosched/slice_async_rq
      echo 3 > /sys/block/sda/queue/iosched/slice_idle
      echo 100 > /sys/block/sda/queue/iosched/slice_sync
      hdparm -q -M 254 /dev/sda


      Worth noting for the HDD case are the high fifo_expire_async (which usually governs writes) and the long slice_sync, which lets a single process get high throughput (set slice_sync lower if you hit situations where multiple processes are waiting for data from the disk in parallel). The slice_idle value is always a compromise for HDDs, but somewhere in the range 3-20 should be okay depending on disk usage and disk firmware. I prefer to aim for low values, but setting it too low will destroy your throughput.

      The quantum setting seems to affect throughput a lot, but try to keep it as low as possible to keep latency at a sensible level; setting it too low will destroy throughput. Values in the range 3-8 seem to work well with HDDs. The worst-case latency for a read is (quantum * slice_sync) + (slice_async_rq * slice_async) ms, if I've understood the kernel behavior correctly.

      The async settings are mostly used by writes, and since you're willing to delay writing to disk, set both slice_async_rq and slice_async to very low numbers. However, setting slice_async_rq too low may stall reads, because writes can then no longer be deferred until after reads. My config tries to write data to disk at most 10 seconds after it has been passed to the kernel, but since you can tolerate losing data on power loss, you can also set fifo_expire_async to 3600000 to tell the kernel that a 1-hour delay is okay. Just keep slice_async low, though, because otherwise you can get high read latency.
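To get a feel for the worst-case read latency estimate above, the formula can be evaluated for a given set of values. This is only a sketch of the arithmetic as stated in the answer (which is itself hedged with "if I've understood the kernel behavior correctly"); the function name is made up for illustration.

```shell
# Hypothetical helper: estimate worst-case read latency in ms using the
# formula (quantum * slice_sync) + (slice_async_rq * slice_async).
# Arguments: quantum slice_sync slice_async_rq slice_async
cfq_worst_read_latency_ms() {
    echo $(( $1 * $2 + $3 * $4 ))
}

# HDD values from the setup above: 6*100 + 5*80
cfq_worst_read_latency_ms 6 100 5 80    # prints 1000
```

Lowering slice_async and slice_async_rq shrinks the second term, which is why the answer suggests keeping them small when writes may be delayed.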



      The hdparm command is required to prevent AAM (Automatic Acoustic Management) from killing much of the performance that AHCI+NCQ allows. If your disk then makes too much noise, skip this.



      Here's my setup for SSD (Intel 320 series):



      echo cfq > /sys/block/sdb/queue/scheduler
      echo 1 > /sys/block/sdb/queue/iosched/back_seek_penalty
      echo 10000 > /sys/block/sdb/queue/iosched/fifo_expire_async
      echo 20 > /sys/block/sdb/queue/iosched/fifo_expire_sync
      echo 1 > /sys/block/sdb/queue/iosched/low_latency
      echo 6 > /sys/block/sdb/queue/iosched/quantum
      echo 2 > /sys/block/sdb/queue/iosched/slice_async
      echo 10 > /sys/block/sdb/queue/iosched/slice_async_rq
      echo 1 > /sys/block/sdb/queue/iosched/slice_idle
      echo 20 > /sys/block/sdb/queue/iosched/slice_sync


      Here it's worth noting the low values for the different slice settings. The most important setting for an SSD is slice_idle, which must be set to 0 or 1. Setting it to zero moves all ordering decisions to native NCQ, while setting it to 1 allows the kernel to order requests (but if NCQ is active, the hardware may partially override the kernel's ordering). Test both values to see if you can see a difference. For the Intel 320 series, it seems that setting slice_idle to 0 gives the best throughput but setting it to 1 gives the best (lowest) overall latency.
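When comparing slice_idle values it helps to dump the current scheduler tunables before and after each change. The following is a minimal sketch; the function name is made up for illustration, and the directory is passed in as an argument so nothing about your device names is assumed.

```shell
# Hypothetical helper: print every readable file in a directory as name=value.
show_tunables() {
    for f in "$1"/*; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}

# Example call, only attempted if this directory exists on the machine:
if [ -d /sys/block/sdb/queue/iosched ]; then
    show_tunables /sys/block/sdb/queue/iosched
fi
```

Saving the output to a file before a test run makes it easy to restore the earlier values afterwards.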



      For more information about these tunables, see http://www.linux-mag.com/id/7572/.



      Now that we have configured the kernel to load data from disk to cache with sensible performance, it's time to adjust the cache behavior:



      According to benchmarks I've done, I wouldn't bother setting read-ahead via blockdev at all. The kernel default settings are fine.



      Set the system to prefer dropping cached file data over swapping out application code (this does not matter if you have enough RAM to keep the whole filesystem, all the application code and all virtual memory allocated by applications in RAM). This favors low latency when switching between applications over low latency when accessing big files from a single application:



      echo 15 > /proc/sys/vm/swappiness


      If you prefer to keep applications nearly always in RAM you could set this to 1. If you set this to zero, kernel will not swap at all unless absolutely necessary to avoid OOM. If you were memory limited and working with big files (e.g. HD video editing), then it might make sense to set this close to 100.
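The /proc/sys files written with echo in this answer are the same settings that the sysctl tool exposes under dotted names, so `echo 15 > /proc/sys/vm/swappiness` corresponds to `sysctl -w vm.swappiness=15`. As a small sketch of that mapping (the function name is hypothetical):

```shell
# Hypothetical helper: convert a /proc/sys path into the matching sysctl key
# by stripping the /proc/sys/ prefix and turning slashes into dots.
proc_path_to_sysctl_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

proc_path_to_sysctl_key /proc/sys/vm/swappiness    # prints vm.swappiness
```

The dotted form is what configuration files read by sysctl expect, which may be more convenient than a startup script on some distributions.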



      Nowadays (2017) I prefer to have no swap at all if there is enough RAM. Having no swap will usually cost 200-1000 MB of RAM on a long-running desktop machine. I'm willing to sacrifice that much to avoid the worst-case latency scenario (swapping application code in when RAM is full). In practice, this means that I prefer the OOM Killer to swapping. If you allow or need swapping, you might want to increase /proc/sys/vm/watermark_scale_factor, too, to avoid some latency. I would suggest values between 100 and 500. You can think of this setting as trading CPU usage for lower swap latency. The default is 10 and the maximum possible is 1000. A higher value should (according to the kernel documentation) result in higher CPU usage for the kswapd processes and lower overall swapping latency.



      Next, tell kernel to prefer keeping directory hierarchy in memory over file contents in case some RAM needs to be freed (again, if everything fits in RAM, this setting does nothing):



      echo 10 > /proc/sys/vm/vfs_cache_pressure


      Setting vfs_cache_pressure to a low value makes sense because in most cases the kernel needs to know the directory structure before it can use file contents from the cache, and flushing the directory cache too soon makes the file cache next to worthless. Consider going all the way down to 1 with this setting if you have lots of small files (my system has around 150K 10-megapixel photos and counts as a "lots of small files" system). Never set it to zero, or the directory structure is always kept in memory even if the system is running out of memory. Setting this to a big value is sensible only if you have just a few big files that are constantly being re-read (again, HD video editing without enough RAM would be an example). The official kernel documentation says that "increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact".



      Exception: if you have a truly massive number of files and directories and you rarely touch, read or list all of them, setting vfs_cache_pressure higher than 100 may be wise. This only applies if you do not have enough RAM to keep the whole directory structure in memory while still leaving enough for the normal file cache and processes (e.g. a company-wide file server with lots of archival content). If you feel that you need to increase vfs_cache_pressure above 100, you're running without enough RAM. Increasing vfs_cache_pressure may help, but the only real fix is to get more RAM. Setting vfs_cache_pressure to a high number sacrifices average performance for more stable performance overall (that is, you avoid really bad worst-case behavior but have to live with worse overall performance).
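Whether a system counts as having "lots of small files" is easier to judge with an actual number. A rough sketch for counting regular files under a directory follows; the function name is hypothetical, and it relies on the same `-print0` option of find that is used elsewhere in this answer.

```shell
# Hypothetical helper: count regular files below a directory.
# Counting NUL terminators means names containing newlines are counted once.
count_files() {
    echo $(( $(find "$1" -type f -print0 2>/dev/null | tr -cd '\0' | wc -c) ))
}

# Example call (can take a while on a large tree):
# count_files /home
```

A result in the hundreds of thousands would be in the same territory as the photo collection mentioned above.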



      Finally, tell the kernel to allow up to 99% of memory to hold dirty (not yet written) data before writing processes are forced to write it out themselves, and to let dirty data reach 50% before background writeback even starts (the default for dirty_background_ratio is 10). Warning: I personally would not do this, but you claimed to have enough RAM and are willing to lose the data.



      echo 99 > /proc/sys/vm/dirty_ratio
      echo 50 > /proc/sys/vm/dirty_background_ratio
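To see what these percentages mean in absolute terms, they can be turned into a size. Note that the kernel applies the ratios to the memory it considers available (free plus reclaimable pages), not to total installed RAM, so the sketch below takes that amount as an input; the function name and the 8 GiB figure are assumptions for illustration.

```shell
# Hypothetical helper: amount of dirty data (KiB) allowed by a percentage.
# Arguments: available_kib ratio_percent
dirty_limit_kib() {
    echo $(( $1 * $2 / 100 ))
}

# With 8 GiB (8388608 KiB) available and a ratio of 50:
dirty_limit_kib 8388608 50    # prints 4194304
```

On a running system, the MemAvailable line of /proc/meminfo can serve as an approximation of the first argument.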


      And tell the kernel that a 1-hour delay is OK before it even starts writing data to the disk (again, I would not do this):



      echo 360000 > /proc/sys/vm/dirty_expire_centisecs
      echo 360000 > /proc/sys/vm/dirty_writeback_centisecs
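The units are easy to mix up: the dirty_*_centisecs settings are in hundredths of a second, while fifo_expire_async mentioned earlier is in milliseconds. A small sketch of the conversions (function names are hypothetical):

```shell
# Hypothetical helpers: convert whole hours to the units the tunables expect.
hours_to_centisecs() { echo $(( $1 * 3600 * 100 )); }
hours_to_millisecs() { echo $(( $1 * 3600 * 1000 )); }

hours_to_centisecs 1    # prints 360000  (dirty_expire_centisecs)
hours_to_millisecs 1    # prints 3600000 (fifo_expire_async)
```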


      If you put all of those in /etc/rc.local and include the following at the end, everything will be in cache as soon as possible after boot (only do this if your filesystem really fits in RAM):



      (nice find / -type f -and -not -path '/sys/*' -and -not -path '/proc/*' -print0 2>/dev/null | nice ionice -c 3 wc -l --files0-from - > /dev/null)&


      Or a bit simpler alternative which might work better (cache only /home and /usr, only do this if your /home and /usr really fit in RAM):



      (nice find /home /usr -type f -print0 | nice ionice -c 3 wc -l --files0-from - > /dev/null)&
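Before enabling either preload command it may be worth checking that the data really fits in memory. The sketch below is one way to do that under stated assumptions: both function names are hypothetical, sizes are compared in KiB, and disk usage reported by du is only an approximation of how much page cache the files will occupy.

```shell
# Hypothetical helper: total size in KiB of the given directories.
tree_size_kib() {
    du -sk "$@" 2>/dev/null | awk '{ s += $1 } END { print s + 0 }'
}

# Hypothetical helper: succeed if needed_kib is no larger than avail_kib.
fits_in_ram() {
    [ "$1" -le "$2" ]
}

# Example with made-up numbers: 1 GB of data against 4 GB available.
if fits_in_ram 1000000 4000000; then
    echo "fits"
else
    echo "does not fit"
fi
```

On a real system the first argument could come from `tree_size_kib /home /usr` and the second from the MemAvailable line of /proc/meminfo.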














      edited Jan 17 at 10:47

























      answered Jun 28 '12 at 11:48









      Mikko Rantalainen








      • 3




        A well-informed and overall much better answer than the accepted one! This one is underrated... I guess most people just want simple instructions without bothering to understand what they really do...
        – Vladimir Panteleev
        Jan 26 '13 at 18:17






      • 2




        @Phpdevpad: In addition, the question said "I am neither concerned about RAM usage [...]"--I don't think any Maemo device qualifies.
        – Mikko Rantalainen
        Jan 28 '13 at 7:11






      • 1




        Isn't noop or deadline a better scheduler for SSDs?
        – rep_movsd
        Aug 14 '13 at 7:58






      • 1




        @rep_movsd I've been using only intel SSD drives but at least these drives are still slow enough to have better overall performance with more intelligent schedulers such as CFQ. I'd guess that if your SSD drive can deal with more than 100K random IOPS, using noop or deadline would make sense even with fast CPU. With "fast CPU" I mean something that has at least multiple 3GHz cores available for IO only.
        – Mikko Rantalainen
        Aug 15 '13 at 8:58











      • You can also read about these vm tunables from the vm kernel docs.
        – joeytwiddle
        3 hours ago
























      up vote
      15
      down vote













      Firstly, I DO NOT recommend that you continue using NTFS, as the NTFS implementation in Linux can cause performance and security trouble at any time.



      There are several things you can do:



      • use some newer fs such as ext4 or btrfs

      • try to change your io scheduler, for example bfq

      • turn off swap

      • use some automatic preloader like preload

      • use something like systemd to preload while booting

      • ... and something more

      Maybe you want to give it a try :-)
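Before changing the I/O scheduler, it helps to know which one is currently active. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. A minimal sketch for extracting it (the function name is hypothetical, and the sample line is only an example of the format):

```shell
# Hypothetical helper: read a scheduler line such as "noop deadline [cfq]"
# from stdin and print the bracketed (active) entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

echo 'noop deadline [cfq]' | active_scheduler    # prints cfq
```

On a real system this would be called as `active_scheduler < /sys/block/sda/queue/scheduler`; which schedulers are offered depends on the kernel in use.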






      answered Jan 29 '12 at 4:31









      Felix Yan








      • 1




        I've already moved entirely away from NTFS to ext4 once, leaving the Windows system partition as the only NTFS partition. But it caused many inconveniences for me, and I have turned back to NTFS as the file system for the main data partition (where I store all my documents, downloads, projects, source code etc.). I haven't given up rethinking my partition structure and my workflow (to use less Windows), but right now giving up NTFS doesn't seem a realistic option.
        – Ivan
        Feb 3 '12 at 12:39











      • If you have to use your data inside Windows too, NTFS may be the only option. (many other options available if you can use your Windows just as a VM inside linux)
        – Felix Yan
        Feb 3 '12 at 12:41






      • 1




        A summary of what these supposed problems are of NTFS would have been useful.
        – underscore_d
        Oct 5 '15 at 22:40






      • 1




        NTFS on Linux is pretty much acceptable except for the performance. Considering that the question was specifically about improving file system performance, NTFS should be the first thing to go.
        – Mikko Rantalainen
        Apr 12 at 12:51










      • Even though btrfs is a recently designed file system, I would avoid it if performance is needed. We've been running otherwise identical systems with btrfs and ext4 file systems, and ext4 wins in the real world by a big margin (btrfs seems to require about 4x the CPU time that ext4 needs for the same performance level and causes more disk operations for a single logical command). Depending on workload, I would suggest ext4, jfs or xfs for any performance-demanding work.
        – Mikko Rantalainen
        May 15 at 6:00






















      up vote
      7
      down vote













      Read ahead:



      On 32 bit systems:



      blockdev --setra 8388607 /dev/sda


      On 64 bit systems:



      blockdev --setra 4294967295 /dev/sda


      Write behind cache:



      echo 100 > /proc/sys/vm/dirty_ratio


      This will use up to 100% of your free memory as write cache.



      Or you can go all out and use tmpfs. This is only relevant if you have enough RAM. Put this in /etc/fstab, replacing 100G with the amount of physical RAM.



      tmpfs /mnt/tmpfs tmpfs size=100G,rw,nosuid,nodev 0 0


      Then:



      mkdir /mnt/tmpfs; mount -a


      Then use /mnt/tmpfs.
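"Using" the tmpfs mount typically means copying the working data into it first. A minimal sketch of that step, assuming a hypothetical helper name and example paths; remember that anything left in tmpfs is gone after a reboot, so results have to be copied back to disk.

```shell
# Hypothetical helper: copy the contents of a working directory into
# RAM-backed storage, preserving permissions and timestamps.
# Arguments: source_dir ram_dir
stage_to_ram() {
    mkdir -p "$2" && cp -a "$1"/. "$2"/
}

# Example call with made-up paths:
# stage_to_ram "$HOME/project" /mnt/tmpfs/project
```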






      answered Mar 14 '12 at 21:03









      Ole Tange








      • 3




        3GB or 2TB readahead? really? Do you even know what these options do?
        – Cobra_Fast
        Dec 25 '13 at 4:11






      • 1




        @Cobra_Fast Do you know what it means? I really have no idea and I am interested now.
        – syss
        Jun 15 '15 at 19:22







      • 2




        @syss the readahead settings are saved as number of memory "blocks", not bytes or bits. The size of one block is determined at kernel compilation time (since readahead-blocks are memory blocks) or filesystem creation time in some cases. Normally though, 1 block contains 512 or 4096 bytes. See linux.die.net/man/8/blockdev
        – Cobra_Fast
        Jun 15 '15 at 22:32
























      up vote
      6
      down vote













      You can set the read-ahead size with blockdev --setra sectors /dev/sda1, where sectors is the size you want in 512-byte sectors.
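Because the unit is easy to get wrong, it can help to convert from a more familiar one. A minimal sketch, assuming you think in KiB; readahead_sectors is just an illustrative name, not a standard tool:

```shell
# readahead_sectors KIB
# Converts a read-ahead size in KiB into the 512-byte sector count
# that blockdev --setra expects (1 KiB = 2 sectors).
readahead_sectors() {
    echo $(( $1 * 2 ))
}
```

As root, blockdev --setra "$(readahead_sectors 4096)" /dev/sda1 would then request 4 MiB of read-ahead, and blockdev --getra /dev/sda1 shows the value currently in effect.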






          answered Jan 29 '12 at 4:40









          psusi

              up vote
              1
              down vote













              My killer setting is very simple and very effective:



              echo "2000" > /proc/sys/vm/vfs_cache_pressure


              The explanation from kernel documentation:




              vfs_cache_pressure



              Controls the tendency of the kernel to reclaim the memory which is
              used for caching of directory and inode objects.



              At the default value of vfs_cache_pressure=100 the kernel will attempt
              to reclaim dentries and inodes at a "fair" rate with respect to
              pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
              the kernel to prefer to retain dentry and inode caches. When
              vfs_cache_pressure=0, the kernel will never reclaim dentries and
              inodes due to memory pressure and this can easily lead to
              out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
              causes the kernel to prefer to reclaim dentries and inodes.




              With vfs_cache_pressure at 2000, most of the computing happens in RAM and disk writes happen very late.






              edited Dec 29 '13 at 19:18









              slm

              answered Dec 29 '13 at 18:41







              user55518














              • 4




                Setting vfs_cache_pressure too high (I would consider 2000 too high) will cause unnecessary disk access even for simple stuff such as directory listings which should easily fit in cache. How much RAM do you have and what are you doing with the system? As I wrote in my answer, using high value for this setting makes sense for e.g. HD video editing with limited RAM.
                – Mikko Rantalainen
                Sep 30 '14 at 10:51






              • 1




                Note that the referenced documentation continues: "Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are."
                – Mikko Rantalainen
                Mar 14 at 7:02












              up vote
              0
              down vote













              Not related to write caching, but related to writes:




              • For an ext4 system, you could disable journaling entirely



                This will reduce the number of disk writes for any particular update, but may leave the filesystem in an inconsistent state after an unexpected shutdown, requiring an fsck or worse.



              To stop disk reads from triggering disk writes:




              • Mount with the noatime option



                When you read a file, the "last accessed time" metadata for that file is usually updated. The noatime option will disable that behaviour. This reduces unnecessary disk writes, but you will no longer have that metadata. Do you ever use that data? Some distributions are adopting this as default on all partitions (probably to increase the lifespan of earlier model SSDs).



              Other options:



              • In the comments above, Mikko shared the possibility of mounting with the nobarrier option. But Ivailo quoted Red Hat, who caution against it. How badly do you want that extra 3%?
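Before and after changing mount options such as noatime, it is worth checking what a filesystem is actually mounted with. The sketch below is illustrative only: has_mount_option is a made-up helper, and it assumes the usual /proc/mounts layout (device, mount point, type, comma-separated options).

```shell
# has_mount_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Succeeds if the filesystem mounted at MOUNTPOINT lists OPTION.
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            found = 0                     # the last matching entry wins
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}
```

For instance, has_mount_option / noatime && echo "atime updates are off" reports on the root filesystem. To try the option without rebooting, root can run mount -o remount,noatime / and then add noatime to the matching /etc/fstab entry to make it permanent.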





                  answered 2 hours ago









                  joeytwiddle
