NFS caches expiring unexpectedly


I need to build an NFSv4 + cachefilesd setup over a high-latency, low-throughput link where local caches never expire. The only invalidation mechanism should be the NFS server's callbacks when something is updated (which works fine, by the way: changes to files on the server are passed to the client instantly). The mount is read-only, so no locks are in place.

The issue: even though the client always reads the requested file from the local cache correctly, it fetches the file's attributes again whenever the file hasn't been accessed in the last 60 seconds or so, despite actimeo=86400 being set. It seems to depend on how often the file is opened, since everything works perfectly as long as I keep opening it every 50 seconds or less.

Proof of concept:

(Server network latency is artificially set to 2000 ms so I can clearly pinpoint when attributes are being checked.)
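The question doesn't say how the artificial delay was introduced. One common way to reproduce a similar setup is a netem qdisc from iproute2's `tc` on the server's outgoing interface. The sketch below is an assumption, not the author's method: the helper names `add_delay`/`clear_delay` are hypothetical, the interface is assumed to be `eth0` (override with `IFACE`), and the functions need root to actually run.

```shell
# Hypothetical helpers for adding/removing artificial latency with tc netem.
# Assumption: run as root on the NFS server; IFACE defaults to eth0.
IFACE="${IFACE:-eth0}"

add_delay() {
  # Delay every packet leaving $IFACE by the given amount (default 2000ms).
  tc qdisc add dev "$IFACE" root netem delay "${1:-2000ms}"
}

clear_delay() {
  # Delete the root qdisc so the interface falls back to its default queueing.
  tc qdisc del dev "$IFACE" root
}
```

Calling `add_delay` before running the loops and `clear_delay` afterwards keeps the test latency from lingering on the server.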

Waiting 50 seconds between requests yields a 100% cache hit rate, as intended, and this continues indefinitely:

root@client:~# while : ; do /usr/bin/time -f%e cat /nfs-mount/2bytes-file > /dev/null ; sleep 50 ; done
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

Now, with the delay between requests set to 70 seconds, see how inconsistent the outcome is:

root@client:~# while : ; do /usr/bin/time -f%e cat /nfs-mount/2bytes-file > /dev/null ; sleep 70 ; done
0.00
0.00
0.00
0.00
0.00
0.00
6.00 # <- Attributes fetched. Debug log recorded "NFS: nfs_update_inode(0:69/68697599 fh_crc=0xb9b7a69e ct=2 info=0x26040)" 
4.00 # <- Attributes fetched. Debug log recorded "NFS: nfs_update_inode(0:69/68697599 fh_crc=0xb9b7a69e ct=2 info=0x26040)" 
0.00
0.00
0.00
6.00 # <- Attributes fetched. Debug log recorded "NFS: nfs_update_inode(0:69/68697599 fh_crc=0xb9b7a69e ct=2 info=0x26040)" 
4.00 # <- Attributes fetched. Debug log recorded "NFS: nfs_update_inode(0:69/68697599 fh_crc=0xb9b7a69e ct=2 info=0x26040)" 
0.00
0.00
0.00

Also, nfsstat records an extra getattr call whenever those delays occur:

create    delegpurge  delegreturn  getattr     getfh      link
2 0%      0 0%        82 0%        177063 10%  87644 5%   0 0%
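To tie the extra getattr to a specific slow open, the client counter can be sampled right before and right after a single read. The helper below is a sketch that assumes nfsstat's usual layout, where a line of operation names is followed by a line of "count percent%" pairs (as in the excerpt above); the function name `nfsstat_op_count` is hypothetical.

```shell
# nfsstat_op_count OP: print the call count for operation OP from
# nfsstat-style output on stdin (a header line of operation names followed
# by a line of "count percent%" pairs). Prints nothing if OP is not found.
nfsstat_op_count() {
  awk -v op="$1" '
    # The line after the matching header holds the counters:
    # the Nth name maps to field 2N-1 (count), field 2N is the percentage.
    found { print $(2 * idx - 1); exit }
    {
      for (i = 1; i <= NF; i++)
        if ($i == op) { idx = i; found = 1; next }
    }
  '
}
```

For example, capturing `nfsstat -c -4 | nfsstat_op_count getattr` before and after one `cat` of the file and subtracting the two values should give 0 for a fast read and at least 1 for a slow one, if the pattern described above holds. Restricting the output to NFSv4 client statistics matters because the helper returns the first matching header it sees.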

And finally, when the delay is set to 110 seconds or more, every single request ends up being checked against the server for some reason:

root@client:~# while : ; do /usr/bin/time -f%e cat /nfs-mount/2bytes-file > /dev/null ; sleep 110 ; done 
6.00
6.00
6.00
6.00
6.00
6.00
6.00

I reproduced the very same behavior when serving this 2-byte file over HTTP with nginx instead of cat, and with ioping as well.

cachefilesd is not purging anything on its own, since there is more than enough space on its partition:

/dev/vdb 20G 3,0G 16G 17% /disk2/fscache
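cachefilesd only starts culling when free space (or free files) on the cache filesystem drops below its `bcull` (or `fcull`) limit. The stock /etc/cachefilesd.conf typically uses 7% for both, but since the actual config isn't shown, that value is an assumption here. A quick sketch, with a hypothetical function name, that checks block headroom from `df -P` output against such a limit:

```shell
# cache_block_headroom [BCULL_PCT]: read `df -P` output for the cache
# partition on stdin and report free space against a cull threshold.
# Assumption: BCULL_PCT defaults to 7, the usual cachefilesd.conf bcull value.
cache_block_headroom() {
  awk -v cull="${1:-7}" '
    NR == 2 {
      # df -P columns: filesystem, size, used, available, capacity, mount point
      pct = $4 * 100 / $2
      printf "free=%d%% bcull=%d%% %s\n", int(pct), cull, (pct < cull ? "culling" : "ok")
    }
  '
}
```

Running `df -P /disk2/fscache | cache_block_headroom` on the client would give a definitive answer for the block limit; the file-count limit (`fcull`) is not covered by this check.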

I know it is only fetching the file's metadata and not the content itself, because when I run the same test against a 2 GB file (larger than the client's physical memory), it hangs for 2 seconds (the configured network delay) and then starts reading the cachefilesd copy from local disk, as expected.

I really don't understand what happens during those 1-2 minutes that causes the client to recheck with the server for updates, and it defeats the purpose of my setup.

/etc/exports:

/cache 192.168.122.234(ro,async,no_subtree_check)

Client mount:

root@client:~# mount -t nfs4 -o lookupcache=all,actimeo=86400,nocto,ro,intr,soft,proto=tcp,async,fsc 192.168.122.1:/cache /nfs-mount
root@client:~# cat /proc/mounts | grep nfs
192.168.122.1:/cache /nfs-mount nfs4 ro,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,acregmin=86400,acregmax=86400,acdirmin=86400,acdirmax=86400,soft,nocto,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.234,fsc,local_lock=none,addr=192.168.122.1 0 0
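`actimeo` is shorthand that sets `acregmin`, `acregmax`, `acdirmin` and `acdirmax` to the same value, which is why those four appear in /proc/mounts instead of `actimeo` itself. A small hypothetical helper to extract an option's effective value for a given mount point from /proc/mounts-format input, useful for confirming that the kernel actually accepted the 86400 s timeouts:

```shell
# mount_opt MOUNTPOINT OPTION: print the value of OPTION for MOUNTPOINT from
# /proc/mounts-format lines on stdin. Flag options without a value (e.g. nocto)
# print "yes"; nothing is printed if the option is absent.
mount_opt() {
  awk -v mp="$1" -v key="$2" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        if (opts[i] == key) { print "yes"; exit }
        if (index(opts[i], key "=") == 1) { print substr(opts[i], length(key) + 2); exit }
      }
    }
  '
}
```

With the mount shown above, `mount_opt /nfs-mount acregmax < /proc/mounts` prints `86400`, and `mount_opt /nfs-mount nocto < /proc/mounts` prints `yes`.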

The server is CentOS 7 and the client is Ubuntu 16.04. Packages come from the distributions' repositories.

root@client:~# dpkg -l | grep nfs
ii libnfsidmap2:amd64 0.25-5 amd64 NFS idmapping library
ii nfs-common 1:1.2.8-9ubuntu12 amd64 NFS support files common to client and server
ii nfs4-acl-tools 0.3.3-3 amd64 Commandline and GUI ACL utilities for the NFSv4 client

I also tried Ubuntu as the server and CentOS 7 as the client, to no avail.

edited Mar 9 at 14:10

Rui F Ribeiro


asked Sep 3 '16 at 11:29

G.Ashburn



nfs cache timeout