In Linux, what happens if 1000 files in a directory are moved to another location while another 300 files are added to the source directory?




























In Linux, what happens if 1000 files in a directory are moved to another location and another 300 files are added to the source directory while the original 1000 files are being moved? Will the destination end up with 1300 files, or will there be 300 files remaining in the source folder?



























  • This is not a direct answer, which @Eugene-Rieck seems to have provided well. But you might find it interesting/useful to read about race conditions (en.wikipedia.org/wiki/Race_condition). They seem relevant to your question: if the specific commands you use to move and add files create a race condition, then unusual things will happen.

    – user02814
    Feb 27 at 5:13







  • @user02814: The problem with race conditions is that unusual things might happen. When you're looking for them or writing tests, they usually don't happen. When you're putting code in production, they will surely happen. :)

    – Eric Duminil
    Feb 27 at 12:58






  • As an anecdotal case, I was moving a directory (mv dir/ other/) and added files to it during the move. At the end of the move the directory was deleted, and the uncopied files disappeared with it.

    – The Vee
    Mar 1 at 6:33











  • To my above comment: across filesystems, that is.

    – The Vee
    Mar 1 at 7:00















Tags: linux filesystems operating-systems
















asked Feb 26 at 12:02 by Shayan Ahmad

















3 Answers














This depends on which tools you use. Let's check a few cases:



If you run something along the lines of mv /path/to/source/* /path/to/dest/ in a shell, you will end up with the original 1000 files moved and the new 300 untouched. This is because the shell expands the * before starting the move operation, so the list of files is already fixed while the move is in progress.
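To make the snapshot behaviour concrete, here is a minimal Python sketch, assuming source and destination are on the same filesystem. The function names move_snapshot and touch, and the on_listed hook that stands in for the concurrent writer, are hypothetical; os.listdir plays the role of the shell's glob expansion, and os.rename wraps the rename(2) call that a same-filesystem move ends up making.

```python
import os
import tempfile


def move_snapshot(src, dst, on_listed=None):
    """Move the entries of src into dst the way `mv src/* dst/` behaves:
    build the complete list of names first, then move exactly those names."""
    names = sorted(os.listdir(src))  # like the shell's glob: a fixed list
    if on_listed is not None:
        on_listed()  # hypothetical hook: simulates files arriving mid-move
    for name in names:
        os.rename(os.path.join(src, name), os.path.join(dst, name))
    return len(names)


def touch(directory, names):
    """Create empty files with the given names inside directory."""
    for name in names:
        with open(os.path.join(directory, name), "w"):
            pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
        os.mkdir(src)
        os.mkdir(dst)
        touch(src, [f"old{i:04d}" for i in range(1000)])
        moved = move_snapshot(
            src, dst,
            on_listed=lambda: touch(src, [f"new{i:03d}" for i in range(300)]),
        )
        # prints: 1000 1000 300
        print(moved, len(os.listdir(dst)), len(os.listdir(src)))
```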



If you use Nautilus (or another GUI file manager), you will end up the same way: it runs the move operation on the files that were selected, and that selection doesn't change when new files show up.



If you write your own program that, in a loop, expands the glob, moves just one file, and repeats until the glob comes back empty, you will end up with all 1300 files in the new directory (provided the 300 new files arrive before the loop finds the directory empty). This is because every new glob picks up the files that have shown up in the meantime.
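A minimal Python sketch of that re-glob-until-empty loop, under the same same-filesystem assumption. drain and its after_each_move hook are hypothetical names; the hook only exists to inject the 300 extra files deterministically halfway through.

```python
import os
import tempfile


def drain(src, dst, after_each_move=None):
    """Re-list src before every move and stop only when a fresh listing
    comes back empty, so files added part-way through are moved as well."""
    moved = 0
    while True:
        names = os.listdir(src)  # a fresh "glob" on every iteration
        if not names:
            return moved
        name = names[0]  # move just one file per listing
        os.rename(os.path.join(src, name), os.path.join(dst, name))
        moved += 1
        if after_each_move is not None:
            after_each_move(moved)  # hypothetical hook to simulate a writer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
        os.mkdir(src)
        os.mkdir(dst)
        for i in range(1000):
            open(os.path.join(src, f"old{i:04d}"), "w").close()

        def writer(moved_so_far):
            if moved_so_far == 500:  # 300 new files arrive halfway through
                for i in range(300):
                    open(os.path.join(src, f"new{i:03d}"), "w").close()

        print(drain(src, dst, after_each_move=writer))  # prints 1300
```

Re-listing the whole directory for every file makes this quadratic in the number of files, and the loop still exits the first time it sees an empty directory, so anything created after that point stays behind.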






answered Feb 26 at 12:11 by Eugen Rieck (edited Feb 27 at 19:02)




















  • What happens if you opendir() the source, then loop over readdir() or getdents()?

    – grawity
    Feb 26 at 12:15







  • The result set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

    – Eugen Rieck
    Feb 26 at 14:53






  • @grawity: POSIX says: "If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified." Also, NFS may put some restrictions on what is implementable; IIRC it complicates the implementation of telldir()/seekdir().

    – ninjalj
    Feb 26 at 17:51






  • @d-b With more than, say, 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

    – TooTea
    Feb 27 at 8:18






  • @d-b If files are added or removed while the expansion takes place, then the readdir discussion applies (because that's how the expansion is done too).

    – grawity
    Feb 27 at 8:24
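On the ARG_MAX point in the comments above: the usual workaround is to split the file names into batches that each fit on a command line, which is roughly what xargs does. A minimal Python sketch; batch_args is a hypothetical helper, and the size model (each argument costs its byte length plus a NUL terminator) is a simplification, since the kernel also counts the environment and the argument pointers, hence the headroom left in the example.

```python
import os


def batch_args(names, limit):
    """Split names into batches whose encoded size stays within limit.
    Each argument is counted as its byte length plus one NUL terminator."""
    batches, current, size = [], [], 0
    for name in names:
        cost = len(os.fsencode(name)) + 1
        if cost > limit:
            raise ValueError(f"argument too long for limit: {name!r}")
        if current and size + cost > limit:
            batches.append(current)  # current batch is full; start a new one
            current, size = [], 0
        current.append(name)
        size += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    arg_max = os.sysconf("SC_ARG_MAX")  # POSIX limit; value is system-dependent
    names = [f"/path/to/source/file{i:06d}.txt" for i in range(100_000)]
    # Leave generous headroom for the environment and argv pointers.
    batches = batch_args(names, arg_max // 2)
    print(len(batches), sum(map(len, batches)))
```

Each batch could then be handed to its own mv invocation. Because the list of names is still built up front, files added during the run are not moved either.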
































When you tell the shell (or a file manager) to move all the files from a directory, it first lists all the files and then starts moving them. If new files appear in the directory, they aren't added to the list of files to move, so they remain in the original location.



You can, of course, write your own way of moving files, different from mv, that periodically checks the source directory for new files.
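A minimal Python sketch of such a program, assuming both directories are on one filesystem. move_until_quiet, its parameters, and the on_poll test hook are hypothetical; the idea is to keep polling until the source directory has stayed empty for several consecutive checks, so that a writer pausing briefly between bursts doesn't leave files behind.

```python
import os
import tempfile
import time


def move_until_quiet(src, dst, quiet_polls=3, interval=1.0, on_poll=None):
    """Repeatedly list src and move whatever is there; give up only after
    quiet_polls consecutive listings have found the directory empty."""
    moved = 0
    empty_streak = 0
    while empty_streak < quiet_polls:
        if on_poll is not None:
            on_poll()  # hypothetical hook so a test can simulate a writer
        names = os.listdir(src)  # snapshot for this round only
        if not names:
            empty_streak += 1
            time.sleep(interval)
            continue
        empty_streak = 0
        for name in names:
            path = os.path.join(src, name)
            try:
                os.rename(path, os.path.join(dst, name))
            except FileNotFoundError:
                if os.path.lexists(path):
                    raise  # the source is still there, so dst is the problem
                continue  # another process removed it after we listed it
            moved += 1
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
        os.mkdir(src)
        os.mkdir(dst)
        for i in range(1000):
            open(os.path.join(src, f"f{i:04d}"), "w").close()
        print(move_until_quiet(src, dst, quiet_polls=2, interval=0.1))  # prints 1000
```

Choosing quiet_polls and interval is a guess about how long the writer can pause; no value is safe against a writer that might resume at any time. A file that is still being written can also be moved mid-write: within one filesystem the writer's open descriptor keeps pointing at the same inode, now under the destination name, but a reader watching the destination may see the file incomplete.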






answered Feb 26 at 12:07 by choroba























  • Like, say, xargs mv?

    – Joshua
    Feb 28 at 2:53
































The kernel itself can't be "in the middle" of a "move 1000 files" operation. You need to be much more specific about what operation you're proposing.



One thread can only move one file at a time, with the rename(const char *oldpath, const char *newpath) or renameat system calls (and only within the same filesystem; see footnote 1). Linux also has renameat2, which has flags like RENAME_EXCHANGE to atomically exchange two pathnames, or RENAME_NOREPLACE to not replace the destination if it exists. (RENAME_NOREPLACE allows, for example, a mv -i implementation that avoids the race condition of stat followed by rename, which would still overwrite a file created after the stat. link + unlink could also solve that, because link fails if the new name already exists.)
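As an illustration of the link + unlink alternative, here is a minimal Python sketch. As far as I know, Python's os module doesn't wrap renameat2, which is why this uses os.link and os.unlink; move_noreplace is a hypothetical name. It assumes regular files, a filesystem that supports hard links, and source and destination on the same filesystem.

```python
import os
import tempfile


def move_noreplace(src, dst):
    """Move src to dst without ever overwriting an existing dst.
    link(2) fails with EEXIST if dst already exists, so there is no window
    between a "does dst exist?" check and the move for another process."""
    os.link(src, dst)  # raises FileExistsError if dst exists
    os.unlink(src)     # only reached once dst is safely in place


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        for path, text in ((a, "new"), (b, "existing")):
            with open(path, "w") as f:
                f.write(text)
        try:
            move_noreplace(a, b)
        except FileExistsError:
            print("refused: b already exists")  # b still says "existing"
```

Between the two calls the file briefly has both names, and hard links can't be made to directories, so this is not a full replacement for RENAME_NOREPLACE.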



But each of these system calls renames only a single directory entry. Using POSIX renameat with olddirfd and newdirfd (opened with open(O_DIRECTORY)) would allow you to keep looping over the files in a directory even if the source or destination directory itself had been renamed. (Using relative paths, which are resolved against the current working directory, could also allow that with plain rename().)
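A minimal Python sketch of that renameat-style loop, assuming Linux (for O_DIRECTORY) and a single filesystem. os.rename accepts src_dir_fd and dst_dir_fd, which correspond to renameat's olddirfd and newdirfd, and os.listdir accepts an open directory descriptor; move_all_via_dirfds is a hypothetical name.

```python
import os
import tempfile


def move_all_via_dirfds(src_fd, dst_fd):
    """Move every entry of the directory open as src_fd into the directory
    open as dst_fd. Names are resolved relative to the descriptors, not to
    path strings, so renaming either directory meanwhile does not break it."""
    moved = 0
    for name in os.listdir(src_fd):  # listdir accepts an open directory fd
        os.rename(name, name, src_dir_fd=src_fd, dst_dir_fd=dst_fd)
        moved += 1
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
        os.mkdir(src)
        os.mkdir(dst)
        for i in range(5):
            open(os.path.join(src, f"f{i}"), "w").close()
        src_fd = os.open(src, os.O_RDONLY | os.O_DIRECTORY)
        dst_fd = os.open(dst, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Rename both directories after opening them; the fds still work.
            os.rename(src, src + ".old")
            os.rename(dst, dst + ".new")
            print(move_all_via_dirfds(src_fd, dst_fd))  # prints 5
            print(sorted(os.listdir(dst + ".new")))  # prints ['f0', 'f1', 'f2', 'f3', 'f4']
        finally:
            os.close(src_fd)
            os.close(dst_fd)
```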



Anyway, as the other answers say, most programs that use the rename system call will build the list of filenames before doing the first rename (usually with the readdir(3) POSIX library function, which wraps platform-specific system calls like Linux getdents).



But if you're talking about find -exec ... {} \; to run one command per file, or the more efficient -exec ... {} + with so many files that they don't fit on one command line, then you can certainly have renames happening while find is still scanning. For example:



    find . -name '*.txt' -exec mv -t ../txtfiles {} \;   # Intentionally inefficient


If you created some new .txt files while this was running, you might see some of them in ../txtfiles. But internally, find(1) will have used open(O_DIRECTORY) and getdents on the current directory (".").



If one system call was enough to return all the directory entries in "." (which find loops over one at a time, making further system calls only if needed for -type, to recurse, or to fork+exec on a match), then the list is a snapshot of the directory entries at one point in time. Further changes to the directory can't affect what find does, because it already has a copy of the directory listing it will loop over. (find probably uses readdir(3) internally, which returns one entry at a time, but strace find . shows that inside glibc it makes a getdents64 system call with a buffer size of count=32768 bytes.)



But if the directory is huge and/or the kernel doesn't fill find's buffer, find will have to make a second getdents system call after looping over what it got the first time. So it could see new entries that appeared after it had already done some renames.
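A minimal Python sketch for observing this, using os.scandir (which iterates with opendir/readdir, and therefore getdents underneath on Linux). scan_while_adding is a hypothetical name. Per the POSIX wording quoted in the comments on the first answer, whether the newly created names are returned is unspecified, so the second number printed may vary between filesystems and kernel versions; only the pre-existing names should reliably appear.

```python
import os
import tempfile


def scan_while_adding(directory, new_names):
    """Iterate directory with os.scandir and create new_names right after
    the first entry has been read. Returns the set of names produced."""
    seen = set()
    added = False
    with os.scandir(directory) as it:
        for entry in it:
            seen.add(entry.name)
            if not added:
                for name in new_names:
                    open(os.path.join(directory, name), "w").close()
                added = True
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = {f"old{i:04d}" for i in range(1000)}
        for name in old:
            open(os.path.join(d, name), "w").close()
        new = [f"new{i:03d}" for i in range(300)]
        seen = scan_while_adding(d, new)
        # Whether any of the new names show up is unspecified by POSIX.
        print(len(seen & old), len(seen & set(new)))
```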



But see the discussion in the comments under the other answers: the kernel might effectively snapshot for us, because (I think) getdents isn't allowed to return the same filename twice. Different filesystems use different sorting/indexing mechanisms to make access to an entry in a huge directory more efficient than a linear search, so adding or removing a directory entry might possibly affect the order of the remaining entries. It's probably more likely that filesystems keep a stable order and just update an index (like the ext4 dir_index feature), so that a directory FD's position can simply be a directory entry to resume from. I really don't know how the telldir(3) library interface maps onto lseek, or whether it's purely a user-space thing for looping over the buffer obtained by user space. But multiple getdents calls can be needed to get all the entries from a huge directory, so even if seeking isn't supported, the kernel needs to be able to record a current position.




Footnote 1:



To "move" between filesystems, it's up to user space to copy and unlink (e.g. with open and either read+write, mmap+write, sendfile(2), or copy_file_range(2); the latter two avoid bouncing the file data through user space entirely).
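A minimal Python sketch of that fallback: try rename(2) first, and only copy and unlink when it fails with EXDEV, the error rename reports for a cross-filesystem move. move_file and copy_then_unlink are hypothetical names; this is a simplified version of what shutil.move already does for files, and shutil.copy2 may use a fast in-kernel copy such as sendfile(2) on Linux in recent Python versions.

```python
import errno
import os
import shutil
import tempfile


def copy_then_unlink(src, dst):
    """The user-space "move" used across filesystems: copy the data and
    metadata, then remove the source name."""
    shutil.copy2(src, dst)
    os.unlink(src)


def move_file(src, dst):
    """Use rename(2) when possible; fall back to copy + unlink on EXDEV."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV: src and dst on different filesystems
            raise
        copy_then_unlink(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "report.txt")  # hypothetical file name
        with open(src, "w") as f:
            f.write("hello")
        move_file(src, os.path.join(d, "moved.txt"))
        print(os.listdir(d))  # prints ['moved.txt']
```

Unlike rename, copy followed by unlink is not atomic: data written to the source after the copy is lost when the source is unlinked, which is the cross-filesystem hazard described in the comments under the question.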






    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    91














    This depends on which tools you use: Let's check a few cases:



    If you run something along the lines of mv /path/to/source/* /path/to/dest/ int a shell, you will end up with the original 1000 files being moved, the new 300 being untouched. This comes from the fact, that the shell will expand the * before starting the move operation, so when the move is in progress, the list is already fixed.



    If you use Nautilus (and other GUI friends), you will end up the same way: It will run the move operation based on which files were selected - this doesn't change when new files show up.



    If you use your own program using syscalls along the line of loop over glob and only one mv until glob stays empty, you will end up with all 1300 files in the new directory. This is because every new glob will pick up the new files, that have showed up in the meantime.






    share|improve this answer




















    • 7





      What happens if you opendir() the source, then loop over readdir() or getdents()?

      – grawity
      Feb 26 at 12:15







    • 16





      The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

      – Eugen Rieck
      Feb 26 at 14:53






    • 24





      @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

      – ninjalj
      Feb 26 at 17:51






    • 3





      @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

      – TooTea
      Feb 27 at 8:18






    • 3





      @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

      – grawity
      Feb 27 at 8:24















    91














    This depends on which tools you use: Let's check a few cases:



    If you run something along the lines of mv /path/to/source/* /path/to/dest/ int a shell, you will end up with the original 1000 files being moved, the new 300 being untouched. This comes from the fact, that the shell will expand the * before starting the move operation, so when the move is in progress, the list is already fixed.



    If you use Nautilus (and other GUI friends), you will end up the same way: It will run the move operation based on which files were selected - this doesn't change when new files show up.



    If you use your own program using syscalls along the line of loop over glob and only one mv until glob stays empty, you will end up with all 1300 files in the new directory. This is because every new glob will pick up the new files, that have showed up in the meantime.






    share|improve this answer




















    • 7





      What happens if you opendir() the source, then loop over readdir() or getdents()?

      – grawity
      Feb 26 at 12:15







    • 16





      The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

      – Eugen Rieck
      Feb 26 at 14:53






    • 24





      @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

      – ninjalj
      Feb 26 at 17:51






    • 3





      @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

      – TooTea
      Feb 27 at 8:18






    • 3





      @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

      – grawity
      Feb 27 at 8:24













    91












    91








    91







    This depends on which tools you use: Let's check a few cases:



    If you run something along the lines of mv /path/to/source/* /path/to/dest/ int a shell, you will end up with the original 1000 files being moved, the new 300 being untouched. This comes from the fact, that the shell will expand the * before starting the move operation, so when the move is in progress, the list is already fixed.



    If you use Nautilus (and other GUI friends), you will end up the same way: It will run the move operation based on which files were selected - this doesn't change when new files show up.



    If you use your own program using syscalls along the line of loop over glob and only one mv until glob stays empty, you will end up with all 1300 files in the new directory. This is because every new glob will pick up the new files, that have showed up in the meantime.






    share|improve this answer















    This depends on which tools you use: Let's check a few cases:



    If you run something along the lines of mv /path/to/source/* /path/to/dest/ int a shell, you will end up with the original 1000 files being moved, the new 300 being untouched. This comes from the fact, that the shell will expand the * before starting the move operation, so when the move is in progress, the list is already fixed.



    If you use Nautilus (and other GUI friends), you will end up the same way: It will run the move operation based on which files were selected - this doesn't change when new files show up.



    If you use your own program using syscalls along the line of loop over glob and only one mv until glob stays empty, you will end up with all 1300 files in the new directory. This is because every new glob will pick up the new files, that have showed up in the meantime.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Feb 27 at 19:02

























    answered Feb 26 at 12:11









    Eugen RieckEugen Rieck

    11.2k22429




    11.2k22429







    • 7





      What happens if you opendir() the source, then loop over readdir() or getdents()?

      – grawity
      Feb 26 at 12:15







    • 16





      The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

      – Eugen Rieck
      Feb 26 at 14:53






    • 24





      @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

      – ninjalj
      Feb 26 at 17:51






    • 3





      @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

      – TooTea
      Feb 27 at 8:18






    • 3





      @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

      – grawity
      Feb 27 at 8:24












    • 7





      What happens if you opendir() the source, then loop over readdir() or getdents()?

      – grawity
      Feb 26 at 12:15







    • 16





      The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

      – Eugen Rieck
      Feb 26 at 14:53






    • 24





      @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

      – ninjalj
      Feb 26 at 17:51






    • 3





      @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

      – TooTea
      Feb 27 at 8:18






    • 3





      @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

      – grawity
      Feb 27 at 8:24







    7




    7





    What happens if you opendir() the source, then loop over readdir() or getdents()?

    – grawity
    Feb 26 at 12:15






    What happens if you opendir() the source, then loop over readdir() or getdents()?

    – grawity
    Feb 26 at 12:15





    16




    16





    The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

    – Eugen Rieck
    Feb 26 at 14:53





    The result-set of an opendir() is stable according to POSIX. A quick test with PHP's opendir() confirms that (but I tested only ext4).

    – Eugen Rieck
    Feb 26 at 14:53




    24




    24





    @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

    – ninjalj
    Feb 26 at 17:51





    @grawity: POSIX says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. Also, NFS may put some restrictions on what is implementable, IIRC it complicates implementation of telldir()/seekdir()

    – ninjalj
    Feb 26 at 17:51




    3




    3





    @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

    – TooTea
    Feb 27 at 8:18





    @d-b With more than say 100 000 files, the expanded command line will exceed the maximum command length limit (ARG_MAX, usually a few MiB) and the mv will fail to execute.

    – TooTea
    Feb 27 at 8:18




    3




    3





    @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

    – grawity
    Feb 27 at 8:24





    @d-b If files are added or removed while expanding takes place, then the readdir discussion applies (because that's how the expansion is done too).

    – grawity
    Feb 27 at 8:24













    8














    When you tell the system to move all the files from a directory, it lists all the files and then starts moving them. If new files appear in the directory, they aren't added to the list of files to move, so they'll remain in the original location.



    You can, of course, program a way of moving files different to mv which will periodically check for new files in the source directory.






    share|improve this answer























    • like say xargs mv?

      – Joshua
      Feb 28 at 2:53















    8














    When you tell the system to move all the files from a directory, it lists all the files and then starts moving them. If new files appear in the directory, they aren't added to the list of files to move, so they'll remain in the original location.



    You can, of course, program a way of moving files different to mv which will periodically check for new files in the source directory.






    share|improve this answer























    • like say xargs mv?

      – Joshua
      Feb 28 at 2:53













    8












    8








    8







    When you tell the system to move all the files from a directory, it lists all the files and then starts moving them. If new files appear in the directory, they aren't added to the list of files to move, so they'll remain in the original location.



    You can, of course, program a way of moving files different to mv which will periodically check for new files in the source directory.






    share|improve this answer













    When you tell the system to move all the files from a directory, it lists all the files and then starts moving them. If new files appear in the directory, they aren't added to the list of files to move, so they'll remain in the original location.



    You can, of course, program a way of moving files different to mv which will periodically check for new files in the source directory.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Feb 26 at 12:07









    chorobachoroba

    13.4k13341




    13.4k13341












    • like say xargs mv?

      – Joshua
      Feb 28 at 2:53

















    • like say xargs mv?

      – Joshua
      Feb 28 at 2:53
















    like say xargs mv?

    – Joshua
    Feb 28 at 2:53





    like say xargs mv?

    – Joshua
    Feb 28 at 2:53











    8














    The kernel itself can't be "in the middle" of a "move 1000 files" operation. You need to be much more specific about what operation you're proposing.



    One thread can only move one file at a time with the rename(*oldpath, const char *newpath) or renameat system calls (and only within the same filesystem1). Or Linux renameat2 which has flags like RENAME_EXCHANGE to atomically exchange two pathnames, or RENAME_NOREPLACE to not replace the destination if it exists. (e.g. allowing a mv -i implementation that avoids the race condition of stat and then rename, which would still overwrite a file created after stat.
    link + unlink could also solve that, because link fails if the new name exists.)



    But each of these system calls only renames a single directory entry per system call. Using POSIX renameat with olddirfd and newdirfd (opened with open(O_DIRECTORY)) would allow you to keep looping over files in a directory even if the source or destination directory itself had been renamed. (Using relative paths could also allow that with regular rename().)



    Anyway, as the other answers say, most programs that use the rename system call will figure out a list of filenames before doing the first rename. (Usually using the readdir(3) POSIX library function as a wrapper for platform-specific system calls like Linux getdents).



    But if you're talking about find -exec ... ; to run one command per file, or the more efficient -exec + with so many files that they don't fit on one command line, then you can certainly have renames happening while still scanning. e.g.



    find . -name '*.txt' -exec mv -t ../txtfiles {} \;    # Intentionally inefficient


    If you created some new .txt files while this was running, you might see some of them in ../txtfiles. But internally, find(1) will have used open(O_DIRECTORY) and getdents on the current directory (.).
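For comparison, here is a sketch of that interleaved pattern done in a single process: each name is renamed as soon as readdir() returns it, the way find -exec mv ... \; interleaves scanning with one mv per file. move_while_scanning is a hypothetical name, and both directories are assumed to be on the same filesystem:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Rename each entry of srcdir into dstdir as soon as readdir() returns it,
 * instead of listing the whole directory first. glibc's readdir() refills
 * its buffer with another getdents64 call whenever it runs out, so on a big
 * directory a later refill may or may not include files created after the
 * loop started (POSIX leaves that unspecified). Returns the number moved. */
size_t move_while_scanning(const char *srcdir, const char *dstdir)
{
    DIR *d = opendir(srcdir);
    int dst = open(dstdir, O_RDONLY | O_DIRECTORY);
    size_t moved = 0;
    if (d && dst >= 0) {
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            if (renameat(dirfd(d), e->d_name, dst, e->d_name) == 0)
                moved++;
        }
    }
    if (d) closedir(d);
    if (dst >= 0) close(dst);
    return moved;
}
```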



    If one system call was enough to return all the directory entries in . (which find loops over one at a time, making further system calls only when needed for -type, to recurse, or to fork+exec on a match), then the list is a snapshot of the directory entries at one point in time. Further changes to the directory can't affect what find does, because it already has a copy of the directory listing it will loop over. (Internally it probably uses readdir(3), which returns one entry at a time, but strace find . shows that glibc makes a getdents64 system call with a buffer size of count=32768 bytes.)
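To see the byte-sized buffer for yourself, here is a Linux-only sketch that calls getdents64 directly via syscall(2) and counts how many calls it takes. struct linux_dirent64 is declared by hand following the getdents(2) man page, and count_entries is a hypothetical helper. With a large buffer a small directory usually comes back in one call, while a tiny buffer forces several:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One record as returned by the Linux getdents64 system call. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* total size of this record in bytes */
    unsigned char  d_type;
    char           d_name[];   /* NUL-terminated name */
};

/* Read directory `path` with raw getdents64 calls into a buffer of
 * `bufsize` bytes (the count argument is a byte size, not an entry count).
 * Sets *calls to the number of calls that returned data and returns the
 * number of entries other than "." and "..", or -1 on error. */
long count_entries(const char *path, size_t bufsize, int *calls)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    char *buf = malloc(bufsize);
    long count = (fd >= 0 && buf) ? 0 : -1;
    *calls = 0;
    while (count >= 0) {
        long nread = syscall(SYS_getdents64, fd, buf, bufsize);
        if (nread < 0)
            count = -1;             /* e.g. EINVAL if bufsize is too small */
        if (nread <= 0)
            break;                  /* 0 means end of directory */
        (*calls)++;
        for (long off = 0; off < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
                count++;
            off += d->d_reclen;
        }
    }
    free(buf);
    if (fd >= 0) close(fd);
    return count;
}
```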



    But if the directory is huge, or the kernel doesn't fill find's buffer, find will have to make a second getdents system call after looping over what it got the first time. So it could see new entries after doing some renames.



    But see the discussion in comments under other answers: the kernel might effectively snapshot for us, because (I think) getdents isn't allowed to return the same filename twice. Different filesystems use different sorting / indexing mechanisms to make access to an entry in a huge directory more efficient than a linear search, so adding or removing a directory entry might affect the order of the remaining entries. More likely, filesystems keep a stable order and just update an actual index (like the ext4 dir_index feature), so a directory FD's position can simply be a directory entry to resume from. I don't know for sure how the telldir(3) library interface maps onto lseek, or whether that's purely a user-space thing for looping over the buffer obtained by user-space. But multiple getdents calls can be needed to get all the entries from a huge directory, so even if seeking isn't supported, the kernel needs to be able to record a current position.
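On the user-space side, telldir(3) and seekdir(3) save and restore a position in a DIR stream; the value is an opaque cookie, not an entry index. A small sketch (reread_after_seek is a hypothetical name): seekdir() returns the stream to the position telldir() recorded, so with an unchanged directory the next readdir() yields the same entry again:

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Skip `skip` entries of `path`, save the position with telldir(), read
 * the next name into `first`, seekdir() back, and read the next name again
 * into `again`. Returns 0 if both reads succeeded, -1 otherwise. */
int reread_after_seek(const char *path, int skip, char *first, char *again, size_t len)
{
    DIR *d = opendir(path);
    if (!d)
        return -1;
    for (int i = 0; i < skip && readdir(d) != NULL; i++)
        ;                               /* just advance the stream */
    long pos = telldir(d);              /* opaque cookie for this position */
    struct dirent *e = readdir(d);
    int rc = -1;
    if (e) {
        snprintf(first, len, "%s", e->d_name);
        seekdir(d, pos);                /* go back to the saved position */
        if ((e = readdir(d)) != NULL) {
            snprintf(again, len, "%s", e->d_name);
            rc = 0;
        }
    }
    closedir(d);
    return rc;
}
```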




    Footnote 1:



    To "move" between filesystems, it's up to user-space to copy and then unlink. (e.g. with open and either read+write, mmap+write, sendfile(2) or copy_file_range(2); the latter two avoid bouncing the file data through user-space entirely.)
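A minimal sketch of that copy-then-unlink step, assuming a regular file and a glibc new enough (2.27+) to declare copy_file_range(). copy_then_unlink is a hypothetical helper: it falls back to read+write when copy_file_range() refuses (older kernels, or a cross-filesystem copy the kernel won't do), and it copies only the data and permission bits, so it is far less thorough than a real mv:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallback: copy the rest of `in` to `out` through a user-space buffer. */
static int copy_rw(int in, int out)
{
    char buf[65536];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Copy src into a new file dst, then unlink src. No owner, timestamps,
 * xattrs or directory support: data and permission bits only. */
int copy_then_unlink(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY | O_CLOEXEC);
    if (in < 0)
        return -1;
    struct stat st;
    int out = -1, rc = -1;
    if (fstat(in, &st) == 0)
        out = open(dst, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, st.st_mode & 07777);
    if (out >= 0) {
        /* NULL offsets: copy_file_range() advances both file offsets, so
         * the read/write fallback resumes where it stopped. */
        off_t left = st.st_size;
        ssize_t n = 1;
        while (left > 0 && (n = copy_file_range(in, NULL, out, NULL, (size_t)left, 0)) > 0)
            left -= n;
        if (n < 0 && errno != EXDEV && errno != ENOSYS && errno != EINVAL && errno != EOPNOTSUPP)
            rc = -1;                    /* a real I/O error */
        else
            rc = copy_rw(in, out);      /* unsupported, or copy whatever is left */
        if (close(out) != 0)
            rc = -1;
        if (rc == 0)
            rc = unlink(src);           /* remove the source only after a full copy */
        else
            unlink(dst);                /* don't leave a partial copy behind */
    }
    close(in);
    return rc;
}
```

The ordering is the important design choice: the source is unlinked only after the destination is complete and closed, so an interruption leaves at worst two copies, never zero.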
















        answered Feb 28 at 4:25









    Peter Cordes



























