Could many CPUs simultaneously reading the same single file slow down each CPU's read speed?
I am running cat to combine file_X separately with a very large number of files, e.g. file_1 through file_100000000000.

Because of the large number of files, I distributed the job to a node with 64 CPUs so that it runs in parallel on each CPU. Each job runs in its own sub-folder, so there are 64 sub-folders.

To my surprise, the overall speed is much slower than expected.

Since the shell script I used just directs each job to the same single file_X, located in the parent directory of the 64 sub-folders, I wonder: if many CPUs simultaneously read the same single file, would that slow down the reading speed of every CPU?
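For concreteness, here is a minimal sketch of the kind of per-folder loop described above; the file names, sub-folder names, and exact commands are assumptions, since the real script is not shown:

```
#!/bin/bash
# Hypothetical reconstruction of the setup in the question (not the actual script).
# Each of the 64 sub-folders holds its share of the input files, and every job
# prepends the shared ../file_X to each of its own files.
for dir in sub_*/; do
    (
        cd "$dir" || exit 1
        for f in file_*; do
            # Every iteration re-reads ../file_X from the parent directory.
            cat ../file_X "$f" > "combined_$f"
        done
    ) &                     # one background job per sub-folder, 64 in total
done
wait
```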
Tags: parallelism
asked Feb 21 at 2:10 by Johnny Tam, edited Feb 21 at 2:27
1 Answer
Yes and no.
The actual reading of the file should happen at the same speed regardless of how many processors are doing it.
However, depending on the operating system and its configuration, there could be file locking going on. While multiple processes can hold a read lock simultaneously, acquiring and releasing that lock must happen inside a shared mutual-exclusion section. If your system takes those sorts of locks, the processors must line up to get access to the file, and then line up again afterwards to declare their lack of further interest in it.
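Whether any locks are actually being taken on file_X can be checked while the jobs are running; a minimal sketch, assuming a Linux node with the util-linux tools installed:

```
# List the POSIX/flock locks currently held on the system. cat does not normally
# take any, so an empty result for file_X argues against locking as the cause.
lslocks | grep file_X
```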
Depending on the file system that stores file_X and the files being combined with it, and on the options with which that file system was mounted, the access time (atime) of file_X may be updated each time cat reads it. If that is the case, a write lock is probably taken on the file_X inode before each update and released afterwards.
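If atime updates are suspected, one way to check is to look at the mount options and, where appropriate, remount with noatime. A minimal sketch, assuming a Linux node and a hypothetical mount point /data holding the files:

```
# Show the options the filesystem is mounted with; look for atime, relatime or noatime.
findmnt -no OPTIONS /data

# Remount with noatime so that reading file_X no longer dirties its inode
# (requires root; /data is a placeholder for the real mount point).
sudo mount -o remount,noatime /data
```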
Another possible reason for the reduced speed is that all 64 of these jobs are writing files in parallel, and those files necessarily sit at different locations on the disk. Unless you're using a solid-state drive (SSD), that can require a substantial amount of moving the write heads around. The files are also spread across 64 different directories, so there are 64 directory entries to update in addition to the files being created.
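Whether the disk rather than the CPUs is the bottleneck can usually be confirmed while the jobs are running; a minimal sketch, assuming a Linux node with the sysstat package installed:

```
# Extended per-device statistics, refreshed every second.
# A device near 100% utilisation with high await but modest throughput
# suggests the heads are spending their time seeking rather than transferring data.
iostat -x 1
```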
Doing all of this activity in a shell script means that each file copy is fork-execed. fork is seen as a fairly expensive system call, but on a system with shared libraries it pales in comparison to the cost of the exec family of system calls, since exec has to locate every shared library and load all of them. This is another place where a read lock could potentially be placed on a file, depending on exactly which Unix this is and possibly on its configuration.
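The per-file fork/exec cost can be measured rather than guessed at; a minimal sketch using strace, assuming a Linux node and a hypothetical script name run_jobs.sh:

```
# Count the process-management system calls (clone/fork, execve, wait4, ...) made
# by the script and all of its children, and show how much time was spent in them.
strace -f -c -e trace=process ./run_jobs.sh
```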
answered Feb 21 at 4:16 by Ed Grimm

In the normal case there is no file locking in Unix/Linux, certainly not when just using cat or the shell. I expect the slowdown will be due to disk thrashing, as described in the paragraph above about parallel writes to disk. – wurtel, Feb 21 at 7:59

Thank you very much! – Johnny Tam, Feb 26 at 7:38