Could many CPUs simultaneously reading the same single file slow down the read speed of every CPU?

I am running cat to combine file_X separately with a very large number of files, e.g. file_1 to file_100000000000.



Because of the large number of files, I distributed the work across a node with 64 CPUs so it runs in parallel on each CPU. Each job runs in its own sub-folder, so there are 64 sub-folders.



To my surprise, the overall speed is much slower than expected.



Since the shell script I used simply points each job at the same single file_X, located in the parent directory of the 64 sub-folders, I wonder: if many CPUs simultaneously read the same single file, would that slow down the read speed for every CPU?
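For concreteness, the kind of per-CPU job described above looks roughly like the sketch below (the folder and file names, and the loop itself, are illustrative, not the actual script):

    cd sub_folder_01              # one of the 64 sub-folders (name made up)
    for f in file_*; do
        # every iteration re-reads the shared file_X from the parent directory
        cat ../file_X "$f" > "combined_$f"
    done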










parallelism






asked Feb 21 at 2:10 by Johnny Tam; edited Feb 21 at 2:27 by Johnny Tam




















1 Answer






































Yes and no.

The actual reading of the file should happen at the same speed regardless of how many processors are doing it.



However, depending on the operating system and its configuration, there may be file locking going on. While multiple processes can hold a read lock simultaneously, acquiring and releasing a lock has to happen inside a shared mutual-exclusion block. If your system takes these sorts of locks, the processors must queue up to gain access to the file, and then queue up again afterwards to declare that they no longer need it.
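On Linux there is a quick way to see whether any such locks are actually being taken while the jobs run; an empty or irrelevant listing suggests locking is not the bottleneck (this is only a rough check, not proof):

    cat /proc/locks    # the kernel's list of currently held POSIX/flock locks
    lslocks            # util-linux front end to the same information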



Depending on the file system that holds file_X and the various files being combined with it, and on the options with which that file system was mounted, the access time of file_X may be updated every time cat reads it. If that is the case, a write lock is probably taken on the file_X inode before each update and released afterwards.
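A quick way to check whether reads are updating the access time, and one common way to stop them if they are (the /data mount point below is only a placeholder):

    stat -c 'atime: %x' file_X    # note the access time (GNU coreutils stat)
    cat file_X > /dev/null
    stat -c 'atime: %x' file_X    # if this changed, atime updates are enabled

    # Remounting with noatime (or relatime) avoids the inode write on every read.
    # /data stands in for whatever mount point actually holds file_X.
    sudo mount -o remount,noatime /data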



Another possible reason for the reduced speed is that all 64 of these jobs are writing files in parallel, and those files necessarily live at different locations on the disk. Unless you are using a solid-state drive (SSD), that can mean a substantial amount of moving the write heads around. The files also sit in 64 different directories, so there are 64 directory entries to update in addition to the files being created.
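Watching the disk while the 64 jobs run can confirm seek-bound behaviour: consistently high device utilisation with low throughput is the usual sign (iostat is part of the sysstat package):

    iostat -x 1    # extended per-device statistics, refreshed every second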



Doing all of this activity in a shell script means that each file copy is fork-and-exec'ed. fork is regarded as a fairly expensive system call, but on a system with shared libraries it pales in comparison to the cost of the exec family of system calls, since exec has to search for every shared library and load all of them. This is another place where a read lock could potentially be placed on a file, depending on exactly which Unix this is and possibly how it is configured.
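One rough way to see how much time goes into process creation rather than actual I/O is to trace a small slice of the loop; the loop below is only an illustration of the pattern, not the original script, and the numbers are indicative only:

    # file names here are made up for illustration
    strace -c -f sh -c '
        for f in file_1 file_2 file_3; do
            cat ../file_X "$f" > "combined_$f"
        done'
    # -c prints a summary table of system calls, -f follows the forked cat
    # processes, so the clone/execve entries show the per-invocation overhead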






answered Feb 21 at 4:16 by Ed Grimm
• In the normal case there is no file locking in unix/linux, certainly not when just using cat or the shell. I expect the slowdown will be due to disk thrashing, as described by the 4th paragraph of this answer. – wurtel, Feb 21 at 7:59

• Thank you very much! – Johnny Tam, Feb 26 at 7:38









