What characterizes a file in Linux/Unix?
A file can have many types: regular file, directory, symlink, device file, socket, pipe, fifo, and maybe more that I've missed. For example, a symlink:
$ sudo file /proc/22277/fd/23
/proc/22277/fd/23: broken symbolic link to socket:[7540288]
and a socket:
$ sudo ls -l /run/user/1001/systemd/notify
srwxrwxr-x 1 testme testme 0 Feb 6 16:41 /run/user/1001/systemd/notify
Is a file characterized as something with an inode (an inode in some filesystem, either in memory or on a secondary storage device)? Do files of every type have inodes? (I'd guess yes to both questions.)
The post "Linux's Internet domain socket, transport protocols (TCP/UDP)'s socket and port" seems to say that anything with an open file description is a file. Does anything with an open file description necessarily have an inode?
A comment there says: "Open file description is a much better term than file; you can't define "file". Network sockets and Unix domain sockets are both open file descriptions. A UDS may or may not be associated with something on disk (many conditions can affect this). A network socket is never associated with anything on disk."
Thanks.
files pipe socket inode fifo
edited Feb 15 at 18:21 by K7AAY
asked Feb 15 at 16:01 by Tim
I never defined "file". I also don't know what you mean by "something with an open file description".
– 炸鱼薯条德里克
Feb 15 at 16:18
FIFO and pipe are the same thing. The one you've missed is devices, which can be "block devices" or "character devices".
– Philip Couling
Feb 15 at 16:24
Also see man7.org/linux/man-pages/man5/proc.5.html
– 炸鱼薯条德里克
Feb 15 at 16:26
On Linux, a file is anything that is referenced by a file descriptor, and a file descriptor is anything that can be close(2)d. I'm sorry if you don't like the definition, but the fact that Linux uses anonymous inodes or such is an implementation detail, and the info fstat(2) returns from an anonymous inode is, most of the time, neither useful nor interesting.
– mosvy
Feb 15 at 18:51
An "anonymous inode" is an in-memory kernel structure that is used to implement file-like things such as those created by epoll(2), timerfd(2), signalfd(2), eventfd(2), bpf(2), inotify(2), etc. You can look at fs/anon_inodes.c in the kernel source.
– mosvy
Feb 15 at 19:20
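To see an anonymous inode from user space, here is a minimal Python sketch (assuming Linux with /proc mounted; `fd_target` is a hypothetical helper name). It creates an epoll instance, which has a file descriptor but no path anywhere, and reads the /proc/self/fd link for that descriptor:

```python
import os
import select


def fd_target(fd: int) -> str:
    """Return what the /proc/self/fd/<fd> magic link points to (Linux-specific)."""
    return os.readlink(f"/proc/self/fd/{fd}")


ep = select.epoll()            # an epoll instance: a descriptor with no file name
target = fd_target(ep.fileno())
ep.close()
print(target)                  # typically "anon_inode:[eventpoll]"
```

The link names the anonymous-inode filesystem rather than a path on disk, which matches the description above.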
3 Answers
What is a file?
A file in linux is basically just a thing you can interact with. There are exactly 7 types of file:
- socket,
- symbolic link,
- regular file,
- block device,
- directory,
- character device,
- FIFO (AKA pipe).
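A small Python sketch (assuming Linux and Python 3; `file_type` and `TYPE_CHECKS` are hypothetical names) that creates one object of most of these types and asks the kernel which type each inode records. Block devices are left out because none is guaranteed to exist (for example, inside a container):

```python
import os
import socket
import stat
import tempfile

# Checked in the same order as the list above.
TYPE_CHECKS = [
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISREG, "regular file"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISFIFO, "FIFO"),
]


def file_type(path: str) -> str:
    """Name the type recorded in path's inode (symlinks are not followed)."""
    mode = os.lstat(path).st_mode
    for is_type, name in TYPE_CHECKS:
        if is_type(mode):
            return name
    return "unknown"


with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "regular")
    open(regular, "w").close()
    fifo = os.path.join(d, "fifo")
    os.mkfifo(fifo)
    link = os.path.join(d, "link")
    os.symlink(regular, link)
    sock_path = os.path.join(d, "sock")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)  # binding a Unix domain socket creates a socket file
        found = {name: file_type(p) for name, p in [
            ("regular", regular), ("fifo", fifo), ("link", link),
            ("sock", sock_path), ("tmpdir", d), ("/dev/null", "/dev/null"),
        ]}

for name, kind in found.items():
    print(f"{name:10} {kind}")
```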
A lot of confusion arises because we talk about files in different ways depending on context. For this discussion, let's talk about two separate contexts:
- A file as represented on disk (in a file system)
- A file as represented in linux ("in memory")
In linux (in memory)
In linux (in memory) every file has (or is?) an inode. It needs one because it's the inode that tells linux what this file is. To link the inode back to something meaningful like a file on disk, the inode stores 3 crucial pieces of information:
- Device id - references the file system or driver responsible for the file
- inode number - a unique number given by the file system or driver. Two inodes can have the same inode number if they have different device ids
- a type - tells linux what this file actually is. See above.
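The device id, inode number and type are exactly what stat(2) and fstat(2) report. A minimal Python sketch (Linux assumed; `identity` is a hypothetical helper) showing that looking a file up by name and asking about an open descriptor lead to the same (device id, inode number) pair:

```python
import os
import stat
import tempfile


def identity(st: os.stat_result) -> tuple:
    """The (device id, inode number) pair that identifies a file."""
    return (st.st_dev, st.st_ino)


with tempfile.NamedTemporaryFile() as f:
    by_name = os.stat(f.name)        # find the inode through a file name
    by_fd = os.fstat(f.fileno())     # ask about an already-open descriptor

same_file = identity(by_name) == identity(by_fd)
print("device:", by_name.st_dev, "inode:", by_name.st_ino,
      "regular file:", stat.S_ISREG(by_name.st_mode), "same file:", same_file)
```

The standard library's `os.path.samestat` performs the same (device, inode) comparison.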
The way you can interact with a file depends on its type. For example, you can list a directory, but you can't list a block device. You can connect to a socket, but you can't connect to a regular file.
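A Python sketch of the same point, assuming Linux: listing a regular file fails with ENOTDIR, and connecting a Unix domain socket to a regular file's path fails with ECONNREFUSED (unix(7) documents this error for a target that is not a socket). The variable names are illustrative only:

```python
import errno
import os
import socket
import tempfile

with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "plain.txt")
    with open(regular, "w") as f:
        f.write("not a directory, not a socket\n")

    listing = os.listdir(d)          # listing works on a directory...
    try:
        os.listdir(regular)          # ...but not on a regular file
        listdir_error = None
    except NotADirectoryError as e:
        listdir_error = e.errno

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(regular)      # the path exists but is not a socket
        connect_error = None
    except ConnectionRefusedError as e:
        connect_error = e.errno
    finally:
        client.close()

print(listing, errno.errorcode.get(listdir_error), errno.errorcode.get(connect_error))
```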
On disk
File systems differ a lot. File systems like ext4 were written for Unix and Linux and mirror the inode concept, so inodes in memory are pretty much just read from the inodes on disk.
But they are different. For example, inodes on disk do not have or need a device id. Inodes on disk do need to record where on disk the file data is stored; inodes in Linux (in memory) rely on the driver to figure that out.
Inode numbers on disk are usually used by the driver as inode numbers in linux. So inodes on disk are often mistaken for being the same as inodes in memory.
How do we reference files (inodes)?
File names
A file name is the most familiar way to reference (find) a file. File systems store trees of file names; Linux pieces these trees together into one tree using mount. Each name in the tree simply points to an inode.
Files in Linux can have more than one file name, but only when the file system supports it. Both in Linux and on disk, multiple file names (hard links) are achieved by having more than one name point to the same inode.
Deleting a file is just deleting its file name. The actual space occupied can only be reclaimed when all file names have been removed and all "file descriptors" closed.
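Hard links and deletion can be observed directly. A Python sketch (assuming Linux and a file system that supports hard links, such as the usual tmpfs or ext4 temp directory): two names share one inode, the inode's link count tracks the names, and removing one name leaves the data reachable through the other:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "first")
    second = os.path.join(d, "second")
    with open(first, "w") as f:
        f.write("shared data\n")
    os.link(first, second)            # a second name for the same inode
    a, b = os.stat(first), os.stat(second)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    links_before = a.st_nlink         # the link count is stored in the inode
    os.unlink(first)                  # "delete the file": remove one name
    with open(second) as f:
        still_readable = f.read()
    links_after = os.stat(second).st_nlink

print(same_inode, links_before, links_after, repr(still_readable))
```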
So for regular files (on disk) there are three things: file name --> inode --> data
File descriptors
When a program opens a file it swaps the file name for a file descriptor (a number). This is a different sort of link to an inode that does not have a name or path. All operations on a file like "read" and "write" use the file descriptor NOT the file name.
File descriptors don't have to be obtained through open(). Once you have a file descriptor it can be inherited (copied) by child processes and even copied to an entirely different process (via a unix domain socket).
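Passing a descriptor over a Unix domain socket uses SCM_RIGHTS ancillary data. A minimal Python sketch (assuming Linux and Python 3.9+ for `socket.send_fds`/`socket.recv_fds`; `pass_fd` is a hypothetical helper). For brevity both ends live in one process, but the mechanism is the same across processes:

```python
import os
import socket
import tempfile


def pass_fd(fd: int) -> int:
    """Send fd over a Unix domain socket pair (SCM_RIGHTS) and return the
    descriptor that arrives on the other end."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver:
        socket.send_fds(sender, [b"x"], [fd])   # at least one data byte is sent
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    return fds[0]


with tempfile.TemporaryFile() as f:
    original = f.fileno()
    received = pass_fd(original)
    same_file = os.path.samestat(os.fstat(original), os.fstat(received))
    os.close(received)

print(original, "->", received, "same file:", same_file)
```

The received descriptor has a different number but refers to the same file.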
I think this has caused some confusion because the words "file description" were used in a comment the OP referenced. I believe those comments were trying to say that file descriptors are more than just a number, but they said it in a confusing way.
Files without a file name or file system
There are a few quirks to this model. First, if you open a file and then delete it (without closing it), the open file descriptor prevents the file on disk from being recycled. This results in a file without a file name.
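A Python sketch of this quirk, assuming Linux with /proc mounted: after the only name is removed, the inode's link count drops to 0, yet the data is still readable through the open descriptor, and /proc marks the descriptor's target as deleted:

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "doomed.txt")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.write(fd, b"still here\n")
os.unlink(path)                       # remove the only name

name_gone = not os.path.exists(path)
link_count = os.fstat(fd).st_nlink    # 0: no names left, but the inode lives on
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)               # the data is still reachable via the fd
proc_link = os.readlink(f"/proc/self/fd/{fd}")  # Linux appends " (deleted)"
os.close(fd)                          # only now can the space be reclaimed
os.rmdir(d)

print(name_gone, link_count, data, proc_link)
```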
Even more strangely, there are files which are never part of a file system. A program can create an unnamed pipe or an unnamed socket. These will have an inode in Linux, but are never directly attached to a file system because they only exist as objects in the Linux kernel. These are still files (although weird ones): they have file descriptors which reference inodes.
A common example of unnamed pipes is the STDIN and STDOUT of a command-line program. When you pipe two programs together (foo | bar), the pipe between them is an unnamed pipe.
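This is easy to check from the right-hand side of a pipeline. A Python sketch that runs a tiny shell pipeline (assuming Linux, /bin/sh, and coreutils `readlink`; `true` stands in for `foo`). The `readlink` process reports what its own STDIN is:

```python
import subprocess

# `true | readlink /proc/self/fd/0`: readlink describes its own standard input.
result = subprocess.run(
    "true | readlink /proc/self/fd/0",
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())    # something like "pipe:[123456]", not a path
```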
Closing remarks
Generally we fudge all of these concepts together into the one word "file". Normally you can "write to a file" without caring that this involves a file name and an inode, which are translated into a file name and an inode on disk, before the content is ultimately written to the disk. The phrase "write to a file" means all of that.
It's only for special situations that you need to start separating out these concepts.
"file description" is the standard term (used in POSIX, manpages, etc) for the file "object", the kernel structure referenced by a "file descriptor". (since the words are so close, I prefer "file object" or "file structure"). You can think of "file descriptors" as indexes in an array of reference-counted pointers to "open file descriptions".
– mosvy
Feb 15 at 23:29
I mean, you can't compare file descriptors with a simple ==. But two distinct open file descriptions are different, although they might share something like the backing on-disk inode (opening the same path twice) or a bound address (SO_REUSEPORT). Also, an open file description can represent a wider concept than an on-disk filesystem inode, including sockets, epoll fds, etc. Although not everything file-related is accessed through an open file description (like stat or mmap), it's still a much better term than "file".
– 炸鱼薯条德里克
Feb 16 at 1:20
TL;DR
- a file is an object on which you can perform some or all of the basic operations (open, read, write, close) and which has metadata stored in an inode.
- file descriptors are references to those objects
- an open file description (yes, the "open" part is important) is a record of how a file (represented by at least one file descriptor) was opened
File As Abstraction
Let's consult the POSIX.1-2017 definitions, section 3.164, for how File is defined:
An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.
So a file is anything we can read from, write to, or both, which also has metadata. Everyone go home, case closed!
Well, not so fast. Such a definition leaves a whole lot of room for related concepts, and there are obviously differences between regular files and, say, pipes. "Everything is a file" is itself a concept and design pattern rather than a literal statement. Based on that pattern, file types such as directories, pipes, device files, in-memory files and sockets can all be manipulated in a consistent manner via a set of system calls such as open(), openat(), write(), and, in the case of sockets, recv() and send(). Take USB as an analogy: there are many different devices, but they all connect to the same USB port (never mind that there are actually several USB port types, from A to C; you get the idea).
Of course, for that to work there has to be an interface or reference to the actual data that behaves consistently, and that's the File Descriptor:
A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The value of a newly-created file descriptor is from zero to OPEN_MAX-1.
As such, we can write() to STDOUT via file descriptor 1 in the same fashion as we would write to a regular file /home/user/foobar.txt. When you open() a file, you get a file descriptor, and you can use the same write() function to write to that file. That's the whole point the original Unix creators tried to address: minimalist and consistent behavior. When you do command > /home/user/foobar.txt, the shell will make a copy of the file descriptor that refers to foobar.txt and pass it as command's file descriptor 1 (its STDOUT); to be more precise, assuming foobar.txt was opened as descriptor 3, it will do dup2(3,1) and then execve() the command. But regardless of that, command will still use the same write syscall on file descriptor 1 as if nothing happened.
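A rough sketch of what the shell does for `command > file`, using Python's thin wrappers over open, fork, dup2 and execvp (assuming Linux and Python 3.9+ for `os.waitstatus_to_exitcode`; `run_redirected` is a hypothetical name, and a real shell does much more):

```python
import os
import tempfile


def run_redirected(argv: list, path: str) -> int:
    """Rough sketch of a shell running `argv > path`; returns the exit code."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    pid = os.fork()
    if pid == 0:                      # child process
        try:
            os.dup2(fd, 1)            # fd 1 now refers to the file's open file description
            os.close(fd)              # the original descriptor is no longer needed
            os.execvp(argv[0], argv)  # the program just writes to fd 1 as usual
        finally:
            os._exit(127)             # reached only if something above failed
    os.close(fd)                      # parent: drop its copy
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "foobar.txt")
    exit_code = run_redirected(["echo", "hello"], out)
    with open(out) as f:
        written = f.read()

print(exit_code, repr(written))
```

`echo` never learns that its STDOUT is a regular file; it simply writes to descriptor 1.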
Of course, when most users think of a file, they think of a regular file on a disk filesystem. This is more consistent with the Regular File definition, section 3.323:
A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.
By contrast, we have Sockets:
A file of a particular type that is used as a communications endpoint for process-to-process communication as described in the System Interfaces volume of POSIX.1-2017.
Regardless of type, the actions we can take on different file types are conceptually the same: open, read, write, close.
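To make that concrete, a Python sketch (Linux assumed; `roundtrip` is a hypothetical helper) pushes bytes through a pipe, a Unix domain socket pair and a regular file using only the generic write(2) and read(2) wrappers, with no type-specific calls:

```python
import os
import socket
import tempfile


def roundtrip(write_fd: int, read_fd: int, payload: bytes) -> bytes:
    """Write with write(2) and read back with read(2)."""
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))


results = {}

r, w = os.pipe()                          # an unnamed pipe
results["pipe"] = roundtrip(w, r, b"via pipe")
os.close(r)
os.close(w)

a, b = socket.socketpair()                # connected Unix domain sockets
results["socket"] = roundtrip(a.fileno(), b.fileno(), b"via socket")
a.close()
b.close()

with tempfile.TemporaryDirectory() as d:  # a regular file, opened twice
    path = os.path.join(d, "f")
    w = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    r = os.open(path, os.O_RDONLY)
    results["regular file"] = roundtrip(w, r, b"via file")
    os.close(r)
    os.close(w)

print(results)
```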
All Files Have Inodes
What you should have noticed in the file definition is that a file has "certain attributes", which are stored in inodes. In fact, on Linux specifically, we can refer to the first line of the inode(7) manual:
Each file has an inode containing metadata about the file. An application can retrieve this metadata using stat(2) (or related calls)
Boom. Clear and direct. We're mostly familiar with inodes as the bridge between blocks of data on disk and filenames stored in directories (because that's what directories are: lists of filenames and corresponding inodes). Even in virtual filesystems in the kernel, such as pipefs and sockfs, we can find inodes. Take for instance this code snippet:
/* fs/pipe.c: the name shown for a pipe is built from its inode number */
static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
{
	return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
			     dentry->d_inode->i_ino);
}
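The result of that format string is visible from user space. A small Python sketch (assuming Linux with /proc mounted) compares the /proc/self/fd link of an unnamed pipe with the inode number fstat(2) reports for it:

```python
import os
import stat

r, w = os.pipe()
st = os.fstat(r)
link = os.readlink(f"/proc/self/fd/{r}")
is_fifo = stat.S_ISFIFO(st.st_mode)     # pipefs inodes carry the FIFO type
expected = f"pipe:[{st.st_ino}]"        # the same "pipe:[%lu]" format as above
os.close(r)
os.close(w)

print(link, expected, is_fifo)
```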
Open File Description
Now that you're thoroughly confused, Linux/Unix introduces something known as the Open File Description, and to keep the explanation simple: it's another abstraction. In the words of Stephane Chazelas,
It's more about the record of how the file was opened more than the file itself.
And it's consistent with POSIX definition:
A record of how a process or group of processes is accessing a file. Each file descriptor refers to exactly one open file description, but an open file description can be referred to by more than one file descriptor. The file offset, file status, and file access modes are attributes of an open file description.
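The file offset is the easiest attribute to observe. A Python sketch (Linux assumed): `dup` creates a second descriptor for the same open file description, so the two share an offset, while a second `open` of the same path creates a new description with its own offset, even though both point to the same inode:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    with open(path, "wb") as f:
        f.write(b"abcdef")

    fd1 = os.open(path, os.O_RDONLY)   # open file description #1
    fd2 = os.dup(fd1)                  # new descriptor, SAME description
    fd3 = os.open(path, os.O_RDONLY)   # second open(): a NEW description

    os.read(fd1, 2)                    # advances the offset shared by fd1 and fd2
    offsets = {
        "fd1": os.lseek(fd1, 0, os.SEEK_CUR),
        "fd2 (dup of fd1)": os.lseek(fd2, 0, os.SEEK_CUR),
        "fd3 (separate open)": os.lseek(fd3, 0, os.SEEK_CUR),
    }
    same_inode = os.path.samestat(os.fstat(fd1), os.fstat(fd3))
    for fd in (fd1, fd2, fd3):
        os.close(fd)

print(offsets, same_inode)
```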
Now if we also look at the book Understanding the Linux Kernel, the authors state
Linux implements BSD sockets as files that belong to the sockfs special filesystem...More precisely, for every new BSD socket, the kernel creates a new inode in the sockfs special filesystem.
Remembering that sockets are also referenced by file descriptors, and that there will therefore be an open file description in the kernel for each socket, we can conclude that sockets are files all right.
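The sockfs inode is visible from user space too. A Python sketch (assuming Linux with /proc mounted and that creating an AF_INET socket is permitted; `sockfs_view` is a hypothetical helper) checks that both a Unix domain socket and an unbound TCP socket, neither of which has a name on disk, report the socket type and a `socket:[<inode>]` link, the same form seen in the question's `/proc/22277/fd/23` output:

```python
import os
import socket
import stat


def sockfs_view(sock: socket.socket) -> tuple:
    """Return (is a socket inode?, /proc link matches "socket:[<inode>]"?)."""
    st = os.fstat(sock.fileno())
    link = os.readlink(f"/proc/self/fd/{sock.fileno()}")
    return stat.S_ISSOCK(st.st_mode), link == f"socket:[{st.st_ino}]"


left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # never touches disk
views = {name: sockfs_view(s) for name, s in
         [("unix (socketpair)", left), ("tcp (unbound)", tcp)]}
for s in (left, right, tcp):
    s.close()

print(views)
```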
to be continued . . .maybe
It's more about the record of how the file was opened more than the file itself. A great description, given that two distinct open file descriptions might point to the same inode.
– 炸鱼薯条德里克
Feb 16 at 5:48
@炸鱼薯条德里克 You might want to give a +1 to Stephane's linked comment, as that's his original quote
– Sergiy Kolodyazhnyy
Feb 16 at 6:26
1) On most filesystems on Unix, a file, fifo, directory etcetera is described by an inode. An inode has a number of fields, but the most interesting in this case is the i_mode field. Next to the permissions, it contains the type of "file" that the inode points to:
Constant Value Description
-- file format --
EXT2_S_IFSOCK 0xC000 socket
EXT2_S_IFLNK 0xA000 symbolic link
EXT2_S_IFREG 0x8000 regular file
EXT2_S_IFBLK 0x6000 block device
EXT2_S_IFDIR 0x4000 directory
EXT2_S_IFCHR 0x2000 character device
EXT2_S_IFIFO 0x1000 fifo
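On Linux, the type bits that stat(2) returns in `st_mode` use these same values, so the table can be used to decode a mode directly. A Python sketch (Linux assumed; `EXT2_TYPES` and `describe_mode` are hypothetical names, and other Unix systems are not guaranteed to use identical constants):

```python
import os
import stat

# The type values from the ext2 table above.
EXT2_TYPES = {
    0xC000: "socket",
    0xA000: "symbolic link",
    0x8000: "regular file",
    0x6000: "block device",
    0x4000: "directory",
    0x2000: "character device",
    0x1000: "fifo",
}


def describe_mode(mode: int) -> str:
    """Split a mode into its type bits (S_IFMT) and permission bits."""
    kind = EXT2_TYPES.get(stat.S_IFMT(mode), "unknown")
    return f"{kind}, permissions {oct(stat.S_IMODE(mode))}"


same_values = (
    stat.S_IFSOCK == 0xC000 and stat.S_IFLNK == 0xA000
    and stat.S_IFREG == 0x8000 and stat.S_IFBLK == 0x6000
    and stat.S_IFDIR == 0x4000 and stat.S_IFCHR == 0x2000
    and stat.S_IFIFO == 0x1000
)
print(same_values, describe_mode(os.lstat("/").st_mode))
```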
2) That depends on how you see it. For every open file, whether it is a 'real' file or another construct like an unnamed pipe, you can get an inode via a system call. But that inode will not be available once the file handles are closed. (Section 2 edited to remove a factual incorrectness.)
How come STDIN is not a "real" file and doesn't have an inode? Does foo.txt stop being a real file as soon as I use it as the STDIN of a program, as in cmd < foo.txt?
– mosvy
Feb 15 at 19:46
I think this is wrong. You can stat the inode for an unnamed pipe using fstat(). The function call succeeds, and, much as you'd expect, you get the stat information for an inode created by the user and of type pipe. So even if a program's STDIN and STDOUT are FDs onto unnamed pipes, there is still an inode in Linux for them.
– Philip Couling
Feb 15 at 22:06
@mosvy STDIN is a file descriptor; it's a file-like object as far as the system is concerned. When you do < foo.txt, there's a dup2() system call that duplicates the open file descriptor corresponding to foo.txt onto file descriptor 0 (STDIN). You can see that happening with the command $ strace -f -e dup2,openat sh -c 'cat < /etc/passwd >/dev/null'. A file, in the way people understand it as a file on disk, is different from a file-like object. And "everything is a file" is just a design pattern of using the same syscalls to handle different objects. It's not a literal statement.
– Sergiy Kolodyazhnyy
Feb 15 at 22:29
@PhilipCouling Inode on which filesystem? That's what you have to remember. There are in-kernel / in-memory filesystems. So even if an object has an inode, that doesn't mean fstat() returns information from an on-disk filesystem.
– Sergiy Kolodyazhnyy
Feb 15 at 22:30
@SergiyKolodyazhnyy if you read my answer you'll see that this is a trap. Inodes are BOTH a concept in linux and the file system. An inode in linux does not need to exist in a file system, at least not in one you've mounted.
– Philip Couling
Feb 15 at 22:33
|
show 8 more comments
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f500907%2fwhat-characterizes-a-file-in-linux-unix%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
What is a file?
A file in linux is basically just a thing you can interact with. There are exactly 7 types of file:
- socket,
- symbolic link,
- regular file,
- block device,
- directory,
- character device,
- FIFO (AKA pipe).
A lot of confusion arises because we talk about files in different ways depending on context. For this discussion lets about two separate contexts:
- A file as represented on disk (in a file system)
- A file as represented in linux ("in memory")
In linux (in memory)
In linux (in memory) every file has (or is?) an inode. It needs one because it's the inode that tells linux what this file is. To link the inode back to something meaningful like a file on disk, the inode stores 3 crucial pieces of information:
- Device id - references the file system or driver responsible for the file
- inode number - a unique given by the file system or driver. Two inodes can have the same inode number if they have different a device id
- a type - tells linux what this file actually is. See above.
The way you can interact with a file depends on the type of file. For example you can list a directory, you can't list a block device. You can connect to a socket, you can't connect to a regular file.
On disk
Different file systems are very different. The file systems like ext4 were written for unix and linux and mirror the inodes concept. So inodes in memory are pretty much just read from the inodes on disk.
But they are different. For example inodes on disk do not have or need a Device Id. Inodes on linux memory (in memory) do need to record where on disk the file data is stored. Inodes in linux rely on the driver to figure that out.
Inode numbers on disk are usually used by the driver as inode numbers in linux. So inodes on disk are often mistaken for being the same as inodes in memory.
How do we reference files (inodes)?
File names
A file name is the most familiar way to reference (find) a file. File systems store trees of file names, linux pieces these trees together into one tree using mount
. Each name in the tree simply points to an inode.
Files in linux can have more than one file name. This is only possible when the file system can also support it. Both in linux and on disk, multiple file names (hard links) are achieved by having more than one name point to the same inode.
Deleting a file is just deleting its file name. The actual space occupied can only be reclaimed when all file names have been removed and all "file descriptors" closed.
So for regular files (on disk) there's three things: file name --> inode --> data
File descriptors
When a program opens a file it swaps the file name for a file descriptor (a number). This is a different sort of link to an inode that does not have a name or path. All operations on a file like "read" and "write" use the file descriptor NOT the file name.
File descriptors don't have to be obtained through open(). Once you have a file descriptor it can be inherited (copied) by child processes and even copied to an entirely different process (via a unix domain socket).
I think this has caused some confusion because the words "file description" were used in a comment the OP referenced. I believe those comments were trying to say that file descriptors are more than just a number. But they have said it in a confusing way.
Files without a file name or file system
There are a few quirks to this model. Firstly if you open a file and then delete it (without closing the file), the open file descriptor prevents the file on disk from being recycled. This results in a file without a file name.
Even more strangely there are files which are never part of a file system. A program can create an unamed pipe or unamed scoket. These will have an inode on linux, but never be directly attached to a file system because they only exist as a thing in the linux kernel. These are still files (although weird ones)... they have file descriptors which reference inodes.
A common example of unnamed pipes is the STDIN, STDOUT for a command line program. When you pipe two programs together (foo | bar
), the pipe between them will be an unnamed pipe.
Closing remarks
Generally we fudge together all of these concepts into just the one word "file". Normally you can "write to a file" without caring that this involves a separate file name, inode that will be translated into a file name on disk and inode on disk and ultimately write content to the disk. The phrase "write to a file" means all of that.
It's only for special situations that you need to start separating out these concepts.
"file description" is the standard term (used in POSIX, manpages, etc) for the file "object", the kernel structure referenced by a "file descriptor". (since the words are so close, I prefer "file object" or "file structure"). You can think of "file descriptors" as indexes in an array of reference-counted pointers to "open file descriptions".
– mosvy
Feb 15 at 23:29
I mean, you can't compare fdor with simple ==. But two distinct ofdtion are different, although they might share some something like back-on-disk-inode(open the same path twice) or bound address(SO_REUSEPORT). Also, ofdtion can represent wider concept that on-disk-filesystem-inode, including socket, epollfd, etc . Although not all file related things are accessed through ofdtion(like stat or mmap), but it's still a much better terminology than "file".
– 炸鱼薯条德里克
Feb 16 at 1:20
add a comment |
What is a file?
A file in linux is basically just a thing you can interact with. There are exactly 7 types of file:
- socket,
- symbolic link,
- regular file,
- block device,
- directory,
- character device,
- FIFO (AKA pipe).
A lot of confusion arises because we talk about files in different ways depending on context. For this discussion lets about two separate contexts:
- A file as represented on disk (in a file system)
- A file as represented in linux ("in memory")
In linux (in memory)
In linux (in memory) every file has (or is?) an inode. It needs one because it's the inode that tells linux what this file is. To link the inode back to something meaningful like a file on disk, the inode stores 3 crucial pieces of information:
- Device id - references the file system or driver responsible for the file
- inode number - a unique given by the file system or driver. Two inodes can have the same inode number if they have different a device id
- a type - tells linux what this file actually is. See above.
The way you can interact with a file depends on the type of file. For example you can list a directory, you can't list a block device. You can connect to a socket, you can't connect to a regular file.
On disk
File systems differ a lot from one another. File systems like ext4 were written for Unix and Linux and mirror the inode concept, so inodes in memory are pretty much just read from the inodes on disk.
But they are not the same thing. For example, inodes on disk do not have or need a device id. Inodes on disk do need to record where on disk the file data is stored, whereas inodes in Linux (in memory) rely on the driver to figure that out.
The driver usually reuses the inode numbers on disk as the inode numbers in Linux, which is why inodes on disk are often mistaken for being the same as inodes in memory.
How do we reference files (inodes)?
File names
A file name is the most familiar way to reference (find) a file. File systems store trees of file names; Linux pieces these trees together into one tree using mount. Each name in the tree simply points to an inode.
Files in Linux can have more than one file name, but only if the file system supports it. Both in Linux and on disk, multiple file names (hard links) are achieved by having more than one name point to the same inode.
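Hard links can be observed directly: after link(), both names report the same (st_dev, st_ino) pair and the inode's link count goes up. A minimal sketch assuming a POSIX system; hard_link_demo and the generated file names are hypothetical, and dir must be on a file system that supports hard links:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file in `dir`, give it a second name with link(2), and check
 * that both names resolve to the same (st_dev, st_ino) pair.
 * Returns the link count seen while both names exist (expected 2),
 * or -1 on error. Both names are removed before returning. */
long hard_link_demo(const char *dir)
{
    char first[512], second[512];
    snprintf(first, sizeof first, "%s/hl-demo-%ld-a", dir, (long)getpid());
    snprintf(second, sizeof second, "%s/hl-demo-%ld-b", dir, (long)getpid());

    int fd = open(first, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    long result = -1;
    struct stat s1, s2;
    if (link(first, second) == 0) {
        if (stat(first, &s1) == 0 && stat(second, &s2) == 0 &&
            s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino)
            result = (long)s1.st_nlink;   /* two names, one inode */
        unlink(second);
    }
    unlink(first);   /* last name removed: the inode can now be freed */
    return result;
}
```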
Deleting a file is just deleting its file name. The actual space occupied can only be reclaimed when all file names have been removed and all "file descriptors" closed.
So for regular files (on disk) there are three things: file name --> inode --> data
File descriptors
When a program opens a file, it swaps the file name for a file descriptor (a number). This is a different sort of link to an inode, one that does not have a name or path. All operations on an open file, like "read" and "write", use the file descriptor, NOT the file name.
File descriptors don't have to be obtained through open(). Once you have a file descriptor, it can be inherited (copied) by child processes and even copied to an entirely different process (via a Unix domain socket).
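Copying a descriptor to another process over a Unix domain socket is done with sendmsg()/recvmsg() and SCM_RIGHTS ancillary data. The sketch below assumes Linux/glibc. send_fd and recv_fd are hypothetical helper names, and the message layout follows the usual cmsg(3) pattern. For brevity it is exercised within one process over a socketpair(), but the receiving end could just as well belong to another process.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the Unix domain socket `sock` as SCM_RIGHTS
 * ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';                       /* at least one data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                                 /* union guarantees alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(). The kernel installs a new
 * descriptor in the receiver that refers to the same open file.
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received number will usually differ from the one sent, because the kernel places it in the lowest free slot of the receiver's table, but both refer to the same open file.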
I think this has caused some confusion because the words "file description" were used in a comment the OP referenced. I believe those comments were trying to say that file descriptors are more than just a number. But they have said it in a confusing way.
Files without a file name or file system
There are a few quirks to this model. Firstly, if you open a file and then delete it (without closing the file), the open file descriptor prevents the file on disk from being recycled. This results in a file without a file name.
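The open-then-delete case can be reproduced in a few lines. This sketch assumes Linux (or another POSIX system) and a writable scratch path. unlinked_but_open is a hypothetical name. After unlink() the name is gone, yet reads and writes through the descriptor keep working and fstat() reports a link count of 0.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file, delete its only name, and keep using it through the
 * open descriptor. Returns the link count fstat() reports after
 * unlink() (expected 0), or -1 on error. */
long unlinked_but_open(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (unlink(path) == -1) {       /* remove the only file name */
        close(fd);
        return -1;
    }

    /* The inode and its data survive while the descriptor is open. */
    const char msg[] = "still here";
    char buf[sizeof msg];
    struct stat st;
    long result = -1;
    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
        memcmp(buf, msg, sizeof msg) == 0 &&
        fstat(fd, &st) == 0)
        result = (long)st.st_nlink;
    close(fd);   /* last reference dropped: the space can be reclaimed */
    return result;
}
```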
Even more strangely, there are files which are never part of a file system. A program can create an unnamed pipe or an unnamed socket. These will have an inode in Linux, but are never attached to any mounted file system because they only exist as objects in the Linux kernel. They are still files (although weird ones): they have file descriptors which reference inodes.
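To see that an unnamed pipe still has an inode, this sketch (assuming Linux with /proc mounted; pipe_inode is a hypothetical name) creates one with pipe(), reads its inode number with fstat(), and reads the /proc/self/fd symlink. On Linux that link has the form pipe:[inode], the same shape as the socket:[7540288] link in the question.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an unnamed pipe, store the text of its /proc/self/fd link in
 * `out` (outlen must be > 0), and return the inode number fstat()
 * reports for it, or 0 on error. */
unsigned long pipe_inode(char *out, size_t outlen)
{
    int p[2];
    if (pipe(p) == -1)
        return 0;

    struct stat st;
    char proc_path[64];
    unsigned long ino = 0;
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", p[0]);
    ssize_t n = readlink(proc_path, out, outlen - 1);  /* no NUL added */
    if (n >= 0 && fstat(p[0], &st) == 0 && S_ISFIFO(st.st_mode)) {
        out[n] = '\0';
        ino = (unsigned long)st.st_ino;
    }
    close(p[0]);
    close(p[1]);
    return ino;
}
```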
A common example of an unnamed pipe is the connection between one command-line program's STDOUT and another's STDIN. When you pipe two programs together (foo | bar), the pipe between them is an unnamed pipe.
Closing remarks
Generally we fudge all of these concepts together into the one word "file". Normally you can "write to a file" without caring that this involves a separate file name and inode in Linux, which get translated into a file name and inode on disk, and ultimately content written to the disk. The phrase "write to a file" means all of that.
It's only for special situations that you need to start separating out these concepts.
edited Feb 15 at 22:32
answered Feb 15 at 22:04
Philip Couling
TL;DR
- a file is an object on which you can perform some or all of the basic operations (open, read, write, close) and which has metadata stored in an inode.
- file descriptors are references to those objects
- an open file description (yes, the "open" part is important) records how a file (referenced by at least one file descriptor) was opened
File As Abstraction
Let's consult the POSIX.1-2017 definitions, section 3.164, for how a File is defined:
An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.
So a file is anything we can read from, write to, or both, which also has metadata. Everyone go home, case closed!
Well, not so fast. Such a definition leaves a whole lot of room for related concepts, and there are obviously differences between regular files and, say, pipes. "Everything is a file" is itself a concept and design pattern rather than a literal statement. Based on that pattern, file types such as directories, pipes, device files, in-memory files and sockets can all be manipulated in a consistent manner via a set of system calls such as open(), openat() and write() (sockets are created with socket() rather than opened, and also offer recv() and send()). Take USB as an analogy: there are many different devices, but they all connect to the same USB port (never mind that there are actually multiple USB port types, from A to C, but you get the idea).
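Because all of these file types are driven through the same descriptor-based calls, a helper like the following works unchanged on a regular file, a pipe, a terminal or a connected socket. It is a minimal sketch assuming POSIX write(). write_all is a hypothetical name, and the loop exists only because write() may legitimately write fewer bytes than requested, which is common with pipes and sockets.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes to `fd`, retrying on short writes and EINTR.
 * Nothing here depends on what kind of file `fd` refers to.
 * Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```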
Of course, there has to be some interface or reference to the actual data that works in a consistent manner for that to happen, and that's the File Descriptor:
A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The value of a newly-created file descriptor is from zero to OPEN_MAX-1.
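Two consequences of that definition can be checked from a program, assuming a POSIX system. First, open() hands out the lowest-numbered descriptor not currently in use. Second, the per-process upper bound can be queried with sysconf(_SC_OPEN_MAX). The function names below are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close one descriptor and check that the next open() reuses its
 * number, since open() returns the lowest-numbered free descriptor.
 * Returns 1 if the number was reused, 0 if not, -1 on error. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a == -1)
        return -1;
    int b = open("/dev/null", O_RDONLY);   /* keeps a higher slot busy */
    if (b == -1) {
        close(a);
        return -1;
    }
    close(a);                               /* frees slot `a` */
    int c = open("/dev/null", O_RDONLY);    /* expected to get slot `a` */
    int reused = (c == a);
    if (c != -1)
        close(c);
    close(b);
    return c == -1 ? -1 : reused;
}

/* Per-process limit on open descriptors (the OPEN_MAX in the POSIX
 * definition). Returns -1 if the limit is indeterminate. */
long max_open_files(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```

On Linux, sysconf(_SC_OPEN_MAX) reflects the process's current RLIMIT_NOFILE soft limit rather than a fixed constant.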
As such, we can write() to STDOUT via file descriptor 1 in the same fashion as we would write to a regular file /home/user/foobar.txt. When you open() a file, you get a file descriptor, and you can use the same write() function to write to that file. That's the whole point the original Unix creators tried to address: minimalist and consistent behavior. When you run command > /home/user/foobar.txt, the shell makes a copy of the file descriptor that refers to foobar.txt and passes it as command's file descriptor 1 (command's STDOUT). To be more precise, in the child process it opens the file (say it gets descriptor 3), does dup2(3,1), and then execve()s the command. Regardless of that, command still uses the same write syscall on file descriptor 1 as if nothing happened.
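The redirection step can be sketched as follows, assuming a POSIX system with fork(), dup2() and execvp(). run_redirected is a hypothetical name, and this is a simplification of what real shells do (no error messages, no job control).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Roughly what a shell does for `argv[0] argv[1] ... > path`:
 * fork, open the target in the child, dup2() it onto descriptor 1,
 * close the original descriptor, then execvp() the command. The command
 * just writes to fd 1 and never learns where it points.
 * Returns the child's exit status, or -1 on error. */
int run_redirected(char *const argv[], const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                                   /* child */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            _exit(127);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) == -1)        /* e.g. dup2(3, 1) */
                _exit(127);
            close(fd);   /* fd 1 now refers to the same open file */
        }
        execvp(argv[0], argv);
        _exit(127);      /* reached only if exec failed */
    }

    int status;                                       /* parent */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```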
Of course, when most users think of a file, they think of a regular file on a disk filesystem. That matches the Regular File definition, section 3.323, more closely:
A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.
By contrast, we have Sockets:
A file of a particular type that is used as a communications endpoint for process-to-process communication as described in the System Interfaces volume of POSIX.1-2017.
Regardless of type, the actions we can take on different file types are conceptually the same: open, read, write, close.
All Files Have Inodes
What you should have noticed in the file definition is that a file has "certain attributes", which are stored in inodes. On Linux specifically, we can refer to the first line of the inode(7) manual:
Each file has an inode containing metadata about the file. An application can retrieve this metadata using stat(2) (or related calls)
Boom. Clear and direct. We're mostly familiar with inodes as the bridge between blocks of data on disk and filenames stored in directories (because that's what directories are: lists of filenames and corresponding inode numbers). Even in the kernel's virtual filesystems, such as pipefs and sockfs, we can find inodes. Take for instance this snippet from the kernel's pipe code (fs/pipe.c):
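/* From fs/pipe.c in older kernel versions: the name shown for a pipe
   (e.g. by readlink on /proc/<pid>/fd/N) is built from its inode number */
static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
{
        return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
                             dentry->d_inode->i_ino);
}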
Open File Description
Now that you're thoroughly confused, Linux/Unix introduces something known as the Open File Description, and to keep the explanation simple: it's another abstraction. In the words of Stephane Chazelas:
It's more about the record of how the file was opened more than the file itself.
And it's consistent with the POSIX definition:
A record of how a process or group of processes is accessing a file. Each file descriptor refers to exactly one open file description, but an open file description can be referred to by more than one file descriptor. The file offset, file status, and file access modes are attributes of an open file description.
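The shared offset is easy to observe. In this sketch (assuming a POSIX system; offsets_after_write and the scratch path are hypothetical), dup() creates a second descriptor for the same open file description, while a second open() of the same path creates a new description that reaches the same inode but keeps its own offset.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write 5 bytes through a first descriptor, then report the current
 * offset seen through a dup() of it and through a separate open() of
 * the same path. Returns 0 on success, -1 on error. */
int offsets_after_write(const char *path, off_t *via_dup, off_t *via_reopen)
{
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd1 == -1)
        return -1;
    int fd2 = dup(fd1);               /* same open file description */
    int fd3 = open(path, O_RDWR);     /* new open file description */

    int rc = -1;
    if (fd2 != -1 && fd3 != -1 && write(fd1, "hello", 5) == 5) {
        *via_dup = lseek(fd2, 0, SEEK_CUR);     /* moved with fd1: 5 */
        *via_reopen = lseek(fd3, 0, SEEK_CUR);  /* untouched: 0 */
        rc = 0;
    }
    if (fd3 != -1)
        close(fd3);
    if (fd2 != -1)
        close(fd2);
    close(fd1);
    return rc;
}
```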
Now if we also look at the book Understanding the Linux Kernel, the authors state:
Linux implements BSD sockets as files that belong to the sockfs special filesystem...More precisely, for every new BSD socket, the kernel creates a new inode in the sockfs special filesystem.
Remembering that sockets are also referenced by file descriptors, and therefore have an open file description in the kernel, we can conclude that sockets are files alright.
to be continued . . .maybe
It's more about the record of how the file was opened more than the file itself. A great description, given that two distinct ofdtion (open file descriptions) might point to the same inode.
– 炸鱼薯条德里克
Feb 16 at 5:48
@炸鱼薯条德里克 You might want to give a +1 to Stephane's linked comment, as that's his original quote
– Sergiy Kolodyazhnyy
Feb 16 at 6:26
TL;DR
- a file is an object on which you can perform some or all of basic operations - open,read,write,close - and has metadata stored in an inode.
- file descriptors are references to those objects
- open file description (yes, open part is important) is how file (represented by at least one file descriptor) is open
File As Abstraction
Let's consult POSIX definitions 2017, section 3.164 as to how File is defined:
An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.
So a file is anything we can read from, write to, or both, which also has metadata. Everyone - go home, case closed !
Well, not so fast. Such definition opens up whole lot of room for related concepts, and there's obviously differences between regular files and say pipes. "Everything is a file" is itself a concept and design pattern rather than a literal statement. Based on that pattern such filetypes as directories, pipes, device files, in-memory files, sockets - all of that can be manipulated via set of system calls such as open()
, openat()
, write()
, and in case of sockets recv()
and send()
, in a consistent manner; take for example USB as analogy - you have so many different devices but they all connect to exactly the same USB port (nevermind there's actually multiple types of USB port types from A to C, but you get the idea).
Of course, there has to be a certain interface or reference to actual data in a consistent manner for that to work, and that's File Descriptor:
A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The value of a newly-created file descriptor is from zero to OPEN_MAX-1.
As such, we can write()
to STDOUT via file descriptor 1 in the same fashion as we would write to a regular file /home/user/foobar.txt
. When you open()
a file, you get file descriptor and you can use same write()
function to write to that file. That's the whole point that original Unix creators tried to address - minimalist and consistent behavior. When you do command > /home/user/foobar.txt
the shell will make a copy of file descriptor that refers to foobar.txt
and pass it as command
's file descriptor 1 ( echo's STDOUT ), or to be more precise it will do dup2(3,1)
and then execve()
the command. But regardless of that, command
will still use the same write syscall into file descriptor 1 as if nothing happened.
Of course, in terms of what most users think is a file, they think of a regular file on disk filesystem. This is more consistent with Regular File definition, section 3.323:
A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.
By contrast, we have Sockets:
A file of a particular type that is used as a communications endpoint for process-to-process communication as described in the System Interfaces volume of POSIX.1-2017.
Regardless of the type, the actions we can take over different filetypes are exactly the same conceptually - open, read,write, close.
All Files Have Inodes
What you should have noticed in the file definition is that file has "certain attributes", which are stored in inodes. In fact on Linux specifically, we can refer to inode(7) manual first line:
Each file has an inode containing metadata about the file. An application can retrieve this metadata using stat(2) (or related calls)
Boom. Clear and direct. We're mostly familiar with inodes as bridge between blocks of data on disk and filenames stored in directories (because that's what directories are - lists of filenames and corresponding inodes). Even in virtual filesystems such as pipefs and sockfs in kernel, we can find inodes. Take for instance this code snippet:
static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen)
return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
dentry->d_inode->i_ino);
Open File Description
Now that you're thoroughly confused, Linux/Unix introduces something known as Open File Description, and to make explanation simple - it's another abstraction. In words of Stephane Chazelas,
It's more about the the record of how the file was opened more than the file itself.
And it's consistent with POSIX definition:
A record of how a process or group of processes is accessing a file. Each file descriptor refers to exactly one open file description, but an open file description can be referred to by more than one file descriptor. The file offset, file status, and file access modes are attributes of an open file description.
Now if we also look at Understanding the Linux Kernel book, the author states
Linux implements BSD sockets as files that belong to the sockfs special filesystem...More precisely, for every new BSD socket, the kernel creates a new inode in the sockfs special filesystem.
Remembering that sockets are also referenced by file descriptors and therefore there will be open file description in kernel related to sockets, we can conclude sockets are files alright.
to be continued . . .maybe
It's more about the the record of how the file was opened more than the file itself. A great description, regarding two distinct ofdtion might point to the same inode
– 炸鱼薯条德里克
Feb 16 at 5:48
@炸鱼薯条德里克 You might want to give a +1 to Stephane's linked comment, as that's his original quote
– Sergiy Kolodyazhnyy
Feb 16 at 6:26
add a comment |
TL;DR
- a file is an object on which you can perform some or all of basic operations - open,read,write,close - and has metadata stored in an inode.
- file descriptors are references to those objects
- open file description (yes, open part is important) is how file (represented by at least one file descriptor) is open
File As Abstraction
Let's consult POSIX definitions 2017, section 3.164 as to how File is defined:
An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.
So a file is anything we can read from, write to, or both, which also has metadata. Everyone - go home, case closed !
Well, not so fast. Such definition opens up whole lot of room for related concepts, and there's obviously differences between regular files and say pipes. "Everything is a file" is itself a concept and design pattern rather than a literal statement. Based on that pattern such filetypes as directories, pipes, device files, in-memory files, sockets - all of that can be manipulated via set of system calls such as open()
, openat()
, write()
, and in case of sockets recv()
and send()
, in a consistent manner; take for example USB as analogy - you have so many different devices but they all connect to exactly the same USB port (nevermind there's actually multiple types of USB port types from A to C, but you get the idea).
Of course, there has to be a certain interface or reference to actual data in a consistent manner for that to work, and that's File Descriptor:
A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The value of a newly-created file descriptor is from zero to OPEN_MAX-1.
As such, we can write()
to STDOUT via file descriptor 1 in the same fashion as we would write to a regular file /home/user/foobar.txt
. When you open()
a file, you get file descriptor and you can use same write()
function to write to that file. That's the whole point that original Unix creators tried to address - minimalist and consistent behavior. When you do command > /home/user/foobar.txt
the shell will make a copy of file descriptor that refers to foobar.txt
and pass it as command
's file descriptor 1 ( echo's STDOUT ), or to be more precise it will do dup2(3,1)
and then execve()
the command. But regardless of that, command
will still use the same write syscall into file descriptor 1 as if nothing happened.
Of course, in terms of what most users think is a file, they think of a regular file on disk filesystem. This is more consistent with Regular File definition, section 3.323:
A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.
By contrast, we have Sockets:
A file of a particular type that is used as a communications endpoint for process-to-process communication as described in the System Interfaces volume of POSIX.1-2017.
Regardless of the type, the actions we can take over different filetypes are exactly the same conceptually - open, read,write, close.
All Files Have Inodes
What you should have noticed in the file definition is that file has "certain attributes", which are stored in inodes. In fact on Linux specifically, we can refer to inode(7) manual first line:
Each file has an inode containing metadata about the file. An application can retrieve this metadata using stat(2) (or related calls)
Boom. Clear and direct. We're mostly familiar with inodes as bridge between blocks of data on disk and filenames stored in directories (because that's what directories are - lists of filenames and corresponding inodes). Even in virtual filesystems such as pipefs and sockfs in kernel, we can find inodes. Take for instance this code snippet:
static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen)
return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
dentry->d_inode->i_ino);
Open File Description
Now that you're thoroughly confused, Linux/Unix introduces something known as Open File Description, and to make explanation simple - it's another abstraction. In words of Stephane Chazelas,
It's more about the record of how the file was opened more than the file itself.
And it's consistent with the POSIX definition:
A record of how a process or group of processes is accessing a file. Each file descriptor refers to exactly one open file description, but an open file description can be referred to by more than one file descriptor. The file offset, file status, and file access modes are attributes of an open file description.
Now if we also look at the book Understanding the Linux Kernel, the authors state:
Linux implements BSD sockets as files that belong to the sockfs special filesystem...More precisely, for every new BSD socket, the kernel creates a new inode in the sockfs special filesystem.
Since sockets are also referenced by file descriptors, and there is therefore an open file description in the kernel for each socket, we can conclude that sockets are files alright.
to be continued . . .maybe
answered Feb 16 at 3:41
Sergiy Kolodyazhnyy
"It's more about the record of how the file was opened more than the file itself." A great description, given that two distinct open file descriptions might point to the same inode.
– 炸鱼薯条德里克
Feb 16 at 5:48
@炸鱼薯条德里克 You might want to give a +1 to Stephane's linked comment, as that's his original quote
– Sergiy Kolodyazhnyy
Feb 16 at 6:26
1) On most filesystems on Unix, a file, FIFO, directory, etc. is described by an inode. An inode has a number of fields, but the most interesting one in this case is the i_mode field. Besides the permissions, it contains the type of "file" that the inode describes:
Constant        Value   Description
-- file format --
EXT2_S_IFSOCK   0xC000  socket
EXT2_S_IFLNK    0xA000  symbolic link
EXT2_S_IFREG    0x8000  regular file
EXT2_S_IFBLK    0x6000  block device
EXT2_S_IFDIR    0x4000  directory
EXT2_S_IFCHR    0x2000  character device
EXT2_S_IFIFO    0x1000  fifo
2) That depends on how you see it. For every open file, whether it is a 'real' file or another construct like an unnamed pipe, you can get its inode information via a system call (fstat(2)). But that inode will no longer be available once the file handles are closed. (Section 2 edited to remove a factual incorrectness.)
edited Feb 16 at 9:43
answered Feb 15 at 18:56
Ljm Dullaart
How come STDIN is not a "real" file and doesn't have an inode? Does foo.txt stop being a real file as soon as I use it as the STDIN of a program, as in cmd < foo.txt?
– mosvy
Feb 15 at 19:46
I think this is wrong. You can stat the inode for an unnamed pipe using fstat(). The function call succeeds and, much as you'd expect, you get the stat information for an inode created by the user and of the type pipe. So even if a program's STDIN and STDOUT are FDs onto unnamed pipes, there is still an inode in Linux for them.
– Philip Couling
Feb 15 at 22:06
@mosvy STDIN is a file descriptor; it's a file-like object as far as the system is concerned. When you do < foo.txt there's a dup2() system call that duplicates the open file descriptor corresponding to foo.txt onto file descriptor 0 (STDIN). You will see that happening with the $ strace -f -e dup2,openat sh -c 'cat < /etc/passwd >/dev/null' command. A file, in the way people understand it as a file on disk, is different from a file-like object. And "everything is a file" is just a design pattern for using the same syscalls to handle different objects. It's not a literal statement.
– Sergiy Kolodyazhnyy
Feb 15 at 22:29
@PhilipCouling Inode on which filesystem? That's what you have to remember. There are in-kernel / in-memory filesystems. So even if an object has an inode, that doesn't mean fstat() returns information from an on-disk filesystem.
– Sergiy Kolodyazhnyy
Feb 15 at 22:30
@SergiyKolodyazhnyy If you read my answer, you'll see that this is a trap. Inodes are BOTH a concept in Linux and in the filesystem. An inode in Linux does not need to exist in a filesystem, at least not in one you've mounted.
– Philip Couling
Feb 15 at 22:33
I never defined "file". I also don't know what you mean by "something with an open file description".
– 炸鱼薯条德里克
Feb 15 at 16:18
FIFO and pipe are the same thing. The one you've missed is that a device can be a "block device" or a "character device".
– Philip Couling
Feb 15 at 16:24
Also see man7.org/linux/man-pages/man5/proc.5.html
– 炸鱼薯条德里克
Feb 15 at 16:26
On Linux a file is anything that is referenced by a file descriptor, and a file descriptor is anything that can be close(2)d. I'm sorry if you don't like the definition, but the fact that Linux uses anonymous inodes or such is an implementation detail, and the info fstat(2) returns from an anonymous inode is most of the time neither useful nor interesting.
– mosvy
Feb 15 at 18:51
An "anonymous inode" is an in-memory kernel structure that is used to implement file-like things such as those created by epoll(2), timerfd(2), signalfd(2), eventfd(2), bpf(2), inotify(2), etc. You can look at fs/anon_inodes.c in the kernel source.
– mosvy
Feb 15 at 19:20