Recursively count number of files in folders in tar file
I'm extending a previous question about counting the number of files in a tar file (link) with a new question: how do I count the files under each subfolder in a tar file? What I would like to have at the end is:
- a list of the folders that contain files
- a count of the files within each of those folders
My example listing from tar -tvf myfile.tar looks like the one below (the real tar file has more files and directories). There are 2 folders in total: folder_files_1 contains 3 files and folder_files_2 contains 4 files.
drwxrwxrwx someuser/users 0 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/i716266.MRDC.270
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/i716267.MRDC.266
-rwxr-xr-x someuser/users 538944 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/i716268.MRDC.287
drwxrwxrwx someuser/users 0 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/
-rwxr-xr-x someuser/users 538696 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/i717157.MRDC.8
-rwxr-xr-x someuser/users 538694 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/i717158.MRDC.4
-rwxr-xr-x someuser/users 538692 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/i717159.MRDC.34
-rwxr-xr-x someuser/users 538696 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/i717160.MRDC.5
The closest solution I found pointed me to using awk after tar (see references here and here).
tar tvf myfile.tar | awk '/^d/ {print $0; /$6/; getline; file_no++} END {print file_no}'
The /$6/ is meant to match the corresponding folder, ./root_folder/subfolder/folder_files_1/. But it still does not accurately count the files under each matching directory, i.e. folder_files_1 and folder_files_2.
Any suggestions on how to fix my code?
linux awk tar
edited Mar 7 at 14:30
asked Mar 6 at 18:55
SeanM
The same solution in your other question should work: tar tvf myfile.tar | wc -l
– Nasir Riley, Mar 6 at 19:24
@NasirRiley No, it won't. That will count everything in the tar file; now he's asking for only certain paths.
– smokes2345, Mar 6 at 19:26
The way that he's worded it is somewhat confusing. Perhaps it can be certain that he wants to find only files, but I don't see where it says that he's looking for certain paths. The answer right below this will give him what he wants if it's only files, but if he only wants certain paths then it's going to get really hairy and convoluted.
– Nasir Riley, Mar 6 at 23:13
4 Answers
If you don't mind running it twice (once to get the count, then again for the lines), you can use grep.
For the count:
tar tvf myfile.tar | grep <path> | wc -l
For the lines, just remove the | wc -l.
If you'd prefer to run tar only once, you can save the output to a file, then cat it to grep and wc. The script all together would look something like this:
tmp_file=$(mktemp)
tar tvf myfile.tar > "$tmp_file"
cat "$tmp_file" | grep <subdir> | wc -l
cat "$tmp_file" | grep <subdir>
rm "$tmp_file"
If you want a one-liner, there's probably a hack you can do with process substitution and redirection, but if you're running this with any cadence you'll probably end up putting it in a script/alias/function anyway, so this is a little easier to read and understand.
If you have multiple paths in the tar file that you'd like to grep out, you can put them all in a text file and use grep -f <paths file>.
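As a rough sketch of the multiple-path idea, assuming the listing has already been saved to a file once: the helper name count_matches and the shortened sample listing are made up for illustration. It uses grep -c, which counts matching lines directly, so the cat and wc -l steps are not strictly needed.

```shell
#!/bin/sh
# Sketch only: count_matches is a hypothetical helper, and the listing below
# is a shortened stand-in for the output of `tar tvf myfile.tar`.

# Print "<count> <pattern>" for each pattern given after the listing file.
count_matches() {
    listing=$1
    shift
    for pattern in "$@"; do
        # -F: treat the pattern as a literal string
        # -c: print the number of matching lines instead of the lines
        printf '%s %s\n' "$(grep -c -F -- "$pattern" "$listing")" "$pattern"
    done
}

tmp_file=$(mktemp)
cat > "$tmp_file" <<'EOF'
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/i716266.MRDC.270
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./root_folder/subfolder/folder_files_1/i716267.MRDC.266
-rwxr-xr-x someuser/users 538696 2017-08-07 11:50 ./root_folder/subfolder/folder_files_2/i717157.MRDC.8
EOF

count_matches "$tmp_file" folder_files_1/ folder_files_2/
rm -f "$tmp_file"
```

Note that a directory's own entry in a full listing also contains its path, so it would be counted too; filtering with grep '^-' first avoids that if only regular files should count.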
Thanks for your answer. However, if I have more folders and files in my .tar file, I will have to point to each of them with grep <path>, which is not an ideal solution.
– SeanM, Mar 6 at 21:00
To get a count for each path, that is true, but if you use the script I wrote up, your overhead is minimal and each grep is relatively cheap. You can use multiple patterns in grep. I updated the answer to reflect this, but you can also specify multiple patterns on the command line with -e.
– smokes2345, Mar 7 at 18:46
Another option:
tar tf archive.tar |
    awk '
        { if (gsub("[^/]+$", "")) h[$0]++ }
        END { for (f in h) printf "%d\t%s\n", h[f], f }
    '
The first awk statement strips the filename from each path and counts the instances of the resulting directory paths. The second runs when the input has been fully consumed (i.e. at the end of stdin) and prints the list of paths and their respective counts.
The whole thing can be written on a single line if you prefer (just literally concatenate the whole lot). I've split it here for readability.
Result from running against your tarball:
4 ./root_folder/subfolder/folder_files_2/
3 ./root_folder/subfolder/folder_files_1/
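For experimenting without an archive at hand, here is a minimal sketch of the same idea wrapped in a shell function. The name count_per_dir is hypothetical, the here-document stands in for the output of tar tf, and the sort step is an addition, since awk's for (f in h) loop returns keys in no particular order.

```shell
#!/bin/sh
# Sketch only: count_per_dir is a hypothetical name; the here-document stands
# in for the output of `tar tf archive.tar`.

# Read one path per line, strip the trailing filename, count what is left.
count_per_dir() {
    awk '
        # gsub returns the number of substitutions made: 0 for directory
        # entries (they end in "/"), 1 when a filename was stripped.
        { if (gsub("[^/]+$", "")) h[$0]++ }
        END { for (f in h) printf "%d\t%s\n", h[f], f }
    ' | LC_ALL=C sort -k2    # sort by path for a stable order
}

count_per_dir <<'EOF'
./root_folder/subfolder/folder_files_1/
./root_folder/subfolder/folder_files_1/i716266.MRDC.270
./root_folder/subfolder/folder_files_1/i716267.MRDC.266
./root_folder/subfolder/folder_files_1/i716268.MRDC.287
./root_folder/subfolder/folder_files_2/
./root_folder/subfolder/folder_files_2/i717157.MRDC.8
./root_folder/subfolder/folder_files_2/i717158.MRDC.4
./root_folder/subfolder/folder_files_2/i717159.MRDC.34
./root_folder/subfolder/folder_files_2/i717160.MRDC.5
EOF
```

With a real archive the call would be tar tf archive.tar | count_per_dir. Because the directory is taken from each path itself, the order of entries in the archive does not matter.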
answered Mar 6 at 23:44


roaima
tar -tvf file.tar | grep '^-' | wc -l
This will count the number of lines in the tar output that start with - (i.e. files). Change ^- to ^[^d] to count "anything but directories" if you have special types of files in your archive.
Another way, with awk:
tar -tvf file.tar | awk '/^-/ { n++ } END { print n }'
Both of these commands output 7, the total number of files in the archive.
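The difference between the ^- and ^[^d] patterns only shows up when the archive holds something other than regular files and directories. A small sketch, using an invented listing that includes one symbolic link (real input would come from tar -tvf file.tar):

```shell
#!/bin/sh
# Sketch only: the listing is invented so that it contains a symbolic link.
listing='drwxr-xr-x someuser/users 0 2017-08-07 11:43 ./dir/
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./dir/a
-rwxr-xr-x someuser/users 538962 2017-08-07 11:43 ./dir/b
lrwxrwxrwx someuser/users 0 2017-08-07 11:43 ./dir/c -> a'

# Regular files only: lines whose type character is "-"
regular=$(printf '%s\n' "$listing" | grep -c '^-')

# Anything that is not a directory: regular files, links, devices, pipes
non_dirs=$(printf '%s\n' "$listing" | grep -c '^[^d]')

echo "regular files: $regular"       # regular files: 2
echo "non-directories: $non_dirs"    # non-directories: 3
```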
If you want separate counts for each subfolder:
tar -tvf file.tar | awk '/^d/ { d = $NF; next } { n[d]++ } END { for (d in n) print n[d], d }'
This generates
4 ./root_folder/subfolder/folder_files_2/
3 ./root_folder/subfolder/folder_files_1/
for the data that you have provided.
The awk code in this last example picks out the directory name from any line that starts with d and uses it as a key in an associative array. The array entry is incremented for each file found. At the end, all entries and their counts are printed.
edited Mar 7 at 6:29
answered Mar 6 at 19:44


Kusalananda
Depending on whether pipes and device files count as "files", you might use something like grep '^[^d]' to specifically omit directories.
– Jeff Schaller, Mar 6 at 20:12
Works for the data given, but $NF doesn't work if (path)names contain whitespace, and that logic is wrong if the tar contains e.g. /dir/file1,subdir/[abc],file2
– dave_thompson_085, Mar 7 at 7:20
@dave_thompson_085 I understand your note about white spaces, but I don't fully understand the comment about the logic. Are you concerned about sub-subfolders or subfolders occurring mixed in with files (I could understand that)?
– Kusalananda, Mar 7 at 7:29
I think that's what @dave is talking about: files from the parent directory listed after a subdirectory and its files, in which case d should be reset or extracted from the filename.
– muru, Mar 7 at 9:00
@muru Ah. Yes. Well, in this case this is a simple solution for simple archives...
– Kusalananda, Mar 7 at 9:06
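One possible way to act on the two concerns raised in these comments is to take the directory from each entry's own path instead of remembering the last d line. This is only a sketch: the name count_by_parent_dir is hypothetical, and it assumes the five-column layout (mode, owner/group, size, date, time) of the tar -tvf listing shown in the question. Names containing newlines, and symbolic links shown as "name -> target", are not handled.

```shell
#!/bin/sh
# Sketch only: count_by_parent_dir is a hypothetical name, and the column
# layout is assumed to match the `tar -tvf` listing shown in the question.

count_by_parent_dir() {
    awk '
        /^d/ { next }    # skip directory entries
        {
            name = $0
            # drop the five leading columns: mode, owner/group, size, date, time
            for (i = 1; i <= 5; i++) sub(/^[^ ]+ +/, "", name)
            # strip the filename, keep the directory (spaces are preserved)
            sub("[^/]*$", "", name)
            n[name]++
        }
        END { for (d in n) print n[d], d }
    ' | LC_ALL=C sort -k2
}

# file2 belongs to ./top/ but is listed after the subdirectory and its file
count_by_parent_dir <<'EOF'
drwxr-xr-x someuser/users 0 2017-08-07 11:43 ./top/
-rw-r--r-- someuser/users 1 2017-08-07 11:43 ./top/file1
drwxr-xr-x someuser/users 0 2017-08-07 11:43 ./top/sub dir/
-rw-r--r-- someuser/users 1 2017-08-07 11:43 ./top/sub dir/a b
-rw-r--r-- someuser/users 1 2017-08-07 11:43 ./top/file2
EOF
```

On this sample it reports 2 for ./top/ and 1 for ./top/sub dir/, whereas tracking the most recent d line would attribute file2 to the subdirectory.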
If you have GNU tar, it has a --to-command option:
--to-command=COMMAND
Pipe extracted files to COMMAND. The argument is the pathname
of an external program, optionally with command line
arguments. The program will be invoked and the contents of
the file being extracted supplied to it on its standard
output. Additional data will be supplied via the following
environment variables:
TAR_FILETYPE
Type of the file. It is a single letter with the
following meaning:
f Regular file
d Directory
l Symbolic link
h Hard link
b Block device
c Character device
Currently only regular files are supported.
...
TAR_FILENAME
The name of the file.
These variables can be used to safely handle filenames with spaces, etc.
For example, using shell parameter expansion to remove the filename from the path given, then using sed to print only the paths for non-directories, you can then sort and apply uniq -c to get the count:
tar xf foo.tar --to-command 'echo "$TAR_FILETYPE" "${TAR_FILENAME%/*}"' |
  sed -n '/^[^d]/s/^. //p' |
  sort |
  uniq -c
If you have GNU sed, sort and uniq, you can use their -z options and printf "%s %s\0" instead of echo to safely handle all filenames.
Example:
% tar xf dev/pacaur/byobu/byobu_5.124.orig.tar.gz --to-command 'printf "%s %s\0" "$TAR_FILETYPE" "${TAR_FILENAME%/*}"' | sed -zn '/^[^d]/s/^. //p' | sort -z | uniq -zc | tr '\0' '\n'
15 byobu-5.124
2 byobu-5.124/Applications/Byobu.app/Contents
1 byobu-5.124/Applications/Byobu.app/Contents/MacOS
8 byobu-5.124/Applications/Byobu.app/Contents/Resources
4 byobu-5.124/etc/byobu
3 byobu-5.124/etc/profile.d
1 byobu-5.124/experimental
23 byobu-5.124/po
1 byobu-5.124/snap
38 byobu-5.124/usr/bin
43 byobu-5.124/usr/lib/byobu
18 byobu-5.124/usr/lib/byobu/include
1 byobu-5.124/usr/share/appdata
4 byobu-5.124/usr/share/byobu/desktop
12 byobu-5.124/usr/share/byobu/keybindings
4 byobu-5.124/usr/share/byobu/pixmaps
1 byobu-5.124/usr/share/byobu/pixmaps/highcontrast
11 byobu-5.124/usr/share/byobu/profiles
4 byobu-5.124/usr/share/byobu/status
3 byobu-5.124/usr/share/byobu/tests
3 byobu-5.124/usr/share/byobu/windows
3 byobu-5.124/usr/share/dbus-1/services
4 byobu-5.124/usr/share/doc/byobu
37 byobu-5.124/usr/share/man/man1
1 byobu-5.124/usr/share/sounds/byobu
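The ${TAR_FILENAME%/*} expansion is what turns each member name into its directory. A small sketch of that one step in isolation: dir_of is a hypothetical helper and the paths are invented, whereas inside --to-command tar itself would set TAR_FILENAME for each member.

```shell
#!/bin/sh
# Sketch only: dir_of is a hypothetical helper and the paths are invented.

# Print the directory part of a path by removing the shortest suffix that
# matches "/*", i.e. the final slash and everything after it.
dir_of() {
    printf '%s\n' "${1%/*}"
}

dir_of "pkg/usr/bin/tool"        # pkg/usr/bin
dir_of "pkg/etc/conf/my file"    # pkg/etc/conf
dir_of "README"                  # README (no slash, so nothing is removed)
```

The last call suggests that a member stored at the top level of an archive, with no slash in its name, would be counted under its own name rather than under a directory.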
If you have GNU tar, it has a --to-command
option:
--to-command=COMMAND
Pipe extracted files to COMMAND. The argument is the pathname
of an external program, optionally with command line
arguments. The program will be invoked and the contents of
the file being extracted supplied to it on its standard
output. Additional data will be supplied via the following
environment variables:
TAR_FILETYPE
Type of the file. It is a single letter with the
following meaning:
f Regular file
d Directory
l Symbolic link
h Hard link
b Block device
c Character device
Currently only regular files are supported.
...
TAR_FILENAME
The name of the file.
These variables can be used to safely handle filenames with spaces, etc.
For example, using shell string substitution to remove the filename from the path given, then using sed to print only the paths for non-directories, you can then sort and apply uniq -c
to get the count:
tar xf foo.tar --to-command 'echo "$TAR_FILETYPE" "$TAR_FILENAME%/*"' |
sed -n '/^[^d]/s/^. //p' |
sort |
uniq -c
If you have GNU sed, sort and uniq, you can use their -z
options and printf "%s %s"
instead of echo
to safely handle all filenames.
Example:
% tar xf dev/pacaur/byobu/byobu_5.124.orig.tar.gz --to-command 'printf "%s %s" "$TAR_FILETYPE" "$TAR_FILENAME%/*"' | sed -zn '/^[^d]/s/^. //p' | sort -z | uniq -zc | tr '' 'n'
15 byobu-5.124
2 byobu-5.124/Applications/Byobu.app/Contents
1 byobu-5.124/Applications/Byobu.app/Contents/MacOS
8 byobu-5.124/Applications/Byobu.app/Contents/Resources
4 byobu-5.124/etc/byobu
3 byobu-5.124/etc/profile.d
1 byobu-5.124/experimental
23 byobu-5.124/po
1 byobu-5.124/snap
38 byobu-5.124/usr/bin
43 byobu-5.124/usr/lib/byobu
18 byobu-5.124/usr/lib/byobu/include
1 byobu-5.124/usr/share/appdata
4 byobu-5.124/usr/share/byobu/desktop
12 byobu-5.124/usr/share/byobu/keybindings
4 byobu-5.124/usr/share/byobu/pixmaps
1 byobu-5.124/usr/share/byobu/pixmaps/highcontrast
11 byobu-5.124/usr/share/byobu/profiles
4 byobu-5.124/usr/share/byobu/status
3 byobu-5.124/usr/share/byobu/tests
3 byobu-5.124/usr/share/byobu/windows
3 byobu-5.124/usr/share/dbus-1/services
4 byobu-5.124/usr/share/doc/byobu
37 byobu-5.124/usr/share/man/man1
1 byobu-5.124/usr/share/sounds/byobu
If you have GNU tar, it has a --to-command
option:
--to-command=COMMAND
Pipe extracted files to COMMAND. The argument is the pathname
of an external program, optionally with command line
arguments. The program will be invoked and the contents of
the file being extracted supplied to it on its standard
output. Additional data will be supplied via the following
environment variables:
TAR_FILETYPE
Type of the file. It is a single letter with the
following meaning:
f Regular file
d Directory
l Symbolic link
h Hard link
b Block device
c Character device
Currently only regular files are supported.
...
TAR_FILENAME
The name of the file.
These variables can be used to safely handle filenames with spaces, etc.
For example, using shell string substitution to remove the filename from the path given, then using sed to print only the paths for non-directories, you can then sort and apply uniq -c
to get the count:
tar xf foo.tar --to-command 'echo "$TAR_FILETYPE" "$TAR_FILENAME%/*"' |
sed -n '/^[^d]/s/^. //p' |
sort |
uniq -c
If you have GNU sed, sort and uniq, you can use their -z
options and printf "%s %s"
instead of echo
to safely handle all filenames.
Example:
% tar xf dev/pacaur/byobu/byobu_5.124.orig.tar.gz --to-command 'printf "%s %s" "$TAR_FILETYPE" "$TAR_FILENAME%/*"' | sed -zn '/^[^d]/s/^. //p' | sort -z | uniq -zc | tr '' 'n'
15 byobu-5.124
2 byobu-5.124/Applications/Byobu.app/Contents
1 byobu-5.124/Applications/Byobu.app/Contents/MacOS
8 byobu-5.124/Applications/Byobu.app/Contents/Resources
4 byobu-5.124/etc/byobu
3 byobu-5.124/etc/profile.d
1 byobu-5.124/experimental
23 byobu-5.124/po
1 byobu-5.124/snap
38 byobu-5.124/usr/bin
43 byobu-5.124/usr/lib/byobu
18 byobu-5.124/usr/lib/byobu/include
1 byobu-5.124/usr/share/appdata
4 byobu-5.124/usr/share/byobu/desktop
12 byobu-5.124/usr/share/byobu/keybindings
4 byobu-5.124/usr/share/byobu/pixmaps
1 byobu-5.124/usr/share/byobu/pixmaps/highcontrast
11 byobu-5.124/usr/share/byobu/profiles
4 byobu-5.124/usr/share/byobu/status
3 byobu-5.124/usr/share/byobu/tests
3 byobu-5.124/usr/share/byobu/windows
3 byobu-5.124/usr/share/dbus-1/services
4 byobu-5.124/usr/share/doc/byobu
37 byobu-5.124/usr/share/man/man1
1 byobu-5.124/usr/share/sounds/byobu
answered Mar 7 at 6:54
muru
If you don't mind running tar twice (once for the count, once for the lines), you can use grep.
For the count:
tar tvf myfile.tar | grep <path> | wc -l
For the lines, just remove the | wc -l.
If you'd prefer to run tar only once, you can save its output to a file and then run grep and wc on that file. The whole script would look something like this:
tmp_file=$(mktemp)
tar tvf myfile.tar > "$tmp_file"
grep <subdir> "$tmp_file" | wc -l
grep <subdir> "$tmp_file"
rm "$tmp_file"
If you want a one-liner, there's probably a hack you can do with process substitution and redirection. But if you're running this regularly, you'll probably end up putting it in a script, alias or function anyway, and this form is a little easier to read and understand.
If you have multiple paths in the tar file that you'd like to grep out, you can put them all in a text file and use grep -f <paths file>.
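As a rough sketch of the save-once idea applied to several paths, the loop below saves the listing to a file and counts the files under each directory pattern. The archive, file and directory names are invented for the demo, and it uses tar -tf rather than tvf so that each line is just the member name. Note that the match is a plain substring match, so a pattern such as folder_files_1/ would also match a directory named xfolder_files_1/.

```shell
# Build a small throwaway archive (hypothetical names) so the sketch is self-contained.
workdir=$(mktemp -d)
mkdir -p "$workdir/root_folder/subfolder/folder_files_1" \
         "$workdir/root_folder/subfolder/folder_files_2"
for i in 1 2 3;   do touch "$workdir/root_folder/subfolder/folder_files_1/file_$i"; done
for i in 1 2 3 4; do touch "$workdir/root_folder/subfolder/folder_files_2/file_$i"; done
tar -cf "$workdir/sample.tar" -C "$workdir" root_folder

# Run tar once and keep the listing.
listing="$workdir/listing.txt"
tar -tf "$workdir/sample.tar" > "$listing"

# For each directory pattern, select matching members (fixed-string match),
# then count only the lines that are not directory entries (no trailing "/").
report=$(
  for dir in folder_files_1/ folder_files_2/; do
    count=$(grep -F -- "$dir" "$listing" | grep -cv '/$')
    printf '%s %s\n' "$count" "$dir"
  done
)
printf '%s\n' "$report"

rm -rf "$workdir"
```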
Thanks for your answer. However, if I have more folders and files in my .tar file, I will have to point to each of them with grep <path>, which is not the ideal solution.
– SeanM
Mar 6 at 21:00
That is true if you want a count for each path, but if you use the script I wrote, your overhead is minimal and each grep is relatively cheap. You can use multiple patterns in grep. I updated the answer to reflect this, but you can also specify multiple patterns on the command line with '-e'.
– smokes2345
Mar 7 at 18:46
edited Mar 7 at 18:44
answered Mar 6 at 19:28
smokes2345
The same solution as in your other question should work:
tar tvf myfile.tar | wc -l
– Nasir Riley
Mar 6 at 19:24
@NasirRiley No, it won't. That will count everything in the tar file; now he's asking for only certain paths.
– smokes2345
Mar 6 at 19:26
The way he's worded it is somewhat confusing. It may be clear that he wants to count only files, but I don't see where it says that he's looking for certain paths. The answer right below this will give him what he wants if it's only files, but if he only wants certain paths then it's going to get really hairy and convoluted.
– Nasir Riley
Mar 6 at 23:13