Filter specific numbers from multiple files
I have multiple files (approx. 150) that look like this:
reconstructed_hap_4_Local_nt_haplo_freq_60.3 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCAATACATATATCACCGGTGGCAAAGCAGCTCAAACTGCCAGAGGCCTTGTTGGCTGGTTTAATCCGGGTCCCAAACAGAACCTGCAGCTGGTCAACACCAATGGCTCGTGGCA
reconstructed_hap_6_Local_nt_haplo_freq_37.2 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCGAAACATATGCCTCCGGTGGCAGTGCAGCTCGTAATACCTG-GGCCTTTCTAGCTTGTTTAGTTCGGGTCCCAAACAGAGCCTGCAGCTGGTCAACACCAATGGCTCGTGGCA
reconstructed_hap_1_Local_nt_haplo_freq_0.6 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCAATACATATATCACCGGTGGCAAAGCAGCTCAAACTGCCAGAGGCCTTGTTTGGCTGTTTAATCCGGGTCCCAAACAGAACCTGCAGCTGGTCAACACCAATGGCTCGTGGCA
Each file has a different number of lines.
From each file, I would like to extract the numbers that follow "freq_" in the title at the start of each line.
In this example, I would like to extract: 60.3, 37.2, 0.6
The preferred output would be a CSV file with one row per file, starting with the file name:
Filename1 60.3 37.2 0.6
Filename2 56.1 26.2 52.3 42.1
Filename3 2.5 1.2
Do you have any solutions?
awk r
edited Aug 21 at 11:53
asked Aug 21 at 11:35
k_a_r_o_l
Sorry, I've made a mistake. It should be one word. I edited it.
- k_a_r_o_l, Aug 21 at 11:54
Again... what does Filename1 mean?
- msp9011, Aug 21 at 12:02
The name of the first file.
- k_a_r_o_l, Aug 21 at 12:07
What is a title line? How do we detect them?
- andcoz, Aug 21 at 12:11
4 Answers
Try this:

    cd /path/to/directory
    for i in *
    do
        # First field of every line -> text after the last "_" -> values joined by tabs
        VALUE=$(awk '{print $1}' "$i" | awk -F '_' '{print $NF}' | tr '\n' '\t')
        printf '%s\t%s\n' "$i" "$VALUE"
    done

answered Aug 21 at 12:16
msp9011
With GNU Awk:

    awk '
      BEGINFILE { i = 0 }
      {
        n = split($1, a, "_")      # the frequency is the last "_"-separated piece of the title
        freqs[i++] = a[n]
      }
      ENDFILE {
        printf "%s", FILENAME
        for (j = 0; j < i; j++) printf("\t%s", freqs[j])
        printf "\n"
        delete freqs
      }
    ' Filename*

Ex.

    $ awk 'BEGINFILE{i=0} {n=split($1,a,"_"); freqs[i++] = a[n]} ENDFILE{printf "%s", FILENAME; for (j=0;j<i;j++) printf("\t%s", freqs[j]); printf "\n"; delete freqs}' Filename*
    Filename1 60.3 37.2 0.6
    Filename2 56.1 26.2 52.3

edited Aug 21 at 12:33
answered Aug 21 at 12:28
steeldriver
Shell Script:

    for file_number in {1..150}
    do
        # Title -> 8th "_"-separated field -> values joined by tabs
        data=$( cat "file$file_number.txt" | cut -f1 -d' ' | cut -f8 -d'_' | tr '\n' '\t' )
        #echo $data
        file_name="file$file_number.txt"
        content="$file_name $data"
        #echo $content
        # $content is left unquoted so that word splitting turns the tabs into single spaces
        echo $content >> result.csv
    done

The result.csv file contains the expected result.

EDIT: The following code is better:

    #!/bin/bash
    FILES=/path/to/directory/*
    for file in $FILES
    do
        data=$( cat "$file" | cut -f1 -d' ' | cut -f8 -d'_' | tr '\n' '\t' )
        content="$file $data"
        echo $content >> result.csv
    done

Explanation

FILES contains all the input files.
Using the cut command, we get the field that contains the floating-point number.
Using tr, we replace the newlines with tabs.
The result.csv file contains your expected result.

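To see what each stage of that pipeline contributes, here is a minimal sketch that runs it on two title lines. The function name extract_freq and the shortened sequences are assumptions made for illustration only. Like the answer, it assumes the title and the sequence are separated by a single space and that the number is the eighth underscore-separated field of the title.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): read lines on stdin and
# print the number after "freq_" from each line, each followed by a tab.
extract_freq() {
    cut -f1 -d' ' |   # keep the title, i.e. the text before the first space
    cut -f8 -d'_' |   # keep the 8th "_"-separated field, e.g. 60.3
    tr '\n' '\t'      # turn the line breaks between values into tabs
}

# Illustrative input: real titles from the question, sequences shortened.
printf '%s\n' \
    'reconstructed_hap_4_Local_nt_haplo_freq_60.3 GGGCAACTGG' \
    'reconstructed_hap_6_Local_nt_haplo_freq_37.2 GGGCAACTGG' |
    extract_freq
echo   # finish the row with a newline
```

Note that the field number 8 is tied to this exact title layout; if a title had more or fewer underscores, taking the last field (as the awk answers do) would be more robust.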
edited Aug 21 at 13:31
answered Aug 21 at 13:10
Dipankar Nalui
With GNU awk (extended command):

    awk -F '[ _]' '
      /^[^ ]*_[^ _]* / {
        # Fields are split on spaces and underscores, so the frequency is the next-to-last field
        a[FILENAME] = a[FILENAME] " " $(NF-1)
      }
      END {
        for (i in a) print i, a[i]
      }
    ' Filename*

May be executed as a one-liner:

    $ awk -F '[ _]' '/^[^ ]*_[^ _]* /{a[FILENAME]=a[FILENAME] " " $(NF-1)}END{for(i in a)print i,a[i]}' Filename*
    Filename1 60.3 37.2 0.6
    Filename2 56.1 26.2 52.3

answered Aug 21 at 22:01
Isaac
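
Since the question asks for a CSV file and the answers above print space- or tab-separated rows, a comma-separated variant may be useful. The following is only a sketch under stated assumptions: the number is taken to be the text after the last underscore of the first whitespace-separated field, and the function name freq_csv and the output name result.csv in the usage line are illustrative. It avoids BEGINFILE/ENDFILE, so it should also run in awk implementations other than GNU awk, and it writes rows in command-line order.

```shell
#!/bin/sh
# Hypothetical helper: print one CSV row per input file: name,value1,value2,...
freq_csv() {
    awk '
        FNR == 1 {                        # first line of a file: start a new row
            if (NR > 1) printf "\n"
            printf "%s", FILENAME
        }
        NF {                              # skip blank lines
            n = split($1, parts, "_")     # the number is the last "_"-separated piece
            printf ",%s", parts[n]
        }
        END { if (NR > 0) printf "\n" }   # terminate the last row
    ' "$@"
}
```

Usage, assuming the files match Filename* as in the answers: freq_csv Filename* > result.csv. Empty input files produce no row in this sketch.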