Filter specific numbers from multiple files

I have multiple files (approx. 150) that look like this:

reconstructed_hap_4_Local_nt_haplo_freq_60.3 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCAATACATATATCACCGGTGGCAAAGCAGCTCAAACTGCCAGAGGCCTTGTTGGCTGGTTTAATCCGGGTCCCAAACAGAACCTGCAGCTGGTCAACACCAATGGCTCGTGGCA
reconstructed_hap_6_Local_nt_haplo_freq_37.2 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCGAAACATATGCCTCCGGTGGCAGTGCAGCTCGTAATACCTG-GGCCTTTCTAGCTTGTTTAGTTCGGGTCCCAAACAGAGCCTGCAGCTGGTCAACACCAATGGCTCGTGGCA
reconstructed_hap_1_Local_nt_haplo_freq_0.6 GGGCAACTGGGCCAAGGTCGCTATCATCATGGTTATGTTTTCAGGGGTCGATGCCAATACATATATCACCGGTGGCAAAGCAGCTCAAACTGCCAGAGGCCTTGTTTGGCTGTTTAATCCGGGTCCCAAACAGAACCTGCAGCTGGTCAACACCAATGGCTCGTGGCA

Each file has a different number of lines.

From each file, I would like to extract the number that follows "freq_" in the title at the start of each line.

In this example, I would like to extract: 60.3, 37.2, 0.6

The preferred output would be a CSV file with one row per file: the file name followed by its numbers.

Filename1 60.3 37.2 0.6 
Filename2 56.1 26.2 52.3 42.1
Filename3 2.5 1.2

Do you have any solutions?

edited Aug 21 at 11:53

asked Aug 21 at 11:35

k_a_r_o_l

Sorry, I've made a mistake. It should be one word. I edited it.
– k_a_r_o_l
Aug 21 at 11:54

again... what does Filename1 mean?
– msp9011
Aug 21 at 12:02

name of first file.
– k_a_r_o_l
Aug 21 at 12:07

What is a title line? How do we detect them?
– andcoz
Aug 21 at 12:11

Tags: awk, r

4 Answers

Try this,

cd /path/to/directory
for i in *
do
 # first awk keeps the title (field 1); second keeps its last "_"-separated piece
 VALUE=$(awk '{print $1}' "$i" | awk -F '_' '{print $NF}' | tr '\n' '\t')
 echo -e "$i\t$VALUE"
done

answered Aug 21 at 12:16

msp9011

With GNU Awk:

awk '
 BEGINFILE { i = 0 }
 {
  n = split($1, a, "_")   # split the title on "_"
  freqs[i++] = a[n]       # the last piece is the number after "freq_"
 }
 ENDFILE {
  printf "%s", FILENAME
  for (j = 0; j < i; j++) printf("\t%s", freqs[j])
  printf "\n"
  delete freqs
 }
' Filename*

Ex.

$ awk 'BEGINFILE{i=0} {n=split($1,a,"_"); freqs[i++] = a[n]} ENDFILE{printf FILENAME; for (j=0;j<i;j++) printf("\t%s", freqs[j]); printf "\n"; delete freqs}' Filename*
Filename1 60.3 37.2 0.6
Filename2 56.1 26.2 52.3
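
BEGINFILE and ENDFILE are gawk extensions, so this will not run under other awks such as mawk. A minimal portable sketch, assuming the same layout (one title per line, number after the last "_"), could start a new row whenever FNR resets to 1, and join with commas to get an actual CSV. The sample files and the scratch directory below are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf '%s\n' 'reconstructed_hap_4_Local_nt_haplo_freq_60.3 GGGC' \
              'reconstructed_hap_6_Local_nt_haplo_freq_37.2 GGGC' \
              'reconstructed_hap_1_Local_nt_haplo_freq_0.6 GGGC' > "$tmp/Filename1"
printf '%s\n' 'reconstructed_hap_2_Local_nt_haplo_freq_2.5 GGGC' \
              'reconstructed_hap_3_Local_nt_haplo_freq_1.2 GGGC' > "$tmp/Filename2"

# POSIX awk: begin a new output row whenever FNR resets to 1.
result=$(cd "$tmp" && awk '
    FNR == 1 {                      # first line of each input file
        if (NR > 1) printf "\n"     # finish the previous row
        printf "%s", FILENAME
    }
    {
        n = split($1, a, "_")       # last "_"-separated piece is the number
        printf ",%s", a[n]
    }
    END { if (NR > 0) printf "\n" }
' Filename*)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that an empty input file never triggers the FNR == 1 rule, so it gets no row at all in this sketch.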

edited Aug 21 at 12:33

answered Aug 21 at 12:28

steeldriver

Shell Script:

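for file_number in {1..150}
do
 # title = text before the first space; the number is its 8th "_"-separated field
 data=$( cat "file$file_number.txt" | cut -f1 -d' ' | cut -f8 -d'_' | tr '\n' '\t' )
 #echo $data
 file_name="file$file_number.txt"
 content="$file_name $data"
 #echo $content
 # $content is left unquoted: word splitting turns the tabs into single spaces
 echo $content >> result.csv
done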

result.csv file contains the expected result.

EDIT: The following code is better

#!/bin/bash
# the glob is expanded when $FILES is used unquoted in the for loop
FILES=/path/to/directory/*
for file in $FILES
do
 data=$( cat "$file" | cut -f1 -d' ' | cut -f8 -d'_' | tr '\n' '\t' )
 content="$file $data"
 echo $content >> result.csv
done

Explanation

FILES contains all the input files.
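The first cut keeps the title (everything before the first space); the second cut keeps its eighth "_"-separated field, which is the float number.
Using tr we replace the newlines with tabs, so the numbers from one file end up on a single line.
result.csv then contains your expected result.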
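
To see what each stage of the pipeline contributes, here is a small sketch run on two made-up lines (the sequences are shortened and hypothetical). It assumes, as in the sample from the question, that the number is always the eighth "_"-separated field of the title.

```shell
# Two shortened, hypothetical input lines in the format shown in the question.
sample='reconstructed_hap_4_Local_nt_haplo_freq_60.3 GGGCAACT
reconstructed_hap_6_Local_nt_haplo_freq_37.2 GGGCAACT'

# Stage 1: keep the text before the first space (the title).
titles=$(printf '%s\n' "$sample" | cut -f1 -d' ')
# Stage 2: keep the 8th "_"-separated field (the frequency).
freqs=$(printf '%s\n' "$titles" | cut -f8 -d'_')
# Stage 3: turn newlines into tabs so all values sit on one row.
row=$(printf '%s\n' "$freqs" | tr '\n' '\t')

printf '%s\n' "$titles"
printf '%s\n' "$freqs"
printf '[%s]\n' "$row"
```

The bracketed last line shows that the row ends with a tab, left over from the final newline. The unquoted echo in the scripts above drops it, because word splitting discards leading and trailing whitespace.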

edited Aug 21 at 13:31

answered Aug 21 at 13:10

Dipankar Nalui

With GNU awk (extended command):

awk -F '[ _]' '
 /^[^ ]*_[^ _]* / {
  # fields split on space or "_": the number is the one before the sequence
  a[FILENAME]=a[FILENAME] " " $(NF-1)
 }
 END {
  for(i in a)print i,a[i]
 }
 ' Filename*

May be executed as a one liner:

$ awk -F '[ _]' '/^[^ ]*_[^ _]* /{a[FILENAME]=a[FILENAME] " " $(NF-1)}END{for(i in a)print i,a[i]}' Filename*

Filename1 60.3 37.2 0.6
Filename2 56.1 26.2 52.3

answered Aug 21 at 22:01

Isaac
