Compare columns of genes in a file and output each gene with the number of columns it is present in (Linux)
I have 3 columns of genes in a file like this:
col1 col2 col3
CXCL9 CXCL9 CXCL9
MAP2K6 MAP2K6 MAP2K6
CXCL10 CXCL10 CXCL11
I want to match the 3 columns and see which gene is present in how many columns. The desired output has this format:
CXCL9 3
MAP2K6 3
CXCL10 2
CXCL11 1
Can somebody help me? It would save me a lot of time.
linux text-processing columns bioinformatics
edited Jan 13 at 17:40 by Jeff Schaller
asked Jan 13 at 16:11 by saamar rajput
Is this homework? See unix.meta.stackexchange.com/questions/344/…
– mattdm
Jan 13 at 16:27
Does the line "col1 col2 col3" really appear as the header line in your file?
– RomanPerekhrest
Jan 13 at 16:29
3 Answers
sed + sort + uniq solution:
sed 's/[[:space:]]\+/\n/g' file | sort | uniq -c
The output:
2 CXCL10
1 CXCL11
3 CXCL9
3 MAP2K6
answered Jan 13 at 16:22 by RomanPerekhrest
What about the col1, col2, col3 headings?
– Mukesh Sai Kumar
Jan 13 at 16:24
@MukeshSaiKumar, I suppose that they were just for demonstration purposes.
– RomanPerekhrest
Jan 13 at 16:25
. . . perhaps sed -n '1!s/[[:space:]]\+/\n/gp' ?
– steeldriver
Jan 13 at 16:26
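Pulling the comments above together, here is a minimal sketch that drops the first line and prints the counts in the "gene count" layout asked for in the question. It assumes that line 1 really is a header and that gene names contain no whitespace. The function name count_genes is made up for illustration and does not come from any of the answers.

```shell
#!/bin/sh
# count_genes FILE: hypothetical helper, not part of any answer in this thread.
# Assumes line 1 is a header and gene names contain no whitespace.
count_genes() {
    tail -n +2 "$1" |              # drop the assumed header line
        tr -s '[:space:]' '\n' |   # one gene per line, squeezing runs of whitespace
        sed '/^$/d' |              # discard an empty line left by leading whitespace
        sort | uniq -c |           # count identical names
        sort -k1,1nr -k2,2 |       # highest count first, ties ordered by name
        awk '{ print $2, $1 }'     # reorder "count gene" to "gene count"
}
```

Called as count_genes file on the sample from the question, this should list CXCL9 and MAP2K6 with 3, then CXCL10 with 2 and CXCL11 with 1. Using tail and tr instead of sed is only one possible choice; it avoids relying on sed extensions for \+ and \n.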
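If the gene names contain no spaces, and the column names follow the pattern you gave, then you may use the following script as a hint:
#!/bin/bash
# Word-split the file into individual names (assumes no whitespace or glob characters in them)
for i in $(cat genes.txt); do
    # Skip header words such as col1, col2, col3 and print everything else
    [[ $i == "col"* ]] || echo "$i"
done | sort | uniq -c   # count how often each name occurs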
answered Jan 13 at 16:23 by Mukesh Sai Kumar
Awk solution:
awk '{ for(i=1;i<=NF;i++) a[$i]++ } END{ for(i in a) print i, a[i] }' file
The output:
CXCL11 1
MAP2K6 3
CXCL9 3
CXCL10 2
answered Jan 13 at 16:24 by RomanPerekhrest
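Note that the approaches in this thread count how often a name occurs anywhere in the file. If a gene could appear more than once within the same column and only the number of distinct columns should be reported, one possible variation is sketched below. It assumes a header on line 1 and whitespace-separated fields. The function name columns_per_gene is illustrative only.

```shell
#!/bin/sh
# columns_per_gene FILE: hypothetical helper for illustration.
# Prints each gene with the number of distinct columns it occurs in.
columns_per_gene() {
    awk '
        NR > 1 {                        # skip the assumed header line
            for (i = 1; i <= NF; i++)
                if (!seen[$i, i]++)     # true only the first time this gene shows up in column i
                    cols[$i]++
        }
        END { for (g in cols) print g, cols[g] }
    ' "$1" | sort -k2,2nr -k1,1         # highest count first, ties ordered by name
}
```

For the sample in the question the two interpretations agree, because no gene repeats within a column. They differ only for inputs where, for example, the same gene is listed twice in col1.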