Compare columns of genes in a file and output each gene with the number of columns it is present in (Linux)

I have 3 columns of genes in a file like this:



col1 col2 col3
CXCL9 CXCL9 CXCL9
MAP2K6 MAP2K6 MAP2K6
CXCL10 CXCL10 CXCL11


I want to compare the 3 columns and see how many columns each gene is present in. The desired output has this format:



CXCL9 3
MAP2K6 3
CXCL10 2
CXCL11 1


Can somebody help me? It would save me a lot of time.









edited Jan 13 at 17:40 by Jeff Schaller
asked Jan 13 at 16:11 by saamar rajput











  • Is this homework? See unix.meta.stackexchange.com/questions/344/…
    – mattdm
    Jan 13 at 16:27










  • Does the line col1 col2 col3 really appear as the header line in your file?
    – RomanPerekhrest
    Jan 13 at 16:29
















3 Answers
sed + sort + uniq solution (GNU sed syntax):

sed 's/[[:space:]]\+/\n/g' file | sort | uniq -c

The output:

2 CXCL10
1 CXCL11
3 CXCL9
3 MAP2K6
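
A variation on the pipeline above, offered as a sketch rather than as the answerer's code. It assumes the first line really is a header (the question leaves this open), skips it, and prints each gene before its count, as in the requested format. The function name gene_counts is made up for illustration, and tr -s '[:space:]' '\n' stands in for the GNU-specific sed expression.

```shell
#!/bin/sh
# gene_counts FILE  (hypothetical helper name, not from the answer)
# Assumes line 1 is a header and gene names contain no whitespace.
gene_counts() {
    tail -n +2 "$1" |                      # drop the assumed header line
        tr -s '[:space:]' '\n' |           # put one gene on each line
        LC_ALL=C sort | uniq -c |          # count identical names
        awk 'NF == 2 { print $2, $1 }' |   # print "gene count", skip empty tokens
        LC_ALL=C sort -k2,2nr -k1,1        # highest count first, then by name
}
```

Called as gene_counts file on the sample from the question, this should print the four lines given there as the desired output.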
















answered Jan 13 at 16:22 by RomanPerekhrest











      • What about the col1, col2, col3 headings?
        – Mukesh Sai Kumar
        Jan 13 at 16:24










      • @MukeshSaiKumar, I suppose that they were just for demonstration purposes
        – RomanPerekhrest
        Jan 13 at 16:25










      • . . . perhaps sed -n '1!s/[[:space:]]\+/\n/gp' ?
        – steeldriver
        Jan 13 at 16:26
















If the gene names contain no spaces, and the column names follow the pattern you gave, you can use the following script as a hint:





#!/bin/bash
# Split the file into words, drop header tokens (col1, col2, ...), count the rest.
for i in $(cat genes.txt); do
    [[ $i == col* ]] || echo "$i"
done | sort | uniq -c
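
The loop above filters header tokens by name, so it would also drop any gene whose name happens to start with col. A possible alternative, sketched here on the assumption that the header is always the first line, skips it by position instead. The name count_gene_tokens and its argument are illustrative and not part of the answer.

```shell
#!/bin/sh
# count_gene_tokens FILE  (hypothetical helper name, not from the answer)
# Assumes line 1 is a header; every later whitespace-separated word is a gene.
count_gene_tokens() {
    tail -n +2 "$1" | while read -r line || [ -n "$line" ]; do
        set -f                      # keep characters such as * literal while splitting
        for gene in $line; do
            printf '%s\n' "$gene"   # emit one gene per line
        done
        set +f
    done | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

The output is ordered by gene name, since no second sort on the count is applied.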
















answered Jan 13 at 16:23 by Mukesh Sai Kumar




















Awk solution:

awk '{ for(i=1;i<=NF;i++) a[$i]++ } END { for(i in a) print i, a[i] }' file

The output (awk does not guarantee the order of for(i in a)):

CXCL11 1
MAP2K6 3
CXCL9 3
CXCL10 2
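
The one-liner above counts every occurrence of a name. If the same gene can appear more than once within a single column, the number of occurrences and the number of columns differ. The sample data does not show whether that happens, so the following is only a sketch for that case. It assumes a header on line 1, and genes_by_column_count is an illustrative name.

```shell
#!/bin/sh
# genes_by_column_count FILE  (hypothetical helper name, not from the answer)
# Counts, for each gene, how many distinct columns contain it.
genes_by_column_count() {
    awk '
        NR > 1 {                        # skip the assumed header line
            for (i = 1; i <= NF; i++)
                if (!seen[$i, i]++)     # first time this gene shows up in column i
                    cols[$i]++
        }
        END { for (g in cols) print g, cols[g] }
    ' "$1" | LC_ALL=C sort -k2,2nr -k1,1   # fixed order: count descending, then name
}
```

On the sample from the question each gene occurs at most once per column, so this should give the same counts as the one-liner, listed in the order shown in the question.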
















answered Jan 13 at 16:24 by RomanPerekhrest






















                       
