How many times does a certain DNA base sequence occur in a file?

The assignment is to write a bash script named countmatches that displays the number of times a certain sequence, such as aac, appears in a specific file. The script should expect at least two arguments: the first must be the pathname of a file containing a valid DNA string, which we are given, and the remaining arguments are strings containing only the bases a, c, g, and t in any order. For each valid argument string, the script searches the DNA string in the file and counts how many non-overlapping occurrences of that argument string it contains.



For example, if the string aaccgtttgtaaccggaac is in a file named dnafile, then the script should work as follows:



$ countmatches dnafile ttt



ttt 1



Here the input is "countmatches dnafile ttt" and the output is "ttt 1", showing that "ttt" appears once.



this is my script:



 #!/bin/bash

for /data/biocs/b/student.accounts/cs132/data/dna_textfiles

do

count=$grep -o '[acgt][acgt][acgt] /data/biocs/b/student.accounts/cs132/
dna_textfiles | wc -w

echo $/data/biocs/b/student.accounts/cs132/data/dna_textfiles $count

done


and this is the error I get



 [Osama.Chaudry07@cslab5 assignment3]$ ./countmatches /data/biocs/b/student.accounts/cs132/data/dna_textfiles aac

./countmatches: line 6: '/data/biocs/b/student.accounts/cs32/data/dna_textfiles': not a valid identifier









  • We're not a script-writing service, but people here will be happy to help you when you hit specific issues with a script you've written.
    – roaima
    2 hours ago










  • Oh. That screenshot is a script is it? Please don't post pictures of text. They're harder to read, impossible for people who need screenreaders, and not good for search engines.
    – roaima
    2 hours ago










  • Hello @Chaudry Osama. Would you please provide an example of the content of the file dna_textfiles and the expected output?
    – Goro
    2 hours ago










  • dna_textfiles is a file with nothing but a sequence of the letters a, c, g, and t. This is the file for which we have to write a script that shows how many times a certain sequence, such as aac, comes up.
    – Chaudry Osama
    2 hours ago










  • @Chaudry Osama so what you are looking for is code that identifies how many times a nucleotide sequence occurs in a DNA sequence? For example, if the DNA sequence is "accacgactacc", the code must report 2 occurrences of acc, correct?
    – Goro
    1 hour ago















scripting bioinformatics






edited 7 mins ago by TNT
asked 2 hours ago by Chaudry Osama

1 Answer
cat dna_textfile 
aaccgtttgtaaccggaac

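#!/bin/bash
dna_file=/autofs/cluster/atassigp/garbage/dna_textfiles
# Prompt in red, then reset the colour.
printf '\e[31mnucleotide sequence?: \e[0m'
# -e: read with readline; -n 3: stop after at most 3 characters.
read -e -n 3 userInput
# Keep asking until the input is non-empty.
while [[ -z "$userInput" ]]
do
    read -e -n 3 userInput
done

# grep -o prints every non-overlapping match on its own line; wc -l counts them.
count=$(grep -o -- "$userInput" "$dna_file" | wc -l)

echo "$userInput", $count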


output:



 ttt, 1
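
The script above reads the sequence interactively, whereas the assignment passes the file and the sequences as command-line arguments. The "not a valid identifier" error in the question comes from the for line: for expects a variable name, then in, then a list of words, so a pathname cannot stand where the variable name belongs. What follows is only a minimal sketch of an argument-driven variant, not the course's reference solution. It assumes a grep that supports -o (GNU and BSD grep do), and it assumes that invalid arguments should be skipped with a warning, which the question does not specify. The messages and the function wrapper are illustrative choices.

```shell
#!/bin/bash
# countmatches: print "<sequence> <count>" for each sequence argument.

countmatches() {
    if [ "$#" -lt 2 ]; then
        echo "usage: countmatches dnafile sequence [sequence ...]" >&2
        return 1
    fi

    file=$1
    shift    # the remaining arguments are the sequences to search for

    if [ ! -r "$file" ]; then
        echo "countmatches: cannot read '$file'" >&2
        return 1
    fi

    for seq in "$@"; do
        # Assumption: skip empty arguments and any argument containing
        # a character other than a, c, g, or t.
        case $seq in
            ''|*[!acgt]*)
                echo "countmatches: skipping invalid sequence '$seq'" >&2
                continue
                ;;
        esac
        # grep -o prints each non-overlapping match on its own line,
        # -F treats the sequence as a fixed string, wc -l counts the lines.
        count=$(grep -oF -- "$seq" "$file" | wc -l)
        # Arithmetic expansion strips any padding some wc versions add.
        echo "$seq $((count))"
    done
}

# Run only when arguments were supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    countmatches "$@"
fi
```

With the example file from the question, countmatches dnafile ttt aac prints "ttt 1" and "aac 3". Because grep -o resumes searching after the end of each match, the counts are of non-overlapping occurrences, as the assignment requires.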





answered 58 mins ago by Goro
