How to concatenate RNA-seq files generated in different lanes [closed]
























I have very large RNA-seq files generated in different lanes. A few of the file names are shown below.



MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_ZH_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz


I want to concatenate all the sequences generated in different lanes, separately for the forward and the reverse reads. For example, the first 10 lines are sequence files from the same animal and a specific tissue (MC9_FREN). I want to concatenate all the forward reads (XXXXX_R1_001.fastq.gz) that were generated in different lanes into a file named MC9_FREN_R1.fastq.gz, and all the reverse reads (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz.



cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz
cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz
cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.gz
cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.gz
cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.gz
cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.gz




















closed as unclear what you're asking by Kiwy, Jeff Schaller, Timothy Martin, roaima, Eliah Kagan Apr 10 at 18:23


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.










  • 3




    Yes? What is the question. You have now concatenated the files. Is that the type of "merging" that you want to do?
    – Kusalananda
    Apr 10 at 13:01










  • I have a very large number of files, and writing each file name manually would be time consuming. I wonder if you could write a command line using a regular expression.
    – desu
    Apr 10 at 13:05






  • 2




    What is the logic behind what files should be concatenated?
    – Kusalananda
    Apr 10 at 13:06











  • For example, the first 10 lines are sequence files from the same animal and a specific tissue (MC9_PREN). I want to merge all XXXXX_R1_001.fastq.gz files into a file named MC9_PREN_R1.fastq.gz and all XXXX_R2_001.fastq.gz files into MC9_PREN_R2.fastq.gz
    – desu
    Apr 10 at 13:12







  • 1




    I don't think you are using the word "merge" in the way that we, as computing people, would expect. Please update your question to provide a short worked example of what you are trying to achieve.
    – roaima
    Apr 10 at 13:27















asked Apr 10 at 12:57 by desu, edited Apr 12 at 15:51















2 Answers




















accepted










The following loop gives us the unique filename prefixes of the FastQ files in the current directory. It relies on the fact that there will always be four underscores (_) between the filename prefix that we want and the R1 or R2 later in the filename.



for name in *.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | uniq


The following is equivalent, but does not use a loop (rather than deleting the last bit of the filename, this keeps the first bit of the filename):



printf '%s\n' *.fastq.gz | sed 's/^\([^_]*_[^_]*\).*/\1/' | uniq


With the given list of files, either of the above returns



DR14_DCRP
MC9_FNEN
MC9_FREN
MC9_ZH
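Both extraction methods can be checked on a few names without any files on disk. This is only a sketch: the three names are copied from the question, and the variable names `by_expansion` and `by_sed` are made up for the illustration.

```shell
# Sample names copied from the question; no files are needed for this check.
set -- DR14_DCRP_479C_S50_L001_R1_001.fastq.gz \
       MC9_FNEN_638A_S19_L008_R2_001.fastq.gz \
       MC9_ZH_637A_S74_L003_R1_001.fastq.gz

# Method 1: delete the trailing fields with a parameter expansion.
by_expansion=$(for name in "$@"; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done)

# Method 2: keep only the first two underscore-separated fields with sed.
by_sed=$(printf '%s\n' "$@" | sed 's/^\([^_]*_[^_]*\).*/\1/')

printf '%s\n' "$by_expansion"
[ "$by_expansion" = "$by_sed" ] && echo "both methods agree"
```

For these three names, each method yields DR14_DCRP, MC9_FNEN and MC9_ZH.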


We then read these prefixes and create our concatenated files:



for name in *.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | uniq |
while read prefix; do
    cat "$prefix"*R1*.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"*R2*.fastq.gz >"${prefix}_R2.fastq.gz"
done


or, using the sed code from above,



printf '%s\n' *.fastq.gz | sed 's/^\([^_]*_[^_]*\).*/\1/' | uniq |
while read prefix; do
    cat "$prefix"*R1*.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"*R2*.fastq.gz >"${prefix}_R2.fastq.gz"
done


None of the code above uses bash-specific (or GNU-specific) features, so it should work in any POSIX shell.
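Before running this on real data, the approach can be tried in a scratch directory. The sketch below assumes `mktemp` and `gzip` are available, uses tiny made-up file contents in place of real reads, and narrows the globs to `_*_R1_*` and `_*_R2_*` (a variation, not part of the answer) so that an already merged file is never picked up as input.

```shell
# Work in a scratch directory so no real data is touched.
work=$(mktemp -d) && cd "$work" || exit 1

# Tiny stand-ins for per-lane files (made-up contents, not real reads).
for lane in L001 L003; do
    for read in R1 R2; do
        printf '%s %s\n' "$lane" "$read" |
            gzip -c > "MC9_ZH_637A_S74_${lane}_${read}_001.fastq.gz"
    done
done

# Prefix extraction and concatenation, as in the loop above.
for name in *.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | uniq |
while IFS= read -r prefix; do
    cat "$prefix"_*_R1_*.fastq.gz > "${prefix}_R1.fastq.gz"
    cat "$prefix"_*_R2_*.fastq.gz > "${prefix}_R2.fastq.gz"
done

# Concatenated gzip members decompress as a single stream.
merged=$(gzip -dc MC9_ZH_R1.fastq.gz)
printf '%s\n' "$merged"
```

The last step shows why plain `cat` is enough here: a file made of several gzip members back to back is itself valid gzip input, so nothing has to be decompressed and recompressed.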




UPDATE: I work with bioinformaticians, and a colleague of mine commented:




One should not just simply merge fastq files... In an ideal world, one should map each lane separately, adding an appropriate RG, and then merge the BAM files. Because lane-specific effects exist, etc. It can be more or less important, depending on the downstream application of course.




For questions about this, please refer to the Bioinformatics Stack Exchange site.
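If the lanes are mapped separately as suggested, the lane can be pulled out of each file name with the same kind of parameter expansion. This is a rough sketch only: the `@RG` line follows the SAM header layout, but which tags to fill in and how to pass the line to a given aligner are assumptions to check against that tool's documentation.

```shell
# One lane-level file name from the question, used as sample input.
name=MC9_ZH_637A_S74_L003_R1_001.fastq.gz

sample=${name%_*_*_*_R[12]*}   # sample prefix
rest=${name#"${sample}_"}      # drop the prefix and its underscore
lane=${rest#*_*_}              # drop the next two fields
lane=${lane%%_*}               # keep only the lane field

# Hypothetical read-group header line; the tag choices are assumptions.
rg=$(printf '@RG\tID:%s.%s\tSM:%s\tPU:%s' "$sample" "$lane" "$sample" "$lane")
printf '%s\n' "$rg"
```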
















































    Bash solution:



    for f in *.fastq.gz; do
        # Skip names that do not match, so BASH_REMATCH is never used while empty.
        [[ "$f" =~ ^([^_]+_[^_]+)_.*(_[^_]+)_[0-9]+\.fastq\.gz$ ]] || continue
        cat "$f" >> "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.fastq.gz"
    done




    • ^([^_]+_[^_]+)_.*(_[^_]+)_[0-9]+\.fastq\.gz$ is the crucial regex pattern: it captures the first two underscore-separated fields into the 1st capture group (e.g. MC9_PREN) and the R-named suffix into the 2nd capture group (e.g. _R1)
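To see what the two capture groups hold for one name, the match can be run on its own. The sketch assumes `bash` is installed (the `[[ =~ ]]` test and `BASH_REMATCH` are bash-only), and the variable names `re`, `f` and `result` are made up for the illustration.

```shell
# Run the match inside bash, since [[ =~ ]] and BASH_REMATCH are bash-only.
result=$(bash -c '
  re="^([^_]+_[^_]+)_.*(_[^_]+)_[0-9]+\.fastq\.gz$"
  f=MC9_ZH_637A_S74_L001_R1_001.fastq.gz
  if [[ $f =~ $re ]]; then
    # Group 1 is the sample prefix, group 2 the read suffix.
    printf "%s%s.fastq.gz\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  fi
')
printf '%s\n' "$result"
```

For this name the target file comes out as MC9_ZH_R1.fastq.gz.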
































Accepted answer: answered Apr 10 at 13:23 by Kusalananda, edited Apr 12 at 15:57






















Bash answer: answered Apr 10 at 13:18 by RomanPerekhrest











