Combine text files by title using grep awk sed
























I'm trying to combine multiple files into one final file. Each file has many entries, most with overlapping titles. I would like to merge the content of the files under the shared title headings.
Think of combining two dictionaries and it makes more sense. Entries for a single word can be found in both, but the definitions differ slightly in each. Some entries exist in one and not the other, etc.



For example, I'd like to merge these two files to produce a single output file:



File 1



Entry 1
Green Trees
Entry 3
Orange Fibers


File 2



Entry 1
Red Trees
Entry 2
Spotted Zebras
Entry 3
Blue Fibers


Output File



Entry 1
Green Trees
Red Trees
Entry 2
Spotted Zebras
Entry 3
Orange Fibers
Blue Fibers


Note that Entry 2 did not exist in File 1, but made it to the final product. Likewise, the content of each entry was merged anywhere the entry ID matches.



How can I accomplish this?



EDIT: The above is a simplified version for asking the question. Below is a sample of actual entries in the files.



Here, $$$00001 is the entry title.



From File 1



$$$00001
<b><br>- Original: Α<b><br></b></b>- Transliteration: A<b><br></b></b>- Phonetic: al'-fah<b><br></b></b>-...
$$$00002
<b><br>- Original: script<b><br></b></b>- Translitera...


From File 2



$$$00001
<b><br>α<b><br></b></b>a; indeclinable...
$$$00002
<b><br>texts<b><br></b></b>A...




























  • Are the headings all of the format Entry <num>?
    – muru
    Apr 10 at 8:43










  • Entry <num> is a simplified version of the headings in order to ask the question. More realistically, they will be zero-padded numbered entries with 5 digits.
    – Matt Zabojnik
    Apr 10 at 8:47











  • Well, how do we identify the headings then?
    – muru
    Apr 10 at 8:47










  • I've updated my question with a real example for clarity.
    – Matt Zabojnik
    Apr 10 at 8:53






















edited Apr 10 at 8:53

























asked Apr 10 at 8:18









Matt Zabojnik






















1 Answer




















accepted










A simple awk one-liner solves your example:



awk '/^Entry/{k=$0;next}{g[k]=g[k]"\n"$0}END{for(k in g)print k g[k]}' file1 file2


As you probably know, awk reads its input one line at a time and runs a program against each line. Here the program is given as the first argument and consists of three pattern-action statements. Let's analyze them one by one:



  • /^Entry/{k=$0;next} means: if the current line matches /^Entry/, store it in the variable k and skip to the next line, ignoring the remaining statements.

  • {g[k]=g[k]"\n"$0} has no preceding condition, so it runs for every line that reaches it, and means: update the value stored in the associative array g under the key k. The new value is the concatenation of the (possibly empty) previous value g[k], a newline "\n", and the current line.

  • END{for(k in g)print k g[k]} has an END condition and therefore runs once all input lines have been processed. It says: for each key in g, that is, for each title that appeared in the input files, print the key followed by the associated value, which is the concatenation of all lines found under that title in the input files. Note that awk does not guarantee the order in which for (k in g) visits the keys, so the entries may not come out in numeric order.
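The three statements above can also be written out as a commented, multi-line program. The following is only a sketch: it recreates the two sample files from the question in a scratch directory (the names tmpdir and merged are made up for the illustration) and runs the same logic on them.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Entry 1' 'Green Trees' 'Entry 3' 'Orange Fibers' > "$tmpdir/file1"
printf '%s\n' 'Entry 1' 'Red Trees' 'Entry 2' 'Spotted Zebras' 'Entry 3' 'Blue Fibers' > "$tmpdir/file2"

merged=$(awk '
    # Heading line: remember it as the current key and read the next line.
    /^Entry/ { k = $0; next }

    # Any other line: append it to the text collected under the current key.
    { g[k] = g[k] "\n" $0 }

    # After the last input line: print each heading followed by its merged body.
    END { for (k in g) print k g[k] }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$merged"
rm -r "$tmpdir"
```

Within each entry, the lines from file1 come before those from file2 because the files are read in that order. The order of the entries themselves depends on the awk implementation.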


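To use it on your real files, you have to replace /^Entry/ with the correct pattern (probably /^\$\$\$/, with each $ escaped so that it matches a literal dollar sign instead of the end of the line).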
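For the real data, a variant along these lines might be a starting point. It is a sketch under some assumptions: headings are lines beginning with a literal $$$ followed by a zero-padded number, so plain string order equals numeric order, and the file contents below are shortened, made-up placeholders rather than the actual entries. Because awk does not promise any particular order for for (k in g), the keys are copied into a helper array and sorted with a small insertion sort before printing.

```shell
tmpdir=$(mktemp -d)
# Hypothetical stand-ins for the real files; the bodies are placeholders.
printf '%s\n' '$$$00001' 'first body A' '$$$00003' 'first body C' > "$tmpdir/file1"
printf '%s\n' '$$$00001' 'second body A' '$$$00002' 'second body B' '$$$00003' 'second body C' > "$tmpdir/file2"

merged=$(awk '
    # Heading: a literal $$$ at the start of the line (each $ is escaped).
    /^\$\$\$/ { k = $0; next }

    # Body line: append it under the current heading.
    { g[k] = g[k] "\n" $0 }

    END {
        # Collect the headings, then sort them as strings (insertion sort).
        n = 0
        for (k in g) keys[++n] = k
        for (i = 2; i <= n; i++) {
            v = keys[i]
            for (j = i - 1; j >= 1 && (keys[j] "") > (v ""); j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = v
        }
        # Print each heading followed by its merged body, in sorted order.
        for (i = 1; i <= n; i++) print keys[i] g[keys[i]]
    }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$merged"
rm -r "$tmpdir"
```

The insertion sort keeps the sketch portable across awk implementations, at the cost of quadratic time in the number of headings. With GNU awk, setting PROCINFO["sorted_in"] = "@ind_str_asc" before the for (k in g) loop should give the same ordering without the manual sort.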




























  • Excellent, this solution worked perfectly. Thank you! Would you mind explaining a little further on what's going on in there?
    – Matt Zabojnik
    Apr 10 at 9:46











  • @MattZabojnik Done.
    – Dario
    Apr 10 at 10:54


















edited Apr 10 at 10:54

























answered Apr 10 at 9:14









Dario











