Find string from one file in another; if not present, remove it from the original file

I'm trying to write a script that goes through each line of one file and, if that line does not occur anywhere within any line of a second text file, removes it from the original file.

An example of the desired input and output:

Example input, file 1 (groups file):

hello
hi hello
hi
great
interesting

File 2:
this is a hi you see
this is great don't ya think
sometimes hello is a good expansion of its more commonly used shortening hi
interesting how brilliant coding can be just wish i could get the hang of it

Example script output, file 1 changed to:

hello
hi
great
interesting

So it has removed hi hello, because that string is not present in the second file.

Here is the script; it seems to work up to the point of setting the variables.

#take first line from stability.contigs.groups
echo | head -n1 ~/test_folder/stability.contigs.groups > ~/test_folder/ErrorFix.txt
#remove the last 5 characters
sed -i -r '$ s/.{5}$//' ~/test_folder/ErrorFix.txt

#find match of the word string in errorfix.txt in stability.trim.contigs.fasta if not found then delete the line containing the string in stability.contigs.groups
STRING=$(cat ~/test_folder/MothurErrorFix.txt)
FILE=~/test_folder/stability.trim.contigs.fasta
if [ ! -z $(grep "$STRING" "$FILE") ]
 then
 perl -e 's/.*$VAR\s*\n//' ~/test_folder/stability.contigs.groups
fi

edited Nov 18 at 9:35 by Rui F Ribeiro

asked May 21 '16 at 18:17 by Giles

Please read stackoverflow.com/help/formatting
– garethTheRed
May 21 '16 at 18:27

yes, but the line from file 1 can appear anywhere within the text of file 2, so the script needs to take that into account before deleting the line from file 1 when it's not there.
– Giles
May 21 '16 at 19:24

sorry was trying to keep it simple, short and sweet. can provide a better example if it helps. also i apologise if i occasionally make no sense, as i've just come off a 48 hour straight work stint, feeling a little woozy/dizzy!
– Giles
May 21 '16 at 19:30

i get that, should have known better really, as even in the short time i've been doing this i've seen it happen a few times in these forums when looking through other people's questions. i think sometimes I can stare at something so long trying to figure it out that, when I go to explain the problem, i overlook important factors in the explanation.
– Giles
May 21 '16 at 19:45

text-processing

2 Answers

accepted

If you have gnu grep you could run:

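grep -oFf file1 file2 | sort | uniq | grep -Fxf - file1

The last grep restores the original order of file1; its -x makes it keep only the lines of file1 that exactly equal one of the matched strings (without -x, hi hello would survive because it contains hi). Drop the last grep if you don't need to preserve the order of the lines in file1.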
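
To see what each stage of that pipeline contributes, here is a small sketch run against the sample data from the question. The scratch directory and the variable names (dir, found, loose, exact) are made up for the illustration, and it assumes a grep that supports -o and -f - (GNU grep does). It compares the final filter with and without -x:

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory (hypothetical paths).
dir=$(mktemp -d)
printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > "$dir/file1"
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' \
  > "$dir/file2"

# Strings from file1 that occur somewhere in file2, de-duplicated.
found=$(grep -oFf "$dir/file1" "$dir/file2" | sort -u)

# Without -x, a file1 line is kept if it merely contains one of the found strings.
loose=$(printf '%s\n' "$found" | grep -Ff - "$dir/file1")

# With -x, a file1 line is kept only if it equals one of the found strings.
exact=$(printf '%s\n' "$found" | grep -Fxf - "$dir/file1")

printf 'without -x:\n%s\n\nwith -x:\n%s\n' "$loose" "$exact"
rm -rf "$dir"
```

On this data the version without -x still lists hi hello, while the version with -x gives the four lines the question asks for. One caveat worth keeping in mind: grep -o reports only the longest match at a given position, so when one line of file1 is a prefix of another, the shorter one can be missed.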

If you don't have access to gnu grep, with awk:

awk 'NR==FNR{z[$0]++;next};{for (l in z){if (index($0, l)) y[l]++}}
END{for (i in y) print i}' file1 file2
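
Note that the awk command prints the surviving lines in no particular order (for ... in iterates in an unspecified order) and does not touch file1 itself. A possible variation, offered only as a sketch: store the file1 lines by position so they can be printed in their original order, write the result to a temporary file, and move it over file1. The scratch directory, the array names want and seen, and the file1.new name are assumptions made for this example.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory (hypothetical paths).
dir=$(mktemp -d)
printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > "$dir/file1"
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' \
  > "$dir/file2"

# Pass 1 (file1): remember each line together with its position.
# Pass 2 (file2): mark every remembered line that occurs as a substring.
# END: print the marked lines in their original file1 order.
awk 'NR == FNR { want[NR] = $0; n = NR; next }
     { for (i = 1; i <= n; i++) if (!seen[i] && index($0, want[i])) seen[i] = 1 }
     END { for (i = 1; i <= n; i++) if (seen[i]) print want[i] }' \
  "$dir/file1" "$dir/file2" > "$dir/file1.new" &&
  mv "$dir/file1.new" "$dir/file1"   # replace file1 only if awk succeeded

result=$(cat "$dir/file1")
printf '%s\n' "$result"
rm -rf "$dir"
```

Writing to a temporary file and renaming it avoids truncating file1 while awk is still reading it.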

edited May 21 '16 at 22:29 (community wiki, 6 revs, don_crissti)

gnu grep worked perfectly, thank you. awk solution had a few problems, which seemed to be associated with grep and some numbers that were in the text file (text file is huge, thought it was all normal text, but i was wrong)
– Giles
May 21 '16 at 20:28

Go for don_crissti's (accepted) answer if you have GNU grep. Just in case you don't (e.g. on a standard Mac OS X, where that won't work), you could alternatively save this snippet to a bash script, e.g. myconvert.sh

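#!/bin/bash
# Usage: ./myconvert.sh file1 file2
while IFS='' read -r line || [[ -n "$line" ]]; do
 if ! grep -Fq -- "$line" "$2"
 then
  # Escape the BRE metacharacters in $line, then delete the lines matching it.
  # Note: sed -i '' is the BSD/macOS form; GNU sed takes -i without the empty argument.
  sed -i '' "/$(printf '%s\n' "$line" | sed -e 's|[]\/$*.^[]|\\&|g')/d" "$1"
 fi
done < "$1"

and call it with the two files as arguments: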

./myconvert.sh file1 file2

However, please note don_crissti's knowledgeable comments below regarding the usage of while/read and the obvious performance drawbacks of invoking sed.

edited May 21 '16 at 21:07

answered May 21 '16 at 20:23 by Mischa

Avoid while..reading a file... Also, if file1 had 10.000 lines your script would edit the file in-place 10.000 times (via sed) not to mention your sed command could (potentially) remove other lines too since you're not escaping any special characters that may be present in file1 (most common of them being the dot).
– don_crissti
May 21 '16 at 20:28

With a modern shell like bash (readarray needs bash 4 or later) you could read the lines of file1 into an array, run grep -q for each element and print the element if grep succeeds:

c=0; readarray -t arr <file1; while [ $c -lt ${#arr[@]} ]; do grep -qF -- "${arr[$c]}" file2 && printf '%s\n' "${arr[$c]}"; ((c++)); done

Now, if you modify this to print something else that you then pipe to sed as a script, you can edit the file in-place with a single sed invocation.
– don_crissti
May 21 '16 at 22:31
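
The "print a sed script, then run sed once" idea from the comment above could look roughly like this. It is a sketch under several assumptions, not don_crissti's code: awk stands in for the shell array so that it runs in any POSIX sh, lines are addressed by number (so nothing from file1 has to be escaped), and the result goes to a temporary file instead of using sed -i, whose syntax differs between GNU and BSD sed. The names dir, delete.sed and file1.new are invented for the example.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory (hypothetical paths).
dir=$(mktemp -d)
printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > "$dir/file1"
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' \
  > "$dir/file2"

# Pass 1 (file2): store every line.
# Pass 2 (file1): for each line that is not a substring of any stored line,
# emit the sed command "Nd", where N is that line's number in file1.
awk 'NR == FNR { hay[NR] = $0; n = NR; next }
     { hit = 0
       for (i = 1; i <= n; i++) if (index(hay[i], $0)) { hit = 1; break }
       if (!hit) print FNR "d" }' \
  "$dir/file2" "$dir/file1" > "$dir/delete.sed"

# Apply all deletions in a single sed run, then replace file1.
sed -f "$dir/delete.sed" "$dir/file1" > "$dir/file1.new" &&
  mv "$dir/file1.new" "$dir/file1"

script=$(cat "$dir/delete.sed")
result=$(cat "$dir/file1")
printf 'sed script: %s\n%s\n' "$script" "$result"
rm -rf "$dir"
```

For the sample data the generated script is the single command 2d, which removes hi hello and leaves the other four lines in place.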
