Find string from one file in another; if not present, remove it from the original file
I'm trying to make a script that looks through each line of one file and, if a line fails to match anywhere in any line of another text file, removes that line from the original file.
An example of the input and the desired output of this script would be:
Example input, file 1 (groups file):

    hello
    hi hello
    hi
    great
    interesting

file 2:

    this is a hi you see
    this is great don't ya think
    sometimes hello is a good expansion of its more commonly used shortening hi
    interesting how brilliant coding can be just wish i could get the hang of it

Example script output, file 1 changed to:

    hello
    hi
    great
    interesting

So it has removed `hi hello`, because it is not present in the second file.
Here is the script; it seems to work up to the point of setting the variables.

    #take first line from stability.contigs.groups
    echo | head -n1 ~/test_folder/stability.contigs.groups > ~/test_folder/ErrorFix.txt
    #remove the last 5 characters
    sed -i -r '$ s/.{5}$//' ~/test_folder/ErrorFix.txt
    #find match of the word string in errorfix.txt in stability.trim.contigs.fasta if not found then delete the line containing the string in stability.contigs.groups
    STRING=$(cat ~/test_folder/MothurErrorFix.txt)
    FILE=~/test_folder/stability.trim.contigs.fasta
    if [ ! -z $(grep "$STRING" "$FILE") ]
    then
        perl -e 's/.*$VAR\s*\n//' ~/test_folder/stability.contigs.groups
    fi
text-processing
Please read stackoverflow.com/help/formatting
– garethTheRed May 21 '16 at 18:27
Yes, but the line from file 1 can appear within any portion of text in file 2, so the script needs to take that into account before deleting the line from file 1 when it's not there.
– Giles May 21 '16 at 19:24
Sorry, I was trying to keep it simple, short and sweet. I can provide a better example if it helps. Also, I apologise if I occasionally make no sense, as I've just come off a 48-hour straight work stint and am feeling a little woozy/dizzy!
– Giles May 21 '16 at 19:30
I get that. I should have known better really, as even in the short time I've been doing this I've seen it happen a few times in these forums when looking through other people's questions. I think sometimes I can stare at something so long trying to figure it out that, when I go to explain the problem, I overlook important factors in the explanation.
– Giles May 21 '16 at 19:45
asked May 21 '16 at 18:17 by Giles
edited Nov 18 at 9:35 by Rui F Ribeiro
2 Answers
accepted
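If you have GNU grep you could run:

    grep -oFf file1 file2 | sort | uniq | grep -Fxf - file1

The first grep prints (`-o`) every line of file1 that it finds as a fixed string (`-F`) somewhere in file2; the last grep then selects those lines from file1 in their original order. The `-x` restricts that final match to whole lines, so `hi hello` is not kept merely because it contains `hi`. Remove the last grep if you don't need to preserve the order of the lines in file1.

If you don't have access to GNU grep, with awk (the output order is not preserved here):

    awk 'NR==FNR {z[$0]++; next}
         {for (l in z) if (index($0, l)) y[l]++}
         END {for (i in y) print i}' file1 file2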
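For anyone who wants to try the idea on the sample data from the question, here is a minimal sketch. It is not the command from the answer above but a variant of the same `index()` technique that reads file2 first, so the surviving lines come out in file1's original order. The temporary directory and the variable names (`file1`, `file2`, `hay`, `result`) are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only the lines of file1 that occur as a substring of some line in file2.
# All paths below are hypothetical; they only recreate the question's sample data.
tmpdir=$(mktemp -d)
file1="$tmpdir/file1"
file2="$tmpdir/file2"

printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > "$file1"
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' \
  > "$file2"

# First input (file2): store every line in the array hay.
# Second input (file1): print the line as soon as index() finds it in a stored line.
result=$(awk '
  NR == FNR { hay[++n] = $0; next }
  {
    for (i = 1; i <= n; i++)
      if (index(hay[i], $0)) { print; next }
  }
' "$file2" "$file1")

# Show what survives; redirect this to a new file and move it over file1 to apply it.
printf '%s\n' "$result"
```

This compares every line of file1 against every line of file2, so it may be slow on very large inputs, and the `NR == FNR` test assumes file2 is not empty.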
GNU grep worked perfectly, thank you. The awk solution had a few problems, which seemed to be associated with grep and some numbers that were in the text file (the text file is huge; I thought it was all normal text, but I was wrong).
– Giles May 21 '16 at 20:28
Go for don_crissti's (accepted) answer if you have GNU grep. Just in case you don't (e.g. on a standard Mac OS X, where that won't work), you could alternatively save this snippet to a bash script, e.g. myconvert.sh:

    #!/bin/bash
    # Usage: ./myconvert.sh file1 file2
    while IFS='' read -r line || [[ -n "$line" ]]; do
        # Does this line occur anywhere in file2 as a fixed string?
        if ! grep -Fq -- "$line" "$2"
        then
            # Escape regex metacharacters and "/", then delete exact matches only.
            # "sed -i ''" is the BSD/macOS form; GNU sed wants plain "sed -i".
            sed -i '' "/^$(printf '%s\n' "$line" | sed 's,[][\\/.*^$],\\&,g')\$/d" "$1"
        fi
    done < "$1"

and call it with the two files as arguments:

    ./myconvert.sh file1 file2

However, please note don_crissti's knowledgeable comments below regarding the usage of while/read and the obvious performance drawback of invoking sed once for every line that has to be removed.
Avoid `while..read`ing a file... Also, if file1 had 10.000 lines your script would edit the file in-place 10.000 times (via sed), not to mention your sed command could (potentially) remove other lines too, since you're not escaping any special characters that may be present in file1 (most common of them being the dot).
– don_crissti May 21 '16 at 20:28
With a modern shell like e.g. bash or zsh you could read the lines of file1 into an array and, for each element, use `grep -q` and, if successful, print the element:

    c=0; readarray -t arr <file1; while [ $c -lt ${#arr[@]} ]; do grep -qF "${arr[$c]}" file2 && printf '%s\n' "${arr[$c]}"; ((c++)); done

Now, if you modify this to print something else that you then pipe to sed as a script, you can edit the file in-place with a single sed invocation.
– don_crissti May 21 '16 at 22:31
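One possible reading of that last suggestion, as a rough sketch only: instead of printing the lines to keep, build a sed script made of `<line number>d` commands for the lines that are not found in file2, then run sed once. Using line numbers avoids having to escape anything from file1. The sample files, the temporary directory and the names `arr`, `script` and `file1.new` are assumptions for illustration, and `readarray` needs bash 4 or later (the stock bash on older macOS does not have it).

```shell
#!/bin/bash
# Sketch: delete from file1, with a single sed run, every line not found in file2.
# Hypothetical paths that recreate the question's sample data.
tmpdir=$(mktemp -d)
file1="$tmpdir/file1"
file2="$tmpdir/file2"

printf '%s\n' 'hello' 'hi hello' 'hi' 'great' 'interesting' > "$file1"
printf '%s\n' \
  'this is a hi you see' \
  "this is great don't ya think" \
  'sometimes hello is a good expansion of its more commonly used shortening hi' \
  'interesting how brilliant coding can be just wish i could get the hang of it' \
  > "$file2"

# Read file1 into an array, one element per line (bash 4+).
readarray -t arr < "$file1"

# Collect one "Nd" command per line of file1 that does not occur in file2.
script=""
for i in "${!arr[@]}"; do
  if ! grep -qF -- "${arr[$i]}" "$file2"; then
    script+="$((i + 1))d"$'\n'
  fi
done

# Single sed invocation. Writing to a new file and moving it into place
# sidesteps the differing -i syntax of GNU and BSD sed.
sed -e "$script" "$file1" > "$file1.new" && mv "$file1.new" "$file1"

cat "$file1"
```

This still runs grep once per line of file1, so it only addresses the repeated in-place edits, not the per-line grep cost.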
Accepted answer: edited May 21 '16 at 22:29; community wiki, 6 revs, don_crissti
Second answer: answered May 21 '16 at 20:23 by Mischa; edited May 21 '16 at 21:07