Can't replace my regex matches
I can filter files and I can stream the matches of my regex...
However, I need to remove exactly those matches from a large file.
Regex: ^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
sed -e '/^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/d/ /g' file
only streams the matches but does not replace or cut them.
Searching for files that contain matches also works.
What is the formula to get this working?
sed regular-expression
edited Mar 10 at 13:34 by GAD3R
asked Mar 10 at 3:49 by Olaf
What have you tried so far?
– yael Mar 10 at 5:40
I split the large file, deleted the partial files that contained my regex, and reassembled the rest into the whole. However, that is a slow and dirty process over endless partial files.
– Olaf Mar 10 at 6:41
@Olaf Just to be clear, you want to remove the patterns matching your regex from the file. Is that correct?
– Haxiel Mar 10 at 6:45
Yes. These are in fact emails whose attachments get removed as the mails age, because a huge number of clients abuse the mail servers as storage. After 8 years or more, often with attachments forwarded several times, no 35 MB email with its attachments is useful. I only want to preserve the textual content.
– Olaf Mar 10 at 6:52
2 Answers
It appears that you are using a Perl-compatible Regular Expression (PCRE) with sed. The sed utility only knows Basic Regular Expressions (BRE) by default (or Extended Regular Expressions (ERE) when used with -E on most systems).
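For example, the sketch below (assuming a POSIX shell and a sed that supports -E, such as GNU or BSD sed; the helper name sed_ere_ok is made up for illustration) asks sed -E to compile the PCRE-only non-capturing group (?:...), which it rejects, and then the same group written as a plain ERE group ( ), which it accepts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if sed -E can compile the given pattern.
# sed reads empty input, so only the compilation of the script matters.
sed_ere_ok() {
    sed -E "s/$1//" </dev/null >/dev/null 2>&1
}

# PCRE non-capturing group: not valid ERE, so sed -E reports an error
# (GNU sed says "Invalid preceding regular expression").
if sed_ere_ok '(?:ab)+'; then
    echo "PCRE group accepted"
else
    echo "PCRE group rejected"
fi

# Plain ERE group: accepted (it simply also captures).
if sed_ere_ok '(ab)+'; then
    echo "ERE group accepted"
fi
```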
I also don't think that the sed syntax is correct, but it's difficult to read because the expression in the question seems to have extra * in them. You appear to want to strip out the multipart divider in an email message, but you don't seem to care about matching these up correctly (matching the start of one multipart part to the corresponding end divider). If the sed syntax were corrected, the expression would likely delete the full contents of the emails, or combine all attachments into the body of the message.
The PCRE expression
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
is the same as the ERE (to be used with sed -E)
^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
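One way to sanity-check an ERE translation like this is to run a few sample lines through grep -E. This is a minimal sketch with a made-up helper name (is_b64_line); it keeps the = padding inside each alternative, as in the PCRE. Note that an empty line and an ordinary four-letter word such as Best also count as matches, because every part of the pattern is optional and any four letters form a valid base64 group.

```shell
#!/bin/sh
# ERE with the "=" padding inside each alternative, as in the PCRE.
B64_ERE='^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$'

# Hypothetical helper: succeeds if the whole line looks like base64.
is_b64_line() {
    printf '%s\n' "$1" | grep -Eq "$B64_ERE"
}

# Try a few sample lines, including the edge cases mentioned above.
for line in 'QUJD' 'QUI=' 'QQ==' 'QQ===' 'QUI' 'Hello world' '' 'Best'; do
    if is_b64_line "$line"; then
        printf 'match:    [%s]\n' "$line"
    else
        printf 'no match: [%s]\n' "$line"
    fi
done
```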
and using this with d (which you appear to be doing) would delete those lines, but the trailing / /g in your sed command is an error. Removing / /g would likely have the effect of combining all attachments into the body of the email.
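As a rough illustration (assuming a sed with -E; strip_b64_lines is a hypothetical name), the filter below deletes every line that consists only of base64 characters. It uses \# as the address delimiter so that the / inside the bracket expressions cannot end the regex early, and it has no trailing / /g, which sed rejects as a syntax error after the d command. The sample input shows the caveat: the blank line and the word Best are deleted along with the encoded data.

```shell
#!/bin/sh
# Hypothetical filter: delete every line made only of base64 characters
# (with optional "=" padding). "\#...#" makes "#" the address delimiter,
# so the "/" inside the bracket expressions needs no escaping.
strip_b64_lines() {
    sed -E '\#^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$#d'
}

# prints "Subject: test" and "Hello world"; the empty line, QUJD, QUI=
# and Best are all deleted
printf 'Subject: test\n\nHello world\nQUJD\nQUI=\nBest\n' | strip_b64_lines
```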
If you want to strip attachments from email messages (as indicated in the comments), I would not try to do it with sed but with a proper email message parser.
Examples of how to go about doing this may be found in the following related questions:
- Remove/Delete Attachments from email server (IMAP)
- Detaching an attachment in Mutt
- Best way to archive attachments?
Personally, I would write a Perl script similar to the one in the first linked question/answer above. Just remember to always do test runs of such scripts on copies of your mailboxes, in case you make mistakes.
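In the spirit of that advice, here is a minimal sketch of a dry run (the dry_run name and the mailbox path are hypothetical). It writes the filtered result to a temporary file, leaves the original mailbox untouched, and reports the sizes so you can inspect the result, for example with diff, before changing the real file. The base64-line sed filter is only a stand-in for whatever script you are actually testing.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: run a filter (here: delete base64-only
# lines) into a temporary file; the original mailbox is only read.
dry_run() {
    src=$1
    out=$(mktemp) || return 1
    sed -E '\#^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$#d' \
        "$src" > "$out" || return 1
    # $(( )) strips the leading blanks some wc implementations print.
    before=$(( $(wc -c < "$src") ))
    after=$(( $(wc -c < "$out") ))
    printf 'before: %s bytes, after: %s bytes, result in %s\n' \
        "$before" "$after" "$out"
}

# Usage (example path): dry_run ~/Mail/archive.mbox
```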
The fdm mail tool is able to filter messages based on the number and/or size of attachments, which may be handy as a way of filtering out large email messages from archived mailboxes.
edited Mar 10 at 8:48
answered Mar 10 at 8:17 by Kusalananda♦
"I also don't think that the sed syntax is correct, but it's difficult to read because the expression in the question seems to have extra * in them." - Yes, this regex matches exactly the parts I want/need to remove, in gedit AND Bluefish as well as in other apps. (eyGf7wbBqo6apbkvAzbR1Vjmt+W5hhQl5F6dSawrzVJrpvKs49ynhnzis5xS6u500JTlpZW89isn iC/kzCI13njNSQXrQNvltXaQ9WPJp6aXNJCqO6RAHORyfzqKXTZIJN1xK0sIHVTyPwrO01q2zovR ......) Until now, I just split the files on ss1='------=_NextPart_' ss2='------=_Part_' ss3='----' ss4='--=' ss5='--'
– Olaf Mar 10 at 8:25
@Olaf Yes, it may well do if those tools understand PCRE, but sed does not.
– Kusalananda♦ Mar 10 at 8:27
...to split the huge files. Obviously, when I split there and delete the file parts containing the regex, 5 lines go missing. I can neither manage to break before or after the regex (at the empty lines), nor get the base64 code filtered out (see the regex).
– Olaf Mar 10 at 8:30
@Olaf As I mentioned in my answer, I would not do this with a tool that is not email-aware.
– Kusalananda♦ Mar 10 at 8:31
"The fdm mail tool is able to filter messages" - I can do that via find, sorted by size in descending order. There is no plain-text mail of 1 MB among them: 2-100k is small, and 800k is huge for a mail without attachments. Currently I only look at files exceeding 30 MB, which contain 29 attached mails that in turn contain attachments.
– Olaf Mar 10 at 8:34
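try:
sed -E "s/^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$//g" file
and double-check the output. The -E has to be uppercase; -e is a different option (it adds a script expression) and does not enable extended regular expressions.
Once you are sure it works, use -E -i instead to make the changes directly in the file (with GNU sed, -iE would be read as -i with the backup suffix E).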
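The command as posted cannot work with sed -E, because (?: is PCRE syntax; that is what produces the "Invalid preceding regular expression" error reported in the comments. Below is a sketch of a variant that GNU sed accepts, under these assumptions: GNU sed (BSD/macOS sed needs -i '' for in-place editing); ?: removed so the groups are plain ERE groups; # used as the address delimiter so the / in the bracket expressions is harmless; and d instead of s///g, so matching lines are deleted rather than left behind as empty lines. The options are written as -E -i because GNU sed reads -iE as -i with the backup suffix E. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU sed: delete base64-only lines in place.
# -E and -i are separate options; "-iE" would mean "-i with suffix E".
strip_b64_in_place() {
    sed -E -i '\#^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$#d' "$1"
}

# Try it on a scratch file first, never on the only copy of a mailbox.
tmp=$(mktemp)
printf 'Hello world\nQUJD\nQQ==\n' > "$tmp"
strip_b64_in_place "$tmp"
cat "$tmp"    # prints "Hello world"
```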
answered Mar 10 at 18:18 by Juan
I will try that in a while and give you a thumbs up/down. Thanks a lot!
– Olaf Mar 11 at 8:54
sed: -e expression #1, char 69: Invalid preceding regular expression
– Olaf Mar 11 at 9:14