Removing characters with sed [duplicate]

This question already has an answer here:
Match language range in shell, sed or awk
2 answers
I am working on AIX and trying to remove non-printable characters from a file. When I view the file in Notepad++ using UTF-8 encoding, the data looks like Caucasian male lives in Arizona w/ fiancÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂà. When I view the file on Unix, I get ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâ instead of the special characters.
I want to replace all those special characters with a space.
I tried sed 's/[^[:print:]]/ /g' file, but it does not remove those characters. These are the locales listed when I run locale -a:
C
POSIX
en_US.8859-15
en_US.ISO8859-1
en_US
I also tried sed -e 's/[^ -~]/ /g' file, and it did not remove the characters either.
I see that other Stack Overflow answers used a UTF-8 locale with GNU sed and that this worked, but I do not have that locale.
I am using ksh.
text-processing sed ksh aix
edited Sep 25 at 19:29
asked Sep 25 at 19:13
Auguster
marked as duplicate by Isaac, Goro, RalfFriedl, Shadur, X Tian Sep 27 at 8:53
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Ã and â look pretty printable to me. A UTF-8 Ã is encoded as 0xc3 0x83. 0xc3 in ISO 8859-1 or 8859-15 is also Ã, as it happens, which is printable; 0x83 would be a control character in both, though.
– Stéphane Chazelas
Sep 25 at 19:53
Possible duplicate: unix.stackexchange.com/questions/201751/…
– Goro
Sep 25 at 20:05
@Goro Yes, at this point it is possibly a duplicate, now that I understand that I should use the C locale.
– Auguster
Sep 25 at 20:09
To actually show what the characters are, it is useful to show their hex values. Something like: echo "fiancÃÃÃÃÃÃÃÃÃÃ" | od -tx1, or, if your sed supports it: echo "fiancÃÃÃÃÃÃÃÃÃÃ" | sed -n l.
– Isaac
Sep 25 at 21:08
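As a rough illustration of the hex-dump suggestion above, the sketch below feeds known bytes to od instead of pasting the damaged text. The helper name hex_bytes is hypothetical, and the sample input is an assumption: the word "fianc" followed by the two bytes 0xc3 0x83 that the first comment gives as the UTF-8 encoding of Ã.

```shell
# Hypothetical helper: print the bytes read from stdin as one run of hex digits.
# od -An drops the offset column, -v disables line folding, -tx1 prints one byte per field.
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# "fianc" followed by the bytes 0xc3 0x83, written as octal escapes for portability.
printf 'fianc\303\203' | hex_bytes
echo
```

Any byte shown with a value of 80 or above is outside ASCII, which is the part the C locale treats as non-printable.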
2 Answers
1 vote, accepted
If the current locale already uses UTF-8 as the charset (and the file is written in that charset):
<file LC_ALL=C sed 's/[^ -~]//g'
Or, to also keep the TAB and CR control characters with AIX sed (printf expands \t and \r into the literal characters):
<file LC_ALL=C sed "$(printf 's/[^[:print:]\t\r]//g')"
edited Sep 25 at 22:57
Stéphane Chazelas
answered Sep 25 at 21:55
Isaac
@Stéphane What does printf do here? If I am deleting all the characters and saving to another file, do I need to use printf?
– Auguster
Sep 26 at 13:50
@Auguster, printf is there to expand the \t into a TAB character and the \r into a CR character. If you are using ksh93 on AIX, you can also use $'s/[^[:print:]\t\r]//g'
– Stéphane Chazelas
Sep 26 at 15:16
3 votes
You can use the tr command as follows:
tr -cd '[:print:]\t\r\n'
Explanation:
[:print:] -- printable characters: everything in the [:graph:] class plus the space character
\r -- carriage return
\t -- horizontal tab
\n -- newline
Examples on CentOS 7 (tr is GNU tr, UTF-8 locale):
$ echo "fiancÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n'
fianc
$ echo "get ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâ " | tr -cd '[:print:]\t\r\n'
get ^^^^^^
$ echo " Caucasian male lives in Arizona w/ fiancâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂ" | tr -cd '[:print:]\t\r\n'
Caucasian male lives in Arizona w/ fianc^^^^^^^^^^^^
edited Sep 25 at 22:58
answered Sep 25 at 19:23
Goro
That did not work for me. I tried echo " Caucasian male lives in Arizona w/ fiancâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂâÂÂ^âÂÂ" | tr -d '[:print:]' and got some unreadable text as output.
– Auguster
Sep 25 at 19:36
LC_ALL=C tr ...
– Jeff Schaller
Sep 25 at 19:38
LC_ALL=C tr -cd '[:print:]' < input works here
– Jeff Schaller
Sep 25 at 19:43
echo "fiancÃÃÃÃÃÃÃÃÃÃ" | tr -cd '[:print:]\t\r\n' should return fiancÃÃÃÃÃÃÃÃÃÃ, as Ã is a printable character. GNU tr doesn't in UTF-8, as it doesn't support multi-byte characters yet, but it does in ISO 8859-1. In the C locale, on systems where the C locale charset is ASCII, that does remove Ã (or whatever bytes those are made of), as ASCII has no such character in the first place.
– Stéphane Chazelas
Sep 25 at 22:46
Because CentOS tr is GNU tr, and you probably tried it in a UTF-8 locale, where Ã is made of 2 bytes and GNU tr doesn't support multibyte characters. If you use LC_ALL=C as suggested by Auguster, it will work (at removing those Ã, however they're encoded) regardless of whether tr supports multibyte characters or not. In the C locale, all characters are single bytes, and on most systems including AIX, the C locale charset is ASCII, which has no character with the 8th bit set (which each byte of the UTF-8 encoding of Ã has, as does its single-byte ISO 8859-1 encoding).
– Stéphane Chazelas
Sep 25 at 22:52