Removing characters with sed [duplicate]

























This question already has an answer here:



  • Match language range in shell, sed or awk

    2 answers



I am working on AIX and trying to remove non-printable characters from a file. When I view the file in Notepad++ using UTF-8 encoding, the data looks like Caucasian male lives in Arizona w/ fiancÃÂÃÂÃÂÃÂÃÂ. When I view the file on Unix, I get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ instead of the special characters.



I want to replace all those special characters with a space.



I tried sed 's/[^[:print:]]/ /g' file, but it does not remove those characters. My locales, as listed by locale -a, are:



C
POSIX
en_US.8859-15
en_US.ISO8859-1
en_US


I even tried sed -e 's/[^ -~]/ /g' file and it did not remove the characters.



I see that other Stack Overflow answers used a UTF-8 locale with GNU sed and that this worked, but I do not have that locale.



Also I am using ksh.

























marked as duplicate by Isaac, Goro, RalfFriedl, Shadur, X Tian Sep 27 at 8:53


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.














  • Ã and ▒ look pretty printable to me. A UTF-8 Ã is encoded as 0xc3 0x83. 0xc3 in iso8859-1 or 15 is also Ã, as it happens, which is printable; 0x83 would be a control character in both, though.
    – Stéphane Chazelas
    Sep 25 at 19:53










  • Possible duplicate: unix.stackexchange.com/questions/201751/…
    – Goro
    Sep 25 at 20:05







  • @Goro Yes, at this point it is possibly a duplicate, now that I understand that I need to use the C locale.
    – Auguster
    Sep 25 at 20:09










  • To actually show what the characters are, it is useful to show their hex values. Something like: echo "fiancÃÂÃÂÃÂÃÂÃÂ" | od -tx1, or, maybe, if your sed supports it: echo "fiancÃÂÃÂÃÂÃÂÃÂ" | sed -n l.
    – Isaac
    Sep 25 at 21:08
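
A minimal sketch of the inspection suggested in the last comment. It assumes the stray bytes are the UTF-8 encoding of Ã (0xc3 0x83) mentioned in the first comment; the sample is built with printf octal escapes so that it does not depend on the terminal's encoding. The helper name show_hex is hypothetical.

```shell
#!/bin/sh
# show_hex (hypothetical helper): print each input byte as two hex digits.
show_hex() {
    od -An -tx1
}

# Sample data: "fianc" followed by the bytes 0xc3 0x83, written as octal
# escapes (\303 \203) so the example is independent of the terminal encoding.
printf 'fianc\303\203\n' | show_hex

# The same bytes through sed's "l" command, which writes non-printable
# bytes as escape sequences instead of sending them to the terminal.
printf 'fianc\303\203\n' | LC_ALL=C sed -n l
```

Running the same commands on a line of the real file, rather than on this made-up sample, should show which bytes actually need to be removed.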














text-processing sed ksh aix






asked Sep 25 at 19:13 by Auguster, edited Sep 25 at 19:29









2 Answers




















accepted










If the current locale already uses UTF-8 as the charset (and the file is written using that charset):



<file LC_ALL=C sed 's/[^ -~]//g'


Or, to also keep the TAB and CR control characters, with AIX sed:



<file LC_ALL=C sed "$(printf "s/[^[:print:]\t\r]//g")"
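
Since the question asks for the characters to be replaced with a space rather than deleted, the first command could be adapted along the lines of this sketch. It assumes that in the C locale every byte is treated as a single character, so each byte outside the printable ASCII range becomes one space. The function name and the sample bytes are illustrative only.

```shell
#!/bin/sh
# to_spaces (hypothetical name): replace every byte outside printable
# ASCII (space through tilde) with a space. LC_ALL=C makes sed work on
# single bytes, so no UTF-8 locale is required.
to_spaces() {
    LC_ALL=C sed 's/[^ -~]/ /g'
}

# Sample line: "w/ fianc", then the bytes 0xc3 0x83 0xc2 0x83, then " end".
printf 'w/ fianc\303\203\302\203 end\n' | to_spaces
```

With a real file, something like to_spaces < file > file.clean would write the cleaned text to a new file.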



























  • @Stéphane what does printf do here? If I am deleting all characters and saving to another file do I need to use printf?
    – Auguster
    Sep 26 at 13:50










  • @Auguster, printf is there to expand the \t into a TAB character and \r into a CR character. If using ksh93 on AIX, you can also use $'s/[^[:print:]\t\r]//g'
    – Stéphane Chazelas
    Sep 26 at 15:16
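
To illustrate that explanation, the sketch below dumps the sed script produced by printf, so that the literal TAB (09) and CR (0d) bytes are visible, and then applies the script to a small sample. The sample bytes are made up for the example.

```shell
#!/bin/sh
# Build the sed script with printf so that \t and \r become a literal
# TAB and CR inside the bracket expression.
script=$(printf 's/[^[:print:]\t\r]//g')

# Dump the script as hex; 09 (TAB) and 0d (CR) appear just before the
# closing ] of the bracket expression.
printf '%s' "$script" | od -An -tx1

# Apply it: the \001 control byte and the 0xc3 0x83 bytes are removed,
# while the TAB and the CR are kept.
printf 'a\tb\001c\303\203\r\n' | LC_ALL=C sed "$script" | od -An -tx1
```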






























You can use the command tr as follows:



tr -cd '[:print:]\t\r\n'


Explanation:



`[:print:]' -- any printable character: the characters in the `[:graph:]' class plus the space character
\r -- carriage return
\t -- horizontal tab
\n -- newline


Examples on CentOS 7, where tr is GNU tr and the locale uses UTF-8 encoding:



$ echo "fiancÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n'
fianc

$ echo "get ^▒▒^▒▒^▒▒^▒▒^▒▒^▒▒ " | tr -cd '[:print:]\t\r\n'
get ^^^^^^

$ echo " Caucasian male lives in Arizona w/ fianc▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒" | tr -cd '[:print:]\t\r\n'
Caucasian male lives in Arizona w/ fianc^^^^^^^^^^^^
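
The examples above depend on the locale they are run in, so a variant that forces the C locale may behave more predictably. The following is only a sketch: it assumes the C locale's charset is ASCII, and the function name and sample bytes are hypothetical.

```shell
#!/bin/sh
# keep_printable (hypothetical name): delete every byte that is not
# printable ASCII, TAB, CR or newline. -c complements the set and -d
# deletes, so only the listed characters survive.
keep_printable() {
    LC_ALL=C tr -cd '[:print:]\t\r\n'
}

# "fianc" followed by the bytes 0xc3 0x83 0xc2 0x83 and a newline.
printf 'fianc\303\203\302\203\n' | keep_printable
```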



























  • That did not work for me. I tried echo " Caucasian male lives in Arizona w/ fianc▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒^▒▒^▒▒^▒▒^▒▒^▒▒^▒" | tr -d '[:print:]' and got some unreadable text as output.
    – Auguster
    Sep 25 at 19:36






  • LC_ALL=C tr ...
    – Jeff Schaller
    Sep 25 at 19:38






  • LC_ALL=C tr -cd '[:print:]' < input works here
    – Jeff Schaller
    Sep 25 at 19:43






  • echo "fiancÃÂÃÂÃÂÃÂÃÂ" | tr -cd '[:print:]\t\r\n' should return fiancÃÂÃÂÃÂÃÂÃÂ, as Ã is a printable character. GNU tr doesn't in UTF-8, as it doesn't support multi-byte characters yet, but it does in iso8859-1. In the C locale, on systems where the C locale charset is ASCII, that does remove Ã (or whatever bytes those are made of), as ASCII has no such character in the first place.
    – Stéphane Chazelas
    Sep 25 at 22:46







  • Because CentOS tr is GNU tr and you probably tried it in a UTF-8 locale, where Ã is made of 2 bytes, and GNU tr doesn't support multibyte characters. If you use LC_ALL=C as suggested by Auguster, it will work (at removing those Ã, however they're encoded) regardless of whether tr supports multibyte characters or not. In the C locale, all characters are single bytes, and on most systems, including AIX, the C locale charset is ASCII, which has no character with the 8th bit set (which each byte of the UTF-8 encoding of Ã has, as well as its single-byte iso8859-1 encoding).
    – Stéphane Chazelas
    Sep 25 at 22:52
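
The point about the 8th bit suggests an alternative that does not rely on character classes at all: select the bytes 0x80 to 0xff directly by their octal values. This is a sketch under the assumption that tr treats its input as single bytes in the C locale; the function names are made up for illustration.

```shell
#!/bin/sh
# drop_high_bytes (hypothetical): delete every byte with the 8th bit set
# (octal 200 through 377).
drop_high_bytes() {
    LC_ALL=C tr -d '\200-\377'
}

# high_bytes_to_space (hypothetical): turn each such byte into a space.
# '[ *]' repeats the space so the second set is as long as the first.
high_bytes_to_space() {
    LC_ALL=C tr '\200-\377' '[ *]'
}

# "fianc", the bytes 0xc3 0x83, then " x".
printf 'fianc\303\203 x\n' | drop_high_bytes
printf 'fianc\303\203 x\n' | high_bytes_to_space
```

Unlike the [:print:] variants, this leaves ASCII control characters untouched, which may or may not be what is wanted.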


















Accepted answer (sed): answered Sep 25 at 21:55 by Isaac, edited Sep 25 at 22:57 by Stéphane Chazelas











Second answer (tr): answered Sep 25 at 19:23 by Goro, edited Sep 25 at 22:58










