Replace UTF-8 characters with shell perl

How do I get perl to properly replace UTF-8 characters from a shell?

The examples use stdin, but I need something that works for perl ... file too.

This is what I expect:

$ echo ABCæøåDEF | perl -CS -pe "s/([æøå])/[\1]/g"
ABC[æ][ø][å]DEF

This is what I get:

$ echo ABCæøåDEF | perl -CS -pe "s/([æøå])/[\1]/g"
ABCæøåDEF

Replacing the Unicode characters with ASCII works instantly:

$ echo ABC123DEF | perl -CS -pe "s/([123])/[\1]/g"
ABC[1][2][3]DEF

My environment:

perl 5.18.2
Bash 3.2.57
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

edited Apr 2 at 19:49

asked Apr 2 at 12:27

forthrin


3 Answers
accepted

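Use this:

 $ echo 'ABCæøåDEF' |
 perl -CSD -Mutf8 -pe 's/([æøå])/[$1]/g'

This also works for files.

Output:

ABC[æ][ø][å]DEF

Note:

-Mutf8 tells perl that the one-liner's own source, including the literal æøå in the pattern, is UTF-8; -CSD decodes and encodes the standard streams and any files that are opened.

Substitutions: \\1 is for awk (gensub), \1 is for sed, and in perl we use $1.

See perldoc perlrun for the -C switch (-CSD) and perldoc utf8 for -Mutf8.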

edited Apr 5 at 21:59

answered Apr 2 at 12:54

Gilles Quenot


Very nice! I know you can do export PERL_UNICODE=S to avoid -CS. Is there a similar thing you can do to avoid -Mutf8? alias perl="/usr/bin/perl -Mutf8" if nothing else? I always use UTF-8.
- forthrin
Apr 2 at 13:39

Please read this authoritative post stackoverflow.com/questions/6162484/…
- Gilles Quenot
Apr 2 at 13:43

Instead of perl -CS -Mopen=":std,IN,:encoding(utf-8)", why not perl -CSD?
- haukex
Apr 2 at 17:45

@forthrin Re "perl is not meant for Unicode." Perl has excellent Unicode support, but for backwards compatibility it is not enabled by default, including for oneliners. If all you use Perl for is oneliners, then yes, you may have to jump through some hoops, but if you write scripts, then you'll have an easier time.
- haukex
Apr 2 at 17:45

What I need is a string replacement tool on the command line, because this is something I do very often. I can confirm that your suggestion works for both stdin and files. I really appreciate the help.
- forthrin
Apr 3 at 5:42

Your input:

$ cat input.txt
ABCæøåDEF
$ hexdump -C input.txt
00000000 41 42 43 c3 a6 c3 b8 c3 a5 44 45 46 0a |ABC......DEF.|
0000000d

One good way IMO is the -C option plus utf8:

$ perl -CSD -Mutf8 -pe 's/([æøå])/[$1]/g' input.txt
ABC[æ][ø][å]DEF
$ cat input.txt | perl -CSD -Mutf8 -pe 's/([æøå])/[$1]/g'
ABC[æ][ø][å]DEF

If you don't want to use UTF-8 on the command line, you can always write your Perl code in plain ASCII and use escapes such as \xAB, \x{ABCD}, or in newer Perls \N{U+ABCD} or \N{CHARNAME}:

$ perl -CSD -pe 's/([\xE6\xF8\xE5])/[$1]/g' input.txt
ABC[æ][ø][å]DEF
$ cat input.txt | perl -CSD -pe 's/([\xE6\xF8\xE5])/[$1]/g'
ABC[æ][ø][å]DEF

This one is getting a little creative: with -CA, @ARGV will be interpreted as UTF-8, so you can keep your source code as ASCII and pass the UTF-8 characters via a command line argument (not necessarily the nicest solution, just showing how you could make use of the -CA option):

$ perl -CSDA -pe 'BEGIN{$p=shift;} s/($p)/[$1]/g' '[æøå]' input.txt
ABC[æ][ø][å]DEF
$ cat input.txt | perl -CSDA -pe 'BEGIN{$p=shift;} s/($p)/[$1]/g' '[æøå]'
ABC[æ][ø][å]DEF

Or, of course, you can always turn the oneliner into an actual script, where you can use:

use warnings;
use 5.012;
use utf8;
use open qw/:std :encoding(UTF-8)/;
use charnames qw/:full :short/;

Further reading: perlunitut, perlunifaq, perluniintro, perlunicode, perlunicook.

answered Apr 2 at 18:21

haukex


Appreciate the information about the alternative character syntaxes.
- forthrin
Apr 3 at 5:43

$ echo 'ABCæøåDEF' |
perl -CS -Mutf8 -pe 's/([æøå])/[$1]/g'

answered Apr 2 at 18:26

Porno Nacionais
