How can I correctly decompress a ZIP archive of files with Hebrew names?

Someone sent me a ZIP file containing files with Hebrew names (created on Windows, I'm not sure with which tool). I use LXDE on Debian Stretch. The Gnome archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I'm getting UTF-8 octets extended into Unicode characters, e.g. I have a file whose name has four characters and a .doc suffix, and the characters are: 0x008E 0x0087 0x008E 0x0085. Using the command-line unzip utility is even worse: it refuses to decompress altogether, complaining about an "Invalid or incomplete multibyte or wide character".

So, my questions are:

Is there another decompression utility that will decompress my files with the correct names?

Is there something wrong with the way the file was compressed, or is it just an incompatibility of ZIP implementations? Or even misfeature/bug of the Linux ZIP utilities?

What can I do to get the correct filenames after having decompressed using the garbled ones?

edited May 15 at 19:28

Braiam

22.6k1971132

asked Dec 28 '15 at 17:47

einpoklum

1,95441846

If you look up those bytes in the cp862 table does the file name match what you expect? Otherwise, do you know the native encoding of the source machine?
– Michael Homer
Dec 28 '15 at 20:11

Ditto for cp1255, and any other plausible encodings; it may be possible to just work it out based on what looks right.
– Michael Homer
Dec 28 '15 at 20:22

@MichaelHomer: No, it doesn't look like it matches. The native encoding of the source machine is whatever MS Windows uses when you set the regional settings to Hebrew-Israel, so I guess it's sometimes UTF-8 and sometimes CP1255.
– einpoklum
Dec 29 '15 at 21:41

character-encoding zip unicode file-format

4 Answers
accepted

It sounds like the filenames are encoded in one of the legacy DOS/Windows code pages (CP862, CP1255, etc.).

Is there another decompression utility that will decompress my files with the correct names?

I'm not aware of a zip utility that supports these code pages natively. 7z has some understanding of encodings, but I believe it has to be an encoding your system knows about more generally (you pick it by setting the LANG environment variable), and Windows code pages likely aren't among those.

unzip -UU should work from the command line to create files with the correct bytes in their names (by disabling all Unicode support). That is probably the effect you got from GNOME's tool already. The encoding won't be right either way, but we can fix that below.

Is there something wrong with the way the file was compressed, or is it just an incompatibility of ZIP implementations? Or even misfeature/bug of the Linux ZIP utilities?

The file you've been given was not created portably. That's not necessarily wrong for an internal use where the encoding is fixed and known in advance, although the format specification says that names are supposed to be either UTF-8 or cp437, and yours are neither. Even between Windows machines, using different code pages doesn't work out well, and non-Windows machines have no concept of those code pages to begin with. Most tools UTF-8 encode their filenames (which still isn't always enough to avoid problems).

What can I do to get the correct filenames after having decompressed using the garbled ones?

If you can identify the encoding of the filenames, you can convert the bytes in the existing names into UTF-8 and move the existing files to the right name. The convmv tool essentially wraps up that process into a single command: convmv -f cp862 -t utf8 -r . will try to convert everything inside . from cp862 to UTF-8. By default convmv only prints what it would rename; add --notest to actually perform the renames.

Alternatively, you can use iconv and find to move everything to their correct names. Something like:
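```
# -depth handles a directory's contents before the directory itself;
# only the last path component is converted, so parent paths stay valid.
find . -mindepth 1 -depth -exec sh -c 'mv "$1" "$(dirname "$1")/$(basename "$1" | iconv -f cp862 -t utf8)"' sh {} \;
```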
will find all the files underneath the current directory and try to convert the names into UTF-8.
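
The one-liner above can be expanded into a small function that is a bit more careful. This is only a sketch under assumptions: fix_names is a made-up name, the source encoding is a guess that you supply, and it relies on a find that supports -mindepth and -depth and an iconv that knows the code page. It converts only the last path component, handles children before their parent directory, and skips names that are unchanged or that iconv rejects.

```shell
#!/bin/sh
# Hypothetical helper: rename every entry below a directory from a guessed
# legacy encoding to UTF-8. Usage: fix_names DIR ENCODING
fix_names() {
    dir=$1 enc=$2
    # -depth visits children before their parent directory, so the parent
    # path is still valid at the moment each child is renamed.
    find "$dir" -mindepth 1 -depth -exec sh -c '
        enc=$1; shift
        for path do
            base=${path##*/}
            # Skip names whose bytes are not valid in the guessed encoding.
            new=$(printf "%s" "$base" | iconv -f "$enc" -t UTF-8) || continue
            # Skip names that come out unchanged (for example pure ASCII).
            [ "$new" = "$base" ] && continue
            mv -- "$path" "${path%/*}/$new"
        done' sh "$enc" {} +
}
```

For example, fix_names . CP862 would process the current directory. It is probably safest to try this on a copy first, since mv will overwrite a target that already exists.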

In either case, you can experiment with different encodings and try to find one that makes sense.
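
One way to experiment without renaming anything is to print how a single raw name decodes under each candidate encoding and pick the one that reads as real words. The sketch below assumes an iconv that knows these code pages; preview_name and the candidate list are illustrative choices, not something any of the tools above provide.

```shell
#!/bin/sh
# Hypothetical helper: show how one raw filename decodes under several
# candidate encodings. Nothing on disk is touched.
preview_name() {
    for enc in CP862 CP1255 ISO-8859-8; do
        # iconv exits non-zero when the bytes are not valid in $enc.
        if decoded=$(printf '%s' "$1" | iconv -f "$enc" -t UTF-8 2>/dev/null); then
            printf '%-12s %s\n' "$enc" "$decoded"
        else
            printf '%-12s (not valid in this encoding)\n' "$enc"
        fi
    done
}
```

To feed it the four bytes from the question, the name can be built with octal escapes: preview_name "$(printf '\216\207\216\205.doc')".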

Once you've fixed the names on your side, sending these files back in the other direction may cause the same problem on the other end. In that case, you can reverse the process before zipping the files up with -UU, since it's likely to be very hard to fix on the Windows end.

answered Dec 28 '15 at 20:52

Michael Homer

42.9k6108148

I guess this will have to do since the ZIP file I was looking into is now gone for, well, reasons irrelevant here. Thanks, will do this next time and hope for the best.
– einpoklum
Dec 29 '15 at 21:49

rar or p7zip refuse to handle .zip archives. Is there a way to extract an archive with filenames in proprietary encodings, on Linux? When I extract with unzip, I get an error: "error: cannot create [garbled path]/Ship_[garbled name]!.png File name too long"
– Nickolai Leschov
Jan 13 '17 at 19:01

I managed to extract .zip file correctly with LANG=ru_RU.CP1251; unzip Bleed.zip (it was Cyrillic encoding in my case). Now I wonder how do I set up my system so that I can correctly open such .zip files in GUI by default?
– Nickolai Leschov
Jan 13 '17 at 19:18

@NickolaiLeschov Ask a question and someone may be able to help you. You'll probably need to provide more information about your system.
– Michael Homer
Jan 13 '17 at 22:09

unzip -UU foo.zip worked for Turkish characters
– Mert S. Kaplan
Nov 14 '17 at 20:04

I have just had the same problem, and it turns out that my version of unzip that is available from Ubuntu repositories (UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.) can handle automatic decoding of filenames if you specify the -a switch.

unzip -xa stupid.zip

answered Mar 4 at 15:27

Igor Zinov'yev

1113

+1 although I have nothing to test this with right now.
– einpoklum
Mar 4 at 15:32

According to the man page of unzip the -a switch takes care of converting text files. Not file names.
– beruic
May 7 at 14:04

I had success with the command 7z x <source.zip>.

Version:

p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,[...])

Potentially relevant environment:

LANG=en_US.UTF-8
LC_ALL=en_US.UTF-8
LC_CTYPE=UTF-8

It was able to decompress all files with 8-bit characters in their filenames, with some of these characters skipped, some garbled.

answered Mar 11 at 13:03

vszakats

112

I'll try this next time, thank you.
– einpoklum
Mar 11 at 13:39

I have a zip archive that was compressed on Linux (from the command line), and filenames with diacritics are not decompressed correctly on Windows, but I successfully unpacked it with Bandizip, which lets you set the charset from the toolbar.

edited Feb 14 at 1:11

iruvar

11.6k62959

answered Feb 14 at 0:41

Miro Junker
