file command gives incorrect encoding type


I have a CSV file. When I ran 'file -i filename', it reported the encoding as us-ascii. But when I ran 'cat filename | csvcut -t -e us-ascii', I got an error:

"Your file is not "us-ascii" encoded. Please specify the correct encoding with the -e flag or with the PYTHONIOENCODING environment variable"

csvkit documentation can be found here.
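csvkit is a Python tool (the error text mentions PYTHONIOENCODING), so the -e value is presumably used as a Python codec name. Here is a minimal Python sketch of why a strict us-ascii decode rejects a file containing 0xd1: ASCII only defines bytes 0x00 to 0x7f. The sample bytes and the try_decode helper are made up for illustration.

```python
# Hypothetical sample: "ESPAÑA" stored in ISO-8859-1, where Ñ is the single byte 0xd1.
sample = b"id,country\n1,ESPA\xd1A\n"

def try_decode(data, encoding):
    """Return (text, None) on success, or (None, error) if the strict decode fails."""
    try:
        return data.decode(encoding), None  # errors="strict" is the default
    except UnicodeDecodeError as err:
        return None, err

text, err = try_decode(sample, "us-ascii")
if err is not None:
    # err.start is the offset of the first byte the codec could not map.
    print(f"us-ascii failed at offset {err.start}: byte 0x{sample[err.start]:02x}")

# The same bytes decode cleanly as ISO-8859-1, where 0xd1 is U+00D1 (Ñ).
text, err = try_decode(sample, "iso-8859-1")
print(repr(text))
```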

I also found that the file contains bytes such as 0xd1, which cause problems. So how do I find the correct encoding of this file? Ideally I would like to convert it to UTF-8. What should I do?
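Once you have a candidate source encoding, converting to UTF-8 is just decode-then-re-encode. A minimal Python sketch, assuming you already know (or have guessed) the source encoding; nothing here detects it, and the convert_to_utf8 helper and file names are hypothetical. On the command line, iconv -f ISO-8859-1 -t UTF-8 does the same job.

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding):
    """Re-encode src (assumed to be in source_encoding) as UTF-8 into dst.

    Bytes are read and written directly so line endings are left untouched.
    errors="strict" makes a wrong guess fail loudly instead of silently
    producing mojibake (but ISO-8859-1 can decode any byte, so it never fails).
    """
    text = Path(src).read_bytes().decode(source_encoding, errors="strict")
    Path(dst).write_bytes(text.encode("utf-8"))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.csv"   # hypothetical file names
        dst = Path(tmp) / "output.csv"
        src.write_bytes(b"id,country\n1,ESPA\xd1A\n")  # 0xd1 = Ñ in ISO-8859-1
        convert_to_utf8(src, dst, "iso-8859-1")
        print(dst.read_bytes())  # Ñ becomes the two-byte UTF-8 sequence c3 91
```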

Tags: csv character-encoding unicode ascii

edited Sep 15 at 15:59 by Rui F Ribeiro

asked Sep 10 at 19:02 by user3768495

For example, include a part of your txt file in your question.
– Ipor Sircer
Sep 10 at 19:04

EBCDIC? ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/…
– steve
Sep 10 at 19:22

Hi @IporSircer, to prepare the example file, I took a small part of the file and made sure it contained the row with the 0xd1 byte that caused trouble. Then I re-ran 'file -i', and this time it reported the encoding as iso-8859-1, which seems to be the correct encoding for the original file. So I guess 'file -i' only looks at a portion of a file before drawing a conclusion?
– user3768495
Sep 10 at 19:32
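That guess matches how file(1) generally works: it reads only a limited leading buffer of its input (the exact size depends on the version), so a file whose first non-ASCII byte lies beyond that point can be reported as us-ascii. A Python sketch of the effect, using made-up data and an arbitrary 4096-byte sample size as a stand-in for the real read limit; it also lists the line numbers of offending bytes, which helps when cutting out a sample to post.

```python
from typing import Iterator, List, Optional, Tuple

def first_non_ascii(data: bytes) -> Optional[int]:
    """Return the offset of the first byte above 0x7f, or None if data is pure ASCII."""
    for offset, value in enumerate(data):
        if value > 0x7F:
            return offset
    return None

def non_ascii_lines(data: bytes) -> Iterator[Tuple[int, List[int]]]:
    """Yield (line_number, sorted distinct non-ASCII byte values) for each offending line."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        bad = sorted({value for value in line if value > 0x7F})
        if bad:
            yield lineno, bad

# Hypothetical file: 1000 plain ASCII rows, then one row containing 0xd1.
data = b"".join(b"%d,plain\n" % i for i in range(1000)) + b"1000,ESPA\xd1A\n"

SAMPLE_SIZE = 4096  # arbitrary stand-in for a sniffer's read limit, not file(1)'s real value
print("first 4096 bytes ASCII:", first_non_ascii(data[:SAMPLE_SIZE]) is None)
print("whole file ASCII:      ", first_non_ascii(data) is None)
for lineno, bad in non_ascii_lines(data):
    print(f"line {lineno}:", " ".join(f"0x{value:02x}" for value in bad))
```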

Bottom line? Determining the correct encoding of a file is very, very difficult, unfortunately. If Python is an option, you can try chardet, which gives great results in my opinion. FYI, chardet even reports a confidence score with its guess (which is pretty smart!). pypi.org/project/chardet
– pi0tr
Sep 10 at 20:45
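One reason it is so difficult: a strict decode can only rule encodings out, never confirm one. ISO-8859-1 assigns a character to every byte from 0x00 to 0xff, so it "succeeds" on any input. The sketch below is a simple elimination filter using only the Python standard library; it is not chardet, which goes further by scoring how statistically plausible the decoded text is. The surviving_encodings helper and the candidate lists are assumptions for illustration.

```python
from typing import List

def surviving_encodings(data: bytes, candidates: List[str]) -> List[str]:
    """Return the candidates that decode data without error (a filter, not a detector)."""
    survivors = []
    for enc in candidates:
        try:
            data.decode(enc)  # strict by default
        except UnicodeDecodeError:
            continue
        survivors.append(enc)
    return survivors

sample = b"id,country\n1,ESPA\xd1A\n"
print(surviving_encodings(sample, ["us-ascii", "utf-8", "cp1252", "iso-8859-1"]))

# Every possible byte value survives ISO-8859-1, so its survival proves nothing.
print(surviving_encodings(bytes(range(256)), ["utf-8", "cp1252", "iso-8859-1"]))
```

If chardet is installed, chardet.detect(data) returns a dict whose 'encoding' and 'confidence' keys hold its best guess and how sure it is.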

