file command gives incorrect encoding type

I have a CSV file. When I run 'file -i filename', it reports the encoding as us-ascii. But when I run 'cat filename | csvcut -t -e us-ascii', I get this error:



"Your file is not "us-ascii" encoded. Please specify the correct encoding with the -e flag or with the PYTHONIOENCODING environment variable"



csvkit documentation can be found here.



I also found that the file contains bytes such as 0xd1, which cause problems. How do I find the correct encoding of this file? Ideally I would like to convert it to UTF-8. What should I do?
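Because 'file' may base its verdict on only part of the input, checking every byte is a more reliable way to tell whether the file really is ASCII. The following is a minimal Python sketch using only the standard library. The function names non_ascii_bytes and report_non_ascii are hypothetical. It reads the whole file in binary mode and lists every byte above 0x7F together with its offset.

```python
def non_ascii_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte outside 7-bit ASCII (0x00-0x7F)."""
    # Iterating over a bytes object yields ints, so the comparison is numeric.
    return [(offset, byte) for offset, byte in enumerate(data) if byte > 0x7F]


def report_non_ascii(path: str, limit: int = 20) -> None:
    """Read the whole file in binary mode and print the first `limit` non-ASCII bytes."""
    with open(path, "rb") as fh:  # binary mode: nothing is decoded yet
        hits = non_ascii_bytes(fh.read())
    if not hits:
        print(f"{path}: every byte is ASCII")
        return
    print(f"{path}: {len(hits)} non-ASCII byte(s)")
    for offset, byte in hits[:limit]:
        print(f"  offset {offset}: 0x{byte:02x}")
```

For example, report_non_ascii("filename") would print the position of each 0xd1 byte. ASCII only defines 0x00 to 0x7F, so any hit means the us-ascii label cannot be right for the file as a whole.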










  • For example, include part of your text file in your question.
    – Ipor Sircer
    Sep 10 at 19:04










  • EBCDIC ? ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/…
    – steve
    Sep 10 at 19:22










  • Hi @IporSircer, to prepare the example, I extracted a small part of the file that includes the row containing the troublesome 0xd1 byte. When I re-ran file -i on that extract, it reported the encoding as iso-8859-1, which seems to be the correct encoding for the original file too. So I guess file -i only looks at a portion of a file before drawing a conclusion?
    – user3768495
    Sep 10 at 19:32











  • Bottom line? Determining the correct encoding of a file is actually very, very difficult, unfortunately. If Python is an option, you can try chardet, which gives great results in my opinion. FYI, chardet also reports a confidence value alongside its guess (which is pretty smart!). pypi.org/project/chardet
    – pi0tr
    Sep 10 at 20:45
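If ISO-8859-1 is indeed the right encoding, as the comments suggest, the conversion to UTF-8 can be sketched in Python with the standard library alone. The function names to_utf8 and convert_file are hypothetical. The sketch tries strict UTF-8 first, so a file that is already UTF-8 passes through unchanged, and only then falls back to ISO-8859-1. ISO-8859-1 assigns a character to every byte value, so that fallback never fails. A clean decode therefore does not prove the guess is correct, which is why a detector such as chardet, or a look at the decoded text, is still worthwhile.

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> tuple[str, bytes]:
    """Return (encoding used to decode, UTF-8 bytes).

    Strict UTF-8 is tried first; if it fails, `fallback` is assumed.
    ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so that
    fallback always succeeds: 0xd1 becomes U+00D1 (Ñ).
    """
    try:
        text = data.decode("utf-8")
        used = "utf-8"
    except UnicodeDecodeError:
        text = data.decode(fallback)
        used = fallback
    return used, text.encode("utf-8")


def convert_file(src: str, dst: str) -> str:
    """Write a UTF-8 copy of src to dst and return the source encoding that was assumed."""
    with open(src, "rb") as fh:
        used, utf8 = to_utf8(fh.read())
    with open(dst, "wb") as fh:
        fh.write(utf8)
    return used
```

If the file came from Windows, passing fallback="cp1252" may be a better choice. It differs from ISO-8859-1 only in bytes 0x80 to 0x9F, and unlike ISO-8859-1 it can fail on a few undefined bytes. On the command line, iconv -f ISO-8859-1 -t UTF-8 performs the same conversion without the UTF-8 check.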
















csv character-encoding unicode ascii






edited Sep 15 at 15:59 by Rui F Ribeiro
asked Sep 10 at 19:02 by user3768495










