Recursively converting Windows files to Unix files
I have a PHP application which is located on Linux, with multiple directories (and sub-directories) and many PHP, JS, HTML, CSS, etc. files. Many of the files have Windows (CRLF) line endings, and I am also concerned that some might not be UTF-8 encoded but ISO-8859-1, Windows-1252, etc. My desire is to convert all files to UTF-8 with LF only.
It looks like this will take a couple of steps.
The dos2unix man page provides this solution:
find . -name '*.txt' | xargs dos2unix
https://stackoverflow.com/a/11929475 provides this solution:
find . -type f -print0 | xargs -0 dos2unix
https://stackoverflow.com/a/7068241 provides this solution:
find ./ -type f -exec dos2unix {} \;
I recognize the first will only convert .txt files, which isn't what I want, but I can easily change it to target all files using -type f. That being said, is one solution "better" than the others? If so, why? Is it possible to tell which files will be changed without changing them? When I finally change them, I don't want the modification date to change, and I intend to use dos2unix's --keepdate flag. Should any other options be used?
Next, I will need to deal with encoding. https://stackoverflow.com/a/805474/1032531 recommends enca (or its sister command enconv) and https://stackoverflow.com/a/64889/1032531 recommends iconv. It also seems like file might be applicable. Again, which one (or maybe something else altogether) should be used? I installed enca, and when executing enca --list languages, it lists several languages but not English (maybe choose "none"?), so I question its applicability. iconv was already installed; however, it does not have a man page (at least man iconv doesn't bring one up). How can it be used to recursively check and convert encoding?
Please confirm/correct my proposed solution or provide a complete solution.
files unicode recursive newlines
@K7AAY I thought it was pretty clear; however, I modified it and added "files" in the sentence "My desire is to convert all files to UTF-8 with LF only". The example in dos2unix's man page converts only txt files, not all files.
– user1032531
Mar 16 at 0:21
asked Mar 15 at 13:04 by user1032531 (edited Mar 16 at 0:18)
1 Answer
There are quite a few questions rolled into one here.
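Firstly, when using find I would always use -exec instead of xargs. As a general rule, it's better to do things in as few commands as possible. Also, the first two methods write all the file names out to a text stream for xargs to re-interpret back into file names. It's a needless step that only adds (admittedly small) opportunity to fail: plain xargs splits its input on whitespace and treats quotes specially, so the first method breaks on file names containing spaces, quotes or newlines. -print0 | xargs -0 avoids that, but still adds a second process for no benefit.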
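To see the splitting problem concretely, here is a small sketch, assuming a POSIX sh with find and xargs; the function names are hypothetical. Each pipeline hands file names to a stand-in command that prints one argument per line, so counting lines shows how many arguments the command really received. Nothing is modified.

```shell
# Hypothetical helpers: each prints how many arguments the command received.
# "sh -c 'printf ...'" stands in for dos2unix, so no file is changed.

# find | xargs: names are split on blanks, so "my file.txt" becomes two arguments.
count_args_xargs() {
    find "$1" -type f | xargs sh -c 'printf "%s\n" "$@"' sh | wc -l | tr -d ' '
}

# find -print0 | xargs -0: names are NUL-separated, so they survive intact.
count_args_xargs0() {
    find "$1" -type f -print0 | xargs -0 sh -c 'printf "%s\n" "$@"' sh | wc -l | tr -d ' '
}

# find -exec ... {} +: find passes the names directly, with no re-parsing at all.
count_args_exec() {
    find "$1" -type f -exec sh -c 'printf "%s\n" "$@"' sh {} + | wc -l | tr -d ' '
}
```

For a directory containing only a file called "my file.txt", the first function reports 2 and the other two report 1.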
dos2unix will accept multiple file names, so I would use:
find . -type f -exec dos2unix --keepdate {} +
This will stack up long lists of files and then kick off dos2unix on a whole batch of them at once (the third method, ending in \;, starts a separate dos2unix process for every single file).
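The difference between ending -exec with \; and with + can be checked without touching any files. This sketch (hypothetical function names; a stand-in command replaces dos2unix) counts how many times find starts the command.

```shell
# Each run of the stand-in command prints one line, so the line count
# equals the number of processes find started.

# "-exec ... {} \;": one process per file.
count_runs_semicolon() {
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' '
}

# "-exec ... {} +": file names are batched into as few processes as possible.
count_runs_plus() {
    find "$1" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' '
}
```

With three files, the first reports 3 and the second 1; on a large tree the + form saves one process start-up per file.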
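To find out which files will be touched, just drop the -exec clause:
find . -type f
Note that this lists every file that will be passed to dos2unix, not only those that actually contain CRLF line endings.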
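If you want only the files that actually contain carriage returns, one option is to search for them directly. This is a sketch, assuming GNU or BSD grep (for -r); the function name is hypothetical. LC_ALL=C matters: in a UTF-8 locale, GNU grep may classify a Latin-1 file as binary, and -I would then skip it. Newer dos2unix releases (7.x) also document an -i/--info option; if yours has it, dos2unix -ic lists only the files that would be converted.

```shell
# list_crlf_files DIR (hypothetical helper): print every non-binary file under DIR
# that contains a carriage return, i.e. the candidates for dos2unix.
list_crlf_files() {
    cr=$(printf '\r')   # a literal CR character, portable to plain sh
    # -r recurse, -l print file names only,
    # -I skip binary files (in the C locale: files containing NUL bytes)
    LC_ALL=C grep -rlI -e "$cr" -- "${1:-.}"
}
```

For example, list_crlf_files . prints the paths relative to the current directory and changes nothing.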
Encoding changes are far more problematic. Be aware that there is no way to reliably determine the current encoding of an arbitrary text file. It can sometimes be guessed, but that is never 100% reliable. So you can only batch-convert encodings if you are sure all the files currently share the same encoding.
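Although the encoding cannot be detected reliably, one thing can be checked reliably: whether a file is valid UTF-8 (plain ASCII counts, and needs no conversion). Here is a sketch, assuming glibc or libiconv iconv, which exits non-zero on an invalid input sequence; the function name is hypothetical. Binary files are skipped using grep -I in the C locale.

```shell
# list_non_utf8 DIR (hypothetical helper): print text files under DIR
# that are NOT valid UTF-8. Nothing is modified.
list_non_utf8() {
    find "${1:-.}" -type f -exec sh -c '
        for f do
            # Skip binary files (NUL bytes, as judged by grep -I in the C locale)
            # and empty files.
            LC_ALL=C grep -Iq . "$f" || continue
            # iconv exits non-zero if the input is not a valid UTF-8 byte sequence.
            iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

For the files it reports, file --mime-encoding can offer a guess at the actual encoding, but treat that only as a guess.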
I would recommend using iconv. It really is the default tool for this job. You can find a man page for it here:
https://linux.die.net/man/1/iconv
There's a working example of how to use iconv with find here:
https://stackoverflow.com/questions/4544669/batch-convert-latin-1-files-to-utf-8-using-iconv
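A recursive conversion might look like the following sketch. It assumes, per the caveat above, that every file which is not already valid UTF-8 is in one known encoding that you pass as the second argument (ISO-8859-1 and WINDOWS-1252 are common iconv names; iconv -l lists what yours supports). The function name is hypothetical. Because iconv cannot edit a file in place, the sketch writes to a temporary file, copies the result back over the original so ownership and permissions are kept, and uses touch -r to restore the original modification time, in the spirit of dos2unix's --keepdate. Try it on a copy of the tree first.

```shell
# convert_to_utf8 DIR FROM (hypothetical helper)
# Converts every text file under DIR that is not valid UTF-8, ASSUMING it is
# encoded in FROM (e.g. WINDOWS-1252). Binary files and valid UTF-8 are skipped.
convert_to_utf8() {
    find "${1:-.}" -type f -exec sh -c '
        from=$1; shift
        for f do
            # Skip binary (NUL bytes in the C locale) and empty files.
            LC_ALL=C grep -Iq . "$f" || continue
            # Skip files that are already valid UTF-8 (pure ASCII included).
            iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 && continue
            tmp=$(mktemp) || exit 1
            if iconv -f "$from" -t UTF-8 "$f" >"$tmp"; then
                touch -r "$f" "$tmp"   # remember the original modification time
                cat "$tmp" >"$f"       # rewrite in place, keeping owner and mode
                touch -r "$tmp" "$f"   # put the original modification time back
            else
                echo "skipped (not valid $from): $f" >&2
            fi
            rm -f "$tmp"
        done
    ' sh "${2:?usage: convert_to_utf8 DIR FROM}" {} +
}
```

For example, convert_to_utf8 . WINDOWS-1252 converts the current tree; any file iconv cannot decode as Windows-1252 is reported on stderr and left untouched.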
answered Mar 16 at 1:54 by Philip Couling