Recursively converting Windows files to Unix files

I have a PHP application which is located on Linux, with multiple directories (and sub-directories) and many PHP, JS, HTML, CSS, etc. files. Many of the files have Windows EOL control characters, and I am also concerned that some might not be UTF-8 encoded but maybe ISO-8859-1, Windows-1252, etc. My desire is to convert all files to UTF-8 with LF only.



Looks like I might have a couple steps.



The dos2unix man page provides this solution:



find . -name *.txt |xargs dos2unix


https://stackoverflow.com/a/11929475 provides this solution:



find . -type f -print0 | xargs -0 dos2unix


https://stackoverflow.com/a/7068241 provides this solution:



find ./ -type f -exec dos2unix {} \;


I recognize the first will only convert txt files which isn't what I want but I can easily change to target all files using -type f. That being said, is one solution "better" than the other? If so, why? Is it possible to tell which files will be changed without changing them? When I finally change them, I don't want the date to change, and intend to use dos2unix's --keepdate flag. Should any other options be used?



Next, I will need to deal with encoding. https://stackoverflow.com/a/805474/1032531 recommends enca (or its sister command enconv) and https://stackoverflow.com/a/64889/1032531 recommends iconv. It also seems like file might be applicable. Again, which one (or maybe something else altogether) should be used? I installed enca, and when executing enca --list languages, it lists several languages but not English (maybe choose "none"?), so I question its applicability. iconv was already installed; however, it does not have a man page (at least man iconv doesn't result in one). How can this be used to recursively check and convert encoding?



Please confirm/correct my proposed solution or provide a complete solution.










  • @K7AAY I thought it was pretty clear; however, I have added "files" to the sentence "My desire is to convert all files to UTF-8 with LF only". The example in the dos2unix man page converts only txt files and not all files.

    – user1032531
    Mar 16 at 0:21

















files unicode recursive newlines






edited Mar 16 at 0:18 by user1032531

asked Mar 15 at 13:04 by user1032531






















1 Answer
There's quite a few questions here rolled into one.



Firstly, when using find I would always use -exec instead of xargs. As a general rule, it's better to do things in as few commands as possible. But also, the first two methods write all the file names out to a text stream ready for xargs to re-interpret back into file names. It's a needless step which only adds an (admittedly small) opportunity to fail.



dos2unix will accept multiple file names so I would use:



find . -type f -exec dos2unix --keepdate {} +


This will stack up long lists of files and then kick off dos2unix on a whole bunch of them at once.
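
To make the batching concrete, here is a minimal sketch that counts how many times find starts a command with each form of -exec. The function names are made up for illustration, and `sh -c 'echo run'` stands in for dos2unix so that nothing is modified.

```shell
# Each started command prints one line, so the line count is the number of runs.

# "{} +" form: find appends as many file names as fit onto one command line.
count_runs_batched() {
    find "$1" -type f -exec sh -c 'echo run' sh {} + | wc -l
}

# "{} \;" form: find starts the command once per file.
count_runs_per_file() {
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | wc -l
}
```

For a directory holding three files, count_runs_batched reports 1 and count_runs_per_file reports 3. On a large tree the batched form may still need more than one run, because the operating system limits the length of a single command line.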




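To find out which files will be touched, just drop the -exec clause (this lists every file that would be handed to dos2unix, whether or not it contains Windows line endings):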



find . -type f
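
If the goal is to see only the files that actually contain Windows line endings, one possible sketch is to let grep look for a carriage return at the end of a line. This is not part of dos2unix; the function name is hypothetical, and the -I option (skip binary files) is assumed to be available, as it is in GNU and BSD grep. Newer dos2unix releases also document an -i/--info option that reports line-break counts, so it is worth checking the man page of your version.

```shell
# List regular files under a directory (default: .) that contain at least
# one line ending in CR LF. Nothing is modified.
list_crlf_files() (
    cr=$(printf '\r')                 # a literal carriage return
    # -I: skip files grep considers binary; -l: print only matching file names.
    # The pattern is "carriage return followed by end of line".
    find "${1:-.}" -type f -exec grep -Il "${cr}\$" {} +
)
```

Running `list_crlf_files .` before and after the conversion gives a quick check that the second run prints nothing.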



Encoding changes are far more problematic. Please be aware that there is no way to reliably determine the current encoding of any text file. It can sometimes be guessed but that is never 100% reliable. So you can only batch process encoding if you are sure all the files are currently the same encoding.
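
One thing that can be checked mechanically is the opposite question: whether a file is valid UTF-8 at all. A file that fails this check is certainly not UTF-8, although the check cannot say what it is instead. The sketch below assumes an iconv that exits non-zero on invalid input (glibc's does); the function name is hypothetical. Binary files such as images will usually be reported too, so the find expression may need narrowing with -name.

```shell
# Print regular files under a directory (default: .) that are NOT valid UTF-8.
# Pure ASCII files pass, since ASCII is a subset of UTF-8.
list_non_utf8_files() {
    find "${1:-.}" -type f -exec sh -c '
        for f do
            # Converting UTF-8 to UTF-8 fails on the first invalid byte sequence.
            iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

The files this prints are the only candidates for conversion; deciding their real source encoding still has to be done by inspection or prior knowledge.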



I would recommend using iconv. It really is the default tool for this job. You can find a man page for it here:



https://linux.die.net/man/1/iconv



There's a working example of how to use iconv with find here:



https://stackoverflow.com/questions/4544669/batch-convert-latin-1-files-to-utf-8-using-iconv
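
As a rough sketch of what such a conversion could look like for a single file, the function below converts from a source encoding that you supply and restores the original modification time afterwards. The name to_utf8 is made up for illustration, and the source encoding is an assumption you must get right yourself: iconv will happily convert a file from the wrong encoding and produce garbage. Running it on a file that is already UTF-8 will double-encode it.

```shell
# Convert FILE from encoding FROM to UTF-8 in place, keeping its mtime.
# Usage: to_utf8 FROM FILE      e.g.  to_utf8 ISO-8859-1 index.php
to_utf8() (
    from=$1
    file=$2
    tmp=$(mktemp) || exit 1
    ref=$(mktemp) || { rm -f "$tmp"; exit 1; }
    touch -r "$file" "$ref"               # remember the original timestamps
    status=1
    # Convert into a temporary file first so a failure leaves FILE untouched.
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        # Overwrite the contents (keeps owner and permissions), then
        # put the original timestamps back.
        cat "$tmp" > "$file" && touch -r "$ref" "$file" && status=0
    fi
    rm -f "$tmp" "$ref"
    exit "$status"
)
```

Working on a copy of the tree, or under version control, is a sensible precaution before running this over many files.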






answered Mar 16 at 1:54 by Philip Couling


























