Is sort -k1,2 equivalent to sort -k1,1 -k2,2?

I'm experimenting with GNU sort and LC_COLLATE="en_US.UTF-8". I have a file called 'test':



1,0 1
10 2
1,0 3
10 4


With sort -k1,2 test, as well as with a plain sort test, the order doesn't change:



$ sort -k1,2 test
1,0 1
10 2
1,0 3
10 4


So sort apparently considers '1,0' equal to '10', probably due to some quirk of LC_COLLATE (skipping punctuation?).



Now, when I use sort -k1,1 -k2,2, it gives me a different order:



$ sort -k1,1 -k2,2 test
10 2
10 4
1,0 1
1,0 3


and suddenly sort doesn't think that '10' is the same as '1,0' anymore.



What happened? Why isn't sort -k1,1 -k2,2 equivalent to sort -k1,2 in this case? Should it really be equivalent, or have I misread the man page? (I tried coreutils versions 8.22 and 8.29; both behave this way.)







edited Jul 19 at 16:23
asked Jul 19 at 16:11 by lutyj




















          1 Answer (accepted)










          -k1,2 means “sort all lines using a single key that runs from the start of field 1 to the end of field 2”; so the whole of “1,0 1” is compared with “10 2”, and so on.



          -k1,1 -k2,2 means “sort all lines by comparing the contents of field 1, and when two lines' first fields compare equal, by comparing the contents of field 2”; so “1,0” is compared with “10”, and only lines whose first fields tie, such as “10 2” and “10 4”, go on to have their second fields (“2” and “4”) compared.
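To see which part of each line the two invocations actually compare, GNU sort's --debug option can help. This sketch assumes GNU coreutils sort (--debug is not part of POSIX sort); it recreates the sample file from the question and asks sort to underline the key(s) it uses for every line. The exact annotation layout may differ between coreutils versions.

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '1,0 1\n10 2\n1,0 3\n10 4\n' > test

# --debug (GNU coreutils only) prints each input line followed by underscores
# beneath the portion used by each key, plus one for the whole-line
# last-resort comparison. It also reports on stderr which locale's
# sorting rules are in effect.
echo '== one key spanning fields 1-2 =='
sort --debug -k1,2 test

echo '== two keys: field 1, then field 2 =='
sort --debug -k1,1 -k2,2 test
```

With -k1,2 a single underline covers the whole of “1,0 1”, while with -k1,1 -k2,2 field 1 and field 2 get separate underlines.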



          In both cases, what happens next comes down to collation, and in particular to weighting. Digits typically have a higher weight than punctuation and spacing: roughly speaking, punctuation and spaces are ignored at the first comparison level and only considered when everything else ties. When comparing “1,0 1” and “10 2”, the digits already differ, so the comma is ignored. When comparing “1,0” and “10”, the digits are identical and the only difference left is the comma, so it is no longer ignored. See ISO 14651 for details.
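A crude way to picture that first comparison level is to throw away punctuation and blanks and look at what remains. This is only an illustrative approximation (an assumption, not how glibc implements ISO 14651, which has further levels and per-character weights), and the helper name level1 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate a first-level collation key by deleting
# punctuation and blank characters. Real locale collation is more complex.
level1() {
    printf '%s' "$1" | tr -d '[:punct:][:blank:]'
}

# -k1,2 compares whole lines: the digits alone already differ (101 vs 102),
# so the comma never has to be consulted.
echo "$(level1 '1,0 1') vs $(level1 '10 2')"

# -k1,1 compares field 1 alone: the first level ties (10 vs 10),
# so a later level, where the comma does count, must break the tie.
echo "$(level1 '1,0') vs $(level1 '10')"
```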



          You can set LC_COLLATE=C to get collation based only on byte values, with no weights (provided LC_ALL is not set, since LC_ALL overrides LC_COLLATE). Both of your examples result in



          1,0 1
          1,0 3
          10 2
          10 4


          when the “C” locale is used.
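Setting the variable in the environment of a single command is a convenient way to get this behaviour without depending on how the surrounding environment is configured. The sketch below uses LC_ALL rather than LC_COLLATE because LC_ALL takes precedence over LANG and every other LC_* variable, so it still applies if those are set differently; the trade-off is that character classification and messages also switch to the C locale for that command.

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '1,0 1\n10 2\n1,0 3\n10 4\n' > test

# Prefixing the command sets LC_ALL=C for this invocation only; LC_ALL
# overrides LC_COLLATE and LANG, so comparison is by plain byte value.
LC_ALL=C sort -k1,2 test
echo ---
LC_ALL=C sort -k1,1 -k2,2 test
```

Both commands print the four lines in the order shown above, because ',' (0x2C) sorts before '0' (0x30) in byte order.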






          edited Jul 19 at 17:00
          answered Jul 19 at 16:27 by Stephen Kitt











          • Thank you for the explanation. Those are some elaborate collation rules! Good to know. I actually use LC_COLLATE=C locally, but when running jobs on a Hadoop cluster, I don't have the same control over remote environments. But that's for a different question :)
            – lutyj
            Jul 19 at 16:52
















