Is sort -k1,2 equivalent to sort -k1,1 -k2,2?

I'm experimenting with GNU sort and LC_COLLATE="en_US.UTF-8". I have a file called 'test':

1,0 1
10 2
1,0 3
10 4

With sort -k1,2, as well as with a plain sort test, the order doesn't change:

$ sort -k1,2 test
1,0 1
10 2
1,0 3
10 4

So sort seems to think that '1,0' is equal to '10', probably due to some quirk of LC_COLLATE (skipping punctuation?).

Now, when I use sort -k1,1 -k2,2, it gives me a different order:

$ sort -k1,1 -k2,2 test
10 2
10 4
1,0 1
1,0 3

and suddenly sort doesn't think that '10' is the same as '1,0' anymore.

What happened? Why isn't sort -k1,1 -k2,2 equivalent to sort -k1,2 in this case? Should it really be equivalent, or have I misinterpreted the man page? (I tried coreutils versions 8.22 and 8.29; both show this behavior.)

edited Jul 19 at 16:23

asked Jul 19 at 16:11

lutyj

1 Answer

accepted

-k1,2 means “sort all lines, comparing the contents of fields 1 through 2 together, as a single key”; so “1,0 1” is compared with “10 2”, etc.

-k1,1 -k2,2 means “sort all lines, comparing the contents of field 1, and when two lines have the same content in field 1, comparing the contents of field 2”; so “1,0” is compared with “10”, then “2” with “4”, etc.
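One way to see which part of each line each key covers is GNU sort's --debug option, which annotates the part of the line used for each key and warns about questionable usage on standard error. A minimal sketch, assuming GNU coreutils sort (--debug is a GNU extension, not POSIX) and recreating the question's 'test' file in a scratch directory:

```shell
#!/bin/sh
# Work in a scratch directory and recreate the question's 'test' file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,0 1' '10 2' '1,0 3' '10 4' > test

# One key spanning fields 1 through 2: "1,0 1" is compared with "10 2" as a whole.
sort --debug -k1,2 test

# Two keys: field 1 ("1,0" vs "10") is compared first;
# field 2 is consulted only when field 1 compares equal.
sort --debug -k1,1 -k2,2 test
```

The annotated order depends on the active locale, so no expected output is given here; the underlines are what make the different key extents visible.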

What happens then, in both cases, boils down to collation, in particular weighting. Digits typically carry more weight than punctuation and spacing. When comparing “1,0 1” and “10 2”, the difference due to the comma is ignored because the digits already differ (101 versus 102 once the comma and space are set aside). When comparing “1,0” and “10”, the only difference is the comma, so it’s no longer ignored. See ISO 14651 for details.

You can set LC_COLLATE=C to get collation based only on character values, with no weights. Your examples both result in

1,0 1
1,0 3
10 2
10 4

when the “C” locale is used.
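A minimal sketch to check this, assuming a POSIX shell and a sort such as GNU sort. It sets LC_ALL=C rather than LC_COLLATE=C: LC_ALL overrides LC_COLLATE and LANG, so the override still takes effect when the surrounding environment (a remote cluster node, for instance) already sets LC_ALL.

```shell
#!/bin/sh
# Work in a scratch directory and recreate the question's 'test' file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,0 1' '10 2' '1,0 3' '10 4' > test

# In the C locale bytes are compared by value: ',' (0x2C) sorts before '0' (0x30),
# so "1,0" always sorts before "10".
LC_ALL=C sort -k1,2 test > one-key
LC_ALL=C sort -k1,1 -k2,2 test > two-keys

cat one-key
# Both key specifications give the same order here.
if cmp -s one-key two-keys; then
    echo "same order"
fi
```

Run with sh, it should print 1,0 1, 1,0 3, 10 2 and 10 4, one per line, followed by "same order".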

edited Jul 19 at 17:00

answered Jul 19 at 16:27

Stephen Kitt

Thank you for the explanation. Those are some elaborate collation rules! Good to know. I actually use LC_COLLATE=C locally, but when running jobs on a Hadoop cluster, I don't have the same control over remote environments. But that's for a different question :)
– lutyj
Jul 19 at 16:52
