Does sort --unique -k drop duplicates in original order?
I'm doing a unique sort on a concatenated set of index files where the first column will sometimes change between each index and the second column will be a key value (actually hex addresses). Each indexN file iteration records addresses that changed since the prior one -- if address 0xaa11 exists in index3, in the merged+sorted output it should replace the 0xaa11 address references from index1 and index2.
The question is, can I do this type of merge reliably with a tool like GNU sort if I merely pipe each source index to sort -u in a certain order?
For example, pipe indexes newest to oldest:
cat index3 index2 index1 | sort -u -k 2,2
When I test this, it does seem to preserve the lines from index3 containing addresses that also appear in index2 and index1, while removing those duplicate references coming from index2 and index1.
But will that always be the case? The sort man page is vague about this:
-u, --unique: output only the first of an equal run
I don't know enough about GNU sort's algorithms to predict whether lines with matching keys will always sort into the same order in which their source files were concatenated (i.e. the order in which they appear in the source stream). But I do know that sort algorithms don't always preserve the input order of items that compare equal. That's why I'm looking for clarification of what sort's documentation seems to imply.
sort merge indexing key
asked Nov 22 at 6:02
tasket
1 Answer
sort does not guarantee the order of lines that compare equal for its purposes unless you explicitly request this with the -s switch (--stable: stabilize sort by disabling last-resort comparison). A stable sort algorithm is one that does not change the relative order of equal items.
However, the info page informs us that -u "also disables the default last-resort comparison", so yes, you should be fine, but this is not at all obvious from the man page.
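As a rough way to check this on your own system, here is a minimal sketch. The three index files and their contents are made up for illustration (column 1 is a payload, column 2 is the address key), and the awk pipeline is a different technique used only as a cross-check, since it does not depend on how sort treats equal keys.

```shell
# Hypothetical sample data: column 1 is a payload, column 2 is the hex address key.
tmp=$(mktemp -d)
printf '%s\n' 'old1 0xaa11' 'old1 0xbb22' > "$tmp/index1"
printf '%s\n' 'mid2 0xaa11' 'mid2 0xcc33' > "$tmp/index2"
printf '%s\n' 'new3 0xaa11' 'new3 0xbb22' > "$tmp/index3"

# Newest first. -s should be redundant next to -u on GNU sort,
# but spelling it out makes the intent explicit.
merged=$(cat "$tmp/index3" "$tmp/index2" "$tmp/index1" | LC_ALL=C sort -s -u -k 2,2)

# Cross-check: awk keeps the first line seen for each key,
# then sort only has to order lines whose keys are already unique.
check=$(cat "$tmp/index3" "$tmp/index2" "$tmp/index1" | awk '!seen[$2]++' | LC_ALL=C sort -k 2,2)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

If both variables hold the same lines, the newest entry for each address survived in both pipelines. A passing run on sample data is only evidence, not proof, so the documented behaviour quoted above is still what you are relying on.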
answered Nov 22 at 6:35
Ulrich Schwarz
What this also taught me: always check info on GNU stuff. Thanks!
– tasket
Nov 22 at 18:41