Does sort --unique -k drop duplicates in original order?
























I'm doing a unique sort on a concatenated set of index files where the first column will sometimes change between each index and the second column will be a key value (actually hex addresses). Each indexN file iteration records addresses that changed since the prior one -- if address 0xaa11 exists in index3, in the merged+sorted output it should replace the 0xaa11 address references from index1 and index2.



The question is, can I do this type of merge reliably with a tool like GNU sort if I merely pipe each source index to sort -u in a certain order?



For example, pipe indexes newest to oldest:



cat index3 index2 index1 | sort -u -k 2,2


When I test this, it does seem to preserve the lines from index3 containing addresses that also appear in index2 and index1, while removing those duplicate references coming from index2 and index1.



But will that always be the case? The sort man page is vague about this:



-u, --unique    output only the first of an equal run


I don't know enough about GNU sort's algorithms to predict whether lines with matching keys will always keep the order in which their source files were concatenated (i.e. the order in which they appear in the input stream). I do know that sort algorithms don't always preserve the relative order of equal items, which is why I'm looking for clarification of what sort's documentation seems to imply.










      sort merge indexing key
















      asked Nov 22 at 6:02









      tasket





















          1 Answer



          accepted










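          By default, sort does not keep lines that are equal for its purposes in their input order: when all keys compare equal, it falls back to comparing the entire lines as a last resort. To keep the input order you have to request it explicitly with the -s switch (--stable: stabilize sort by disabling last-resort comparison). A stable sort algorithm is one that does not change the relative order of equal items.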



          However, the info page informs us that -u "also disables the default last-resort comparison", so yes, you should be fine with GNU sort, but this is not at all obvious from the manpage.
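
          A quick way to check this on your own system is a small test with made-up data. This is only a sketch: the file contents and the first-column values (aaa, mmm, zzz) are invented for illustration, chosen so that a whole-line comparison would put the oldest line first. The -s flag is added only to make the intent explicit; according to the info page it should be redundant next to -u in GNU sort.

```shell
# Hypothetical sample data: column 1 is a payload, column 2 is the address key.
tmp=$(mktemp -d)
printf 'aaa 0xaa11\naaa 0xbb22\n' > "$tmp/index1"   # oldest
printf 'mmm 0xbb22\nmmm 0xcc33\n' > "$tmp/index2"
printf 'zzz 0xaa11\n'             > "$tmp/index3"   # newest

# Concatenate newest first, then keep the first line seen for each key.
# LC_ALL=C keeps the key comparison byte-wise and locale-independent.
merged=$(cat "$tmp/index3" "$tmp/index2" "$tmp/index1" | LC_ALL=C sort -s -u -k 2,2)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

          With GNU sort this should print zzz 0xaa11, mmm 0xbb22 and mmm 0xcc33, one per line: for each address, the line from the newest index that mentions it.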


























          • What this also taught me: Always check info on GNU stuff. Thanks!
            – tasket
            Nov 22 at 18:41










          answered Nov 22 at 6:35









          Ulrich Schwarz
