How to sort a file by duration column?












up vote
2
down vote

favorite
1












How can I sort a file containing the durations below? (s = second, m = minute, h = hour, d = day)



1s
2s
1h
2h
1m
2m
2s
1d
1m
























  • 3




    Are there always only whole numbers, nothing like 1h30m40s or 1.30h?
    – jimmij
    Oct 15 '17 at 10:02






















asked Oct 15 '17 at 9:22 by mert inan; edited Oct 15 '17 at 18:58 by GAD3R

















5 Answers

















up vote
5
down vote



accepted










awk '{ unitvalue=$1 }
/s/ { m=1 }; /m/ { m=60 }; /h/ { m=3600 }; /d/ { m=86400 };
{ sub("[smhd]","",unitvalue); unitvalue=unitvalue*m;
print unitvalue " " $1; }' input |
sort -n | awk '{ print $2 }'
1s
2s
2s
1m
1m
2m
1h
2h
1d
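
The pipeline above follows a decorate, sort, undecorate pattern: prefix each line with its value in seconds, sort numerically, then drop the prefix again. As a rough illustration only, the same idea might look like this in Python. The names UNIT_SECONDS, to_seconds and sort_durations are invented for this sketch, and it assumes every line is a whole number followed by exactly one of s, m, h or d.

```python
# Hypothetical helper names; assumes lines such as "2h" or "122s".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(line):
    """Convert '<integer><unit>' to seconds, e.g. '2m' -> 120."""
    return int(line[:-1]) * UNIT_SECONDS[line[-1]]


def sort_durations(lines):
    """Decorate each line with its value in seconds, sort, then undecorate."""
    decorated = sorted((to_seconds(line), line) for line in lines)
    return [line for _, line in decorated]


if __name__ == "__main__":
    data = ["1s", "2s", "1h", "2h", "1m", "2m", "2s", "1d", "1m"]
    print("\n".join(sort_durations(data)))
```

Lines with the same number of seconds (for example 60s and 1m) are ordered by their text in this sketch.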
































    up vote
    4
    down vote













    First version - FPAT is used



    gawk '
    BEGIN {
        FPAT = "[0-9]+|[smhd]";
    }
    /s/ { factor = 1 }
    /m/ { factor = 60 }
    /h/ { factor = 3600 }
    /d/ { factor = 86400 }
    {
        print $1 * factor, $0;
    }' input.txt | sort -n | awk '{print $2}'



    FPAT - A regular expression describing the contents of the fields
    in a record. When set, gawk
    parses the input into fields, where the fields match the regular expression, instead of
    using the value of the FS variable as the field separator.
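
As a loose analogy only, content-based field splitting can be pictured with Python's re.findall. This is not gawk's FPAT; the pattern (a run of digits or a single unit letter) is an assumption made for the sketch, and split_fields is an invented name.

```python
import re

# Assumed pattern: a field is either a run of digits or a single unit letter.
FIELD_PATTERN = re.compile(r"[0-9]+|[smhd]")


def split_fields(record):
    """Return the pieces of the record that match the field pattern."""
    return FIELD_PATTERN.findall(record)


if __name__ == "__main__":
    print(split_fields("122s"))  # ['122', 's']
```

The first element plays the role of $1 (the number) and the second the role of $2 (the unit).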




    Second version



    I was surprised to discover that it also works without FPAT.
    This is due to awk's number conversion mechanism (see "How awk Converts Between Strings and Numbers"), namely:




    A string is converted to a number by interpreting any numeric prefix of the string as numerals: "2.5" converts to 2.5, "1e3" converts to 1,000, and "25fix" has a numeric value of 25. Strings that can’t be interpreted as valid numbers convert to zero.




    gawk '
    /s/ { factor = 1 }
    /m/ { factor = 60 }
    /h/ { factor = 3600 }
    /d/ { factor = 86400 }
    {
        print $0 * factor, $0;
    }' input.txt | sort -n | awk '{print $2}'


    Input (changed a little bit)



    1s
    122s
    1h
    2h
    1m
    2m
    2s
    1d
    1m


    Output



    Note: 122 seconds is more than 2 minutes, so 122s is sorted after 2m.



    1s
    2s
    1m
    1m
    2m
    122s
    1h
    2h
    1d
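
The numeric-prefix conversion quoted above can be approximated in Python to see why 122s behaves like 122 in arithmetic. This is only a sketch of the rule as quoted; numeric_prefix is an invented name, and details of real awk implementations (locale handling, hexadecimal input and so on) are deliberately ignored.

```python
import re

# Optional sign, digits with an optional fraction, optional exponent.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def numeric_prefix(text):
    """Return the leading number of text as a float, or 0.0 if there is none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(1)) if match else 0.0


if __name__ == "__main__":
    for sample in ("2.5", "1e3", "25fix", "122s", "abc"):
        print(sample, numeric_prefix(sample))
```

With this rule, numeric_prefix("2m") * 60 gives 120.0, which is the kind of value the second version hands to sort -n.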























    • 1




      +1 I like the clever use of FPAT. This could easily be expanded to accept and handle time values like 1d3h10m40s.
      – David Foerster
      Oct 15 '17 at 16:44











    • @DavidFoerster I looked at your awk answer and discovered an interesting fact: awk itself converts strings like 1s, 3d, 4m to integers without any problems. So they can be used in arithmetic directly, without splitting by regex. I have added a second version of the solution and an explanation of this behaviour too.
      – MiniMax
      Oct 15 '17 at 19:42


















    up vote
    2
    down vote













    If you only have times in the format of your question:



    sort -k 1.2,1.2 -k 1.1,1.1 <file>



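    Where <file> is the file your data resides in. This command sorts on the second character (the unit letter, ascending) and then on the first character (the digit, ascending). It works because the alphabetical order of the unit letters (d < h < m < s) happens to be exactly the reverse of the order of the unit sizes (day > hour > minute > second), so the output groups the units from days down to seconds. To list the shortest durations first, reverse the first key: sort -k 1.2,1.2r -k 1.1,1.1 <file>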
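
To make the ordering concrete, here is a small Python sketch (not part of the sort command itself) that imitates the two keys for single-digit values. The function names and the rank table are invented for the illustration.

```python
def letter_then_digit(lines):
    """Imitate the two sort keys: unit letter first, then the digit."""
    return sorted(lines, key=lambda s: (s[1], s[0]))


def shortest_first(lines):
    """Same keys, but with unit letters ranked by size instead of alphabetically."""
    rank = {"s": 0, "m": 1, "h": 2, "d": 3}
    return sorted(lines, key=lambda s: (rank[s[1]], s[0]))


if __name__ == "__main__":
    data = ["1s", "2s", "1h", "2h", "1m", "2m", "2s", "1d", "1m"]
    print(letter_then_digit(data))
    print(shortest_first(data))
```

Alphabetical order puts d before h, m and s, so the first function lists the day entries first; ranking the letters by unit size gives the shortest durations first. Both only hold while every value is a single digit.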






















    • 1




      Just like the other (now deleted) answer, this assumes durations are single-digit...
      – don_crissti
      Oct 15 '17 at 10:59










    • @don_crissti I answered the question because in the worst case I can just delete and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost).
      – PawkyPenguin
      Oct 15 '17 at 11:32


















    up vote
    2
    down vote













    This is an extension of MiniMax's answer that can handle a broader range of duration values, such as 1d3h10m40s.



    GNU Awk program (stored in parse-times.awk for the sake of this answer):



    #!/usr/bin/gawk -f
    BEGIN {
        FPAT = "[0-9]+[dhms]";
        duration["s"] = 1;
        duration["m"] = 60;
        duration["h"] = duration["m"] * 60;
        duration["d"] = duration["h"] * 24;
    }

    {
        t = 0;
        for (i = 1; i <= NF; i++)
            t += $i * duration[substr($i, length($i))];
        print(t, $0);
    }



    Invocation:



    gawk -f parse-times.awk input.txt | sort -n -k 1,1 | cut -d ' ' -f 2
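
For readers who want to check the arithmetic, the per-field loop can be mirrored in a few lines of Python. This is a sketch under the same assumption as the awk program (every field is digits followed by one of d, h, m, s); SECONDS_PER_UNIT, total_seconds and sort_by_duration are invented names, and text that does not match the token pattern is silently ignored.

```python
import re

SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}
# One token is a run of digits followed by a single unit letter.
TOKEN = re.compile(r"([0-9]+)([dhms])")


def total_seconds(text):
    """Sum every <number><unit> token found in text; other characters are ignored."""
    return sum(int(n) * SECONDS_PER_UNIT[u] for n, u in TOKEN.findall(text))


def sort_by_duration(lines):
    """Sort duration strings by their total length in seconds."""
    return sorted(lines, key=total_seconds)


if __name__ == "__main__":
    print(total_seconds("1d3h10m40s"))  # 97840
    print(sort_by_duration(["2m", "122s", "1h30m", "1d", "90s"]))
```

For 1d3h10m40s the sum is 86400 + 3 * 3600 + 10 * 60 + 40 = 97840 seconds.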
































      up vote
      1
      down vote













      Solution in Python 3:



      #!/usr/bin/python3
      import re, fileinput

      class RegexMatchIterator:
          """Iterate over consecutive regex.match() results anchored at the current position."""

          def __init__(self, regex, string, error_on_incomplete=False):
              self.regex = regex
              self.string = string
              self.error_on_incomplete = error_on_incomplete
              self.pos = 0

          def __iter__(self):
              return self

          def __next__(self):
              match = self.regex.match(self.string, self.pos)
              if match is not None:
                  if match.end() > self.pos:
                      self.pos = match.end()
                      return match
                  else:
                      fmt = '{0!s} returns an empty match at position {1:d} for "{3!r}"'

              elif self.error_on_incomplete and self.pos < len(self.string):
                  if isinstance(self.error_on_incomplete, str):
                      fmt = self.error_on_incomplete
                  else:
                      fmt = "{0!s} didn't match the suffix {3!r} at position {1:d} of {2!r}"

              else:
                  raise StopIteration(self.pos)

              raise ValueError(fmt.format(
                  self.regex, self.pos, self.string, self.string[self.pos:]))


      DURATION_SUFFIXES = {'s': 1, 'm': 60, 'h': 3600, 'd': 24*3600}
      DURATION_PATTERN = re.compile(
          r'(\d+)(' + '|'.join(map(re.escape, DURATION_SUFFIXES.keys())) + ')')

      def parse_duration(s):
          # Sum <number> * <seconds per unit> over all consecutive matches.
          return sum(
              int(m.group(1)) * DURATION_SUFFIXES[m.group(2)]
              for m in RegexMatchIterator(DURATION_PATTERN, s,
                  'Illegal duration string {3!r} at position {1:d}'))


      if __name__ == '__main__':
          with fileinput.input() as f:
              result = sorted((l.rstrip('\n') for l in f), key=parse_duration)
          for item in result:
              print(item)


      As you can see I spent about ⅔ of the line count towards a useful iterator over regex.match() results because regex.finditer() doesn't tie matches to the beginning of the current region and there are no other suitable ways to iterate over match results. *grrr*
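
One possible way to avoid the custom iterator, offered only as a sketch and not as the author's method, is to validate the whole string with fullmatch() first and then use finditer() on it. The names UNIT_SECONDS, WHOLE, PART and parse_duration_strict are invented here. Unlike the iterator above, this version does not report the position of the offending character, and it rejects an empty string.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
# The whole string must consist of one or more <digits><unit> parts.
WHOLE = re.compile(r"(?:\d+[smhd])+")
PART = re.compile(r"(\d+)([smhd])")


def parse_duration_strict(text):
    """Return the duration in seconds, or raise ValueError for malformed input."""
    if not WHOLE.fullmatch(text):
        raise ValueError(f"Illegal duration string {text!r}")
    return sum(int(m.group(1)) * UNIT_SECONDS[m.group(2)]
               for m in PART.finditer(text))


if __name__ == "__main__":
    print(parse_duration_strict("1d3h10m40s"))  # 97840
```

Because fullmatch() has already confirmed that the string is a gap-free sequence of parts, finditer() cannot skip over stray characters here.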






































Accepted awk answer: answered Oct 15 '17 at 11:03 by Hauke Laging









FPAT answer (first and second version): answered Oct 15 '17 at 14:44, edited Oct 15 '17 at 19:34, by MiniMax











sort -k answer: answered Oct 15 '17 at 10:58 by PawkyPenguin

                696110




                696110







                • 1




                  Just like the other (now deleted) answer, this assumes durations are single-digit...
                  – don_crissti
                  Oct 15 '17 at 10:59










                • @don_crissti I answered the question because in the worst case I can just delete and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost).
                  – PawkyPenguin
                  Oct 15 '17 at 11:32













                • 1




                  Just like the other (now deleted) answer, this assumes durations are single-digit...
                  – don_crissti
                  Oct 15 '17 at 10:59










                • @don_crissti I answered the question because in the worst case I can just delete and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost).
                  – PawkyPenguin
                  Oct 15 '17 at 11:32








                1




                1




                Just like the other (now deleted) answer, this assumes durations are single-digit...
                – don_crissti
                Oct 15 '17 at 10:59




                Just like the other (now deleted) answer, this assumes durations are single-digit...
                – don_crissti
                Oct 15 '17 at 10:59












                @don_crissti I answered the question because in the worst case I can just delete and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost).
                – PawkyPenguin
                Oct 15 '17 at 11:32





                @don_crissti I answered the question because in the worst case I can just delete and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost).
                – PawkyPenguin
                Oct 15 '17 at 11:32











                up vote
                2
                down vote













                This an extension of MiniMax’ answer that can handle a broader range of duration value like 1d3h10m40s.



                GNU Awk program (stored in parse-times.awk for the sake of this answer):



                #!/usr/bin/gawk -f
                BEGIN
                FPAT = "[0-9]+[dhms]";
                duration["s"] = 1;
                duration["m"] = 60;
                duration["h"] = duration["m"] * 60;
                duration["d"] = duration["h"] * 24;



                t=0;
                for (i=1; i<=NF; i++)
                t += $i * duration[substr($i, length($i))];
                print(t, $0);



                Invocation:



                gawk -f parse-times.awk input.txt | sort -n -k 1,1 | cut -d ' ' -f 2















                    answered Oct 15 '17 at 17:04









                    David Foerster


































                        Solution in Python 3:



                        #!/usr/bin/python3
                        import re, fileinput

                        class RegexMatchIterator:
                            """Iterate over consecutive regex.match() results in a string."""

                            def __init__(self, regex, string, error_on_incomplete=False):
                                self.regex = regex
                                self.string = string
                                # False, True, or a custom format string for the error message
                                self.error_on_incomplete = error_on_incomplete
                                self.pos = 0

                            def __iter__(self):
                                return self

                            def __next__(self):
                                # Each match must start exactly where the previous one ended.
                                match = self.regex.match(self.string, self.pos)
                                if match is not None:
                                    if match.end() > self.pos:
                                        self.pos = match.end()
                                        return match
                                    else:
                                        fmt = '{0!s} returns an empty match at position {1:d} for "{3!r}"'

                                elif self.error_on_incomplete and self.pos < len(self.string):
                                    if isinstance(self.error_on_incomplete, str):
                                        fmt = self.error_on_incomplete
                                    else:
                                        fmt = "{0!s} didn't match the suffix {3!r} at position {1:d} of {2!r}"

                                else:
                                    raise StopIteration(self.pos)

                                raise ValueError(fmt.format(
                                    self.regex, self.pos, self.string, self.string[self.pos:]))


                        # Seconds per unit suffix.
                        DURATION_SUFFIXES = {'s': 1, 'm': 60, 'h': 3600, 'd': 24*3600}
                        DURATION_PATTERN = re.compile(
                            r'(\d+)(' + '|'.join(map(re.escape, DURATION_SUFFIXES.keys())) + ')')

                        def parse_duration(s):
                            """Return the total number of seconds described by s, e.g. '1h30m'."""
                            return sum(
                                int(m.group(1)) * DURATION_SUFFIXES[m.group(2)]
                                for m in RegexMatchIterator(DURATION_PATTERN, s,
                                    'Illegal duration string {3!r} at position {1:d}'))


                        if __name__ == '__main__':
                            # Read from the files named on the command line, or from stdin.
                            with fileinput.input() as f:
                                result = sorted((l.rstrip('\n') for l in f), key=parse_duration)
                            for item in result:
                                print(item)


                        As you can see, about ⅔ of the line count goes towards a useful iterator over regex.match() results, because regex.finditer() doesn't anchor matches to the beginning of the current region and there is no other suitable way to iterate over match results. *grrr*
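
                        The finditer() behaviour mentioned above is easy to reproduce: it silently skips text that does not match, so malformed input goes unnoticed. One possible shorter workaround (an alternative sketch, not the approach used in the script above) is to validate the whole string with fullmatch() first and only then iterate. The names SUFFIXES, COMPONENT, WHOLE and parse_duration_strict are hypothetical.

```python
import re

SUFFIXES = {"s": 1, "m": 60, "h": 3600, "d": 24 * 3600}
COMPONENT = re.compile(r"(\d+)([dhms])")   # one <number><unit> component
WHOLE = re.compile(r"(?:\d+[dhms])+")      # one or more components, nothing else


def parse_duration_strict(s):
    """Return the total in seconds, rejecting strings with stray characters."""
    if WHOLE.fullmatch(s) is None:
        raise ValueError("Illegal duration string {!r}".format(s))
    return sum(int(m.group(1)) * SUFFIXES[m.group(2)]
               for m in COMPONENT.finditer(s))


if __name__ == "__main__":
    # finditer() alone skips over the "??" without complaint.
    print([m.group() for m in COMPONENT.finditer("1h??30m")])
    print(parse_duration_strict("1h30m"))
    try:
        parse_duration_strict("1h??30m")
    except ValueError as error:
        print(error)
```

                        The drawback of this variant is that the error no longer reports the position of the offending text, which the custom iterator does. It also rejects an empty string, whereas the script above treats it as a duration of 0.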














                            edited Oct 15 '17 at 19:08

























                            answered Oct 15 '17 at 18:56









                            David Foerster




























                                 
