How to sort a file by duration column?
How can I sort a file containing the lines below by duration? (s = second, m = minute, h = hour, d = day)
1s
2s
1h
2h
1m
2m
2s
1d
1m
sort timestamps
Are there always only whole numbers, nothing like 1h30m40s or 1.30h? – jimmij, Oct 15 '17 at 10:02
edited Oct 15 '17 at 18:58 by GAD3R
asked Oct 15 '17 at 9:22 by mert inan
5 Answers
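awk '{ unitvalue = $1 }                      # keep a copy of the original value
/s/ { m = 1 }; /m/ { m = 60 }; /h/ { m = 3600 }; /d/ { m = 86400 }
{ sub("[smhd]", "", unitvalue)              # strip the unit letter
  unitvalue = unitvalue * m                 # convert to seconds
  print unitvalue " " $1 }' input |
sort -n | awk '{ print $2 }'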
1s
2s
2s
1m
1m
2m
1h
2h
1d
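For comparison, the same decorate, sort, undecorate idea can be sketched in Python. This is only an illustration, assuming every line is a whole number followed by exactly one unit letter, as in the question; the names UNIT_SECONDS, to_seconds and sort_durations are made up for the example.

```python
# Seconds per unit letter, matching the multipliers in the awk program above.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration):
    """Convert a string such as '2h' to seconds (assumes <integer><unit>)."""
    number, unit = duration[:-1], duration[-1]
    return int(number) * UNIT_SECONDS[unit]


def sort_durations(lines):
    """Return the lines ordered by the duration they represent."""
    # sorted() is stable, so equal durations keep their input order.
    return sorted(lines, key=to_seconds)


if __name__ == "__main__":
    sample = ["1s", "2s", "1h", "2h", "1m", "2m", "2s", "1d", "1m"]
    print("\n".join(sort_durations(sample)))
```

Because the key function is applied in memory, no temporary numeric column has to be added and cut off again.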
answered Oct 15 '17 at 11:03 by Hauke Laging (accepted answer)
First version - FPAT is used
gawk '
BEGIN { FPAT = "[0-9]+|[smhd]" }    # a field is either a number or a unit letter
/s/ { factor = 1 }
/m/ { factor = 60 }
/h/ { factor = 3600 }
/d/ { factor = 86400 }
{ print $1 * factor, $0 }
' input.txt | sort -n | awk '{ print $2 }'
FPAT - A regular expression describing the contents of the fields in a record. When set, gawk parses the input into fields, where the fields match the regular expression, instead of using the value of the FS variable as the field separator.
Second version
I was surprised to discover that it also works without FPAT. This is due to awk's number conversion mechanism, described in "How awk Converts Between Strings and Numbers", namely:
A string is converted to a number by interpreting any numeric prefix of the string as numerals: "2.5" converts to 2.5, "1e3" converts to 1,000, and "25fix" has a numeric value of 25. Strings that can't be interpreted as valid numbers convert to zero.
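To make the quoted rule concrete, here is a rough Python emulation of the numeric-prefix conversion. It is a sketch, not awk's actual implementation: the function name awk_number and the regular expression are assumptions made for illustration, and corner cases such as hexadecimal input or locale-specific decimal points are ignored.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction),
# then an optional exponent. Leading whitespace is skipped.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def awk_number(text):
    """Approximate the value awk would use for `text` in arithmetic."""
    match = _NUMERIC_PREFIX.match(text)
    # No numeric prefix at all means the string counts as zero.
    return float(match.group(1)) if match else 0.0


if __name__ == "__main__":
    for sample in ("2.5", "1e3", "25fix", "122s", "abc"):
        print(sample, "->", awk_number(sample))
```

Under this rule a value such as 2h behaves like 2 in a multiplication, which is why the unit letter does not need to be stripped first.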
gawk '
/s/ { factor = 1 }
/m/ { factor = 60 }
/h/ { factor = 3600 }
/d/ { factor = 86400 }
{ print $0 * factor, $0 }
' input.txt | sort -n | awk '{ print $2 }'
Input (changed a little bit)
1s
122s
1h
2h
1m
2m
2s
1d
1m
Output
Note: 122 seconds is more than 2 minutes, so 122s is sorted after 2m.
1s
2s
1m
1m
2m
122s
1h
2h
1d
edited Oct 15 '17 at 19:34
answered Oct 15 '17 at 14:44 by MiniMax
+1 I like the clever use of FPAT. This could easily be expanded to accept and handle time values like 1d3h10m40s. – David Foerster, Oct 15 '17 at 16:44
@DavidFoerster I looked at your awk answer and discovered an interesting fact: strings like 1s, 3d, 4m are converted to integers by awk itself, without problems. So they can be used in math operations directly, without splitting by regex. I have added a second version of the solution and an explanation of this behaviour. – MiniMax, Oct 15 '17 at 19:42
If you only have times in the format of your question:
sort -k 1.2,1.2 -k 1.1,1.1 <file>
where <file> is the file your data resides in. This command sorts on the second character (the unit letter, ascending) and then on the first character (the digit, ascending). It relies on the unit letters sorting alphabetically (d < h < m < s) in exactly the reverse of their duration order (day > hour > minute > second), so the longest units come first; to list the shortest durations first, reverse the first key with -k 1.2,1.2r.
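A small Python sketch can show what the two one-character keys do, in both unit orders, and where they stop working. It assumes plain code-point ordering of the characters and uses the hypothetical helper name char_key_sort; it mirrors the idea of the sort command above rather than reproducing sort's behaviour exactly.

```python
def char_key_sort(lines, reverse_units=False):
    """Sort by the 2nd character, then the 1st, like two one-character keys."""
    sign = -1 if reverse_units else 1
    # ord() gives the code point, so comparisons follow plain byte order.
    return sorted(lines, key=lambda s: (sign * ord(s[1]), s[0]))


if __name__ == "__main__":
    sample = ["1s", "2s", "1h", "2h", "1m", "2m", "2s", "1d", "1m"]
    print(char_key_sort(sample))                      # unit letters ascending
    print(char_key_sort(sample, reverse_units=True))  # unit letters descending
    # With more than one digit the 2nd character is no longer the unit,
    # so the result is no longer ordered by duration:
    print(char_key_sort(["2m", "122s", "1h"]))
```

The last call illustrates the limitation: as soon as a value has more than one digit, the character positions no longer line up with the number and the unit.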
answered Oct 15 '17 at 10:58 by PawkyPenguin
Just like the other (now deleted) answer, this assumes durations are single-digit... – don_crissti, Oct 15 '17 at 10:59
@don_crissti I answered the question because in the worst case I can just delete the answer, and in the best case this is exactly what he was looking for. I thought this was a better approach than waiting for an edit of the question (which potentially takes a long time, so by then the question might be lost). – PawkyPenguin, Oct 15 '17 at 11:32
This is an extension of MiniMax's answer that can handle a broader range of duration values, such as 1d3h10m40s.
GNU Awk program (stored in parse-times.awk for the sake of this answer):
#!/usr/bin/gawk -f
BEGIN {
    FPAT = "[0-9]+[dhms]";    # each field is a number followed by its unit
    duration["s"] = 1;
    duration["m"] = 60;
    duration["h"] = duration["m"] * 60;
    duration["d"] = duration["h"] * 24;
}
{
    t = 0;
    for (i = 1; i <= NF; i++)
        t += $i * duration[substr($i, length($i))];
    print(t, $0);
}
Invocation:
gawk -f parse-times.awk input.txt | sort -n -k 1,1 | cut -d ' ' -f 2
answered Oct 15 '17 at 17:04 by David Foerster
Solution in Python 3:
#!/usr/bin/python3
import re, fileinput

class RegexMatchIterator:
    """Iterate over consecutive matches, each anchored where the last one ended."""
    def __init__(self, regex, string, error_on_incomplete=False):
        self.regex = regex
        self.string = string
        self.error_on_incomplete = error_on_incomplete
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        match = self.regex.match(self.string, self.pos)
        if match is not None:
            if match.end() > self.pos:
                self.pos = match.end()
                return match
            else:
                fmt = '{0!s} returns an empty match at position {1:d} for "{3!r}"'
        elif self.error_on_incomplete and self.pos < len(self.string):
            if isinstance(self.error_on_incomplete, str):
                fmt = self.error_on_incomplete
            else:
                fmt = "{0!s} didn't match the suffix {3!r} at position {1:d} of {2!r}"
        else:
            raise StopIteration(self.pos)
        raise ValueError(fmt.format(
            self.regex, self.pos, self.string, self.string[self.pos:]))

DURATION_SUFFIXES = {'s': 1, 'm': 60, 'h': 3600, 'd': 24*3600}
DURATION_PATTERN = re.compile(
    r'(\d+)(' + '|'.join(map(re.escape, DURATION_SUFFIXES.keys())) + ')')

def parse_duration(s):
    return sum(
        int(m.group(1)) * DURATION_SUFFIXES[m.group(2)]
        for m in RegexMatchIterator(DURATION_PATTERN, s,
            'Illegal duration string {3!r} at position {1:d}'))

if __name__ == '__main__':
    with fileinput.input() as f:
        result = sorted((l.rstrip('\n') for l in f), key=parse_duration)
    for item in result:
        print(item)
As you can see, I spent about ⅔ of the line count on a usable iterator over regex.match() results, because regex.finditer() doesn't tie matches to the beginning of the current region and there are no other suitable ways to iterate over match results. *grrr*
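One possible alternative, offered as a hedged sketch rather than a replacement for the code above: validate the whole string first with fullmatch(), then extract the components with finditer(). The names SECONDS, COMPONENT, WHOLE and parse_duration_simple are made up for this example, and the error message is only illustrative; unlike the iterator above, it does not report the position of the first bad character.

```python
import re

SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 24 * 3600}
COMPONENT = re.compile(r"(\d+)([smhd])")
# One or more components and nothing else.
WHOLE = re.compile(r"(?:\d+[smhd])+")


def parse_duration_simple(text):
    """Return the number of seconds in a string such as '1d3h10m40s'."""
    if WHOLE.fullmatch(text) is None:
        raise ValueError("Illegal duration string {!r}".format(text))
    # After validation, every character belongs to some component, so
    # finditer() cannot silently skip over junk between matches.
    return sum(int(m.group(1)) * SECONDS[m.group(2)]
               for m in COMPONENT.finditer(text))


if __name__ == "__main__":
    print(parse_duration_simple("1d3h10m40s"))
```

It could be used as the sort key in the same way, for example sorted(lines, key=parse_duration_simple).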
edited Oct 15 '17 at 19:08
answered Oct 15 '17 at 18:56 by David Foerster