Integer with leading zeros (portable)?

It is a "feature" of the shell that a number with a leading zero is interpreted as an octal number:
$ echo "$((00100))"
64
But many shells offer no way to disable this "feature", so it is difficult to force a digit sequence to be interpreted as a decimal (or other base) number.
When there is only one number to convert, several external programs can strip the leading zeros:
expr "00100" + 0
echo "00100" | sed 's/^0*//'
echo "00100" | grep -o '[^0].*$'
echo "00100" | awk '{print int($0)}'
echo "00100" | perl -pe '$_=int."\n";'
But each of them takes some time to run, every time it is needed. Accumulate the use of such external tools over many calls and the delay becomes quite large. To measure that delay, repeat each of the calls above 1000 times; you get (in seconds):
expr   1.934
sed    3.450
grep   3.775
awk    5.291
perl   5.064
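The per-call measurement can be reproduced with a loop along these lines. This is a minimal POSIX sketch, not the exact script behind the numbers above (those came from a bash `for ((...))` loop); the function name `bench_awk` is hypothetical, and the awk call stands in for any of the five commands.

```shell
# bench_awk: hypothetical helper that runs one external conversion per value,
# N times, to show how the per-call cost accumulates.
bench_awk() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    # One pipe and one awk process per iteration: this is the cost being measured.
    echo "00100" | awk '{print int($0)}'
    i=$((i + 1))
  done
}

# Usage (in a shell with a time keyword, such as bash or ksh):
#   time bench_awk 1000 >/dev/null
```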
Of course, most of these tools (all except expr) can process a file of 1000 lines in a single call, taking (in seconds):
sed file    0.004
grep file   0.003
awk file    0.007
perl file   0.006
That only helps if all 1000 values are available at the same point in time, which may not be the case. So, what still remains to be answered is:
Is there a native (to the shell) way to extract an integer that is faster than calling an external tool for each individual integer (not a list in a file)?
The cost of each call accumulates, and the delay becomes significant.
The processing becomes more involved if the number may also have a leading sign and you want to reject invalid numbers.
shell-script shell
edited Aug 9 at 3:56
asked Aug 8 at 11:25
Isaac
6,9851834
At least bash, and probably other shells, has a way to force a base: 10#0010
- Jeff Schaller
Aug 8 at 11:37
@JeffSchaller Try (in bash) a=-00100; echo $((10#$a)), or a=-++-00100; echo $((10#$a)) (to get a positive number), or a=-0010; echo $((33#$a)) to get -8 instead of the correct -33, or a=-001a; echo $((33#$a)) to get this error: bash: 33#-001a: value too great for base (error token is "001a"). ksh has similar problems.
- Isaac
Aug 8 at 11:44
@JeffSchaller And bash is not sh.
- Isaac
Aug 8 at 12:06
3 Answers
Here is your example, repeated 1000 times, on my system:
$ cat shell.sh
#!/bin/dash
q=1
while [ "$q" -le 1000 ]
do
  z=-00100
  z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
  z=${z:-0}
  echo "$z"
  q=$((q + 1))
done
Result:
$ time ./shell.sh >/dev/null
real 0m0.047s
Now I take issue with the sed example. I do see an example with a file, but I don't see a clear reason why using a file is not acceptable. Also, the example with a pipe is problematic, because a pipe is not needed, nor is calling sed 1000 times. If you simply can't use a file for whatever reason, a here-document would be fine:
cat > sed.sh <<alfa
sed 's/^0*//' <<bravo
$(yes 00100 | head -1000)
bravo
alfa
Result:
$ time ./sed.sh >/dev/null
real 0m0.047s
So on my system it is the exact same speed without the fuss.
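To make the same point without writing a helper script first, the values can also be streamed into a single sed process through a pipe. A minimal sketch: `gen_values` is a hypothetical stand-in for wherever the values actually come from, and the sed expression is a slight variation on the question's `s/^0*//` that keeps the last digit, so an all-zero value such as 000 becomes 0 instead of an empty line. Like the question's sed, it does not handle a leading sign.

```shell
# gen_values: hypothetical data source that prints the value $2, $1 times.
gen_values() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '%s\n' "$2"
    i=$((i + 1))
  done
}

# One sed process for the whole stream instead of one per value.
# ^0*\(.\) consumes the leading zeros but always keeps one character,
# so 000 -> 0 and 00100 -> 100.
gen_values 1000 00100 | sed 's/^0*\(.\)/\1/' >/dev/null
```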
The actual code for the loop has not been added, for simplicity. If you insist, I could add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer; it has become much more convoluted than I thought at the outset.
- Isaac
Aug 9 at 3:40
In timing, you get 0.047 and I got 0.048, so we quite agree. Now if you try something like time for ((i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}'; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to mine. As for the file, the reason is very simple: there are times when there is only one value to convert; repeat that conversion at many different times (needs) and the delays add up.
- Isaac
Aug 9 at 3:46
Note that while POSIX requires $((010)) to expand to 8, several shells don't do it by default (or do it only in some contexts) unless in a conformance mode, as that's a feature you usually do not want.
With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).
$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8
In mksh, that's controlled by the posix option (off by default):
$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8
In bash, there's no option to turn it off, but you can use the $((10#010)) ksh syntax to force interpretation in decimal (it also works in ksh and zsh). Note, though, that in bash and mksh -o posix, $((10#-010)) doesn't work: it is treated as 10#0 - 010, as you can see from $((-10#-010)) expanding to -8. You need $((-10#010)) instead (or $((- 10#010)) for compatibility with zsh, which complains about -10 being an invalid base).
$ bash -c 'echo "$((10#010))"'
10
With ksh93, compare:
$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8
with:
$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%d\n" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8
So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".
But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.
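For a POSIX script, a minimal sketch of the stripping approach that needs no external process might look like this. It is a different technique from a parameter-expansion one-liner: it splits off one sign with `case`, drops zeros one at a time, and only then lets `$(( ))` apply the sign, roughly the portable counterpart of writing $((-10#010)) in bash. `to_dec` is a hypothetical name, and the function assumes its argument has already been validated as an optional sign followed by decimal digits.

```shell
# to_dec: hypothetical helper; prints the decimal value of an optionally
# signed digit string, e.g. -00100 -> -100, without octal interpretation.
# Assumes the input is already known to match [+-]digits (no validation here).
# Uses global variables n and sign (POSIX sh has no "local").
to_dec() {
  n=$1
  sign=
  case $n in
    [+-]*)
      sign=${n%"${n#?}"}   # the first character, i.e. the sign
      n=${n#?}             # the digits after it
      ;;
  esac
  # Drop one leading zero per iteration, but never the last character,
  # so 000 becomes 0 rather than an empty string.
  while :; do
    case $n in
      0?*) n=${n#0} ;;
      *) break ;;
    esac
  done
  # n has no leading zero now, so $(( )) reads it as decimal.
  printf '%s\n' "$((${sign}$n))"
}
```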
edited Aug 8 at 16:23
answered Aug 8 at 14:18
Stéphane Chazelas
284k53523861
Something similar can be done natively, in one line, with:
$ a=-00100; a=${a%"${a#[+-]}"}${a#"${a%%[!0+-]*}"}; a=${a:-0}
$ echo "$a"
-100
It takes just 0.0482 seconds for 1000 repetitions, roughly 40 to 100 times less than calling an external program for each value.
That is based on two pairs of nested parameter expansions:
- Extract the sign:
  ${a#[+-]} removes the first character, provided it is a sign; ${a%"${a#[+-]}"} keeps that first character, provided it is a sign.
- Remove all leading signs and/or zeros:
  ${a%%[!0+-]*} removes everything from the first character that is not 0, + or - to the end; ${a#"${a%%[!0+-]*}"} removes that prefix from $a, i.e., all leading zeros and signs.
That keeps one sign and removes all leading zeros.
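Wrapped in a function, with the two intermediate results given names, the same two expansions look like this. A sketch: `trim_int`, `sign` and `digits` are hypothetical names, and the final `if` is an addition to the one-liner; without it, an all-zero signed input such as -000 comes out as a bare `-`, because ${a:-0} only replaces an empty result.

```shell
# trim_int: hypothetical wrapper around the answer's two expansions.
# Uses global variables a, sign and digits (POSIX sh has no "local").
trim_int() {
  a=$1
  # Keep the first character only if it is + or -.
  sign=${a%"${a#[+-]}"}
  # Remove the whole leading run of zeros and signs.
  digits=${a#"${a%%[!0+-]*}"}
  if [ -z "$digits" ]; then
    # Nothing but zeros (and signs): print plain 0, without a sign.
    printf '%s\n' 0
  else
    printf '%s\n' "$sign$digits"
  fi
}
```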
However, it accepts without error:
- Several leading signs.
- Any characters after the leading signs and zeros.
- An "out of range" (too big) number.
If those tests are needed, keep reading.
The number of signs could be tested with:
signs=${a%%[!+-]*}
[ "${#signs}" -gt 1 ] && echo "$0: Invalid number $a: Too many signs"
The kind of characters allowed can be checked with:
num=${a#"${a%%[!0+-]*}"}
any=${num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*}
[ "$any" != "$num" ] && echo "$0: Invalid number $a"
hex=${num%%[!0123456789abcdefABCDEF]*}
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"
dec=${num%%[!0123456789]*}
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"
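The sign-count test and the decimal check can be combined into one function that returns a status instead of printing a message. A minimal sketch, assuming only decimal input is wanted; `is_dec_int`, `signs`, `body` and `dec` are hypothetical names, and, as in the answer's own checks, the digits are spelled out as 0123456789 rather than written as a 0-9 range.

```shell
# is_dec_int: hypothetical validator; succeeds only for an optional single
# sign followed by one or more decimal digits (leading zeros allowed).
# Uses global variables a, signs, body and dec.
is_dec_int() {
  a=$1
  signs=${a%%[!+-]*}              # the run of leading signs, if any
  [ "${#signs}" -le 1 ] || return 1
  body=${a#"$signs"}              # everything after that sign
  [ -n "$body" ] || return 1      # empty string or a bare sign
  dec=${body%%[!0123456789]*}     # longest all-digit prefix of body
  [ "$dec" = "$body" ]            # nothing but digits may follow the sign
}
```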
Finally, we can take advantage of printf's ability to print a warning for numbers that are out of range (only for the bases that printf understands), with sign=${a%"${a#[+-]}"} holding the sign extracted above:
printf '%d' "${sign}${dec}" >/dev/null # for a decimal number
printf '%d' "${sign}0x${hex}" >/dev/null # for a hexadecimal number
Yes, both printf calls use %d; that is not a typo.
And yes, all of the above works in most shells that have printf.
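What printf produces for an out-of-range value is a diagnostic on standard error, which may be only a warning and not necessarily a non-zero exit status in every shell. One way to act on it in a script is to capture that stderr output. A minimal sketch; `in_range` is a hypothetical name, and it treats any diagnostic from printf as a rejection.

```shell
# in_range: hypothetical check; succeeds if printf converts $1 with %d
# without writing any diagnostic to standard error.
in_range() {
  # 2>&1 >/dev/null: send printf's stderr into the capture, discard stdout.
  warning=$(printf '%d' "$1" 2>&1 >/dev/null)
  [ -z "$warning" ]
}
```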
edited Aug 9 at 2:34
answered Aug 8 at 11:25
Isaac
6,9851834
0-9 or [:digit:] are not allowed to match more than 0123456789. This is because [:digit:] matches the same as isdigit(), and that is not allowed to match more. 0-9 is contiguous even on EBCDIC.
- schily
Aug 8 at 12:45
I had that misconception as well, but [[:digit:]] is required to match on 0123456789 only, not [0-9], and in practice there are systems, including GNU ones, where [0-9] matches other characters in some locales.
- Stéphane Chazelas
Aug 8 at 12:46
@schily, that's not true of [0-9]; there was a recent discussion about it on the austin-group-l mailing list, in which you participated.
- Stéphane Chazelas
Aug 8 at 12:47
The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
- schily
Aug 8 at 12:48
@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html, showing my initial misconception; see the follow-ups, which address most of your questions, including, yes, that this only applies to POSIX for [[:digit:]], and examples of systems, including Solaris, where 0-9 matches other collation elements, and which ones.
- Stéphane Chazelas
Aug 8 at 12:57
Aug 8 at 12:47
@schily, that's not true of
[0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.â Stéphane Chazelas
Aug 8 at 12:47
1
1
The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
â schily
Aug 8 at 12:48
The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
â schily
Aug 8 at 12:48
@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
â Stéphane Chazelas
Aug 8 at 12:57
@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
â Stéphane Chazelas
Aug 8 at 12:57
 |Â
show 6 more comments
up vote
0
down vote
Here is your example x1000 on my system:
$ cat shell.sh
#!/bin/dash
q=1
while [ "$q" -le 1000 ]
do
z=-00100
z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
z=${z:-0}
echo "$z"
q=$((q + 1))
done
Result:
$ time ./shell.sh >/dev/null
real 0m0.047s
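Wrapping the expansions from the loop in a function makes each step easier to follow. strip_zeros is a hypothetical name; the expansions are the ones used in the loop. Note that this sketch does no validation, and a signed all-zero input such as -000 comes out as a bare -.

```shell
#!/bin/sh
# strip_zeros: hypothetical name wrapping the expansions used in the loop.
strip_zeros() {
    z=$1
    # ${z#[+-]} is z without one leading sign, so ${z%"${z#[+-]}"} is just
    # the sign (or empty). ${z%%[!0+-]*} is the leading run of zeros and
    # signs, so ${z#"..."} is everything after it. Join the two pieces.
    z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
    # An all-zero input leaves an empty string; print 0 instead.
    printf '%s\n' "${z:-0}"
}
```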
Now I take issue with the sed example. I do see an example with a file, but I am not seeing a clear reason why using a file is not acceptable. Also, the example with a pipe is problematic because a pipe is not needed, nor is calling sed 1000 times. If you simply can't use a file for whatever reason, a heredoc would be fine:
cat > sed.sh <<alfa
sed 's/^0*//' <<bravo
$(yes 00100 | head -n 1000)
bravo
alfa
chmod +x sed.sh
Result:
$ time ./sed.sh >/dev/null
real 0m0.047s
So on my system it is the exact same speed without the fuss.
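When the values do not all exist at the same moment, a middle ground is to collect them as they arrive and run sed once on the whole batch. This is only a sketch of that idea: batch_add and batch_flush are hypothetical names, and it helps only if the results can wait until the flush. The sed expression keeps the last zero, so 000 becomes 0 rather than an empty line.

```shell
#!/bin/sh
# Hypothetical helpers: queue values now, convert them all with one sed later.
batch=
batch_add() {
    # Append the value plus a newline to the pending batch.
    batch="$batch$1
"
}
batch_flush() {
    # One sed process for the whole batch; \([0-9]\) keeps at least one digit.
    printf '%s' "$batch" | sed 's/^0*\([0-9]\)/\1/'
    batch=
}
```

Call batch_flush directly rather than inside $(...) if the reset of batch should take effect in the current shell.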
answered Aug 9 at 3:20
Steven Penny
The actual code for the loop has not been added for simplicity. If you insist, I could add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer; it has become quite a bit more convoluted than I thought at the outset.
– Isaac
Aug 9 at 3:40
In timing, you get 0.047 and I got 0.048, so we quite agree. Now if you try something like time for (( i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}' ; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to mine. As for the file, the reason is very simple: there are times when there is only one value to convert; repeat that conversion at many different times (as needed) and the delays add up.
– Isaac
Aug 9 at 3:46
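The loop in that comment uses the bash/ksh (( )) syntax. A POSIX sh sketch of the same per-value measurement follows; per_value_awk is a hypothetical name, and the timings you get will depend on the system.

```shell
#!/bin/sh
# per_value_awk: hypothetical name; starts one awk process per value,
# which is the per-call cost being measured in the comment.
per_value_awk() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "00100" | awk '{print int($0)}'
        i=$((i + 1))
    done
}
```

In bash, ksh or zsh, where time is a keyword that can time a function, run it as: time per_value_awk 1000 >/dev/null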
At least bash, and probably other shells, has a way to force a base: 10#0010
– Jeff Schaller
Aug 8 at 11:37
@JeffSchaller Try (in bash) a=-00100; echo $((10#$a)) or a=-++-00100; echo $((10#$a)) (to get a positive number). Or a=-0010; echo $((33#$a)) to get -8 instead of the correct -33, or a=-001a; echo $((33#$a)) to get this error: bash: 33#-001a: value too great for base (error token is "001a"). ksh has similar problems.
– Isaac
Aug 8 at 11:44
@JeffSchaller And bash is not sh.
– Isaac
Aug 8 at 12:06