Integer with leading zeros (portable)?

It is a "feature" of the shell that a number with a leading zero is interpreted as an octal number:

$ echo "$((00100))"
64

But many shells offer no way to disable this "feature", so it becomes difficult to force the interpretation of a digit sequence as a decimal (or other base) number.

When there is only one number to convert, several external programs could do the trimming:

expr "00100" + 0 
echo "00100" | sed 's/^0*//'
echo "00100" | grep -o '[^0].*$'
echo "00100" | awk '{print int($0)}'
echo "00100" | perl -pe '$_=int."\n";'

But it takes some time to execute them each and every time they are needed. Accumulate the use of such external tools over many calls and the delay becomes quite big. To measure the delay, repeat each of the calls above 1000 times and you will get (in seconds):

expr 1.934
sed 3.450
grep 3.775
awk 5.291
perl 5.064
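
A loop along the following lines could produce timings like the ones above. This is only a sketch: the function name run_expr_loop and the default of 1000 iterations are assumptions made for illustration, and it shows the expr variant only. The other tools would be swapped in on the marked line.

```shell
#!/bin/sh
# Hypothetical benchmark sketch: call one external tool n times.
# run_expr_loop is an illustrative name, not part of the question.
run_expr_loop() {
    n=${1:-1000}
    i=0
    while [ "$i" -lt "$n" ]; do
        expr "00100" + 0        # swap in the sed/grep/awk/perl pipeline here
        i=$((i + 1))
    done
}

# Possible usage in a shell where "time" is a keyword (bash, ksh, zsh):
#   time run_expr_loop 1000 >/dev/null
# In other shells, put the loop in a script and time the script instead.
```

Each iteration starts a new process, which is where most of the measured time is expected to go.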

Of course, most of these tools (all except expr) could process a file of 1000 lines in a single call, which takes (in seconds):

sed file 0.004
grep file 0.003
awk file 0.007
perl file 0.006

That only helps if all 1000 individual values are available at the same point in time.

That may not be the case. So, what still remains to be answered is:

Is there a native (to the shell) way to extract an integer that is faster than calling external tools for each individual integer (not a list in a file)?

The cost of each call accumulates, and the total delay becomes significant.

The processing becomes more involved if the number may also have a leading sign and you want to reject invalid numbers.

edited Aug 9 at 3:56

asked Aug 8 at 11:25

Isaac

At least bash, probably other shells, has a way to force a base: 10#0010
– Jeff Schaller
Aug 8 at 11:37

@JeffSchaller Try (in bash) a=-00100; echo $((10#$a)) or a=-++-00100; echo $((10#$a)) (to get a positive number), or a=-0010; echo $((33#$a)) to get -8 instead of the correct -33, or a=-001a; echo $((33#$a)) to get this error: bash: 33#-001a: value too great for base (error token is "001a") … ksh has similar problems.
– Isaac
Aug 8 at 11:44

@JeffSchaller And bash is not sh.
– Isaac
Aug 8 at 12:06

shell-script shell

3 Answers

Note that while POSIX requires $((010)) to expand to 8, several shells don't do that by default (or do it only in some contexts) unless they are in a conformance mode, as it is a feature you usually do not want.

With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).

$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8

In mksh, that's controlled by the posix option (off by default):

$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8

In bash, there's no option to turn it off, but you can use the ksh syntax $((10#010)) to force interpretation in decimal (this also works in ksh and zsh). However, in bash and mksh -o posix, $((10#-010)) doesn't work: it is treated as 10#0 - 010, as you can see from the expansion of $((-10#-010)) yielding -8. You need $((-10#010)) instead (or $((- 10#010)) for compatibility with zsh, which complains about -10 being an invalid base).

$ bash -c 'echo "$((10#010))"'
10

With ksh93, compare:

$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8

with:

$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%d\n" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8

So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".

But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.

edited Aug 8 at 16:23

answered Aug 8 at 14:18

Stéphane Chazelas

Something similar could be done in one line with:

$ a=-00100; a=${a%"${a#[+-]}"}${a#"${a%%[!0+-]*}"}; a=${a:-0}
$ echo "$a"
-100

It takes just 0.0482 seconds for 1000 repetitions, roughly 40 to 110 times less than the external programs timed in the question.

That's based on two nested parameter expansions:

Extract the sign:
- ${a#[+-]} removes the first character, provided it is a sign.
- ${a%"${a#[+-]}"} keeps only that first character, provided it is a sign.

Remove all leading signs and/or zeros:
- ${a%%[!0+-]*} removes everything from the first character that is not 0, + or - to the end.
- ${a#"${a%%[!0+-]*}"} removes the result of the above, i.e., all leading zeros and signs.

That picks one sign and removes all leading zeros.
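
The two nested expansions can be unpacked into named steps, which may make them easier to follow. This is a minimal sketch, not the answer's own code: the function name normalize_int and the intermediate variable names are assumptions introduced for illustration.

```shell
#!/bin/sh
# Sketch only: normalize_int is a hypothetical helper name.
normalize_int() {
    in=$1
    rest=${in#[+-]}          # input without its first character, if that is a sign
    sign=${in%"$rest"}       # what was cut off: "+", "-" or nothing
    lead=${in%%[!0+-]*}      # leading run of signs and zeros
    digits=${in#"$lead"}     # everything after that run
    out=$sign$digits
    printf '%s\n' "${out:-0}"   # an all-zero, unsigned input collapses to 0
}

normalize_int -00100   # prints -100
normalize_int 00100    # prints 100
normalize_int 000      # prints 0
```

Like the one-liner, this sketch does not validate its input, and a signed all-zero value such as -000 leaves only the sign, so a caller may want to check for empty digits.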
However, it allows (without error):
- Several leading signs.
- Any characters after the leading signs and zeros.
- An "out of range" (too big) number.

If those tests are needed, keep reading.

The number of signs could be tested with:

signs=${a%%[!+-]*}
[ "${#signs}" -gt 1 ] && echo "$0: Invalid number $a: Too many signs"

The kind of characters allowed could be checked with:

num=${a#"${a%%[!0+-]*}"}

any=${num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*}
[ "$any" != "$num" ] && echo "$0: Invalid number $a"

hex=${num%%[!0123456789abcdefABCDEF]*}
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

dec=${num%%[!0123456789]*}
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"

And, finally, we can take advantage of printf's ability to print a warning for numbers that are "out of range" (only for bases that printf understands):

# sign is assumed to hold the sign extracted earlier, e.g. sign=${a%"${a#[+-]}"}
printf '%d' "$sign$dec" >/dev/null # for a decimal number
printf '%d' "${sign}0x$hex" >/dev/null # for hex numbers

Yes, both printf calls use %d; it is not a typo.

And, yes, all of the above works in most shells that have printf.
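
The sign and character checks above could be combined into one function for decimal input. The following is a sketch under stated assumptions: parse_dec is a hypothetical name, and it strips the sign before the zeros (a small departure from the pattern above) so that input such as 0-5 is rejected rather than read as 5. The printf range check is left out.

```shell
#!/bin/sh
# Sketch only: parse_dec is a hypothetical name; decimal input is assumed.
parse_dec() {
    in=$1
    signs=${in%%[!+-]*}              # leading run of + and - characters
    if [ "${#signs}" -gt 1 ]; then
        echo "parse_dec: invalid number $in: too many signs" >&2
        return 1
    fi
    body=${in#"$signs"}              # input without its sign
    if [ -z "$body" ]; then
        echo "parse_dec: invalid number $in: no digits" >&2
        return 1
    fi
    zeros=${body%%[!0]*}             # leading run of zeros
    num=${body#"$zeros"}             # what follows the zeros
    dec=${num%%[!0123456789]*}       # longest all-decimal prefix of num
    if [ "$dec" != "$num" ]; then
        echo "parse_dec: invalid decimal number $in" >&2
        return 1
    fi
    printf '%s\n' "$signs${dec:-0}"  # all-zero input becomes 0
}

parse_dec -00100   # prints -100
```

On invalid input the function prints a message to standard error and returns 1, so it can be used as `value=$(parse_dec "$raw") || exit 1`.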

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

0-9 or [:digit:] are not allowed to match more than 0123456789. This is because [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC.
– schily
Aug 8 at 12:45

I had that misconception as well, but only [[:digit:]] is required to match 0123456789 only, not [0-9]; in practice, there are systems, including GNU ones, where [0-9] matches other characters in some locales.
– Stéphane Chazelas
Aug 8 at 12:46

@schily, that's not true of [0-9]; there was a recent discussion about it on the austin-group-l ML, which you participated in.
– Stéphane Chazelas
Aug 8 at 12:47

The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
– schily
Aug 8 at 12:48

@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception; see the follow-ups, which address most of the questions you have. Including, yes, that this only applies to POSIX for [[:digit:]], and examples of systems, including Solaris, where 0-9 matches other collation elements, and which ones.
– Stéphane Chazelas
Aug 8 at 12:57

Here is your example, run 1000 times on my system:

$ cat shell.sh
#!/bin/dash
q=1
while [ "$q" -le 1000 ]
do
 z=-00100
 z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
 z=${z:-0}
 echo "$z"
 q=$((q + 1))
done

Result:

$ time ./shell.sh >/dev/null
real 0m0.047s

Now I take issue with the sed example. I do see an example with a file, but I am
not seeing a clear reason why using a file is not acceptable. Also, the example
with a pipe is problematic because a pipe is not needed, nor is calling sed 1000
times. If you simply can't use a file for whatever reason, a heredoc would be
fine:

cat > sed.sh <<alfa
sed 's/^0*//' <<bravo
$(yes 00100 | head -1000)
bravo
alfa

Result:

$ time ./sed.sh >/dev/null
real 0m0.047s

So on my system it is the exact same speed without the fuss.
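
When several values are already at hand, the same idea can be sketched as a small function that feeds them to a single sed process through a here-document. The name strip_batch and the sample values are made up for illustration.

```shell
#!/bin/sh
# Sketch: strip leading zeros from several values with one sed process.
# strip_batch and the sample values are hypothetical.
strip_batch() {
    # $1 holds newline-separated values; the here-document passes them to sed.
    sed 's/^0*//' <<EOF
$1
EOF
}

values='00100
0042
7'
strip_batch "$values"   # prints 100, 42 and 7, one per line
```

As with the sed command in the question, a value made only of zeros would come out as an empty line.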

answered Aug 9 at 3:20

Steven Penny

The actual code for the loop has not been added, for simplicity. If you insist, I could add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer; it has become quite a bit more convoluted than I thought at the outset.
– Isaac
Aug 9 at 3:40

In timing, you get 0.047, I got 0.048, so we quite agree. Now if you try something like time for (( i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}' ; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to what I got. As for the file, the reason is very simple: there are times when there is only one value to convert. Repeat that conversion at many different times (as needed) and the delays will add up.
– Isaac
Aug 9 at 3:46

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
2
down vote

With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).

$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8

In mksh, that's controlled by the posix option (off by default):

$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8

$ bash -c 'echo "$((10#010))"'
10

With ksh93, compare:

$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8

with:

$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%dn" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8

So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".

But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.

edited Aug 8 at 16:23

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

add a commentÂ |Â

up vote
2
down vote

With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).

$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8

In mksh, that's controlled by the posix option (off by default):

$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8

$ bash -c 'echo "$((10#010))"'
10

With ksh93, compare:

$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8

with:

$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%dn" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8

So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".

But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.

edited Aug 8 at 16:23

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

add a commentÂ |Â

up vote
2
down vote

With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).

$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8

In mksh, that's controlled by the posix option (off by default):

$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8

$ bash -c 'echo "$((10#010))"'
10

With ksh93, compare:

$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8

with:

$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%dn" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8

So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".

But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.

edited Aug 8 at 16:23

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).

$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8

In mksh, that's controlled by the posix option (off by default):

$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8

$ bash -c 'echo "$((10#010))"'
10

With ksh93, compare:

$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8

with:

$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%dn" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8

So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".

But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.

edited Aug 8 at 16:23

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

edited Aug 8 at 16:23

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

answered Aug 8 at 14:18

StÃ©phane Chazelas

284k53523861

add a commentÂ |Â

up vote
0
down vote

Something similar could be done in one line with:

$ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
$ echo "$a"
-100

It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.

That's based on two double parameter expansions:

Extract the sign:
- $a#[+-] remove the first character provided it is a sign.
- $a%"$a#[+-]" keeps the first sign provided that it is a sign.

Remove all leading signs and/or zeros:
- $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.
- $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.

That picks one sign and remove all leading zeros.
However it allows (without error):

Several leading signs.

Any characters after the leading signs and zeros.

An "out of range" (too big) number.

If those tests are needed, keep reading.

The number of signs could be tested with:

signs=$a%%[!+-]* 
[ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"

The kind of characters allowed could be checked with:

num=$a#"$a%%[!0+-]*"

any=$num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*
[ "$any" != "$num" ] && echo "$0: Invalid number $a"

hex=$num%%[!0123456789abcdefABCDEF]*
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

dec=$num%%[!0123456789]*
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"

And, finally, we can take advantage of the capacity of printf of printing a warning for numbers "out of range" (only for bases that printf understand):

printf '%d' $sign$dec >/dev/null # for a decimal number
printf '%d' "$sign0x$hex" >/dev/null # for hex numbers

Yes, all printf use %d, it is not a typo.

And, yes, all the above works in most shells that have printf.

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

6,9851834

0-9 or [:digit:] are not allowed to match more than 0123456789. This is becaise [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC
â€“Â schily
Aug 8 at 12:45

1

I had that misconception as well, but [[:digit:]] is required to match on 012345679 only, and not [0-9] and in practice, there are systems including GNU ones where [0-9] matches other characters in some locales.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:46

@schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:47

1

The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
â€“Â schily
Aug 8 at 12:48

@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:57

Â |Â
show 6 more comments

up vote
0
down vote

Something similar could be done in one line with:

$ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
$ echo "$a"
-100

It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.

That's based on two double parameter expansions:

Extract the sign:
- $a#[+-] remove the first character provided it is a sign.
- $a%"$a#[+-]" keeps the first sign provided that it is a sign.

Remove all leading signs and/or zeros:
- $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.
- $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.

That picks one sign and remove all leading zeros.
However it allows (without error):

Several leading signs.

Any characters after the leading signs and zeros.

An "out of range" (too big) number.

If those tests are needed, keep reading.

The number of signs could be tested with:

signs=$a%%[!+-]* 
[ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"

The kind of characters allowed could be checked with:

num=$a#"$a%%[!0+-]*"

any=$num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*
[ "$any" != "$num" ] && echo "$0: Invalid number $a"

hex=$num%%[!0123456789abcdefABCDEF]*
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

dec=$num%%[!0123456789]*
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"

And, finally, we can take advantage of the capacity of printf of printing a warning for numbers "out of range" (only for bases that printf understand):

printf '%d' $sign$dec >/dev/null # for a decimal number
printf '%d' "$sign0x$hex" >/dev/null # for hex numbers

Yes, all printf use %d, it is not a typo.

And, yes, all the above works in most shells that have printf.

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

6,9851834

0-9 or [:digit:] are not allowed to match more than 0123456789. This is becaise [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC
â€“Â schily
Aug 8 at 12:45

1

I had that misconception as well, but [[:digit:]] is required to match on 012345679 only, and not [0-9] and in practice, there are systems including GNU ones where [0-9] matches other characters in some locales.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:46

@schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:47

1

The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
â€“Â schily
Aug 8 at 12:48

@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:57

Â |Â
show 6 more comments

up vote
0
down vote

Something similar could be done in one line with:

$ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
$ echo "$a"
-100

It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.

That's based on two double parameter expansions:

Extract the sign:
- $a#[+-] remove the first character provided it is a sign.
- $a%"$a#[+-]" keeps the first sign provided that it is a sign.

Remove all leading signs and/or zeros:
- $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.
- $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.

That picks one sign and remove all leading zeros.
However it allows (without error):

Several leading signs.

Any characters after the leading signs and zeros.

An "out of range" (too big) number.

If those tests are needed, keep reading.

The number of signs could be tested with:

signs=$a%%[!+-]* 
[ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"

The kind of characters allowed could be checked with:

num=$a#"$a%%[!0+-]*"

any=$num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*
[ "$any" != "$num" ] && echo "$0: Invalid number $a"

hex=$num%%[!0123456789abcdefABCDEF]*
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

dec=$num%%[!0123456789]*
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"

And, finally, we can take advantage of the capacity of printf of printing a warning for numbers "out of range" (only for bases that printf understand):

printf '%d' $sign$dec >/dev/null # for a decimal number
printf '%d' "$sign0x$hex" >/dev/null # for hex numbers

Yes, all printf use %d, it is not a typo.

And, yes, all the above works in most shells that have printf.

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

6,9851834

Something similar could be done in one line with:

$ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
$ echo "$a"
-100

It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.

That's based on two double parameter expansions:

Extract the sign:
- $a#[+-] remove the first character provided it is a sign.
- $a%"$a#[+-]" keeps the first sign provided that it is a sign.

Remove all leading signs and/or zeros:
- $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.
- $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.

That picks one sign and remove all leading zeros.
However it allows (without error):

Several leading signs.

Any characters after the leading signs and zeros.

An "out of range" (too big) number.

If those tests are needed, keep reading.

The number of signs could be tested with:

signs=$a%%[!+-]* 
[ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"

The kind of characters allowed could be checked with:

num=$a#"$a%%[!0+-]*"

any=$num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*
[ "$any" != "$num" ] && echo "$0: Invalid number $a"

hex=$num%%[!0123456789abcdefABCDEF]*
[ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

dec=$num%%[!0123456789]*
[ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"

And, finally, we can take advantage of the capacity of printf of printing a warning for numbers "out of range" (only for bases that printf understand):

printf '%d' $sign$dec >/dev/null # for a decimal number
printf '%d' "$sign0x$hex" >/dev/null # for hex numbers

Yes, all printf use %d, it is not a typo.

And, yes, all the above works in most shells that have printf.

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

6,9851834

edited Aug 9 at 2:34

answered Aug 8 at 11:25

Isaac

6,9851834

answered Aug 8 at 11:25

Isaac

6,9851834

answered Aug 8 at 11:25

Isaac

6,9851834

0-9 or [:digit:] are not allowed to match more than 0123456789. This is becaise [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC
â€“Â schily
Aug 8 at 12:45

1

I had that misconception as well, but [[:digit:]] is required to match on 012345679 only, and not [0-9] and in practice, there are systems including GNU ones where [0-9] matches other characters in some locales.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:46

@schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:47

1

The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
â€“Â schily
Aug 8 at 12:48

@schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
â€“Â StÃ©phane Chazelas
Aug 8 at 12:57


up vote
0
down vote

Here is your example x1000 on my system:

$ cat shell.sh
#!/bin/dash
q=1
while [ "$q" -le 1000 ]
do
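 z=-00100
 # keep the sign (if any), then drop the leading run of sign and zeros
 z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
 # an all-zero input is left empty, so fall back to 0
 z=${z:-0}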
 echo "$z"
 q=$((q + 1))
done

Result:

$ time ./shell.sh >/dev/null
real 0m0.047s

cat > sed.sh <<alfa
sed 's/^0*//' <<bravo
$(yes 00100 | head -1000)
bravo
alfa

Result:

$ time ./sed.sh >/dev/null
real 0m0.047s

So on my system it is the exact same speed without the fuss.
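
The parameter expansion in shell.sh is dense, so here is a sketch that splits it into named steps. It assumes the expansion is the usual brace form ${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}; the function name to_decimal and the variables sign, zeros and digits are hypothetical and only there for illustration.

```shell
#!/bin/sh
# to_decimal (hypothetical wrapper): strip leading zeros from a possibly
# signed digit string using only POSIX parameter expansion, no external tools.
to_decimal() {
    z=$1
    sign=${z%"${z#[+-]}"}      # "-" or "+" if the value starts with one, else empty
    zeros=${z%%[!0+-]*}        # leading run of sign and zeros, e.g. "-00"
    digits=${z#"$zeros"}       # what remains after that run, e.g. "100"
    z=$sign$digits
    printf '%s\n' "${z:-0}"    # an all-zero input collapses to empty, so print 0
}

# Usage: the result is safe to use in arithmetic without octal interpretation.
echo "$(( $(to_decimal 00100) + 1 ))"
```

Note that this does not validate the input: anything after the leading zeros is passed through unchanged.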

answered Aug 9 at 3:20

Steven Penny

2,31521635

The actual code for the loop was left out for simplicity. If you insist, I can add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer; it has become considerably more convoluted than I thought at the outset.
– Isaac
Aug 9 at 3:40

In timing, you get 0.047 and I got 0.048, so we largely agree. Now if you try something like time for (( i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}' ; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to what I got. As for the file, the reason is very simple: there are times when there is only one value to convert; repeat that conversion at many different times (as the need arises) and the delays will add up.
– Isaac
Aug 9 at 3:46
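
The loop described in that comment uses the bash-only for (( ... )) form. A rough POSIX sh equivalent, for readers who want to repeat the measurement in dash, might look like the sketch below; the function name repeat_convert is hypothetical.

```shell
#!/bin/sh
# repeat_convert (hypothetical helper): run the awk conversion once per
# iteration, so every iteration pays the cost of starting an external process.
repeat_convert() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "00100" | awk '{print int($0)}'
        i=$((i + 1))
    done
}
```

To compare with the figures in the question, a script could end with repeat_convert 1000 >/dev/null and be run under time; swapping the awk call for sed, grep, perl or expr gives the other rows.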
