Integer with leading zeros (portable)?

It is a "feature" of the shell that a number with a leading zero is interpreted as an octal number:



$ echo "$((00100))"
64


Many shells offer no way to turn this "feature" off, so it becomes difficult to force a digit sequence to be interpreted as a decimal (or other base) number.



When there is only a single number to convert, there are several external programs that could do the trimming:



expr "00100" + 0
echo "00100" | sed 's/^0*//'
echo "00100" | grep -o '[^0].*$'
echo "00100" | awk '{print int($0)}'
echo "00100" | perl -pe '$_=int."\n";'


But running them takes time each and every time they are needed. Accumulate the use of such external tools over many calls and the delay becomes quite big. To measure the delay, repeat each of the calls above 1000 times and you will get (in seconds):



expr 1.934
sed 3.450
grep 3.775
awk 5.291
perl 5.064
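
A loop along the following lines could be used to take such measurements. This is only a sketch: run_n and strip_with_sed are hypothetical helper names, not the harness actually used for the figures above.

```shell
# run_n N CMD [ARGS...]: run CMD N times in the current shell.
# (Hypothetical helper, named here only for illustration.)
run_n() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@"
        i=$((i + 1))
    done
}

# One of the external-tool variants, wrapped in a function.
# Each call forks a pipeline and starts sed once.
strip_with_sed() {
    echo "00100" | sed 's/^0*//'
}

# Example: run the sed variant 1000 times, discarding the output.
# run_n 1000 strip_with_sed >/dev/null
```

Putting the loop in a script and running that script under a timing command gives the total for one variant; swapping the body of strip_with_sed gives the others.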


Of course, most tools (except expr) could process a file with 1000 lines in:



sed file 0.004
grep file 0.003
awk file 0.007
perl file 0.006


That works only if all 1000 individual values are available at the same point in time.

That may not be the case. So, what still remains to be answered is:



Is there a native (to the shell) way to extract an integer that is faster than calling external tools for each individual integer (not a list in a file)?



The cost of each call accumulates, and the total delay becomes significant.



The processing becomes more involved if the number may also have a leading sign and you want to reject invalid numbers.










  • At least bash, probably other shells, has a way to force a base — 10#0010
    – Jeff Schaller
    Aug 8 at 11:37










  • @JeffSchaller Try (in bash) a=-00100; echo $((10#$a)), or a=-++-00100; echo $((10#$a)) (to get a positive number), or a=-0010; echo $((33#$a)) to get -8 instead of the correct -33, or a=-001a; echo $((33#$a)) to get this error: bash: 33#-001a: value too great for base (error token is "001a"). ksh has similar problems.
    – Isaac
    Aug 8 at 11:44











  • @JeffSchaller And bash is not sh.
    – Isaac
    Aug 8 at 12:06














shell-script shell






asked Aug 8 at 11:25 by Isaac (edited Aug 9 at 3:56)





















3 Answers













Note that while POSIX requires $((010)) to expand to 8, several shells don't do that by default (or do it only in some contexts) unless they are in a conformance mode, as it is a feature you usually do not want.



With zsh, that's controlled by the octalzeroes option (off by default except in sh/ksh emulation).



$ zsh -c 'echo $((010))'
10
$ zsh -o octalzeroes -c 'echo $((010))'
8
$ (exec -a sh zsh -c 'echo "$((010))"')
8


In mksh, that's controlled by the posix option (off by default):



$ mksh -c 'echo "$((010))"'
10
$ mksh -o posix -c 'echo "$((010))"'
8


In bash, there's no option to turn it off, but you can use the $((10#010)) ksh syntax to force interpretation in decimal (this also works in ksh and zsh). In bash and mksh -o posix, however, $((10#-010)) doesn't work: it is treated as 10#0 - 010, as you can see from $((-10#-010)) expanding to -8. You need $((-10#010)) instead, or $((- 10#010)) for compatibility with zsh, which complains about -10 being an invalid base.



$ bash -c 'echo "$((10#010))"'
10


With ksh93, compare:



$ ksh93 -c 'echo "$((010))"'
8
$ ksh93 -c '((a = 010)); echo "$a"'
8


with:



$ ksh93 -c 'a=010; echo "$((a))"'
10
$ ksh93 -c 'printf "%d\n" 010'
10
$ ksh93 -c 'let a=010; echo "$a"'
10
$ ksh93 -c 'echo "$((010e0))"'
10
$ ksh93 -o letoctal -c 'let a=010; echo "$a"'
8


So at least if you're coding for any of those shells specifically, there are ways to work around that "misfeature".



But none of those would help when writing a POSIX portable script, in which case, you'd want to strip the leading zeros as you have shown.
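
For an unsigned digit string, that stripping can also be done with parameter expansion alone, without starting an external program. A minimal sketch, assuming the input contains only digits (strip_zeros is a hypothetical name, not something from the standard):

```shell
# strip_zeros DIGITS: print DIGITS without leading zeros.
# Assumption: DIGITS contains only the characters 0-9 (no sign).
strip_zeros() {
    # ${1%%[!0]*} cuts everything from the first non-zero character
    # to the end, leaving just the run of leading zeros.
    zeros=${1%%[!0]*}
    # Remove that run from the front; the quotes make it a literal prefix.
    n=${1#"$zeros"}
    # An input made only of zeros leaves an empty string: fall back to 0.
    printf '%s\n' "${n:-0}"
}

# The result has no leading zero, so arithmetic reads it as decimal.
echo "$(( $(strip_zeros 00100) + 1 ))"   # prints 101
```

Signs and invalid characters are deliberately not handled here; they need extra checks.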



















    Something similar could be done in one line with:



    $ a=-00100; a=${a%"${a#[+-]}"}${a#"${a%%[!0+-]*}"}; a=${a:-0}
    $ echo "$a"
    -100


    It takes just 0.0482 seconds for 1000 repetitions, roughly 40 to 110 times less than the external programs timed in the question.



    That's based on two double parameter expansions:

    1. Extract the sign:

      • ${a#[+-]} removes the first character, provided it is a sign.

      • ${a%"${a#[+-]}"} keeps only that first character, provided it is a sign.

    2. Remove all leading signs and/or zeros:

      • ${a%%[!0+-]*} removes everything from the first character that is not 0, + or - to the end.

      • ${a#"${a%%[!0+-]*}"} removes the result of the above, i.e., all leading zeros and signs.
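
Splitting the one-liner into its intermediate values may make the steps easier to follow. The variable names rest, sign, lead, digits and result are introduced here only for illustration:

```shell
a=-00100

rest=${a#[+-]}          # 00100  (first character removed if it is a sign)
sign=${a%"$rest"}       # -      (what is left once "rest" is cut off the end)
lead=${a%%[!0+-]*}      # -00    (the leading run of signs and zeros)
digits=${a#"$lead"}     # 100    (the value without that leading run)

result=$sign$digits
result=${result:-0}     # an empty result (e.g. from "000") becomes 0
echo "$result"          # prints -100
```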


    That picks one sign and removes all leading zeros.
    However, it allows (without error):



    1. Several leading signs.

    2. Any characters after the leading signs and zeros.

    3. An "out of range" (too big) number.

    If those tests are needed, keep reading.




    The number of signs could be tested with:



    signs=${a%%[!+-]*}
    [ "${#signs}" -gt 1 ] && echo "$0: Invalid number $a: Too many signs"


    The kind of characters allowed could be checked with:



    # Strip the leading signs and zeros.
    num=${a#"${a%%[!0+-]*}"}

    # Each test keeps the part of num before the first disallowed character.
    any=${num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*}
    [ "$any" != "$num" ] && echo "$0: Invalid number $a"

    hex=${num%%[!0123456789abcdefABCDEF]*}
    [ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

    dec=${num%%[!0123456789]*}
    [ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"
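
Those checks could be gathered into a single function that either prints the normalized decimal number or fails. This is a sketch under two assumptions that go beyond the text above: at most one leading sign is accepted, and a "+" sign (or a "-" in front of zero) is dropped from the output. The name to_decimal is hypothetical.

```shell
# to_decimal STRING: print STRING as a decimal integer without leading
# zeros, or return 1 if STRING is not an optionally signed decimal number.
to_decimal() {
    a=$1
    signs=${a%%[!+-]*}                  # leading run of + and - characters
    [ "${#signs}" -le 1 ] || return 1   # more than one sign: reject
    body=${a#"$signs"}                  # everything after the sign
    case $body in
        '' | *[!0123456789]*) return 1 ;;   # empty, or a non-digit present
    esac
    num=${body#"${body%%[!0]*}"}        # drop the leading zeros
    if [ "$signs" != "-" ] || [ -z "$num" ]; then
        signs=                          # drop a "+" and avoid printing "-0"
    fi
    printf '%s\n' "$signs${num:-0}"
}

to_decimal -00100                       # prints -100
to_decimal 12a || echo "rejected"       # prints rejected
```

Returning a status instead of printing an error message leaves the wording of the diagnostic to the caller.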


    And, finally, we can take advantage of printf's ability to print a warning for numbers that are "out of range" (only for bases that printf understands):



    # sign is assumed to hold the sign extracted in step 1 above.
    printf '%d' "$sign$dec" >/dev/null       # for a decimal number
    printf '%d' "${sign}0x$hex" >/dev/null   # for hex numbers


    Yes, both printf calls use %d; it is not a typo.



    And, yes, all the above works in most shells that have printf.






    • 0-9 or [:digit:] are not allowed to match more than 0123456789. This is because [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC.
      – schily
      Aug 8 at 12:45







    • I had that misconception as well, but only [[:digit:]] is required to match 0123456789 and nothing else, not [0-9]; in practice, there are systems, including GNU ones, where [0-9] matches other characters in some locales.
      – Stéphane Chazelas
      Aug 8 at 12:46










    • @schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
      – Stéphane Chazelas
      Aug 8 at 12:47






    • The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
      – schily
      Aug 8 at 12:48










    • @schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
      – Stéphane Chazelas
      Aug 8 at 12:57






























    Here is your example, run 1000 times on my system:



    $ cat shell.sh
    #!/bin/dash
    q=1
    while [ "$q" -le 1000 ]
    do
      z=-00100
      z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
      z=${z:-0}
      echo "$z"
      q=$((q + 1))
    done


    Result:



    $ time ./shell.sh >/dev/null
    real 0m0.047s


    Now I take issue with the sed example. I do see an example with a file, but I am
    not seeing a clear reason why using a file is not acceptable. Also, the example
    with a pipe is problematic because a pipe is not needed, nor is calling sed 1000
    times. If you simply can't use a file for whatever reason, a heredoc would be
    fine:



    cat > sed.sh <<alfa
    sed 's/^0*//' <<bravo
    $(yes 00100 | head -1000)
    bravo
    alfa


    Result:



    $ time ./sed.sh >/dev/null
    real 0m0.047s


    So on my system it is the exact same speed without the fuss.
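
The same idea can be applied to values held by the script itself: collect them and hand them to a single sed process. A sketch, with strip_many as a hypothetical helper name:

```shell
# strip_many VALUE...: strip leading zeros from every VALUE with one sed call.
strip_many() {
    # The command substitution expands to one value per line inside the
    # here-document, so sed is started once no matter how many values.
    sed 's/^0*//' <<EOF
$(printf '%s\n' "$@")
EOF
}

strip_many 00100 007 0042
```

As with the sed command above, a value made only of zeros comes out as an empty line, and signs are not handled.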






    • The actual code for the loop was left out for simplicity. If you insist, I could add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer. It has become quite a bit more convoluted than I thought at the outset.
      – Isaac
      Aug 9 at 3:40










    • In timing, you get 0.047, I got 0.048, so we pretty much agree. Now if you try something like time for (( i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}' ; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to what I got. As for the file, the reason is very simple: there are times when there is only one value to convert; repeat that conversion at many different times (as needed) and the delays will add up.
      – Isaac
      Aug 9 at 3:46








































answered Aug 8 at 14:18 by Stéphane Chazelas (edited Aug 8 at 16:23)






















            up vote
            0
            down vote













            Something similar could be done in one line with:



            $ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
            $ echo "$a"
            -100


            It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.



            That's based on two double parameter expansions:



            1. Extract the sign:


              • $a#[+-] remove the first character provided it is a sign.


              • $a%"$a#[+-]" keeps the first sign provided that it is a sign.


            2. Remove all leading signs and/or zeros:


              • $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.


              • $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.


            That picks one sign and remove all leading zeros.
            However it allows (without error):



            1. Several leading signs.

            2. Any characters after the leading signs and zeros.

            3. An "out of range" (too big) number.

            If those tests are needed, keep reading.




            The number of signs could be tested with:



            signs=$a%%[!+-]* 
            [ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"


            The kind of characters allowed could be checked with:



            num=$a#"$a%%[!0+-]*"

            any=$num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*
            [ "$any" != "$num" ] && echo "$0: Invalid number $a"

            hex=$num%%[!0123456789abcdefABCDEF]*
            [ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

            dec=$num%%[!0123456789]*
            [ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"


            And, finally, we can take advantage of the capacity of printf of printing a warning for numbers "out of range" (only for bases that printf understand):



            printf '%d' $sign$dec >/dev/null # for a decimal number
            printf '%d' "$sign0x$hex" >/dev/null # for hex numbers


            Yes, all printf use %d, it is not a typo.



            And, yes, all the above works in most shells that have printf.






            share|improve this answer






















            • 0-9 or [:digit:] are not allowed to match more than 0123456789. This is becaise [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC
              – schily
              Aug 8 at 12:45







            • 1




              I had that misconception as well, but [[:digit:]] is required to match on 012345679 only, and not [0-9] and in practice, there are systems including GNU ones where [0-9] matches other characters in some locales.
              – Stéphane Chazelas
              Aug 8 at 12:46










            • @schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
              – Stéphane Chazelas
              Aug 8 at 12:47






            • 1




              The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
              – schily
              Aug 8 at 12:48










            • @schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
              – Stéphane Chazelas
              Aug 8 at 12:57














            up vote
            0
            down vote













            Something similar could be done in one line with:



            $ a=-00100; a=$a%"$a#[+-]"$a#"$a%%[!0+-]*"; a=$a:-0
            $ echo "$a"
            -100


            It just takes 0.0482 for 1000 repetitions, 100 times less than using an external program.



            That's based on two double parameter expansions:



            1. Extract the sign:


              • $a#[+-] remove the first character provided it is a sign.


              • $a%"$a#[+-]" keeps the first sign provided that it is a sign.


            2. Remove all leading signs and/or zeros:


              • $a%%[!0+-]* remove starting at any ( not 0 or + or - ) to the end.


              • $a#"$a%%[!0+-]*" remove the above, i.e., all leading zeros and signs.


            That picks one sign and remove all leading zeros.
            However it allows (without error):



            1. Several leading signs.

            2. Any characters after the leading signs and zeros.

            3. An "out of range" (too big) number.

            If those tests are needed, keep reading.




            The number of signs could be tested with:



            signs=$a%%[!+-]* 
            [ $#signs -gt 1 ] && echo "$0: Invalid number $a: Too many signs"


            The kind of characters allowed could be checked with:



            num=${a#"${a%%[!0+-]*}"}

            any=${num%%[!0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_]*}
            [ "$any" != "$num" ] && echo "$0: Invalid number $a"

            hex=${num%%[!0123456789abcdefABCDEF]*}
            [ "$hex" != "$num" ] && echo "$0: Invalid hexadecimal number $a"

            dec=${num%%[!0123456789]*}
            [ "$dec" != "$num" ] && echo "$0: Invalid decimal number $a"
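The three checks above share one pattern: cut the value at the first character outside an allowed set and compare the result with the original. A possible generalization is sketched below; the helper name `is_all` and its argument order are assumptions for illustration. The allowed set is expanded unquoted inside the bracket expression, so it should not contain `]`, a leading `!` or `^`, or a `-` between two other characters.

```shell
# is_all VALUE CHARS: hypothetical helper; succeeds when VALUE is non-empty
# and every character of VALUE appears in CHARS.
is_all() {
    [ -n "$1" ] || return 1
    _kept=${1%%[!$2]*}       # cut at the first character not listed in CHARS
    [ "$_kept" = "$1" ]      # nothing was cut, so every character is allowed
}

if is_all 1f 0123456789abcdefABCDEF; then echo "1f: valid hexadecimal"; fi
if ! is_all 1g 0123456789abcdefABCDEF; then echo "1g: invalid hexadecimal"; fi
```

Listing the digits explicitly, as the checks above do, avoids depending on how a range such as 0-9 is interpreted in the current locale.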


            And, finally, we can take advantage of printf's ability to print a warning for "out of range" numbers (only for bases that printf understands). Here sign is assumed to hold the sign extracted in step 1:

            printf '%d' "$sign$dec" >/dev/null # for a decimal number
            printf '%d' "${sign}0x$hex" >/dev/null # for hex numbers

            Yes, both printf calls use %d; it is not a typo.



            And, yes, all the above works in most shells that have printf.
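Putting the pieces together, a decimal-only validator might look like the sketch below. The function name `to_decimal`, its return codes and the variable `result` are assumptions for illustration. Unlike the one-liner, it strips at most one sign first and then requires the remainder to be digits only, so extra signs are rejected by the same comparison that rejects other characters. The final range check assumes that `printf '%d'` exits with a non-zero status on overflow, as bash and dash do; other shells may only print a warning.

```shell
# to_decimal VALUE: hypothetical helper. On success it sets $result and
# returns 0; it returns non-zero for a value it rejects.
to_decimal() {
    _v=$1
    _rest=${_v#[+-]}                   # drop at most one leading sign
    _sign=${_v%"$_rest"}               # that sign alone, or empty
    [ -n "$_rest" ] || return 1        # reject an empty value or a bare sign
    _dec=${_rest%%[!0123456789]*}      # cut at the first non-digit
    [ "$_dec" = "$_rest" ] || return 1 # reject extra signs or other characters
    _zeros=${_rest%%[!0]*}             # leading zeros only
    _num=${_rest#"$_zeros"}
    result=$_sign${_num:-0}            # an all-zero value keeps a single 0
    printf '%d' "$result" >/dev/null 2>&1   # range check (see assumption above)
}

if to_decimal -00100; then echo "$result"; fi
if ! to_decimal --5; then echo "rejected: --5"; fi
```

Returning the value in a variable keeps the whole conversion inside the current shell process, which is the point of avoiding external tools.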




























            • 0-9 or [:digit:] are not allowed to match more than 0123456789. This is because [:digit:] matches the same as isdigit() and that is not allowed to match more. 0-9 is contiguous even on EBCDIC
              – schily
              Aug 8 at 12:45







            • I had that misconception as well, but only [[:digit:]] is required to match 0123456789 alone; [0-9] is not, and in practice there are systems, including GNU ones, where [0-9] matches other characters in some locales.
              – Stéphane Chazelas
              Aug 8 at 12:46










            • @schily, that's not true of [0-9], there was a recent discussion on the austin-group-l ML which you participated in about it.
              – Stéphane Chazelas
              Aug 8 at 12:47






            • The question is whether there are other collating elements that are in the collating range 0..9. Do you know of such elements?
              – schily
              Aug 8 at 12:48










            • @schily, the discussion I was referring to is at mail-archive.com/austin-group-l@opengroup.org/msg02476.html showing my initial misconception, see the follow-ups which address most of the questions you have. Including, yes, that only applies to POSIX for [[:digit:]], and examples of systems including Solaris where 0-9 matches other collation elements and which ones.
              – Stéphane Chazelas
              Aug 8 at 12:57




















            edited Aug 9 at 2:34

























            answered Aug 8 at 11:25









            Isaac



































            Here is your example x1000 on my system:



            $ cat shell.sh
            #!/bin/dash
            q=1
            while [ "$q" -le 1000 ]
            do
            z=-00100
            z=${z%"${z#[+-]}"}${z#"${z%%[!0+-]*}"}
            z=${z:-0}
            echo "$z"
            q=$((q + 1))
            done


            Result:



            $ time ./shell.sh >/dev/null
            real 0m0.047s


            Now I take issue with the sed example. I do see an example with a file, but I am
            not seeing a clear reason why using a file is not acceptable. The example with
            a pipe is also problematic, because neither the pipe nor calling sed 1000 times
            is needed. If you simply can't use a file for whatever reason, a heredoc would
            be fine:



            cat > sed.sh <<alfa
            sed 's/^0*//' <<bravo
            $(yes 00100 | head -1000)
            bravo
            alfa


            Result:



            $ time ./sed.sh >/dev/null
            real 0m0.047s


            So on my system it is the exact same speed without the fuss.
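For comparison, here is a minimal sketch of the per-value case, which the batch run above does not cover: the same unsigned value converted once per iteration, first with one sed process per value and then with parameter expansion only. The loop count and the variable names are illustrative assumptions; wrap either loop in `time` to measure it on a given system.

```shell
n=100        # iterations; raise to 1000 to match the figures quoted in the question
v=00100

# One sed process per value.
i=0
while [ "$i" -lt "$n" ]; do
    ext=$(printf '%s\n' "$v" | sed 's/^0*//')
    i=$((i + 1))
done

# Same conversion inside the shell, with no extra process.
i=0
while [ "$i" -lt "$n" ]; do
    lead=${v%%[!0]*}      # leading zeros only
    int=${v#"$lead"}
    int=${int:-0}
    i=$((i + 1))
done

echo "$ext $int"
```

Both loops end with the same value; only the cost per iteration differs.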


























            • The actual code for the loop has not been added, for simplicity. If you insist, I could add it (note that it is a simple for loop from 1 to 1000). Also, I am still working to complete this question and answer. It has become considerably more convoluted than I thought at the outset.
              – Isaac
              Aug 9 at 3:40










            • In timing, you get 0.047 and I got 0.048, so we largely agree. Now if you try something like time for (( i=0; i<$n; i++)); do echo "00100" | awk '{print int($0)}' ; done >/dev/null (where n is 1000) for each of the possible solutions, you should get timings similar to what I got. As for the file, the reason is very simple: there are times when there is only one value to convert; repeat that conversion at many different times (as needed) and the delays add up.
              – Isaac
              Aug 9 at 3:46
























            answered Aug 9 at 3:20









            Steven Penny





























             
