Why does gawk treat `0123` as a decimal number when coming from the input data?

According to $ man gawk, the strtonum() function can convert a string into a number:




strtonum(str) Examine str, and return its numeric value. If
              str begins with a leading 0, treat it as an
              octal number. If str begins with a leading 0x
              or 0X, treat it as a hexadecimal number. Otherwise,
              assume it is a decimal number.




So if the string begins with a leading 0, the number is treated as octal, and if it begins with 0x, it is treated as hexadecimal.



I've run these commands to check my understanding of the function:



$ awk 'END { print strtonum("0123") }' <<<''
83

$ awk 'END { print strtonum("0x123") }' <<<''
291


The string "0123" is correctly treated as an octal number and converted to the decimal number 83.
Similarly, the string "0x123" is correctly treated as a hexadecimal number and converted to the decimal number 291.
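Expanding each digit string in its base confirms these values. The last line shows where the 123 in the input-data case comes from if the same digits are read as decimal:

```latex
\begin{aligned}
0123_{8}  &= 1\cdot 8^{2}  + 2\cdot 8  + 3 = 64 + 16 + 3 = 83\\
123_{16}  &= 1\cdot 16^{2} + 2\cdot 16 + 3 = 256 + 32 + 3 = 291\\
0123_{10} &= 1\cdot 10^{2} + 2\cdot 10 + 3 = 123
\end{aligned}
```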



Now, here's what happens when I run the same commands with the numeric strings moved from the program text to the input data:



$ awk 'END { print strtonum($1) }' <<<'0123'
123

$ awk 'END { print strtonum($1) }' <<<'0x123'
291


I understand the second result, which is identical to the earlier one, but not the first. Why does gawk now treat 0123 as a decimal number, even though it begins with a leading 0, which marks an octal number?



I suspect it has something to do with the strnum attribute, because for some reason[1] gawk gives this attribute to 0123 but not to 0x123:



$ awk 'END { print typeof($1) }' <<<'0123'
strnum

$ awk 'END { print typeof($1) }' <<<'0x123'
string


[1] It may be due to a variation between awk implementations:




To clarify, only strings that are coming from a few sources (here quoting the POSIX spec): [...] are to be considered a numeric string if their value happens to be numerical (allowing leading and trailing blanks, with variations between implementations in support for hex, octal, inf, nan...).





I'm using gawk version 4.2.62, and the output of $ awk -V is:



GNU Awk 4.2.62, API: 2.0 (GNU MPFR 3.1.4, GNU MP 6.1.0)









edited Feb 26 at 13:32 by Jeff Schaller
asked Feb 26 at 13:11 by user938271




















1 Answer

This is related to the generalised strnum handling in version 4.2 of gawk.

Input values that look like numbers are treated as strnum values, represented internally as having both a string and a number type. “0123” qualifies as looking like a number, so it is handled as a strnum. strtonum() is designed to accept both string and number arguments: it checks for a number first and, when its argument already has a numeric value, returns that number without transformation:



NODE *
do_strtonum(int nargs)
{
    NODE *tmp;
    AWKNUM d;

    tmp = fixtype(POP_SCALAR());
    if ((tmp->flags & NUMBER) != 0)
        d = (AWKNUM) tmp->numbr;    /* already a number, e.g. a strnum */
    else if (get_numbase(tmp->stptr, tmp->stlen, use_lc_numeric) != 10)
        d = nondec2awknum(tmp->stptr, tmp->stlen, NULL);    /* 0... octal or 0x... hex */
    else
        d = (AWKNUM) force_number(tmp)->numbr;    /* plain decimal string */

    DEREF(tmp);
    return make_number((AWKNUM) d);
}



Thus “0123” from the input becomes the number 123 (read as decimal, so the leading zero is simply ignored), and strtonum() returns that value directly.



“0x123” doesn’t look like a number (by the numeric-string rules in the gawk manual), so it is handled as a plain string, and strtonum() processes it as you’d expect.



The POSIX awk specification describes a number as follows:




The input string is decomposed into two parts: an initial, possibly empty, sequence of white-space characters (as specified by isspace()) and a subject sequence interpreted as a floating-point constant.

The expected form of the subject sequence is an optional '+' or '-' sign, then a non-empty sequence of digits optionally containing a <period>, then an optional exponent part. An exponent part consists of 'e' or 'E', followed by an optional sign, followed by one or more decimal digits.

The sequence starting with the first digit or the <period> (whichever occurs first) is interpreted as a floating constant of the C language, and if neither an exponent part nor a <period> appears, a <period> is assumed to follow the last digit in the string. If the subject sequence begins with a <hyphen-minus>, the value resulting from the conversion is negated.








edited Feb 26 at 14:32
answered Feb 26 at 14:00 by Stephen Kitt

• Thank you for the answer. So the only way to bypass the fact that strtonum() looks for numbers first is to use a dummy string concatenation, forcing the numeric string to become a plain string: $ awk '{ print strtonum($1 "") }' <<<'0123'. Could you clarify which rule in the user manual page you linked explains why 0x123 doesn't look like a number? It looks like a number to me; at least that's how I would write 291 in hexadecimal in awk program text. Is it because of the alphabetical character x?
  – user938271, Feb 26 at 14:13






• See the description in the POSIX awk specification: a number is a possibly empty sequence of spaces, followed by an optional “+” or “-”, followed by digits forming a floating-point decimal number. “x” isn’t allowed in a number, so it’s a string.
  – Stephen Kitt, Feb 26 at 14:29






• Note that if POSIXLY_CORRECT is set, gawk treats 0x10 as the number 16 (as an older version of the POSIX spec required, by mistake; it is now only allowed).
  – Stéphane Chazelas, Feb 26 at 14:44






• @user938271 an explicit or implicit split() in awk creates "dual-nature" values that are both strings and numbers. This does not happen with other functions: awk 'BEGIN { s = "0 1 2"; split(s, a); z = substr(s, 1, 1); print a[1], z, (a[1] == z), a[1] ? "yes" : "no", z ? "yes" : "no" }'. The same applies to $1 as to a[1]. As I have already mentioned in a couple of comments and answers, this is not properly specified in the standard, only vaguely alluded to. It probably deserves its own Q&A.
  – mosvy, Feb 26 at 14:45


















          • Thank you for the answer. So, the only way to bypass the fact that strtonum() looks for numbers first is to use a dummy string concatenation and force the numeric string to become a literal string: $ awk ' print strtonum($1 "") ' <<<'0123'. Could you clarify which rule in the link of the user manual explains why 0x123 doesn't look like a number? Because it looks like a number to me; at least that's how I would write 291 in hexadecimal in an awk progam text. Is it because of the alphabetical character x?

            – user938271
            Feb 26 at 14:13






          • 1





            See the description in the POSIX awk specification: a number is a possibly empty sequence of spaces, followed by “+” or “-”, followed by digits forming a floating-point decimal number. “x” isn’t allowed in a number, so it’s a string.

            – Stephen Kitt
            Feb 26 at 14:29






          • 1





            Note that if POSIXLY_CORRECT is set gawk treats 0x10 as a 16 number (as required (by mistake) by some older version of the POSIX spec, but now only allowed)

            – Stéphane Chazelas
            Feb 26 at 14:44





