Difference in casting float to int, 32-bit C

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












11















I currently working with an old code that needs to run a 32-bit system. During this work I stumbled across an issue that (out of academic interest) I would like to understand the cause of.



It seems that casting from float to int in 32-bit C behaves differently if the cast is done on a variable or on an expression. Consider the program:



#include <stdio.h>
int main()
int i,c1,c2;
float f1,f10;
for (i=0; i< 21; i++)
f1 = 3+i*0.1;
f10 = f1*10.0;
c1 = (int)f10;
c2 = (int)(f1*10.0);
printf("%d, %d, %d, %11.9f, %11.9fn",c1,c2,c1-c2,f10,f1*10.0);




Compiled (using gcc) either directly on a 32-bit system or on a 64-bit system using the -m32 modifier the output of the program is:



30, 30, 0, 30.000000000 30.000000000
31, 30, 1, 31.000000000 30.999999046
32, 32, 0, 32.000000000 32.000000477
33, 32, 1, 33.000000000 32.999999523
34, 34, 0, 34.000000000 34.000000954
35, 35, 0, 35.000000000 35.000000000
36, 35, 1, 36.000000000 35.999999046
37, 37, 0, 37.000000000 37.000000477
38, 37, 1, 38.000000000 37.999999523
39, 39, 0, 39.000000000 39.000000954
40, 40, 0, 40.000000000 40.000000000
41, 40, 1, 41.000000000 40.999999046
42, 41, 1, 42.000000000 41.999998093
43, 43, 0, 43.000000000 43.000001907
44, 44, 0, 44.000000000 44.000000954
45, 45, 0, 45.000000000 45.000000000
46, 45, 1, 46.000000000 45.999999046
47, 46, 1, 47.000000000 46.999998093
48, 48, 0, 48.000000000 48.000001907
49, 49, 0, 49.000000000 49.000000954
50, 50, 0, 50.000000000 50.000000000


Hence, it is clear that a difference exists between casting a variable and an expression. Note, that the issue exists also if float is changed to double and/or int is changed to short or long, also the issue do not manifest if program is compiled as 64-bit.



To clarify, the issue that I'm trying to understand here is not about floating-point arithmetic/rounding, but rather differences in memory handling in 32-bit.



The issue were tested on:



  • Linux version 4.15.0-45-generic (buildd@lgw01-amd64-031) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)), program compiled using: gcc -m32 Cast32int.c


  • Linux version 2.4.20-8 (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)), program compiled using: gcc Cast32int.c


Any pointers to help me understand what is going on here are appreciated.










share|improve this question



















  • 2





    welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

    – 4386427
    Feb 26 at 8:55






  • 1





    Cannot reproduce, what is your platform/compiler/version etc.?

    – Jabberwocky
    Feb 26 at 8:55






  • 2





    Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

    – Euan Smith
    Feb 26 at 9:02






  • 4





    The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

    – Weather Vane
    Feb 26 at 10:05







  • 2





    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:13
















11















I currently working with an old code that needs to run a 32-bit system. During this work I stumbled across an issue that (out of academic interest) I would like to understand the cause of.



It seems that casting from float to int in 32-bit C behaves differently if the cast is done on a variable or on an expression. Consider the program:



#include <stdio.h>
int main()
int i,c1,c2;
float f1,f10;
for (i=0; i< 21; i++)
f1 = 3+i*0.1;
f10 = f1*10.0;
c1 = (int)f10;
c2 = (int)(f1*10.0);
printf("%d, %d, %d, %11.9f, %11.9fn",c1,c2,c1-c2,f10,f1*10.0);




Compiled (using gcc) either directly on a 32-bit system or on a 64-bit system using the -m32 modifier the output of the program is:



30, 30, 0, 30.000000000 30.000000000
31, 30, 1, 31.000000000 30.999999046
32, 32, 0, 32.000000000 32.000000477
33, 32, 1, 33.000000000 32.999999523
34, 34, 0, 34.000000000 34.000000954
35, 35, 0, 35.000000000 35.000000000
36, 35, 1, 36.000000000 35.999999046
37, 37, 0, 37.000000000 37.000000477
38, 37, 1, 38.000000000 37.999999523
39, 39, 0, 39.000000000 39.000000954
40, 40, 0, 40.000000000 40.000000000
41, 40, 1, 41.000000000 40.999999046
42, 41, 1, 42.000000000 41.999998093
43, 43, 0, 43.000000000 43.000001907
44, 44, 0, 44.000000000 44.000000954
45, 45, 0, 45.000000000 45.000000000
46, 45, 1, 46.000000000 45.999999046
47, 46, 1, 47.000000000 46.999998093
48, 48, 0, 48.000000000 48.000001907
49, 49, 0, 49.000000000 49.000000954
50, 50, 0, 50.000000000 50.000000000


Hence, it is clear that a difference exists between casting a variable and an expression. Note, that the issue exists also if float is changed to double and/or int is changed to short or long, also the issue do not manifest if program is compiled as 64-bit.



To clarify, the issue that I'm trying to understand here is not about floating-point arithmetic/rounding, but rather differences in memory handling in 32-bit.



The issue were tested on:



  • Linux version 4.15.0-45-generic (buildd@lgw01-amd64-031) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)), program compiled using: gcc -m32 Cast32int.c


  • Linux version 2.4.20-8 (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)), program compiled using: gcc Cast32int.c


Any pointers to help me understand what is going on here are appreciated.










share|improve this question



















  • 2





    welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

    – 4386427
    Feb 26 at 8:55






  • 1





    Cannot reproduce, what is your platform/compiler/version etc.?

    – Jabberwocky
    Feb 26 at 8:55






  • 2





    Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

    – Euan Smith
    Feb 26 at 9:02






  • 4





    The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

    – Weather Vane
    Feb 26 at 10:05







  • 2





    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:13














11












11








11








I currently working with an old code that needs to run a 32-bit system. During this work I stumbled across an issue that (out of academic interest) I would like to understand the cause of.



It seems that casting from float to int in 32-bit C behaves differently if the cast is done on a variable or on an expression. Consider the program:



#include <stdio.h>
int main()
int i,c1,c2;
float f1,f10;
for (i=0; i< 21; i++)
f1 = 3+i*0.1;
f10 = f1*10.0;
c1 = (int)f10;
c2 = (int)(f1*10.0);
printf("%d, %d, %d, %11.9f, %11.9fn",c1,c2,c1-c2,f10,f1*10.0);




Compiled (using gcc) either directly on a 32-bit system or on a 64-bit system using the -m32 modifier the output of the program is:



30, 30, 0, 30.000000000 30.000000000
31, 30, 1, 31.000000000 30.999999046
32, 32, 0, 32.000000000 32.000000477
33, 32, 1, 33.000000000 32.999999523
34, 34, 0, 34.000000000 34.000000954
35, 35, 0, 35.000000000 35.000000000
36, 35, 1, 36.000000000 35.999999046
37, 37, 0, 37.000000000 37.000000477
38, 37, 1, 38.000000000 37.999999523
39, 39, 0, 39.000000000 39.000000954
40, 40, 0, 40.000000000 40.000000000
41, 40, 1, 41.000000000 40.999999046
42, 41, 1, 42.000000000 41.999998093
43, 43, 0, 43.000000000 43.000001907
44, 44, 0, 44.000000000 44.000000954
45, 45, 0, 45.000000000 45.000000000
46, 45, 1, 46.000000000 45.999999046
47, 46, 1, 47.000000000 46.999998093
48, 48, 0, 48.000000000 48.000001907
49, 49, 0, 49.000000000 49.000000954
50, 50, 0, 50.000000000 50.000000000


Hence, it is clear that a difference exists between casting a variable and an expression. Note, that the issue exists also if float is changed to double and/or int is changed to short or long, also the issue do not manifest if program is compiled as 64-bit.



To clarify, the issue that I'm trying to understand here is not about floating-point arithmetic/rounding, but rather differences in memory handling in 32-bit.



The issue were tested on:



  • Linux version 4.15.0-45-generic (buildd@lgw01-amd64-031) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)), program compiled using: gcc -m32 Cast32int.c


  • Linux version 2.4.20-8 (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)), program compiled using: gcc Cast32int.c


Any pointers to help me understand what is going on here are appreciated.










share|improve this question
















I currently working with an old code that needs to run a 32-bit system. During this work I stumbled across an issue that (out of academic interest) I would like to understand the cause of.



It seems that casting from float to int in 32-bit C behaves differently if the cast is done on a variable or on an expression. Consider the program:



#include <stdio.h>
int main()
int i,c1,c2;
float f1,f10;
for (i=0; i< 21; i++)
f1 = 3+i*0.1;
f10 = f1*10.0;
c1 = (int)f10;
c2 = (int)(f1*10.0);
printf("%d, %d, %d, %11.9f, %11.9fn",c1,c2,c1-c2,f10,f1*10.0);




Compiled (using gcc) either directly on a 32-bit system or on a 64-bit system using the -m32 modifier the output of the program is:



30, 30, 0, 30.000000000 30.000000000
31, 30, 1, 31.000000000 30.999999046
32, 32, 0, 32.000000000 32.000000477
33, 32, 1, 33.000000000 32.999999523
34, 34, 0, 34.000000000 34.000000954
35, 35, 0, 35.000000000 35.000000000
36, 35, 1, 36.000000000 35.999999046
37, 37, 0, 37.000000000 37.000000477
38, 37, 1, 38.000000000 37.999999523
39, 39, 0, 39.000000000 39.000000954
40, 40, 0, 40.000000000 40.000000000
41, 40, 1, 41.000000000 40.999999046
42, 41, 1, 42.000000000 41.999998093
43, 43, 0, 43.000000000 43.000001907
44, 44, 0, 44.000000000 44.000000954
45, 45, 0, 45.000000000 45.000000000
46, 45, 1, 46.000000000 45.999999046
47, 46, 1, 47.000000000 46.999998093
48, 48, 0, 48.000000000 48.000001907
49, 49, 0, 49.000000000 49.000000954
50, 50, 0, 50.000000000 50.000000000


Hence, it is clear that a difference exists between casting a variable and an expression. Note, that the issue exists also if float is changed to double and/or int is changed to short or long, also the issue do not manifest if program is compiled as 64-bit.



To clarify, the issue that I'm trying to understand here is not about floating-point arithmetic/rounding, but rather differences in memory handling in 32-bit.



The issue were tested on:



  • Linux version 4.15.0-45-generic (buildd@lgw01-amd64-031) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)), program compiled using: gcc -m32 Cast32int.c


  • Linux version 2.4.20-8 (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)), program compiled using: gcc Cast32int.c


Any pointers to help me understand what is going on here are appreciated.







c casting 32-bit






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Feb 26 at 10:00







cpaitor

















asked Feb 26 at 8:50









cpaitorcpaitor

1391111




1391111







  • 2





    welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

    – 4386427
    Feb 26 at 8:55






  • 1





    Cannot reproduce, what is your platform/compiler/version etc.?

    – Jabberwocky
    Feb 26 at 8:55






  • 2





    Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

    – Euan Smith
    Feb 26 at 9:02






  • 4





    The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

    – Weather Vane
    Feb 26 at 10:05







  • 2





    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:13













  • 2





    welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

    – 4386427
    Feb 26 at 8:55






  • 1





    Cannot reproduce, what is your platform/compiler/version etc.?

    – Jabberwocky
    Feb 26 at 8:55






  • 2





    Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

    – Euan Smith
    Feb 26 at 9:02






  • 4





    The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

    – Weather Vane
    Feb 26 at 10:05







  • 2





    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:13








2




2





welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

– 4386427
Feb 26 at 8:55





welcome to the world of floating point math - you may want to read stackoverflow.com/questions/588004/…

– 4386427
Feb 26 at 8:55




1




1





Cannot reproduce, what is your platform/compiler/version etc.?

– Jabberwocky
Feb 26 at 8:55





Cannot reproduce, what is your platform/compiler/version etc.?

– Jabberwocky
Feb 26 at 8:55




2




2





Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

– Euan Smith
Feb 26 at 9:02





Cannot reproduce either (windows/intel gcc 7.1.0 or Rasbian linux/ARM gcc 6.3.0)

– Euan Smith
Feb 26 at 9:02




4




4





The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

– Weather Vane
Feb 26 at 10:05






The updated code multiplies by 10.0 which is a double. It may be that the float is promoted to double before the multiplication and then back to float before being cast to int. In the other case the double product is converted directly to int.

– Weather Vane
Feb 26 at 10:05





2




2





On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

– Nikos C.
Feb 26 at 10:13






On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what you called "expressions") when targeting 32-bit x86. These computations are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

– Nikos C.
Feb 26 at 10:13













4 Answers
4






active

oldest

votes


















7














With MS Visual C 2008 I was able to reproduce this.



Inspecting the assembler, the difference between the two is an intermediate store and fetch of a result with intermediate conversions:



 f10 = f1*10.0; // double result f10 converted to float and stored
c1 = (int)f10; // float result f10 fetched and converted to double
c2 = (int)(f1*10.0); // no store/fetch/convert


The assembler generated pushes values onto the FPU stack that get converted to 64 bits and then are multiplied. For c1 the result is then converted back to float and stored and is then retrieved again and placed on the FPU stack (and converted to double again) for a call to __ftol2_sse, a run-time function to convert a double to int.



For c2 the intermediate value is not converted to and from float and passed immediately to the __ftol2_sse function. For this function see also the answer at Convert double to int?.



Assembler:



 f10 = f1*10;
fld dword ptr [f1]
fmul qword ptr [__real@4024000000000000 (496190h)]
fstp dword ptr [f10]

c2 = (int)(f1*10);
fld dword ptr [f1]
fmul qword ptr [__real@4024000000000000 (496190h)]
call __ftol2_sse
mov dword ptr [c2],eax

c1 = (int)f10;
fld dword ptr [f10]
call __ftol2_sse
mov dword ptr [c1],eax





share|improve this answer


















  • 3





    The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

    – Eric Postpischil
    Feb 26 at 13:07











  • @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

    – Martin Bonner
    Feb 26 at 13:15











  • @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

    – Eric Postpischil
    Feb 26 at 13:46











  • I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

    – Paul Ogilvie
    Feb 26 at 13:50











  • @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

    – Eric Postpischil
    Feb 26 at 13:57


















5














In the “32-bit system,” the difference is caused by the fact that f1*10.0 uses full double precision, while f10 has only float precision because that is its type. f1*10.0 uses double precision because 10.0 is a double constant. When f1*10.0 is assigned to f10, the value changes because it is implicitly converted to float, which has less precision.



If you use the float constant 10.0f instead, the differences vanish.



Consider the first case, when i is 1. Then:



  • In f1 = 3+i*0.1, 0.1 is a double constant, so the arithmetic is performed in double, and the result is 3.100000000000000088817841970012523233890533447265625. Then, to assign this to f1, it is converted to float, which produces 3.099999904632568359375.

  • In f10 = f1*10.0;, 10.0 is a double constant, so the arithmetic is again performed in double, and the result is 30.99999904632568359375. For assignment to f10, this is converted to float, and the result is 31.

  • Later, when f10 and f1*10.0 are printed, we see the values given above, with nine digits after the decimal point, “31.000000000” for f10, and “30.999999046”.

If you print f1*10.0f, with the float constant 10.0f instead of the double constant 10.0, the result will be “31.000000000” rather than “30.999999046”.



(The above uses IEEE-754 basic 32-bit and 64-bit binary floating-point arithmetic.)



In particular, note this: The difference between f1*10.0 and f10 arises when f1*10.0 is converted to float for assignment to f10. While C permits implementations to use extra precision in evaluating expressions, it requires implementations to discard this precision in assignments and casts. Therefore, in a standard-conforming compiler, the assignment to f10 must use float precision. This means, even when the program is compiled for a “64-bit system,” the differences should occur. If they do not, the compiler does not conforming to the C standard.



Furthermore, if float is changed to double, the conversion to float does not happen, and the value will not be changed. In this case, no differences between f1*10.0 and f10 should manifest.



Given that the question reports the differences do not manifest with a “64-bit” compilation and do manifest with double, it is questionable whether the observations have been reported accurately. To clarify this, the exact code should be shown, and the observations should be reproduced by a third party.






share|improve this answer

























  • The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

    – Nikos C.
    Feb 26 at 13:48












  • @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

    – Eric Postpischil
    Feb 26 at 13:53



















2














The C standard is not very strict about how floating point math is to be carried out. The standard allows an implementation to do calculation with higher precision than the involved types.



The result in your case is likely to come from the fact that c1 is calculated as "float-to-int" while c2 is calculated as "double-to-int" (or even higher precision).



Here is another example showing the same behavior.



#define DD 0.11111111

int main()

int i = 27;

int c1,c2,c3;
float f1;
double d1;
printf("%.60fn", DD);

f1 = i * DD;
d1 = i * DD;
c1 = (int)f1;
c2 = (int)(i * DD);
c3 = (int)d1;

printf("----------------------n");
printf("f1: %.60fn", f1);
printf("d1: %.60fn", d1);
printf("m : %.60fn", i * DD);
printf("%d, %d, %dn",c1,c2,c3);



My output:



0.111111109999999999042863407794357044622302055358886718750000
----------------------
f1: 3.000000000000000000000000000000000000000000000000000000000000
d1: 2.999999970000000182324129127664491534233093261718750000000000
m : 2.999999970000000182324129127664491534233093261718750000000000
3, 2, 2


The trick here is the number of ones in 0.11111111. The accurate result is "2.99999997". As you change the number of ones the accurate result is still in the form "2.99...997" (i.e. the number of 9 increases when the number of 1 increases).



At some point (aka some number of ones) you will reach a point where storing the result in a float rounds the result to "3.0" while the double is still able to hold "2.999999.....". Then a conversion to int will give different results.



Increasing the number of ones further will lead to a point where the double will also round to "3.0" and the conversion to int will consequently yield the same result.






share|improve this answer

























  • On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:12











  • The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

    – Eric Postpischil
    Feb 26 at 13:05











  • @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

    – 4386427
    Feb 26 at 14:43


















1














The main reason is that the rounding-control (RC) field of the x87 FPU control register values are inconsistent in the follow two lines. eventually the values of c1 and c2 are different.



0x08048457 <+58>: fstps 0x44(%esp)
0x0804848b <+110>: fistpl 0x3c(%esp)


Add gcc compile option -mfpmath=387 -mno-sse, it can be reproduced (Even without -m32, or change the float to a double)

Like this:



gcc -otest test.c -g -mfpmath=387 -mno-sse -m32


Then use gdb to debug, breakpoint at 0x0804845b, and run to i=1



 0x08048457 <+58>: fstps 0x44(%esp)
0x0804845b <+62>: flds 0x44(%esp)

(gdb) info float
=>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
R6: Empty 0x4002a000000000000000
R5: Empty 0x00000000000000000000
R4: Empty 0x00000000000000000000
R3: Empty 0x00000000000000000000
R2: Empty 0x00000000000000000000
R1: Empty 0x00000000000000000000
R0: Empty 0x00000000000000000000

Status Word: 0x3820 PE
TOP: 7
Control Word: 0x037f IM DM ZM OM UM PM
PC: Extended Precision (64-bits)
RC: Round to nearest
Tag Word: 0x3fff
Instruction Pointer: 0x00:0x08048455
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000

(gdb) x /xw 0x44+$esp
0xffffb594: 0x41f80000 ==> 31.0, s=0, M=1.1111 E=4


observe the execution results of fstps,

at this time, the RC value on the control register on the fpu is Round to nearest.

the value on the fpu register is 30.99999904632568359 (80bit).

the value on 0x44(%esp) (variable "f10") is 31.0. (round to nearest)



Then use gdb to debug, breakpoint at 0x0804848b, and run to i=1



 0x0804848b <+110>: fistpl 0x3c(%esp)

(gdb) info float
=>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
R6: Empty 0x4002a000000000000000
R5: Empty 0x00000000000000000000
R4: Empty 0x00000000000000000000
R3: Empty 0x00000000000000000000
R2: Empty 0x00000000000000000000
R1: Empty 0x00000000000000000000
R0: Empty 0x00000000000000000000

Status Word: 0x3820 PE
TOP: 7
Control Word: 0x0c7f IM DM ZM OM UM PM
PC: Single Precision (24-bits)
RC: Round toward zero
Tag Word: 0x3fff
Instruction Pointer: 0x00:0x08048485
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000


at this time, the RC value on the control register on the fpu is Round toward zero.

the value on the fpu register is 30.99999904632568359 (80bit). value is the same as above

obviously when the integer is converted, the decimal point is truncated, and the value is 30.



Below is the main decompiled code



 (gdb) disas main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x50,%esp
0x08048426 <+9>: movl $0x0,0x4c(%esp)
0x0804842e <+17>: jmp 0x80484de <main+193>
0x08048433 <+22>: fildl 0x4c(%esp)
0x08048437 <+26>: fldl 0x80485a8
0x0804843d <+32>: fmulp %st,%st(1)
0x0804843f <+34>: fldl 0x80485b0
0x08048445 <+40>: faddp %st,%st(1)
0x08048447 <+42>: fstps 0x48(%esp)
0x0804844b <+46>: flds 0x48(%esp)
0x0804844f <+50>: flds 0x80485b8
0x08048455 <+56>: fmulp %st,%st(1)
0x08048457 <+58>: fstps 0x44(%esp) // store to f10
0x0804845b <+62>: flds 0x44(%esp)
0x0804845f <+66>: fnstcw 0x2a(%esp)
0x08048463 <+70>: movzwl 0x2a(%esp),%eax
0x08048468 <+75>: mov $0xc,%ah
0x0804846a <+77>: mov %ax,0x28(%esp)
0x0804846f <+82>: fldcw 0x28(%esp)
0x08048473 <+86>: fistpl 0x40(%esp)
0x08048477 <+90>: fldcw 0x2a(%esp)
0x0804847b <+94>: flds 0x48(%esp)
0x0804847f <+98>: fldl 0x80485c0
0x08048485 <+104>: fmulp %st,%st(1)
0x08048487 <+106>: fldcw 0x28(%esp)
0x0804848b <+110>: fistpl 0x3c(%esp) // f1 * 10 convert int
0x0804848f <+114>: fldcw 0x2a(%esp)
0x08048493 <+118>: flds 0x48(%esp)
0x08048497 <+122>: fldl 0x80485c0
0x0804849d <+128>: fmulp %st,%st(1)
0x0804849f <+130>: flds 0x44(%esp)
0x080484a3 <+134>: fxch %st(1)
0x080484a5 <+136>: mov 0x3c(%esp),%eax
0x080484a9 <+140>: mov 0x40(%esp),%edx
0x080484ad <+144>: sub %eax,%edx
0x080484af <+146>: mov %edx,%eax
0x080484b1 <+148>: fstpl 0x18(%esp)
0x080484b5 <+152>: fstpl 0x10(%esp)
0x080484b9 <+156>: mov %eax,0xc(%esp)
0x080484bd <+160>: mov 0x3c(%esp),%eax
0x080484c1 <+164>: mov %eax,0x8(%esp)
0x080484c5 <+168>: mov 0x40(%esp),%eax
0x080484c9 <+172>: mov %eax,0x4(%esp)
0x080484cd <+176>: movl $0x8048588,(%esp)
0x080484d4 <+183>: call 0x80482f0 <printf@plt>
0x080484d9 <+188>: addl $0x1,0x4c(%esp)
0x080484de <+193>: cmpl $0x14,0x4c(%esp)
0x080484e3 <+198>: jle 0x8048433 <main+22>
0x080484e9 <+204>: leave
0x080484ea <+205>: ret





share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54881541%2fdifference-in-casting-float-to-int-32-bit-c%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    7














    With MS Visual C 2008 I was able to reproduce this.



    Inspecting the assembler, the difference between the two is an intermediate store and fetch of a result with intermediate conversions:



     f10 = f1*10.0; // double result f10 converted to float and stored
    c1 = (int)f10; // float result f10 fetched and converted to double
    c2 = (int)(f1*10.0); // no store/fetch/convert


    The assembler generated pushes values onto the FPU stack that get converted to 64 bits and then are multiplied. For c1 the result is then converted back to float and stored and is then retrieved again and placed on the FPU stack (and converted to double again) for a call to __ftol2_sse, a run-time function to convert a double to int.



    For c2 the intermediate value is not converted to and from float and passed immediately to the __ftol2_sse function. For this function see also the answer at Convert double to int?.



    Assembler:



     f10 = f1*10;
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    fstp dword ptr [f10]

    c2 = (int)(f1*10);
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    call __ftol2_sse
    mov dword ptr [c2],eax

    c1 = (int)f10;
    fld dword ptr [f10]
    call __ftol2_sse
    mov dword ptr [c1],eax





    share|improve this answer


















    • 3





      The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

      – Eric Postpischil
      Feb 26 at 13:07











    • @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

      – Martin Bonner
      Feb 26 at 13:15











    • @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

      – Eric Postpischil
      Feb 26 at 13:46











    • I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

      – Paul Ogilvie
      Feb 26 at 13:50











    • @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

      – Eric Postpischil
      Feb 26 at 13:57















    7














    With MS Visual C 2008 I was able to reproduce this.



    Inspecting the assembler, the difference between the two is an intermediate store and fetch of a result with intermediate conversions:



     f10 = f1*10.0; // double result f10 converted to float and stored
    c1 = (int)f10; // float result f10 fetched and converted to double
    c2 = (int)(f1*10.0); // no store/fetch/convert


    The assembler generated pushes values onto the FPU stack that get converted to 64 bits and then are multiplied. For c1 the result is then converted back to float and stored and is then retrieved again and placed on the FPU stack (and converted to double again) for a call to __ftol2_sse, a run-time function to convert a double to int.



    For c2 the intermediate value is not converted to and from float and passed immediately to the __ftol2_sse function. For this function see also the answer at Convert double to int?.



    Assembler:



     f10 = f1*10;
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    fstp dword ptr [f10]

    c2 = (int)(f1*10);
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    call __ftol2_sse
    mov dword ptr [c2],eax

    c1 = (int)f10;
    fld dword ptr [f10]
    call __ftol2_sse
    mov dword ptr [c1],eax





    share|improve this answer


















    • 3





      The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

      – Eric Postpischil
      Feb 26 at 13:07











    • @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

      – Martin Bonner
      Feb 26 at 13:15











    • @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

      – Eric Postpischil
      Feb 26 at 13:46











    • I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

      – Paul Ogilvie
      Feb 26 at 13:50











    • @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

      – Eric Postpischil
      Feb 26 at 13:57













    7












    7








    7







    With MS Visual C 2008 I was able to reproduce this.



    Inspecting the assembler, the difference between the two is an intermediate store and fetch of a result with intermediate conversions:



     f10 = f1*10.0; // double result f10 converted to float and stored
    c1 = (int)f10; // float result f10 fetched and converted to double
    c2 = (int)(f1*10.0); // no store/fetch/convert


    The assembler generated pushes values onto the FPU stack that get converted to 64 bits and then are multiplied. For c1 the result is then converted back to float and stored and is then retrieved again and placed on the FPU stack (and converted to double again) for a call to __ftol2_sse, a run-time function to convert a double to int.



    For c2 the intermediate value is not converted to and from float and passed immediately to the __ftol2_sse function. For this function see also the answer at Convert double to int?.



    Assembler:



     f10 = f1*10;
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    fstp dword ptr [f10]

    c2 = (int)(f1*10);
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    call __ftol2_sse
    mov dword ptr [c2],eax

    c1 = (int)f10;
    fld dword ptr [f10]
    call __ftol2_sse
    mov dword ptr [c1],eax





    share|improve this answer













    With MS Visual C 2008 I was able to reproduce this.



    Inspecting the assembler, the difference between the two is an intermediate store and fetch of a result with intermediate conversions:



     f10 = f1*10.0; // double result f10 converted to float and stored
    c1 = (int)f10; // float result f10 fetched and converted to double
    c2 = (int)(f1*10.0); // no store/fetch/convert


    The assembler generated pushes values onto the FPU stack that get converted to 64 bits and then are multiplied. For c1 the result is then converted back to float and stored and is then retrieved again and placed on the FPU stack (and converted to double again) for a call to __ftol2_sse, a run-time function to convert a double to int.



    For c2 the intermediate value is not converted to and from float and passed immediately to the __ftol2_sse function. For this function see also the answer at Convert double to int?.



    Assembler:



     f10 = f1*10;
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    fstp dword ptr [f10]

    c2 = (int)(f1*10);
    fld dword ptr [f1]
    fmul qword ptr [__real@4024000000000000 (496190h)]
    call __ftol2_sse
    mov dword ptr [c2],eax

    c1 = (int)f10;
    fld dword ptr [f10]
    call __ftol2_sse
    mov dword ptr [c1],eax






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Feb 26 at 10:10









    Paul OgilviePaul Ogilvie

    18.7k31235




    18.7k31235







    • 3





      The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

      – Eric Postpischil
      Feb 26 at 13:07











    • @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

      – Martin Bonner
      Feb 26 at 13:15











    • @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

      – Eric Postpischil
      Feb 26 at 13:46











    • I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

      – Paul Ogilvie
      Feb 26 at 13:50











    • @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

      – Eric Postpischil
      Feb 26 at 13:57












    • 3





      The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

      – Eric Postpischil
      Feb 26 at 13:07











    • @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

      – Martin Bonner
      Feb 26 at 13:15











    • @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

      – Eric Postpischil
      Feb 26 at 13:46











    • I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

      – Paul Ogilvie
      Feb 26 at 13:50











    • @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

      – Eric Postpischil
      Feb 26 at 13:57







    3




    3





    The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

    – Eric Postpischil
    Feb 26 at 13:07





    The stores and fetches are immaterial; they are merely implementations of the C semantics. The key reason differences are observed between f1*10.0 and f10 is the former is a double expression while the latter is a float. The assignment f10 = f1*10.0; changes the value when it converts the double to float.

    – Eric Postpischil
    Feb 26 at 13:07













    @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

    – Martin Bonner
    Feb 26 at 13:15





    @EricPostpischil Are you sure? I am pretty sure that f1*10.0 will have the precision of a long double (80 bits), rather than a double (just 64 bits).

    – Martin Bonner
    Feb 26 at 13:15













    @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

    – Eric Postpischil
    Feb 26 at 13:46





    @MartinBonner: It is irrelevant. f10 is a float; its significand is 24 bits on this platform. 10 has three significant bits, so multiplying by it produces a number with at most 27 bits. So its exact value will fit into the 53-bit significand of a double. Extended it to 64 bits will not change anything.

    – Eric Postpischil
    Feb 26 at 13:46













    I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

    – Paul Ogilvie
    Feb 26 at 13:50





    I think the correct answer is indeed @EricPostpischil's observation that in C all floating point calculations are performed in double precision. Hence the original code behaves as designed. My answer is just the identification of what happens "under the hood".

    – Paul Ogilvie
    Feb 26 at 13:50













    @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

    – Eric Postpischil
    Feb 26 at 13:57





    @PaulOgilvie: It is not that all floating-point calculations are performed in double but that using one operand of type double causes other operands to be converted to double, per the usual arithmetic conversions. If all operands are of type float, as with using 10.0f instead of 10.0, the expression type remains float (although an implementation is permitted to use double to evaluate it, but that does not seem to be happening in OP’s code).

    – Eric Postpischil
    Feb 26 at 13:57













    5














    In the “32-bit system,” the difference is caused by the fact that f1*10.0 uses full double precision, while f10 has only float precision because that is its type. f1*10.0 uses double precision because 10.0 is a double constant. When f1*10.0 is assigned to f10, the value changes because it is implicitly converted to float, which has less precision.



    If you use the float constant 10.0f instead, the differences vanish.



    Consider the first case, when i is 1. Then:



    • In f1 = 3+i*0.1, 0.1 is a double constant, so the arithmetic is performed in double, and the result is 3.100000000000000088817841970012523233890533447265625. Then, to assign this to f1, it is converted to float, which produces 3.099999904632568359375.

    • In f10 = f1*10.0;, 10.0 is a double constant, so the arithmetic is again performed in double, and the result is 30.99999904632568359375. For assignment to f10, this is converted to float, and the result is 31.

    • Later, when f10 and f1*10.0 are printed, we see the values given above, with nine digits after the decimal point, “31.000000000” for f10, and “30.999999046”.

    If you print f1*10.0f, with the float constant 10.0f instead of the double constant 10.0, the result will be “31.000000000” rather than “30.999999046”.



    (The above uses IEEE-754 basic 32-bit and 64-bit binary floating-point arithmetic.)



    In particular, note this: The difference between f1*10.0 and f10 arises when f1*10.0 is converted to float for assignment to f10. While C permits implementations to use extra precision in evaluating expressions, it requires implementations to discard this precision in assignments and casts. Therefore, in a standard-conforming compiler, the assignment to f10 must use float precision. This means, even when the program is compiled for a “64-bit system,” the differences should occur. If they do not, the compiler does not conforming to the C standard.



    Furthermore, if float is changed to double, the conversion to float does not happen, and the value will not be changed. In this case, no differences between f1*10.0 and f10 should manifest.



    Given that the question reports the differences do not manifest with a “64-bit” compilation and do manifest with double, it is questionable whether the observations have been reported accurately. To clarify this, the exact code should be shown, and the observations should be reproduced by a third party.






    share|improve this answer

























    • The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

      – Nikos C.
      Feb 26 at 13:48












    • @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

      – Eric Postpischil
      Feb 26 at 13:53
















    5














    In the “32-bit system,” the difference is caused by the fact that f1*10.0 uses full double precision, while f10 has only float precision because that is its type. f1*10.0 uses double precision because 10.0 is a double constant. When f1*10.0 is assigned to f10, the value changes because it is implicitly converted to float, which has less precision.



    If you use the float constant 10.0f instead, the differences vanish.



    Consider the first case, when i is 1. Then:



    • In f1 = 3+i*0.1, 0.1 is a double constant, so the arithmetic is performed in double, and the result is 3.100000000000000088817841970012523233890533447265625. Then, to assign this to f1, it is converted to float, which produces 3.099999904632568359375.

    • In f10 = f1*10.0;, 10.0 is a double constant, so the arithmetic is again performed in double, and the result is 30.99999904632568359375. For assignment to f10, this is converted to float, and the result is 31.

    • Later, when f10 and f1*10.0 are printed, we see the values given above, with nine digits after the decimal point, “31.000000000” for f10, and “30.999999046”.

    If you print f1*10.0f, with the float constant 10.0f instead of the double constant 10.0, the result will be “31.000000000” rather than “30.999999046”.



    (The above uses IEEE-754 basic 32-bit and 64-bit binary floating-point arithmetic.)



    In particular, note this: The difference between f1*10.0 and f10 arises when f1*10.0 is converted to float for assignment to f10. While C permits implementations to use extra precision in evaluating expressions, it requires implementations to discard this precision in assignments and casts. Therefore, in a standard-conforming compiler, the assignment to f10 must use float precision. This means, even when the program is compiled for a “64-bit system,” the differences should occur. If they do not, the compiler does not conforming to the C standard.



    Furthermore, if float is changed to double, the conversion to float does not happen, and the value will not be changed. In this case, no differences between f1*10.0 and f10 should manifest.



    Given that the question reports the differences do not manifest with a “64-bit” compilation and do manifest with double, it is questionable whether the observations have been reported accurately. To clarify this, the exact code should be shown, and the observations should be reproduced by a third party.






    share|improve this answer

























    • The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

      – Nikos C.
      Feb 26 at 13:48












    • @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

      – Eric Postpischil
      Feb 26 at 13:53














    5












    5








    5







    In the “32-bit system,” the difference is caused by the fact that f1*10.0 uses full double precision, while f10 has only float precision because that is its type. f1*10.0 uses double precision because 10.0 is a double constant. When f1*10.0 is assigned to f10, the value changes because it is implicitly converted to float, which has less precision.



    If you use the float constant 10.0f instead, the differences vanish.



    Consider the first case, when i is 1. Then:



    • In f1 = 3+i*0.1, 0.1 is a double constant, so the arithmetic is performed in double, and the result is 3.100000000000000088817841970012523233890533447265625. Then, to assign this to f1, it is converted to float, which produces 3.099999904632568359375.

    • In f10 = f1*10.0;, 10.0 is a double constant, so the arithmetic is again performed in double, and the result is 30.99999904632568359375. For assignment to f10, this is converted to float, and the result is 31.

    • Later, when f10 and f1*10.0 are printed, we see the values given above, with nine digits after the decimal point, “31.000000000” for f10, and “30.999999046”.

    If you print f1*10.0f, with the float constant 10.0f instead of the double constant 10.0, the result will be “31.000000000” rather than “30.999999046”.



    (The above uses IEEE-754 basic 32-bit and 64-bit binary floating-point arithmetic.)



    In particular, note this: The difference between f1*10.0 and f10 arises when f1*10.0 is converted to float for assignment to f10. While C permits implementations to use extra precision in evaluating expressions, it requires implementations to discard this precision in assignments and casts. Therefore, in a standard-conforming compiler, the assignment to f10 must use float precision. This means, even when the program is compiled for a “64-bit system,” the differences should occur. If they do not, the compiler does not conforming to the C standard.



    Furthermore, if float is changed to double, the conversion to float does not happen, and the value will not be changed. In this case, no differences between f1*10.0 and f10 should manifest.



    Given that the question reports the differences do not manifest with a “64-bit” compilation and do manifest with double, it is questionable whether the observations have been reported accurately. To clarify this, the exact code should be shown, and the observations should be reproduced by a third party.






    share|improve this answer















    In the “32-bit system,” the difference is caused by the fact that f1*10.0 uses full double precision, while f10 has only float precision because that is its type. f1*10.0 uses double precision because 10.0 is a double constant. When f1*10.0 is assigned to f10, the value changes because it is implicitly converted to float, which has less precision.



    If you use the float constant 10.0f instead, the differences vanish.



    Consider the first case, when i is 1. Then:



    • In f1 = 3+i*0.1, 0.1 is a double constant, so the arithmetic is performed in double, and the result is 3.100000000000000088817841970012523233890533447265625. Then, to assign this to f1, it is converted to float, which produces 3.099999904632568359375.

    • In f10 = f1*10.0;, 10.0 is a double constant, so the arithmetic is again performed in double, and the result is 30.99999904632568359375. For assignment to f10, this is converted to float, and the result is 31.

    • Later, when f10 and f1*10.0 are printed, we see the values given above, with nine digits after the decimal point, “31.000000000” for f10, and “30.999999046”.

    If you print f1*10.0f, with the float constant 10.0f instead of the double constant 10.0, the result will be “31.000000000” rather than “30.999999046”.



    (The above uses IEEE-754 basic 32-bit and 64-bit binary floating-point arithmetic.)



    In particular, note this: The difference between f1*10.0 and f10 arises when f1*10.0 is converted to float for assignment to f10. While C permits implementations to use extra precision in evaluating expressions, it requires implementations to discard this precision in assignments and casts. Therefore, in a standard-conforming compiler, the assignment to f10 must use float precision. This means, even when the program is compiled for a “64-bit system,” the differences should occur. If they do not, the compiler does not conforming to the C standard.



    Furthermore, if float is changed to double, the conversion to float does not happen, and the value will not be changed. In this case, no differences between f1*10.0 and f10 should manifest.



    Given that the question reports the differences do not manifest with a “64-bit” compilation and do manifest with double, it is questionable whether the observations have been reported accurately. To clarify this, the exact code should be shown, and the observations should be reproduced by a third party.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Feb 26 at 14:10

























    answered Feb 26 at 13:04









    Eric PostpischilEric Postpischil

    79k886167




    79k886167












    • The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

      – Nikos C.
      Feb 26 at 13:48












    • @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

      – Eric Postpischil
      Feb 26 at 13:53


















    • The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

      – Nikos C.
      Feb 26 at 13:48












    • @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

      – Eric Postpischil
      Feb 26 at 13:53

















    The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

    – Nikos C.
    Feb 26 at 13:48






    The question was why is there a difference between an -m32 and an -m64 build of the same source code, not what the difference between 10.0 and 10.0f is.

    – Nikos C.
    Feb 26 at 13:48














    @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

    – Eric Postpischil
    Feb 26 at 13:53






    @NikosC.: Indeed, but the question falsely states “it is clear that a difference exists between casting a variable and an expression.” This answer disproves that and explains the difference. The question fails to present convincing evidence there is a difference between the 32-bit and 64-bit compilations—i suspect user error. I have updated the answer to note the issue.

    – Eric Postpischil
    Feb 26 at 13:53












    2














    The C standard is not very strict about how floating point math is to be carried out. The standard allows an implementation to do calculation with higher precision than the involved types.



    The result in your case is likely to come from the fact that c1 is calculated as "float-to-int" while c2 is calculated as "double-to-int" (or even higher precision).



    Here is another example showing the same behavior.



    #define DD 0.11111111

    int main()

    int i = 27;

    int c1,c2,c3;
    float f1;
    double d1;
    printf("%.60fn", DD);

    f1 = i * DD;
    d1 = i * DD;
    c1 = (int)f1;
    c2 = (int)(i * DD);
    c3 = (int)d1;

    printf("----------------------n");
    printf("f1: %.60fn", f1);
    printf("d1: %.60fn", d1);
    printf("m : %.60fn", i * DD);
    printf("%d, %d, %dn",c1,c2,c3);



    My output:



    0.111111109999999999042863407794357044622302055358886718750000
    ----------------------
    f1: 3.000000000000000000000000000000000000000000000000000000000000
    d1: 2.999999970000000182324129127664491534233093261718750000000000
    m : 2.999999970000000182324129127664491534233093261718750000000000
    3, 2, 2


    The trick here is the number of ones in 0.11111111. The accurate result is "2.99999997". As you change the number of ones the accurate result is still in the form "2.99...997" (i.e. the number of 9 increases when the number of 1 increases).



    At some point (aka some number of ones) you will reach a point where storing the result in a float rounds the result to "3.0" while the double is still able to hold "2.999999.....". Then a conversion to int will give different results.



    Increasing the number of ones further will lead to a point where the double will also round to "3.0" and the conversion to int will consequently yield the same result.






    share|improve this answer

























    • On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

      – Nikos C.
      Feb 26 at 10:12











    • The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

      – Eric Postpischil
      Feb 26 at 13:05











    • @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

      – 4386427
      Feb 26 at 14:43















    2














    The C standard is not very strict about how floating point math is to be carried out. The standard allows an implementation to do calculation with higher precision than the involved types.



    The result in your case is likely to come from the fact that c1 is calculated as "float-to-int" while c2 is calculated as "double-to-int" (or even higher precision).



    Here is another example showing the same behavior.



    #define DD 0.11111111

    int main()

    int i = 27;

    int c1,c2,c3;
    float f1;
    double d1;
    printf("%.60fn", DD);

    f1 = i * DD;
    d1 = i * DD;
    c1 = (int)f1;
    c2 = (int)(i * DD);
    c3 = (int)d1;

    printf("----------------------n");
    printf("f1: %.60fn", f1);
    printf("d1: %.60fn", d1);
    printf("m : %.60fn", i * DD);
    printf("%d, %d, %dn",c1,c2,c3);



    My output:



    0.111111109999999999042863407794357044622302055358886718750000
    ----------------------
    f1: 3.000000000000000000000000000000000000000000000000000000000000
    d1: 2.999999970000000182324129127664491534233093261718750000000000
    m : 2.999999970000000182324129127664491534233093261718750000000000
    3, 2, 2


    The trick here is the number of ones in 0.11111111. The accurate result is "2.99999997". As you change the number of ones the accurate result is still in the form "2.99...997" (i.e. the number of 9 increases when the number of 1 increases).



    At some point (aka some number of ones) you will reach a point where storing the result in a float rounds the result to "3.0" while the double is still able to hold "2.999999.....". Then a conversion to int will give different results.



    Increasing the number of ones further will lead to a point where the double will also round to "3.0" and the conversion to int will consequently yield the same result.






    share|improve this answer

























    • On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

      – Nikos C.
      Feb 26 at 10:12











    • The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

      – Eric Postpischil
      Feb 26 at 13:05











    • @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

      – 4386427
      Feb 26 at 14:43













    2












    2








    2







    The C standard is not very strict about how floating point math is to be carried out. The standard allows an implementation to do calculation with higher precision than the involved types.



    The result in your case is likely to come from the fact that c1 is calculated as "float-to-int" while c2 is calculated as "double-to-int" (or even higher precision).



    Here is another example showing the same behavior.



    #define DD 0.11111111

    int main()

    int i = 27;

    int c1,c2,c3;
    float f1;
    double d1;
    printf("%.60fn", DD);

    f1 = i * DD;
    d1 = i * DD;
    c1 = (int)f1;
    c2 = (int)(i * DD);
    c3 = (int)d1;

    printf("----------------------n");
    printf("f1: %.60fn", f1);
    printf("d1: %.60fn", d1);
    printf("m : %.60fn", i * DD);
    printf("%d, %d, %dn",c1,c2,c3);



    My output:



    0.111111109999999999042863407794357044622302055358886718750000
    ----------------------
    f1: 3.000000000000000000000000000000000000000000000000000000000000
    d1: 2.999999970000000182324129127664491534233093261718750000000000
    m : 2.999999970000000182324129127664491534233093261718750000000000
    3, 2, 2


    The trick here is the number of ones in 0.11111111. The accurate result is "2.99999997". As you change the number of ones the accurate result is still in the form "2.99...997" (i.e. the number of 9 increases when the number of 1 increases).



    At some point (aka some number of ones) you will reach a point where storing the result in a float rounds the result to "3.0" while the double is still able to hold "2.999999.....". Then a conversion to int will give different results.



    Increasing the number of ones further will lead to a point where the double will also round to "3.0" and the conversion to int will consequently yield the same result.






    share|improve this answer















    The C standard is not very strict about how floating point math is to be carried out. The standard allows an implementation to do calculation with higher precision than the involved types.



    The result in your case is likely to come from the fact that c1 is calculated as "float-to-int" while c2 is calculated as "double-to-int" (or even higher precision).



    Here is another example showing the same behavior.



    #define DD 0.11111111

    int main()

    int i = 27;

    int c1,c2,c3;
    float f1;
    double d1;
    printf("%.60fn", DD);

    f1 = i * DD;
    d1 = i * DD;
    c1 = (int)f1;
    c2 = (int)(i * DD);
    c3 = (int)d1;

    printf("----------------------n");
    printf("f1: %.60fn", f1);
    printf("d1: %.60fn", d1);
    printf("m : %.60fn", i * DD);
    printf("%d, %d, %dn",c1,c2,c3);



    My output:



    0.111111109999999999042863407794357044622302055358886718750000
    ----------------------
    f1: 3.000000000000000000000000000000000000000000000000000000000000
    d1: 2.999999970000000182324129127664491534233093261718750000000000
    m : 2.999999970000000182324129127664491534233093261718750000000000
    3, 2, 2


    The trick here is the number of ones in 0.11111111. The accurate result is "2.99999997". As you change the number of ones the accurate result is still in the form "2.99...997" (i.e. the number of 9 increases when the number of 1 increases).



    At some point (aka some number of ones) you will reach a point where storing the result in a float rounds the result to "3.0" while the double is still able to hold "2.999999.....". Then a conversion to int will give different results.



    Increasing the number of ones further will lead to a point where the double will also round to "3.0" and the conversion to int will consequently yield the same result.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Feb 26 at 10:35

























    answered Feb 26 at 10:10









    43864274386427

    21.9k31846




    21.9k31846












    • On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

      – Nikos C.
      Feb 26 at 10:12











    • The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

      – Eric Postpischil
      Feb 26 at 13:05











    • @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

      – 4386427
      Feb 26 at 14:43

















    • On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

      – Nikos C.
      Feb 26 at 10:12











    • The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

      – Eric Postpischil
      Feb 26 at 13:05











    • @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

      – 4386427
      Feb 26 at 14:43
















    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:12





    On some platforms and GCC versions, the floating point co-processor is used for intermediate computations (what the OP called "expressions"). These are 80-bit. On 64-bit systems though this is never used, and you get SSE 64-bit computations instead.

    – Nikos C.
    Feb 26 at 10:12













    The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

    – Eric Postpischil
    Feb 26 at 13:05





    The differences do not arise in the conversions to int. The difference arises when f1*10.0, which is a double expression, is assigned to f10, which is a float. That assignment changes the value.

    – Eric Postpischil
    Feb 26 at 13:05













    @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

    – 4386427
    Feb 26 at 14:43





    @EricPostpischil That was actually what I tried to write. Here: "... reach a point where storing the result in a float rounds the result to "3.0""

    – 4386427
    Feb 26 at 14:43











    1














    The main reason is that the rounding-control (RC) field of the x87 FPU control register values are inconsistent in the follow two lines. eventually the values of c1 and c2 are different.



    0x08048457 <+58>: fstps 0x44(%esp)
    0x0804848b <+110>: fistpl 0x3c(%esp)


    Add gcc compile option -mfpmath=387 -mno-sse, it can be reproduced (Even without -m32, or change the float to a double)

    Like this:



    gcc -otest test.c -g -mfpmath=387 -mno-sse -m32


    Then use gdb to debug, breakpoint at 0x0804845b, and run to i=1



     0x08048457 <+58>: fstps 0x44(%esp)
    0x0804845b <+62>: flds 0x44(%esp)

    (gdb) info float
    =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
    R6: Empty 0x4002a000000000000000
    R5: Empty 0x00000000000000000000
    R4: Empty 0x00000000000000000000
    R3: Empty 0x00000000000000000000
    R2: Empty 0x00000000000000000000
    R1: Empty 0x00000000000000000000
    R0: Empty 0x00000000000000000000

    Status Word: 0x3820 PE
    TOP: 7
    Control Word: 0x037f IM DM ZM OM UM PM
    PC: Extended Precision (64-bits)
    RC: Round to nearest
    Tag Word: 0x3fff
    Instruction Pointer: 0x00:0x08048455
    Operand Pointer: 0x00:0x00000000
    Opcode: 0x0000

    (gdb) x /xw 0x44+$esp
    0xffffb594: 0x41f80000 ==> 31.0, s=0, M=1.1111 E=4


    observe the execution results of fstps,

    at this time, the RC value on the control register on the fpu is Round to nearest.

    the value on the fpu register is 30.99999904632568359 (80bit).

    the value on 0x44(%esp) (variable "f10") is 31.0. (round to nearest)



    Then use gdb to debug, breakpoint at 0x0804848b, and run to i=1



     0x0804848b <+110>: fistpl 0x3c(%esp)

    (gdb) info float
    =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
    R6: Empty 0x4002a000000000000000
    R5: Empty 0x00000000000000000000
    R4: Empty 0x00000000000000000000
    R3: Empty 0x00000000000000000000
    R2: Empty 0x00000000000000000000
    R1: Empty 0x00000000000000000000
    R0: Empty 0x00000000000000000000

    Status Word: 0x3820 PE
    TOP: 7
    Control Word: 0x0c7f IM DM ZM OM UM PM
    PC: Single Precision (24-bits)
    RC: Round toward zero
    Tag Word: 0x3fff
    Instruction Pointer: 0x00:0x08048485
    Operand Pointer: 0x00:0x00000000
    Opcode: 0x0000


    at this time, the RC value on the control register on the fpu is Round toward zero.

    the value on the fpu register is 30.99999904632568359 (80bit). value is the same as above

    obviously when the integer is converted, the decimal point is truncated, and the value is 30.



    Below is the main decompiled code



     (gdb) disas main
    Dump of assembler code for function main:
    0x0804841d <+0>: push %ebp
    0x0804841e <+1>: mov %esp,%ebp
    0x08048420 <+3>: and $0xfffffff0,%esp
    0x08048423 <+6>: sub $0x50,%esp
    0x08048426 <+9>: movl $0x0,0x4c(%esp)
    0x0804842e <+17>: jmp 0x80484de <main+193>
    0x08048433 <+22>: fildl 0x4c(%esp)
    0x08048437 <+26>: fldl 0x80485a8
    0x0804843d <+32>: fmulp %st,%st(1)
    0x0804843f <+34>: fldl 0x80485b0
    0x08048445 <+40>: faddp %st,%st(1)
    0x08048447 <+42>: fstps 0x48(%esp)
    0x0804844b <+46>: flds 0x48(%esp)
    0x0804844f <+50>: flds 0x80485b8
    0x08048455 <+56>: fmulp %st,%st(1)
    0x08048457 <+58>: fstps 0x44(%esp) // store to f10
    0x0804845b <+62>: flds 0x44(%esp)
    0x0804845f <+66>: fnstcw 0x2a(%esp)
    0x08048463 <+70>: movzwl 0x2a(%esp),%eax
    0x08048468 <+75>: mov $0xc,%ah
    0x0804846a <+77>: mov %ax,0x28(%esp)
    0x0804846f <+82>: fldcw 0x28(%esp)
    0x08048473 <+86>: fistpl 0x40(%esp)
    0x08048477 <+90>: fldcw 0x2a(%esp)
    0x0804847b <+94>: flds 0x48(%esp)
    0x0804847f <+98>: fldl 0x80485c0
    0x08048485 <+104>: fmulp %st,%st(1)
    0x08048487 <+106>: fldcw 0x28(%esp)
    0x0804848b <+110>: fistpl 0x3c(%esp) // f1 * 10 convert int
    0x0804848f <+114>: fldcw 0x2a(%esp)
    0x08048493 <+118>: flds 0x48(%esp)
    0x08048497 <+122>: fldl 0x80485c0
    0x0804849d <+128>: fmulp %st,%st(1)
    0x0804849f <+130>: flds 0x44(%esp)
    0x080484a3 <+134>: fxch %st(1)
    0x080484a5 <+136>: mov 0x3c(%esp),%eax
    0x080484a9 <+140>: mov 0x40(%esp),%edx
    0x080484ad <+144>: sub %eax,%edx
    0x080484af <+146>: mov %edx,%eax
    0x080484b1 <+148>: fstpl 0x18(%esp)
    0x080484b5 <+152>: fstpl 0x10(%esp)
    0x080484b9 <+156>: mov %eax,0xc(%esp)
    0x080484bd <+160>: mov 0x3c(%esp),%eax
    0x080484c1 <+164>: mov %eax,0x8(%esp)
    0x080484c5 <+168>: mov 0x40(%esp),%eax
    0x080484c9 <+172>: mov %eax,0x4(%esp)
    0x080484cd <+176>: movl $0x8048588,(%esp)
    0x080484d4 <+183>: call 0x80482f0 <printf@plt>
    0x080484d9 <+188>: addl $0x1,0x4c(%esp)
    0x080484de <+193>: cmpl $0x14,0x4c(%esp)
    0x080484e3 <+198>: jle 0x8048433 <main+22>
    0x080484e9 <+204>: leave
    0x080484ea <+205>: ret





    share|improve this answer





























      1














      The main reason is that the rounding-control (RC) field of the x87 FPU control register values are inconsistent in the follow two lines. eventually the values of c1 and c2 are different.



      0x08048457 <+58>: fstps 0x44(%esp)
      0x0804848b <+110>: fistpl 0x3c(%esp)


      Add gcc compile option -mfpmath=387 -mno-sse, it can be reproduced (Even without -m32, or change the float to a double)

      Like this:



      gcc -otest test.c -g -mfpmath=387 -mno-sse -m32


      Then use gdb to debug, breakpoint at 0x0804845b, and run to i=1



       0x08048457 <+58>: fstps 0x44(%esp)
      0x0804845b <+62>: flds 0x44(%esp)

      (gdb) info float
      =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
      R6: Empty 0x4002a000000000000000
      R5: Empty 0x00000000000000000000
      R4: Empty 0x00000000000000000000
      R3: Empty 0x00000000000000000000
      R2: Empty 0x00000000000000000000
      R1: Empty 0x00000000000000000000
      R0: Empty 0x00000000000000000000

      Status Word: 0x3820 PE
      TOP: 7
      Control Word: 0x037f IM DM ZM OM UM PM
      PC: Extended Precision (64-bits)
      RC: Round to nearest
      Tag Word: 0x3fff
      Instruction Pointer: 0x00:0x08048455
      Operand Pointer: 0x00:0x00000000
      Opcode: 0x0000

      (gdb) x /xw 0x44+$esp
      0xffffb594: 0x41f80000 ==> 31.0, s=0, M=1.1111 E=4


      observe the execution results of fstps,

      at this time, the RC value on the control register on the fpu is Round to nearest.

      the value on the fpu register is 30.99999904632568359 (80bit).

      the value on 0x44(%esp) (variable "f10") is 31.0. (round to nearest)



      Then use gdb to debug, breakpoint at 0x0804848b, and run to i=1



       0x0804848b <+110>: fistpl 0x3c(%esp)

      (gdb) info float
      =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
      R6: Empty 0x4002a000000000000000
      R5: Empty 0x00000000000000000000
      R4: Empty 0x00000000000000000000
      R3: Empty 0x00000000000000000000
      R2: Empty 0x00000000000000000000
      R1: Empty 0x00000000000000000000
      R0: Empty 0x00000000000000000000

      Status Word: 0x3820 PE
      TOP: 7
      Control Word: 0x0c7f IM DM ZM OM UM PM
      PC: Single Precision (24-bits)
      RC: Round toward zero
      Tag Word: 0x3fff
      Instruction Pointer: 0x00:0x08048485
      Operand Pointer: 0x00:0x00000000
      Opcode: 0x0000


      at this time, the RC value on the control register on the fpu is Round toward zero.

      the value on the fpu register is 30.99999904632568359 (80bit). value is the same as above

      obviously when the integer is converted, the decimal point is truncated, and the value is 30.



      Below is the main decompiled code



       (gdb) disas main
      Dump of assembler code for function main:
      0x0804841d <+0>: push %ebp
      0x0804841e <+1>: mov %esp,%ebp
      0x08048420 <+3>: and $0xfffffff0,%esp
      0x08048423 <+6>: sub $0x50,%esp
      0x08048426 <+9>: movl $0x0,0x4c(%esp)
      0x0804842e <+17>: jmp 0x80484de <main+193>
      0x08048433 <+22>: fildl 0x4c(%esp)
      0x08048437 <+26>: fldl 0x80485a8
      0x0804843d <+32>: fmulp %st,%st(1)
      0x0804843f <+34>: fldl 0x80485b0
      0x08048445 <+40>: faddp %st,%st(1)
      0x08048447 <+42>: fstps 0x48(%esp)
      0x0804844b <+46>: flds 0x48(%esp)
      0x0804844f <+50>: flds 0x80485b8
      0x08048455 <+56>: fmulp %st,%st(1)
      0x08048457 <+58>: fstps 0x44(%esp) // store to f10
      0x0804845b <+62>: flds 0x44(%esp)
      0x0804845f <+66>: fnstcw 0x2a(%esp)
      0x08048463 <+70>: movzwl 0x2a(%esp),%eax
      0x08048468 <+75>: mov $0xc,%ah
      0x0804846a <+77>: mov %ax,0x28(%esp)
      0x0804846f <+82>: fldcw 0x28(%esp)
      0x08048473 <+86>: fistpl 0x40(%esp)
      0x08048477 <+90>: fldcw 0x2a(%esp)
      0x0804847b <+94>: flds 0x48(%esp)
      0x0804847f <+98>: fldl 0x80485c0
      0x08048485 <+104>: fmulp %st,%st(1)
      0x08048487 <+106>: fldcw 0x28(%esp)
      0x0804848b <+110>: fistpl 0x3c(%esp) // f1 * 10 convert int
      0x0804848f <+114>: fldcw 0x2a(%esp)
      0x08048493 <+118>: flds 0x48(%esp)
      0x08048497 <+122>: fldl 0x80485c0
      0x0804849d <+128>: fmulp %st,%st(1)
      0x0804849f <+130>: flds 0x44(%esp)
      0x080484a3 <+134>: fxch %st(1)
      0x080484a5 <+136>: mov 0x3c(%esp),%eax
      0x080484a9 <+140>: mov 0x40(%esp),%edx
      0x080484ad <+144>: sub %eax,%edx
      0x080484af <+146>: mov %edx,%eax
      0x080484b1 <+148>: fstpl 0x18(%esp)
      0x080484b5 <+152>: fstpl 0x10(%esp)
      0x080484b9 <+156>: mov %eax,0xc(%esp)
      0x080484bd <+160>: mov 0x3c(%esp),%eax
      0x080484c1 <+164>: mov %eax,0x8(%esp)
      0x080484c5 <+168>: mov 0x40(%esp),%eax
      0x080484c9 <+172>: mov %eax,0x4(%esp)
      0x080484cd <+176>: movl $0x8048588,(%esp)
      0x080484d4 <+183>: call 0x80482f0 <printf@plt>
      0x080484d9 <+188>: addl $0x1,0x4c(%esp)
      0x080484de <+193>: cmpl $0x14,0x4c(%esp)
      0x080484e3 <+198>: jle 0x8048433 <main+22>
      0x080484e9 <+204>: leave
      0x080484ea <+205>: ret





      share|improve this answer



























        1












        1








        1







        The main reason is that the rounding-control (RC) field of the x87 FPU control register values are inconsistent in the follow two lines. eventually the values of c1 and c2 are different.



        0x08048457 <+58>: fstps 0x44(%esp)
        0x0804848b <+110>: fistpl 0x3c(%esp)


        Add gcc compile option -mfpmath=387 -mno-sse, it can be reproduced (Even without -m32, or change the float to a double)

        Like this:



        gcc -otest test.c -g -mfpmath=387 -mno-sse -m32


        Then use gdb to debug, breakpoint at 0x0804845b, and run to i=1



         0x08048457 <+58>: fstps 0x44(%esp)
        0x0804845b <+62>: flds 0x44(%esp)

        (gdb) info float
        =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
        R6: Empty 0x4002a000000000000000
        R5: Empty 0x00000000000000000000
        R4: Empty 0x00000000000000000000
        R3: Empty 0x00000000000000000000
        R2: Empty 0x00000000000000000000
        R1: Empty 0x00000000000000000000
        R0: Empty 0x00000000000000000000

        Status Word: 0x3820 PE
        TOP: 7
        Control Word: 0x037f IM DM ZM OM UM PM
        PC: Extended Precision (64-bits)
        RC: Round to nearest
        Tag Word: 0x3fff
        Instruction Pointer: 0x00:0x08048455
        Operand Pointer: 0x00:0x00000000
        Opcode: 0x0000

        (gdb) x /xw 0x44+$esp
        0xffffb594: 0x41f80000 ==> 31.0, s=0, M=1.1111 E=4


        observe the execution results of fstps,

        at this time, the RC value on the control register on the fpu is Round to nearest.

        the value on the fpu register is 30.99999904632568359 (80bit).

        the value on 0x44(%esp) (variable "f10") is 31.0. (round to nearest)



        Then use gdb to debug, breakpoint at 0x0804848b, and run to i=1



         0x0804848b <+110>: fistpl 0x3c(%esp)

        (gdb) info float
        =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
        R6: Empty 0x4002a000000000000000
        R5: Empty 0x00000000000000000000
        R4: Empty 0x00000000000000000000
        R3: Empty 0x00000000000000000000
        R2: Empty 0x00000000000000000000
        R1: Empty 0x00000000000000000000
        R0: Empty 0x00000000000000000000

        Status Word: 0x3820 PE
        TOP: 7
        Control Word: 0x0c7f IM DM ZM OM UM PM
        PC: Single Precision (24-bits)
        RC: Round toward zero
        Tag Word: 0x3fff
        Instruction Pointer: 0x00:0x08048485
        Operand Pointer: 0x00:0x00000000
        Opcode: 0x0000


        at this time, the RC value on the control register on the fpu is Round toward zero.

        the value on the fpu register is 30.99999904632568359 (80bit). value is the same as above

        obviously when the integer is converted, the decimal point is truncated, and the value is 30.



        Below is the main decompiled code



         (gdb) disas main
        Dump of assembler code for function main:
        0x0804841d <+0>: push %ebp
        0x0804841e <+1>: mov %esp,%ebp
        0x08048420 <+3>: and $0xfffffff0,%esp
        0x08048423 <+6>: sub $0x50,%esp
        0x08048426 <+9>: movl $0x0,0x4c(%esp)
        0x0804842e <+17>: jmp 0x80484de <main+193>
        0x08048433 <+22>: fildl 0x4c(%esp)
        0x08048437 <+26>: fldl 0x80485a8
        0x0804843d <+32>: fmulp %st,%st(1)
        0x0804843f <+34>: fldl 0x80485b0
        0x08048445 <+40>: faddp %st,%st(1)
        0x08048447 <+42>: fstps 0x48(%esp)
        0x0804844b <+46>: flds 0x48(%esp)
        0x0804844f <+50>: flds 0x80485b8
        0x08048455 <+56>: fmulp %st,%st(1)
        0x08048457 <+58>: fstps 0x44(%esp) // store to f10
        0x0804845b <+62>: flds 0x44(%esp)
        0x0804845f <+66>: fnstcw 0x2a(%esp)
        0x08048463 <+70>: movzwl 0x2a(%esp),%eax
        0x08048468 <+75>: mov $0xc,%ah
        0x0804846a <+77>: mov %ax,0x28(%esp)
        0x0804846f <+82>: fldcw 0x28(%esp)
        0x08048473 <+86>: fistpl 0x40(%esp)
        0x08048477 <+90>: fldcw 0x2a(%esp)
        0x0804847b <+94>: flds 0x48(%esp)
        0x0804847f <+98>: fldl 0x80485c0
        0x08048485 <+104>: fmulp %st,%st(1)
        0x08048487 <+106>: fldcw 0x28(%esp)
        0x0804848b <+110>: fistpl 0x3c(%esp) // f1 * 10 convert int
        0x0804848f <+114>: fldcw 0x2a(%esp)
        0x08048493 <+118>: flds 0x48(%esp)
        0x08048497 <+122>: fldl 0x80485c0
        0x0804849d <+128>: fmulp %st,%st(1)
        0x0804849f <+130>: flds 0x44(%esp)
        0x080484a3 <+134>: fxch %st(1)
        0x080484a5 <+136>: mov 0x3c(%esp),%eax
        0x080484a9 <+140>: mov 0x40(%esp),%edx
        0x080484ad <+144>: sub %eax,%edx
        0x080484af <+146>: mov %edx,%eax
        0x080484b1 <+148>: fstpl 0x18(%esp)
        0x080484b5 <+152>: fstpl 0x10(%esp)
        0x080484b9 <+156>: mov %eax,0xc(%esp)
        0x080484bd <+160>: mov 0x3c(%esp),%eax
        0x080484c1 <+164>: mov %eax,0x8(%esp)
        0x080484c5 <+168>: mov 0x40(%esp),%eax
        0x080484c9 <+172>: mov %eax,0x4(%esp)
        0x080484cd <+176>: movl $0x8048588,(%esp)
        0x080484d4 <+183>: call 0x80482f0 <printf@plt>
        0x080484d9 <+188>: addl $0x1,0x4c(%esp)
        0x080484de <+193>: cmpl $0x14,0x4c(%esp)
        0x080484e3 <+198>: jle 0x8048433 <main+22>
        0x080484e9 <+204>: leave
        0x080484ea <+205>: ret





        share|improve this answer















        The main reason is that the rounding-control (RC) field of the x87 FPU control register values are inconsistent in the follow two lines. eventually the values of c1 and c2 are different.



        0x08048457 <+58>: fstps 0x44(%esp)
        0x0804848b <+110>: fistpl 0x3c(%esp)


        Add gcc compile option -mfpmath=387 -mno-sse, it can be reproduced (Even without -m32, or change the float to a double)

        Like this:



        gcc -otest test.c -g -mfpmath=387 -mno-sse -m32


        Then use gdb to debug, breakpoint at 0x0804845b, and run to i=1



         0x08048457 <+58>: fstps 0x44(%esp)
        0x0804845b <+62>: flds 0x44(%esp)

        (gdb) info float
        =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
        R6: Empty 0x4002a000000000000000
        R5: Empty 0x00000000000000000000
        R4: Empty 0x00000000000000000000
        R3: Empty 0x00000000000000000000
        R2: Empty 0x00000000000000000000
        R1: Empty 0x00000000000000000000
        R0: Empty 0x00000000000000000000

        Status Word: 0x3820 PE
        TOP: 7
        Control Word: 0x037f IM DM ZM OM UM PM
        PC: Extended Precision (64-bits)
        RC: Round to nearest
        Tag Word: 0x3fff
        Instruction Pointer: 0x00:0x08048455
        Operand Pointer: 0x00:0x00000000
        Opcode: 0x0000

        (gdb) x /xw 0x44+$esp
        0xffffb594: 0x41f80000 ==> 31.0, s=0, M=1.1111 E=4


        observe the execution results of fstps,

        at this time, the RC value on the control register on the fpu is Round to nearest.

        the value on the fpu register is 30.99999904632568359 (80bit).

        the value on 0x44(%esp) (variable "f10") is 31.0. (round to nearest)



        Then use gdb to debug, breakpoint at 0x0804848b, and run to i=1



         0x0804848b <+110>: fistpl 0x3c(%esp)

        (gdb) info float
        =>R7: Valid 0x4003f7ffff8000000000 +30.99999904632568359
        R6: Empty 0x4002a000000000000000
        R5: Empty 0x00000000000000000000
        R4: Empty 0x00000000000000000000
        R3: Empty 0x00000000000000000000
        R2: Empty 0x00000000000000000000
        R1: Empty 0x00000000000000000000
        R0: Empty 0x00000000000000000000

        Status Word: 0x3820 PE
        TOP: 7
        Control Word: 0x0c7f IM DM ZM OM UM PM
        PC: Single Precision (24-bits)
        RC: Round toward zero
        Tag Word: 0x3fff
        Instruction Pointer: 0x00:0x08048485
        Operand Pointer: 0x00:0x00000000
        Opcode: 0x0000


        at this time, the RC value on the control register on the fpu is Round toward zero.

        the value on the fpu register is 30.99999904632568359 (80bit). value is the same as above

        obviously when the integer is converted, the decimal point is truncated, and the value is 30.



        Below is the main decompiled code



         (gdb) disas main
        Dump of assembler code for function main:
        0x0804841d <+0>: push %ebp
        0x0804841e <+1>: mov %esp,%ebp
        0x08048420 <+3>: and $0xfffffff0,%esp
        0x08048423 <+6>: sub $0x50,%esp
        0x08048426 <+9>: movl $0x0,0x4c(%esp)
        0x0804842e <+17>: jmp 0x80484de <main+193>
        0x08048433 <+22>: fildl 0x4c(%esp)
        0x08048437 <+26>: fldl 0x80485a8
        0x0804843d <+32>: fmulp %st,%st(1)
        0x0804843f <+34>: fldl 0x80485b0
        0x08048445 <+40>: faddp %st,%st(1)
        0x08048447 <+42>: fstps 0x48(%esp)
        0x0804844b <+46>: flds 0x48(%esp)
        0x0804844f <+50>: flds 0x80485b8
        0x08048455 <+56>: fmulp %st,%st(1)
        0x08048457 <+58>: fstps 0x44(%esp) // store to f10
        0x0804845b <+62>: flds 0x44(%esp)
        0x0804845f <+66>: fnstcw 0x2a(%esp)
        0x08048463 <+70>: movzwl 0x2a(%esp),%eax
        0x08048468 <+75>: mov $0xc,%ah
        0x0804846a <+77>: mov %ax,0x28(%esp)
        0x0804846f <+82>: fldcw 0x28(%esp)
        0x08048473 <+86>: fistpl 0x40(%esp)
        0x08048477 <+90>: fldcw 0x2a(%esp)
        0x0804847b <+94>: flds 0x48(%esp)
        0x0804847f <+98>: fldl 0x80485c0
        0x08048485 <+104>: fmulp %st,%st(1)
        0x08048487 <+106>: fldcw 0x28(%esp)
        0x0804848b <+110>: fistpl 0x3c(%esp) // f1 * 10 convert int
        0x0804848f <+114>: fldcw 0x2a(%esp)
        0x08048493 <+118>: flds 0x48(%esp)
        0x08048497 <+122>: fldl 0x80485c0
        0x0804849d <+128>: fmulp %st,%st(1)
        0x0804849f <+130>: flds 0x44(%esp)
        0x080484a3 <+134>: fxch %st(1)
        0x080484a5 <+136>: mov 0x3c(%esp),%eax
        0x080484a9 <+140>: mov 0x40(%esp),%edx
        0x080484ad <+144>: sub %eax,%edx
        0x080484af <+146>: mov %edx,%eax
        0x080484b1 <+148>: fstpl 0x18(%esp)
        0x080484b5 <+152>: fstpl 0x10(%esp)
        0x080484b9 <+156>: mov %eax,0xc(%esp)
        0x080484bd <+160>: mov 0x3c(%esp),%eax
        0x080484c1 <+164>: mov %eax,0x8(%esp)
        0x080484c5 <+168>: mov 0x40(%esp),%eax
        0x080484c9 <+172>: mov %eax,0x4(%esp)
        0x080484cd <+176>: movl $0x8048588,(%esp)
        0x080484d4 <+183>: call 0x80482f0 <printf@plt>
        0x080484d9 <+188>: addl $0x1,0x4c(%esp)
        0x080484de <+193>: cmpl $0x14,0x4c(%esp)
        0x080484e3 <+198>: jle 0x8048433 <main+22>
        0x080484e9 <+204>: leave
        0x080484ea <+205>: ret






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Feb 27 at 9:47

























        answered Feb 27 at 9:21









        ccxxshowccxxshow

        5343




        5343



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54881541%2fdifference-in-casting-float-to-int-32-bit-c%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown






            Popular posts from this blog

            Peggy Mitchell

            The Forum (Inglewood, California)

            Palaiologos