Unions, aliasing and type-punning in practice: what works and what does not?













16















I have a problem understanding what can and cannot be done using unions with GCC. I read the questions (in particular here and here) about it, but they focus on the C++ standard, and I feel there's a mismatch between the C++ standard and practice (the commonly used compilers).



In particular, I recently found some confusing information in the GCC online documentation while reading about the compilation flag -fstrict-aliasing. It says:




-fstrict-aliasing



Allow the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same.
For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.
Pay special attention to code like this:



union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}


The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common.
Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.




This is what I think I understood from this example and my doubts:



1) aliasing only works between similar types, or char



Consequence of 1): aliasing - as the word suggests - is when you have one value and two members to access it (i.e. the same bytes);



Doubt: are two types similar when they have the same size in bytes? If not, what are similar types?



Consequence of 1) for non similar types (whatever this means), aliasing does not work;



2) type punning is when we read a different member than the one we wrote to; it's common and it works as expected as long as the memory is accessed through the union type;



Doubt: is aliasing a specific case of type-punning where types are similar?



I get confused because it says unsigned int and double are not similar, so aliasing does not work; then the example aliases int and double and clearly says it works as expected, but calls it type-punning, not because the types are or are not similar, but because it reads from a member it did not write. But reading from a member it did not write is what I understood aliasing to be for (as the word suggests). I'm lost.



The questions:
Can someone clarify the difference between aliasing and type-punning, and which uses of the two techniques work as expected in GCC? And what does the compiler flag do?





























  • 5





    "I feel there's a mismatch between the specs and the practice" Until you upgrade your compiler and everything wreak havoc! (true story)

    – YSC
    Feb 19 at 9:06







  • 1





    For when you really need type punning: stackoverflow.com/a/17790026/8120642

    – hegel5000
    Feb 19 at 11:39















c++ gcc strict-aliasing














edited Feb 19 at 16:16 by Justin

asked Feb 19 at 8:54 by L.C.



















5 Answers


















9














Aliasing can be taken literally for what it means: it is when two different expressions refer to the same object. Type-punning is to "pun" a type, i.e. to use an object of some type as a different type.



Formally, type-punning is undefined behaviour with only a few exceptions. It happens commonly when you fiddle with bits carelessly:



int mantissa(float f)
{
    return (int&)f & 0x7FFFFF; // Accessing a float as if it's an int
}



The exceptions are (simplified)



  • Accessing integers as their unsigned/signed counterparts

  • Accessing anything as a char, unsigned char or std::byte

This is known as the strict-aliasing rule: the compiler can safely assume two expressions of different types never refer to the same object (except for the exceptions above) because they would otherwise have undefined behaviour. This facilitates optimizations such as



void transform(float* dst, const int* src, int n)
{
    for(int i = 0; i < n; i++)
        dst[i] = src[i]; // Can be unrolled and use vector instructions
                         // If dst and src alias the results would be wrong
}
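A smaller sketch of the same assumption, using a hypothetical helper that is not part of the answer: because int and float are unrelated types, an optimizing compiler with -fstrict-aliasing may assume the store through the float pointer cannot change the int, and may return the constant directly. Calling it with two distinct objects, as below, is well defined either way.

```cpp
// Hypothetical helper for illustration only: stores through two pointers of
// unrelated types, then reloads through the first one.
int store_then_reload(int* counter, float* scale)
{
    *counter = 1;
    *scale = 2.0f;   // under strict aliasing, assumed not to modify *counter
    return *counter; // so the compiler may fold this into "return 1"
}
```

Passing pointers that really do refer to the same storage (for example via a cast) would be the undefined case the rule is about, so the sketch deliberately does not do that.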




What gcc says is that it relaxes the rules a bit, and allows type-punning through unions even though the standard doesn't require it to:



union {
    int64_t num;
    struct {
        int32_t hi, lo;
    } parts;
} u = {42};
u.parts.hi = 420;


This is the type-pun gcc guarantees will work. Other cases may appear to work but may one day silently be broken.
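For comparison, here is a minimal sketch of the float/integer pun done through a union, which is the pattern the GCC documentation quoted in the question describes. The names FloatBits and float_bits_via_union are made up for this example, and the sketch assumes a 32-bit float; it leans on GCC's documented behaviour, not on anything ISO C++ promises.

```cpp
#include <cstdint>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical union for illustration: both members share the same storage.
union FloatBits {
    float value;
    std::uint32_t bits;
};

// Writes one member and reads the other. The access goes through the union
// type itself, which is the condition the GCC documentation attaches.
std::uint32_t float_bits_via_union(float value)
{
    FloatBits pun;
    pun.value = value;
    return pun.bits;
}
```

Taking the address of a member and passing that pointer elsewhere would no longer be an access "through the union type", so this sketch keeps the whole pun inside one function.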


























  • 2





    I think your example fails in that the layout of the bit fields in that structure is itself implementation defined. The poor definition of bit fields in C is one of those really annoying things that it is probably way too late to fix. The type pun is ok (in GCC at least), but the bit field may or may not do what you expect.

    – Dan Mills
    Feb 19 at 12:01











  • @DanMills Fair, but I couldn't think of a nice and easy pun off the top of my head. I reckoned if I wanted to show what practically works, might as well go all the way.

    – Passer By
    Feb 19 at 14:07






  • 2





    @PasserBy one somewhat common example is something like union { long long x; struct { unsigned low, high; }; } (or same, but with unsigned[2], you get the idea).

    – Dan M.
    Feb 19 at 14:25











  • In contexts other than the gcc/clang interpretation of the "strict aliasing" rule, the term "aliasing" would not be used to describe situations in which one reference is used to derive another, and the new reference is used to access the object and then abandoned before the object is used in any other way.

    – supercat
    Feb 19 at 16:58






  • 1





    @PasserBy: The fact that two references identify the same object at disjoint times does not imply aliasing. Nor does the fact that a reference is used to derive another reference that's immediately used to access the objects. Those are both expected access patterns, while the term "aliasing" refers to situations where one could see that the lifetimes of the references overlap without being able to see any relationship between them.

    – supercat
    Feb 20 at 4:22


















4














Terminology is a great thing, I can use it however I want, and so can everyone else!




are two types similar when they have the same size in bytes? If not, what are similar types?




Roughly speaking, types are similar when they differ by constness or signedness. Size in bytes alone is definitely not sufficient.




is aliasing a specific case of type-punning where types are similar?




Type punning is any technique that circumvents the type system.



Aliasing is a specific case of that which involves placing objects of different types at the same address. Aliasing is generally allowed when types are similar, and forbidden otherwise. In addition, one may access an object of any type through a char (or similar to char) lvalue, but doing the opposite (i.e. accessing an object of type char through a dissimilar type lvalue) is not allowed. This is guaranteed by both the C and C++ standards; GCC simply implements what the standards mandate.



The GCC documentation seems to use "type punning" in the narrow sense of reading a union member other than the one last written to. This kind of type punning is allowed by the C standard even when the types are not similar. OTOH the C++ standard does not allow this. GCC may or may not extend the permission to C++; the documentation is not clear on this.



Without -fstrict-aliasing, GCC apparently relaxes these requirements, but it isn't clear to what exact extent. Note that -fstrict-aliasing is the default when performing an optimised build.



Bottom line, just program to the standard. If GCC relaxes the requirements of the standard, it isn't significant and isn't worth the trouble.
























  • 1





    The authors of the Standard deliberately allow specialized implementations to behave in ways that make them unsuitable for most purposes. Although 90%+ of the optimizations enabled by -fstrict-aliasing would be reasonable in a general-purpose implementation, the remaining 10% (phony "optimizations") make that mode unsuitable for many purposes.

    – supercat
    Feb 19 at 22:10



















2














In ANSI C (AKA C89) you have (section 3.3.2.3 Structure and union members):




if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined




In C99 you have (section 6.5.2.3 Structure and union members):




If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.




IOW, union-based type punning is allowed in C, although the actual semantics may be different, depending on the language standard supported (note that the C99 semantics is narrower than the C89's implementation-defined).



In C99 you also have (section 6.5 Expressions):




An object shall have its stored value accessed only by an lvalue expression that has one of the following types:



— a type compatible with the effective type of the object,



— a qualified version of a type compatible with the effective type of the object,



— a type that is the signed or unsigned type corresponding to the effective type of the object,



— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,



— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or



— a character type.




And there's a section (6.2.7 Compatible type and composite type) in C99 that describes compatible types:




Two types have compatible type if their types are the same. Additional rules for
determining whether two types are compatible are described in 6.7.2 for type specifiers,
in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. ...




And then (6.7.5.1 Pointer declarators):




For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.




Simplifying it a bit, this means that in C by using a pointer you can access signed ints as unsigned ints (and vice versa) and you can access individual chars in anything. Anything else would amount to aliasing violation.
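The two permitted accesses summarized above can be sketched as follows. The function names are invented for this example, and the code is C++ to match the question's tags even though the quoted wording comes from C99.

```cpp
#include <cstddef>

// Permitted: reading an int through the corresponding unsigned type.
unsigned as_unsigned(const int& value)
{
    return *reinterpret_cast<const unsigned*>(&value);
}

// Permitted: inspecting the bytes of any object through unsigned char.
template <typename T>
unsigned byte_sum(const T& object)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&object);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        sum += bytes[i];
    return sum;
}
```

Replacing unsigned with float in as_unsigned would turn it into the kind of access the rule forbids, even though it would compile just as readily.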



You can find similar language in the various versions of the C++ standard. However, as far as I can see in C++03 and C++11 union-based type punning isn't explicitly allowed (unlike in C).





























  • UV: this answer clarifies the "compatible types" concept (I suppose that's what they mean by "similar types"). I totally agree it's not explicitly allowed by the standard, but it works in some cases with GCC. It's one situation where "not explicitly allowed" does not mean forbidden.

    – L.C.
    Feb 19 at 10:45






  • 1





    @L.C. it doesn't mean that it won't suddenly break on a different compiler, arch, OS or even new compiler version either.

    – Dan M.
    Feb 19 at 14:28











  • You are right, point taken... but doing code that is open source and flexible and portable etc. is not always the main goal. It's not elegant, it's not a good practice, but sometimes one just wants a binary that runs on the current machine / OS... so if the compiler produces "valid" code that does what's expected... why not!

    – L.C.
    Feb 19 at 14:44


















2














According to the footnote 88 in the C11 draft N1570, the "strict aliasing rule" (6.5p7) is intended to specify the circumstances in which compilers must allow for the possibility that things may alias, but makes no attempt to define what aliasing is. Somewhere along the line, a popular belief has emerged that accesses other than those defined by the rule represent "aliasing", and those allowed don't, but in fact the opposite is true.



Given a function like:



int foo(int *p, int *q)
{ *p = 1; *q = 2; return *p; }


Section 6.5p7 doesn't say that p and q won't alias if they identify the same storage. Rather, it specifies that they are allowed to alias.
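A short sketch of what that means in practice, reusing the foo function from above with its braces written out: since both parameters are int pointers, the compiler has to allow for them naming the same object, so the reload of *p cannot be replaced by the constant 1.

```cpp
// Same function as above. Both pointers have type int*, so they are
// allowed to alias and the final read must observe the store through q.
int foo(int* p, int* q)
{
    *p = 1;
    *q = 2;
    return *p;
}
```

Called as foo(&x, &x) the function returns 2, while called with two distinct objects it returns 1; both calls are well defined.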



Note that not all operations which involve accessing storage of one type as another represent aliasing. An operation on an lvalue which is freshly visibly derived from another object doesn't "alias" that other object. Instead, it is an operation upon that object. Aliasing occurs if, between the time a reference to some storage is created and the time it is used, the same storage is referenced in some way not derived from the first, or code enters a context wherein that occurs.



Although the ability to recognize when an lvalue is derived from another is a Quality of Implementation issue, the authors of the Standard must have expected implementations to recognize some constructs beyond those mandated. There is no general permission to access any of the storage associated with a struct or union by using an lvalue of member type, nor does anything in the Standard explicitly say that an operation involving someStruct.member must be recognized as an operation on a someStruct. Instead, the authors of the Standard expected that compiler writers who make a reasonable effort to support constructs their customers need should be better placed than the Committee to judge the needs of those customers and fulfill them. Since any compiler that makes an even-remotely-reasonable effort to recognize derived references would notice that someStruct.member is derived from someStruct, the authors of the Standard saw no need to explicitly mandate that.



Unfortunately, the treatment of constructs like:



actOnStruct(&someUnion.someStruct);
int q=*(someUnion.intArray+i)


has evolved from "It's sufficiently obvious that actOnStruct and the pointer dereference should be expected to act upon someUnion (and consequently all the members thereof) that there's no need to mandate such behavior" to "Since the Standard doesn't require that implementations recognize that the actions above might affect someUnion, any code relying upon such behavior is broken and need not be supported". Neither of the above constructs is reliably supported by gcc or clang except in -fno-strict-aliasing mode, even though most of the "optimizations" that would be blocked by supporting them would generate code that is "efficient" but useless.



If you're using -fno-strict-aliasing on any compiler having such an option, almost anything will work. If you're using -fstrict-aliasing on icc, it will try to support constructs that use type punning without aliasing, though I don't know if there's any documentation about exactly what constructs it does or does not handle. If you use -fstrict-aliasing on gcc or clang, anything at all that works is purely by happenstance.






































    0














    I think it's good to add a complementary answer, simply because when I asked the question I did not know how to fulfill my needs without using UNION: I got stubborn on using it because it seemed to answer precisely my needs.



    The good way to do type punning and to avoid possible consequences of undefined behavior (depending on the compiler and other env. settings) is to use std::memcpy and copy the memory bytes from one type to another. This is explained - for example - here and here.



    I've also read that often when a compiler produces valid code for type punning using unions, it produces the same binary code as if std::memcpy was used.
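    A minimal sketch of the std::memcpy approach, assuming a 32-bit float; the function names are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer of the same
// size. No object is accessed through an lvalue of the wrong type.
std::uint32_t float_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The reverse direction: rebuilds a float from its stored bit pattern.
float float_from_bits(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

    The fixed-size copy is something optimizing compilers commonly reduce to a plain register move, which fits the observation above about the generated code.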



    Finally, even if this information does not directly answer my original question it's so strictly related that I felt it was useful to add it here.






























      Your Answer






      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54762186%2funions-aliasing-and-type-punning-in-practice-what-works-and-what-does-not%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      9














      Aliasing can be taken literally for what it means: it is when two different expressions refer to the same object. Type-punning is to "pun" a type, ie to use a object of some type as a different type.



      Formally, type-punning is undefined behaviour with only a few exceptions. It happens commonly when you fiddle with bits carelessly



      int mantissa(float f)

      return (int&)f & 0x7FFFFF; // Accessing a float as if it's an int



      The exceptions are (simplified)



      • Accessing integers as their unsigned/signed counterparts

      • Accessing anything as a char, unsigned char or std::byte

      This is known as the strict-aliasing rule: the compiler can safely assume two expressions of different types never refer to the same object (except for the exceptions above) because they would otherwise have undefined behaviour. This facilitates optimizations such as



      void transform(float* dst, const int* src, int n)

      for(int i = 0; i < n; i++)
      dst[i] = src[i]; // Can be unrolled and use vector instructions
      // If dst and src alias the results would be wrong




      What gcc says is it relaxes the rules a bit, and allows type-punning through unions even though the standard doesn't require it to



      union 
      int64_t num;
      struct
      int32_t hi, lo;
      parts;
      u = 42;
      u.parts.hi = 420;


      This is the type-pun gcc guarantees will work. Other cases may appear to work but may one day silently be broken.






      share|improve this answer




















      • 2





        I think your example fails in that the layout of the bit fields in that structure is itself implementation defined. The poor definition of bit fields in C is one of those really annoying things that it is probably way too late to fix. The type pun is ok (in GCC at least), but the bit field may or may not do what you expect.

        – Dan Mills
        Feb 19 at 12:01











      • @DanMills Fair, but I couldn't think of a nice and easy pun off the top of my head. I reckoned if I wanted to show what practically works, might as well go all the way.

        – Passer By
        Feb 19 at 14:07






      • 2





        @PasserBy one somewhat common example is something like union long long x; struct unsigned low, high (or same, but with unsigned[2], you get the idea).

        – Dan M.
        Feb 19 at 14:25











      • In contexts other than the gcc/clang interpretation of the "strict aliasing" rule, the term "aliasing" would not be used to describe situations in which one reference is used to derive another, and the new reference is used to access the object and then abandoned before the object is used in any other way.

        – supercat
        Feb 19 at 16:58






      • 1





        @PasserBy: The fact that two references identify the same object at disjoint times does not imply aliasing. Nor does the fact that a reference is used to derive another reference that's immediately used to access the objects. Those are both expected access patterns, while the term "aliasing" refers to situations where one could see that the lifetimes of the references overlap without being able to see any relationship between them.

        – supercat
        Feb 20 at 4:22















      9














      Aliasing can be taken literally for what it means: it is when two different expressions refer to the same object. Type-punning is to "pun" a type, i.e. to use an object of some type as a different type.



      Formally, type-punning is undefined behaviour with only a few exceptions. It commonly happens when you fiddle with bits carelessly:



      int mantissa(float f)
      {
          return (int&)f & 0x7FFFFF; // Accessing a float as if it's an int
      }
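
      For comparison, here is a minimal sketch of how the same bits could be read without the pun, by copying the object representation with std::memcpy. The name mantissa_bits is made up for illustration, and the sketch assumes float is a 32-bit IEEE 754 value, which the standard does not promise.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the answer): returns the low 23 bits of the
// float's object representation. Assumes a 32-bit IEEE 754 float.
std::uint32_t mantissa_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // copies bytes; no lvalue of the wrong type
    return bits & 0x7FFFFFu;
}
```

      Compilers typically turn such a fixed-size memcpy into a plain register move, so this usually costs nothing compared with the cast.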



      The exceptions are (simplified):



      • Accessing integers as their unsigned/signed counterparts

      • Accessing anything as a char, unsigned char or std::byte
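
      As a rough illustration of the second exception, the sketch below walks over an arbitrary object through an unsigned char pointer. The helper name object_bytes is an assumption made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies out the object representation of `value`
// byte by byte. Reading through unsigned char is allowed for any object.
template <typename T>
std::vector<unsigned char> object_bytes(const T& value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof(T));
}
```

      What the bytes contain (byte order, padding) is still up to the platform; only the act of reading them is guaranteed to be fine.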

      This is known as the strict-aliasing rule: the compiler can safely assume that two expressions of different types never refer to the same object (apart from the exceptions above), because accessing the object through both would otherwise be undefined behaviour. This facilitates optimizations such as:



      void transform(float* dst, const int* src, int n)
      {
          for(int i = 0; i < n; i++)
              dst[i] = src[i]; // Can be unrolled and use vector instructions
                               // If dst and src alias the results would be wrong
      }




      What GCC says is that it relaxes the rules a bit and allows type-punning through unions, even though the standard doesn't require it to:



      union {
          int64_t num;
          struct {
              int32_t hi, lo;
          } parts;
      } u = {42};
      u.parts.hi = 420;


      This is the kind of type-pun that GCC guarantees will work. Other cases may appear to work but could silently break one day.
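
      A small sketch of that union pun wrapped in a function, assuming GCC (or a compiler that documents the same extension) and two 32-bit halves without padding. The names Halves, Split and split_halves are invented here. The C++ standard itself does not define the result, and which member receives the low half depends on the byte order of the machine.

```cpp
#include <cstdint>

// Named types, so the sketch does not rely on anonymous structs.
struct Halves { std::int32_t first, second; };

union Split {
    std::int64_t num;
    Halves parts;
};

// Writes `num`, then reads the other member through the union type.
// Supported by GCC as a documented extension; not guaranteed by ISO C++.
Halves split_halves(std::int64_t value)
{
    Split u;
    u.num = value;
    return u.parts; // the access goes through the union, as GCC requires
}
```

      On a little-endian machine `first` ends up holding the low 32 bits, so field names like hi and lo only match the layout on one kind of platform.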


























      • 2





        I think your example fails in that the layout of the bit fields in that structure is itself implementation defined. The poor definition of bit fields in C is one of those really annoying things that it is probably way too late to fix. The type pun is ok (in GCC at least), but the bit field may or may not do what you expect.

        – Dan Mills
        Feb 19 at 12:01











      • @DanMills Fair, but I couldn't think of a nice and easy pun off the top of my head. I reckoned if I wanted to show what practically works, might as well go all the way.

        – Passer By
        Feb 19 at 14:07






      • 2





        @PasserBy one somewhat common example is something like union { long long x; struct { unsigned low, high; }; } (or the same, but with unsigned[2], you get the idea).

        – Dan M.
        Feb 19 at 14:25











      • In contexts other than the gcc/clang interpretation of the "strict aliasing" rule, the term "aliasing" would not be used to describe situations in which one reference is used to derive another, and the new reference is used to access the object and then abandoned before the object is used in any other way.

        – supercat
        Feb 19 at 16:58






      • 1





        @PasserBy: The fact that two references identify the same object at disjoint times does not imply aliasing. Nor does the fact that a reference is used to derive another reference that's immediately used to access the objects. Those are both expected access patterns, while the term "aliasing" refers to situations where one could see that the lifetimes of the references overlap without being able to see any relationship between them.

        – supercat
        Feb 20 at 4:22





















      edited Feb 19 at 14:41

























      answered Feb 19 at 10:19









      Passer By





















      4














      Terminology is a great thing, I can use it however I want, and so can everyone else!




      are two types similar when they have the same size in bytes? If not, what are similar types?




      Roughly speaking, types are similar when they differ by constness or signedness. Size in bytes alone is definitely not sufficient.
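
      To make "differ by constness or signedness" concrete, here is a small sketch that reads an int through a const unsigned int lvalue. The function name as_unsigned is hypothetical.

```cpp
// Hypothetical helper: reads an int through an lvalue of the corresponding
// unsigned type, with const added. Both differences are permitted by the
// aliasing rules.
unsigned int as_unsigned(const int& value)
{
    const unsigned int* p = reinterpret_cast<const unsigned int*>(&value);
    return *p;
}
```

      The same cast to, say, const float* would have the same size on most machines and still be an aliasing violation, which is why size alone is not the criterion.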




      is aliasing a specific case of type-punning where types are similar?




      Type punning is any technique that circumvents the type system.



      Aliasing is a specific case of that which involves placing objects of different types at the same address. Aliasing is generally allowed when types are similar, and forbidden otherwise. In addition, one may access an object of any type through a char (or similar to char) lvalue, but doing the opposite (i.e. accessing an object of type char through an lvalue of a dissimilar type) is not allowed. This is guaranteed by both the C and C++ standards; GCC simply implements what the standards mandate.
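
      The "opposite direction" comes up when a value sits in a byte buffer. As a sketch (load_u32 is a made-up name), copying the bytes into a properly typed object sidesteps the forbidden access instead of casting the buffer pointer:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a 32-bit value stored in a byte buffer.
// Dereferencing reinterpret_cast<const std::uint32_t*>(buf) would access
// char objects through a dissimilar type; copying the bytes avoids that.
std::uint32_t load_u32(const unsigned char* buf)
{
    std::uint32_t value;
    std::memcpy(&value, buf, sizeof value);
    return value; // interpreted in the host's byte order
}
```

      This also removes any alignment requirement on buf, which the pointer cast would not.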



      The GCC documentation seems to use "type punning" in the narrow sense of reading a union member other than the one last written to. This kind of type punning is allowed by the C standard even when the types are not similar. The C++ standard, on the other hand, does not allow it. GCC may or may not extend the permission to C++; the documentation is not clear on this.



      Without -fstrict-aliasing, GCC apparently relaxes these requirements, but it isn't clear to what exact extent. Note that -fstrict-aliasing is enabled by default in optimised builds (at -O2, -O3 and -Os).



      Bottom line, just program to the standard. If GCC relaxes the requirements of the standard, it isn't significant and isn't worth the trouble.
























      • 1





        The authors of the Standard deliberately allow specialized implementations to behave in ways that make them unsuitable for most purposes. Although 90%+ of the optimizations enabled by -fstrict-aliasing would be reasonable in a general-purpose implementation, the remaining 10% (phony "optimizations") make that mode unsuitable for many purposes.

        – supercat
        Feb 19 at 22:10


























      answered Feb 19 at 11:20









      n.m.




















      2














      In ANSI C (AKA C89) you have (section 3.3.2.3 Structure and union members):




      if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined




      In C99 you have (section 6.5.2.3 Structure and union members, in a footnote added by Technical Corrigendum 3):




      If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.




      In other words, union-based type punning is allowed in C, although the actual semantics may differ depending on the language standard supported (note that the C99 semantics are narrower than C89's implementation-defined behaviour).



      In C99 you also have (section 6.5 Expressions):




      An object shall have its stored value accessed only by an lvalue expression that has one of the following types:



      — a type compatible with the effective type of the object,



      — a qualified version of a type compatible with the effective type of the object,



      — a type that is the signed or unsigned type corresponding to the effective type of the object,



      — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,



      — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or



      — a character type.




      And there's a section (6.2.7 Compatible type and composite type) in C99 that describes compatible types:




      Two types have compatible type if their types are the same. Additional rules for
      determining whether two types are compatible are described in 6.7.2 for type specifiers,
      in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. ...




      And then (6.7.5.1 Pointer declarators):




      For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.




      Simplifying a bit, this means that in C, through a pointer, you can access signed ints as unsigned ints (and vice versa), and you can access the individual chars of anything. Anything else would amount to an aliasing violation.



      You can find similar language in the various versions of the C++ standard. However, as far as I can see, in C++03 and C++11 union-based type punning isn't explicitly allowed (unlike in C).





























      • UV: this answer clarifies the "compatible types" concept (I suppose that's what they mean by "similar types"). I totally agree it's not explicitly allowed by the standard, but it works in some cases with GCC. It's one situation where "not explicitly allowed" does not mean forbidden.

        – L.C.
        Feb 19 at 10:45






      • 1





        @L.C. it doesn't mean that it won't suddenly break on a different compiler, arch, OS or even new compiler version either.

        – Dan M.
        Feb 19 at 14:28











      • You are right, point taken... but doing code that is open source and flexible and portable etc. is not always the main goal. It's not elegant, it's not a good practice, but sometimes one just wants a binary that runs on the current machine / OS... so if the compiler produces "valid" code that does what's expected... why not!

        – L.C.
        Feb 19 at 14:44















      2














      In ANSI C (AKA C89) you have (section 3.3.2.3 Structure and union members):




      if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined




      In C99 you have (section 6.5.2.3 Structure and union members):




      If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.




      IOW, union-based type punning is allowed in C, although the actual semantics may be different, depending on the language standard supported (note that the C99 semantics is narrower than the C89's implementation-defined).



      In C99 you also have (section 6.5 Expressions):




      An object shall have its stored value accessed only by an lvalue expression that has one of the following types:



      — a type compatible with the effective type of the object,



      — a qualified version of a type compatible with the effective type of the object,



      — a type that is the signed or unsigned type corresponding to the effective type of the object,



      — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,



      — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or



      — a character type.




      And there's a section (6.2.7 Compatible type and composite type) in C99 that describes compatible types:




      Two types have compatible type if their types are the same. Additional rules for
      determining whether two types are compatible are described in 6.7.2 for type specifiers,
      in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. ...




      And then (6.7.5.1 Pointer declarators):




      For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.




      Simplifying it a bit, this means that in C by using a pointer you can access signed ints as unsigned ints (and vice versa) and you can access individual chars in anything. Anything else would amount to aliasing violation.



      You can find similar language in the various versions of the C++ standard. However, as far as I can see in C++03 and C++11 union-based type punning isn't explicitly allowed (unlike in C).






      share|improve this answer























      • UV: this answer clarifies the "compatible types" concept (I suppose that's what they mean by "similar types"). I totally agree it's not explicitely allowed by the standard, but it works in some cases with GCC. It's one situation where "not explicitely allowed" does not mean forbidden.

        – L.C.
        Feb 19 at 10:45






      • 1





        @L.C. it doesn't mean that it won't suddenly break on a different compiler, arch, OS or even new compiler version either.

        – Dan M.
        Feb 19 at 14:28











      • You are right, point taken... but doing code that is open source and flexible and portable etc. is not always the main goal. It's not elegant, it's not a good practice, but sometimes one just wants a binary that runs on the current machine / OS... so if the compiler produces "valid" code that does what's expected... why not!

        – L.C.
        Feb 19 at 14:44













      2












      2








      2







      In ANSI C (AKA C89) you have (section 3.3.2.3 Structure and union members):




      if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined




      In C99 you have (section 6.5.2.3 Structure and union members):




      If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.




      IOW, union-based type punning is allowed in C, although the actual semantics may be different, depending on the language standard supported (note that the C99 semantics is narrower than the C89's implementation-defined).



      In C99 you also have (section 6.5 Expressions):




      An object shall have its stored value accessed only by an lvalue expression that has one of the following types:



      — a type compatible with the effective type of the object,



      — a qualified version of a type compatible with the effective type of the object,



      — a type that is the signed or unsigned type corresponding to the effective type of the object,



      — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,



      — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or



      — a character type.




      And there's a section (6.2.7 Compatible type and composite type) in C99 that describes compatible types:




      Two types have compatible type if their types are the same. Additional rules for
      determining whether two types are compatible are described in 6.7.2 for type specifiers,
      in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. ...




      And then (6.7.5.1 Pointer declarators):




      For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.




      Simplifying a bit, this means that in C, through a pointer, you can access signed ints as unsigned ints (and vice versa), and you can access the individual chars of any object. Anything else amounts to an aliasing violation.
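The two permitted pointer-based accesses described above can be sketched as follows. The function names `as_unsigned` and `count_nonzero_bytes` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Read an int object through an unsigned int lvalue. This is permitted:
   unsigned int is the unsigned type corresponding to int. */
unsigned int as_unsigned(const int *p)
{
    assert(p != NULL);
    return *(const unsigned int *)p;
}

/* Inspect the bytes of any object through unsigned char. This is
   permitted: a character type may be used to access any object. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t count = 0;

    assert(obj != NULL);
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

By contrast, reading the same `int` through a `float *` or a `short *` falls outside the list and would be an aliasing violation, so no such function is shown here.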



      You can find similar language in the various versions of the C++ standard. However, as far as I can see, in C++03 and C++11 union-based type punning isn't explicitly allowed (unlike in C).






      answered Feb 19 at 10:36









      Alexey Frunze












      • UV: this answer clarifies the "compatible types" concept (I suppose that's what they mean by "similar types"). I totally agree it's not explicitly allowed by the standard, but it works in some cases with GCC. It's one situation where "not explicitly allowed" does not mean forbidden.

        – L.C.
        Feb 19 at 10:45






      • @L.C. it doesn't mean that it won't suddenly break on a different compiler, arch, OS or even new compiler version either.

        – Dan M.
        Feb 19 at 14:28











      • You are right, point taken... but writing code that is open source, flexible, portable, etc. is not always the main goal. It's not elegant and it's not good practice, but sometimes one just wants a binary that runs on the current machine / OS... so if the compiler produces "valid" code that does what's expected... why not!

        – L.C.
        Feb 19 at 14:44

















      According to footnote 88 in the C11 draft N1570, the "strict aliasing rule" (6.5p7) is intended to specify the circumstances in which compilers must allow for the possibility that things may alias, but it makes no attempt to define what aliasing is. Somewhere along the line, a popular belief has emerged that accesses other than those defined by the rule represent "aliasing", and those allowed don't, but in fact the opposite is true.



      Given a function like:



      int foo(int *p, int *q)
      { *p = 1; *q = 2; return *p; }


      Section 6.5p7 doesn't say that p and q won't alias if they identify the same storage. Rather, it specifies that they are allowed to alias.
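To make the point concrete, here is a small sketch of `foo` called once with two pointers to the same object and once with pointers to distinct objects. The wrappers `call_with_same_object` and `call_with_distinct_objects` are hypothetical names added for illustration. Because both parameters have type `int *`, the compiler has to allow for the first case.

```c
#include <assert.h>

int foo(int *p, int *q)
{
    *p = 1;
    *q = 2;
    return *p;   /* yields 2 when p and q identify the same int */
}

/* Both arguments identify the same object: permitted aliasing. */
int call_with_same_object(void)
{
    int x = 0;
    int result = foo(&x, &x);

    assert(x == 2);   /* the store through q was the last one */
    return result;
}

/* The arguments identify different objects: no aliasing occurs. */
int call_with_distinct_objects(void)
{
    int x = 0;
    int y = 0;

    return foo(&x, &y);
}
```

Here `call_with_same_object()` returns 2 and `call_with_distinct_objects()` returns 1, so the compiler may not fold `return *p` into `return 1`.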



      Note that not all operations which involve accessing storage of one type as another represent aliasing. An operation on an lvalue which is freshly visibly derived from another object doesn't "alias" that other object. Instead, it is an operation upon that object. Aliasing occurs if, between the time a reference to some storage is created and the time it is used, the same storage is referenced in some way not derived from the first, or code enters a context wherein that occurs.



      Although the ability to recognize when an lvalue is derived from another is a Quality of Implementation issue, the authors of the Standard must have expected implementations to recognize some constructs beyond those mandated. There is no general permission to access any of the storage associated with a struct or union by using an lvalue of member type, nor does anything in the Standard explicitly say that an operation involving someStruct.member must be recognized as an operation on a someStruct. Instead, the authors of the Standard expected that compiler writers who make a reasonable effort to support constructs their customers need should be better placed than the Committee to judge the needs of those customers and fulfill them. Since any compiler that makes an even-remotely-reasonable effort to recognize derived references would notice that someStruct.member is derived from someStruct, the authors of the Standard saw no need to explicitly mandate that.



      Unfortunately, the treatment of constructs like:



      actOnStruct(&someUnion.someStruct);
      int q = *(someUnion.intArray + i);


      has evolved from "It's sufficiently obvious that actOnStruct and the pointer dereference should be expected to act upon someUnion (and consequently all the members thereof) that there's no need to mandate such behavior" to "Since the Standard doesn't require that implementations recognize that the actions above might affect someUnion, any code relying upon such behavior is broken and need not be supported". Neither of the above constructs is reliably supported by gcc or clang except in -fno-strict-aliasing mode, even though most of the "optimizations" that would be blocked by supporting them would generate code that is "efficient" but useless.



      If you're using -fno-strict-aliasing on any compiler having such an option, almost anything will work. If you're using -fstrict-aliasing on icc, it will try to support constructs that use type punning without aliasing, though I don't know if there's any documentation about exactly what constructs it does or does not handle. If you use -fstrict-aliasing on gcc or clang, anything at all that works is purely by happenstance.






          edited Feb 19 at 18:09

























          answered Feb 19 at 17:25









          supercat



































              I think it's good to add a complementary answer, simply because when I asked the question I did not know how to meet my needs without using a union: I stubbornly stuck with it because it seemed to fit those needs precisely.



              The safe way to do type punning, avoiding the possible consequences of undefined behavior (which depend on the compiler and other environment settings), is to use std::memcpy to copy the bytes from an object of one type into an object of another. This is explained - for example - here and here.



              I've also read that when a compiler produces valid code for union-based type punning, it often produces the same binary code as if std::memcpy had been used.
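A minimal sketch of the memcpy approach follows, written in C to match the other examples in this thread; in C++ the same function is spelled `std::memcpy` and comes from `<cstring>`. The helper names `float_to_bits_memcpy` and `bits_to_float_memcpy` are hypothetical, and the sketch assumes `float` is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into a uint32_t. No object is ever accessed
   through an lvalue of the wrong type, so no aliasing rule is involved. */
uint32_t float_to_bits_memcpy(float value)
{
    uint32_t bits;

    /* Assumption: source and destination have the same size. */
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* The reverse direction: rebuild a float from its bit pattern. */
float bits_to_float_memcpy(uint32_t bits)
{
    float value;

    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under the IEEE 754 assumption, `float_to_bits_memcpy(1.0f)` yields `0x3F800000` and `bits_to_float_memcpy(0x40000000u)` yields `2.0f`. Whether a given compiler reduces the fixed-size copy to a single register move depends on the compiler and optimization level, so it is worth checking the generated code rather than assuming it.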



              Finally, even if this information does not directly answer my original question, it's so closely related that I felt it was useful to add it here.






                  edited Mar 1 at 8:32

























                  answered Feb 28 at 10:33









                  L.C.


























