Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

underscores and undefined behavior

As everyone should know, leading underscores in C identifiers are not cool, as they can cause undefined behavior per section 7.1.3 of the C standard:

All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
[…]
If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

Yet, they are widely used everywhere. Here are some examples:

  • inclusion guards in GLib: __G_VARIANT_H__
  • internal Python functions: _PyUnicode_AsString
  • various macros in APT: __deprecated, __hot

All of this triggers undefined behavior and is thus uncool. In APT it is the worst of the three: our macros have no namespace prefix at all, so we are much more likely than the other two to end up redefining something we should not.

But why were those solutions chosen in the first place, and what is the alternative? I cannot answer the first question, but for the second one, the obvious alternative is to use trailing underscores:

  • inclusion guards, defined behavior: G_VARIANT_H__
  • internal functions, defined behavior: PyUnicode_AsString_
  • various macros, defined behavior: deprecated__, hot__
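
The trailing-underscore naming above could look roughly like this in practice. This is only a sketch under assumptions: the `mylib_` names are hypothetical, the attribute expansions are a guess at what macros like `__deprecated` and `__hot` typically wrap (GCC-style attributes), and it assumes plain C, since C++ also reserves identifiers containing a double underscore anywhere.

```c
/* Hypothetical header-style sketch; "mylib" is not a real library. */
#ifndef MYLIB_EXAMPLE_H__          /* trailing underscores: not reserved in C */
#define MYLIB_EXAMPLE_H__

/* Attribute macros named with trailing instead of leading underscores.
 * The expansions are an assumption (GCC/Clang attribute syntax). */
#if defined(__GNUC__)
#define deprecated__ __attribute__((deprecated))
#define hot__ __attribute__((hot))
#else
#define deprecated__
#define hot__
#endif

/* "Internal" helper, marked by a trailing underscore rather than a leading one. */
static inline int mylib_add_internal_(int a, int b)
{
    return a + b;
}

/* Public function, tagged with the renamed attribute macro. */
hot__ static inline int mylib_add(int a, int b)
{
    return mylib_add_internal_(a, b);
}

/* Declaration only: callers of this would get a deprecation warning. */
deprecated__ int mylib_old_add(int a, int b);

#endif /* MYLIB_EXAMPLE_H__ */
```

None of these names begins with an underscore, so none of them falls under the reservations quoted from 7.1.3.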

Then there is another class of reserved identifiers with underscores:

All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

Meaning that everything except for parameters, local variables and members of structs/unions that starts with an underscore is reserved. So, if you happen to create a variable _mylibrary_debug_flag, you trigger undefined behavior as well. And while we’re at it, do not think you can create a type ending in _t: POSIX reserves all identifiers ending in _t for its own use.
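
To make the scope distinction concrete, here is a small sketch of where a leading underscore followed by a lowercase letter is and is not reserved. All names are made up for illustration (`mylib` is not a real library), and the reserved variant is left in a comment so the code itself stays well-defined.

```c
/* Hypothetical names throughout; "mylib" is not a real library. */

/* File scope, ordinary name space: a leading underscore is reserved here.
 * int _mylibrary_debug_flag;      <- undefined behavior per 7.1.3
 */
int mylibrary_debug_flag = 0;      /* safe alternative without the underscore */

/* No _t suffix on the type name, to stay clear of the POSIX reservation. */
typedef struct mylib_point {
    int _x;                        /* struct member: member name space, allowed */
    int _y;
} mylib_point;

/* Parameter and local variable: block scope, so a leading underscore
 * followed by a lowercase letter is allowed. */
int mylib_point_sum(const mylib_point *_p)
{
    int _total = _p->_x + _p->_y;
    return _total;
}
```

Even where it is allowed, an underscore followed by an uppercase letter or a second underscore (such as `_X` or `__x`) would still be reserved everywhere.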

In summary, whenever you write C and want to be 100% safe from undefined behavior caused by naming, do not start any identifier with an underscore and do not end any identifier with _t.

Written by Julian Andres Klode

May 11, 2011 at 18:05

Posted in General

4 Responses

  1. You don’t want to use trailing double underscores either. The C++ standard, at least, reserves all identifiers containing two underscores in a row anywhere in the identifier, not just at the beginning. So, assuming you want C++ compatibility, don’t use double underscores at all.

    Josh Triplett

    May 11, 2011 at 18:59

  2. I’d need to check the C and C++ standards myself to be sure, but I slightly doubt that include guards (and all other stuff touched by the preprocessor) are concerned: the C parser will never ever get to see them. Same holds for C++, obviously.

    Michael Tautschnig

    May 12, 2011 at 02:46

  3. Michael Tautschnig: ‘The implementation’ includes the standard C and C++ library headers. They normally need to use many macros, and that is why such identifiers are reserved as macro names too.

    Ben Hutchings

    May 12, 2011 at 03:00

  4. […] difficulty really comes down to a dispute of what deliverance is all about. Deliverance is not a behavior alteration program built on self-effort, but instead a metamorphosis of the inner man. Yes, it does […]

    Howard Good | TALKING BOOK

    November 19, 2011 at 06:15


Comments are closed.
