Uninitialised global data in C – .bss section vs common symbols

Contrary to what one might expect, compiler developers are human, and sometimes a simple issue makes us question what we really know. Some days it feels as if, with every day that passes, I know less and less about C. The good thing is that every day I learn something new.

Let’s try to understand what happened today.

Uninitialised global data in C is treated a bit specially (compared with initialised global data) because the compiler has to make sure it’ll be initialised to zero, i.e., pointers will be initialised to NULL and basic types to zero. Obviously, there’s no need to save all those zeros inside an executable (ELF in my case). It is enough to record that the symbol is uninitialised, along with its size. The loader is then responsible for zeroing the ‘uninitialised’ global data at load time.
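As a small sketch of that guarantee (the names counter, ratio, ptr, buffer and the two helper functions are made up for illustration), the following C file declares a few globals without initialisers and checks that they start out zeroed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical globals without initialisers. They have static storage
   duration, so C guarantees they start out as zero / null. */
int counter;
double ratio;
char *ptr;
int buffer[1024];

/* Returns 1 if every element of buffer is zero, 0 otherwise. */
int buffer_is_all_zero(void)
{
    for (size_t i = 0; i < sizeof buffer / sizeof buffer[0]; i++) {
        if (buffer[i] != 0)
            return 0;
    }
    return 1;
}

/* Aborts (via assert) if any of the globals above is not zeroed. */
void check_zero_initialised(void)
{
    assert(counter == 0);
    assert(ratio == 0.0);
    assert(ptr == NULL);
    assert(buffer_is_all_zero());
}
```

Calling check_zero_initialised() at the top of main, before anything writes to these globals, should pass silently. Whether buffer ends up in .bss or as a common symbol depends on the compiler and its flags, but in neither case are its zeros stored in the object file.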

Let’s start with the case of using initialised global data.

File A:

int arr[5] = {1,2,3,4,5};

File B:

int arr[5] = {1,2,3,4,5};

If we compile both files and link them together, what do you reckon the result should be?

bss2.o:(.data+0x0): multiple definition of `arr2'
bss.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status

The linker complained about multiple definitions. Brilliant! This should avoid some bugs in the future.

Let’s remove the initialisations and see what happens:

File A:

int arr[5];

File B:

int arr[5];

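Result: both files compiled and linked into an ELF without problems. Why didn’t the linker complain about both symbol definitions? Because the C Standard doesn’t require it to. Defining the same object in two translation units is undefined behaviour, and the Standard demands no diagnostic for it; diagnostics are only mandatory for violations of syntax rules and constraints. Treating such duplicate definitions as a single object is even listed in the Standard as a common extension.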
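For comparison, here is what C explicitly allows inside a single file. A file-scope declaration without an initialiser is a ‘tentative definition’, and repeating it is fine: the compiler folds the copies into one object. Roughly speaking, common symbols extend the same leniency across object files. This is only a sketch, and arr_roundtrip is a hypothetical helper added for illustration:

```c
#include <assert.h>

/* Two tentative definitions of the same object in ONE translation unit.
   This is valid C: both lines refer to the same array. */
int arr[5];
int arr[5];

/* Hypothetical helper: stores a value through the name and reads it back. */
int arr_roundtrip(int index, int value)
{
    assert(index >= 0 && index < 5);
    arr[index] = value;
    return arr[index];
}
```

Adding an initialiser to both lines would turn them into two full definitions, and the compiler would then reject the file with a redefinition error.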

As you might have guessed from the title of this post, the answer lies in how the compiler defines both symbols in the assembly file. If the compiler places uninitialised data in a dedicated ELF section (the .bss section), the linker sees two definitions of the same symbol and complains about it: the link fails with the same ‘multiple definition’ error as before.
You can force GCC to do that by specifying ‘-fno-common’, which tells GCC not to emit common symbols (GCC 10 and later already default to it; ‘-fcommon’ restores the old behaviour). What are common symbols? ‘Using as’ (the GNU Assembler manual) states that a common (.comm) symbol is one the linker can merge: if several object files declare the same common symbol, they are folded into one, and if no file provides a definition/initialisation for it, it is treated as uninitialised data. There’s more… If the linker finds N declarations with different sizes, it allocates space for the largest, and it emits no warning unless you pass ‘--warn-common’ to the linker or ‘-Wl,--warn-common’ through the GCC driver.
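One way to avoid depending on common symbols at all is the usual declaration/definition split, sketched below in a single file for brevity. The header name arr.h and the helpers set_arr and sum_arr are hypothetical. In a real project the extern line would live in the shared header and the definition in exactly one .c file:

```c
#include <assert.h>

/* What a shared header (a hypothetical "arr.h") would contain:
   a declaration only, which reserves no storage. */
extern int arr[5];

/* What exactly one .c file would contain: the single definition. */
int arr[5];

/* Any other file would include the header and simply use the name. */
void set_arr(int index, int value)
{
    assert(index >= 0 && index < 5);
    arr[index] = value;
}

/* Returns the sum of all five elements. */
int sum_arr(void)
{
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += arr[i];
    return total;
}
```

Laid out like this, the program links the same way with or without ‘-fno-common’, because the linker only ever sees one definition of arr.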

If you see a big advantage in using common symbols instead of ‘normal’ symbols inside a .bss section, please say so. The only benefit I can think of is the more compact definition of the symbol using ‘.comm symbol, size[, alignment]’.

This is just Chapter 1 in a series that I’m planning to publish about my experience with C.
