Uninitialised global data in C – .bss section vs common symbols

Contrary to what one might expect, compiler developers are human too, and sometimes a simple issue makes us question what we really know. Sometimes it feels like, with every day that passes, I know less and less about C. The good thing is that every day I learn something new.

Let’s try to understand what happened today.

Uninitialised global data in C is treated a bit specially (compared with initialised global data) because the compiler has to make sure it is initialised to zero: pointers start out as NULL and arithmetic types as zero. Obviously, there’s no need to store all those zeros inside the executable (ELF in my case). It is enough to record that the symbol is uninitialised, together with its size. The loader is then responsible for zero-initialising that ‘uninitialised’ global data at load time.
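A minimal sketch of that guarantee, assuming a hosted C99 (or later) implementation. The globals `counters`, `message`, `config` and the helper `all_globals_zero` are hypothetical names chosen for this illustration: none of the globals has an initialiser, yet each one is guaranteed to start out as zeros and null pointers.

```c
#include <stddef.h>

/* Hypothetical globals with no initialisers. All of them are zero-initialised
 * before main() runs; on ELF targets their storage typically ends up in .bss
 * (or as a common symbol with -fcommon), so no zero bytes are stored in the file. */
int counters[5];              /* every element starts as 0              */
char *message;                /* a pointer starts as a null pointer     */
struct {
    double ratio;             /* arithmetic members start as zero       */
    const char *name;         /* pointer members start as NULL          */
} config;

/* Returns 1 if every hypothetical global above still holds its
 * zero-initialised value, 0 otherwise. */
int all_globals_zero(void)
{
    for (size_t i = 0; i < sizeof counters / sizeof counters[0]; i++) {
        if (counters[i] != 0)
            return 0;
    }
    return message == NULL && config.ratio == 0.0 && config.name == NULL;
}
```

Calling `all_globals_zero()` from `main()` before touching any of the globals should return 1.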

Let’s start with the case of using initialised global data.

File A:

int arr[5] = {1,2,3,4,5};

File B:

int arr[5] = {1,2,3,4,5};

Compiling and linking both files, what do you reckon should be the result of the compilation?

bss2.o:(.data+0x0): multiple definition of `arr2'
bss.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status

The linker complained about multiple definitions. Brilliant! This should avoid some bugs in the future.

Let’s remove the initialisations and see what happens:

File A:

int arr[5];

File B:

int arr[5];

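Result: both files compile and link into an ELF executable without any complaint. Why didn’t the linker complain about two definitions of the same symbol? Not because the C Standard blesses it: each file-scope int arr[5]; is a tentative definition that becomes a real definition at the end of its file, and having two external definitions of the same object in one program is undefined behaviour. However, no diagnostic is required for it. The Standard only mandates diagnostics for violations of syntax rules and constraints, and its Annex J even lists accepting multiple external definitions as a common extension.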
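For contrast, here is a minimal single-file sketch (all identifiers are hypothetical) of what the Standard explicitly allows within one translation unit. Repeating a file-scope declaration without an initialiser is a tentative definition (C99/C11 6.9.2): the repeats all name one object, which is zero-initialised if no initialiser appears. The interesting case is the one above, where the repeats live in two separate files and it is up to the linker to sort them out.

```c
#include <stddef.h>

/* Two tentative definitions of the same array in ONE file: legal, and both
 * name the same object. With no initialiser anywhere in this file, the
 * object ends up zero-initialised. */
int tentative_arr[5];
int tentative_arr[5];

/* A tentative definition may also repeat an earlier real definition;
 * it simply refers back to it (see the examples in C11 6.9.2). */
int answer = 42;
int answer;

/* Number of elements in tentative_arr (there is only one array). */
size_t tentative_len(void)
{
    return sizeof tentative_arr / sizeof tentative_arr[0];
}
```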

As you might have guessed from the title of this post, the answer lies in how the compiler emits both symbols in the assembly file. If the compiler puts uninitialised data in a dedicated ELF section (.bss), the linker sees two definitions of the same symbol and complains about it, failing with the same multiple-definition error as in the initialised case.
You can force GCC to do that by passing ‘-fno-common’, which tells GCC not to emit common symbols. What are common symbols? ‘Using as’ (the GNU Assembler manual) states that a common (.comm) symbol is one that the linker may merge with other symbols of the same name; if it finds no actual definition, it allocates the space as uninitialised data. There’s more… If it finds several common symbols with the same name but different sizes, it allocates space for the largest, and it emits no warning unless you pass ‘--warn-common’ to the linker (or ‘-Wl,--warn-common’ through the GCC driver).

If you see a big advantage in using common symbols instead of ‘normal’ symbols inside a .bss section, please say so. The only benefit I can think of is the more compact definition of the symbol using .comm symbol, size[,alignment].
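For what it’s worth, GCC 10 and later use -fno-common by default, so the second pair of files above no longer links with a recent GCC unless -fcommon is passed. A portable arrangement that links either way is to declare the array extern in a shared header and define it in exactly one file. The sketch below squashes that layout into a single file so it stays self-contained; the file names in the comments (shared.h, shared.c) and the identifiers shared_arr and shared_sum are hypothetical.

```c
#include <stddef.h>

/* ---- shared.h (hypothetical): included by every .c file ---- */
extern int shared_arr[5];   /* declaration only: allocates no storage */
int shared_sum(void);

/* ---- shared.c (hypothetical): the ONE file that defines the array ---- */
int shared_arr[5];          /* single definition, zero-initialised */

/* ---- any other .c file: uses the array through the header ---- */
int shared_sum(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof shared_arr / sizeof shared_arr[0]; i++)
        total += shared_arr[i];
    return total;
}
```

With this layout the linker only ever sees one definition, whether the compiler emits it as a .bss symbol or as a .comm symbol.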

This is just Chapter 1 in a series that I’m planning to publish about my experience with C.


Vim functionality in Emacs – Increment next number

As a software developer, I write tests as part of my daily routine. Some of the tests I create check whether certain rules of a compiler are triggered, so I often write simple tests with variable names like var1, var2, var3… Likewise, many of the functions I create are called func_0, func_1… mainly because they don’t represent anything special.

Vim has this functionality built in: pressing Ctrl-A in normal mode increments the number under the cursor or, if there is none, the next number on the line. This is helpful for creating a list of variables like var1, var2, var3, etc… Here’s an Emacs Lisp function that does something similar:

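(defun inc-number ()
  "Increment the number at or after point, similar to Vim's Ctrl-A."
  (interactive)                       ; needed so the key binding can call it
  ;; If point is inside a number, move back to its first digit.
  (skip-chars-backward "0-9")
  ;; Unlike Vim, the search is not limited to the current line.
  (when (re-search-forward "[0-9]+" nil t)
    (let ((n (string-to-number (match-string 0))))
      ;; Replace the digits with the incremented value.
      (replace-match (number-to-string (1+ n)))
      ;; Leave point on the last digit, as Vim does.
      (backward-char 1))))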

The following line just sets a key sequence to call the function.

(global-set-key "\C-q" 'inc-number)

Gnome-shell and Firefox JavaScript Library

You can easily find information about how to build gnome-shell from the repositories using jhbuild. There is also a page listing common problems that may happen during compilation (http://live.gnome.org/GnomeShell/SwatList). I’m writing this because none of the solutions I found for building it on Debian Squeeze were entirely correct. For instance, while compiling gjs you may hit the error “Cannot find where Firefox Javascript Library lives”. I found here that you can use the experimental repository. The problem is that the version in that repository pulls in dependencies that are not trivial to resolve. With the other link, some links are broken and others take you to a page without a .deb package built for your architecture.

My solution is to download xulrunner 1.9.2 from here and, after unpacking it, run xulrunner --register-global.

Future of Search Engines. Inference and Natural Language Processing.

Since this is my very first essay, I’ll keep it short. I’m writing because I want to read this blog 3 or 4 years from now and find out whether anything I wrote has been implemented.

A little bit of context on search engines. Currently, the main search engines are Google, Bing and Yahoo! I believe Google has by far the best “sorting” algorithm of them all; it’s hard for me to imagine all the parameters used to compute the best results. Bing is the youngest of the three and has seen interesting growth. I read an interesting article about it a couple of weeks ago; it can be found on HN (http://myprasanna.posterous.com/birth-and-death-of-microsoft-bing).

I’m writing this because I believe inference and natural language processing are great features that could be incorporated into search engines. I use Google a lot, and I feel I have to rephrase my questions to get the best results. For instance, if I want to know the name of the singer of The Shins, I’ll either google “The shins Wikipedia” or simply “The shins” and then look through the top results for one that lists the band members. That’s not that bad, but when you have systems like IBM Watson, it looks a bit outdated. I’d love to ask “members of The Shins” and receive the answer. Just think with me for a second: what is the complexity of doing that? Understand that “The Shins” is a band. Understand that the user wants information about “members”. What kind of information? Just names? Previous projects? Biography? What happened to the Semantic Web? Wasn’t this exactly the kind of problem the Semantic Web was created to solve?

The previous example was more declarative and probably more difficult than this one. What about inference? What if the search engine could infer that some statement is “True” or “False”? I studied Logic Programming (Prolog) for two semesters and it is a very interesting programming paradigm. Prolog is a language used in programming contests with great results. For instance, what if I want to know whether a given city is the capital of a country? E.g.: “Is Lisbon the capital of Portugal?” The search engine could respond True or False, and if the answer is False it could even give the correct capital.

I really like Quora. It is a great project, one I can see myself spending 16 hours a day making better. I believe a system like Quora could be incorporated into a search engine, rather than the search engine just returning links to Quora.

I won’t be surprised if the next great company is the one that can provide that kind of system. I believe Google is on the right path. Right now, 41 years after the Moon landing (the most amazing event ever?), are we still manipulating data and using programming abstractions the same way we did 50 years ago? I know I’m not that old, but I’m a big fan of history. I’ll finish with a link to a very interesting video of Robert Martin (Object Mentor) at RailsConf 2010 where he talks about the evolution of programming languages. The video is a little long; if you don’t want to watch it, it basically says that after 50 years and an increase in processing power and storage of 25 orders of magnitude, we’re still programming the same way we did in Fortran.

Wireless Tools. Debian Squeeze

Installing a new operating system can be extremely painful if you don’t have another computer to download drivers or tools with. This happens because you need *The Internet* to install the drivers that give you access to *The Internet*.

Since I didn’t have an Ethernet cable and the drivers for my wireless card aren’t open enough to be included on the Debian CD, I had to download the firmware and wireless-tools from another PC. Not a problem. To my surprise, gnome-network-manager was not on the Debian Squeeze CD1 either. Also not a problem: I’m a Linux user and I remember that, in the beginning, you had to configure your wireless connection by hand. 🙂

Command 1. Check available wireless networks: iwlist wlan0 scanning
Command 2. Bring the network interface up or down: ifconfig wlan0 up / ifconfig wlan0 down
Command 3. DHCP: dhclient -r wlan0 (release) / dhclient wlan0 (obtain a lease)

But your router usually uses WPA or WPA2 for authentication, and as far as I know, with ifconfig and the wireless-tools alone (iwconfig) you can only join open or WEP networks. Here’s the interesting part, and why I’m writing this post: wpa_supplicant is the way to go. Here’s the config file I used.



If you have a *common* network card, you can use the generic Linux Wireless Extensions driver backend (wext); -dd just makes the output more verbose:
wpa_supplicant -iwlan0 -c/path/to/wpa_supplicant.conf -Dwext -dd

Once you’re connected, you can install gnome-wireless-tools and all the dependencies that come with it. Otherwise, installing gnome-wireless-tools by hand is not a very pleasant experience.