paulgorman.org

The C Programming Language

Hello, World

#include <stdio.h>    /* Declares printf(). */

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    return 0;
}

Compile:

$ gcc -Wall hello.c -o hello
$ ls
hello*  hello.c
$ ./hello 
Hello, world!

A very simple Makefile:

hello: hello.c
	gcc -std=c99 -Wall -o hello hello.c

clean:
	rm -f hello

Basic Syntax

#include <stdio.h>
#include <stdlib.h>

/* This is a comment.  */

#define MYCONSTANT 42    /* Note: no semi-colon. */

int myglobal;    /* External/global variable available to all functions. */

int main() {
    int a = '2';    /* A character constant: a holds the code for '2' (50 in ASCII), not 2. */
    char yesOrNo[4];
    scanf("%3s", yesOrNo);
    char mystring[] = "All the king's horses...";
    int myarray[] = {1, 2, 3};

    if (a >= 1) {
        puts("The number is at least 1.");
    } else if (a == 0) {
        puts("Zero");
    } else {
        printf("The number is %i.\n", a);
    }

    int i = 0;
    do {
        switch(i) {
        case 0:
            puts("0");
            break;
        case 1:
            puts("1");
            break;
        case 9:
            puts("9");
            break;
        default:
            puts("Not 0, 1, or 9");
        }
        i++;
    } while (i != 10);

    while (i >= 0) {
        puts("i is non-negative");
        i--;
    }

    return 0;
}

The curly braces are optional for if blocks that execute a single statement.

Variables & Memory

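Think about C as memory with some syntax sprinkled on top. A C variable is a name for a location in memory, with a type that says how many bytes it occupies and how to interpret them.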

Writing int foo = 314; tells the compiler to pick a starting memory address, set aside enough subsequent bytes to hold an integer, and store the value 314 there.

Arrays work the same way, though the compiler sets aside enough consecutive bytes to store all the values (i.e., an area of memory in bytes equal to size_of_data_type times number_of_elements_in_array).

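Structs are similar: the members are stored in one block of memory, in the order they are declared, although the compiler may insert padding bytes between members (and at the end) to keep each one properly aligned.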

Pointers

A pointer is a variable. Its value is a memory address.

The * (splat/star/asterisk) is used when declaring and dereferencing (getting the pointed-at value of) a pointer. It's called the indirection operator, but might be read as value at address.

The & (ampersand) is the address-of operator. It is used to get the memory address of a variable.

Pointers have a type (int *x;). This is so that the compiler knows where the value pointed at ends, and enables pointer arithmetic (e.g. for x + 1 the compiler knows how far to move the pointer — the width of a char, the width of an int, etc. — based on the type of x).

Why use pointers? We can pass pointers into functions, so our functions can affect more variables than just the simple return value.

The heap and the stack

C memory can be managed statically, automatically, or dynamically. The difference is how long the memory lives (its lifetime) and when it begins life (compile time versus run time). Static memory is laid out at compile time, along with the instruction code of the program, and is allocated for the entire runtime of the program. Automatic memory is allocated on the stack, and comes and goes as functions are called and return. Dynamic memory management gives the programmer the most flexibility (along with more responsibility) in managing the lifetime of memory allocation. Dynamic memory is allocated from the heap during the lifetime of the process with malloc() and returned to the heap with free(). Because the memory location isn't known until it's allocated, the program accesses heap memory with a pointer returned by malloc(). The heap is usually much larger than the stack.

Every function call has a frame (block of memory) in the stack. That is, when a function is called, all its values are pushed onto the stack in one frame/block, and that frame lives until the function returns. Stack space is managed as LIFO (last in, first out); the most recently allocated block is always the next block to be freed. All stack memory is local; when a function exits, its stack memory is freed. There is an OS-dependent limit on the size of the stack, and so on the size of variables which may be stored on it. If you try to use more memory than the stack can contain or nest too many function calls (with each one pushing an additional frame onto the stack), it causes a stack overflow, crashing the program.

The stack is faster than the heap, because the LIFO/contiguous nature of the stack has less overhead than allocating and freeing blocks of heap — it's just a matter of moving a pointer to indicate the top of the stack. Memory allocated in the heap can be located anywhere in the heap, and it can be deallocated in any order, requiring more bookkeeping operations across potentially non-contiguous memory locations.

(The LIFO stack is how recursive and nested functions work without clobbering each other's memory: each new call gets a new frame on top of the stack, and when it returns its caller's frame is back at the top of the stack.)

Each thread gets its own stack, but all the threads in a process usually share the same heap (and must coordinate safe access to that shared heap). Memory from a stack is reclaimed when a thread exits; memory from the heap lives until it is freed or the process ends.

Variables on the heap are not so constrained in size as those on the stack. The maximum size of a stack is typically fixed when the process starts (or, for additional threads, when the thread is created), but the heap can grow during runtime. If you lose the last pointer to a block of heap memory without calling free() on it (for example, because the pointer variable falls out of scope), you create a memory leak: the block stays allocated and unusable until the process ends, at which point most operating systems reclaim it. When there are a lot of allocations and deallocations, the heap can become fragmented and slow (it takes more operations to allocate, read, write, and deallocate non-contiguous memory scattered across many locations in the heap).

Allocate memory on the heap if you need a large amount of it, or if you need to keep it around outside the scope of a function.

Memory layout of a Linux binary

A Linux ELF binary gives some clues about how memory is allocated at compile time and run time. The size command shows the amount of memory (bytes) in some of the executable's sections. The "text" section contains the program instructions. Any running copies of the program share the instructions loaded in memory. The "data" section contains all the variables initialized with values at compile time (statics, globals). The "bss" section contains only a number: the total memory size of uninitialized values the program will require at runtime. So, the number size shows for "bss" isn't bytes stored on disk; the value stored on disk is just the number shown (which will be the bytes of memory allocated when the executable runs). The sum of the "text" and "data" sections, on the other hand, is probably pretty near the total size of the executable file stored on disk; those sections will be read into memory when the program runs.

$ size /bin/ps
   text	   data	    bss	    dec	    hex	filename
  86386	   1544	 135016	 222946	  366e2	/bin/ps
$ ls -l /bin/ps 
-rwxr-xr-x 1 root root 93096 Sep 27 19:46 /bin/ps

readelf -a /bin/ps gives more of a peek into ELF executables.

Tooling

Source code becomes an executable in four steps. The pre-processor scans our source code, expands any include-files, conditional compilation instructions and macros, then hands off its output to the compiler. The compiler translates the source code into lower-level assembly code. The assembler transforms the assembly output by the compiler into object files (machine code with addresses still to be resolved). The linker takes the object files output by the assembler, resolves the references between them, and packages them as one executable program.

The loader runs programs. It looks at the executable file stored on disk, checks its headers, allocates RAM (the majority as one block for the stack and heap, with the stack growing from the top down into the block and the heap growing from the bottom up), and copies the executable's Text and Data sections into primary memory. The loader then copies any command line arguments onto the stack, and feeds them to main(). A runtime linker will also be involved to handle loading any objects from shared libraries.

File types

.c    source code to be pre-processed
.i    source code that has already been pre-processed (not pre-processed again)
.h    header file (not compiled or linked on its own)
.s    assembler code
.o    object file

Valgrind

sudo apt-get install valgrind

This suite of tools includes memcheck, which finds memory leaks, deallocation errors, etc. See the quick start guide and the explanation of error messages from memcheck.

valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes ./foo

Further Reading