The source code notes: January 2015

Saturday, 31 January 2015

How to add or remove an element at the beginning of a linked list in C

In this post we expand the code given in the last post about linked lists (I'm in a hurry, bring me to the code ). We have seen how appending and removing an element on the tail of the list is done. To insert one on the opposite side ( head ), we don't need to traverse the list, as we can use the address of the first element in it

void push( list ** llist, int data )
{
list * curr = malloc( sizeof( list ));
curr->data = data;
curr->next = *llist;
*llist = curr;
printf("push: %d\n", data);
}

pushing an element on the linked list requires a change in the caller's memory. So, as with the function pop_tail(), with the function push() we need a direct reference to it and not just a copy. That's why we have a pointer to a pointer to a struct list as first parameter of the function ( see the previous post for a more detailed explanation ). The second parameter is, of course, the data stored in the new element. In this particular case, the new element, becomes also the new first element of our list, and, also, the new starting address. In the first two lines of code inside push(), we create a new element with its data load, element which still isn't linked to the list. In the third line of code, curr->next = *llist, we copy the actual first element of the list inside the next pointer of the newly created element ( curr->next -- the linking pointer ). Then we set the new first element of the linked list by copying the new one ( curr ) inside llist. Note how we dereference llist with the indirection operator ( * ), operation which allows us to access and modify the memory at the address stored inside llist.

To finish this part of the tutorial, we write a function which removes an element from the head's side of the linked list

int pop( list ** llist )
{
int r = (*llist)->data;
free( *llist );
*llist = (*llist)->next;
printf( "pop: %d\n", r );
return r;
}

again, the code above, is minimalistic and unsafe, as we only want it to show the basic of what it needs to be done for performing the actions on the data. As in the function push(), here we need to access the caller's memory. Thus we have a double pointer parameter. The parenthesis are necessary because the structure member access via pointer operator ( -> ) has a higher precedence on the indirection operator ( * ). Thus we use parenthesis to prioritize the dereferencing operation and access the data correctly.

The following is the complete code with some minors improvement

#include <stdio.h>
#include <stdlib.h>

typedef struct list
{
    int data;
    struct list * next;
}list;

list * create(int data)
{
list * first = malloc(sizeof(list));
first->data = data;
first->next = NULL;

return first;
}

void append( list * llist, int data )
{
list * curr = llist;
while( curr->next != NULL )
    {
      curr = curr->next;
    }

curr->next = create( data );
printf("append:%d\n", data);
}

int pop_tail( list ** llist )
{
list * curr = *llist;
if( curr == NULL )
    {
      printf("pop_tail:list is empty! nothing to pop\n");
      return;
    }

while( curr->next->next != NULL )
    {
      curr = curr->next;
    }

int retval = curr->next->data;
free( curr->next );
curr->next = NULL;
printf("pop_tail:%d\n",retval);

return retval;
}

void push( list ** l, int data )
{
list * c = malloc(sizeof(list));
c->data = data;
c->next = *l;
*l = c;
printf("push: %d\n", data);
}

int pop( list ** l )
{
list * c = *l;

if(c)
    {
      int r = (*l)->data;
      free(*l);
      *l = (*l)->next;
      printf("pop: %d\n", r);
      return r;
    }
else
    {
      printf("pop:list is empty! nothing to pop\n");
      return ;
    }
}

void print( list * llist )
{
   list * curr = llist;

if( curr == NULL ){
      puts("print:List empty!");
      return;
}

printf("print:");
while( curr != NULL )
    {
      printf("%d ", curr->data);
      curr = curr->next;
    }
puts(" ");
}

int main()
{
/* Initializes the linked list with 10. mylist = { 10 }     */
list * mylist = create(10);
puts("created list with 10 at first node");

/* Appends 4 elements. mylist = { 10, 20, 30, 40, 50 }      */
append(mylist, 20);
append(mylist, 30);
append(mylist, 40);
append(mylist, 50);
/* Pushes 2 elements. mylist = { 4, 2, 10, 20, 30, 40, 50 } */
push(&mylist, 2);
push(&mylist, 4);

print(mylist);

/* Pops 2 elements from tail. mylist = { 4, 2, 10, 20, 30 } */
int data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
/* Pops 1 element from head. mylist = { 2, 10, 20, 30 }     */
data = pop(&mylist);
printf("pop returned:%d\n", data);

print(mylist);

return 0;
}

Wednesday, 28 January 2015

How to create a linked list in C

In the previous post we saw, in a very basic way, how linked lists use memory to store data differently from arrays. In this post we go through a very minimalistic implementation of a linked list.

First thing first, we need to define what we want to store inside the list, thus we need a structure wrapping up our data:

typedef struct list
{
int data;
struct list * next;
}list;

if you have read the previous post, this is self explanatory. By the way, the above code is just a simple C structure, defining the 'data model' ( or type, or object or whatever you like to call it ) that we will use to implement the linked list. As you can see, inside the structure, we have an integer representing the data we need to store in every node of the list. Also we have a pointer to a structure of the same type, which will be used to link the next node in the list to the current one. Before someone gets scandalized, I must say that the code I show here, is not written by following any 'good code practice', but to best represent and explain the arguments treated. Also, if you look for a 'ready to use' linked list implementation, this is not the place to look for it. My intentions are to make more clear for the reader how this type of data structure works. So the code tries to be easy to be understood, not clever or safe. Comments and corrections are very welcome.

As start, we write a function that will be used to initialize the first node of our linked list. The code is basic and the function has only one parameter, representing the integer data stored into the element created. The function returns a pointer to some memory allocated with malloc(), in which we store this element. To notice how the pointer first->next is nullified. This will be done for every node added on the tail of the list, as we go ahead by building it. The null pointer is needed as reference to know where the list terminates.

list * create(int data)
{
list * first = malloc(sizeof(list));
first->data = data;
first->next = NULL;

return first;
}

next, we write a function that will give us the possibility to add a node to the list. We can choose the side where we want to add the new node. Our function adds it to the tail side of the current node. Refer to the next picture to better understand what's happening inside the code:

the function declaration:

void append( list * llist, int data )
{
list * curr = llist;
while( curr->next != NULL )
    {
      curr = curr->next;
    }

    curr->next = malloc( sizeof( list ));
    curr->next->data = data;
    curr->next->next = NULL;
}

the two parameters represent a pointer to the linked list and the data inserted for storage into the element added. The pointer is copied inside curr, then curr is used to traverse the list with a while loop, from head to tail, until last element, the null pointer, is reached. Once there, the null pointer is set to point to the new element's memory, allocated with malloc(), and the next pointer of this new created element is set to null, to mark the list's end. When curr goes out of scope, the memory allocated on it with malloc() will be always available by using the caller's pointer to the list. This will be available to the program until free() is called by passing this particular pointer as argument.

Our next step will be writing a function which removes an element from the list and returns the integer data stored in that element. The operation is performed again on the tail side of the list, where last list's element containing data is removed.

int pop_tail( list ** llist )
{
list * curr = *llist;

while( curr->next->next != NULL )
    {
      curr = curr->next;

    }

int retval = curr->next->data;
free( curr->next );
curr->next = NULL;

return retval;
}

while in the append() function we just allocate some memory block, linking it to the caller's existing memory, we need pop_tail() to be able to change some of the caller's memory ( namely the llist variable ). The traditional method to allow a function to change its caller's memory is to pass a pointer to the caller's memory instead of a copy. So in C, to change an int in the caller, we pass a int* instead. To change a struct fraction, we pass a struct fraction* instead. To change an X, pass an X*. So in this case, the value we want to change is struct list*, so we pass a struct list** instead.
The first line of code, saves the address of the list in curr, which in turn is used to iterate with a while loop on each element in it. When curr->next->next points to the last element ( null ), curr->next points to the last list's element containing data. At this point, the while loop's test condition becomes false and the loop exits. The integer data in the element pointed by curr->next, is saved in retval, to be returned by the function, and the element is destroyed by passing its effective address ( curr->next ) to free(). Then, curr->next is set to null to mark the new end element of the list.

We are set to test our code, but we need a way to display the linked list's content, after we have started adding or removing data. Nothing more simple than that as the code - in a way - is already embedded in the function append(). In fact, all we need to do, is to iterate on the list, printing out each element's data.

void print( list * llist )
{
list * curr = llist;
printf("print:");

while( curr != NULL )
    {
      printf("%d ", curr->data);
      curr = curr->next;
    }
puts(" ");
}

as I said before, the code was written to illustrate a simple technique for the implementation of a linked list, without taking care of operational safety, nor error recovery techniques for avoiding errors such as segmentation faults. For example the print() function will throw this type of error, if it is called by passing an empty list ( null pointer ), so you need to take care to initialize with create() any list you intend to use, or re-initialize the current one , if all elements have been popped out. The same behaviour it's expected from pop_tail().
The following is the code testing the linked list implementation:

int main()
{
/* Initializes the linked list with 10. mylist = { 10 } */
list * mylist = create(10);

/* Appends 4 elements. mylist = { 10, 20, 30, 40, 50 } */
append(mylist, 20);
append(mylist, 30);
append(mylist, 40);
append(mylist, 50);

print(mylist);

/* Pops 3 elements from tail side. mylist = { 10, 20 } */
int data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
print(mylist);

return 0;
}

in the next post we will complete this implementation by adding more functionality, illustrating also how a dynamic stack may be built with linked lists.

Wednesday, 21 January 2015

Linked lists in C: memory usage and differences with arrays.

The linked list is a type of data structure which doesn't requires the programmer to know at compile time the exact size of the data he needs to store. Another peculiarity of the linked list is the dynamic nature of its size, which can be modified during the program execution. The elements of a linked list can be added to or removed from both the extremities, making this data structure an ideal storage structure for both stack and queue. This dynamical behavior results in the non-contiguity of the memory addresses to which the linked list's elements are stored at. Differently from the array, which is made of a unique block of contiguous memory, in the linked list the elements forming its building blocks are stored at random locations.

To better understand the difference, let's see how arrays store their data:

#include <stdlib.h>
#include <stdio.h>
int main()
{
   int myarray[80];
   int i;
   for ( i=0; i<80; i++ )
      printf("%p\n",myarray+i);

   return 0;
}

The above program prints the memory address for each element of myarray.
Compiling and running the code above will generate the following output:

0xbffca380
0xbffca384
0xbffca388
0xbffca38c
0xbffca390
0xbffca394
0xbffca398
0xbffca39c
0xbffca3a0
0xbffca3a4
...

Here are only reported the first 10 line of the actually 80 printed by the program. As we can see, the address of each element of myarray, increments by 4 on each iteration. This means that the cells of memory used to store myarray elements are contiguous. Thus, myarray elements can be accessed directly and randomly, using an index. If you run the program few times, the addresses will change each time because the program is run from the RAM of your PC. Nevertheless, they will always be contiguous and equally spaced ( the space is determined by the array's data type ).

That brings us to the next difference between array and linked list:
while the array elements can be accessed randomly, by using the [] operator,
accessing the linked list elements requires writing a loop which traverses all the elements of the list, until the desired one is reached. We shall see later the pros and cons of using a linked list instead of an array. For now we just want to understand how differently memory is allocated in a linked list, compared to an array. We will see this by rewriting the previous program using a linked list. Linked list are used exclusively with structure, so as first thing we define a structure representing an element of the list:

/* begin code */
#include <stdio.h>
#include <stdlib.h>

typedef struct item
{
int data;
struct item * next;
}item;

this structure defines the most basic elements a linked list can be made of: the data stored by the element and a pointer which is used to link the element to the next one in the list.

int main()
{
item * head, *node;
head = node;
node = malloc(sizeof(item));
node->data = 2;

In the previous part of the code we declare two pointers, head and node. We use malloc() to allocate the memory for the first element of the list. node points to this element and head, in turn, points to node; we should notice that a piece of memory allocated with malloc() it's always accessible by the program, until we call free(), even if we use malloc() inside a function. This is important to understand because in professional written programs, a linked list is generally created and manipulated through functions, and that it's possible thanks to the particular way in which malloc() allocates memory (on the heap).

node->next = malloc(sizeof(item));
node->next->data = 3;
node->next->next = NULL;

As we said before, an element of the linked list has (at least) one link to the next element. Here, we allocate memory for the second element while using the pointer next, contained in the first element, to point at it, so that they link together. Now, let's say stop, we don't need any more elements on this side of the list. To mark the end of a linked list, we nullify the last element, which in this case is represented by the pointer node->next->next.

item * tmp = malloc(sizeof(item));
tmp->data = 1;

head = tmp;
tmp->next = node;

In the code above we see the power and flexibility of linked lists. Until now, the element of the list have been added in a contiguous fashion, just like an array, with the difference that for arrays, memory is allocated at compile time. Above we add an element at the beginning of the list, to the opposite side from where we added the second element. This is done by allocating memory for an new element with malloc() and using tmp to point to that memory. Then we copy tmp to head ( which will point always to the first element of the list ) . We finish by linking this new added element to the list, by copying node to tmp->next.

What we do next it's entirely up to our needs. We can keep appending elements to any of the two sides of our list.
To finish the program, we see how to traverse the list, for printing out the data stored into it. To achieve this, we use the head pointer and a while loop:

while(head!=NULL)
    {
      printf( "address %p data %d\n", &head->next, head->data );
      head = head->next;
    }

free(tmp);           //frees node containing data=1
tmp = NULL;

free(node);          //frees node containing data=2
node = NULL;

free(node->next);    //frees node containing data=3
node->next = NULL;

return 0;
}/* end code */

The following is the output of the program:

address 0x934d02c value 1
address 0x934d00c value 2
address 0x934d01c value 3

As you can see the address of the elements doesn't follow the order in which the elements are printed out by the program, or to be more precise, the order in which they are placed inside the list.

The obvious advantage of using linked lists is the fact that their size can be modified during the execution of the program. This property is empowered even more by the possibility to expand the linked lists in any direction and on any of their elements. Though, this comports using some extra time and memory. Extra time, which we need for scanning when we must access an element that isn't the next during a list traversal. Extra memory, needed to maintain one or more pointers ( depending on the type and implementation of a linked list ) to link and navigate the list.
To exploit the possibilities offered by this kind of data structure, they should be used in all those situations where the size of data is unknown before the program is executed or during it's execution, like in real time computing. For example, compilers or interpreters construction represent one of those cases where linked lists shine out, compared to arrays. In the next post, we will see how to implement a linked list.


Graphic representation of a linked list		(en.wikipedia.org)

MathJax 3