The source code notes: 2015

Thursday, 10 September 2015

Image processing in C: enbossing filter for PPM images

This is the last post of the series about image manipulation for C/C++ beginners. We will see how to create a simple filter to process an image using C. The filter we will implement is named 'embossing filter'. We can find this type of filter image in manipulation programs and image packages like GIMP, Photoshop and ImageMagick.

The embossing filter implementation

We will write a function that takes a pointer to an image as an argument and returns another pointer to a filtered image. As a start, the function calculates the difference between each pixel channel's value and the value of the channels of its upper left neighbour. Then saves the highest absolute difference, adding to it 128. If there are multiple equal differences with different signs (e.g. -3 and 3), the function will favour the red channel's difference first, then the greens, then the blues. A value above 255 and below 0 will be set to 255 or 0, respectively.

The following code is the C implementation of the embossing filter:

Image * emboss_image(Image * img)
{
    //get size of image
   unsigned int w = getImgWidth(img);
   unsigned int h = getImgHeight(img);

    //create a new image to return
    Image * emb_img = new_image(w, h);

    Pixel diff;
    Pixel upleft;
    Pixel curr;
    char maxDiff, tmp = 0;
    unsigned char v;

    //for each pixel
    int x, y;
    for(y=1; y<h; y++)
    {
        for(x=1; x<w; x++)
        {

               if(maxDiff > tmp)
                  tmp = maxDiff;
               else
                  maxDiff = tmp;

               //get upper-left pixel value
               upleft = getPixel(img, x-1,y-1);
                 //get current pixel value
               curr = getPixel(img, x, y);
                 //find difference between channels
               diff.r = curr.r - upleft.r;
                 diff.g = curr.g - upleft.g;
               diff.b = curr.b - upleft.b;

                 //for equal values choose in favor of red
               if((abs(diff.r)==diff.g && diff.g > diff.b)||(abs(diff.r)==diff.b && diff.b > diff.g))
                  maxDiff = diff.r;

                 //or find the maximum value
               else
                   maxDiff = max(diff.r,max(diff.g,diff.b));

                 //add 128 to the maximum difference
               v = 128 + maxDiff;
                 //remove values above 255 or below 0
               if(v<0)v=0;
               if(v>255)v=255;

               //create and set the pixel in emb_img
               Pixel val2 = {v,v,v};
               setPixel(x,y,&emb_img,val2);
        }
    }
   return emb_img;
}

Loading a PPM image in memory

Before we can process an image, we require the capacity to load and retrieve this from our computer's memory. Luckily for us, achieving this with a PPM image, it's simple. Simple as reading a text file. The following C function reads a PPM image's data and stores it inside an Image structure. Then, returns a pointer to this structure:

Image * read_PPM(const char *filename)
{
     char buff[16];
     Image * img;
     int ch;
     int rgb;

     //open and read image file
     FILE * fp = fopen(filename, "rb");
     if (!fp) {
          fprintf(stderr, "Unable to open file '%s'\n", filename);
          exit(1);
     }

     //read image format
     if (!fgets(buff, sizeof(buff), fp)) {
          perror(filename);
          fclose(fp);
          exit(1);
     }

    //parse magic number
    if (buff[0] != 'P' || buff[1] != '6') {
         fprintf(stderr, "Invalid image format (must be 'P6')\n");
         fclose(fp);
         exit(1);
    }

    //parse comment
    ch = getc(fp);
    while (ch == '#') {
        while (getc(fp) != '\n') ;
            ch = getc(fp);
    }
   ungetc(ch, fp);

    int width, height;
    //parse width and height
    if (fscanf(fp, "%d %d", & width, & height) != 2) {
         fprintf(stderr, "Invalid image size (error loading '%s')\n", filename);
         exit(1);
    }

    //parse max rgb
    if (fscanf(fp, "%d", &rgb) != 1) {
         fprintf(stderr, "Invalid rgb component (error loading '%s')\n", filename);
         fclose(fp);
         exit(1);
    }

    //check max rgb value correctness
    if (rgb != RGB) {
         fprintf(stderr, "'%s' is not a 24-bits image.\n", filename);
         fclose(fp);
         exit(1);
    }


   //create new image
    img = new_image(width, height);

    int bytes;
    //read image data from file and store it inside pixel array
    if ((bytes=fread(img->data, 3 * img->width, img->height, fp)) != img->height) {
         fprintf(stderr, "Error loading image '%s'\n", filename);
         exit(1);
    }

    fclose(fp);
    return img;
}

By reading the comment in the code, it should be easy to understand what is happening. We start by opening the PPM image which we want to process. Once we did that, we need to parse the header, which will provide us with all the information's we need to process the image, as the size and the maximum value for the intensity of the colour. Also, we check for comments, as they can be present in a PPM image file, after the magic number. After we finished reading the header, we will create a new image and will store in it the data from the file. At this point, the file can be closed and we can return a pointer to our Image structure.

If you have read the emboss function, you may have noticed that we call a function called getPixel. With this function we retrieve the RGB value of each pixel stored into the Image structure pointed to by the Image pointer argument :

Pixel getPixel(Image * i, int x, int y){
    return i->data[getImgWidth(i)*y + x];
}

As you can see, the function returns an element of the pixel array of the image passed as an argument. The second and the third parameters represent the coordinates of the pixel in the image plane. If the index used in the array looks weird for you, just remember that if we store the pixels of an image in a 1d array, by using the x y image plane pixel's coordinate, we can access a pixel in the 1d array by indexing this with the integer resulting from the following calculation:


                          pixel position = image_width * y + x

from this page you can copy or download the complete code for this post. Below you can see the image before and after applying the emboss filter:


*before*	*after*

Conclusions

The image formats implemented in the Netpbm package, and in particular, the PPM format, represent a great and easy media to work with, especially for beginners who wish to start their journey to learning the art of computer graphics, to whom this post series was dedicated.

Tuesday, 8 September 2015

Create an image and a pixel structure for PPM images in C

This is the third of a series of posts dedicated to the C/C++ beginners who want to have some fun by working with images, without having to go through the fuss of learning some image manipulation library. It's my opinion that learning to program by getting some kind of visual feedback from it, improves the experience and facilitates the task a lot. Thought, if you are a complete C beginner or if C is the first language you are learning, handling anything outside of the standard library, might confuse and discourage you.

This is where the Netbpm image formats come handy, and this is why I'm using them in this series. They don't require us to install any library, thus we don't need to learn any new API, so we can concentrate on the task of learning C or C++, and on play with image processing or whatever we like to do with an image.

We already saw, in the previous post , how to create a small PPM image, one of the image formats implemented in the Netpbm package. But I bet that, if you are reading this post, you wish to do something more than just creating a randomly coloured image.

A PPM image is an RGB image, so it has three channels for each one of its pixels. We can easily represent this in code by using a structure:


typedef struct {
    unsigned char r,g,b;
}Pixel;

an image is made of pixels (and a lot of them), which form a quadrilateral shape. So, the next natural step in our coding, it's to write another structure to represent this shape:

typedef struct {
    unsigned int width, height;
    Pixel * data;
}Image;

the data field, it's a pointer used to read and to write the values in the array of pixels forming our quadrilateral shape, or more precisely, our image, the size of which will be width x height. Remember that each pixel has three channels, which we have represented in the Pixel structure with the fields, r, g and b. In a PPM image, each channel contains 1 byte of data(8 bits). This makes the size of the image equals to width x height x 3 bytes large, plus few bytes for the header.

In the previous post we used a nested loop to write the data of each pixel. Now that we have structured our data, we can write a function which will help us to write more efficient code. But first we are going to write two helper functions, so our code will be more readable:

int getImgWidth(Image * i){ return i->width; }
int getImgHeight(Image * i){ return i->height; }

the next function will make it very easy for us to set the value of any of the pixels in the image, without traversing the image with a loop until the desired pixel's position, first:

void setPixel(int x, int y, Image ** data, Pixel value)
{
    Pixel * pixel = (*data)->data;
    pixel[x + y * getImgWidth(*data)].r = value.r;
    pixel[x + y * getImgWidth(*data)].g = value.g;
    pixel[x + y * getImgWidth(*data)].b = value.b;

}

The first two parameter represent the coordinate of the pixel. The third parameter is the image which the pixel belongs to. We need a pointer to pointer because when in C we want to modify the argument of a function, we need to pass it by reference, and that is, we need to pass the address of the "thing" we want to modify, hence a pointer to it. Now, because we already handle our image with a pointer (we will see this later), to modify the image through the function, we need a pointer to a pointer parameter.
The fourth parameter is the new value for the pixel to.

The following is a very stripped version of the main function for the test program: I removed error checking and other stuff, to make more clear to the beginner what is happening in the code:

int main() {

char filename[80] = "myimage.ppm";
//create and open image file
FILE * fp = fopen(filename, "wb");

//allocate immage memory
Image * i = malloc(sizeof(Image));

//initialize image structure
i->width = 256;
i->height = 256;

//allocate pixels memory
i->data = malloc(sizeof(Pixel) * getImgWidth(i) * getImgHeight(i));



memset(i->data, 255, sizeof(Pixel) * getImgWidth(i) *   getImgHeight(i));

//initialize a pixel to red
Pixel red = {255,0,0};
//and set with it the central pixel of the image
setPixel(getImgWidth(i)/2, getImgHeight(i)/2, &i, red);

//write the image header to the file
fprintf(fp, "P6 %d\n %d\n 255\n", getImgWidth(i), getImgHeight(i));

//write the image data to file
fwrite(i->data, sizeof(Pixel), getImgWidth(i) * getImgHeight(i), fp);

fclose(fp);
return 0;
}

If we go through the code, it's quite intuitive and easy to read. We create a new file for the image and open it, then we allocate memory and initialize the variables storing its size (width and height). After we allocate memory for the pixel array. With memset, we fill all the element of the pixel array with the value 255. Then, we initialize a pixel to red which we will use in our setPixel function to paint the image's central pixel.

Below, the image generated by the code:

One pixel image

In the browser it's hard to see, but if you look well at the image generated by the code on your computer, you will notice that the central pixel is painted in red. From this link you can download the complete source for this post.

Thursday, 3 September 2015

Create colour images using only standard C

In this tutorial we will see how to work with PPM images. For those of you who have missed the previous post, PPM refers to one of the image formats implemented in the Netpbm package.

In the previous post, we saw how to create a very small PGM image using a text file. We could have been doing it with any other programming language, as PGM is in text format. We will use C to create a PPM image, but before we must understand the structure of this format.

In a PGM image, each pixel is represented with one numerical value. With PPM , things change a little. Because PPM is an RGB format, we need three numerical value to represent the three channels of each pixel. If we write the data directly to a file, for three red pixels, after the header, we need to write:

255 0 0 255 0 0 255 0 0 ...

The magic number for a PPM image is P3, if the file is written in ascii mode, and P6 if it's written directly in binary. The following is a program that create a 5 by 3 PPM image containing random colours:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main()
{
FILE * file = fopen("image.ppm", "wb");

if(!file)
    {
      printf("Error! Cannot open file image.ppm");
      exit(EXIT_FAILURE);
    }

int width = 5;
int height = 3;
int i, j;

//header
fprintf(file, "P6\n%d %d\n255\n", width, height);

for(i=0; i<height; i++)
    for(j=0; j<width; j++)
      {
        fputc((random()%255), file);//red
        fputc((random()%255), file);//green
fputc((random()%255), file);//blue
      }
fclose(file);
return 0;
}

The image below is a scaled version of the one generated by the code above on my machine:

In the next post we will do some image processing, using only a PPM image and our best friend language, C.

Tuesday, 1 September 2015

Working with images in C and C++ for beginners: the Netpbm image format

C and C++ don't have any standard library to work with graphics, GUI or media, diversely from some other programming languages, which support natively such kinds of facilities. Nevertheless, a myriad of open source libraries written in C and C++ does exist, which can be used to manipulate virtually any image or media type and to build GUIs for any kind of device or platform.

However, using third-party libraries makes writing a program that operates on images, for the C/C++ beginner, a little tricky. This is due, in part, to the complexity of the installation and configuration process of these libraries, in part, to the new API which a user must learn before using them.

But, if you are a beginner and all you want is to experiment with images in C or C++, regardless of the image's format, there's an easy shortcut to it.

The Netpbm image formats

In the Netpbm package are implemented various portable image formats, supported by all major platforms. What makes these formats great for beginners, is the fact that they can be used without installing any development library, thus without learning any new API.

In fact, in the Netbpm image formats, data can be represented in Ascii code, which makes creating an image file as easy as writing to a text file. Data can also be written directly in binary mode, using always the standard C/C++ functions, normally used to write text to a common text stream.

The main image types among the Netpbm are three: PBM, PGM and PPM.

The P stands for "portable", while the M stands for "map". The B, the G and P in the middle, stand for "bitmap", "grey" and "pixmap", respectively. Also, there is an alpha and an arbitrary depth format.

The header section of a Netpbm image is very minimalistic and simple. It is made by a magic number, used as a descriptor for the type of data, and some values, defining the size of the image and the maximum number of colours in it.

The header attributes are as follow:

image extension	magic number	image colours
PBM	P1(ascii) P4(binary)	2 black/white
PGM	P2(ascii) P5(binary)	255 Greys
PPM	P3(ascii) P6(binary)	255 RGB

So, for example, if we wanted to create an image with a width of 7 and with a height of 6 containing only grey pixels, with a value between 0 and 255, we'd write the following header:

P2            
7 6
255

Soon after the header, we can start writing the image's data, inserting each pixel's value in the exact order in which it appears on the screen



0 0 0 0 0 0 0
0 255 255 255 255 255 0
0 255 128 128 128 255 0
0 255 128 128 128 255 0
0 255 255 255 255 255 0
0 0 0 0 0 0 0

Save the above text in a file and name myimage.pgm, and you have a nice greyscale Netpbm image.

Now, this is a very small image, but it's big enough to understand how "it" works. The one below is a scaled version:

In the next post, we will see how it's simple to create cool images using C or C++.

Monday, 2 February 2015

How to remove a specific element from a linked list

In some data structure like the stack, we may want to pop elements content always from the head of the linked list, or, as in the queue case, always from the tail. We may also face very easily a case where the list must be scanned, indexed or sorted in some way, to get to the element we wish to extract the data from. In a case like this, we must be able to search the data, by comparing each piece of it with some kind of query, like an index, a string, or whatever we can use as reference to what it's stored inside the structure. To stick to the style of this tutorial, we will take in to consideration the simplest case scenario, by writing a function that will allow us to check a linked list and determine if it contains a specific integer value. The following function expands on the code given in the previous post

int pop_val( list ** llist, int val )
{
list * curr = *llist;
list * prev = NULL;

while( curr != NULL )
    {
      if( curr->data == val )
        {
          if( prev == NULL )
            {
            *llist = curr->next;
    }
    else
    {
          prev->next = curr->next;
          }

    free( curr );
      printf( "pop_val: %d\n", val );
return 0;
}

      prev = curr;
      curr = curr->next;
    }

printf( "pop_val:value %d not found!\n", val );
return 1;
}

the function will return 0, if the value passed is found, 1 otherwise.
The most important thing to note is the introduction of the variable prev, which is needed to keep track of the previous element. We need to store an element in case we scan it without finding the value we are looking for. Once we encounter the element containing the right value

      if( curr->data == val )

if pre is null, which means that we have found the value in the first element of the list, we simply fix the first element, by copying inside it the second one

      *llist = curr->next;

and deleting the first through curr

      free(curr);

else we copy the next list's element inside prev own next pointer

      prev->next = curr->next;

reconnecting the list before the current element is deleted.
In each iteration we copy curr in prev, if we didn't find in it the value searched. This is done just before the pointer is advanced to the next element

      prev = curr;
      curr = curr->next;

to test the function pop_val(), we will use the code given in the previous post .
The following is the new main function of the test code

int main()
{
/* Initializes the linked list with 10. mylist = { 10 }     */
list * mylist = create(10);
puts("created list with 10 at first node");

/* Appends 4 elements. mylist = { 10, 20, 30, 40, 50 }      */
append(mylist, 20);
append(mylist, 30);
append(mylist, 40);
append(mylist, 50);
/* Pushes 2 elements. mylist = { 4, 2, 10, 20, 30, 40, 50 } */
push(&mylist, 2);
push(&mylist, 4);

print(mylist);

/* Pops 2 elements from tail. mylist = { 4, 2, 10, 20, 30 } */
int data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
/* Pops 1 element from head. mylist = { 2, 10, 20, 30 }     */
data = pop(&mylist);
printf("pop returned:%d\n", data);

/* Pops the element storing 10. mylist = { 2, 20, 30 }      */
pop_val(&mylist, 20);

/* Pops the element storing 2. mylist = { 20, 30 }          */
pop_val(&mylist, 2);
print(mylist);

return 0;
}

Saturday, 31 January 2015

How to add or remove an element at the beginning of a linked list in C

In this post we expand the code given in the last post about linked lists (I'm in a hurry, bring me to the code ). We have seen how appending and removing an element on the tail of the list is done. To insert one on the opposite side ( head ), we don't need to traverse the list, as we can use the address of the first element in it

void push( list ** llist, int data )
{
list * curr = malloc( sizeof( list ));
curr->data = data;
curr->next = *llist;
*llist = curr;
printf("push: %d\n", data);
}

pushing an element on the linked list requires a change in the caller's memory. So, as with the function pop_tail(), with the function push() we need a direct reference to it and not just a copy. That's why we have a pointer to a pointer to a struct list as first parameter of the function ( see the previous post for a more detailed explanation ). The second parameter is, of course, the data stored in the new element. In this particular case, the new element, becomes also the new first element of our list, and, also, the new starting address. In the first two lines of code inside push(), we create a new element with its data load, element which still isn't linked to the list. In the third line of code, curr->next = *llist, we copy the actual first element of the list inside the next pointer of the newly created element ( curr->next -- the linking pointer ). Then we set the new first element of the linked list by copying the new one ( curr ) inside llist. Note how we dereference llist with the indirection operator ( * ), operation which allows us to access and modify the memory at the address stored inside llist.

To finish this part of the tutorial, we write a function which removes an element from the head's side of the linked list

int pop( list ** llist )
{
int r = (*llist)->data;
free( *llist );
*llist = (*llist)->next;
printf( "pop: %d\n", r );
return r;
}

again, the code above, is minimalistic and unsafe, as we only want it to show the basic of what it needs to be done for performing the actions on the data. As in the function push(), here we need to access the caller's memory. Thus we have a double pointer parameter. The parenthesis are necessary because the structure member access via pointer operator ( -> ) has a higher precedence on the indirection operator ( * ). Thus we use parenthesis to prioritize the dereferencing operation and access the data correctly.

The following is the complete code with some minors improvement

#include <stdio.h>
#include <stdlib.h>

typedef struct list
{
    int data;
    struct list * next;
}list;

list * create(int data)
{
list * first = malloc(sizeof(list));
first->data = data;
first->next = NULL;

return first;
}

void append( list * llist, int data )
{
list * curr = llist;
while( curr->next != NULL )
    {
      curr = curr->next;
    }

curr->next = create( data );
printf("append:%d\n", data);
}

int pop_tail( list ** llist )
{
list * curr = *llist;
if( curr == NULL )
    {
      printf("pop_tail:list is empty! nothing to pop\n");
      return;
    }

while( curr->next->next != NULL )
    {
      curr = curr->next;
    }

int retval = curr->next->data;
free( curr->next );
curr->next = NULL;
printf("pop_tail:%d\n",retval);

return retval;
}

void push( list ** l, int data )
{
list * c = malloc(sizeof(list));
c->data = data;
c->next = *l;
*l = c;
printf("push: %d\n", data);
}

int pop( list ** l )
{
list * c = *l;

if(c)
    {
      int r = (*l)->data;
      free(*l);
      *l = (*l)->next;
      printf("pop: %d\n", r);
      return r;
    }
else
    {
      printf("pop:list is empty! nothing to pop\n");
      return ;
    }
}

void print( list * llist )
{
   list * curr = llist;

if( curr == NULL ){
      puts("print:List empty!");
      return;
}

printf("print:");
while( curr != NULL )
    {
      printf("%d ", curr->data);
      curr = curr->next;
    }
puts(" ");
}

int main()
{
/* Initializes the linked list with 10. mylist = { 10 }     */
list * mylist = create(10);
puts("created list with 10 at first node");

/* Appends 4 elements. mylist = { 10, 20, 30, 40, 50 }      */
append(mylist, 20);
append(mylist, 30);
append(mylist, 40);
append(mylist, 50);
/* Pushes 2 elements. mylist = { 4, 2, 10, 20, 30, 40, 50 } */
push(&mylist, 2);
push(&mylist, 4);

print(mylist);

/* Pops 2 elements from tail. mylist = { 4, 2, 10, 20, 30 } */
int data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail returned:%d\n", data);
/* Pops 1 element from head. mylist = { 2, 10, 20, 30 }     */
data = pop(&mylist);
printf("pop returned:%d\n", data);

print(mylist);

return 0;
}

Wednesday, 28 January 2015

How to create a linked list in C

In the previous post we saw, in a very basic way, how linked lists use memory to store data differently from arrays. In this post we go through a very minimalistic implementation of a linked list.

First thing first, we need to define what we want to store inside the list, thus we need a structure wrapping up our data:

typedef struct list
{
int data;
struct list * next;
}list;

if you have read the previous post, this is self explanatory. By the way, the above code is just a simple C structure, defining the 'data model' ( or type, or object or whatever you like to call it ) that we will use to implement the linked list. As you can see, inside the structure, we have an integer representing the data we need to store in every node of the list. Also we have a pointer to a structure of the same type, which will be used to link the next node in the list to the current one. Before someone gets scandalized, I must say that the code I show here, is not written by following any 'good code practice', but to best represent and explain the arguments treated. Also, if you look for a 'ready to use' linked list implementation, this is not the place to look for it. My intentions are to make more clear for the reader how this type of data structure works. So the code tries to be easy to be understood, not clever or safe. Comments and corrections are very welcome.

As start, we write a function that will be used to initialize the first node of our linked list. The code is basic and the function has only one parameter, representing the integer data stored into the element created. The function returns a pointer to some memory allocated with malloc(), in which we store this element. To notice how the pointer first->next is nullified. This will be done for every node added on the tail of the list, as we go ahead by building it. The null pointer is needed as reference to know where the list terminates.

list * create(int data)
{
list * first = malloc(sizeof(list));
first->data = data;
first->next = NULL;

return first;
}

next, we write a function that will give us the possibility to add a node to the list. We can choose the side where we want to add the new node. Our function adds it to the tail side of the current node. Refer to the next picture to better understand what's happening inside the code:

the function declaration:

void append( list * llist, int data )
{
list * curr = llist;
while( curr->next != NULL )
    {
      curr = curr->next;
    }

    curr->next = malloc( sizeof( list ));
    curr->next->data = data;
    curr->next->next = NULL;
}

the two parameters represent a pointer to the linked list and the data inserted for storage into the element added. The pointer is copied inside curr, then curr is used to traverse the list with a while loop, from head to tail, until last element, the null pointer, is reached. Once there, the null pointer is set to point to the new element's memory, allocated with malloc(), and the next pointer of this new created element is set to null, to mark the list's end. When curr goes out of scope, the memory allocated on it with malloc() will be always available by using the caller's pointer to the list. This will be available to the program until free() is called by passing this particular pointer as argument.

Our next step will be writing a function which removes an element from the list and returns the integer data stored in that element. The operation is performed again on the tail side of the list, where last list's element containing data is removed.

int pop_tail( list ** llist )
{
list * curr = *llist;

while( curr->next->next != NULL )
    {
      curr = curr->next;

    }

int retval = curr->next->data;
free( curr->next );
curr->next = NULL;

return retval;
}

while in the append() function we just allocate some memory block, linking it to the caller's existing memory, we need pop_tail() to be able to change some of the caller's memory ( namely the llist variable ). The traditional method to allow a function to change its caller's memory is to pass a pointer to the caller's memory instead of a copy. So in C, to change an int in the caller, we pass a int* instead. To change a struct fraction, we pass a struct fraction* instead. To change an X, pass an X*. So in this case, the value we want to change is struct list*, so we pass a struct list** instead.
The first line of code, saves the address of the list in curr, which in turn is used to iterate with a while loop on each element in it. When curr->next->next points to the last element ( null ), curr->next points to the last list's element containing data. At this point, the while loop's test condition becomes false and the loop exits. The integer data in the element pointed by curr->next, is saved in retval, to be returned by the function, and the element is destroyed by passing its effective address ( curr->next ) to free(). Then, curr->next is set to null to mark the new end element of the list.

We are set to test our code, but we need a way to display the linked list's content, after we have started adding or removing data. Nothing more simple than that as the code - in a way - is already embedded in the function append(). In fact, all we need to do, is to iterate on the list, printing out each element's data.

void print( list * llist )
{
list * curr = llist;
printf("print:");

while( curr != NULL )
    {
      printf("%d ", curr->data);
      curr = curr->next;
    }
puts(" ");
}

as I said before, the code was written to illustrate a simple technique for the implementation of a linked list, without taking care of operational safety, nor error recovery techniques for avoiding errors such as segmentation faults. For example the print() function will throw this type of error, if it is called by passing an empty list ( null pointer ), so you need to take care to initialize with create() any list you intend to use, or re-initialize the current one , if all elements have been popped out. The same behaviour it's expected from pop_tail().
The following is the code testing the linked list implementation:

int main()
{
/* Initializes the linked list with 10. mylist = { 10 } */
list * mylist = create(10);

/* Appends 4 elements. mylist = { 10, 20, 30, 40, 50 } */
append(mylist, 20);
append(mylist, 30);
append(mylist, 40);
append(mylist, 50);

print(mylist);

/* Pops 3 elements from tail side. mylist = { 10, 20 } */
int data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
data = pop_tail(&mylist);
printf("pop_tail:%d\n", data);
print(mylist);

return 0;
}

in the next post we will complete this implementation by adding more functionality, illustrating also how a dynamic stack may be built with linked lists.

Wednesday, 21 January 2015

Linked lists in C: memory usage and differences with arrays.

The linked list is a type of data structure which doesn't requires the programmer to know at compile time the exact size of the data he needs to store. Another peculiarity of the linked list is the dynamic nature of its size, which can be modified during the program execution. The elements of a linked list can be added to or removed from both the extremities, making this data structure an ideal storage structure for both stack and queue. This dynamical behavior results in the non-contiguity of the memory addresses to which the linked list's elements are stored at. Differently from the array, which is made of a unique block of contiguous memory, in the linked list the elements forming its building blocks are stored at random locations.

To better understand the difference, let's see how arrays store their data:

#include <stdlib.h>
#include <stdio.h>
int main()
{
   int myarray[80];
   int i;
   for ( i=0; i<80; i++ )
      printf("%p\n",myarray+i);

   return 0;
}

The above program prints the memory address for each element of myarray.
Compiling and running the code above will generate the following output:

0xbffca380
0xbffca384
0xbffca388
0xbffca38c
0xbffca390
0xbffca394
0xbffca398
0xbffca39c
0xbffca3a0
0xbffca3a4
...

Here are only reported the first 10 line of the actually 80 printed by the program. As we can see, the address of each element of myarray, increments by 4 on each iteration. This means that the cells of memory used to store myarray elements are contiguous. Thus, myarray elements can be accessed directly and randomly, using an index. If you run the program few times, the addresses will change each time because the program is run from the RAM of your PC. Nevertheless, they will always be contiguous and equally spaced ( the space is determined by the array's data type ).

That brings us to the next difference between array and linked list:
while the array elements can be accessed randomly, by using the [] operator,
accessing the linked list elements requires writing a loop which traverses all the elements of the list, until the desired one is reached. We shall see later the pros and cons of using a linked list instead of an array. For now we just want to understand how differently memory is allocated in a linked list, compared to an array. We will see this by rewriting the previous program using a linked list. Linked list are used exclusively with structure, so as first thing we define a structure representing an element of the list:

/* begin code */
#include <stdio.h>
#include <stdlib.h>

typedef struct item
{
int data;
struct item * next;
}item;

this structure defines the most basic elements a linked list can be made of: the data stored by the element and a pointer which is used to link the element to the next one in the list.

int main()
{
item * head, *node;
head = node;
node = malloc(sizeof(item));
node->data = 2;

In the previous part of the code we declare two pointers, head and node. We use malloc() to allocate the memory for the first element of the list. node points to this element and head, in turn, points to node; we should notice that a piece of memory allocated with malloc() it's always accessible by the program, until we call free(), even if we use malloc() inside a function. This is important to understand because in professional written programs, a linked list is generally created and manipulated through functions, and that it's possible thanks to the particular way in which malloc() allocates memory (on the heap).

node->next = malloc(sizeof(item));
node->next->data = 3;
node->next->next = NULL;

As we said before, an element of the linked list has (at least) one link to the next element. Here, we allocate memory for the second element while using the pointer next, contained in the first element, to point at it, so that they link together. Now, let's say stop, we don't need any more elements on this side of the list. To mark the end of a linked list, we nullify the last element, which in this case is represented by the pointer node->next->next.

item * tmp = malloc(sizeof(item));
tmp->data = 1;

head = tmp;
tmp->next = node;

In the code above we see the power and flexibility of linked lists. Until now, the element of the list have been added in a contiguous fashion, just like an array, with the difference that for arrays, memory is allocated at compile time. Above we add an element at the beginning of the list, to the opposite side from where we added the second element. This is done by allocating memory for an new element with malloc() and using tmp to point to that memory. Then we copy tmp to head ( which will point always to the first element of the list ) . We finish by linking this new added element to the list, by copying node to tmp->next.

What we do next it's entirely up to our needs. We can keep appending elements to any of the two sides of our list.
To finish the program, we see how to traverse the list, for printing out the data stored into it. To achieve this, we use the head pointer and a while loop:

while(head!=NULL)
    {
      printf( "address %p data %d\n", &head->next, head->data );
      head = head->next;
    }

free(tmp);           //frees node containing data=1
tmp = NULL;

free(node);          //frees node containing data=2
node = NULL;

free(node->next);    //frees node containing data=3
node->next = NULL;

return 0;
}/* end code */

The following is the output of the program:

address 0x934d02c value 1
address 0x934d00c value 2
address 0x934d01c value 3

As you can see the address of the elements doesn't follow the order in which the elements are printed out by the program, or to be more precise, the order in which they are placed inside the list.

The obvious advantage of using linked lists is the fact that their size can be modified during the execution of the program. This property is empowered even more by the possibility to expand the linked lists in any direction and on any of their elements. Though, this comports using some extra time and memory. Extra time, which we need for scanning when we must access an element that isn't the next during a list traversal. Extra memory, needed to maintain one or more pointers ( depending on the type and implementation of a linked list ) to link and navigate the list.
To exploit the possibilities offered by this kind of data structure, they should be used in all those situations where the size of data is unknown before the program is executed or during it's execution, like in real time computing. For example, compilers or interpreters construction represent one of those cases where linked lists shine out, compared to arrays. In the next post, we will see how to implement a linked list.


Graphic representation of a linked list		(en.wikipedia.org)

0	0	0	0	0
255	255	255	255	255
255	128	128	128	255
255	128	128	128	255
255	255	255	255	255
0	0	0	0	0

0	0	0	0	0
255	255	255	255	255
255	128	128	128	255
255	128	128	128	255
255	255	255	255	255
0	0	0	0	0

MathJax 3