This is the first post of a series on image processing theory and practice. We will study some of the most important algorithms used in this field and learn how to implement them. I chose Python as the implementation language because it makes this series approachable for a large audience.
Python has become the language of choice for many scientists, students and engineers involved in machine vision and A.I. in general. The reasons for that are probably Python's easy syntax and portability. But, without doubt, Python also owes a good part of its success to the large number of powerful and well-established libraries available for the language, which allow users to perform nearly any kind of task with images and other types of data. Why bother following this tutorial, then? Firstly, because learning is fun. Secondly, because if you are serious about your domain of expertise, you should have a good understanding of what a library (or whatever tool you use) is doing with your data, so as to be able to choose the right functionality for your tasks, and in the right order.
What's image processing and why do we need it?
A digital image is a signal and, as such, it can be analysed and processed to modify or improve its properties. As with all signals, an image can contain noise that has to be removed, or it may need to go through some transformation step before it can be used for a given purpose. Image processing is the study and the application of the mathematics needed to perform these improvements and transformations with computers.
A digital image can be represented as a two-dimensional array of pixels, where each pixel is a sample of the image. Depending on the type of image, a pixel can have 1, 3 or 4 values associated with it. Such values are also called channels. Each channel holds a component of the image, like the level of one of the primary colours. The components of a pixel may vary a lot, depending on the encoding of the image (e.g. the file format) and the implementation algorithm. The bit depth of an image can also vary: it is the number of bits of memory used to hold the value of one pixel's channel.
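To make this concrete, a tiny image can be sketched as a nested list of pixels, where each RGB pixel is a tuple of three channel values. The 2×2 image and its values below are made up purely for illustration:

```python
# A hypothetical 2x2 RGB image as rows of pixels;
# each pixel is a (red, green, blue) tuple of 8-bit values (0-255).
image = [
    [(255, 0, 0), (0, 255, 0)],      # first row: red, green
    [(0, 0, 255), (128, 128, 128)],  # second row: blue, mid grey
]

pixel = image[0][0]        # pixel at row 0, column 0
red_channel = pixel[0]     # the red component: 255
num_channels = len(pixel)  # 3 channels for an RGB image
```

Real libraries store this data far more compactly, but the logical structure — a grid of pixels, each made of channel values — is the same.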
The set of all the components of a pixel, together with the range of values they can hold, is called the image's colour space. The most common colour space used in digital images is probably RGB, also used in PC monitors and scanners. Printers use CMYK (cyan, magenta, yellow, and black).
A pixel of an RGB image usually has 8 bits per channel; in such a case, we say that the image has a bit depth of 24 bits, for a total of 3 bytes of data per pixel.
Another important type of image is the greyscale image. It is characterized by the absence of colours, except for black, white and a range of shades of grey obtained from the amount of light carried by each pixel. If only black and white are present, we have a binary image, another important image type for machine vision applications. Greyscale images are lightweight, as they only need 1 byte per pixel (8 bits): each pixel can be rendered with a value between 0 and 255, which takes the same amount of memory as an ASCII character. These properties make this type of image well suited to tasks where a lot of memory and computational power is needed. It is no coincidence that in many algorithms, the first processing step on a colour image is to convert it from its colour space to greyscale.
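A quick back-of-the-envelope calculation shows the saving; the 640×480 resolution here is just an example:

```python
width, height = 640, 480  # example resolution

# 24-bit RGB: 3 bytes per pixel; 8-bit greyscale: 1 byte per pixel.
rgb_bytes = width * height * 3
grey_bytes = width * height * 1

print(rgb_bytes)   # 921600 bytes (900 KiB)
print(grey_bytes)  # 307200 bytes (300 KiB)
```

The greyscale version needs exactly one third of the memory, and every per-pixel operation touches one value instead of three.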
Colour space conversion
As I mentioned above, sometimes we need to convert the colour space of our images to a model more suitable for our needs. We can convert any colour space to another, but the most important conversion is, without doubt, the conversion to greyscale. The simplest method to achieve this with an RGB image is to average the values of the three channels:
\[P = (R+G+B)/3\]
where P is the value of the pixel in the output image. Other algorithms are less naive and take into consideration how we perceive colours and their properties, as in the case of the gamma compression and the channel-dependent luminance algorithms. The former considers how luminance affects the way we perceive differences in colours; the latter, how our eyes are more sensitive to green than to blue or red. The implementation of gamma correction is expensive but gives a more accurate result, while the opposite is true for channel-dependent luminance. Another algorithm exists which provides a good balance between computational cost and luminance correctness: the linear approximation, also used by some popular imaging libraries, like OpenCV and Pillow:
\[P = 0.299R + 0.587G + 0.114B\]
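Comparing the two formulas on a single pixel shows how the weighted version reflects our higher sensitivity to green. This is a minimal per-pixel sketch; the pixel value used is an arbitrary example:

```python
def average_grey(r, g, b):
    # Naive conversion: the plain mean of the three channels.
    return (r + g + b) // 3

def luma_grey(r, g, b):
    # Linear approximation with the standard ITU-R BT.601 weights,
    # the same coefficients used in the formula above.
    return int(0.299 * r + 0.587 * g + 0.114 * b)

r, g, b = 0, 255, 0  # a pure green pixel
print(average_grey(r, g, b))  # 85
print(luma_grey(r, g, b))     # 149
```

The weighted formula maps pure green to a much brighter grey (149 vs 85), matching the way our eyes perceive it.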
Implementing RGB to Grayscale conversion
If you don't have Pillow (the PIL fork) installed, you will need it to try the code in this tutorial. If you use pip as a package manager, just install it with the command pip install pillow.
Pillow has quite a few ready-made filters implemented, but we will use it only because it makes it very easy for us to access an image's data. We will also use Tkinter, from Python's standard library, as a display interface.
Let's import our libraries:
from PIL import Image, ImageTk
import tkinter as tk
Our conversion function follows:
def rgb2gray(input_image):
    if input_image.mode != 'RGB':
        return None
    else:
        output_image = Image.new('L', (input_image.width,
                                       input_image.height))
        for x in range(input_image.width):
            for y in range(input_image.height):
                pix = input_image.getpixel((x, y))
                pix = int(pix[0]*0.299 +
                          pix[1]*0.587 +
                          pix[2]*0.114)
                output_image.putpixel((x, y), pix)
        return output_image
These lines check that we have an input image that is compatible with this function:
if input_image.mode != 'RGB':
    return None
If the image is in RGB mode, we create a new image in 'L' mode, which means 8 bits per pixel. Then, we iterate over all the input image's pixels, retrieving the pixel's tuple for each position (x, y) visited:
output_image = Image.new('L', (input_image.width,
                               input_image.height))
for x in range(input_image.width):
    for y in range(input_image.height):
        pix = input_image.getpixel((x, y))
We use the tuple's three values (r, g, b) to compute the grey pixel by applying the linear approximation. Note that we reuse the pix variable. As the next step, we set the output image's pixel:
pix = int(pix[0]*0.299 +
          pix[1]*0.587 +
          pix[2]*0.114)
output_image.putpixel((x, y), pix)
Finally, the driver code:
if __name__ == "__main__":
    root = tk.Tk()
    img = Image.open("retriver.png")
    width = img.width*2 + 20
    height = img.height
    root.geometry(f'{width}x{height}')
    output_im = rgb2gray(img)
    if output_im is not None:
        output_im = ImageTk.PhotoImage(output_im)
        input_im = ImageTk.PhotoImage(img)
        canvas = tk.Canvas(root, width=width, height=height, bg="#ffffff")
        canvas.create_image(width/4 - 1, height/2 - 1,
                            image=input_im, state="normal")
        canvas.create_image(20 + 3*img.width/2 - 1, img.height/2 - 1,
                            image=output_im, state="normal")
        canvas.pack()
        root.mainloop()
    else:
        print("Input image must be in 'RGB' colour space")
Below you can see the result of applying the rgb2gray function to the input image:
The complete code:
from PIL import Image, ImageTk
import tkinter as tk


def rgb2gray(input_image):
    if input_image.mode != 'RGB':
        return None
    else:
        output_image = Image.new('L', (input_image.width,
                                       input_image.height))
        for x in range(input_image.width):
            for y in range(input_image.height):
                pix = input_image.getpixel((x, y))
                pix = int(pix[0]*0.299 +
                          pix[1]*0.587 +
                          pix[2]*0.114)
                output_image.putpixel((x, y), pix)
        return output_image


if __name__ == "__main__":
    root = tk.Tk()
    img = Image.open("retriver.png")
    width = img.width*2 + 20
    height = img.height
    root.geometry(f'{width}x{height}')
    output_im = rgb2gray(img)
    if output_im is not None:
        output_im = ImageTk.PhotoImage(output_im)
        input_im = ImageTk.PhotoImage(img)
        canvas = tk.Canvas(root, width=width, height=height, bg="#ffffff")
        canvas.create_image(width/4 - 1, height/2 - 1,
                            image=input_im, state="normal")
        canvas.create_image(20 + 3*img.width/2 - 1, img.height/2 - 1,
                            image=output_im, state="normal")
        canvas.pack()
        root.mainloop()
    else:
        print("Input image must be in 'RGB' colour space")
Final words
In this first post, we have seen what image processing is and how to implement a simple RGB to greyscale converter. In the next post, we will study and implement another algorithm to deepen our knowledge of image processing and machine vision.