ACS61012 “Machine Vision” Lab Assignment
The purpose of the lab sessions is to give you practical skills in machine vision, especially
in image enhancement, image understanding and video processing. Machine vision is
essential for a number of areas: autonomous systems, including robotics and Unmanned
Aerial Vehicles (UAVs), intelligent transportation systems, medical diagnostics,
surveillance, and augmented and virtual reality systems.
The first labs focus on performing operations on images such as reading and writing images,
calculating image histograms, flipping images, and extracting important colour and edge
features. You will become familiar with how to use these features for object
segmentation (separation of static and moving objects) and for the higher-level tasks of
stereo vision, object detection, classification, tracking and behaviour analysis. These are
inherent steps of semi-supervised and unsupervised systems, in which the involvement of
human operators is reduced to a minimum or excluded altogether.
Your assignment consists of several subtasks, listed below and described in detail in the lab
session parts. This is a brief description of all your tasks:
Task 1: Introduction to machine vision:
The aim of this task is for you to learn how to read images in different formats, convert them
from one format to another, and analyse image histograms.
Part I of this task: understanding different image formats and analysing image
histograms. You can use images from the file
Lab 1 - Part I - Introduction to Images and Videos.zip or your own image.
Part II of this task: different types of image noise, image denoising, and static object
segmentation based on edge detection.
For the report from Task 1, you need to present results with:
● The Red, Green, Blue (RGB) image histogram of your own picture and an analysis of
the histogram. The original picture should be shown as well (Lab session 1 – Part I).
● Results with different edge detection algorithms, e.g. Sobel and Prewitt, with comments
on their accuracy for different parameters. Visualise the results and draw
conclusions (Lab session 1 – Part II); a minimal MATLAB starting sketch is given after
this task description.
[10 marks, equally distributed between Part I and Part II]
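To make the expected workflow concrete, a minimal MATLAB sketch for Task 1 is given below. It is only a starting point, not a model solution; the file name myPhoto.jpg and the edge-detection threshold 0.05 are placeholders that you should replace with your own choices.

    I = imread('myPhoto.jpg');                  % read an RGB image (class uint8)
    figure; imshow(I); title('Original image');

    figure;                                     % one histogram per colour channel
    subplot(3,1,1); imhist(I(:,:,1)); title('Red');
    subplot(3,1,2); imhist(I(:,:,2)); title('Green');
    subplot(3,1,3); imhist(I(:,:,3)); title('Blue');

    G = rgb2gray(I);                            % edge detection on the grey-scale image
    E_sobel   = edge(G, 'sobel',   0.05);       % vary the threshold and compare
    E_prewitt = edge(G, 'prewitt', 0.05);
    figure; imshowpair(E_sobel, E_prewitt, 'montage');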
Task 2: Optical flow estimation algorithm:
● Find corner points and apply the optical flow estimation algorithm.
(file Lab 2.zip – image Gingerbread Man).
[5 marks]
● Track a single point with the optical flow approach (file: Lab 2.zip – the red square
image).
[9 marks]
For the report, you need to:
● Present results for the ‘Gingerbread Man’ task and visualise the results
● Visualise the estimated track on the last frame together with the ground-truth track for
the ‘Red Square’ task
● Compute and visualise the root mean square error between the track estimated by the
optical flow algorithm and the ground-truth values (the red square); a rough MATLAB
sketch of this workflow is given below.
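The sketch below outlines the corner-detection and tracking steps. It assumes the Computer Vision Toolbox is installed; the file names frame1.png and frame2.png are placeholders, and the KLT point tracker is used here as one possible way of estimating the optical flow, not the only one.

    I1 = rgb2gray(imread('frame1.png'));        % two consecutive frames
    I2 = rgb2gray(imread('frame2.png'));

    corners = detectMinEigenFeatures(I1);       % Shi-Tomasi corner points
    tracker = vision.PointTracker;              % Kanade-Lucas-Tomasi (KLT) tracker
    initialize(tracker, corners.Location, I1);
    [points2, valid] = step(tracker, I2);       % estimated point positions in frame 2

    figure; imshow(I2); hold on;
    plot(points2(valid,1), points2(valid,2), 'g+');

    % Root mean square error of an estimated track against ground truth, where
    % est and gt are hypothetical N-by-2 arrays of [x y] positions per frame:
    % rmse = sqrt(mean(sum((est - gt).^2, 2)));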
Task 3: Automatic detection of moving objects in a sequence of video frames
You are designing algorithms for automatic vehicular traffic surveillance. As part of this
task, you need to apply two approaches to detect moving objects: the basic frame
differencing approach and the Gaussian mixture approach.
Part I: with the frame differencing approach:
● Apply the frame differencing approach (Lab 3.zip file)
For the report, you need to present results with:
● Image results of the accomplished tasks
● An analysis of the algorithm's performance when you vary the detection threshold (a
minimal frame-differencing sketch is given below).
[10 marks]
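A minimal frame-differencing sketch is shown below; the video file name traffic.avi is a placeholder for the data in Lab 3.zip, and T is the detection threshold you are asked to vary.

    v = VideoReader('traffic.avi');             % placeholder file name
    T = 25;                                     % detection threshold (vary this)
    prev = rgb2gray(readFrame(v));
    while hasFrame(v)
        curr = rgb2gray(readFrame(v));
        mask = imabsdiff(curr, prev) > T;       % pixels that changed between frames
        imshow(mask); drawnow;
        prev = curr;
    end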
Part II: with the Gaussian mixture approach:
● Apply the Gaussian mixture model (file Lab 5.zip)
For the report, you need to present results showing:
● The algorithm's performance when you vary parameters such as the number of Gaussian
components, the initialisation parameters and the threshold for decision making
● Detection results for the moving objects, shown as snapshots of images (one possible
MATLAB implementation is sketched below).
[10 marks]
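One possible implementation uses the vision.ForegroundDetector object from the Computer Vision Toolbox, as sketched below; the parameter values and the file name are placeholders that you should vary and analyse.

    detector = vision.ForegroundDetector( ...
        'NumGaussians', 3, ...                  % number of Gaussian components
        'NumTrainingFrames', 50, ...            % frames used to initialise the model
        'MinimumBackgroundRatio', 0.7);         % threshold for the background decision
    v = VideoReader('traffic.avi');             % placeholder file name
    while hasFrame(v)
        frame = readFrame(v);
        mask  = step(detector, frame);          % foreground (moving-object) mask
        imshow(mask); drawnow;
    end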
Task 4: Treasure hunting:
● Application of basic image processing techniques for finding “a treasure” in an
image (Lab 4.zip file). There are three types of images – with an easy (10 marks), a
medium (10 marks) and a high level of difficulty (where there are two treasures: the sun
and the clove). In the third case you need to find both treasures.
For the report, you need to present results with:
● The three different images showing the path to “the treasure”
● An explanation of your solution, your algorithm and the related MATLAB code (a
hypothetical starting point is sketched below)
[35 marks]
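As a purely hypothetical starting point (the actual rule for recognising the treasure depends on the images provided in Lab 4.zip, and the file name below is a placeholder), colour-based segmentation along the lines of the sketch below can locate candidate objects whose centroids can then be linked into a path.

    I = imread('treasure_easy.jpg');            % placeholder file name
    hsv = rgb2hsv(I);
    mask = hsv(:,:,2) > 0.5 & hsv(:,:,3) > 0.5; % keep saturated, bright regions
    mask = bwareaopen(mask, 50);                % discard small noisy blobs
    stats = regionprops(mask, 'Centroid');      % candidate object positions
    figure; imshow(I); hold on;
    for k = 1:numel(stats)
        plot(stats(k).Centroid(1), stats(k).Centroid(2), 'r*');
    end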
Task 5. Study and compare capsule Convolutional Neural Networks (CNNs), Siamese
CNNs and the YOLO CNN with respect to their architecture, principle of
operation, advantages, disadvantages and applications to tasks such as
detection, classification and segmentation.
[21 marks]
A Well-written Report Contains:
● A title page, including your ID number, course name, etc., followed by a content page.
● The main part: description of the tasks and how they are performed, including results
from all subtasks. For instance: “This report presents results on reading and writing
images in MATLAB. Next, the study of different edge detection algorithms is presented
and their sensitivity to different parameters…” You are requested to present in
Appendices the MATLAB code that you have written to obtain these results. A very
important part of your report is the analysis of the results. For instance, what does the
image histogram tell you? How can you characterise the results? Are they accurate? Is
there a lot of noise?
● Conclusions: describe briefly what has been done, with a summary of the main
results.
● Appendix: present and briefly describe the code, for Tasks 2-4 only. Add
comments to your code to make it approachable and easy to understand.
● Cite all references and materials used. Write in your own style and words to minimise
or avoid similarities.
Report Submission
The deadline for your report is indicated on MOLE.
The advisable maximum number of words is 4000.
Please submit: 1) your coursework report in PDF format, and 2) the code (for all
assignment tasks) in a zipped file via MOLE.
Lab Session 1 - Part I: Introduction to Image Processing
In this lab you will learn how to perform basic operations on images of different types, how
to work with image histograms, and how to visualise the results.
Background Knowledge
A digital image is composed of pixels, which can be thought of as small dots on the screen.
By default, numeric calculations in MATLAB are performed using double (64-bit)
floating-point numbers, so double is also a frequent data class encountered in image
processing. Some of the most common formats used in image processing are presented in
Tables 1 and 2 below.
All MATLAB functions and capabilities work with double arrays. To reduce memory
requirements, MATLAB also supports storing image data in arrays of class uint8 and uint16,
in which the data are stored as 8-bit or 16-bit unsigned integers. These arrays require
one-eighth or one-quarter as much memory as double arrays.
Table 1. Data classes and their ranges
Most mathematical operations are not supported for the uint8 and uint16 types. It is
therefore necessary to convert to double for processing and back to uint8/uint16 for storage,
display and printing.
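For example, a typical round trip between storage classes looks like the sketch below (the file name myPhoto.jpg is a placeholder):

    I8  = imread('myPhoto.jpg');                % stored as uint8, values 0..255
    D   = im2double(I8);                        % class double, rescaled to [0, 1]
    D2  = D .^ 2;                               % arithmetic is safe on double data
    I8b = im2uint8(D2);                         % back to uint8 for storage and display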
Table 2. Numeric formats used in image processing
Image Types
I. Intensity image (Grey scale image)
This form represents an image as a matrix where every element has a value corresponding
to how bright or dark the pixel at the corresponding position should be. There are
two ways to represent the brightness of a pixel:
1. The double class (or data type). This assigns a floating-point number ("a number with
decimals") in the range of approximately -10^308 to +10^308 to each pixel. Values of the
scaled double class are in the range [0, 1]. The value 0 corresponds to black and the
value 1 corresponds to white.
2. The other class, uint8, assigns an integer between 0 and 255 to represent the intensity
of each pixel. The value 0 corresponds to black and 255 to white. The class uint8
requires only roughly 1/8 of the storage of the class double. However, many
mathematical functions can only be applied to the double class.
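For instance, MATLAB's built-in demo image cameraman.tif is read as a uint8 intensity image and can be rescaled to the double range [0, 1]:

    G8 = imread('cameraman.tif');               % uint8 intensity image, values 0..255
    Gd = im2double(G8);                         % double intensity image, values in [0, 1]
    imshow(Gd);                                 % 0 is displayed as black, 1 as white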
II. Binary image
This image format also stores an image as a matrix, but each pixel can only be coloured
black or white (and nothing in between): 0 is for black and 1 is for white.
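A binary image can be produced from an intensity image by thresholding, for example (assuming a MATLAB release that provides imbinarize):

    G  = imread('cameraman.tif');               % grey-scale demo image
    BW = imbinarize(G);                         % logical matrix: 0 = black, 1 = white
    imshow(BW);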
III. Indexed image
This is a practical way of representing colour images. An indexed image stores an image as
two arrays. The first matrix has the same size as the image, with one number for each pixel.
The second matrix is called the colour map, and its size may be different from that of the
image. The numbers in the first matrix are indices into the colour map matrix.
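For example, MATLAB's demo image trees.tif is stored in indexed form:

    [X, map] = imread('trees.tif');             % X holds indices, map holds the colours
    imshow(X, map);                             % display using the colour map
    RGB = ind2rgb(X, map);                      % convert to an RGB image if required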
IV. RGB image
This format represents an image with three matrices, each of the same size as the image.
Each matrix corresponds to one of the colours red, green or blue and specifies how much of
that colour a certain pixel should use. Colours are always represented with non-negative
numbers.
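For example, the three colour planes of MATLAB's demo image peppers.png can be separated and viewed as grey-scale images:

    I = imread('peppers.png');                  % RGB demo image, class uint8
    R = I(:,:,1);                               % red plane
    G = I(:,:,2);                               % green plane
    B = I(:,:,3);                               % blue plane
    figure; imshow([R, G, B]);                  % planes shown side by side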