The purpose of the lab sessions is to give you both theoretical and practical skills in machine vision, especially in image enhancement, image understanding and video processing. Machine vision is essential for a number of areas: autonomous systems, including robotics, Unmanned Aerial Vehicles (UAVs), intelligent transportation systems, medical diagnostics, surveillance, augmented reality and virtual reality systems.
The first labs focus on performing operations on images such as reading, writing, calculating image histograms, flipping images and extracting the important colour and edge image features. You will become familiar with how to use these features for the purposes of object segmentation (separation of static and moving objects) and for the subsequent high-level tasks of stereo vision, object detection, classification, tracking and behaviour analysis. These are inherent steps of semi-supervised and unsupervised systems, where the involvement of human operators is reduced to a minimum or excluded entirely.
Required for Each Subtask
Task 1: Introduction to Machine Vision
For the report from Task 1, you need to present results with:
From Lab Session 1 – Part I
● The Red, Green, Blue (RGB) image histogram of your own picture, together with an analysis of the histogram. Several pictures are provided, if you wish to use one of them. Alternatively, you could work with a picture of your choice. The original picture needs to be shown as well. Please discuss the results. For instance, what are the differences between the histograms? What do we learn from the visualised red, green and blue components of the image histogram?
Files: Lab 1 - Part I - Introduction to Images and Videos.zip and Images.zip. You can work with one of the provided images from Images.zip or with your own image.
From Lab Session 1 – Part II
● Results with different edge detection algorithms, e.g. Sobel and Prewitt, with comments on their accuracy under different parameters (especially the threshold and different types of noise). Include in your report the visualisation and your conclusions about static object segmentation using edge detection (steps 9-11 with the Sobel, Canny and Prewitt operators).
[8 marks equally distributed between part I and part II]
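As a sketch of how such a comparison might be set up in MATLAB (Image Processing Toolbox), the following uses the demo image 'peppers.png' as a stand-in for one of the provided pictures; the threshold value 0.1 and the noise density are illustrative choices only:

```matlab
% Compare Sobel, Prewitt and Canny edge detectors, the effect of an
% explicit threshold, and sensitivity to salt & pepper noise.
I = rgb2gray(imread('peppers.png'));   % placeholder demo image

BW_sobel   = edge(I, 'sobel');         % automatic threshold
BW_prewitt = edge(I, 'prewitt');
BW_canny   = edge(I, 'canny');

% Explicit threshold: larger values keep only the strongest edges
BW_sobel_t = edge(I, 'sobel', 0.1);

% Gradient operators degrade on noisy images
In = imnoise(I, 'salt & pepper', 0.02);
BW_noisy = edge(In, 'sobel');

figure;
subplot(2,3,1), imshow(I),          title('Original');
subplot(2,3,2), imshow(BW_sobel),   title('Sobel');
subplot(2,3,3), imshow(BW_prewitt), title('Prewitt');
subplot(2,3,4), imshow(BW_canny),   title('Canny');
subplot(2,3,5), imshow(BW_sobel_t), title('Sobel, threshold 0.1');
subplot(2,3,6), imshow(BW_noisy),   title('Sobel on noisy image');
```

Varying the third argument of `edge` and the noise density is one direct way to generate the comparison asked for above.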
Task 2: Optical Flow Estimation Algorithm
For the report, you need to:
● Find corner points and apply the optical flow estimation algorithm (use file Lab 2.zip – image Gingerbread Man). Present and visualise the results for the ‘Gingerbread Man’ task.
[4 marks]
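A minimal sketch of corner detection followed by point tracking, assuming the Computer Vision Toolbox is available; the file name 'gingerbread.avi' is a placeholder for the sequence in Lab 2.zip:

```matlab
% Detect Shi-Tomasi corners in the first frame, then track them
% through the video with a Kanade-Lucas-Tomasi (KLT) point tracker.
v = VideoReader('gingerbread.avi');    % placeholder file name
frame1 = rgb2gray(readFrame(v));

corners = detectMinEigenFeatures(frame1);   % minimum-eigenvalue corners

tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, corners.Location, frame1);

while hasFrame(v)
    frame = rgb2gray(readFrame(v));
    [points, validity] = tracker(frame);    % updated point positions
    out = insertMarker(frame, points(validity, :), '+');
    imshow(out); drawnow;
end
```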
● Track a single point with the optical flow approach (file: Lab 2.zip – the red square image). Visualise the estimated trajectory and the ground truth track on the last frame of the ‘Red Square’ sequence.
● Compute and visualise the root-mean-square error of the trajectory estimated over the video frames by the optical flow algorithm. Compare the estimates with the exact coordinates given in the file called groundtruth. You need to include the results for one corner only. Give the equation for the root-mean-square error. Analyse the results and draw conclusions about the accuracy of the method based on the root-mean-square error.
[8 marks]
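The error computation can be sketched as follows, assuming `est` and `gt` are N-by-2 arrays of (x, y) coordinates for the estimated and ground-truth trajectory of one corner (the ground truth being loaded from the provided groundtruth file):

```matlab
% Root-mean-square error between the estimated trajectory and the
% ground truth over N frames:
%   RMSE = sqrt( (1/N) * sum_k ||est_k - gt_k||^2 )
err_per_frame = sqrt(sum((est - gt).^2, 2));        % error in each frame
rmse_total    = sqrt(mean(sum((est - gt).^2, 2)));  % single summary value

figure;
plot(err_per_frame);
xlabel('Frame'); ylabel('Error (pixels)');
title('Point-wise trajectory error over the video');
```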
Task 3: Automatic Detection of Moving Objects in a Sequence of Video Frames
You are designing algorithms for automatic vehicular traffic surveillance. As part of this task, you need to apply two types of approaches: the basic frame differencing approach and the Gaussian mixture approach to detect moving objects.
Part I: with the frame differencing approach
● Apply the frame differencing approach (Lab 3.zip file)
For the report, you need to present:
● Image results of the accomplished tasks
● An analysis of the algorithm's performance when you vary the detection threshold.
[5 marks]
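A hedged sketch of the frame differencing idea: a pixel is declared "moving" when the absolute difference between consecutive grey-level frames exceeds a threshold. The demo video 'traffic.avi' (shipped with MATLAB) stands in for the Lab 3.zip sequence, and T = 25 is an arbitrary starting value to tune:

```matlab
% Basic frame differencing for moving-object detection.
v = VideoReader('traffic.avi');      % placeholder for Lab 3 video
prev = rgb2gray(readFrame(v));
T = 25;                              % detection threshold (tune this)

while hasFrame(v)
    curr = rgb2gray(readFrame(v));
    diffImg = imabsdiff(curr, prev); % |curr - prev|, stays in uint8
    mask = diffImg > T;              % binary motion mask
    imshow(mask); drawnow;
    prev = curr;
end
```

Raising T suppresses noise but also thins out slow-moving objects, which is exactly the trade-off the analysis above asks you to discuss.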
Part II: with the Gaussian mixture approach
For the report, you need to present:
● Results for the algorithm performance when you vary parameters such as the number of Gaussian components, the initialisation parameters and the threshold for decision making.
● Detection results of the moving objects, show snapshots of images.
● An analysis of all results – how does the change of the threshold and number of Gaussian components affect the detection of the moving objects?
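As a sketch, the Gaussian mixture approach is available in the Computer Vision Toolbox as `vision.ForegroundDetector`; the parameter values below are illustrative starting points, not recommended settings, and the file name is a placeholder:

```matlab
% Gaussian mixture background subtraction. The three parameters set
% here are the ones the report asks you to vary.
detector = vision.ForegroundDetector( ...
    'NumGaussians', 3, ...           % number of mixture components
    'NumTrainingFrames', 50, ...     % frames used to learn the background
    'MinimumBackgroundRatio', 0.7);  % decision threshold

v = VideoReader('traffic.avi');      % placeholder file name
while hasFrame(v)
    frame = readFrame(v);
    mask = detector(frame);          % binary foreground mask
    imshow(mask); drawnow;
end
```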
[5 marks]
Task 4: Robot Treasure Hunting
A robot is given the task of searching for and finding “treasures” in imagery data. There are three tasks: easy, medium and difficult. The starting point of the robot search is where the red arrow is. For the medium case the blue fish is the only treasure; for the difficult case the clove and sun are
“treasures” that need to be found. Ideally, one algorithm needs to be able to find the “treasures” from all images, although a solution with separate algorithms is acceptable. For Task 4, in the report, you need to present results with:
● The three different images (easy, medium and difficult), showing the path of finding “the treasure”.
● Include the intermediate steps of your results in your report, e.g. of the binarisation of the images and the value of the threshold that you found or any other algorithm that you propose for the solution of the tasks.
● Explain your solution, present your algorithms and the related MATLAB code.
● Include a brief description of the main idea of your functions in your report and the actual code of the functions in an Appendix of your report.
In the guidance for the labs, one possible solution is discussed, but others exist. Creativity is encouraged in this task, and alternative solutions are welcome.
Here 8 marks are given for the easy task, 10 for the medium task and 12 for the most difficult task.
[30 marks]
Task 5. Image Classification with a Convolutional Neural Network
1. Provide your classification results with the CNN, demonstrating its accuracy, and analyse them in your report.
[2 marks]
2. Calculate the Precision, Recall, and the F1 score functions characterising further the CNN performance.
[6 marks]
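One way these metrics might be computed, assuming `yTrue` and `yPred` are label vectors produced by your CNN (e.g. from `classify`) and that the Statistics and Machine Learning Toolbox is available for `confusionmat`:

```matlab
% Per-class precision, recall and F1 from a confusion matrix,
% then macro-averaged across classes.
C = confusionmat(yTrue, yPred);      % rows: true class, cols: predicted
tp = diag(C);                        % true positives per class
precision = tp ./ sum(C, 1)';        % TP / (TP + FP)
recall    = tp ./ sum(C, 2);         % TP / (TP + FN)
f1        = 2 * (precision .* recall) ./ (precision + recall);

fprintf('Macro precision: %.3f\n', mean(precision));
fprintf('Macro recall:    %.3f\n', mean(recall));
fprintf('Macro F1:        %.3f\n', mean(f1));
```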
3. Improve the CNN classification results. Please explain how you have achieved the improvements.
[12 marks]
4. Discuss ethics aspects in Computer Vision tasks such as image classification, detection and segmentation. Consider ethics in broad aspects: what are the positives when ethics is considered? What challenges does ethics pose, and how could they be reduced and mitigated? In your answer, you need to include aspects of Equality, Diversity and Inclusion (EDI).
[10 marks]
Finally, the quality of writing and presentation style are assessed. These include the clarity, conciseness, structure, logical flow, figures, tables, and the use of references.
[10 marks]
Guidance on the Course Work Submission
You need to submit your report and the code that you have written to accomplish the tasks.
Report and Code Submission
There are two submission links on Blackboard: 1) for your course work report in a pdf format and 2) for the requested code in a zipped file.
A Well-written Report Contains:
● A title page, including your ID number, course name, etc., followed by a content page.
● The main part: a description of the tasks and how they are performed, including results from all subtasks. For instance: “This report presents results on reading and writing images in MATLAB. Next, the study of different edge detection algorithms is presented and their sensitivity to different parameters...” You are requested to present in Appendices the MATLAB code that you have written to obtain these results. A very important part of your report is the analysis of the results. For instance, what does the image histogram tell you? How can you characterise the results? Are they accurate? Is there a lot of noise?
● Conclusions: describe briefly what has been done, with a summary of the main results.
● Appendix: Present and describe briefly in an Appendix the code only for Tasks 2-5. Add comments to your code to make it understandable. Provide the full code as one compressed file, in the separate submission link given for it.
● Cite all references and materials used. Adding references demonstrates additional independent study. Write in your own style and words to minimise and avoid similarities. Every student needs to write their own independent report.
● Please name the files with your report and code for the submission on Blackboard by adding your ID card registration number, e.g. CW_Report_1101133888 and CW_Code_1101133888.
The advisable maximum number of words is 4000.
Submission Deadline: Week 10 of the spring semester, Sunday midnight
Guidance to Accomplish the Tasks
Lab Session 1 - Part I: Introduction to Image Processing
In this lab you will learn how to perform basic operations on images of different types, e.g. how to read them, convert them from one format to another, calculate image histograms and analyse them.
Background Knowledge
A digital image is composed of pixels, which can be thought of as small dots on the screen. By default, all numeric calculations in MATLAB are performed using double (64-bit) floating-point numbers, so this is also a frequent data class encountered in image processing. Some of the most common formats used in image processing are presented in Tables 1 and 2 given below.
All MATLAB functions work with double arrays. To reduce memory requirements, MATLAB supports storing image data in arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit unsigned integers. Such arrays require, respectively, one-eighth or one-quarter as much memory as data in double arrays.
Table 1. Data classes and their ranges
Most mathematical operations are not supported for the types uint8 and uint16. It is therefore necessary to convert to double for operations and back to uint8/uint16 for storage, display and printing.
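The conversion round-trip described above can be sketched as follows; `im2double` rescales [0, 255] to [0, 1] and `im2uint8` does the inverse, and the demo image and the 20% brightening are illustrative:

```matlab
% uint8 -> double for processing, back to uint8 for storage.
I8 = imread('peppers.png');   % uint8, values 0..255 (demo image)
Id = im2double(I8);           % double, values 0..1 - safe for maths
Id_bright = min(Id * 1.2, 1); % e.g. brighten by 20%, clip at 1
I8_out = im2uint8(Id_bright); % back to uint8 for storage/display
imwrite(I8_out, 'brightened.png');
```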
Table 2. Numeric formats used in image processing
Image Types
I. Intensity Image (Grey Scale Image)
This form represents an image as a matrix where every element has a value corresponding to how bright or dark the pixel at the corresponding position should be coloured. There are two ways to represent the brightness of a pixel:
1. The double class (or data type) format. This assigns a floating-point number ("a number with decimals") in the range of approximately -10^308 to +10^308 for each pixel. Values of the scaled double class are in the range [0, 1]. The value 0 corresponds to black and the value 1 corresponds to white.
2. The other class, uint8, assigns an integer between 0 and 255 to represent the intensity of a pixel. The value 0 corresponds to black and 255 to white. The class uint8 only requires roughly 1/8 of the storage compared to the class double. However, many mathematical functions can only be applied to the double class.
II. Binary Image
The binary image format also stores an image as a matrix but can colour a pixel only black or white (and nothing in between): 0 is for black and 1 is for white.
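A binary image is typically produced by thresholding a grey-scale one; as a sketch (using a demo image as a placeholder), `graythresh` picks a global threshold with Otsu's method and `imbinarize` applies it:

```matlab
% Grey-scale image -> binary image via a global threshold.
I  = rgb2gray(imread('peppers.png'));  % placeholder demo image
T  = graythresh(I);                    % Otsu threshold in [0, 1]
BW = imbinarize(I, T);                 % logical: 0 = black, 1 = white
figure, imshow(BW), title(sprintf('Binary image, T = %.2f', T));
```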
III. Indexed Image
This is a practical way of representing colour images. An indexed image stores an image as two arrays. The first array has the same size as the image, with one number for each pixel. The second array (matrix) is called the colour map, and its size may be different from the image size. Each number in the first matrix is an index into a row of the colour map matrix, which stores the actual colour values.
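A short sketch of reading and using an indexed image; 'trees.tif' is a MATLAB demo image stored in indexed form:

```matlab
% X holds colour-map indices; map holds the colours (one RGB row
% per palette entry, with values in [0, 1]).
[X, map] = imread('trees.tif');
RGB = ind2rgb(X, map);            % expand indices into a true-colour image
figure, imshow(X, map), title('Indexed image displayed via its colour map');
```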
IV. RGB Image
This format represents an image with three matrices whose sizes match the image dimensions. Each matrix corresponds to one of the colours red, green or blue and specifies how much of that colour each pixel should use. Colours are always represented with non-negative numbers.
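Splitting an RGB image into its three channel matrices and plotting per-channel histograms (as required for Task 1) can be sketched as follows, with the demo image 'peppers.png' standing in for your own picture:

```matlab
% Separate the R, G, B planes and show each channel's histogram.
RGB = imread('peppers.png');       % placeholder demo image
R = RGB(:, :, 1);                  % red channel
G = RGB(:, :, 2);                  % green channel
B = RGB(:, :, 3);                  % blue channel

figure;
subplot(2,2,1), imshow(RGB), title('Original');
subplot(2,2,2), imhist(R),   title('Red histogram');
subplot(2,2,3), imhist(G),   title('Green histogram');
subplot(2,2,4), imhist(B),   title('Blue histogram');
```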