Multimedia Software Systems
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CS4551 Multimedia Software Systems
Homework2 (15%) – Vector Quantization and DCT Coding
• Due: Electronic submission via CSNS by Sunday, 04/19/2020.
• What to turn in:
o Submit source code with necessary files for “compile and run”.
o Do NOT submit data files.
o You MUST provide a readme.txt file containing all information to help with the grading process.
• If your program produces any compile errors, you will receive 0 automatically no matter how close your
program is to the solution.
• Programming requirements:
o You are not allowed to use any Java built-in image class methods, library, or tools to complete this
homework.
o Do not create one mega-size main class.
o Do not change any given methods of MImage class nor create a new class that duplicates MImage
class. Treat MImage as a part of imported library.
o Test your program with all test data.
o If you do not meet any of the requirements above, you will receive a significant reduction.
0. What your program should do
Name your main application CS4551_[YourLastName].java (e.g. CS4551_Doe.java) and expand the given
template program to perform the following tasks.
Receive the input file as command line arguments.
On Command Prompt
java CS4551_Doe Ducky.ppm
Read a 24bit input PPM image and display the following main menu to the user and receive the user’s input.
Main Menu-----------------------------------
1. Vector Quantization
2. DCT-based Coding
3. Quit
Please enter the task number [1-3]:
After performing a selected task, go back to display the menu again.
1. Task 1 – Vector Quantization (50 pts)
Compress the 24 bits per pixel input image to 2 bits per pixel using Vector Quantization (VQ). Implement VQ
encoding/decoding using the requirements below.
CS4551 Spring 2020 HW #2
2
Encoding:
• Input vectors are formed by 2×2 blocks of RGB pixels. Each input vector 𝒙 consists of RGB values of
FOUR pixels, P1, P2, P3, and P4, and therefore 𝒙 is 12 dimensional.
P1 P2
P3 P4
Diagram of a 2×2 pixel block of the input image
𝒙 = {𝑃1𝑅,𝑃1𝐺,𝑃1𝐵,𝑃2𝑅, 𝑃2𝐺,𝑃2𝐵,𝑃3𝑅,𝑃3𝐺,𝑃3𝐵, 𝑃4𝑅,𝑃4𝐺,𝑃4𝐵
}
• Codebook and codebook vectors: The 2-bits per pixel quantization is equivalent to using 8 bits per 4
pixels. Therefore, the VQ should the vector space into 256 (=28
) cells and the codebooks should have
256 entries that are centroids of the 256 cells. After the vector quantization, each vector 𝒙 belongs to one
cell and each cell number is represented by 1 byte. In order words, after the quantization, each 2×2 block
(4×3=12 bytes) is encoded by a 1-byte codebook index. So, the compression ratio is 12.
• Codebook generation: Use K-means clustering algorithm to generate codebook vectors (centroids of
cells).
K-means Clustering Algorithm
Inputs: K, number of clusters and the data set (input vectors 𝒙)
K is 256 in our case.
Assume that 𝒄[𝑖] store the K centroids. 𝑖 = 0, 1, ⋯ , 255.
Each 𝒄[𝑖] is a 12-dimensional vector.
1. Assign randomly generated initial values for the 𝐾 centroids.
2. For each 𝒙,
For each 𝑖 = 0 to 255
If 𝒄[𝑖] is the closest cell (cluster) to 𝒙 based on the Euclidean distance between 𝒙 and 𝒄[𝑖],
assign 𝒙 to 𝒄[𝑖] cluster
3. Update the 𝐾 centroids.
4. Iterate 2 & 3 until the algorithm meets a stopping condition (i.e. no data points change clusters, the
sum of the distance is minimized, or the maximum number of iterations is reached.)
• Display the codebook. This is equivalent to displaying 𝒄[𝑖], 𝑖 = 0, 1, ⋯ ,255.
• Extra credit (10 pts) – Save the quantized image (i.e. indices of all 2×2 blocks) into a PPM file. Given
𝑊 × 𝐻 input image, the quantized image resolution is 𝑊/2 × 𝐻/2. The quantized image is a grayscale
image because each pixel is an 8-bit index.
Decoding:
• Given the quantized image and the codebook, for each pixel of the quantized image, recover RGB pixel
values of 4 pixels.