Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CSC 230 Project 5
Lossy Image Compression
Images have been a common subject for compression. There are a couple of reasons for this. Without compression, image data
can require a lot of storage space, and images are often tolerant of a small amount of loss. It's generally OK if the compressed
image doesn't capture the original image exactly. If it can make the compressed representation smaller, most applications are
able to accept small changes in the intensity values stored in some pixels of the image.
This is what you'll be working on in this assignment. You'll be writing a pair of programs that compress and decompress grayscale
images. We'll be using a technique that's kind of like how the JPEG image format works. We'll be breaking the image into a lot of
little square blocks, applying a transformation called the Discrete Cosine Transform (DCT) to each block, then encoding the result
using different numbers of bits for various values. We'll call our compressed format j10, since it's like a simplified JPEG format,
but it uses 10 x 10 image blocks rather than the 8 x 8 blocks normally used by JPEG.
Our image compression program will be called encode. The following shows how we can run it. Here, the program will read an
input image named "image-06.pgm" and write out a compressed image named "output.j10".
$ ./encode image-06.pgm output.j10
We won't be able to directly display images in the j10 format. It's a format we just made up for this assignment, so no standard
image viewing program will support it. To view a compressed image, we'll need to first decompress it to PGM. Then, we can view
the resulting PGM image. Our decompression program will be called decode. The following shows how we can run it, taking the
compressed image we just created and decompressing it to produce an image in the standard, pgm format.
$ ./decode output.j10 output.pgm
The figure below shows the results of this compression and decompression. The image on the left is the original, and the one on
the right is the result of compressing and decompressing this image. There are some small differences. That's what you would
normally expect from a lossy compression technique. Overall, the two images look very similar.
Original image (left) and compressed/decompressed image (right)
If we're willing to tolerate some small differences introduced by compression, a lossy technique can save a lot of storage space.
The shell commands below show that we've reduced the storage cost for this image by more than 2/3. The original image
required almost 208 kilobytes, but the compressed representation used only about 63.
$ ls -l image-06.pgm output.j10
-rw-r--r-- 1 dbsturgi staff 212816 Mar 27 10:45 image-06.pgm
-rw-r--r-- 1 dbsturgi staff 64959 Mar 28 10:48 output.j10
As with recent assignments, you'll be developing this project using git for revision control. You should be able to just unpack the
starter into the p5 directory of your cloned repo to get started. See the Getting Started section for instructions.
This Project supports a number of our course objectives. See the Learning Outcomes section for a list.
Rules for Project 5
You get to complete this project individually. If you're unsure what's permitted, you can have a look at the academic integrity
guidelines in the course syllabus.
In the design section, you'll see some instructions for how your implementation is expected to work. Be sure you follow these
rules. It's not enough to just turn in a working program; your program has to follow the design constraints we've asked you to
follow. For this project, we're putting some constraints on the functions you'll need to define, and on one data structure you'll
need to use. Still, you will have lots of opportunities to design parts of the project for yourself.
Requirements
This section says what your programs are supposed to be able to do, and what they should do when something goes wrong.
Encode / Decode Execution
The encode program expects two command-line arguments, the name of an uncompressed input image in PGM format and the
name of a file for storing the compressed image in our j10 format. For example, you should be able to run it as follows:
./encode input.pgm output.j10
If you give the program the wrong number of command-line arguments, it should print the following line to standard error and
then terminate with an exit status of 1.
usage: encode
If one of the given files can't be opened, the program should print an error message to standard error, then terminate with an exit
status of 1. The error message should be generated using the perror() standard library function, with the filename passed to
perror() as the argument. For example, on a common platform system, you'll get an error message like the following if you try to
open a file that doesn't exist. Since the specific message is determined by perror(), you may get different error messages on
other systems of for different kinds of errors.
input.pgm: No such file or directory
If the input image isn't valid (see the description of the raw PGM format below), the encode program should print the following
line to standard error and then terminate with an exit status of 1. An input image could be invalid if the header didn't have the
expected fields, if the maximum intensity value wasn't 255, or if the file wasn't large enough to contain a byte for each pixel. Our
compression technique also requires image width and height to both be a multiple of 10 (there's an extra credit option described
below that lifts this restriction). If you don't do the extra credit, you should also report this error for images that don't have a
width and height that meet this constraint.
Invalid image file
As you'll see below, our compression technique will only support files with a width and height that are both less than 4096 pixels.
The encode program should also print the above message and terminate with a status of 1 if the input image is too wide or too
tall.
The decode program is similar to encode; it expects two command-line arguments, the name of a compressed input file in our j10
format and the name of an output file for the uncompressed image in PGM format. For example, you should be able to run it as
follows:
./decode input.j10 output.pgm
If you give the program the wrong number of command-line arguments, it should print the following line to standard error and
then terminate with an exit status of 1.
usage: decode
If one of the given files can't be opened, decode will also use perror() to print an error message to standard error, then terminate
with an exit status of 1. So, it might print an error message like the following if it couldn't create the requested output file:
output.pgm: Permission denied
If the decode program is given an input file that doesn't contain enough bits to represent an encoded image (if the input file
reaches the end-of-file before it should), it should print the following line to standard error and then terminate with an exit status
of 1. Unless you do the extra credit, you should also print this line and terminate if the input describes an image where the width
and height aren't a multiple of 10.
Invalid encoded file
PGM Image Format
For uncompressed images, we'll be using the raw form of the PGM image format. This is similar to the image format we used on
project 2, but it represents pixel data in binary, rather than in text. This makes it much more space-efficient (but, our compressed
format will generally be able to do better). We'll place some additional constraints on our input and output images, to make them
easier to parse and to write out.
The raw PGM image format starts with a 3-line header like the following. This part of the file is in text. It starts with two
characters, P5 on the first line. This two-character code indicates that this is an image file in raw PGM format. The next line gives
the size of the image, as width then height. The example below is showing an image 60 pixels wide and 45 pixels tall. Notice that
the width, height order is backward from how we normally describe two-dimensional data. Everywhere else in our program, we'll
generally order these as height (rows) then width (columns). In this one place, though, we have to respect the order defined in
the PGM image format, so, for PGM images, this line of the header will give width then height.
P5
60 45
255
The third line of the header gives the maximum intensity for pixels. All the images we work with will be at least 1 pixel wide and 1
pixel tall, and they'll all have 255 as their maximum intensity. We'll also assume there's just one space between the width and
height.
The 255 at the end of the header is followed by one whitespace character (a newline for all of our images). Then, the remaining
width x height bytes in the file give the intensity values for the image pixels, one byte per pixel. These are given in row-major
order, so the first byte of pixel data gives the intensity of the pixel in the upper-left corner, then the intensity of the next pixel on
the top row, and so on, up to the last pixel on the top row. Then, immediately afterward are all the pixel intensities for the second
row, then the next row all the way to the last row of the image. Note that all the pixel data is in binary, one byte per pixel. There's
no separation between the intensity values for pixels (e.g., no spaces or newlines separating the data values).
The following shows the contents of a small pgm file using the hexdump program. This lets us see the start of the file as text
(over on the right) and the contents of the pixel data as binary. You can see this is a 10 x 10 image. The hexadecimal 0a near the
end of the first line is the newline after the 255. After this, the ad is the intensity of the pixel in the upper-left corner of the
image. This value, along with the 99 bytes after it give the intensity values of all the image's pixels, in row-major order. The file
ends right after the intensity of the last pixel (31 in hexadecimal).
$ hexdump -C image-02.pgm
00000000 50 35 0a 31 30 20 31 30 0a 32 35 35 0a ad a7 9c |P5.10 10.255....|
00000010 8c 79 65 53 43 37 31 ad a7 9c 8c 79 65 53 43 37 |.yeSC71....yeSC7|
00000020 31 ad a7 9c 8c 79 65 53 43 37 31 ad a7 9c 8c 79 |1....yeSC71....y|
00000030 65 53 43 37 31 ad a7 9c 8c 79 65 53 43 37 31 ad |eSC71....yeSC71.|
00000040 a7 9c 8c 79 65 53 43 37 31 ad a7 9c 8c 79 65 53 |...yeSC71....yeS|
00000050 43 37 31 ad a7 9c 8c 79 65 53 43 37 31 ad a7 9c |C71....yeSC71...|
00000060 8c 79 65 53 43 37 31 ad a7 9c 8c 79 65 53 43 37 |.yeSC71....yeSC7|
00000070 31 |1|
The following figure shows what the image above looks like. Here, the image is enlarged considerably to show individual pixels.
Image defined by the PGM file above.
Image Block Transformation
For the pixel data, our compression technique will work by applying a version of the two-dimensional Discrete Cosine Transform
(DCT) to blocks of the image, then writing out an encoding of each of the resulting blocks. The DCT is a common transformation
used in lossy compression tasks. In fact, if you're familiar with the show, "Silicon Valley", the DCT gets mentioned at least once in
the show. The DCT will let us look at the frequency components in each image block, making it possible to store information about
how the shade is changing within a block, rather than just storing intensity values for individual pixels.
The following figure shows what the DCT does. The left-hand side shows a 10x10 image, with intensity values for each pixel. The
right-hand side shows a DCT of this image, with values rounded to integers after the DCT is calculated. After applying the DCT, we
are left with a grid of values, but many of these values will be zero or close to zero. That will make the transformed image easier
to compress, we can represent it in a way that makes it easy to skip over the zero values.
Example DCT Computation for a small image
We will be applying the DCT separately to 10x10-pixel blocks of the input image. Here, we'll describe how to apply the DCT and
its inverse to an individual block. Compressing an image will require applying this transformation to each 10x10 block and
encoding the result in an output file. Decompressing will require decoding each 10x10 block and applying the inverse DCT to that
block.
For an image block, we'll use X[ i ][ j ] to stand for represent the intensities of the individual pixels. So, X[ 0 ][ 0 ] will stand for
the intensity of a pixel in the upper left corner of the block, X[ 0 ][ 9 ] will be the intensity of a pixel in the upper right, X[ 9 ][ 9 ]
will be a pixel in the lower right corner of the block and X[ 9 ][ 0 ] will be the lower left corner. Even though pixel intensities are
integers in the range 0 .. 255, the transformation will expect these integers to be converted to real values (doubles, with values
between 0 and 255).
From the X values representing pixel intensities in a block, we will compute a 10x10 array of real numbers representing the
transformed block. We will call this array, Y and we will use Y[ m][ n ] to stand for a single element of this array. Each element of
Y will be computed as a weighted sum of all the values of the 10x10 X array. So, applying the DCT will be a little bit expensive.
Each of the 100 elements of Y will require computing a different weighted sum of all 100 values in the X array. There are more
efficient ways to compute the DCT, but we won't worry about that for this assignment.
The following formula shows how to compute the value of any Y[ m ][ n ]. In this definition, B stands for the block size of 10. The
constant, S is a scaling factor that we'll set to S = 0.044997. This value for S should make sure the DCT yields values that can be
encoded with a particular number of bits, and it should help to make sure small differences in how the DCT is computed don't
result in different rounding behavior. The values Sm and Sn are also scaling factors.
These will usually have a value of 1 except Sm will have a value of 1 / 2 when m = 0 and Sn will have a value of 1 / 2 when n = 0.