In the lectures, you have studied the differences between sequential and random file access. In this assignment, you will read increasing amounts of data using sequential and random access on a large file and plot the results obtained.
a. This will be used as our test file.
b. Upload this file to your Google Drive.
c. Open a new Python 3 notebook in Google Colab (Warning: Python 2 notebooks will not be accepted).
d. Import Pandas, NumPy, and Matplotlib (see footnote 1); you may use them in this homework.
e. Many of you indicated that you have used Jupyter Notebook before, so we want to provide a starter notebook that you can open in Google Colab to get a head start. The notebook outlines the expected layout of your final submission. This step is not mandatory but is strongly recommended.
f. For the Lab 1 submission due on Thursday night, simply plot the y = sin x function in Google Colab and submit the .ipynb file including the plot.
1 Here are good guides to get started with Pandas, NumPy, and Matplotlib:
Pandas Quick Tutorial
NumPy Quick Tutorial
Matplotlib Quick Tutorial
a. Open the test file in unbuffered mode (see footnote 2).
b. Sequentially read [1, 4, 16, 64, 256, 1024, 4*1024, 16*1024, 64*1024, 256*1024, 1024*1024] blocks of data (see footnote 3). Use a fixed block size of 4 KB. Measure the latency of each iteration as wall-clock time.
c. Repeat 2b with random reads instead of sequential.
d. Plot the latencies measured in 2b and 2c against the number of blocks read. Both sequential and random results should appear on the same plot and the number of blocks should be scaled logarithmically instead of linearly. Briefly describe your observations from this plot.
e. Calculate the bandwidth for each iteration of 2b and 2c using the latency and amount of data transferred. Plot the results in the same manner as latency and briefly describe your observations.
2 Refer to the documentation to switch buffering off. Also make sure you open the file in binary read mode.
3 You may reach the end of the file prematurely during sequential access. In that case, seek back to the start of the file and continue reading.
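The measurement loop in steps 2a to 2c and the bandwidth calculation in 2e can be sketched as follows. This is a minimal illustration, not a reference solution: the names `timed_reads` and `bandwidth_bytes_per_s` are hypothetical, the demo runs on a small throwaway file rather than the real test file, and it assumes the file holds at least one full 4 KB block. Note that `buffering=0` only disables Python's own buffer; the operating system may still cache the file.

```python
import os
import random
import tempfile
import time

BLOCK_SIZE = 4 * 1024  # 4 KB, the fixed block size from step 2b


def timed_reads(path, num_blocks, sequential=True):
    """Return the wall-clock seconds taken to read num_blocks blocks."""
    # Assumption: the file contains at least one full block.
    file_blocks = os.path.getsize(path) // BLOCK_SIZE
    # buffering=0 switches off Python's buffering; it requires binary mode.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(num_blocks):
            if not sequential:
                # Random access: jump to a randomly chosen block boundary.
                f.seek(random.randrange(file_blocks) * BLOCK_SIZE)
            data = f.read(BLOCK_SIZE)
            if sequential and len(data) < BLOCK_SIZE:
                # Reached the end of the file: wrap around and keep reading.
                f.seek(0)
                f.read(BLOCK_SIZE)
        return time.perf_counter() - start


def bandwidth_bytes_per_s(num_blocks, seconds):
    """Bandwidth = amount of data transferred divided by latency."""
    return num_blocks * BLOCK_SIZE / seconds


# Demo on a small temporary file (a stand-in for the real test file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * BLOCK_SIZE))
for n in [1, 4, 16, 64]:
    seq = timed_reads(tmp.name, n, sequential=True)
    rnd = timed_reads(tmp.name, n, sequential=False)
    print(f"{n:>3} blocks: sequential {seq:.6f} s, random {rnd:.6f} s")
os.remove(tmp.name)
```

Timing the whole loop with a single pair of `time.perf_counter()` calls keeps the timer's overhead out of each read. For the assignment itself, the block counts would be the eleven values listed in step 2b.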
3. Measurement Statistics [50 points]
a. Run steps 2b and 2c 10 times and store the results.
b. For each of the 11 iterations, calculate the mean and standard error (see footnote 4) over the 10 runs.
c. Use the results of 3b to generate errorbar plots for latency and bandwidth. Again, both sequential and random results should be on the same plot and the number of blocks should be scaled logarithmically. Briefly describe your observations from these plots.
4 The standard error is defined as the standard deviation divided by the square root of the number of observations.
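The definition in footnote 4 can be written out as a short sketch using only the standard library. The helper name `mean_and_standard_error` is hypothetical, and the choice of the sample standard deviation (n - 1 denominator) is an assumption, since the assignment does not say which variant to use. The input values are placeholders, not measurements.

```python
import math
import statistics


def mean_and_standard_error(samples):
    """Return (mean, standard error) for repeated measurements of one iteration."""
    mean = statistics.mean(samples)
    # Assumption: sample standard deviation (n - 1 denominator).
    std_dev = statistics.stdev(samples)
    # Standard error = standard deviation / sqrt(number of observations).
    return mean, std_dev / math.sqrt(len(samples))


# Placeholder values standing in for latencies from repeated runs.
placeholder_latencies = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std_err = mean_and_standard_error(placeholder_latencies)
print(mean, std_err)
```

With NumPy, the same quantity can be obtained from `numpy.std(samples, ddof=1) / numpy.sqrt(len(samples))`; passing `ddof=0` instead would give the population variant.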
Submission
Grading Criteria
Important Notes