Relational Operators and Memory Buffer
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
COMP9315 Assignment
Relational Operators and Memory Buffer
Management
DBMS Implementation
Last updated: Saturday 8th April 6:48pm
Most recent changes are shown in red ... older changes are shown in brown.
Recent Updates
[2023-04-07 22:40]
A minor update for test4.
[2023-04-07 15:00]
We have prepared four additional test cases. You can download the updated code from
the same address. Note that the updated version is only about test cases. No code is
changed. Please let us know if you find some errors in the source code.
We also provides more marking details. In real applications, you cannot make any
assumption about the input. Therefore, around 10--15 test cases (including the provided
ones) will be used in marking. If you implement the buffer correctly, your output should pass
all these cases. To facilitate the verification of your solution in marking, db.c may be
modified at two places. First, information in log_read_page, log_release_page,
log_open_file, and log_close_file will be printed as described in code comments.
Second, as you can see in init_db, the current version simply assigns page id by
increasing an integer from 0. In marking, a different way will be used to produce page id.
That means the only way you get the page id is from the first INT64 in each page, and it can
be any possible value. The only guaranteed thing is that each page id is unique for each
table (i.e., page id of pages from different tables may be same).
We also provide some tips to do assignment. As we introduced in lectures, you do not need
to consider the data in current memory buffers when analyzing IOs (i.e., you can assume the
memory buffer is empty). A reference can be found in cost models. For example, when join
two tables via nested for loop, you need to choose one table for outer for loop to minimize
IOs. Some table pages may already in the memory buffer, but you do not need to consider
them. The clock-sweep strategy is an important in this assignment. To help you do the
assignment, we present the following pseudocode about the process when requesting a
page from the buffer. You can compare it with your implementation.
... initialization ...
nvb = 0
foreach slot in buffer[]: slot.pin=0, slot.usage=0
... request a page from buffer ...
function request_page(page)
if(page in buffer)
buffer[page index].usage++
buffer[page index].pin=1
return the page
while(true)
if(buffer[nvb].pin == 0 && buffer[nvb].usage == 0)
read page from disk to buffer[nvb]
buffer[nvb].pin = 1
buffer[nvb].usage = 1
nvb = (nvb+1) % buffer_size
return the page
else
if(buffer[nvb].usage > 0) buffer[nvb].usage --
nvb = (nvb+1) % buffer_size
... notify buffer stop using the page ...
function release_page(page)
buffer[page index].pin = 0
[2023-04-03]
We provide some tips for doing assignment and marking. To simplify the assignment, you
do not need to leave an output buffer for both select and join. For example, assume
that N represent the number of buffer slots. To conduct a nested for-loop join, you can use
N-1 slots to store N-1 pages of one table and use 1 slot to store one page of the other table
(note that in lecture notes, you need to allocate one slot as an output page). You can also
use additional space (e.g., arrays) for sorting or building hash table. All additional space use
in each join need to be freed. The other tip for marking the join operation is that you need to
consider the performance of the join order. For instance, in the nested for-loop join, using
a different table in the outer loop may produce different read IOs. Therefore, you need to
choose the reasonable table in the outer for-loop. As we discussed in lectures, when
analyzing the performance (number of IOs) of an operation, you do not need to consider the
data in the memory buffer slots. If you are not clear about any requirement of the
assignment, feel free to ask questions on the forum.