Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
Preview Test: INFS4205/7205 Semester One Final Examination 2020 Test Information Description Instructions Timed Test This test has a time limit of 2 hours and 30 minutes.This test will save and be submitted automatically when the time expires. Warnings appear when half the time, 5 minutes, 1 minute, and 30 seconds remain. [The timer does not appear when previewing this test] Multiple Attempts Not allowed. This Test can only be taken once. Force Completion This Test can be saved and resumed at any point until time has expired. The timer will continue to run if you leave the test. Your answers are saved automatically. INFS4205/7205 Advanced Techniques for High Dimensional Data Semester One 2020 - Final Examination Exam Duration: 120 minutes + 30 mins additional time Materials permitted include Unrestricted number of A4 paper for reviewing and planning your approach to the exam, Water bottle, and Unrestricted calculator. If you experience any technical diculties during the exam, talk to your online invigilator via the webcam or chat functions. If the technical trouble cannot be resolved, you should ask for an email (or transcript of the chat) documenting any technical advice provided to support your request for a deferred exam. Answer all questions. There are 50 marks in total. QUESTION 1 Please use this space to specify any assumptions you have made in completing the exam and which questions those assumptions relate to. You may also include queries you may have made with respect to a particular question, should you have been able to ‘raise your hand’ in an examination room. 0 points Save Answer QUESTION 2 a. When x = y b. When x is a prex of y Many spatial indexing methods used in a 2D space can also be used in a higher dimensional space. (1) [3 Marks ] How can we obtain a Z-value for a point in a 3D space? Please provide one method with detailed description. [Hint: You can follow the idea of recursive decomposition to obtain a Z-value in a 2D space.] (2) [2 marks ] For two Z-values x and y obtained in a 3D space, what is the spatial relationship between the cells represented by them in the following two cases respectively? [A spatial relationship can be identical, inside, overlap, disjoint etc.] 5 points Save Answer QUESTION 3 (1) [5 marks] Compare and contrast R-tree and R+-tree when they are used to index polygons. [Hint: you should explain how they index polygons, what are in common and what are dierent, their relative advantages and disadvantages.] (2) [2 marks] How should we delete a non-leaf node (like root) in point Quadtree? 7 points Save Answer QUESTION 4 A set of spatiotemporal data in a geographical space can be viewed as 3D data, where the three dimensions are (x, y, t) with (x, y) for geographical locations and t for time. (1) [3 marks] Why 3D-Tree is not ecient for trajectory data? (2) [4 marks] Historical R-tree (HR-tree) is proposed to index trajectory more naturally. Please briey describe the key ideas and benets of HR-Tree. (3) [2 marks] What is the basic idea of TPR-Tree and what is its benet? 9 points Save Answer QUESTION 5 It is a fundamental operation to measure trajectory similarity. Explain and discuss the following two methods. (1) [2 marks] What is the lock-step window-based Euclidian distance? (2) [2 marks] What is LCSS? (3) [1 mark] Which method above is more computationally ecient? Why? (4) [1mark] which one is more robust to data noise? Why? 6 points Save Answer QUESTION 6 “Dimensionality curse” is a well-known problem in high dimensional space. (1) [4 marks] Please explain the problem of “dimensionality curse”, and briey discuss its impact on the space-centric index, the object- centric index, and the nearest neighbor search in a high dimensional space. (2) [2 marks] If the dimensionality curse is so severe, why we still can organize the data in practice eciently? (3) [2 marks] VA-File is a very simple index structure to index the uniformly distributed high-dimensional data. Queries on the VA-le, such as nding the nearest neighbors, make use of a two-stage procedure such as a lter-and-rene algorithm. Why is it not necessary to sort the elements obtained in the rst stage prior to starting to process them in the second stage? 8 points Save Answer QUESTION 7 Consider an image database that supports color-based similarity search. (1) [2 marks] Describe a method to represent an image using 256 colors. (2) [3 marks] Dene a similarity function that can be used to compare the color features extracted in the above step. Explain the steps of the query with an image. (3) [2 marks] Precision is the ratio of the correctly returned ones in the returned results, while the recall is the ratio of the correctly returned ones in all the correct ones. How can we achieve high recall? At what cost? (4) [2 marks] How can we increase the precision and recall at the same time? 9 points Save Answer QUESTION 8 Skyline computation is very powerful to identify a set of results that satisfy all the monotonically increasing preference functions for multi-criteria optimization. Its power does not limit to the kNN search, but can also extend to other applications. For example, in the route planning in road network, we mostly consider the distance d as an optimization goal. However, other factors like toll charge, and travel time are also very important.Therefore, it would be benecial if we could provide users with a set of skyline paths instead of only one path. Suppose we have a graph G(V,E), where V is the vertex set denoting the intersections, and E is the edge set denoting the roads. For each edge (u,v) of E, it has two weights: length d(u,v) and cost c(u,v). Suppose the starting vertex is s and the destination vertex is t. (1) [3 marks] Dene the skyline shortest path problem using the above notations. (2) [3 marks] Please give a sketch of how to solve this problem. Either in plain English or pseudo-code. 6 points Save Answer