Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
KIT308 Multicore Architecture and Programming Assignment
SIMD Aims of the assignment The purpose of this assignment is to give you experience at writing a program using SIMD programming techniques. This assignment will give you an opportunity to demonstrate your understanding of: the where there is one there is many approach (WTIOTIM); structures of arrays; the use of SIMD for calculation; and the translation of conditional statements into SIMD. Due Date 11:55pm Friday 24th of September (Week 10 of semester) Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have been followed (see the School Office or this link for details) assignments which are submitted late without good reason will be subject to mark penalties if they are accepted at all (see the School office or this link for details on this as well). Forms to request extensions of time to submit assignments are available from the Discipline of ICT office. Requests must be accompanied by suitable documentation and should be submitted before the assignment due date. Assignment Submission Your assignment is to be submitted electronically via MyLO and should contain: An assignment cover sheet; A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the format provided in the downloadable materials provided below). A document containing: A table of timing information comparing the original single-threaded code (from assignment 1), the provided base code times, and the Stage 1, 2, 3, 4, and 5 SIMD implementation on each scene file (all running with the maximum threads natively supported by your CPU). An analysis of the above timing data. You do not need to (and shouldn't) submit executables, temporary object files, or images. In particular, you must delete the ".vs" diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however delete the "Scenes" folder or the "Outputs" folder (but do delete the images within this one). Task/Topic Marks 1. Planes — Where there is One there is Many (WTIOTIM) 10% Correct implementation of WTIOTIM approach to isPlaneIntersected function to calculate distance to nearest object — and use in the objectIntersection function (in Intersection.cpp) 5% Correct implementation of WTIOTIM approach to create a short-circuiting isPlaneIntersected function to return (as soon as possible) whether any plane intersects — and use in the isInShadow function (in Lighting.cpp) 5% 2. Planes — Structures of Arrays 15% Correct declarations of data structures for SoA SIMD form of Plane container data 5% Correct code to convert AoS container to dynamically declared SoA structures (conversion should happen after the scene has been loaded) 5% Correct rewriting of isPlaneIntersected functions, and the calls to them, to use SoA code 5% 3. Planes — SIMD Conversion of Intersection Test 35% A. SIMD Conversion of short-circuiting isPlaneIntersected function 25% Correct and efficient (e.g. no use of if-statements, for-loops, or scalar code) intersection point (and exit point) calculation (i.e. t0) 5% Correct and efficient (e.g. no use of if-statements) handling of cases where no intersection occurs due to point being parallel to the plane (i.e. first if-statement) 5% Correct and efficient (e.g. no use of if-statements) handling of cases where intersection does occur (i.e. second if- statement) 5% Correct declaration of loop constants before loop 5% Correct handling of plane list length not being divisible by 8 5% B. SIMD Conversion of non-short-circuiting isPlaneIntersected function 10% Correct and efficient (e.g. no use of if-statements) SIMD calculation of closest intersection (i.e. t) 5% Correct and efficient (e.g. no use of if-statements) SIMD calculation of corresponding plane index (i.e. index) 5% 4. Cylinders 30% (NOTE: this section will require WTIOTIM conversion and SoA declarations and conversion from AoS for Cylinder container data) A. SIMD Conversion of short-circuiting isCylinderIntersected function 15% Correct and efficient (e.g. no use of if-statements, for-loops, or scalar code) intersection points calculation (i.e. tBody and tCaps) 5% Correct and efficient (e.g. no use of if-statements) handling of cases where no intersection occurs due to ray not intersecting with infinite cylinder (i.e. first if-statement) 5% Correct and efficient (e.g. no use of if-statements) handling of cases where intersection does occur (i.e. second/third and fourth/fifth if-statement) 5% B. SIMD Conversion of non-short-circuiting isCylinderIntersected function 10% Correct and efficient (e.g. no use of if-statements) SIMD calculation of closest intersection and corresponding cylinder index (i.e. t and index) 5% Correct and efficient (e.g. no use of if-statements) SIMD calculation of corresponding normal (i.e. normal) 5% C. SIMD Conversion of short-circuiting isCylinderIntersected function 5% Correct and efficient (e.g. no use of if-statements) SIMD calculation of mathematical helper functions that don't have direct intrinsics (i.e. abs and sign) 5% Documentation 10% Outputs showing timing information for the base assignment code and Stages 1–4 (with the maximum threads natively supported by your CPU) on all applicable scene files 4% Analysis of data (comparisons across the base code and Stages 1–4) 4% Well reasoned explanation of which areas of the code should be converted to SIMD next (i.e. would likely yield the largest speed increase for the average scene) and which areas would not be worth translating to SIMD. 2% Penalties Failure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, abnormal solution/project structure, etc.) -10% Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated code without documentation, compiler warnings, etc.) up to -20% Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days) up to -100% Marking This assignment will be marked out of 100. NOTE: if your code for a particular stage does not correctly produce the expected output images, you will only be able to receive a maximum of half marks for that stage — see below for more details. The following is the breakdown of marks: Marking and Correct Images As SIMD code is very very difficult to debug — as grows exponentially harder as the amount of it grows — there will be limited opportunity in the marking process to determine where exactly mistakes have been made, and even less chance of being able to provide fixes to those mistakes. As a result, if your code for a stage does not produce the expected image, then you will only be eligible for half marks for that stage of the assignment. In order to work within this constraint, you should attempt translations into SIMD in very small steps (i.e. a single line at a time, or even a partial line at a time). The lectures will demonstrate this approach to SIMD translation. You will likely receive more marks for a partially complete SIMD translation than a fully complete translation that doesn't work 100%. Programming Style This assignment is not focussed on programming style (although it is concerned with efficient code), but you should endeavour to follow good programming practices. You should, for example: comment your code; use sensible variables names; use correct and consistent indenting; and internally document (with comments) any notable design decisions. [NOTE: any examples in the provided assignment materials that don't live up to the above criteria, should be considered to be deliberate examples of what not to do and are provided to aid your learning ;P] The Assignment Task You are to modify the (square-based) multithreaded implementation of a "simple" raytracer from the first assignment to take advantage of SIMD instructions. This requires changes across multiple files. To aid you in this conversion, the sphere/ray intersection code has already been translated into SIMD code (by following the techniques required to do stages 1–3 of the assignment). From the provided (square-based) multithreaded raytracer implementation, for Stages 1–3 of the assignment you will create multiple subsequent versions that modify the plane implementation as follows: 1. Rewritten functions for the plane/ray intersection tests in a where-there-is-one-there-is-many approach. 2. Translation of array-of-structures (AoS) with structures-of-arrays (SoA) for the plane container. 3. Optimisation of the plane/ray intersection test to take advantage of SIMD. For Stage 4, you will follow a similar pattern, but for cylinder intersections. Implementation — SIMD Planes (Stages 1–3) The following section describes in detail the steps required to complete Stages 1–3 of the assignment. If you complete this step, only then should you attempt Stage 4 and do similar steps for lights. If you complete this step, only then should you attempt Stage 5. 1. Where there is One there is Many This stage involves replacing the isPlaneIntersected function with two new versions that use the where there is one there is many paradigm. In order to complete this step you will need to: Write two functions that take the entire plane container (or at least a pointer/reference) as an argument and will perform whatever action took place at the calling site. There are two versions as the isInShadow function (in Lighting.cpp) uses a short-circuited approach, exiting as soon as any plane collision is discovered, and the objectIntersection function (in Intersection.cpp) finds the closest plane intersected with (which means it must examine them all). A version of isPlaneIntersected that returns a boolean depending on whether or not a plane intersects with the ray (i.e. stopping as soon as one is found). A version of isPlaneIntersected that returns a boolean (as above), updates its t parameter to be the closest plane that intersects with the ray, and updates its index parameter to specify the index of the closest plane. 2. Structures of Arrays This stage involves modifying the Scene struct (Scene.h) to store a duplicate of the plane data via structure of arrays rather than the currently existing array of structures (currently in the planeContainer variable). In order to complete this step you will need to: Create a SoA copy of the data that is stored in planeContainer in the Scene struct. NOTE: this means the program has two copies of the plane data, the original one in AoS form and a second one in SoA form. As this assignment predominately only requires changes to isPlaneIntersected (and the sites of the calls to this function) the rest of the code will still use the original AoS version of the data. Fill this SoA copy of the plane data after the Scene struct has been loaded (dynamically allocating memory for the SoA representation). Rewrite the isPlaneIntersected function to make use of the SoA. At times this code update may require conversions to and/or from Vector structs to the equivalently stored data in the SoA. At the end of this stage the program should still produce the same results as the base code. Be thorough in your testing (i.e. test all the scenes) to ensure that everything works correctly before progressing. 3A. SIMD Conversion of isPlaneIntersected function This stage involves converting the first of the two new version of the isPlaneIntersected function from Stage 2 (the bool returning one) into a SIMD implementation. In order to complete this step you will need to: Convert many of the input parameters (or parts of them) into SIMD-ready values (e.g. the ray starting location and direction need to be converted into a SIMD format). Step through the SoA in chunks proportional to the number of value stored in each SIMD variable (i.e. 8). Convert the calculation to use SIMD. Care must be taken to correctly deal with the various conditional expressions in the loop. There are two if-statements that must be converted to SIMD code via the use of appropriate masking statements. Ensure your approach deals with the situation where the plane count isn't equally divisible by the number in each SIMD calculation (e.g. 9 planes with 8 in each SIMD value). If you follow the approach taken in the provided sphere code to creating the SoA copies, there should be nothing extra to do in this function to make this work. 3B. SIMD Conversion of isPlaneIntersected function This stage involves converting the latter of the two new versions of the isPlaneIntersected function from Stage 2 (the one that updates its t and index values) into a SIMD implementation. Most of this conversion is identical to step 3A, and the code can just be copied from that function. It does however require the addition of: New variables to keep track of the closest intersection found, and which plane index it was found at. Consolidation of the values calculated using SIMD into a single scalar return value. This should be done after the loop finishes. Hints / Tips The techniques required to complete each stage rely heavily on work done in the SIMD tutorials and the step-by-step approach shown in lectures — refer to them often. When implementing the SIMD stage, it's best to SIMD-ify small portions of code at a time (e.g. even a single line is often a lot) and then perform the rest of the calculation through the existing scalar code, or even compare the result of the SIMD version to the existing scalar code. Again for SIMD, it's helpful to render scenes at a greatly reduced size for testing, with an equally small block size, one sample per pixel, and with one thread (e.g. try using ‑input allMaterials.txt ‑size 8 8 ‑blockSize 8 ‑samples 1 ‑threads 1). It may even be helpful to fix the number of rays cast (MAX_RAYS_CAST) to be 1, so that reflection and refraction don't occur. This doesn't produce a very nice image, but images can be easily compared visually for problems (the example is only 64 pixels) and printf statements can be used to verify the correctness of calculations without outputting a huge amount of data (the use of debuggers is somewhat perilous in a multithreaded environment, but that shouldn't be an issue here). When converting conditional statements to masks it can be helpful to do this with scalars first (although C/C++ implementations often use 1 rather than 0xFFFFFFFF for true, so the calculations may be different). Write functions to output all the elements in a SIMD value for easier printf debugging (I am not a fan of how these types are shown in the debugger). Implementation — SIMD Cylinders (Stage 4) The following section describes in detail the steps required to complete Stage 4 of the assignment. You should only attempt this step if you have fully completed (and tested) Stages 1–3. This stage involves rewriting the isCylinderIntersected function. In order to complete this step you will need to: Apply the WTIOTM approach to the isCylinderIntersected function as done for spheres and planes (including creating two versions of the function). Create SoA declarations and conversions from AoS for Cylinder container data (in a similar way that this was done for spheres and planes). Convert the isCylinderIntersected function to SIMD. NOTE: the non-short-circuiting version of isCylinderIntersected needs to return (via pointers to values) the new value for t, the index of the intersected object, and (in addition to what is required for spheres and planes) the normal at the point of intersection. Missing Mathematical Functions There are no definitions in the AVX/AVX2 instruction sets of a couple of mathematical functions needed to translate isCylinderIntersected to SIMD: The abs function (as defined in cstdlib, but mapping to fabsf), as defined here. The sign function (as defined in Primitives.h). You must define replacements for these functions that work on a SIMD vector. Hints / Tips All the hints/tips from the stages 1–3 apply here (i.e. complete this in small steps, use 8x8 images as tests, use a lot of printfs, etc.). The calculation and logic for the cylinder/ray intersection test is significantly more complex than for the plane/ray intersection test, so make sure you test each small change thoroughly – preferrably with the testing scripts, not just by examining the image by eye. Timing Test BuildConfig Average Time of 5 Runs (Milliseconds) Base Code SIMD Spheres Stage 1 WTIOTIM Stage 2 SoA Stage 3 SIMD Planes Stage 4 SIMD Cylinders 1. -input Scenes/cornell.txt -size 1024 1024 -samples 4 Release FastRelease 2. -input Scenes/allmaterials.txt -size 1000 1000 -samples 4 Release FastRelease 3. -input Scenes/5000spheres.txt -size 1280 720 -samples 1 Release FastRelease 4. -input Scenes/donuts.txt - size 1024 1024 -samples 1 Release FastRelease 5. -input Scenes/cornell- 199lights.txt -size 1024 1024 -samples 1 Release FastRelease Test Image Result 1. -blockSize 8 -size 8 8 -samples 1 -threads 1 2. -blockSize 16 -size 256 256 -samples 1 3. -blockSize 16 -size 256 256 -samples 2 Documentation When completing either stage of the assignment you should provide: timing information for each scene file for the average time (to 1 decimal place) taken over 5 runs for a render using a thread count supported by your the number of logical processors in your system and a block size of 16. See a later section for an example format for this timing table; and an explanation of the results (e.g. why there's no difference between the performance of stages 1, 2, and 3 (NOTE: this is a made up example and isn't necessarily what to expect), or why a particular type of implementation works well (or poorly) on a particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the tests, and you should discuss how the architectural features of the CPU explain the results. Tests / Timing The following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests that need to be performed in order to fully complete the documentation section of the assignment. Fully completing this tests may take up to an hour (with the 5 required runs) on some hardware, so plan your time accordingly. In order to confirm your images match the images created by the base version of the assignment code, it's strongly recommended you use an image comparison tool. For part of the marking for this, Image Magick will be used (as it was in Assignment 1). The timing tests need to be run in the "Release" build configuration as well as "FastRelease" build configuration. The "Release" mode uses a "strict" floating-point model setting to ensure that the SIMD conversion doesn't introduce calculation inconsistencies with the base code, and the "FastRelease" allows for inaccuracies in order to prioritise speed. Accordingly, only the former should be used for the comparison tests with ImageMagick (and its this version that will be used by the marker to perform these tests) — the latter will show a varying amount of difference depending on the test. The following tests will be run on your code for each scene file. You also might find the 8x8 tests useful for your SIMD code conversion: Provided Materials The materials provided with this assignment contain: The source code of the base multi-threaded version of the raytracer (i.e. a solution to Assignment 1). The source code of the base multi-threaded version of the raytracer with a SIMD implementation of the Ray-Sphere test. A set of scene files to be supplied to the program. A set of reference images for testing. Some batch files for testing purposes. Download the materials as an ZIP file. Source Code The provided MSVC solution contains 6 projects. RayTracerAss2 The provided code consists of 19 source files. Raytracing logic: Raytrace.cpp: this file contains the main function which reads the supplied scene file, begins the raytracing, and writes the output BMP file. The main render loop, ray trace function, and handling of reflection and refraction is also in this file. Intersection.h and Intersection.cpp: these files define a datastructure for capturing relevant information at the point of intersection between a ray and a scene object and functions for testing for individual ray-object collisions and ray-scene collisions. Lighting.h and Lighting.cpp: these files provide functions to apply a lighting calculation at a single intersection point. Texturing.h and Texturing.cpp: these files provide functions for the reading points from 3D procedural textures. Constants.h: this header provide constant definitions used in the raytracing. Basic types: Primitives.h: this header contains definitions for points, vector, and rays. It also provides functions and overloaded operators for performing calculations with vectors and points. SceneObjects.h: this header file provides definitions for scene objects (ie. materials, lights, spheres, planes, and cylinders). Colour.h: this header defines a datastructure for representing colours (with each colour component represented as a float) and simple operations on colours, including conversions to/from the standard BGR pixel format. Scene definition and I/O: Scene.h and Scene.cpp: the header file contains the datastructure to represent a scene and a single function that initialises this datastructure from a file. The scene datastructure itself consists of properties of the scene and lists of the various scene objects as described above. The implementation file contains many functions to aide in the scene loading process. Scene loading relies upon the functionality provided by the Config class. Config.h and Config.cpp: this class provide facilities for parsing the scene file. SimpleString.h: this is helper string class used by the Config class. Image I/O: ImageIO.h and ImageIO.cpp: these files contain the definitions of functions to read and write BMP files. Miscellaneous: Timer.h: this class provides a simple timer that makes use of different system functions depending on whether TARGET_WINDOWS, TARGET_PPU, or TARGET_SPU is defined (we don't use the latter two, but I left this file unchanged in case anyone wanted to see how such cross-platform stuff can be handled). Stage0_SIMDSpheres The provided code consists of 20 source files. There are a number of modified files: Scene definition and I/O: Scene.h: SIMD SoA datastructures for representing spheres. Raytracing logic: Raytrace.cpp: simdifySceneContainers makes a copy of the SoA sphere data in AoS SIMD form. Intersection.h: prototypes of WTIOTIM short-circuiting and non-short-circuiting versions of isSphereIntersected. Intersection.cpp: selectMinimumAndIndex helper function. WTIOTIM short-circuiting version of isSphereIntersected. WTIOTIM non-short-circuiting version of isSphereIntersected.