ECSE444 Microprocessors
Lab 1: Kalman Filter using Floating-Point Assembly Language Programming and its Evaluation with C and CMSIS-DSP

Objective

This exercise introduces the ARM Cortex and FP coprocessor assembly languages, their instruction sets and their addressing modes. The ARM calling convention must be respected so that the assembly code can be called from the C programming language. The lab and a prior tutorial will introduce you to the STM32CubeIDE, including the compiler and associated tools. In the second part of the exercise, the code developed here will be used in a larger program written in C with the Cortex Microcontroller Software Interface Standard (CMSIS-DSP) application programming interface (API), which incorporates a large set of routines optimized for different ARM Cortex processors.

Hence, this lab consists of two components, each requiring a week to complete:
Part 1: Assembly language exercise – Kalman filter in one dimension
Part 2: Combining assembly/embedded C and optimizing performance; CMSIS-DSP
Background – ARM Calling Convention

In assembly and C, parameters for a subroutine are passed via the stack or internal registers. In ARM processors, the registers R0 to R3 are used for passing integer or pointer parameters. Up to four parameters are placed in these registers, and the result is returned in R0 (and also in R1 when the result is wider than 32 bits). If a parameter requires more than 32 bits, multiple registers are used. If no argument registers remain free, or the parameter requires more registers than remain, the parameter is pushed onto the stack. This order of passing parameters is applied by major compilers.

Since we will also be dealing with floating-point parameters on hardware that performs floating-point arithmetic, be aware that there is a choice between software and hardware floating-point linkage, depending on whether the parameters are passed via general-purpose or floating-point registers. The objective here is to use the hardware linkage, hence the floating-point registers will be used for parameter and result passing.

In addition to the class notes, please refer to the document “Procedure Call Standard for the ARM Architecture”, especially its sections describing The Base Procedure Call Standard. Other documents of importance include the Cortex M4 programming manual and the quick reference cards for the ARM ISA and the (vector) floating-point instructions, all available within the course online documentation.
Using the STM32CubeIDE Integrated Development Environment Tool
To prepare for Lab 1, you will need to go through Tutorial 1, where you will learn how to create and define projects, including assembly code projects. The tutorial shows you how to let the tool insert the proper startup code for the given processor and how to write and compile the code, and it covers the basics of program debugging.
Lab 1: Definition

You will develop working assembly language code for a single-variable Kalman filter that can be used in later exercises. The single-variable version avoids the matrix operations required for larger Kalman filters, which makes it amenable to an assembly code implementation while still allowing you to experiment with and appreciate the features of this filter.

Kalman Filter

The Kalman filter is a state-based adaptive estimator of a physical process. Its estimation error is provably minimal for linear systems with Gaussian noise. It is a type of adaptive filter, which is generally preferred to fixed linear filters. The state space adaptation is performed as a sequence of discrete steps, during which the parameters of the filter change depending on the observed physical value as well as the current state.

The Kalman filter performs the adaptation by maintaining an internal state consisting of the estimated value x, the adaptive tuning factor (gain) k and the estimation error, represented by its covariance p. To obtain these values, it requires knowledge of the noise parameters of the state estimation (process) and of the input measurements, represented by their respective covariances q and r.
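For reference, the state described above is commonly updated with the following scalar recurrences. This is a sketch that assumes the standard one-dimensional Kalman form; the symbol z for the current measurement is introduced here for illustration only and is not part of the handout.

```latex
\begin{aligned}
p &\leftarrow p + q \\
k &\leftarrow \frac{p}{p + r} \\
x &\leftarrow x + k\,(z - x) \\
p &\leftarrow (1 - k)\,p
\end{aligned}
```

As a quick check under these assumptions, starting from q = 0, r = 1, p = 1, x = 0 and feeding z = 2 gives k = 0.5, x = 1 and p = 0.5 after one step.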
The high-level description of the Kalman filter code is given in the working Python program in Figure 1. While the code is fully functional and can be run directly within a larger (Python) program, it is used here as a compact high-level specification. Please note that only the update function is required in the assembly part of Lab 1. In the second part, when you include your assembly code with the C code, the initialization function will be needed. That part will be written in C. Note also that there will be differences in the code caused by the different syntax and semantics of C compared to Python. For instance, you will need to carefully specify the data types and include the function prototypes in the code to be able to correctly link the assembly and C code.
Figure 1: Python Code Definition of the Kalman Filter Class used in Lab 1
One interpretation of the Kalman filter is that of a tracking device for the input signal stream, that is, a device that estimates the trajectory of the signal. At each time instance i, it generates the tracked value x[i], which aims to reconstruct the original signal from measurement[i]. An important feature of the Kalman filter is that the estimated value differs from the input by a value that has the statistical properties of white noise. You will be asked later to inspect those properties using additional statistical processing.
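The tracking interpretation above can be sketched in C as a loop over the input stream. This is only an illustration that assumes the standard scalar Kalman recurrences; the function name kalman_track, its parameter list and the residual array are hypothetical and do not come from the handout.

```c
#include <stddef.h>

/* Illustrative tracker (hypothetical name and signature).
 * Assumes the standard scalar Kalman update. For each sample i it stores
 * the tracked value x[i] in estimate[i] and the difference
 * measurement[i] - x[i] in residual[i], which is the quantity whose
 * noise-like properties are to be inspected later. */
void kalman_track(const float *measurement, float *estimate, float *residual,
                  size_t n, float q, float r, float x0, float p0)
{
    float x = x0;   /* estimated value */
    float p = p0;   /* estimation error covariance */

    for (size_t i = 0; i < n; i++) {
        p = p + q;                          /* predict the error covariance */
        float k = p / (p + r);              /* adaptive gain */
        x = x + k * (measurement[i] - x);   /* correct the estimate */
        p = (1.0f - k) * p;                 /* update the error covariance */

        estimate[i] = x;
        residual[i] = measurement[i] - x;
    }
}
```

A caller would pass arrays of equal length n; the residual array can then be fed to whatever statistical processing (mean, variance, correlation) is chosen for the later analysis.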
Part 1: Exercise

Write a subroutine kalman in ARM Cortex M4 assembly language that processes one measurement input to update the local state required for the process estimation. You should naturally use the built-in floating-point unit by using the existing floating-point assembler instructions.

Your subroutine should follow the ARM compiler parameter passing convention. Recall that up to four integer or pointer parameters are passed in the integer registers R0 to R3, while floating-point parameters are passed in floating-point registers starting at S0 (with hardware floating-point linkage, the procedure call standard assigns S0 to S15 for this purpose). For instance, for a function taking two integers and one floating-point value, R0 and R1 will contain the two integers and S0 will contain the floating-point parameter. If the datatype is more complex (e.g., a struct or a matrix), then a pointer to it is passed instead. For the function return value, R0 is used for an integer result and S0 for a floating-point result.

In your case, the subroutine needs one integer parameter (the address of the struct detailed below) and one floating-point input parameter (the value of the current measurement). The updated state of the Kalman filter is the result of your subroutine, but keep in mind that it can be delivered without the subroutine returning any value, by writing it back through the pointer. This is effectively the “call by reference” that you should be familiar with from C programming.
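To make the register assignment concrete, here is a small C function annotated with where each argument would live. It is a sketch that assumes hardware floating-point linkage (for GCC, the -mfloat-abi=hard option); the function scaled_sum is a hypothetical example, not part of the lab.

```c
/* Hypothetical example function, used only to illustrate argument placement.
 * Under hardware floating-point linkage the arguments are expected in:
 *   base  -> R0 (pointer, first integer-class argument)
 *   count -> R1 (second integer-class argument)
 *   gain  -> S0 (first floating-point argument)
 * and the float result is returned in S0. */
float scaled_sum(const float *base, int count, float gain)
{
    float acc = 0.0f;
    for (int i = 0; i < count; i++) {
        acc += base[i];   /* accumulate the array elements */
    }
    return gain * acc;    /* result leaves the function in S0 */
}
```

Note that the integer and floating-point registers are allocated independently, so the position of the float in the parameter list does not consume one of R0 to R3.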
The filter will hold its state as a quintuple (q, r, x, p, k) of five single-precision floating-point numbers. In the next lab, it will be convenient to keep these state variables in a single C language struct, holding the Kalman filter state at each time step, consisting of:
    float q; // process noise covariance
    float r; // measurement noise covariance
    float x; // estimated value
    float p; // estimation error covariance
    float k; // adaptive Kalman filter gain

The operation of the filter should be correct for all values of the inputs and state variables, including when arithmetic conditions such as overflow occur.
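A possible C declaration of this struct, together with the byte offsets that assembly code would use to reach each field from the pointer in R0, is sketched below. The type name kalman_state is hypothetical, and the prototype is one plausible way for C code to declare the kalman subroutine; neither is prescribed by the handout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type name; the field names and order follow the text. */
typedef struct {
    float q;   /* process noise covariance */
    float r;   /* measurement noise covariance */
    float x;   /* estimated value */
    float p;   /* estimation error covariance */
    float k;   /* adaptive Kalman filter gain */
} kalman_state;

/* Compile-time checks of the layout assumed by the assembly side:
 * five consecutive 4-byte floats with no padding. */
static_assert(sizeof(float) == 4, "float is expected to be 32 bits");
static_assert(offsetof(kalman_state, q) == 0,  "q at offset 0");
static_assert(offsetof(kalman_state, r) == 4,  "r at offset 4");
static_assert(offsetof(kalman_state, x) == 8,  "x at offset 8");
static_assert(offsetof(kalman_state, p) == 12, "p at offset 12");
static_assert(offsetof(kalman_state, k) == 16, "k at offset 16");
static_assert(sizeof(kalman_state) == 20, "no padding expected");

/* Assumed prototype for the assembly routine: state pointer in R0,
 * measurement in S0, no return value. */
void kalman(kalman_state *state, float measurement);
```

With this layout, an instruction such as VLDR S1, [R0, #8] would load x, and the matching VSTR would write it back.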
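The handout does not say how such conditions should be detected. One possible check on the C side, sketched below, assumes IEEE 754 behaviour in which an overflow produces an infinity and an invalid operation such as 0/0 produces a NaN; the function name kalman_check_state and its return codes are hypothetical.

```c
#include <math.h>

/* Hypothetical helper: returns 0 when the three computed state values are
 * all finite, and -1 when any of them is an infinity or a NaN.
 * Assumes IEEE 754 arithmetic and that fast-math style optimizations,
 * which may remove such checks, are not enabled. */
int kalman_check_state(float x, float p, float k)
{
    if (isfinite(x) && isfinite(p) && isfinite(k)) {
        return 0;    /* state looks usable */
    }
    return -1;       /* overflow or invalid operation was encountered */
}
```

In assembly, the same information is available from the floating-point status flags, which is another route to consider when deciding how to handle the corner cases.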
Function Requirements

1. All registers that the ARM calling convention requires to be preserved must be preserved, and the stack position must be unchanged when the subroutine returns.
2. The calling convention will obey that of the compiler. Two registers, R0 and S0, contain the arguments, which are respectively the pointer to the state variables structure and the current measurement value. While there is no required result value, please note that the state variables pointed to by the pointer will be modified.
3. No memory location should be used to store any intermediate data. Not only is this faster, but no memory allocation is needed that way.
4. The subroutine should be location independent. It should run properly when it is placed in different memory locations.
5. The subroutine should not use any global variables.
6. The subroutine should be as fast as possible, but robust and correct for all cases of positive and negative numbers, as well as overflows. Grading of the experiment will depend on how efficiently you solve the problem, with all the corner cases being correct.
Hints: ARM Assembly Code
1. It helps to solve the problem conceptually at first, taking into account all corner cases, including the arithmetic overflows. Move to assembly code when the algorithm is well defined. (If you wish to implement the code in C, it can only help you here and also for Part 2, where you will be asked to produce several variations of the basic solution.)
2. Follow the examples of code and instructions elaborated in classes, tutorials and material posted online to get quickly to working code. When optimizing for speed, some ARM Cortex instructions, such as those that directly replace bits between two words, could help.
3. Document the assembly code thoroughly and thoughtfully. It takes only a few hours for you to forget what each register holds and what implicit assumptions were used. It is useful to consider the assembler features that improve the coding style, such as the declaration of register variables, symbolic constants and similar.
4. Please keep in mind that the processor does not activate the floating-point unit on power-up. Following the example given in the tutorial, ensure that the floating-point unit is not bypassed.
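Regarding the last hint, the Cortex-M4 floating-point unit is enabled by granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR, at address 0xE000ED88). The sketch below shows only the bit manipulation; the helper name cpacr_with_fpu_enabled is hypothetical, and the tutorial's own example remains the reference for how the startup code does this.

```c
#include <stdint.h>

/* CP10 and CP11 access fields occupy bits 20 to 23 of CPACR;
 * setting all four bits grants full access to the FPU. */
#define CPACR_CP10_CP11_FULL_ACCESS (0xFu << 20)

/* Hypothetical helper: returns the CPACR value with the FPU enabled,
 * leaving all other bits unchanged. */
static inline uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL_ACCESS;
}

/* On the target, with the CMSIS core header included, this could be
 * applied as (assumption, not taken from the handout):
 *
 *     SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR);
 *     __DSB();
 *     __ISB();
 */
```

Executing a floating-point instruction before these bits are set raises a fault, which is a common symptom when assembly code is tested straight from the startup code.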
Linking

1. When creating a new project, it is simplest to follow the created template. You will notice that your IDE creates main.c and also the startup code in assembly. You can either
   a. modify the startup code to branch to kalman rather than __main for testing the assembly code alone, prior to embedding it into the C main program, or
   b. call your assembly code from main (all calling conventions need to be observed).
   Please note that you will need to declare the subroutine name as global (exported) in your assembly code, as well as follow the syntax rules of the GCC assembly language.
2. For linking with C code that has main.c, no special measures should be needed, so option b) above is better and is readily expandable to the Part 2 extension.
3. If the linker complains about some other missing variable to be imported in the startup code, you can either declare it as “dummy” in your assembly code or comment out its mention in the startup code.