ECSE444 Microprocessors
Lab 1: Kalman Filter using Floating-Point Assembly Language Programming and
its Evaluation with C and CMSIS-DSP
Objective
This exercise introduces the ARM Cortex and FP coprocessor assembly languages, their instruction
sets and their addressing modes. The ARM calling convention must be respected so that the
assembly code can be called from the C programming language. The lab and a prior tutorial will
introduce you to the STM32CubeIDE, including the compiler and associated tools. In the second
part of the exercise, the code developed here will be used in a larger program written in C that
uses the Cortex Microcontroller Software Interface Standard DSP library (CMSIS-DSP), an
application programming interface (API) that incorporates a large set of routines optimized for
different ARM Cortex processors.
Hence, this lab consists of two components, each requiring a week to complete:
Part 1: Assembly language exercise – Kalman filter in one dimension
Part 2: Combining assembly/embedded C and optimizing performance; CMSIS-DSP
Background – ARM Calling Convention
In assembly and C, parameters for a subroutine are passed via the stack or internal registers. In ARM
processors, the registers R0 to R3 are used for passing integer or pointer variables. Up to four
parameters are placed in these registers, and the result is placed in R0 (and also R1 if it needs
more than 32 bits). If any parameter requires more than 32 bits, then multiple registers are used.
If there are no free scratch registers, or the parameter requires more registers than remain, then
the parameter is pushed onto the stack.
Since we will also be dealing with floating-point parameters on hardware that performs
floating-point arithmetic, be aware that either software or hardware floating-point linkage
can be used, depending on whether the parameters are passed via general-purpose or
floating-point registers. The objective here is to use the hardware linkage; hence, the
floating-point registers will be used for parameter and result passing.
In addition to the class notes, please refer to the document “Procedure Call Standard for the ARM
Architecture”, especially its sections describing The Base Procedure Call Standard. Other
documents that will be of importance include the Cortex M4 programming manual, quick reference
cards for ARM ISA and the (vector) floating point instructions, all available within the course
online documentation. This particular order of passing parameters is applied by major compilers.
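As a rough illustration of the convention described above, the following C sketch annotates where each argument and result would be placed when the code is compiled for the Cortex-M4 with the hardware floating-point linkage. The function names `scaled_sum` and `sum64` are hypothetical and exist only for this example; the register comments apply to the ARM target, not to a host build.

```c
#include <stdint.h>

/* Hypothetical functions, used only to illustrate argument placement
 * under the hardware floating-point linkage. */

/* a -> R0, b -> R1, scale -> S0; the float result is returned in S0. */
float scaled_sum(int32_t a, int32_t b, float scale)
{
    return (float)(a + b) * scale;
}

/* data -> R0 (pointer), n -> R1; the 64-bit result is returned in R0 and R1. */
int64_t sum64(const int32_t *data, uint32_t n)
{
    int64_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Inspecting the disassembly of such small functions in the IDE is one way to confirm how the compiler actually assigns the registers.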
Using the STM32CubeIDE Integrated Development Environment Tool
To prepare for Lab 1, you will need to go through Tutorial 1, where you will learn how to create and
define projects, including assembly code projects. The tutorial shows you how to let the tool insert
the proper startup code for the given processor, how to write and compile the code, and the
basics of program debugging.
Lab 1: Definition
You will develop working assembly language code for a single-variable Kalman filter that can be
used in later exercises. The single-variable version avoids the matrix operations required for
larger Kalman filters, which makes it amenable to an assembly code implementation while still
allowing you to experiment with and appreciate the features of this filter.
Kalman Filter
The Kalman filter is a state-based adaptive estimator of a physical process. Its estimation error is
provably minimal for linear systems with Gaussian noise. It is a type of adaptive filter, which
is generally preferred to fixed linear filters. The state space adaptation is performed by a
sequence of discrete steps, during which the parameters of the filter change depending on the
observed physical value, as well as the current state.
The Kalman filter performs the adaptation by maintaining an internal state, consisting of the estimated
value x, the adaptive tuning factor k and the estimation error, represented by its covariance p. To
obtain these values, it requires knowledge of the noise parameters of the state estimation (process)
and of the input measurements, represented by their respective covariances q and r.
The high-level description of the Kalman filter code is given in the working Python program in
Figure 1. While the code is fully functional, and it can be directly run within a larger (Python)
program, it is used here as a compact high-level specification. Please note that only the update
function is required in the assembly part of Lab 1. In the second part, when you include your
assembly code with the C code, the initialization function will be needed. That part will be written
in C. Note also that there will be differences in the code caused by different syntax and semantics of
C, compared to Python. For instance, you will need to carefully specify the data types and include
the function prototypes in the code to be able to correctly link the assembly and C code.
Figure 1: Python Code Definition of the Kalman Filter Class used in Lab 1
One interpretation of the Kalman filter is that of a tracking device for the input signal stream,
that is, a device that estimates the trajectory of the signal. At each time instance i, it generates
the estimate x[i], which aims to reconstruct the original signal measurement[i]. An important
feature of the Kalman filter is that the estimated value differs from the input by a value that has
the statistical properties of white noise. You will be asked later to inspect those properties
using your additional statistical processing.
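The difference between the input and the estimate can be inspected with simple statistics. The sketch below is only one possible starting point, assuming the measurements and the filter outputs have been collected into two arrays of equal length; the function name `residual_stats` and its signature are hypothetical. It computes the mean and the population variance of measurement[i] - x[i].

```c
#include <stddef.h>

/* Hypothetical helper: mean and population variance of the residual
 * measurement[i] - estimate[i]. Returns 0 on success, -1 if n is 0. */
int residual_stats(const float *measurement, const float *estimate,
                   size_t n, float *mean, float *variance)
{
    if (n == 0) {
        return -1;
    }

    /* First pass: mean of the residual. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += measurement[i] - estimate[i];
    }
    float m = sum / (float)n;

    /* Second pass: average squared deviation from that mean. */
    float sq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = (measurement[i] - estimate[i]) - m;
        sq += d * d;
    }

    *mean = m;
    *variance = sq / (float)n;
    return 0;
}
```

For a residual that behaves like white noise, the mean would be expected to be close to zero; further checks, such as correlation between successive residuals, are left open here.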
Part 1: Exercise
Write a subroutine kalman in ARM Cortex M4 assembly language that processes one measurement
input to update the local state required for the process estimation. You should naturally use the
built-in floating-point unit by using the existing floating-point assembler instructions.
Your subroutine should follow the ARM compiler parameter passing convention. Recall that up to
four integer parameters are passed in the integer registers R0 to R3, and that floating-point
parameters are passed in the floating-point registers, starting at S0. For instance, R0 and R1 will
contain the values of the first two integer parameters and S0 will contain the value of the first
floating-point parameter. If the data type is more complex (e.g., a struct or a matrix), then a
pointer to it is passed instead. For the function return value, the register R0 is used for an
integer result and S0 for a floating-point result of the subroutine.
In your case, there is a need for one integer parameter (the address of the struct detailed below)
and one floating-point input parameter (the value of the current measurement). The state of the
Kalman filter is the result of your subroutine, but keep in mind that this can be achieved in a way
that no output value is produced by the subroutine. This is effectively the “call by reference”
that you should be familiar with from C programming.
The filter will hold its state as a quintuple (q, r, x, p, k) of five single-precision floating-point
numbers. In the next lab, it will be convenient to keep these state variables in a single C language
struct, holding the Kalman filter state in each time step, consisting of:
float q; // process noise covariance
float r; // measurement noise covariance
float x; // estimated value
float p; // estimation error covariance
float k; // adaptive Kalman filter gain
The operation of the filter should be correct for all values of inputs and state variables,
including when arithmetic conditions such as overflow arise.
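A C reference version of the update can be handy for checking the assembly results. The sketch below wraps the five fields listed above in a struct and applies one update step. The names `kalman_state` and `kalman_update_c` are hypothetical, and the equations are the textbook scalar Kalman update, assumed here rather than copied from Figure 1, so please compare them against the specification before relying on them.

```c
/* State layout taken from the field list above: five single-precision floats. */
typedef struct {
    float q; /* process noise covariance */
    float r; /* measurement noise covariance */
    float x; /* estimated value */
    float p; /* estimation error covariance */
    float k; /* adaptive Kalman filter gain */
} kalman_state;

/* Hypothetical C reference: processes one measurement and modifies the
 * state in place (call by reference), producing no return value.
 * Assumed textbook scalar form, not necessarily identical to Figure 1. */
void kalman_update_c(kalman_state *s, float measurement)
{
    s->p = s->p + s->q;                        /* predict error covariance */
    s->k = s->p / (s->p + s->r);               /* compute the gain */
    s->x = s->x + s->k * (measurement - s->x); /* correct the estimate */
    s->p = (1.0f - s->k) * s->p;               /* update error covariance */
}
```

With this layout the fields sit at byte offsets 0, 4, 8, 12 and 16 from the struct address, which is what an assembly routine receiving the pointer would index against. Note that the sketch does not handle the case p + r = 0 or any overflow; those corner cases still need separate treatment.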
Function Requirements
1. All registers that the ARM calling convention requires to be preserved must be preserved, and
the stack position must be unaffected by the subroutine call upon returning.
2. The calling convention will obey that of the compiler. Two registers, R0 and S0, contain the
arguments, which are respectively the pointer to the state variables structure and the
current measurement value. While there is no required result value, please note that the
state variables pointed to by the pointer will be modified.
3. No memory location should be used to store any intermediate data. Not only is this
faster, but it also avoids any memory allocation.
4. The subroutine should be location independent. It should be able to run properly when it is
placed in different memory locations.
5. The subroutine should not use any global variables.
6. The subroutine should be as fast as possible, but robust and correct for all cases of positive
and negative numbers, as well as overflows. Grading of the experiment will depend on
how efficiently you solve the problem, with all the corner cases being correct.
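One possible way to reason about the overflow requirement before moving to assembly is to check, in C, whether any state value stopped being a finite number after an update. The helper below is a minimal sketch under that assumption; the name `kalman_state_is_finite` is hypothetical, and it treats the state as the five consecutive floats (q, r, x, p, k). On the Cortex-M4 itself, the cumulative exception flags in the FPSCR register offer another way to detect overflow, division by zero and invalid operations.

```c
#include <math.h>

/* Hypothetical helper: returns 1 when every value of the filter state is
 * a finite number, and 0 if any value is NaN or infinite (for example
 * after an overflow, or after a 0/0 division in the gain computation). */
int kalman_state_is_finite(const float state[5])
{
    for (int i = 0; i < 5; ++i) {
        if (!isfinite(state[i])) {
            return 0;
        }
    }
    return 1;
}
```

How a detected error should be reported or recovered from is a design decision that this sketch leaves open.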
Hints:
ARM Assembly Code
1. It helps to solve the problem conceptually at first, taking into account all corner cases,
including the arithmetic overflows. Move to assembly code when the algorithm is well
defined. (If you wish to implement the code in C, it can only help you here and also for Part
2, where you will be asked to produce several variations to the basic solution.)
2. Follow the code examples and instructions covered in classes, tutorials and material
posted online to get to working code quickly. When optimizing for speed, some
ARM Cortex instructions, such as those that directly replace bits between two words, could
help.
3. Document assembly code thoroughly and thoughtfully. It takes only a few hours for you to
forget what each register holds and what implicit assumptions were used. It is useful to
consider the assembler features that improve the coding style, such as the declaration of
register variables, symbolic constants and similar.
4. Please keep in mind that the processor does not activate the floating-point unit on power-up.
Following the example given in the tutorial, ensure that the floating-point unit is enabled
and not bypassed.
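For orientation on hint 4, enabling the floating-point unit on the Cortex-M4 amounts to granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR). The sketch below shows only the bit manipulation; the helper name `fpu_enable` is hypothetical, and the register is passed as a pointer so that the logic can be exercised off-target. On the device the register is located at 0xE000ED88 and is exposed as SCB->CPACR by the CMSIS headers. The example in the tutorial takes precedence over this sketch.

```c
#include <stdint.h>

/* The CP10 and CP11 access fields occupy bits 20 to 23 of CPACR;
 * setting all four bits grants full access to the FPU. */
#define CPACR_CP10_CP11_FULL_ACCESS (UINT32_C(0xF) << 20)

/* Hypothetical helper: sets the FPU access bits in the register that
 * cpacr points to, leaving all other bits unchanged. */
void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL_ACCESS;
}
```

On real hardware the write is typically followed by barrier instructions so that it takes effect before the first floating-point instruction executes.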
Linking
1. When creating a new project, it is simplest to follow the created template. You will notice
that your IDE creates main.c and also the startup code in assembly. You can either
a. modify the startup code to branch to kalman rather than __main for testing
assembly code alone, prior to embedding into C main program, or
b. call your assembly code from main (all calling conventions need to be
observed). Please note that you will need to declare the subroutine name as
global (exported) in your assembly code, as well as follow the syntax rules of the
GCC assembler.
2. For linking with C code that has main.c, no special measures should be needed, so option b)
above is better and is readily expandable to the Part 2 extension.
3. If the linker complains about some other missing variable to be imported in startup code,
you can either declare it as “dummy” in your assembly code, or comment out its mention in the
startup code.