**Assignment updates**

*15/11/21* Added matrix multiplication example.

*12/11/21* Stick to minimal *while loop*, as using *for loop* gives
different results.

*04/11/21* Added an explanation of BRUTE targets.

# 1 Assignment

Measure how cache memory influences memory access time. Provide the
results in the form of **graph** similar to the figure below.

Create an array of the following structures:

#define GAP 0 struct elem { struct elem *next; int gap[GAP]; };

The size (in bytes) of the array is the "working set size" from the
graph above, i.e. the graph shows how does a measured quantity depend on
the size of the array. Your graph should show memory access time for
working set sizes **from 1 KiB to 4 MiB**.

The goal is to measure how much CPU cycles is needed for accessing an
element in the structure depending on the access pattern (random,
sequential). To measure that, write a program that will traverse the
above array in **sequential** and **random** order and measure the
total time. The total time divided by the number of accessed elements
will give you the average time for accessing a single element.

It is essential for your program to perform only the necessary
operations. Therefore, prepare the array according to the figures below.
The `next` pointer will point either to the following element or to a
random one. The last traversed element will always point to the first
traversed one. In case of random order, it is important that the "chain"
traverses all elements, i.e. there are no "cycles" shorter than the
whole array.

The measuring program will be a simple loop where you measure how long
it takes to iterate e.g. ten million times over the elements of the array.
The resulting time, divided by ten millions, will correspond to the
average time per element. For your measurements, set the `GAP`
constant to 0, but if you are interested, you can also measure results for
other values. The measuring loop should be at least as simple as in
the following example:

int jumps = 10000000; struct elem *p = &array[0]; unsigned start = sysTimestamp(); while (--jumps) p = p->next; unsigned end = sysTimestamp();

Note: Using different loop (e.g., for loop) results into different values, especially for sequentially ordered array.

# 2 Submission guidelines

To be awarded by points, your solution has to be both checked by upload system and demonstrated on real hardware (MicroZed). BRUTE contains two targets: 5cache and 5cache-hw:

- 5cache tests your code in the simulator to find out whether you link the array properly. It is designed to help you and to confirm that your implementation is suited for the measurement on the board.
- 5cache-hw tests your code on the board and outputs the measurements.

Created application is

**Downloadable Kernel Module**.Entry point to the program is called

`measureCache`and it takes two arguments, as written below.Use this code (inside config.h) as a base structure:

/* config.h * PSR 5cache assignment * DO NOT MODIFY */ /* sysClkRateSet(CLOCK_RATE) */ #define CLOCK_RATE 1000 #define GAP 0 struct elem { struct elem *next; int gap[GAP]; }; #define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem)) /* Modes for `measureCache` */ #define SEQUENTIAL 0 #define RANDOM 1 /* * seqArray() * : (struct elem*) array -- array to modify * : (int) n -- number of elements to modify * * This function takes first `n` elements of `array` * and links them together in a sequential order, * i.e., * array[0].next == array[1] * array[1].next == array[2] * ... * array[n-1].next == array[0] */ void seqArray(struct elem* array, int n); /* * ranArray() * : (struct elem*) array -- array to modify * : (int) n -- number of elements to modify * * This function takes first `n` elements of `array` * and links them together in a random order, * i.e. (e.g.), * array[0].next == array[5] * array[5].next == array[1] * ... * array[x].next == array[0] */ void ranArray(struct elem* array, int n); /* * measureCache() * : (int) mode -- mode of ordering elements in an array * : (int) jumps -- number of pointer "jumps" in the single working * set measurement * * This function makes a measurement of memory access time. * First, it prepares an array (according to selected `mode`). * Then, for every working set size (1 KiB - 4 MiB) it measures * the time of following the links in the array `jumps` times. * * The function should print to stdout 'Measurement started' * message. Then, it prints for each measurement the working set * size (in bytes) and the average access time for a single * "jump" (in CPU ticks). At the end the 'Measurement finished' * message is printed. * * Example: * Measurement started * 1024 22 * 2048 23 * 4096 22 * Measurement finished */ void measureCache(int mode, int jumps);

## 2.1 Lab submission guidelines

In order to submit this assignment during the lab, please prepare:

- A graph with results from both measurements. The axes should be similiar
to the figure on top of this page.
*x-axis*is in log-scale, showing powers of 2,*y-axis*shows average number of ticks per memory access.

# 3 Hints

- Do not store the array on the stack (local variables), which is (by
default) too small to fit the big array. Define the array either as
global variable of use
`malloc()`to allocate the memory.

To use

`sysTimestamp()`to receive current time (in system ticks) you might have to enable it first by calling`sysTimestampEnable()`. If you would like to obtain time instead, divide received value by frequency, which is available from`sysTimestampFreq()`.Many students are unfamiliar with how to work with pointers. Here is an example of how to setup sequential pointers in the array.

#define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem)) struct elem array[MAX_ARRAY_LEN]; /* 4 MB array */ unsigned size = 1024, nElem, i; nElem = size/sizeof(struct elem); /* number of elements in the array to fill */ for (i=0; i < nElem - 1; i++) { array[i].next = &array[i+1]; } array[i].next = &array[0]; /* close the cycle */

Function

`rand()`on VxWorks returns normally a number between 0 and 32767. Using this function directly to generate the randomized array, especially for arrays with more than 32768 elements, will lead to poor results. A possible solution is to use the following function:unsigned longrand(long int max) { #if RAND_MAX == 32767 return (rand() ^ (rand() << 15)) % max; #else #error Unsupported RAND_MAX value #endif }

Use

`sysClkRateSet()`function to set higher resolution of system clock.It is important how we access the data in the memory. An example is matrix multiplication as shown in Lecture 9 of Effective Software course. Start at slide 29. As a backup, you can also download the materials here: lecture, and animations.