15/11/21 Added matrix multiplication example.

12/11/21 Stick to minimal while loop, as using for loop gives different results.

04/11/21 Added an explanation of BRUTE targets.

# 1   Assignment

Measure how cache memory influences memory access time. Provide the results in the form of graph similar to the figure below. Create an array of the following structures:

```#define GAP 0
struct elem {
struct elem *next;
int gap[GAP];
};
```

The size (in bytes) of the array is the "working set size" from the graph above, i.e. the graph shows how does a measured quantity depend on the size of the array. Your graph should show memory access time for working set sizes from 1 KiB to 4 MiB.

The goal is to measure how much CPU cycles is needed for accessing an element in the structure depending on the access pattern (random, sequential). To measure that, write a program that will traverse the above array in sequential and random order and measure the total time. The total time divided by the number of accessed elements will give you the average time for accessing a single element.

It is essential for your program to perform only the necessary operations. Therefore, prepare the array according to the figures below. The next pointer will point either to the following element or to a random one. The last traversed element will always point to the first traversed one. In case of random order, it is important that the "chain" traverses all elements, i.e. there are no "cycles" shorter than the whole array. The measuring program will be a simple loop where you measure how long it takes to iterate e.g. ten million times over the elements of the array. The resulting time, divided by ten millions, will correspond to the average time per element. For your measurements, set the GAP constant to 0, but if you are interested, you can also measure results for other values. The measuring loop should be at least as simple as in the following example:

```int jumps = 10000000;
struct elem *p = &array;
unsigned start = sysTimestamp();
while (--jumps) p = p->next;
unsigned end = sysTimestamp();
```

Note: Using different loop (e.g., for loop) results into different values, especially for sequentially ordered array.

# 2   Submission guidelines

• To be awarded by points, your solution has to be both checked by upload system and demonstrated on real hardware (MicroZed). BRUTE contains two targets: 5cache and 5cache-hw:

• 5cache tests your code in the simulator to find out whether you link the array properly. It is designed to help you and to confirm that your implementation is suited for the measurement on the board.
• 5cache-hw tests your code on the board and outputs the measurements.
• Created application is Downloadable Kernel Module.

• Entry point to the program is called measureCache and it takes two arguments, as written below.

• Use this code (inside config.h) as a base structure:

```/* config.h
* PSR 5cache assignment
* DO NOT MODIFY
*/

/* sysClkRateSet(CLOCK_RATE) */
#define CLOCK_RATE 1000

#define GAP 0

struct elem {
struct elem *next;
int gap[GAP];
};

#define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem))

/* Modes for `measureCache` */
#define SEQUENTIAL 0
#define RANDOM 1

/*
* seqArray()
*  : (struct elem*) array -- array to modify
*  : (int) n -- number of elements to modify
*
*  This function takes first `n` elements of `array`
*  and links them together in a sequential order,
*  i.e.,
*      array.next == array
*      array.next == array
*      ...
*      array[n-1].next == array
*/
void seqArray(struct elem* array, int n);

/*
* ranArray()
*  : (struct elem*) array -- array to modify
*  : (int) n -- number of elements to modify
*
*  This function takes first `n` elements of `array`
*  and links them together in a random order,
*  i.e. (e.g.),
*      array.next == array
*      array.next == array
*      ...
*      array[x].next == array
*/
void ranArray(struct elem* array, int n);

/*
* measureCache()
*  : (int) mode -- mode of ordering elements in an array
*  : (int) jumps -- number of pointer "jumps" in the single working
*                   set measurement
*
*  This function makes a measurement of memory access time.
*  First, it prepares an array (according to selected `mode`).
*  Then, for every working set size (1 KiB - 4 MiB) it measures
*  the time of following the links in the array `jumps` times.
*
*  The function should print to stdout 'Measurement started'
*  message. Then, it prints for each measurement the working set
*  size (in bytes) and the average access time for a single
*  "jump" (in CPU ticks). At the end the 'Measurement finished'
*  message is printed.
*
*  Example:
*      Measurement started
*      1024    22
*      2048    23
*      4096    22
*      Measurement finished
*/
void measureCache(int mode, int jumps);
```

## 2.1   Lab submission guidelines

In order to submit this assignment during the lab, please prepare:

1. A graph with results from both measurements. The axes should be similiar to the figure on top of this page.
• x-axis is in log-scale, showing powers of 2,
• y-axis shows average number of ticks per memory access.

# 3   Hints

• Do not store the array on the stack (local variables), which is (by default) too small to fit the big array. Define the array either as global variable of use malloc() to allocate the memory.
• To use sysTimestamp() to receive current time (in system ticks) you might have to enable it first by calling sysTimestampEnable(). If you would like to obtain time instead, divide received value by frequency, which is available from sysTimestampFreq().

• Many students are unfamiliar with how to work with pointers. Here is an example of how to setup sequential pointers in the array.

```#define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem))
struct elem array[MAX_ARRAY_LEN]; /* 4 MB array */

unsigned size = 1024, nElem, i;
nElem = size/sizeof(struct elem);   /* number of elements in the array to fill */
for (i=0; i < nElem - 1; i++) {
array[i].next = &array[i+1];
}
array[i].next = &array;   /* close the cycle */
```
• Function rand() on VxWorks returns normally a number between 0 and 32767. Using this function directly to generate the randomized array, especially for arrays with more than 32768 elements, will lead to poor results. A possible solution is to use the following function:

```unsigned longrand(long int max)
{
#if RAND_MAX == 32767
return (rand() ^ (rand() << 15)) % max;
#else
#error Unsupported RAND_MAX value
#endif
}
```
• Use sysClkRateSet() function to set higher resolution of system clock.

• It is important how we access the data in the memory. An example is matrix multiplication as shown in Lecture 9 of Effective Software course. Start at slide 29. As a backup, you can also download the materials here: lecture, and animations.