# 1   Assignment

Measure how cache memory influences memory access time. Provide the results in the form of graph similar to the figure below. Create an array of the following structures:

```#define GAP 0
struct elem {
struct elem *next;
int gap[GAP];
};
```

The size (in bytes) of the array is the "working set size" from the graph above, i.e. the graph shows how does the measured quantity depend on the size of the array. Your graph should show memory access time for working set sizes from 1 KiB to 4 MiB.

The goal is to measure how much CPU cycles is needed for accessing an element in the structure depending on the access pattern (random, sequential). To measure that, write a program that will traverse the above array in sequential and random order and measure the total time. The total time divided by the number of accessed elements will give you the average time for accessing a single element.

It is essential for your program to perform only the necessary operations. Therefore, prepare the array according to the figures below. The next pointer will point either to the following element or to a random one. The last traversed element will always point to the first traversed one. In case of random order, it is important that the "chain" traverses all elements, i.e. there are no "cycles" shorter than the whole array. You make take inspiration from the Fisher–Yates shuffle algorithm. The measuring program will be a simple loop where you measure how long it takes to iterate e.g. ten million times over the elements of the array. The resulting time, divided by ten millions, will correspond to the average time per element. For your measurements, set the GAP constant to 0, but if you are interested, you can also measure results for other values. The measuring loop should be at least as simple as in the following example:

```int jumps = 10000000;
struct elem *p = &array;
UINT64 start, end;
sysTimestamp64Lock(&start);
while (--jumps) p = p->next;
sysTimestamp64Lock(&end);
```

Note: Using different loop (e.g., for loop) results into different values, especially for sequentially ordered array.

# 2   Submission guidelines

• To be awarded by points, your solution has to be both checked by upload system and demonstrated on real hardware (MicroZed).

• Entry point to the program is called measureCache and it takes two arguments, as written below.

• Use this code (inside config.h) as a base structure:

```/* config.h
* PSR 5cache assignment
* DO NOT MODIFY
*/

/* sysClkRateSet(CLOCK_RATE) */
#define CLOCK_RATE 1000

#define GAP 0

struct elem {
struct elem *next;
int gap[GAP];
};

#define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem))

/* Modes for `measureCache` */
#define SEQUENTIAL 0
#define RANDOM 1

/*
* seqArray()
*  : (struct elem*) array -- array to modify
*  : (int) n -- number of elements to modify
*
*  This function modifies first `n` elements of `array`
*  and links them together in a sequential order,
*  i.e.,
*      array.next == array
*      array.next == array
*      ...
*      array[n-1].next == array
*/
void seqArray(struct elem* array, int n);

/*
* ranArray()
*  : (struct elem*) array -- array to modify
*  : (int) n -- number of elements to modify
*
*  This function modifies first `n` elements of `array`
*  and links them together in a random order,
*  i.e. (e.g.),
*      array.next == array
*      array.next == array
*      ...
*      array[x].next == array
*/
void ranArray(struct elem* array, int n);

/*
* measureCache()
*  : (int) mode -- mode of ordering elements in an array
*  : (int) jumps -- number of pointer "jumps" in the single working
*                   set measurement
*
*  This function makes a measurement of memory access time.
*  First, it prepares an array (according to the selected `mode`).
*  Then, for every working set size (1 KiB – 4 MiB) it measures
*  the time of following the links in the array `jumps` times.
*
*  The function should print to stdout 'Measurement started'
*  message. Then, it prints for each measurement the working set
*  size (in bytes) and the average access time for a single
*  "jump" (in CPU clocks). At the end the 'Measurement finished'
*  message is printed.
*
*  Example:
*      Measurement started
*      1024    22
*      2048    23
*      4096    22
*      Measurement finished
*/
void measureCache(int mode, int jumps);
```

## 2.1   Lab submission guidelines

In order to submit this assignment during the lab, please prepare:

1. A graph with results from both measurements. The axes should be similiar to the figure on top of this page.
• x-axis is in log-scale, showing powers of 2,
• y-axis shows average number of ticks per memory access.

# 3   Hints

• Do not store the array on the stack (local variables), which is (by default) too small to fit the big array. Define the array either as global variable of use malloc() to allocate the memory.
• The CPU clock frequency is 866 MHz. Use this to calculate the length of the CPU clock cycle (number on the vertical axis).

• To use sysTimestamp64Lock() to obtain current time from a high-resolution clock. You have to enable it first by calling sysTimestampEnable(). If you would like convert the returned value to seconds, divide it by frequency, returned by sysTimestamp64Freq().

• Many students are unfamiliar with how to work with pointers. Here is an example of how to setup sequential pointers in the array.

```#define MAX_ARRAY_LEN (4*1024*1024/sizeof(struct elem))
struct elem array[MAX_ARRAY_LEN]; /* 4 MB array */

unsigned size = 1024, nElem, i;
nElem = size/sizeof(struct elem);   /* number of elements in the array to fill */
for (i=0; i < nElem - 1; i++) {
array[i].next = &array[i+1];
}
array[i].next = &array;   /* close the cycle */
```
• It is important how we access the data in the memory. An example is matrix multiplication as shown in Lecture 9 of Effective Software course. Start at slide 29. As a backup, you can also download the materials here: lecture, and animations.