CSE140-Project 3 Solved

You (plus an optional teammate) are tasked with making the fastest possible matrix
multiplication program for all machines. That means you cannot
target one specific machine, but you are free to research the specifications of
common personal and server architectures. You may assume
that everything is Intel architecture (x86_64) to make life easier.
Background Reading:
Chapter 4.12
The matrices are stored in column-major order. A naïve implementation is given in
dgemm-naive.c, and you can run bench-naive to see its output.
void dgemm( int m, int n, float *A, float *C )
{
    for( int i = 0; i < m; i++ )
        for( int k = 0; k < n; k++ )
            for( int j = 0; j < m; j++ )
                C[i+j*m] += A[i+k*m] * A[j+k*m];
}
C is where the result is stored, and all the calculations use just one
matrix, A. You are required to perform all the calculations; no optimization is
allowed on this front, which makes benchmarking easier.
The zip contains the following files:
Makefile: to make and benchmark
benchmark.c: do not modify. It checks the results and produces performance numbers
dgemm-naive.c: naïve implementation as shown above
dgemm-optimize.c: your optimization
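
To make the column-major layout concrete, here is a minimal sketch of the same arithmetic as the naïve kernel, with the index computation pulled into a helper. The names cm_index and dgemm_reference are hypothetical and are not part of the provided files. Reading the indices, A is m x n, C is m x m, and the loop accumulates C += A * A^T.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the provided files): offset of element
   (row, col) in a column-major matrix that has `rows` rows.
   Consecutive rows of one column are adjacent in memory. */
static size_t cm_index(int row, int col, int rows)
{
    assert(rows > 0 && row >= 0 && row < rows && col >= 0);
    return (size_t)row + (size_t)col * (size_t)rows;
}

/* Same arithmetic as the naive kernel, written with the helper.
   A is m x n, C is m x m, and the result is C += A * A^T. */
static void dgemm_reference(int m, int n, const float *A, float *C)
{
    for (int i = 0; i < m; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < m; j++)
                C[cm_index(i, j, m)] +=
                    A[cm_index(i, k, m)] * A[cm_index(j, k, m)];
}
```

As a small worked case, take m = 2, n = 3 and A stored as {1, 2, 3, 4, 5, 6}, so the columns of A are (1, 2), (3, 4) and (5, 6). Starting from a zeroed C, the call dgemm_reference(2, 3, A, C) leaves C as {35, 44, 44, 56}, which is A * A^T in column-major order. A reference like this may be handy for checking an optimized kernel on small inputs before running the benchmark.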
Choose at most 3 of the following common optimizations (1 per function, 
