As the final lab assignment for our Concurrent Programming module, we were asked to optimize matrix multiplication. C++ was set as the language and Linux was set as the environment for this task. The tasks for this lab assignment were as follows:
- Write a naive sequential program to perform matrix multiplication on a n x n matrix containing double precision values. The matrices were to be filled with random double values. Only the time taken for the actual multiplication part was to be recorded.
- Identify a C++ library for Linux environment which supports parallel for loops and write a parallelized version of the above program.
- Running the above 2 versions of matrix multiplication for matrix sizes of n = 100 to n = 1000 in increments of 100 and record the time taken for each of these executions.
- We were then supposed to look in to ways to optimize matrix multiplication taking the processor architecture into account as well and come up with a 3rd version of the matrix multiplication making use of both parallel for loops and various optimization techniques.
The full source code of this project can be found here: https://github.com/pubudu91/mm-optimization
Naive Sequential Version
For this, the standard matrix multiplication algorithm was used. If the matrices are called A, B and C,
|for (int i = 0; i < n; i++)|
|for (int j = 0; j < n; j++)|
|for (int k = 0; k < n; k++)|
|C[i][j] += A[i][k] * B[k][j];|