Parallel Programming with CUDA
Matrix Multiplication: the “Hello World” of parallel programming. In an effort to learn parallel programming with an NVIDIA GPU, I am documenting my findings as I attempt to migrate from a serial CPU matrix multiplication implementation to a highly parallel algorithm on a GPU using CUDA. For simplicity's sake, we will be dealing with NxN [...]
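For reference, here is a minimal sketch of the kind of serial CPU baseline such a migration starts from. It assumes square N×N matrices of `float` stored row-major in flat arrays, a common layout that copies directly into GPU memory. The function name `matmul_serial` is hypothetical and is not taken from the original implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial baseline: computes C = A * B for N x N matrices stored
// row-major in flat vectors, so element (row, col) lives at index row * n + col.
std::vector<float> matmul_serial(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t n) {
    // Both inputs must hold exactly n * n elements.
    assert(a.size() == n * n && b.size() == n * n);

    std::vector<float> c(n * n, 0.0f);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            // Dot product of row `row` of A with column `col` of B.
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row * n + k] * b[k * n + col];
            }
            c[row * n + col] = sum;
        }
    }
    return c;
}
```

The triple loop performs on the order of N³ multiply-adds, and each of the N² output elements is computed independently of the others. That independence is what makes matrix multiplication a natural first target for a GPU: a parallel version can, for example, assign one output element to each thread.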







