There are three popular BLAS implementations: Intel MKL, OpenBLAS and ATLAS. A benchmark shows that ATLAS is the slowest, while MKL and OpenBLAS are on par. I could not get OpenBLAS to work: I kept getting the error "DLL load failed: The specified module could not be found" even after ensuring that all the dependent DLLs were on my path. The steps below therefore use MKL.
Getting the MKL library
- Before Intel lets you download the MKL library, you have to register at https://software.intel.com/en-us/intel-mkl
- You will receive an email containing the link to the download page of various Intel libraries. Download the Math Kernel Library (MKL) installer
- By default, MKL is installed into C:\Program Files (x86)\IntelSWTools. I could not figure out how to escape the spaces in that path for gcc, so use Link Shell Extension to create a symbolic link at C:\tools\IntelSWTools that points to C:\Program Files (x86)\IntelSWTools.
- Edit C:\Users
\.theanorc and add the following
[blas]
ldflags = -LC:/tools/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win -LC:/tools/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win_mic -lmkl_core -lmkl_intel_thread -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_rt
- Add C:\tools\IntelSWTools\compilers_and_libraries\windows\redist\intel64_win\mkl to your system environment variable PATH
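Since a "DLL load failed" error usually comes down to a wrong directory somewhere, it may help to sanity-check the two settings above before running anything. The following is only a rough sketch: the helper names split_ldflags and dir_on_path are my own (they are not part of Theano or MKL), and it assumes the ldflags value contains no spaces inside paths, which is the point of the symbolic link.

```python
import ntpath


def split_ldflags(ldflags):
    """Split an ldflags string into (library_dirs, library_names).

    Hypothetical helper: assumes no path contains spaces.
    """
    dirs, libs = [], []
    for token in ldflags.split():
        if token.startswith("-L"):
            dirs.append(token[2:])   # text after -L is a library directory
        elif token.startswith("-l"):
            libs.append(token[2:])   # text after -l is a library name
    return dirs, libs


def dir_on_path(directory, path_value):
    """Return True if directory is one of the ';'-separated PATH entries.

    Uses ntpath so the comparison follows Windows rules:
    case-insensitive, '/' and '\\' equivalent, trailing separator ignored.
    """
    def norm(p):
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(norm(entry) == target
               for entry in path_value.split(";") if entry.strip())
```

On the machine being configured, one could pass each directory returned by split_ldflags to os.path.isdir, and call dir_on_path with the redist directory from the last step and os.environ["PATH"]. A False from either check points at the setting to revisit.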
Checking that it works
- Open command line and cd to
- Run python.exe check_blas.py
- The output should include something like the following:
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 18.00s on CPU (with direct Theano binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as followings calls. The difference is not big, but consistent.
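Because the script itself suggests running it a few times, a small wrapper can collect the reported times for comparison. This is a minimal sketch, not part of Theano: parse_total_time and run_check_blas are names I made up, the regular expression is based only on the output line shown above, and subprocess.run with capture_output assumes Python 3.7 or newer.

```python
import re
import subprocess
import sys

# Matches the "Total execution time: 18.00s" line shown in the output above.
TIME_RE = re.compile(r"Total execution time: (\d+(?:\.\d+)?)s")


def parse_total_time(output):
    """Return the reported gemm time in seconds, or None if it is absent."""
    match = TIME_RE.search(output)
    return float(match.group(1)) if match else None


def run_check_blas(script="check_blas.py", runs=3):
    """Run the script several times and return the list of reported times.

    Assumes the script is in the current directory and prints the
    timing line to standard output.
    """
    times = []
    for _ in range(runs):
        result = subprocess.run([sys.executable, script],
                                capture_output=True, text=True, check=True)
        times.append(parse_total_time(result.stdout))
    return times
```

Calling run_check_blas() from the directory containing check_blas.py returns one time per run; if the first entry is noticeably larger than the rest, that matches the behaviour the script's own message describes.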