python - numpy performance differences between Linux and Windows -




I am trying to run sklearn.decomposition.TruncatedSVD() on 2 different computers and understand the performance differences.

Computer 1 (Windows 7, physical computer)

OS Name: Microsoft Windows 7 Professional
System Type: x64-based PC
Processor: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 3401 MHz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM): 8.00 GB
Total Physical Memory: 7.89 GB

Computer 2 (Debian, on Amazon cloud)

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
width: 64 bits
capabilities: ldt16 vsyscall32
*-core
     description: Motherboard
     physical id: 0
*-memory
     description: System memory
     physical id: 0
     size: 29GiB
*-cpu
     product: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
     vendor: Intel Corp.
     physical id: 1
     bus info: cpu@0
     width: 64 bits

Computer 3 (Windows 2008 R2, on Amazon cloud)

OS Name: Microsoft Windows Server 2008 R2 Datacenter
Version: 6.1.7601 Service Pack 1 Build 7601
System Type: x64-based PC
Processor: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 2500 MHz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM): 30.0 GB

Both computers run Python 3.2 and have identical sklearn, numpy, and scipy versions.
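A quick way to confirm that the stacks really do match is to print the interpreter and library versions on each machine side by side:

import sys
import numpy
import scipy
import sklearn

# Print the interpreter and library versions so the machines can be
# compared directly.
print(sys.version)
print('numpy  ', numpy.__version__)
print('scipy  ', scipy.__version__)
print('sklearn', sklearn.__version__)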

I ran cProfile as follows:

print(vectors.shape)
>>> (7500, 2042)

_decomp = TruncatedSVD(n_components=680, random_state=1)
global _o
_o = _decomp
cProfile.runctx('_o.fit_transform(vectors)', globals(), locals(), sort=1)

Computer 1 output

>>> 833 function calls in 1.710 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.767    0.767    0.782    0.782 decomp_svd.py:15(svd)
        1    0.249    0.249    0.249    0.249 {method 'enable' of '_lsprof.Profiler' objects}
        1    0.183    0.183    0.183    0.183 {method 'normal' of 'mtrand.RandomState' objects}
        6    0.174    0.029    0.174    0.029 {built-in method csr_matvecs}
        6    0.123    0.021    0.123    0.021 {built-in method csc_matvecs}
        2    0.110    0.055    0.110    0.055 decomp_qr.py:14(safecall)
        1    0.035    0.035    0.035    0.035 {built-in method dot}
        1    0.020    0.020    0.589    0.589 extmath.py:185(randomized_range_finder)
        2    0.018    0.009    0.019    0.010 function_base.py:532(asarray_chkfinite)
       24    0.014    0.001    0.014    0.001 {method 'ravel' of 'numpy.ndarray' objects}
        1    0.007    0.007    0.009    0.009 twodim_base.py:427(triu)
        1    0.004    0.004    1.710    1.710 extmath.py:232(randomized_svd)

Computer 2 output

>>> 858 function calls in 40.145 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2   32.116   16.058   32.116   16.058 {built-in method dot}
        1    6.148    6.148    6.156    6.156 decomp_svd.py:15(svd)
        2    0.561    0.281    0.561    0.281 decomp_qr.py:14(safecall)
        6    0.561    0.093    0.561    0.093 {built-in method csr_matvecs}
        1    0.337    0.337    0.337    0.337 {method 'normal' of 'mtrand.RandomState' objects}
        6    0.202    0.034    0.202    0.034 {built-in method csc_matvecs}
        1    0.052    0.052    1.633    1.633 extmath.py:183(randomized_range_finder)
        1    0.045    0.045    0.054    0.054 _methods.py:73(_var)
        1    0.023    0.023    0.023    0.023 {method 'argmax' of 'numpy.ndarray' objects}
        1    0.023    0.023    0.046    0.046 extmath.py:531(svd_flip)
        1    0.016    0.016   40.145   40.145 <string>:1(<module>)
       24    0.011    0.000    0.011    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        6    0.009    0.002    0.009    0.002 {method 'reduce' of 'numpy.ufunc' objects}
        2    0.008    0.004    0.009    0.004 function_base.py:532(asarray_chkfinite)

Computer 3 output

>>> 858 function calls in 2.223 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.956    0.956    0.972    0.972 decomp_svd.py:15(svd)
        2    0.306    0.153    0.306    0.153 {built-in method dot}
        1    0.274    0.274    0.274    0.274 {method 'normal' of 'mtrand.RandomState' objects}
        6    0.205    0.034    0.205    0.034 {built-in method csr_matvecs}
        6    0.151    0.025    0.151    0.025 {built-in method csc_matvecs}
        2    0.133    0.067    0.133    0.067 decomp_qr.py:14(safecall)
        1    0.032    0.032    0.043    0.043 _methods.py:73(_var)
        1    0.030    0.030    0.030    0.030 {method 'argmax' of 'numpy.ndarray' objects}
       24    0.026    0.001    0.026    0.001 {method 'ravel' of 'numpy.ndarray' objects}
        2    0.019    0.010    0.020    0.010 function_base.py:532(asarray_chkfinite)
        1    0.019    0.019    0.773    0.773 extmath.py:183(randomized_range_finder)
        1    0.019    0.019    0.049    0.049 extmath.py:531(svd_flip)

Notice the {built-in method dot} difference: 0.035s/call vs. 16.058s/call. That's 450 times slower!

------+---------+---------+---------+---------+---------------------------+-----------
ncalls| tottime | percall | cumtime | percall | filename:lineno(function) | hardware
------+---------+---------+---------+---------+---------------------------+-----------
    1 |   0.035 |   0.035 |   0.035 |   0.035 | {built-in method dot}     | computer 1
    2 |  32.116 |  16.058 |  32.116 |  16.058 | {built-in method dot}     | computer 2
    2 |   0.306 |   0.153 |   0.306 |   0.153 | {built-in method dot}     | computer 3

I understand that there should be performance differences, but should they be this high?

Is there a way I can debug this performance issue further?
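One way to narrow it down is to time np.dot by itself, outside scikit-learn. Here is a minimal sketch; the matrix shapes are illustrative, matching the input size above rather than the exact products TruncatedSVD performs internally:

import time
import numpy as np

# A dense product roughly the size of the profiled workload. The shapes
# are illustrative assumptions; TruncatedSVD's internal products differ.
a = np.random.rand(7500, 2042)
b = np.random.rand(2042, 680)

t0 = time.time()
a.dot(b)
print('np.dot took %.3f seconds' % (time.time() - t0))

If this standalone product reproduces the same gap between machines, the problem is in the BLAS layer rather than in scikit-learn.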

Edit

I tested a new computer (computer 3). Its hardware is similar to computer 2's, but it runs a different OS.

The result: 0.153s/call for {built-in method dot}, still 100 times faster than Linux!

Edit 2

Computer 1 numpy config

>>> np.__config__.show()
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']

Computer 2 numpy config

>>> np.__config__.show()
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

{built-in method dot} is the np.dot function, a NumPy wrapper around the CBLAS routines for matrix-matrix, matrix-vector, and vector-vector multiplication. Your Windows machines use the heavily tuned Intel MKL version of CBLAS. Your Linux machine is using the slow, old reference implementation.
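If the build-time config output isn't conclusive, one rough way to see which BLAS shared library the running process actually loaded (Linux only; a hand-rolled sketch, not a numpy facility) is to grep the process's memory maps:

import numpy as np

a = np.ones((2, 2))
a.dot(a)  # force the BLAS routines (and their shared library) to load

# /proc/self/maps lists every shared object mapped into this process;
# filter it for BLAS-like names.
with open('/proc/self/maps') as f:
    for line in f:
        if any(name in line.lower() for name in ('blas', 'mkl', 'atlas')):
            print(line.strip())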

If you install ATLAS or OpenBLAS (both available through Linux package managers) or, in fact, Intel MKL, you should see massive speedups. Try sudo apt-get install libatlas-dev, check the numpy config again to see if it picked up ATLAS, and measure again.
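After rebuilding numpy against the new library, np.__config__.show() should change accordingly. The expected output sketched in the comment below is an assumption of what a typical Debian ATLAS build reports; exact section names and paths vary:

import numpy as np

# With ATLAS picked up, the config should gain an atlas section instead
# of only the reference 'blas' library, roughly like (assumed typical
# Debian output):
#
#   atlas_info:
#       libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
#       library_dirs = ['/usr/lib/atlas-base']
np.__config__.show()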

Once you've decided on the right CBLAS library, you may want to recompile scikit-learn. Most of it just uses NumPy for its linear algebra needs, but some algorithms (notably k-means) use CBLAS directly.

The OS itself has little to do with this.

Tags: python, performance, numpy, scikit-learn
