# 2007-11-15: comparison of comsol solvers,
# versions and numbers of threads; mainly
# interested in comsol-3.4 with matlab-7.5
# (both threaded). The test-case used is pdfem
# of 2007-11-13, with the random number
# generator starting in the default state
# each time.
#
# The point: getting a single job done quicker
# is nice, but:
#   i) *is* it quicker for having more threads,
#      even on a multiprocessor computer?
#  ii) how much less efficient is it, i.e.
#      how much CPU time is being wasted that
#      other processes could have used, just
#      to get one job a bit quicker?
# iii) what is the optimal number of threads
#      and the best solver (for pdfem, at least)?
#  iv) is there any advantage of comsol 3.4 over
#      comsol 3.3?
#
# Since matlab is being used as the main part of
# the program, it's matlab's maxNumCompThreads(n)
# function that's used to set the number of threads;
# according to the comsol manual, this overrides all
# comsol options such as -np or environment variables.
#
# The conclusion is that for pdfem (and similarly
# for comsol examples run in plain comsol without
# matlab) there is not much advantage from adding
# any number of threads, there is even an absolute
# reduction in speed for too many threads, and there
# is a *lot* of wasted CPU time for the small gain
# in speed!
# For pdfemb (batched), keep to 1 thread (by setting
# this up in matlab, or calling 'maxNumCompThreads(1);'
# at the start of the script). For a single run, it
# may help to have 2 to 4 threads.
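#
# A minimal sketch (Python, not part of the original tests) of how
# questions i) and ii) can be put into numbers from a real|user timing
# pair: speed-up is the 1-thread real time divided by the n-thread real
# time, and wasted CPU is the extra user time over the 1-thread run.
# The names TIMINGS and thread_cost are hypothetical; the numbers are
# the diagsim comsol-3.4 umfpack timings recorded in these notes.
# System time was not recorded, so user time stands in for CPU time.

```python
# comsol-3.4 + umfpack on diagsim (8 cpus), copied from the results in
# these notes: number of threads -> (real seconds, user seconds).
TIMINGS = {1: (139, 146), 2: (108, 191), 4: (96, 286), 8: (103, 332)}


def thread_cost(timings, threads, baseline=1):
    """Return (speedup, extra_cpu_seconds, efficiency) relative to baseline.

    speedup     -- baseline real time / real time with `threads` threads
    extra_cpu   -- user time spent beyond the baseline run (user time only)
    efficiency  -- speedup divided by the factor of extra threads used
    """
    base_real, base_user = timings[baseline]
    real, user = timings[threads]
    speedup = base_real / real
    extra_cpu = user - base_user
    efficiency = speedup / (threads / baseline)
    return speedup, extra_cpu, efficiency


for n in sorted(TIMINGS):
    s, extra, eff = thread_cost(TIMINGS, n)
    print(f"{n} threads: speed-up {s:.2f}x, "
          f"extra user time {extra} s, efficiency {eff:.2f}")
```

# With these numbers, 4 threads save 43 s of real time (139 -> 96)
# at the cost of 140 s of extra user time, and 8 threads are slower
# than 4 (103 s against 96 s).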

# key to results lines
# comsolversion, solver, time(real|user)[s], [number of threads]

# gnu: 1 process  (3.3 seems to use multiproc, 3.4 not)
3.4, pardiso, 194|161
3.3, pardiso, 143|157
3.4, pardiso, 133|188 2-threads
3.4, umfpack, 146|149
3.3, umfpack, 127|148
3.4, umfpack, 118|178 2-threads

# pdlabsim: 1 process (has only one cpu, no HT)
3.4, umfpack, 420|398
3.3, umfpack, 298|289

# diagsim: varied nproc (has 8 cpus)
3.4, pardiso, 144|151 1-thread
3.4, pardiso, 114|201 2-threads
3.4, pardiso, 101|309 4-threads
3.4, pardiso, 114|356 8-threads
3.4, umfpack, 139|146 1-thread
3.4, umfpack, 108|191 2-threads
3.4, umfpack, 96|286 4-threads
3.4, umfpack, 103|332 8-threads
3.3, umfpack, 114|141
3.3, pardiso, 118|142
3.3, umfpack, 115|144  (using -blas atlas -- any effect?)

# Conclusions:
# -- umfpack seems generally rather faster than pardiso,
#    on either version and any system that we have (amd64,
#    em64t, i686), single or multiple CPU.
# -- Multithreading on multiprocessor systems gains only
#    a little advantage, and beyond about 4 threads actually
#    becomes worse.
# -- Comsol-3.3 seems faster than comsol-3.4, especially
#    on an i686 system.
#
#-----------------------------------------------------------------
# Now, to compare 8 processes on 8 CPUs, either single-threaded
# or multi-threaded.  (This, obviously from the 8 CPUs, is on diagsim.)
# The point of this is to see whether using multithreading, which is
# for example useful for finishing off the last job of a batch
# quickly, wastes time on jobs when others are running in parallel.

$ for i in `seq 1 8` ; do cd $i ; time xterm -e comsol-3.4 -np 1 matlab -ml -nodesktop -mlr runpdfem & cd .. ; done

3.4, umfpack (1-thread), 207s realtime mean, all within 5s or so
3.4, umfpack (4-threads), 322s realtime mean, from 296 to 330 (why?)

# Answer: using multithreading in comsol matlab (or, from other tests
# with the comsol examples, in comsol alone) sucks.
# It's only worth anything when there are no competing processes,
# and even then it can actually make the overall time *greater* if
# too many threads are used.  This conclusion applies only to our
# examples of use of comsol `multithreaded' solvers and meshers, not
# to multithreading on SMP systems in general; perhaps comsol will
# eventually become better.

#-----------------------------------------------------------------
# Added, 2007-11-30, trying a comsol example:
Model Library, Comsol, Acoustics, automotive_muffler
40221 DOF
diagsim: SPOOLES, -np 4, 590 s
diagsim: PARDISO, -np 4, 230 s
diagsim: PARDISO, -np 8, ???s
   /pkg/bin64/comsol-3.4: line 1790: 21840 Aborted  ${JAVA} ${JVMARGS} ${CLTMPARG} -classpath ${FLCP} ${SHOWVERSION} ${MAINCLS} ${APPLARGS}
diagsim: PARDISO, -np 1, 390 s
gnu: SPOOLES, -np 2, 815 s
gnu: PARDISO, -np 2, 368 s
gnu: PARDISO, -np 1, crash, as on diagsim
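
#-----------------------------------------------------------------
# Going back to the 8-process comparison on diagsim: a small sketch
# (Python, hypothetical names BATCH_MEAN_REAL, slowdown, jobs_per_hour)
# of what the two recorded means imply for batch throughput. It assumes
# all 8 jobs start together and finish at about the mean real time; the
# 1-thread runs (all within 5 s or so) fit that well, the 4-thread runs
# (296 to 330 s) only roughly.

```python
# Mean real time per job [s] with 8 jobs running at once on diagsim
# (comsol-3.4, umfpack), copied from these notes: threads -> seconds.
BATCH_MEAN_REAL = {1: 207, 4: 322}
JOBS = 8


def slowdown(timings, threads, baseline=1):
    """Ratio of mean real time with `threads` threads to the baseline."""
    return timings[threads] / timings[baseline]


def jobs_per_hour(mean_real_s, jobs=JOBS):
    """Approximate throughput if `jobs` jobs run concurrently and each
    takes about `mean_real_s` seconds of real time."""
    return jobs * 3600 / mean_real_s


for n in sorted(BATCH_MEAN_REAL):
    mean_real = BATCH_MEAN_REAL[n]
    print(f"{n} thread(s) per job: {mean_real} s mean, "
          f"about {jobs_per_hour(mean_real):.0f} jobs/hour, "
          f"{slowdown(BATCH_MEAN_REAL, n):.2f}x the 1-thread time")
```

# On these figures each job takes about 1.56 times as long when every
# process is allowed 4 threads, which is the basis for keeping batched
# runs at 1 thread.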