java - Strange behavior while scaling treatments over many CPUs -


I am studying performance while scaling Java code over many CPUs. For that, I wrote a simple program that runs 50000 Fibonacci computations on 1 thread, then 2*50000 on 2 threads, 3*50000 on 3 threads, and so on, until the number of CPUs of the target host is reached.

Here is the code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultithreadScalability {

    static final int MAX_THREADS = 4;
    static final int NB_RUN_PER_THREAD = 50000;
    static final int FIBO_VALUE = 25;

    public static void main(String[] args) {
        MultithreadScalability multithreadScalability = new MultithreadScalability();
        multithreadScalability.runTest();
    }

    // Runs the test with 1, 2, ... threads, up to the number of available processors.
    private void runTest() {
        int availableProcs = Runtime.getRuntime().availableProcessors();
        System.out.println(availableProcs + " processors available");

        for (int i = 1; i <= availableProcs; i++) {
            System.out.println("running scalability test " + i + " threads");
            long timeInMillisecs = runTestForThreads(i);
            System.out.println("=> " + timeInMillisecs + " milli-seconds");
        }
    }

    // Submits NB_RUN_PER_THREAD * threadsNumber tasks to a fixed pool and times them.
    private long runTestForThreads(int threadsNumber) {
        final int nbRun = NB_RUN_PER_THREAD * threadsNumber;
        ExecutorService executor = Executors.newFixedThreadPool(threadsNumber);

        long startTime = System.currentTimeMillis();

        for (int i = 0; i < nbRun; i++) {
            Runnable worker = new Runnable() {
                public void run() {
                    fibo(FIBO_VALUE);
                }
            };

            executor.execute(worker);
        }

        executor.shutdown();

        // Busy-wait until every submitted task has completed.
        while (!executor.isTerminated()) {
        }

        return (System.currentTimeMillis() - startTime);
    }

    // Naive recursive Fibonacci, used purely as a CPU-bound workload.
    private static long fibo(int n) {
        if (n < 2) {
            return (n);
        }

        return (fibo(n - 1) + fibo(n - 2));
    }
}
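One detail of the timing loop: the empty `while (!executor.isTerminated())` loop keeps the main thread spinning, so it competes with the workers for a hardware thread once the pool is as large as the machine. This may slightly distort the largest runs. A minimal sketch of the same measurement that blocks with `awaitTermination` instead is shown here. The class name `AwaitTerminationTiming`, the method `timeRun`, and the one-hour timeout are illustrative assumptions, not part of the original program.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AwaitTerminationTiming {

    // Same naive recursive Fibonacci as in the question.
    static long fibo(int n) {
        return n < 2 ? n : fibo(n - 1) + fibo(n - 2);
    }

    // Submits `tasks` Fibonacci computations to a pool of `threads` workers
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeRun(int threads, int tasks, int fiboValue) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> fibo(fiboValue));
        }
        executor.shutdown();
        try {
            // Block until all tasks finish instead of spinning on isTerminated().
            // The one-hour timeout is an arbitrary upper bound for this sketch.
            if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("tasks did not finish within the timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Small workload so the sketch finishes quickly; the values are illustrative.
        System.out.println(timeRun(2, 200, 20) + " ms");
    }
}
```

`System.nanoTime()` is used here because it is a monotonic clock meant for measuring elapsed time, while `System.currentTimeMillis()` follows the wall clock.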

Under these conditions, I expected the execution time to remain constant, independent of the number of threads.

I ran it on a powerful machine and got the following output:

48 processors available
running scalability test 1 threads
=> 34199 milli-seconds
running scalability test 2 threads
=> 34141 milli-seconds
running scalability test 3 threads
=> 34009 milli-seconds
running scalability test 4 threads
=> 34000 milli-seconds
running scalability test 5 threads
=> 34034 milli-seconds
running scalability test 6 threads
=> 34086 milli-seconds
running scalability test 7 threads
=> 34094 milli-seconds
running scalability test 8 threads
=> 34673 milli-seconds
running scalability test 9 threads
=> 35297 milli-seconds
running scalability test 10 threads
=> 35486 milli-seconds
running scalability test 11 threads
=> 35913 milli-seconds
running scalability test 12 threads
=> 36324 milli-seconds
running scalability test 13 threads
=> 35722 milli-seconds
running scalability test 14 threads
=> 35750 milli-seconds
running scalability test 15 threads
=> 35634 milli-seconds
running scalability test 16 threads
=> 35970 milli-seconds
running scalability test 17 threads
=> 37914 milli-seconds
running scalability test 18 threads
=> 36560 milli-seconds
running scalability test 19 threads
=> 36720 milli-seconds
running scalability test 20 threads
=> 37028 milli-seconds
running scalability test 21 threads
=> 37381 milli-seconds
running scalability test 22 threads
=> 37529 milli-seconds
running scalability test 23 threads
=> 37632 milli-seconds
running scalability test 24 threads
=> 39942 milli-seconds
running scalability test 25 threads
=> 40090 milli-seconds
running scalability test 26 threads
=> 41238 milli-seconds
running scalability test 27 threads
=> 42336 milli-seconds
running scalability test 28 threads
=> 43377 milli-seconds
running scalability test 29 threads
=> 44394 milli-seconds
running scalability test 30 threads
=> 46245 milli-seconds
running scalability test 31 threads
=> 45928 milli-seconds
running scalability test 32 threads
=> 47490 milli-seconds
running scalability test 33 threads
=> 47674 milli-seconds
running scalability test 34 threads
=> 48775 milli-seconds
running scalability test 35 threads
=> 56456 milli-seconds
running scalability test 36 threads
=> 50557 milli-seconds
running scalability test 37 threads
=> 51393 milli-seconds
running scalability test 38 threads
=> 52971 milli-seconds
running scalability test 39 threads
=> 53077 milli-seconds
running scalability test 40 threads
=> 54015 milli-seconds
running scalability test 41 threads
=> 55924 milli-seconds
running scalability test 42 threads
=> 55560 milli-seconds
running scalability test 43 threads
=> 56554 milli-seconds
running scalability test 44 threads
=> 57073 milli-seconds
running scalability test 45 threads
=> 65193 milli-seconds
running scalability test 46 threads
=> 58549 milli-seconds
running scalability test 47 threads
=> 59302 milli-seconds
running scalability test 48 threads
=> 60662 milli-seconds

The time remains almost the same up to 24 threads. Then it becomes slower and slower, as you can see on the graph.

I am asking in order to understand why such a "break" happens.

Last but not least, the CPU configuration of the host on which I ran the test is the following:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           E7540  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.885
cache size      : 18432 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc id nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 3995.77
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management: [8]

Here we see that the real number of cores is 6. Runtime.getRuntime().availableProcessors() does not return the number of physical CPUs but the number of "hyper-threads": 48.

Do you think this can explain the "break" I observe at 24 threads?

It looks to me as if the machine has 4 Intel E7540 CPUs, each with 6 cores and 12 threads, giving a total of 24 cores and 48 threads. So only 24 threads can run at the same time with a core to themselves.
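The 4 x 6 = 24 cores and 4 x 12 = 48 threads can be checked from /proc/cpuinfo itself: each logical processor has its own "processor" entry, and two hyper-threads on the same core share the same "physical id" and "core id". Here is a minimal sketch that counts both, assuming a Linux-style /proc/cpuinfo where "physical id" appears before "core id" in each entry (as in the listing in the question). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

final class CpuInfoCores {

    // Counts "processor" entries, i.e. logical processors (hardware threads).
    static int countLogicalProcessors(String cpuInfo) {
        int count = 0;
        for (String line : cpuInfo.split("\n")) {
            String[] kv = line.split(":", 2);
            if (kv.length == 2 && kv[0].trim().equals("processor")) {
                count++;
            }
        }
        return count;
    }

    // Counts distinct (physical id, core id) pairs, i.e. physical cores.
    static int countPhysicalCores(String cpuInfo) {
        Set<String> cores = new HashSet<>();
        String physicalId = null;
        for (String line : cpuInfo.split("\n")) {
            String[] kv = line.split(":", 2);
            if (kv.length < 2) {
                continue; // blank separator line between entries
            }
            String key = kv[0].trim();
            String value = kv[1].trim();
            if (key.equals("physical id")) {
                physicalId = value;
            } else if (key.equals("core id")) {
                cores.add(physicalId + "/" + value);
            }
        }
        return cores.size();
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/cpuinfo"); // Linux only
        if (!Files.exists(path)) {
            System.out.println("/proc/cpuinfo not found on this system");
            return;
        }
        String cpuInfo = new String(Files.readAllBytes(path));
        System.out.println(countLogicalProcessors(cpuInfo) + " logical processors");
        System.out.println(countPhysicalCores(cpuInfo) + " physical cores");
    }
}
```

On the machine described in the question, the expectation would be 48 logical processors and 24 physical cores, though only the first entry of its /proc/cpuinfo is shown, so that is an inference rather than a measured result.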

The 48 threads refer to the Hyper-Threading feature, which is built to make use of the micro pauses that occur when a thread has to fetch memory before it can continue. Since your test doesn't access new memory in its innermost loop, it is limited to the 24 cores.

So yes, the number of cores vs. the number of threads explains it.
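Because the workload grows with the thread count, the elapsed times are easier to read as total throughput. Here is a small sketch of that arithmetic, using a few of the timings reported in the question. The class and method names are invented for illustration.

```java
final class ScalingThroughput {

    // NB_RUN_PER_THREAD in the question: each thread count n runs n * 50000 tasks.
    static final int RUNS_PER_THREAD = 50000;

    // Total Fibonacci runs completed per second for a given thread count
    // and elapsed wall-clock time in milliseconds.
    static double runsPerSecond(int threads, long elapsedMillis) {
        return threads * (double) RUNS_PER_THREAD * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // {threads, elapsed ms} pairs copied from the output in the question.
        long[][] samples = {
            {1, 34199}, {12, 36324}, {24, 39942}, {36, 50557}, {48, 60662}
        };
        for (long[] s : samples) {
            System.out.printf("%2d threads: %.0f runs/s%n",
                    s[0], runsPerSecond((int) s[0], s[1]));
        }
    }
}
```

With these figures, throughput works out to roughly 30,000 runs per second at 24 threads and roughly 39,600 at 48. So the second hardware thread on each core still adds some throughput, but far less than a second core would.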

