Intel(R) Architecture Code Analyzer Version -  v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45
Analyzed File -  ./pcg128-iaca
Binary Format - 64Bit
Architecture  -  SKL
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 9.53 Cycles       Throughput Bottleneck: FrontEnd
Loop Count:  22
Port Binding In Cycles Per Iteration:
--------------------------------------------------------------------------------------------------
|  Port  |   0   -  DV   |   1   |   2   -  D    |   3   -  D    |   4   |   5   |   6   |   7   |
--------------------------------------------------------------------------------------------------
| Cycles |  6.3     0.0  |  6.2  |  1.7     1.7  |  1.6     1.3  |  2.0  |  6.3  |  6.3  |  1.7  |
--------------------------------------------------------------------------------------------------

DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3)
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion occurred
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
X - instruction not supported, was not accounted in Analysis

| Num Of   |                    Ports pressure in cycles                         |      |
|  Uops    |  0  - DV    |  1   |  2  -  D    |  3  -  D    |  4   |  5   |  6   |  7   |
-----------------------------------------------------------------------------------------
|   1      |             | 0.7  |             |             |      | 0.3  |      |      | mov rsi, 0x4385df649fccf645
|   1      |             | 0.3  |             |             |      | 0.7  |      |      | mov rax, 0x2360ed051fc65da4
|   1*     |             |      |             |             |      |      |      |      | mov rdx, rsi
|   2      |             | 1.0  | 0.7     0.7 | 0.3     0.3 |      |      |      |      | imul rax, qword ptr [rip+0x200b29]
|   2      |             | 1.0  | 0.3     0.3 | 0.7     0.7 |      |      |      |      | imul rdx, qword ptr [rip+0x200b29]
|   1      | 0.3         |      |             |             |      | 0.5  | 0.3  |      | add rax, rdx
|   1*     |             |      |             |             |      |      |      |      | mov rdx, rsi
|   3^     |             | 1.0  | 0.7     0.7 | 0.3     0.3 |      | 1.0  |      |      | mulx rdi, rsi, qword ptr [rip+0x200b12]
|   1      | 0.3         |      |             |             |      |      | 0.7  |      | mov rdx, 0x5851f42d4c957f2d
|   1      | 0.7         |      |             |             |      |      | 0.3  |      | add rdi, rax
|   1      | 0.3         |      |             |             |      | 0.3  | 0.5  |      | mov rax, 0x14057b7ef767814f
|   1      | 0.5         |      |             |             |      | 0.3  | 0.3  |      | add rsi, rax
|   1      | 0.3         |      |             |             |      |      | 0.7  |      | adc rdi, rdx
|   1*     |             |      |             |             |      |      |      |      | mov r9, rsi
|   2^     |             |      |             |             | 1.0  |      |      | 1.0  | mov qword ptr [rip+0x200aeb], rsi
|   1      |             | 1.0  |             |             |      |      |      |      | shrd r9, rdi, 0x2b
|   1*     |             |      |             |             |      |      |      |      | mov r10, rdi
|   2^     |             |      |             | 0.3         | 1.0  |      |      | 0.7  | mov qword ptr [rip+0x200ae4], rdi
|   1*     |             |      |             |             |      |      |      |      | mov rcx, r9
|   1      | 0.7         |      |             |             |      |      | 0.3  |      | shr r10, 0x2b
|   1      |             |      |             |             |      | 1.0  |      |      | xor rcx, rsi
|   1*     |             |      |             |             |      |      |      |      | mov rax, rcx
|   1*     |             |      |             |             |      |      |      |      | mov rcx, r10
|   1      | 0.3         |      |             |             |      | 0.5  | 0.3  |      | xor rcx, rdi
|   1*     |             |      |             |             |      |      |      |      | mov rdx, rcx
|   1*     |             |      |             |             |      |      |      |      | mov rcx, rdi
|   1      | 0.3         |      |             |             |      |      | 0.7  |      | shr rcx, 0x3c
|   1      | 0.5         |      |             |             |      | 0.5  |      |      | add ecx, 0x2d
|   4      | 1.3         | 1.2  |             |             |      | 0.3  | 1.3  |      | shrd rax, rdx, cl
|   1      | 0.3         |      |             |             |      |      | 0.7  |      | shrx rdx, rdx, rcx
|   1      |             |      |             |             |      | 1.0  |      |      | test cl, 0x40
|   1      | 0.7         |      |             |             |      |      | 0.3  |      | cmovnz rax, rdx
Total Num Of Uops: 41