SEMPAR Myelin profile
Load parser from local/sempar.flow
248.84 ms loading parser
Benchmarking parser on local/sempar/dev.rec
15084 documents, 291746 tokens, 3314.9 tokens/sec
LR LSTM
Profile for 291746 invocations of lr_lstm with 820528 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores
CPU features: MMX SSE SSE2 SSE3 SSE4.1 F16C AVX AVX2 FMA3
+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent | accum% | time | gflops | kernel | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 0.03% | 0.03% | 0.007 μs | 0.000 | DragnnLookupSingle | | lr_lstm/punctuation/Lookup
| 0.04% | 0.07% | 0.011 μs | 4.533 | DragnnLookupUnrolled | | lr_lstm/suffix/Lookup
| 0.21% | 0.27% | 0.053 μs | 0.000 | DragnnLookupSingle | | lr_lstm/quote/Lookup
| 0.03% | 0.30% | 0.007 μs | 0.000 | DragnnLookupSingle | | lr_lstm/capitalization/Lookup
| 0.02% | 0.32% | 0.006 μs | 0.000 | DragnnLookupSingle | | lr_lstm/digit/Lookup
| 0.02% | 0.35% | 0.006 μs | 0.000 | DragnnLookupSingle | | lr_lstm/hyphen/Lookup
| 0.03% | 0.37% | 0.006 μs | 0.000 | DragnnLookupSingle | | lr_lstm/words/Lookup
| 0.45% | 0.83% | 0.116 μs | 0.000 | BasicConcat | | lr_lstm/concat
| 5.13% | 5.96% | 1.313 μs | 34.305 | AVXFltVecMatMulV[U8] | | lr_lstm/MatMul_3
| 5.14% | 11.09% | 1.314 μs | 34.279 | AVXFltVecMatMulV[U8] | | lr_lstm/MatMul
| 15.99% | 27.08% | 4.091 μs | 32.098 | AVXFltVecMatMulAddV[U8] | | lr_lstm/MatMul_4
| 16.31% | 43.39% | 4.174 μs | 31.465 | AVXFltVecMatMulAddV[U8] | | lr_lstm/MatMul_1
| 16.11% | 59.50% | 4.124 μs | 31.845 | AVXFltVecMatMulAddV[U8] | | lr_lstm/MatMul_2
| 2.08% | 61.58% | 0.531 μs | 39.015 | Calculate[VFltAVX256] | | lr_lstm/add_4 [$0=Sigmoid(Add(%2,%3));@0=Add(Mul($0,Tanh(Add(%0,%1))),Mul(Sub(_1,$0),%4));@1=Tanh(@0)]
| 16.09% | 77.67% | 4.119 μs | 31.824 | AVXFltVecMatMulV[U8] | | lr_lstm/MatMul_6
| 5.26% | 82.93% | 1.346 μs | 33.667 | AVXFltVecMatMulAddV[U8] | | lr_lstm/MatMul_5
| 15.85% | 98.78% | 4.056 μs | 32.379 | AVXFltVecMatMulAddV[U8] | | lr_lstm/MatMul_7
| 0.89% | 99.67% | 0.228 μs | 34.789 | Calculate[VFltAVX256] | | lr_lstm/add_7 [@0=Mul(Sigmoid(Add(%0,%1)),%2)]
| 0.33% | 100.00% | 0.085 μs | 0.000 | | | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% | 25.594 μs | 32.060 | TOTAL | |
+---------+---------+--------------+---------+----------------------------+---+------------------------
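As a reading aid for the table above, the following Python sketch parses one profile row and recomputes two derived quantities. It rests on two assumptions that the output itself does not state: that the percent column is the step time divided by the TOTAL time, and that the gflops column is floating-point operations per invocation divided by the step time. The helper names parse_row, share_of_total and implied_flops are hypothetical and are not part of Myelin.

```python
def parse_row(line):
    """Split one '|'-delimited profile row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {
        "percent": float(cells[0].rstrip("%")),
        "accum": float(cells[1].rstrip("%")),
        "time_us": float(cells[2].split()[0]),  # drop the unit suffix
        "gflops": float(cells[3]),
        "kernel": cells[4],
        "step": cells[-1],
    }


def share_of_total(row, total_us):
    """Step time as a percentage of the TOTAL time (assumed meaning of 'percent')."""
    return 100.0 * row["time_us"] / total_us


def implied_flops(row):
    """Operations per invocation implied by time and gflops (assumed meaning of 'gflops')."""
    return row["time_us"] * 1e-6 * row["gflops"] * 1e9


# Row copied from the LR LSTM table; 25.594 is that table's TOTAL time in microseconds.
ROW = "| 5.13% | 5.96% | 1.313 μs | 34.305 | AVXFltVecMatMulV[U8] | | lr_lstm/MatMul_3"
row = parse_row(ROW)
print(row["step"], round(share_of_total(row, 25.594), 2), round(implied_flops(row)))
```

If the two assumptions hold, the recomputed share should agree with the printed percent column to within rounding.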
RL LSTM
Profile for 291746 invocations of rl_lstm with 820528 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores
+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent | accum% | time | gflops | kernel | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 0.03% | 0.03% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/hyphen/Lookup
| 0.02% | 0.05% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/capitalization/Lookup
| 0.02% | 0.07% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/quote/Lookup
| 0.02% | 0.10% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/words/Lookup
| 0.02% | 0.12% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/digit/Lookup
| 0.02% | 0.15% | 0.006 μs | 0.000 | DragnnLookupSingle | | rl_lstm/punctuation/Lookup
| 0.05% | 0.20% | 0.014 μs | 3.508 | DragnnLookupUnrolled | | rl_lstm/suffix/Lookup
| 0.52% | 0.72% | 0.133 μs | 0.000 | BasicConcat | | rl_lstm/concat
| 5.32% | 6.05% | 1.356 μs | 33.223 | AVXFltVecMatMulV[U8] | | rl_lstm/MatMul
| 5.09% | 11.14% | 1.298 μs | 34.719 | AVXFltVecMatMulV[U8] | | rl_lstm/MatMul_3
| 15.96% | 27.10% | 4.066 μs | 32.299 | AVXFltVecMatMulAddV[U8] | | rl_lstm/MatMul_1
| 16.25% | 43.34% | 4.139 μs | 31.731 | AVXFltVecMatMulAddV[U8] | | rl_lstm/MatMul_4
| 16.34% | 59.69% | 4.164 μs | 31.537 | AVXFltVecMatMulAddV[U8] | | rl_lstm/MatMul_2
| 2.10% | 61.78% | 0.534 μs | 38.826 | Calculate[VFltAVX256] | | rl_lstm/add_4 [$0=Sigmoid(Add(%2,%3));@0=Add(Mul($0,Tanh(Add(%0,%1))),Mul(Sub(_1,$0),%4));@1=Tanh(@0)]
| 16.10% | 77.89% | 4.103 μs | 31.945 | AVXFltVecMatMulV[U8] | | rl_lstm/MatMul_6
| 5.28% | 83.17% | 1.345 μs | 33.694 | AVXFltVecMatMulAddV[U8] | | rl_lstm/MatMul_5
| 15.80% | 98.96% | 4.024 μs | 32.632 | AVXFltVecMatMulAddV[U8] | | rl_lstm/MatMul_7
| 0.86% | 99.83% | 0.220 μs | 36.017 | Calculate[VFltAVX256] | | rl_lstm/add_7 [@0=Mul(Sigmoid(Add(%0,%1)),%2)]
| 0.17% | 100.00% | 0.044 μs | 0.000 | | | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% | 25.478 μs | 32.206 | TOTAL | |
+---------+---------+--------------+---------+----------------------------+---+------------------------
FF
Profile for 545030 invocations of ff with 2565688 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores
CPU features: MMX SSE SSE2 SSE3 SSE4.1 F16C AVX AVX2 FMA3
+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent | accum% | time | gflops | kernel | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 0.02% | 0.02% | 0.020 μs | 0.000 | DragnnCollect | | ff/rl/Collect
| 0.12% | 0.13% | 0.151 μs | 0.000 | DragnnCollect | | ff/frame-end-lr/Collect
| 0.12% | 0.25% | 0.147 μs | 0.000 | DragnnCollect | | ff/frame-end-rl/Collect
| 0.03% | 0.28% | 0.043 μs | 11.850 | DragnnLookupUnrolled | | ff/in-roles/Lookup
| 0.03% | 0.32% | 0.041 μs | 12.376 | DragnnLookupUnrolled | | ff/unlabeled-roles/Lookup
| 0.04% | 0.35% | 0.046 μs | 11.148 | DragnnLookupUnrolled | | ff/labeled-roles/Lookup
| 0.03% | 0.38% | 0.042 μs | 12.172 | DragnnLookupUnrolled | | ff/out-roles/Lookup
| 0.06% | 0.45% | 0.083 μs | 0.000 | DragnnCollect | | ff/frame-focus-steps/Collect
| 0.06% | 0.51% | 0.071 μs | 0.000 | DragnnCollect | | ff/frame-creation-steps/Collect
| 0.02% | 0.53% | 0.025 μs | 0.000 | DragnnCollect | | ff/lr/Collect
| 0.06% | 0.59% | 0.083 μs | 0.000 | DragnnCollect | | ff/history/Collect
| 0.44% | 1.03% | 0.564 μs | 29.144 | AVXFltVecMatMulV[U4] | | ff/rl/MatMul
| 1.80% | 2.83% | 2.298 μs | 35.794 | AVXFltMatMatMul | | ff/frame-end-lr/MatMul
| 1.79% | 4.63% | 2.289 μs | 35.927 | AVXFltMatMatMul | | ff/frame-end-rl/MatMul
| 2.56% | 7.19% | 3.266 μs | 25.282 | AVXFltMatMatMul | | ff/frame-focus-steps/MatMul
| 2.57% | 9.75% | 3.278 μs | 25.184 | AVXFltMatMatMul | | ff/frame-creation-steps/MatMul
| 0.42% | 10.18% | 0.541 μs | 30.400 | AVXFltVecMatMulV[U4] | | ff/lr/MatMul
| 2.08% | 12.26% | 2.655 μs | 24.874 | AVXFltMatMatMul | | ff/history/MatMul
| 0.12% | 12.38% | 0.156 μs | 0.000 | BasicConcat | | ff/concat
| 9.65% | 22.03% | 12.310 μs | 27.972 | AVXFltVecMatMulAddReluV[U8] | | ff/MatMul
| 77.95% | 99.98% | 99.464 μs | 18.004 | AVXFltVecMatMulAddV[U1] | | ff/MatMul_1
| 0.02% | 100.00% | 0.027 μs | 0.000 | | | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% | 127.601 μs | 20.107 | TOTAL | |
+---------+---------+--------------+---------+----------------------------+---+------------------------
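The three TOTAL rows can be cross-checked against the throughput reported at the top of the run. The Python sketch below multiplies each cell's invocation count by its TOTAL time per invocation and compares the sum with the wall time implied by 291746 tokens at 3314.9 tokens/sec. It assumes that the time column is the average time per invocation and that the cells run sequentially on one thread; neither is stated in the output. The names PROFILE and compute_seconds are hypothetical.

```python
# (invocations, TOTAL time per invocation in microseconds), copied from the three tables.
PROFILE = {
    "lr_lstm": (291746, 25.594),
    "rl_lstm": (291746, 25.478),
    "ff": (545030, 127.601),
}
TOKENS = 291746
REPORTED_TOKENS_PER_SEC = 3314.9


def compute_seconds(profile):
    """Time spent inside each profiled cell, in seconds."""
    return {name: n * t_us * 1e-6 for name, (n, t_us) in profile.items()}


per_cell = compute_seconds(PROFILE)
in_cells = sum(per_cell.values())
wall = TOKENS / REPORTED_TOKENS_PER_SEC  # wall time implied by the reported rate

print(f"time in cells:   {in_cells:.2f} s")
print(f"implied wall:    {wall:.2f} s")
print(f"ff share:        {per_cell['ff'] / in_cells:.1%}")
print(f"cells-only rate: {TOKENS / in_cells:.1f} tokens/sec")
```

Under these assumptions the profiled cells account for most of the wall time, and ff (mainly ff/MatMul_1) accounts for most of the time spent in the cells. The remainder would be time spent outside the profiled cells.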
Raw flow
Generated code
lr_lstm entry address: 0x7fda0c052000 code size: 4352 data size: 8736
rl_lstm entry address: 0x7fda0c050000 code size: 4352 data size: 8736
ff entry address: 0x7fda0c04f000 code size: 2761 data size: 59584