SEMPAR Myelin profile

Load parser from local/sempar.flow
248.84 ms loading parser
Benchmarking parser on local/sempar/dev.rec
15084 documents, 291746 tokens, 3314.9 tokens/sec

LR LSTM

flow cluster_1 lr_lstm lr_lstm/punctuation/Lookup Lookup float32[1x8] lr_lstm/concat ConcatV2 float32[1x88] lr_lstm/punctuation/Lookup->lr_lstm/concat lr_lstm/suffix/Lookup Lookup float32[1x16] lr_lstm/suffix/Lookup->lr_lstm/concat lr_lstm/quote/Lookup Lookup float32[1x8] lr_lstm/quote/Lookup->lr_lstm/concat lr_lstm/capitalization/Lookup Lookup float32[1x8] lr_lstm/capitalization/Lookup->lr_lstm/concat lr_lstm/digit/Lookup Lookup float32[1x8] lr_lstm/digit/Lookup->lr_lstm/concat lr_lstm/hyphen/Lookup Lookup float32[1x8] lr_lstm/hyphen/Lookup->lr_lstm/concat lr_lstm/words/Lookup Lookup float32[1x32] lr_lstm/words/Lookup->lr_lstm/concat lr_lstm/MatMul_3 MatMul float32[1x256] lr_lstm/concat->lr_lstm/MatMul_3 lr_lstm/MatMul MatMul float32[1x256] lr_lstm/concat->lr_lstm/MatMul lr_lstm/MatMul_5 MatMulAdd float32[1x256] lr_lstm/concat->lr_lstm/MatMul_5 lr_lstm/MatMul_4 MatMulAdd float32[1x256] lr_lstm/MatMul_3->lr_lstm/MatMul_4 lr_lstm/MatMul_1 MatMulAdd float32[1x256] lr_lstm/MatMul->lr_lstm/MatMul_1 lr_lstm/add_4 $2=Sigmoid(Add(%2,%3));@0=Add(Mul($2,Tanh(Add(%0,%1))),Mul(Sub(_1,$2),%5));@1=Tanh(@0) &float32[1x256] lr_lstm/MatMul_4->lr_lstm/add_4 lr_lstm/MatMul_2 MatMulAdd float32[1x256] lr_lstm/MatMul_1->lr_lstm/MatMul_2 lr_lstm/MatMul_2->lr_lstm/add_4 lr_lstm/add_7 @0=Mul(Sigmoid(Add(%0,%1)),%2) &float32[1x256] lr_lstm/add_4->lr_lstm/add_7 v:lr_lstm/c_out c_out &float32[1x256] lr_lstm/add_4->v:lr_lstm/c_out lr_lstm/MatMul_6 MatMul float32[1x256] lr_lstm/MatMul_6->lr_lstm/MatMul_5 lr_lstm/MatMul_7 MatMulAdd float32[1x256] lr_lstm/MatMul_5->lr_lstm/MatMul_7 lr_lstm/MatMul_7->lr_lstm/add_7 v:lr_lstm/h_out h_out &float32[1x256] lr_lstm/add_7->v:lr_lstm/h_out v:lr_lstm/punctuation punctuation int32[1x1] v:lr_lstm/punctuation->lr_lstm/punctuation/Lookup v:lr_lstm/digit digit int32[1x1] v:lr_lstm/digit->lr_lstm/digit/Lookup v:lr_lstm/c2i c2i float32[256x256] v:lr_lstm/c2i->lr_lstm/MatMul_2 v:lr_lstm/c2o c2o float32[256x256] v:lr_lstm/c2o->lr_lstm/MatMul_6 v:lr_lstm/h2c h2c 
float32[256x256] v:lr_lstm/h2c->lr_lstm/MatMul_4 v:lr_lstm/h2o h2o float32[256x256] v:lr_lstm/h2o->lr_lstm/MatMul_7 v:lr_lstm/h2i h2i float32[256x256] v:lr_lstm/h2i->lr_lstm/MatMul_1 v:lr_lstm/quote quote int32[1x1] v:lr_lstm/quote->lr_lstm/quote/Lookup v:lr_lstm/words words int32[1x1] v:lr_lstm/words->lr_lstm/words/Lookup v:lr_lstm/x2c x2c float32[88x256] v:lr_lstm/x2c->lr_lstm/MatMul_3 v:lr_lstm/x2o x2o float32[88x256] v:lr_lstm/x2o->lr_lstm/MatMul_5 v:lr_lstm/x2i x2i float32[88x256] v:lr_lstm/x2i->lr_lstm/MatMul v:lr_lstm/capitalization capitalization int32[1x1] v:lr_lstm/capitalization->lr_lstm/capitalization/Lookup v:lr_lstm/c_in c_in &float32[1x256] v:lr_lstm/c_in->lr_lstm/MatMul_2 v:lr_lstm/c_in->lr_lstm/add_4 v:lr_lstm/h_in h_in &float32[1x256] v:lr_lstm/h_in->lr_lstm/MatMul_4 v:lr_lstm/h_in->lr_lstm/MatMul_1 v:lr_lstm/h_in->lr_lstm/MatMul_7 v:lr_lstm/axis:0 axis:0 int32 1 v:lr_lstm/axis:0->lr_lstm/concat v:lr_lstm/fixed_embedding_matrix_3 fixed_embedding_matrix_3 float32[2x8] v:lr_lstm/fixed_embedding_matrix_3->lr_lstm/hyphen/Lookup v:lr_lstm/fixed_embedding_matrix_0 fixed_embedding_matrix_0 float32[53257x32] v:lr_lstm/fixed_embedding_matrix_0->lr_lstm/words/Lookup v:lr_lstm/fixed_embedding_matrix_6 fixed_embedding_matrix_6 float32[3x8] v:lr_lstm/fixed_embedding_matrix_6->lr_lstm/digit/Lookup v:lr_lstm/fixed_embedding_matrix_5 fixed_embedding_matrix_5 float32[4x8] v:lr_lstm/fixed_embedding_matrix_5->lr_lstm/quote/Lookup v:lr_lstm/fixed_embedding_matrix_4 fixed_embedding_matrix_4 float32[3x8] v:lr_lstm/fixed_embedding_matrix_4->lr_lstm/punctuation/Lookup v:lr_lstm/bc bc float32[256] v:lr_lstm/bc->lr_lstm/add_4 v:lr_lstm/hyphen hyphen int32[1x1] v:lr_lstm/hyphen->lr_lstm/hyphen/Lookup v:lr_lstm/bi bi float32[256] v:lr_lstm/bi->lr_lstm/add_4 v:lr_lstm/bo bo float32[256] v:lr_lstm/bo->lr_lstm/add_7 v:lr_lstm/fixed_embedding_matrix_2 fixed_embedding_matrix_2 float32[5x8] v:lr_lstm/fixed_embedding_matrix_2->lr_lstm/capitalization/Lookup 
v:lr_lstm/fixed_embedding_matrix_1 fixed_embedding_matrix_1 float32[8334x16] v:lr_lstm/fixed_embedding_matrix_1->lr_lstm/suffix/Lookup v:lr_lstm/suffix suffix int32[1x3] v:lr_lstm/suffix->lr_lstm/suffix/Lookup v:lr_lstm/ones:0 ones:0 float32[1x1] [[1.000000]] v:lr_lstm/ones:0->lr_lstm/add_4 v:lr_lstm/c_out->lr_lstm/MatMul_6
Profile for 291746 invocations of lr_lstm with 820528 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores
CPU features: MMX SSE SSE2 SSE3 SSE4.1 F16C AVX AVX2 FMA3

+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent |  accum% |      time    |  gflops | kernel                     | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
|   0.03% |   0.03% |     0.007 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/punctuation/Lookup
|   0.04% |   0.07% |     0.011 μs |   4.533 | DragnnLookupUnrolled       |   | lr_lstm/suffix/Lookup
|   0.21% |   0.27% |     0.053 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/quote/Lookup
|   0.03% |   0.30% |     0.007 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/capitalization/Lookup
|   0.02% |   0.32% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/digit/Lookup
|   0.02% |   0.35% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/hyphen/Lookup
|   0.03% |   0.37% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | lr_lstm/words/Lookup
|   0.45% |   0.83% |     0.116 μs |   0.000 | BasicConcat                |   | lr_lstm/concat
|   5.13% |   5.96% |     1.313 μs |  34.305 | AVXFltVecMatMulV[U8]       |   | lr_lstm/MatMul_3
|   5.14% |  11.09% |     1.314 μs |  34.279 | AVXFltVecMatMulV[U8]       |   | lr_lstm/MatMul
|  15.99% |  27.08% |     4.091 μs |  32.098 | AVXFltVecMatMulAddV[U8]    |   | lr_lstm/MatMul_4
|  16.31% |  43.39% |     4.174 μs |  31.465 | AVXFltVecMatMulAddV[U8]    |   | lr_lstm/MatMul_1
|  16.11% |  59.50% |     4.124 μs |  31.845 | AVXFltVecMatMulAddV[U8]    |   | lr_lstm/MatMul_2
|   2.08% |  61.58% |     0.531 μs |  39.015 | Calculate[VFltAVX256]      |   | lr_lstm/add_4 [$0=Sigmoid(Add(%2,%3));@0=Add(Mul($0,Tanh(Add(%0,%1))),Mul(Sub(_1,$0),%4));@1=Tanh(@0)]
|  16.09% |  77.67% |     4.119 μs |  31.824 | AVXFltVecMatMulV[U8]       |   | lr_lstm/MatMul_6
|   5.26% |  82.93% |     1.346 μs |  33.667 | AVXFltVecMatMulAddV[U8]    |   | lr_lstm/MatMul_5
|  15.85% |  98.78% |     4.056 μs |  32.379 | AVXFltVecMatMulAddV[U8]    |   | lr_lstm/MatMul_7
|   0.89% |  99.67% |     0.228 μs |  34.789 | Calculate[VFltAVX256]      |   | lr_lstm/add_7 [@0=Mul(Sigmoid(Add(%0,%1)),%2)]
|   0.33% | 100.00% |     0.085 μs |   0.000 |                            |   | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% |    25.594 μs |  32.060 | TOTAL                      |   |
+---------+---------+--------------+---------+----------------------------+---+------------------------
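
The step names in the table map onto one LSTM cell update. The sketch below is a reading of the flow graph and the two fused Calculate expressions (add_4 and add_7), not the generated kernel code: the helper names (vec_mat, lstm_step, zero_params) are hypothetical, and the assignment of %0..%5 to concrete inputs is inferred from the graph edges. Under that reading the cell couples the input and forget gates (the forget gate is 1 - i), and the output gate sees the new cell state through c2o.

```python
import math

def vec_mat(x, w):
    """Row vector x (length k) times matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def lstm_step(x, h_in, c_in, p):
    """One cell update; p holds the weights named as in the flow graph."""
    # MatMul -> MatMul_1 -> MatMul_2: input gate pre-activation.
    i_pre = add(add(vec_mat(x, p["x2i"]), vec_mat(h_in, p["h2i"])),
                vec_mat(c_in, p["c2i"]))
    # MatMul_3 -> MatMul_4: candidate cell pre-activation.
    c_pre = add(vec_mat(x, p["x2c"]), vec_mat(h_in, p["h2c"]))
    # add_4: gate, coupled update, and tanh of the new cell state.
    i = sigmoid(add(i_pre, p["bi"]))
    cand = [math.tanh(a) for a in add(c_pre, p["bc"])]
    c_out = [g * t + (1.0 - g) * c for g, t, c in zip(i, cand, c_in)]
    c_tanh = [math.tanh(a) for a in c_out]
    # MatMul_6 -> MatMul_5 -> MatMul_7: output gate pre-activation.
    o_pre = add(add(vec_mat(c_out, p["c2o"]), vec_mat(x, p["x2o"])),
                vec_mat(h_in, p["h2o"]))
    # add_7: output gate applied to tanh(c_out).
    o = sigmoid(add(o_pre, p["bo"]))
    h_out = [g * t for g, t in zip(o, c_tanh)]
    return h_out, c_out

def zero_params(n_in, n_hidden):
    """All-zero weights, only for exercising the sketch on tiny shapes."""
    def mat(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    p = {name: mat(n_in, n_hidden) for name in ("x2i", "x2c", "x2o")}
    p.update({name: mat(n_hidden, n_hidden)
              for name in ("h2i", "h2c", "h2o", "c2i", "c2o")})
    p.update({name: [0.0] * n_hidden for name in ("bi", "bc", "bo")})
    return p

h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params(3, 2))
print(h, c)
```

In the profiled model x has 88 elements (the concatenated feature embeddings) and h, c have 256, which is why the five 256x256 products dominate the table and the three 88x256 products each take about a third as long.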

RL LSTM

flow cluster_0 rl_lstm rl_lstm/hyphen/Lookup Lookup float32[1x8] rl_lstm/concat ConcatV2 float32[1x88] rl_lstm/hyphen/Lookup->rl_lstm/concat rl_lstm/capitalization/Lookup Lookup float32[1x8] rl_lstm/capitalization/Lookup->rl_lstm/concat rl_lstm/quote/Lookup Lookup float32[1x8] rl_lstm/quote/Lookup->rl_lstm/concat rl_lstm/words/Lookup Lookup float32[1x32] rl_lstm/words/Lookup->rl_lstm/concat rl_lstm/digit/Lookup Lookup float32[1x8] rl_lstm/digit/Lookup->rl_lstm/concat rl_lstm/punctuation/Lookup Lookup float32[1x8] rl_lstm/punctuation/Lookup->rl_lstm/concat rl_lstm/suffix/Lookup Lookup float32[1x16] rl_lstm/suffix/Lookup->rl_lstm/concat rl_lstm/MatMul MatMul float32[1x256] rl_lstm/concat->rl_lstm/MatMul rl_lstm/MatMul_3 MatMul float32[1x256] rl_lstm/concat->rl_lstm/MatMul_3 rl_lstm/MatMul_5 MatMulAdd float32[1x256] rl_lstm/concat->rl_lstm/MatMul_5 rl_lstm/MatMul_1 MatMulAdd float32[1x256] rl_lstm/MatMul->rl_lstm/MatMul_1 rl_lstm/MatMul_4 MatMulAdd float32[1x256] rl_lstm/MatMul_3->rl_lstm/MatMul_4 rl_lstm/MatMul_2 MatMulAdd float32[1x256] rl_lstm/MatMul_1->rl_lstm/MatMul_2 rl_lstm/add_4 $2=Sigmoid(Add(%2,%3));@0=Add(Mul($2,Tanh(Add(%0,%1))),Mul(Sub(_1,$2),%5));@1=Tanh(@0) &float32[1x256] rl_lstm/MatMul_4->rl_lstm/add_4 rl_lstm/MatMul_2->rl_lstm/add_4 rl_lstm/add_7 @0=Mul(Sigmoid(Add(%0,%1)),%2) &float32[1x256] rl_lstm/add_4->rl_lstm/add_7 v:rl_lstm/c_out c_out &float32[1x256] rl_lstm/add_4->v:rl_lstm/c_out rl_lstm/MatMul_6 MatMul float32[1x256] rl_lstm/MatMul_6->rl_lstm/MatMul_5 rl_lstm/MatMul_7 MatMulAdd float32[1x256] rl_lstm/MatMul_5->rl_lstm/MatMul_7 rl_lstm/MatMul_7->rl_lstm/add_7 v:rl_lstm/h_out h_out &float32[1x256] rl_lstm/add_7->v:rl_lstm/h_out v:rl_lstm/words words int32[1x1] v:rl_lstm/words->rl_lstm/words/Lookup v:rl_lstm/punctuation punctuation int32[1x1] v:rl_lstm/punctuation->rl_lstm/punctuation/Lookup v:rl_lstm/suffix suffix int32[1x3] v:rl_lstm/suffix->rl_lstm/suffix/Lookup v:rl_lstm/ones:0 ones:0 float32[1x1] [[1.000000]] 
v:rl_lstm/ones:0->rl_lstm/add_4 v:rl_lstm/bo bo float32[256] v:rl_lstm/bo->rl_lstm/add_7 v:rl_lstm/bi bi float32[256] v:rl_lstm/bi->rl_lstm/add_4 v:rl_lstm/bc bc float32[256] v:rl_lstm/bc->rl_lstm/add_4 v:rl_lstm/c2o c2o float32[256x256] v:rl_lstm/c2o->rl_lstm/MatMul_6 v:rl_lstm/c2i c2i float32[256x256] v:rl_lstm/c2i->rl_lstm/MatMul_2 v:rl_lstm/quote quote int32[1x1] v:rl_lstm/quote->rl_lstm/quote/Lookup v:rl_lstm/axis:0 axis:0 int32 1 v:rl_lstm/axis:0->rl_lstm/concat v:rl_lstm/x2i x2i float32[88x256] v:rl_lstm/x2i->rl_lstm/MatMul v:rl_lstm/x2c x2c float32[88x256] v:rl_lstm/x2c->rl_lstm/MatMul_3 v:rl_lstm/h2i h2i float32[256x256] v:rl_lstm/h2i->rl_lstm/MatMul_1 v:rl_lstm/h2o h2o float32[256x256] v:rl_lstm/h2o->rl_lstm/MatMul_7 v:rl_lstm/h2c h2c float32[256x256] v:rl_lstm/h2c->rl_lstm/MatMul_4 v:rl_lstm/capitalization capitalization int32[1x1] v:rl_lstm/capitalization->rl_lstm/capitalization/Lookup v:rl_lstm/x2o x2o float32[88x256] v:rl_lstm/x2o->rl_lstm/MatMul_5 v:rl_lstm/h_in h_in &float32[1x256] v:rl_lstm/h_in->rl_lstm/MatMul_1 v:rl_lstm/h_in->rl_lstm/MatMul_4 v:rl_lstm/h_in->rl_lstm/MatMul_7 v:rl_lstm/c_in c_in &float32[1x256] v:rl_lstm/c_in->rl_lstm/MatMul_2 v:rl_lstm/c_in->rl_lstm/add_4 v:rl_lstm/digit digit int32[1x1] v:rl_lstm/digit->rl_lstm/digit/Lookup v:rl_lstm/hyphen hyphen int32[1x1] v:rl_lstm/hyphen->rl_lstm/hyphen/Lookup v:rl_lstm/fixed_embedding_matrix_6 fixed_embedding_matrix_6 float32[3x8] v:rl_lstm/fixed_embedding_matrix_6->rl_lstm/digit/Lookup v:rl_lstm/fixed_embedding_matrix_5 fixed_embedding_matrix_5 float32[4x8] v:rl_lstm/fixed_embedding_matrix_5->rl_lstm/quote/Lookup v:rl_lstm/fixed_embedding_matrix_4 fixed_embedding_matrix_4 float32[3x8] v:rl_lstm/fixed_embedding_matrix_4->rl_lstm/punctuation/Lookup v:rl_lstm/fixed_embedding_matrix_3 fixed_embedding_matrix_3 float32[2x8] v:rl_lstm/fixed_embedding_matrix_3->rl_lstm/hyphen/Lookup v:rl_lstm/fixed_embedding_matrix_2 fixed_embedding_matrix_2 float32[5x8] 
v:rl_lstm/fixed_embedding_matrix_2->rl_lstm/capitalization/Lookup v:rl_lstm/fixed_embedding_matrix_1 fixed_embedding_matrix_1 float32[8334x16] v:rl_lstm/fixed_embedding_matrix_1->rl_lstm/suffix/Lookup v:rl_lstm/fixed_embedding_matrix_0 fixed_embedding_matrix_0 float32[53257x32] v:rl_lstm/fixed_embedding_matrix_0->rl_lstm/words/Lookup v:rl_lstm/c_out->rl_lstm/MatMul_6
Profile for 291746 invocations of rl_lstm with 820528 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores

+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent |  accum% |      time    |  gflops | kernel                     | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
|   0.03% |   0.03% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/hyphen/Lookup
|   0.02% |   0.05% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/capitalization/Lookup
|   0.02% |   0.07% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/quote/Lookup
|   0.02% |   0.10% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/words/Lookup
|   0.02% |   0.12% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/digit/Lookup
|   0.02% |   0.15% |     0.006 μs |   0.000 | DragnnLookupSingle         |   | rl_lstm/punctuation/Lookup
|   0.05% |   0.20% |     0.014 μs |   3.508 | DragnnLookupUnrolled       |   | rl_lstm/suffix/Lookup
|   0.52% |   0.72% |     0.133 μs |   0.000 | BasicConcat                |   | rl_lstm/concat
|   5.32% |   6.05% |     1.356 μs |  33.223 | AVXFltVecMatMulV[U8]       |   | rl_lstm/MatMul
|   5.09% |  11.14% |     1.298 μs |  34.719 | AVXFltVecMatMulV[U8]       |   | rl_lstm/MatMul_3
|  15.96% |  27.10% |     4.066 μs |  32.299 | AVXFltVecMatMulAddV[U8]    |   | rl_lstm/MatMul_1
|  16.25% |  43.34% |     4.139 μs |  31.731 | AVXFltVecMatMulAddV[U8]    |   | rl_lstm/MatMul_4
|  16.34% |  59.69% |     4.164 μs |  31.537 | AVXFltVecMatMulAddV[U8]    |   | rl_lstm/MatMul_2
|   2.10% |  61.78% |     0.534 μs |  38.826 | Calculate[VFltAVX256]      |   | rl_lstm/add_4 [$0=Sigmoid(Add(%2,%3));@0=Add(Mul($0,Tanh(Add(%0,%1))),Mul(Sub(_1,$0),%4));@1=Tanh(@0)]
|  16.10% |  77.89% |     4.103 μs |  31.945 | AVXFltVecMatMulV[U8]       |   | rl_lstm/MatMul_6
|   5.28% |  83.17% |     1.345 μs |  33.694 | AVXFltVecMatMulAddV[U8]    |   | rl_lstm/MatMul_5
|  15.80% |  98.96% |     4.024 μs |  32.632 | AVXFltVecMatMulAddV[U8]    |   | rl_lstm/MatMul_7
|   0.86% |  99.83% |     0.220 μs |  36.017 | Calculate[VFltAVX256]      |   | rl_lstm/add_7 [@0=Mul(Sigmoid(Add(%0,%1)),%2)]
|   0.17% | 100.00% |     0.044 μs |   0.000 |                            |   | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% |    25.478 μs |  32.206 | TOTAL                      |   |
+---------+---------+--------------+---------+----------------------------+---+------------------------

FF

flow cluster_2 ff ff/rl/Collect Collect float32[1x257] ff/rl/MatMul MatMul float32[1x32] ff/rl/Collect->ff/rl/MatMul ff/frame-end-lr/Collect Collect float32[5x257] ff/frame-end-lr/MatMul MatMul float32[5x32] ff/frame-end-lr/Collect->ff/frame-end-lr/MatMul ff/frame-end-rl/Collect Collect float32[5x257] ff/frame-end-rl/MatMul MatMul float32[5x32] ff/frame-end-rl/Collect->ff/frame-end-rl/MatMul ff/in-roles/Lookup Lookup float32[1x16] ff/concat ConcatV2 float32[1x1344] ff/in-roles/Lookup->ff/concat ff/unlabeled-roles/Lookup Lookup float32[1x16] ff/unlabeled-roles/Lookup->ff/concat ff/labeled-roles/Lookup Lookup float32[1x16] ff/labeled-roles/Lookup->ff/concat ff/out-roles/Lookup Lookup float32[1x16] ff/out-roles/Lookup->ff/concat ff/frame-focus-steps/Collect Collect float32[5x129] ff/frame-focus-steps/MatMul MatMul float32[5x64] ff/frame-focus-steps/Collect->ff/frame-focus-steps/MatMul ff/frame-creation-steps/Collect Collect float32[5x129] ff/frame-creation-steps/MatMul MatMul float32[5x64] ff/frame-creation-steps/Collect->ff/frame-creation-steps/MatMul ff/lr/Collect Collect float32[1x257] ff/lr/MatMul MatMul float32[1x32] ff/lr/Collect->ff/lr/MatMul ff/history/Collect Collect float32[4x129] ff/history/MatMul MatMul float32[4x64] ff/history/Collect->ff/history/MatMul ff/rl/MatMul->ff/concat ff/frame-end-lr/Reshape Reshape float32[1x160] ff/frame-end-lr/MatMul->ff/frame-end-lr/Reshape ff/frame-end-rl/Reshape Reshape float32[1x160] ff/frame-end-rl/MatMul->ff/frame-end-rl/Reshape ff/frame-focus-steps/Reshape Reshape float32[1x320] ff/frame-focus-steps/MatMul->ff/frame-focus-steps/Reshape ff/frame-creation-steps/Reshape Reshape float32[1x320] ff/frame-creation-steps/MatMul->ff/frame-creation-steps/Reshape ff/lr/MatMul->ff/concat ff/history/Reshape Reshape float32[1x256] ff/history/MatMul->ff/history/Reshape ff/frame-end-lr/Reshape->ff/concat ff/frame-end-rl/Reshape->ff/concat ff/frame-focus-steps/Reshape->ff/concat ff/frame-creation-steps/Reshape->ff/concat 
ff/history/Reshape->ff/concat ff/MatMul MatMulAddRelu &float32[1x128] ff/concat->ff/MatMul v:ff/hidden hidden &float32[1x128] ff/MatMul->v:ff/hidden ff/MatMul_1 MatMulAdd float32[1x6968] v:ff/add:0 add:0 float32[1x6968] ff/MatMul_1->v:ff/add:0 v:ff/link/lr_lstm lr_lstm &float32[?x256] v:ff/link/lr_lstm->ff/frame-end-lr/Collect v:ff/link/lr_lstm->ff/lr/Collect v:ff/frame-end-rl frame-end-rl int32[1x5] v:ff/frame-end-rl->ff/frame-end-rl/Collect v:ff/bias_0 bias_0 float32[128] v:ff/bias_0->ff/MatMul v:ff/rl rl int32[1x1] v:ff/rl->ff/rl/Collect v:ff/history history int32[1x4] v:ff/history->ff/history/Collect v:ff/unlabeled-roles unlabeled-roles int32[1x32] v:ff/unlabeled-roles->ff/unlabeled-roles/Lookup v:ff/frame-end-lr frame-end-lr int32[1x5] v:ff/frame-end-lr->ff/frame-end-lr/Collect v:ff/link/rl_lstm rl_lstm &float32[?x256] v:ff/link/rl_lstm->ff/rl/Collect v:ff/link/rl_lstm->ff/frame-end-rl/Collect v:ff/frame-creation-steps/shape:0 shape:0 int32[2] [1,320] v:ff/frame-creation-steps/shape:0->ff/frame-creation-steps/Reshape v:ff/in-roles in-roles int32[1x32] v:ff/in-roles->ff/in-roles/Lookup v:ff/axis:0 axis:0 int32 1 v:ff/axis:0->ff/concat v:ff/steps steps &float32[?x128] v:ff/steps->ff/frame-focus-steps/Collect v:ff/steps->ff/frame-creation-steps/Collect v:ff/steps->ff/history/Collect v:ff/weights_softmax weights_softmax float32[128x6968] v:ff/weights_softmax->ff/MatMul_1 v:ff/linked_embedding_matrix_2 linked_embedding_matrix_2 float32[257x32] v:ff/linked_embedding_matrix_2->ff/frame-end-lr/MatMul v:ff/labeled-roles labeled-roles int32[1x32] v:ff/labeled-roles->ff/labeled-roles/Lookup v:ff/lr lr int32[1x1] v:ff/lr->ff/lr/Collect v:ff/frame-end-rl/shape:0 shape:0 int32[2] [1,160] v:ff/frame-end-rl/shape:0->ff/frame-end-rl/Reshape v:ff/bias_softmax bias_softmax float32[6968] v:ff/bias_softmax->ff/MatMul_1 v:ff/linked_embedding_matrix_6 linked_embedding_matrix_6 float32[257x32] v:ff/linked_embedding_matrix_6->ff/rl/MatMul v:ff/linked_embedding_matrix_5 
linked_embedding_matrix_5 float32[257x32] v:ff/linked_embedding_matrix_5->ff/lr/MatMul v:ff/linked_embedding_matrix_4 linked_embedding_matrix_4 float32[129x64] v:ff/linked_embedding_matrix_4->ff/history/MatMul v:ff/linked_embedding_matrix_3 linked_embedding_matrix_3 float32[257x32] v:ff/linked_embedding_matrix_3->ff/frame-end-rl/MatMul v:ff/linked_embedding_matrix_1 linked_embedding_matrix_1 float32[129x64] v:ff/linked_embedding_matrix_1->ff/frame-focus-steps/MatMul v:ff/linked_embedding_matrix_0 linked_embedding_matrix_0 float32[129x64] v:ff/linked_embedding_matrix_0->ff/frame-creation-steps/MatMul v:ff/frame-creation-steps frame-creation-steps int32[1x5] v:ff/frame-creation-steps->ff/frame-creation-steps/Collect v:ff/frame-focus-steps/shape:0 shape:0 int32[2] [1,320] v:ff/frame-focus-steps/shape:0->ff/frame-focus-steps/Reshape v:ff/history/shape:0 shape:0 int32[2] [1,256] v:ff/history/shape:0->ff/history/Reshape v:ff/weights_0 weights_0 float32[1344x128] v:ff/weights_0->ff/MatMul v:ff/frame-focus-steps frame-focus-steps int32[1x5] v:ff/frame-focus-steps->ff/frame-focus-steps/Collect v:ff/out-roles out-roles int32[1x32] v:ff/out-roles->ff/out-roles/Lookup v:ff/frame-end-lr/shape:0 shape:0 int32[2] [1,160] v:ff/frame-end-lr/shape:0->ff/frame-end-lr/Reshape v:ff/fixed_embedding_matrix_1 fixed_embedding_matrix_1 float32[125x16] v:ff/fixed_embedding_matrix_1->ff/out-roles/Lookup v:ff/fixed_embedding_matrix_0 fixed_embedding_matrix_0 float32[125x16] v:ff/fixed_embedding_matrix_0->ff/in-roles/Lookup v:ff/fixed_embedding_matrix_3 fixed_embedding_matrix_3 float32[25x16] v:ff/fixed_embedding_matrix_3->ff/unlabeled-roles/Lookup v:ff/fixed_embedding_matrix_2 fixed_embedding_matrix_2 float32[625x16] v:ff/fixed_embedding_matrix_2->ff/labeled-roles/Lookup v:ff/hidden->ff/MatMul_1
Profile for 545030 invocations of ff with 2565688 operations
CPU model: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.50GHz
CPU architecture: Skylake (family 06 model 5e stepping 03), 4 cores
CPU features: MMX SSE SSE2 SSE3 SSE4.1 F16C AVX AVX2 FMA3

+---------+---------+--------------+---------+----------------------------+---+------------------------
| percent |  accum% |      time    |  gflops | kernel                     | t | step
+---------+---------+--------------+---------+----------------------------+---+------------------------
|   0.02% |   0.02% |     0.020 μs |   0.000 | DragnnCollect              |   | ff/rl/Collect
|   0.12% |   0.13% |     0.151 μs |   0.000 | DragnnCollect              |   | ff/frame-end-lr/Collect
|   0.12% |   0.25% |     0.147 μs |   0.000 | DragnnCollect              |   | ff/frame-end-rl/Collect
|   0.03% |   0.28% |     0.043 μs |  11.850 | DragnnLookupUnrolled       |   | ff/in-roles/Lookup
|   0.03% |   0.32% |     0.041 μs |  12.376 | DragnnLookupUnrolled       |   | ff/unlabeled-roles/Lookup
|   0.04% |   0.35% |     0.046 μs |  11.148 | DragnnLookupUnrolled       |   | ff/labeled-roles/Lookup
|   0.03% |   0.38% |     0.042 μs |  12.172 | DragnnLookupUnrolled       |   | ff/out-roles/Lookup
|   0.06% |   0.45% |     0.083 μs |   0.000 | DragnnCollect              |   | ff/frame-focus-steps/Collect
|   0.06% |   0.51% |     0.071 μs |   0.000 | DragnnCollect              |   | ff/frame-creation-steps/Collect
|   0.02% |   0.53% |     0.025 μs |   0.000 | DragnnCollect              |   | ff/lr/Collect
|   0.06% |   0.59% |     0.083 μs |   0.000 | DragnnCollect              |   | ff/history/Collect
|   0.44% |   1.03% |     0.564 μs |  29.144 | AVXFltVecMatMulV[U4]       |   | ff/rl/MatMul
|   1.80% |   2.83% |     2.298 μs |  35.794 | AVXFltMatMatMul            |   | ff/frame-end-lr/MatMul
|   1.79% |   4.63% |     2.289 μs |  35.927 | AVXFltMatMatMul            |   | ff/frame-end-rl/MatMul
|   2.56% |   7.19% |     3.266 μs |  25.282 | AVXFltMatMatMul            |   | ff/frame-focus-steps/MatMul
|   2.57% |   9.75% |     3.278 μs |  25.184 | AVXFltMatMatMul            |   | ff/frame-creation-steps/MatMul
|   0.42% |  10.18% |     0.541 μs |  30.400 | AVXFltVecMatMulV[U4]       |   | ff/lr/MatMul
|   2.08% |  12.26% |     2.655 μs |  24.874 | AVXFltMatMatMul            |   | ff/history/MatMul
|   0.12% |  12.38% |     0.156 μs |   0.000 | BasicConcat                |   | ff/concat
|   9.65% |  22.03% |    12.310 μs |  27.972 | AVXFltVecMatMulAddReluV[U8]|   | ff/MatMul
|  77.95% |  99.98% |    99.464 μs |  18.004 | AVXFltVecMatMulAddV[U1]    |   | ff/MatMul_1
|   0.02% | 100.00% |     0.027 μs |   0.000 |                            |   | Entry & Exit
+---------+---------+--------------+---------+----------------------------+---+------------------------
| 100.00% | 100.00% |   127.601 μs |  20.107 | TOTAL                      |   |
+---------+---------+--------------+---------+----------------------------+---+------------------------
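
The gflops column can be cross-checked from the tensor shapes in the flow graph and the time column. The sketch below assumes a flop count of 2*rows*k*n for a [rows x k] by [k x n] product plus rows*n for each fused element-wise op such as the bias add. That counting rule is an assumption that happens to reproduce the reported figures, not something the profile states, and matmul_gflops is a hypothetical helper.

```python
def matmul_gflops(rows, k, n, time_us, fused_ops=0):
    """GFLOPS for a [rows x k] by [k x n] product timed at time_us microseconds.

    Assumed flop count: 2*rows*k*n, plus rows*n per fused element-wise op.
    """
    flops = 2 * rows * k * n + fused_ops * rows * n
    return flops / (time_us * 1e3)  # flops per nanosecond equals GFLOPS

# (step, rows, k, n, time in us, fused ops, gflops reported in the table)
ROWS = [
    ("ff/lr/MatMul", 1, 257, 32, 0.541, 0, 30.400),
    ("ff/frame-end-lr/MatMul", 5, 257, 32, 2.298, 0, 35.794),
    ("ff/MatMul_1", 1, 128, 6968, 99.464, 1, 18.004),
]

results = {}
for step, rows, k, n, time_us, fused_ops, reported in ROWS:
    results[step] = matmul_gflops(rows, k, n, time_us, fused_ops)
    print(f"{step:26s} computed {results[step]:7.3f}  reported {reported:7.3f}")
```

Small differences are expected because the time column is rounded to three decimals. The check also makes the bottleneck concrete: ff/MatMul_1 multiplies the 128-wide hidden layer by the 128x6968 softmax weights, runs on the U1 kernel variant, and reaches the lowest gflops of any matrix product in the three tables.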

Generated code

lr_lstm entry address: 0x7fda0c052000 code size: 4352 data size: 8736
rl_lstm entry address: 0x7fda0c050000 code size: 4352 data size: 8736
ff entry address: 0x7fda0c04f000 code size: 2761 data size: 59584
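
The three profiles can be combined into a rough time budget for the whole benchmark. The sketch below assumes that the tokens/sec figure is wall-clock throughput over the full run and that each TOTAL row is the mean time per invocation of that cell; neither is stated in the output, and the variable names are illustrative.

```python
TOKENS = 291746
TOKENS_PER_SEC = 3314.9

# cell name -> (invocations, mean microseconds per invocation from the TOTAL row)
CELLS = {
    "lr_lstm": (291746, 25.594),
    "rl_lstm": (291746, 25.478),
    "ff": (545030, 127.601),
}

wall_s = TOKENS / TOKENS_PER_SEC
cell_s = {name: n * us * 1e-6 for name, (n, us) in CELLS.items()}
total_s = sum(cell_s.values())
ff_steps_per_token = CELLS["ff"][0] / TOKENS

print(f"wall clock        {wall_s:8.2f} s")
for name, seconds in cell_s.items():
    print(f"{name:17s} {seconds:8.2f} s  ({100 * seconds / wall_s:5.1f}% of wall)")
print(f"cells combined    {total_s:8.2f} s  ({100 * total_s / wall_s:5.1f}% of wall)")
print(f"ff steps/token    {ff_steps_per_token:8.3f}")
```

Under these assumptions nearly all of the run is spent inside the three cells, with ff accounting for most of it, so the softmax product in ff/MatMul_1 is the natural first target for optimization.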