To give an example: for convolution: input = [1, 97, 128, 1], output is [1, 97, 128, 32], (NHWC),
It generate a single block.
But I was reading the “block based operation” of manual.
• OFM_BLOCK_WIDTH must be in the range 1-64 and a multiple of theMIN_BLOCK_WIDTH. •
OFM_BLOCK_HEIGHT must be in the range 1-32 and a multiple of theMIN_BLOCK_HEIGHT. •
OFM_BLOCK_DEPTH must be in the range1-128 and a multiple of MIN_BLOCK_DEPTH.
Here neither outputH or outputW are in the range…
print_high_level_command_stream() <unnamed>_split_1
0 <NPUStripe: ps=conv, ifm_box=<Box [0, 0, 0, 0] - [1, 97, 128, 1]>, ifm2_box=<Box [] - []>, ofm_box=<Box [0, 0, 0, 0] - [1, 97, 128, 32]>, weight_box=<Box [0, 0, 0, 0] - [1, 1, 1, 32]>, block_config=[8, 16, 8, 16]>
0 Conv2D, <NPUStripe: ps=conv, ifm_box=<Box [0, 0, 0, 0] - [1, 97, 128, 1]>, ifm2_box=<Box [] - []>, ofm_box=<Box [0, 0, 0, 0] - [1, 97, 128, 32]>, weight_box=<Box [0, 0, 0, 0] - [1, 1, 1, 32]>, block_config=[8, 16, 8, 16]>
IFM: h=97,w=128,c=1, region=1, NHWC, UINT8, size=12416, scale: 0.011761300265789032, zero: 85
Stride y/x/c: 128/1/1, tiles: w0=128, h0=97, h1=97, base=['0x0', '0x0', '0x0', '0x0']
OFM: h=97,w=128,c=32, region=1, NHWC, UINT8, size=397312, scale: 0.006358124315738678, zero: 131
Stride y/x/c: 4096/32/1, tiles: w0=128, h0=97, h1=97, base=['0x3080', '0x0', '0x0', '0x0']
Kernel: w=3, h=3, stride=(1, 1), dilation=(1, 1)
NpuPadding(top=1, left=1, bottom=1, right=1)
Weights: (region=0, address=0x280, length=496)
Scales: (region=0, address=0x140, length=320)
NpuBlockTraversal.PART_KERNEL_FIRST
Block config: h=8,w=16,c=16, NpuResamplingMode.NONE, NpuRoundingMode.TFL
Code: Command: Param: Payload:
0x010f cmd0.NPU_SET_IFM_REGION 1 -
0x4000 cmd1.NPU_SET_IFM_BASE0 0 0x00000000 (0)
0x4001 cmd1.NPU_SET_IFM_BASE1 0 0x00000000 (0)
0x4002 cmd1.NPU_SET_IFM_BASE2 0 0x00000000 (0)
0x4003 cmd1.NPU_SET_IFM_BASE3 0 0x00000000 (0)
0x010b cmd0.NPU_SET_IFM_HEIGHT0_M1 96 -
0x010c cmd0.NPU_SET_IFM_HEIGHT1_M1 96 -
0x010a cmd0.NPU_SET_IFM_WIDTH0_M1 127 -
0x0104 cmd0.NPU_SET_IFM_DEPTH_M1 0 -
0x4006 cmd1.NPU_SET_IFM_STRIDE_C 0 0x00000001 (1)
0x4005 cmd1.NPU_SET_IFM_STRIDE_Y 0 0x00000080 (128)
0x4004 cmd1.NPU_SET_IFM_STRIDE_X 0 0x00000001 (1)
0x0109 cmd0.NPU_SET_IFM_ZERO_POINT 85 -
0x0105 cmd0.NPU_SET_IFM_PRECISION 0 -
0x0107 cmd0.NPU_SET_IFM_UPSCALE 0 -
0x0100 cmd0.NPU_SET_IFM_PAD_TOP 1 -
0x0101 cmd0.NPU_SET_IFM_PAD_LEFT 1 -
0x0103 cmd0.NPU_SET_IFM_PAD_BOTTOM 1 -
0x0102 cmd0.NPU_SET_IFM_PAD_RIGHT 1 -
0x011f cmd0.NPU_SET_OFM_REGION 1 -
0x4010 cmd1.NPU_SET_OFM_BASE0 0 0x00003080 (12416)
0x4011 cmd1.NPU_SET_OFM_BASE1 0 0x00000000 (0)
0x4012 cmd1.NPU_SET_OFM_BASE2 0 0x00000000 (0)
0x4013 cmd1.NPU_SET_OFM_BASE3 0 0x00000000 (0)
0x011b cmd0.NPU_SET_OFM_HEIGHT0_M1 96 -
0x011c cmd0.NPU_SET_OFM_HEIGHT1_M1 96 -
0x011a cmd0.NPU_SET_OFM_WIDTH0_M1 127 -
0x0112 cmd0.NPU_SET_OFM_HEIGHT_M1 96 -
0x0111 cmd0.NPU_SET_OFM_WIDTH_M1 127 -
0x0113 cmd0.NPU_SET_OFM_DEPTH_M1 31 -
0x4016 cmd1.NPU_SET_OFM_STRIDE_C 0 0x00000001 (1)
0x4015 cmd1.NPU_SET_OFM_STRIDE_Y 0 0x00001000 (4096)
0x4014 cmd1.NPU_SET_OFM_STRIDE_X 0 0x00000020 (32)
0x0118 cmd0.NPU_SET_OFM_ZERO_POINT 131 -
0x0114 cmd0.NPU_SET_OFM_PRECISION 0 -
0x0121 cmd0.NPU_SET_KERNEL_HEIGHT_M1 2 -
0x0120 cmd0.NPU_SET_KERNEL_WIDTH_M1 2 -
0x0122 cmd0.NPU_SET_KERNEL_STRIDE 4 -
0x0128 cmd0.NPU_SET_WEIGHT_REGION 0 -
0x4020 cmd1.NPU_SET_WEIGHT_BASE 0 0x00000280 (640)
0x4021 cmd1.NPU_SET_WEIGHT_LENGTH 0 0x000001f0 (496)
0x0129 cmd0.NPU_SET_SCALE_REGION 0 -
0x4022 cmd1.NPU_SET_SCALE_BASE 0 0x00000140 (320)
0x4023 cmd1.NPU_SET_SCALE_LENGTH 0 0x00000140 (320)
0x0125 cmd0.NPU_SET_ACTIVATION 0 -
0x0126 cmd0.NPU_SET_ACTIVATION_MIN 0 -
0x0127 cmd0.NPU_SET_ACTIVATION_MAX 255 -
0x0116 cmd0.NPU_SET_OFM_BLK_HEIGHT_M1 7 -
0x0115 cmd0.NPU_SET_OFM_BLK_WIDTH_M1 15 -
0x0117 cmd0.NPU_SET_OFM_BLK_DEPTH_M1 15 -
0x010d cmd0.NPU_SET_IFM_IB_END 6 -
0x012d cmd0.NPU_SET_AB_START 6 -
0x0124 cmd0.NPU_SET_ACC_FORMAT 0 -
0x012f cmd0.NPU_SET_BLOCKDEP 0 -
0x0002 cmd0.NPU_OP_CONV 0 -
0x0000 cmd0.NPU_OP_STOP 65535 -
number of commands 56
command stream length in words 74