Can a yolov8n model run in the yolofastest object detection use case?

LYG49 · August 31, 2024, 3:22am

Hi,
I'm a newbie and have a question. I trained a yolov8n model, and it works well on a PC.
I compiled the TFLite model with Vela, but when the program runs, model initialization fails.

The following is the Vela compilation log:
Accelerator configuration Ethos_U55_256
System configuration RTSS_HP_SRAM_MRAM
Memory mode Shared_Sram
Accelerator clock 400 MHz
Design peak SRAM bandwidth 2.44 GB/s
Design peak Off-chip Flash bandwidth 0.39 GB/s

Total SRAM used 2758.31 KiB
Total Off-chip Flash used 2204.52 KiB

CPU operators = 5 (1.7%)
NPU operators = 287 (98.3%)

Average SRAM bandwidth 2.00 GB/s
Input SRAM bandwidth 48.18 MB/batch
Weight SRAM bandwidth 11.96 MB/batch
Output SRAM bandwidth 28.89 MB/batch
Total SRAM bandwidth 89.09 MB/batch
Total SRAM bandwidth per input 89.09 MB/inference (batch size 1)

Average Off-chip Flash bandwidth 0.05 GB/s
Input Off-chip Flash bandwidth 0.02 MB/batch
Weight Off-chip Flash bandwidth 2.14 MB/batch
Output Off-chip Flash bandwidth 0.00 MB/batch
Total Off-chip Flash bandwidth 2.17 MB/batch
Total Off-chip Flash bandwidth per input 2.17 MB/inference (batch size 1)

Neural network macs 1014000800 MACs/batch
Network Tops/s 0.05 Tops/s

NPU cycles 13486835 cycles/batch
SRAM Access cycles 12643265 cycles/batch
DRAM Access cycles 0 cycles/batch
On-chip Flash Access cycles 0 cycles/batch
Off-chip Flash Access cycles 18347 cycles/batch
Total cycles 17795775 cycles/batch

Batch Inference time 44.49 ms, 22.48 inferences/s (batch size 1)
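
The timing figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers printed in the log; treating one MAC as two operations when computing Tops/s is an assumption, not something the log states.

```python
# Figures copied from the Vela summary above.
total_cycles = 17_795_775   # "Total cycles", cycles/batch
clock_hz = 400e6            # "Accelerator clock 400 MHz"
macs = 1_014_000_800        # "Neural network macs", MACs/batch

inference_s = total_cycles / clock_hz
inference_ms = inference_s * 1e3
inferences_per_s = 1.0 / inference_s

# Assumption: one MAC counts as two operations (multiply + accumulate).
tops = 2 * macs / inference_s / 1e12

print(f"{inference_ms:.2f} ms, {inferences_per_s:.2f} inferences/s, {tops:.2f} Tops/s")
```

The results agree with the reported 44.49 ms, 22.48 inferences/s and 0.05 Tops/s, so the estimate is internally consistent. Note that it says nothing about whether the model fits in memory.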

The following is the runtime log:
Loading model from address: 0x0x8018bcb0
Model size: 2082048 bytes
Loading op resolver
INFO - Added ethos-u support to op resolver
Creating allocator using tensor arena at address: 0x0x2000000
INFO - Creating allocator using tensor arena at 0x0x2000000
Created new allocator at address: 0x0x226ff8c
INFO - Allocating tensors
ERROR - tensor allocation failed!
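
One quick check on this log is the distance between the two addresses it prints. The sketch below is plain arithmetic on the logged values. Reading that distance as an approximate arena size assumes the allocator object is placed near the end of the arena, which the log itself does not confirm.

```python
arena_base = 0x2000000       # "tensor arena at address" from the log
allocator_addr = 0x226FF8C   # "Created new allocator at address" from the log
vela_sram_kib = 2758.31      # "Total SRAM used" from the Vela summary

offset_bytes = allocator_addr - arena_base
offset_kib = offset_bytes / 1024

print(f"allocator offset: {offset_bytes} bytes ({offset_kib:.2f} KiB)")
print(f"Vela SRAM requirement: {vela_sram_kib:.2f} KiB")
```

Under that assumption the arena is roughly 2496 KiB, which is less than the 2758.31 KiB that Vela reports, and that would be consistent with the tensor allocation failure.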

AlexTawseArm · September 1, 2024, 1:10pm

Hi @LYG49, I can see that the model is trying to use 2758.31 KiB of SRAM. This exceeds the 2 MiB SRAM region allocated for the activation buffer.
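
As a rough illustration of that comparison, here is a minimal sketch that converts the Vela figure to bytes and subtracts the 2 MiB region. The variable names are illustrative only and do not come from the build system.

```python
KIB = 1024

vela_sram_used_kib = 2758.31             # "Total SRAM used" from the Vela log
activation_region_bytes = 2 * 1024 * KIB  # the 2 MiB region mentioned above

required_bytes = round(vela_sram_used_kib * KIB)
shortfall_bytes = required_bytes - activation_region_bytes

print(f"required:  {required_bytes} bytes")
print(f"available: {activation_region_bytes} bytes")
print(f"shortfall: {shortfall_bytes} bytes ({shortfall_bytes / KIB:.2f} KiB)")
```

The model needs about 710 KiB more than the region provides, so either the buffer has to grow (if the target has the memory) or the model's SRAM use has to shrink.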

Normally this would be allocated at 0x31000000, but I can see you are trying to create it at 0x2000000. Did you modify the linker script/scatter file in any way? Also, are you building with -DLOG_LEVEL=LOG_LEVEL_DEBUG?

LYG49 · September 2, 2024, 9:06am

Hi @AlexTawseArm,

I changed ACTIVATION_BUF_SZ in usecase.cmake to 0x31000000, but the build now fails with the following log:

/usr/local/bin/gcc-arm-none-eabi-10.3-2021.10/bin/…/lib/gcc/arm-none-eabi/10.3.1/…/…/…/…/arm-none-eabi/bin/ld: bin/ethos-u-alif_object_detection.axf section `.bss.sram1' will not fit in region `SRAM1'
/usr/local/bin/gcc-arm-none-eabi-10.3-2021.10/bin/…/lib/gcc/arm-none-eabi/10.3.1/…/…/…/…/arm-none-eabi/bin/ld: region `SRAM1' overflowed by 589824 bytes
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/ethos-u-alif_object_detection.dir/build.make:132: bin/ethos-u-alif_object_detection.axf] error 1
make[2]: *** [CMakeFiles/Makefile2:1050: CMakeFiles/ethos-u-alif_object_detection.dir/all] error 2
make[1]: *** [CMakeFiles/Makefile2:1057: CMakeFiles/ethos-u-alif_object_detection.dir/rule] error 2
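
To put the linker message in more readable units, the sketch below pulls the region name and overflow out of the `overflowed by` line and converts the byte count. The regular expression is an illustrative helper, not part of any toolchain.

```python
import re

# The overflow line from the linker output above.
ld_message = "region `SRAM1' overflowed by 589824 bytes"

# Tolerate straight, backtick, or typographic quotes around the region name.
pattern = r"region [`'‘’]?(\w+)[`'‘’]? overflowed by (\d+) bytes"
match = re.search(pattern, ld_message)
region = match.group(1)
overflow_bytes = int(match.group(2))

print(f"{region}: {overflow_bytes} bytes = {overflow_bytes // 1024} KiB = {hex(overflow_bytes)}")
```

The `.bss.sram1` section is 576 KiB (0x90000 bytes) larger than the `SRAM1` region defined in the linker script, so enlarging the buffer alone cannot work unless the region itself has room for it.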

LYG49 · September 2, 2024, 9:07am

For reference, my build settings are as follows:

cmake -DTARGET_PLATFORM=ensemble \
  -DTARGET_BOARD=AppKit \
  -DTARGET_SUBSYSTEM=RTSS-HP \
  -DROTATE_DISPLAY=180 \
  -DCONSOLE_UART=4 \
  -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/toolchains/bare-metal-gcc.cmake \
  -DTARGET_REVISION=B \
  -DLOG_LEVEL=LOG_LEVEL_DEBUG \
  -DCMAKE_BUILD_TYPE=Release …