How to print and set NPU frequency on FVP?

I would like to know how to print and set the Ethos-U85 frequency on the FVP.
I am trying to adjust the frequencies of the NPU and SRAM so that I can compare the results against the cycle counts estimated by Vela.

Hi @ccg,

As you may be aware, the repository has moved to GitLab (ml-embedded-evaluation-kit). We intend to monitor questions raised there more frequently, and this space will be deprecated soon.

Regarding the question: FVPs are intended to model functional behaviour rather than replicate precise clock timing or frequency. The Arm Ethos-U85 (and other Arm Ethos-U NPUs) within the fast model can provide a reliable estimate of cycle counts for a given workload. By scaling these cycle counts against a frequency (within the IP’s recommended operating range), you can derive an approximation of wall-clock time. There is no direct way to adjust the “clock”; in the Arm Corstone-320 design the NPU shares the same clock source as the main CPU.

Just a word on Vela too: the performance numbers there are experimental and may not match real-world data. We recommend using the FVP to get cycle count estimates if the workload maps completely onto the NPU. If there is considerable fallback onto the CPU, then an FPGA implementation would be the only option.

Hope this helps,
/KS

Thanks a lot. Next time, I will ask questions on the new platform.
By the way, is it true that the CPU frequency on the FVP cannot be changed? I saw the register MPS4_SCC->CFG_ACLK, but I was unable to configure it.

Yes, ACLK is a read-only (RO) register and it is likely to read 0 on the FVP. On an FPGA implementation it should give the configured clock value.

Even if you did manage to change the frequency, it wouldn’t affect the cycle counts reported by the FVP. As I mentioned previously, the FVP is for validating the functional flow; unfortunately, it doesn’t allow for experimenting with clock frequencies. But from the cycle counts you can infer what performance you will get at a given frequency. For example, if the output reports 1M clock cycles for a certain workload, an NPU running at 1 GHz will finish the task in 1e6 / 1e9 s = 1 millisecond.
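
For reference, the scaling above can be written as a few lines of Python. This is only a minimal sketch of the arithmetic: the function name `cycles_to_seconds` is made up for illustration and is not part of the FVP or the evaluation kit, and the frequency is something you assume, not something the FVP reports.

```python
def cycles_to_seconds(cycles: int, freq_hz: float) -> float:
    """Scale a cycle count to wall-clock time at an assumed clock frequency.

    Hypothetical helper for illustration; `freq_hz` is a frequency you
    choose within the IP's recommended operating range.
    """
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return cycles / freq_hz


# Example from this thread: 1M cycles on an NPU assumed to run at 1 GHz.
latency_s = cycles_to_seconds(1_000_000, 1e9)
print(f"{latency_s * 1e3:.3f} ms")  # prints "1.000 ms"
```

The same cycle count at an assumed 500 MHz would give 2 ms, which is the sense in which the frequency only scales the result and does not change the cycle count itself.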

Hope this helps.

Thanks,
/KS

OK, I got it.
Another question I posted received no response, so I would like to ask if you are familiar with the configurations of SRAM_MODE and EXT_MODE. Can these parameters be used to configure the frequency division coefficient? If the configuration of the frequency division coefficient is not supported here, why do I see some people mentioning that the NPU cycle of FVP is more precise than that of Vela?

so I would like to ask if you are familiar with the configurations of SRAM_MODE and EXT_MODE. Can these parameters be used to configure the frequency division coefficient?

Unfortunately, not. I’ve now responded on the other question as well.

If the configuration of the frequency division coefficient is not supported here, why do I see some people mentioning that the NPU cycle of FVP is more precise than that of Vela?

Vela does not execute workloads directly; instead, it estimates cycles based on a simplified model of bandwidth and latency parameters. In contrast, the NPU component of the FVP actually runs the workload, and its reported active cycle counts have been validated to be within about 10% of FPGA implementations. Additional TA parameters allow you to adjust the bandwidth and latency envelope to see the resulting impact on cycle counts.

While Vela can provide reasonable estimates, it can also deviate significantly. For NPU performance evaluation, we therefore recommend using the FVP, with the understanding that the definitive reference for more comprehensive benchmarking is always a physical implementation.
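
If you want to quantify how far a Vela estimate is from the FVP figure for your own model, a rough sketch such as the one below may help. The helper name and both cycle counts are hypothetical placeholders, not measured data; substitute the numbers from your own Vela summary and FVP run.

```python
def relative_deviation(estimate_cycles: float, reference_cycles: float) -> float:
    """Signed deviation of an estimate from a reference, as a fraction."""
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    return (estimate_cycles - reference_cycles) / reference_cycles


# Hypothetical placeholder values, for illustration only.
vela_estimate = 1_200_000  # cycles estimated by Vela (assumed)
fvp_measured = 1_000_000   # NPU active cycles reported by the FVP (assumed)

print(f"{relative_deviation(vela_estimate, fvp_measured):+.1%}")  # prints "+20.0%"
```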

Thanks a lot. Your answer really helps.