TOSA to LLVM IR

Hi, I am trying to lower TOSA to LLVM IR using the following passes:

./mlir-opt --pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))"  tosa.mlir > linalg_tmp.mlir
./mlir-opt --tosa-to-arith linalg_tmp.mlir > linalg_tmp_.mlir
./mlir-opt --pass-pipeline="builtin.module(func.func(tosa-to-linalg))"  linalg_tmp_.mlir > linalg.mlir
./mlir-opt --one-shot-bufferize="bufferize-function-boundaries=1 " --drop-equivalent-buffer-results --split-input-file --convert-linalg-to-affine-loops --cse linalg.mlir > affine.mlir
./mlir-opt --affine-loop-fusion affine.mlir > affine-fusion.mlir
./mlir-opt --affine-loop-tile="tile-size=32" affine-fusion.mlir > affine-tile.mlir
./mlir-opt --lower-affine --convert-scf-to-cf --convert-math-to-llvm --arith-expand -memref-expand -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts affine-tile.mlir > llvm.mlir
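To make this easier to reproduce, the same sequence can be collected into one script. This is a minimal sketch, assuming bash and that mlir-opt is at ./mlir-opt unless MLIR_OPT is set. The helper names opt_stage and run_pipeline are hypothetical. It uses the same pass flags as the commands above (the stray space inside the one-shot-bufferize option is dropped). Because of set -e, it stops at the first stage that fails instead of running the later passes on that stage's output.

```shell
#!/usr/bin/env bash
# Sketch: the TOSA -> LLVM IR pipeline above, one mlir-opt invocation per stage.
# Assumption: mlir-opt is ./mlir-opt unless MLIR_OPT is set.
set -euo pipefail

# Run one stage: opt_stage <input> <output> <mlir-opt flags...>
opt_stage() {
  local in="$1" out="$2"
  shift 2
  "${MLIR_OPT:-./mlir-opt}" "$@" "$in" > "$out"
}

# Hypothetical driver chaining all stages; input defaults to tosa.mlir.
run_pipeline() {
  local input="${1:-tosa.mlir}"
  opt_stage "$input" linalg_tmp.mlir \
    --pass-pipeline="builtin.module(func.func(tosa-to-linalg-named))"
  opt_stage linalg_tmp.mlir linalg_tmp_.mlir --tosa-to-arith
  opt_stage linalg_tmp_.mlir linalg.mlir \
    --pass-pipeline="builtin.module(func.func(tosa-to-linalg))"
  # This is the stage that reports "op was not bufferized".
  opt_stage linalg.mlir affine.mlir \
    --one-shot-bufferize="bufferize-function-boundaries=1" \
    --drop-equivalent-buffer-results --split-input-file \
    --convert-linalg-to-affine-loops --cse
  opt_stage affine.mlir affine-fusion.mlir --affine-loop-fusion
  opt_stage affine-fusion.mlir affine-tile.mlir --affine-loop-tile="tile-size=32"
  opt_stage affine-tile.mlir llvm.mlir \
    --lower-affine --convert-scf-to-cf --convert-math-to-llvm --arith-expand \
    --memref-expand --finalize-memref-to-llvm --convert-func-to-llvm \
    --reconcile-unrealized-casts
}
```

Usage (pipeline.sh is a hypothetical file name): source pipeline.sh && run_pipeline tosa.mlir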

The input, tosa.mlir, contains a tosa.matmul:

module {
  func.func @main(%arg0: tensor<1024x1024xf32>) -> tensor<1024x1024xf32> {
    %0 = "tosa.const"() <{value = dense_resource<torch_tensor_1024_1024_torch.float32> : tensor<1024x1024xf32>}> : () -> tensor<1024x1024xf32>
    %1 = tosa.reshape %arg0 {new_shape = array<i64: 1, 1024, 1024>} : (tensor<1024x1024xf32>) -> tensor<1x1024x1024xf32>
    %2 = tosa.reshape %0 {new_shape = array<i64: 1, 1024, 1024>} : (tensor<1024x1024xf32>) -> tensor<1x1024x1024xf32>
    %3 = tosa.matmul %1, %2 : (tensor<1x1024x1024xf32>, tensor<1x1024x1024xf32>) -> tensor<1x1024x1024xf32>
    %4 = tosa.reshape %3 {new_shape = array<i64: 1024, 1024>} : (tensor<1x1024x1024xf32>) -> tensor<1024x1024xf32>
    return %4 : tensor<1024x1024xf32>
  }
}

However, the conversion fails during bufferization with this error:

within split at linalg.mlir:1 offset :4:10: error: op was not bufferized
    %0 = tosa.reshape %arg0 {new_shape = array<i64: 1, 1024, 1024>} : (tensor<1024x1024xf32>) -> tensor<1x1024x1024xf32>
         ^
within split at linalg.mlir:1 offset :4:10: note: see current operation: %3 = "tosa.reshape"(%0) <{new_shape = array<i64: 1, 1024, 1024>}> : (tensor<1024x1024xf32>) -> tensor<1x1024x1024xf32>

How can I get the tosa.reshape ops removed? What changes should I make to the passes above? Thank you.