Fast convolution algorithm optimization for Phytium processor
Fund projects: National Natural Science Foundation of China (61402321); Natural Science Foundation of Tianjin (23JCYBJC01770); 2024 First Batch of the Tianjin Special Fund for High-Quality Development of the Manufacturing Industry (24ZGNGX00020)




    Abstract:

    To address the difficulty of deploying convolutional neural networks on resource-constrained devices, a high-performance fast convolution algorithm (FastInfer) was proposed for the domestic FT-2000/4 multi-core processor. The algorithm optimized general matrix multiplication (GEMM) with a blocking strategy that keeps frequently accessed data in caches closer to the processor, improving memory access efficiency during computation. In addition, a high-performance matrix multiplication micro-kernel was designed and implemented to match the blocking scheme; it updates data with vector outer-product operations, raising the compute-to-memory-access ratio and maximally hiding the latency of memory access instructions. Experimental results demonstrated that FastInfer achieved a peak computational performance of 99.56 GFLOPS on the FT-2000/4 processor. In GEMM tests at various input scales, FastInfer achieved 1.07 and 1.52 times the performance of OpenBLAS. In convolution tests, it achieved 1.32 times the performance of the ARM Compute Library, realizing high-performance convolution computation on the FT-2000/4 multi-core processor.
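The two ideas the abstract names, cache blocking of GEMM and a micro-kernel built on vector outer-product (rank-1) updates, can be sketched as below. This is an illustrative reconstruction, not the paper's FastInfer code: the block sizes MC/KC/NC and the 4×4 micro-tile are placeholder values, and a tuned FT-2000/4 kernel would pack A/B panels and use NEON FMLA intrinsics rather than scalar loops.

```c
/* Illustrative sketch of blocked GEMM with an outer-product micro-kernel.
 * NOT the paper's FastInfer implementation: block sizes and the 4x4
 * micro-tile are placeholders chosen for readability. */
enum { MC = 8, KC = 8, NC = 8,   /* cache block sizes (toy values)  */
       MR = 4, NR = 4 };         /* micro-tile kept in registers    */

/* C[MR x NR] += A_panel * B_panel, accumulated as kc rank-1 updates:
 * each k step adds the outer product of an MR-column of A and an
 * NR-row of B. Keeping the MR x NR accumulator in registers raises the
 * compute-to-memory-access ratio and helps hide load latency. */
static void micro_kernel(int kc, const float *A, int lda,
                         const float *B, int ldb, float *C, int ldc) {
    float acc[MR][NR] = {{0.0f}};
    for (int k = 0; k < kc; ++k)            /* one rank-1 update per k */
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += A[i * lda + k] * B[k * ldb + j];
    for (int i = 0; i < MR; ++i)
        for (int j = 0; j < NR; ++j)
            C[i * ldc + j] += acc[i][j];
}

/* Row-major C += A * B; for brevity assumes M, N, K are multiples of
 * the block sizes. The loop order keeps a KC x NC panel of B and an
 * MC x KC block of A cache-resident while the micro-kernel sweeps
 * MR x NR tiles of C. */
void gemm_blocked(int M, int N, int K,
                  const float *A, const float *B, float *C) {
    for (int jc = 0; jc < N; jc += NC)
        for (int pc = 0; pc < K; pc += KC)
            for (int ic = 0; ic < M; ic += MC)
                for (int ir = ic; ir < ic + MC; ir += MR)
                    for (int jr = jc; jr < jc + NC; jr += NR)
                        micro_kernel(KC, &A[ir * K + pc], K,
                                     &B[pc * N + jr], N,
                                     &C[ir * N + jr], N);
}
```

Convolution is then typically lowered onto such a GEMM (e.g. by rearranging input patches into matrix form); the abstract does not describe which lowering FastInfer uses.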

Cite this article:

Zhao Yafei, Yang Yaogong, Wang Yonggang, Wei Jizeng. Fast convolution algorithm optimization for Phytium processor[J]. Journal of University of Shanghai for Science and Technology, 2024, 46(6): 610-619.

History
  • Received: 2024-10-16
  • Published online: 2024-12-28