Huawei’s HiSilicon Ascend 910C is an updated version of the company’s Ascend 910 processor for AI training, which was introduced in 2019. By now, the performance of the original Ascend 910 is barely sufficient for cost-efficient training of large AI models. When it comes to inference, however, the 910C delivers 60% of Nvidia’s H100 performance, according to researchers from DeepSeek. While the Ascend 910C is not a performance champion, it could meaningfully reduce China’s reliance on Nvidia GPUs.
Testing by DeepSeek revealed that the 910C processor exceeded expectations in inference performance, and its efficiency could be improved further with manual optimization of CANN kernels. DeepSeek’s native support for Ascend processors and its PyTorch repository allow for CUDA-to-CANN conversion with minimal effort, making it easier to integrate Huawei’s hardware into AI workflows.
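In practice, much of a CUDA-to-Ascend port in PyTorch comes down to retargeting device strings, since Huawei’s PyTorch adapter (torch_npu) exposes Ascend accelerators under the "npu" device type. The helper below is a minimal, hypothetical sketch of that remapping step; it is not part of DeepSeek’s or Huawei’s actual tooling, and a real port would also involve kernel and operator-level work.

```python
# Hypothetical sketch: remap CUDA device strings to Huawei's Ascend
# backend, which torch_npu exposes under the "npu" device type.
# This helper is illustrative only, not a real library function.

def to_ascend_device(device: str) -> str:
    """Map a CUDA device string (e.g. 'cuda:0') to its Ascend
    equivalent ('npu:0'); leave other device strings unchanged."""
    if device == "cuda":
        return "npu"
    if device.startswith("cuda:"):
        # Preserve the device index, e.g. 'cuda:1' -> 'npu:1'.
        return "npu:" + device.split(":", 1)[1]
    return device

print(to_ascend_device("cuda:0"))  # npu:0
print(to_ascend_device("cpu"))     # cpu
```

With such a mapping in place, model code that parameterizes its device (e.g. `model.to(to_ascend_device(cfg.device))`) can run unchanged on either backend, which is the kind of low-effort portability the DeepSeek researchers describe.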
This suggests that the capabilities of Huawei’s AI processors are advancing rapidly, despite U.S. government sanctions and the lack of access to TSMC’s leading-edge process technologies.
While Huawei and SMIC have managed to catch up with the capabilities TSMC had in the 2019 – 2020 era and produce a chip that is competitive with Nvidia’s A100 and H100 processors, the Ascend 910C is not the best option for AI training, a domain where Nvidia maintains its undisputed lead.
DeepSeek’s Yuchen Jin said that long-term training reliability is a critical weakness of Chinese processors. This challenge stems from the deep integration of Nvidia’s hardware and software ecosystem, which has been developed over two decades. While inference performance can be optimized, sustained training workloads require further improvements in Huawei’s hardware and software stack.
Just like the original Ascend 910, the new Ascend 910C chip uses chiplet packaging, and its main compute SoC has around 53 billion transistors. While the original compute chiplet of the Ascend 910 was made by TSMC using its N7+ fabrication technology (7nm-class with EUV), the compute chiplet of the Ascend 910C is made by SMIC on its 2nd Generation 7nm-class process technology known as N+2.
Looking ahead, some experts predict that as AI models converge on Transformer architectures, the importance of Nvidia’s software ecosystem may decline. DeepSeek’s expertise in hardware and software optimization could also significantly reduce dependency on Nvidia, offering AI companies a more cost-effective alternative, particularly for inference. However, to compete at a global scale, China must overcome the challenge of training stability and further refine its AI computing infrastructure.