DPU: The New Dominant Force in the Data Center
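AI, 5G, and cloud computing are advancing rapidly and reshaping the world, and the data center, as the foundation of these technologies, plays a key role in digital transformation. Traditional CPUs and GPUs can no longer keep up with fast-changing application demands, so more powerful, more specialized, heterogeneous chips are becoming indispensable in the data center.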
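Chip giants are following this trend, diversifying their chip lineups through both acquisitions and in-house development. Last October, NVIDIA launched the BlueField-2, a pioneering DPU (Data Processing Unit). This April, NVIDIA CEO Jensen Huang announced an upgraded data center chip strategy that combines GPU, CPU, and DPU, with a leap forward planned each year, and unveiled Grace, NVIDIA's in-house Arm-based CPU.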
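So what exactly is a DPU? Why has it risen to prominence in the data center? And why will the data center of the future be an integrated system?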
The Two Layers of DPU Value
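To understand the DPU, we first need to understand the problem it solves. At the launch, Jensen Huang pointed out that today's data centers are software-defined, which makes them more flexible but also creates a heavy burden: running the infrastructure consumes 20% to 30% of CPU cores. Hence the need for a new kind of processor, the DPU.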
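To make the 20% to 30% figure concrete, the short sketch below works out how many cores remain for applications on a server. The 64-core server and the function name are hypothetical choices for illustration only; the article gives only the percentage range.

```python
import math


def cores_for_applications(total_cores: int, infra_fraction: float) -> int:
    """Return the cores left for applications when a fraction runs infrastructure.

    infra_fraction is the share of CPU cores consumed by infrastructure
    (the article cites 20% to 30%). The result is rounded down, on the
    assumption that a partially consumed core is not fully available.
    """
    if total_cores < 0:
        raise ValueError("total_cores must be non-negative")
    if not 0.0 <= infra_fraction <= 1.0:
        raise ValueError("infra_fraction must be between 0 and 1")
    infra_cores = math.ceil(total_cores * infra_fraction)
    return total_cores - infra_cores


if __name__ == "__main__":
    # Hypothetical 64-core server, not a figure from the article.
    for fraction in (0.20, 0.30):
        left = cores_for_applications(64, fraction)
        print(f"{fraction:.0%} infrastructure overhead leaves {left} of 64 cores")
```

On this hypothetical server, 20% overhead leaves 51 cores and 30% leaves 44. Those 13 to 20 cores are what an offload device would aim to hand back to applications.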
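Put another way, the CPU-centric architecture of the past is no longer sufficient; only a data-centric architecture can better meet market and application needs. Song Qingchun, Senior Director of Market Development for Asia-Pacific in NVIDIA's networking business unit, said: "In the past, compute scale and data volumes were not that large, and the von Neumann architecture did a good job of improving computing performance. But as data volumes keep growing and AI technology develops, the traditional model causes network congestion." A new data-centric architecture can solve these problems and deliver a 10x performance improvement.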
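Specifically, NVIDIA's DPU integrates three key elements: an industry-standard, high-performance, programmable multi-core CPU, based on the widely used Arm architecture and tightly coupled with the other SoC components; a high-performance network interface that can parse and process data at line rate and transfer it efficiently to GPUs and CPUs; and a set of flexible, programmable acceleration engines that offload workloads such as AI, machine learning, and security and improve their performance.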
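This means the DPU can accelerate workloads such as security, networking, storage, and HPC, which is its first layer of value. The second layer lies in innovation: the DPU gives data-centric computing architectures new capabilities, making possible things that were previously difficult or impossible to implement.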
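Previously, every operation was handled by the CPU, which not only required a large number of cores but was also very inefficient. Offloading some operations, such as OVS, to the DPU not only improves efficiency and reduces CPU utilization but also isolates workloads from one another. In addition, NVIDIA is working with VMware on Project Monterey, which offloads Hypervisor functions such as the firewall, storage, and management to the DPU. Song Qingchun cited this as an example: the approach guarantees high security while delivering bare-metal performance. It is also the first time VMware has opened its source code to a partner to jointly develop an enterprise-grade cloud solution.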
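Another example is the collaboration with Red Hat. In both container and virtualization scenarios, Red Hat could not reach 100G line rate even when using all CPU cores. Running Hypervisor, OVS, or container operations on the DPU achieves full line rate at 100G or even 200G and leaves all CPU resources to business workloads.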
How Does It Improve Performance?
"We chose the DPU because we had run into the bandwidth bottleneck of traditional servers. We wanted to solve the network performance bottleneck and reduce costs," explained Ma Yankang, a technical expert at UCloud. "Both sides shared the same understanding from the start: the DPU can implement hardware offload, and hardware-software integration will be the trend of the future."
With the DPU and its companion software stack, DOCA, UCloud achieved a series of data center performance improvements.
Ma Yankang explained that UCloud originally used VPC gateways to separate VPCs among bare-metal servers. This required many gateway server clusters to manage and created a cost challenge (roughly 4 to 8 servers per small cluster). With the DPU, UCloud moved VPC management into the DPU itself, including OVS packet forwarding and GRE encapsulation, which significantly improved efficiency. Upgrading the original 10GbE network cards to 25GbE also greatly improved performance.
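As a rough illustration of the GRE encapsulation step mentioned above, the sketch below builds and parses a GRE header that carries a 32-bit key (the key extension defined in RFC 2890). Using the key as a VPC identifier is an assumption made for this example; the article does not describe UCloud's actual packet format, and the function names are hypothetical. The outer IP header that would carry the GRE packet between hosts is omitted.

```python
import struct
from typing import Tuple

GRE_FLAG_KEY = 0x2000    # K bit set: a 4-byte key field follows the base header
ETHERTYPE_IPV4 = 0x0800  # protocol type of the encapsulated packet
GRE_HEADER_LEN = 8       # 2 bytes flags/version + 2 bytes protocol + 4 bytes key


def gre_encapsulate(inner_packet: bytes, vpc_id: int,
                    protocol: int = ETHERTYPE_IPV4) -> bytes:
    """Prepend a key-only GRE header to inner_packet.

    vpc_id is placed in the GRE key field (an assumption for illustration).
    """
    if not 0 <= vpc_id <= 0xFFFFFFFF:
        raise ValueError("vpc_id must fit in 32 bits")
    header = struct.pack("!HHI", GRE_FLAG_KEY, protocol, vpc_id)
    return header + inner_packet


def gre_decapsulate(frame: bytes) -> Tuple[int, int, bytes]:
    """Return (vpc_id, protocol, inner_packet) from a key-only GRE frame."""
    if len(frame) < GRE_HEADER_LEN:
        raise ValueError("frame too short for a keyed GRE header")
    flags, protocol, key = struct.unpack("!HHI", frame[:GRE_HEADER_LEN])
    if flags != GRE_FLAG_KEY:
        # Checksum and sequence-number options are not handled in this sketch.
        raise ValueError("only key-only GRE version 0 headers are supported")
    return key, protocol, frame[GRE_HEADER_LEN:]


if __name__ == "__main__":
    frame = gre_encapsulate(b"inner-ip-packet", vpc_id=42)
    print(frame[:GRE_HEADER_LEN].hex())
    print(gre_decapsulate(frame))
```

In a DPU deployment this work would be done in hardware on the data path rather than in Python; the sketch only shows what the header manipulation amounts to.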
Customers have recognized these improvements. "The DPU-based solution is already in use in some customers' big data, finance, database, and container cloud workloads, and their feedback has been very good." For example, one company in the big data business, which runs N-to-N data computations across multiple machines, reduced its VPC cluster after deployment and gained higher bandwidth and better performance while lowering maintenance costs. Another customer, in finance, had deployed four servers under its previous architecture and switched to several DPU cards.
Data center storage also benefits from the DPU. UCloud previously used local disks for storage, which were prone to disk failures, hard to maintain, and made lost data difficult to recover. In the new architecture, UCloud adopted RSSD cloud storage with a back-end storage cluster, built on the DPU's NVMe SNAP functionality, to decouple compute from storage. Users can skip installation and deploy within minutes, the number of VM types is reduced, disks can be used flexibly, faults can be migrated quickly, and data is kept in three copies for greater safety and reliability, with security further enhanced by the DPU.
In summary, the DPU promises significant improvements over existing solutions: flexibility, scalability, cost-effectiveness, high-performance networking, and advanced security features such as hardware-based encryption and decryption with IPsec support, along with AI acceleration capabilities that further strengthen its value proposition in cloud computing.