Tencent's Latest Large Model Training Method: Angel Framework Upgrade Boosts Efficiency by 2.6x

Amid exponential growth in large model parameter scales, Tencent recently disclosed its latest training method for the Hunyuan large model. By upgrading its self-developed machine learning framework Angel, the company improved large model training efficiency by 2.6x, cutting the computing power cost of training billion-parameter models by up to 50% and providing strong support against computing power shortages. Beyond the efficiency gains, the upgraded Angel framework supports ultra-large-scale single-task training on tens of thousands of GPUs, further improving the performance and efficiency of Tencent Cloud's HCC dedicated computing cluster for large models.

[Image: AI-generated image, authorized by Midjourney]

To further improve the training and inference efficiency of large models, Tencent independently developed the machine learning training framework AngelPTM. On the storage side, AngelPTM spreads model state across GPUs with multi-dimensional parallelism, combining data parallelism, model parallelism, pipeline parallelism, and sequence parallelism.

Additionally, by introducing a unified memory view based on ZeRO-Cache, the framework pools GPU memory and host memory, effectively expanding usable GPU memory and increasing single-machine storage capacity by 90%. For communication, Tencent took a hardware-software co-design approach: it built a 3.2 Tbps RDMA network to widen bandwidth, and added GPU topology awareness at the framework software level to achieve load-balanced pipeline parallelism. For stability, Tencent monitors the infrastructure network, hardware, storage, and cloud-native scheduling, complemented by automatic retraining and system fault-tolerance mechanisms.

To address rising inference costs, Tencent also launched the large model inference framework AngelHCF. It expands parallel capabilities and optimizes key components, including Embedding sharing, Attention operator optimization, and Paged Attention, to raise inference performance. Compared with mainstream frameworks, AngelHCF delivers a 1.3x inference speedup; in Tencent's Hunyuan large model for text-to-image generation, it cut inference time from 10 seconds to 3-4 seconds.

Tencent has thus achieved significant efficiency gains in large model training and substantial optimizations in the inference phase. These capabilities are now available on Tencent Cloud, giving users training and inference acceleration along with end-to-end fine-tuning for customized intelligent applications. More than 300 internal Tencent services and application scenarios have already integrated the Hunyuan large model for testing, covering text summarization, content creation, translation, coding, and more. This marks a comprehensive upgrade across the entire production pipeline, from model development to application deployment, forming a one-stop platform that further accelerates the adoption of large model applications.
