Microsoft Open Sources SliceGPT: Compresses Large Models by ~25% While Maintaining Performance

baoshi.rao wrote:

Microsoft and ETH Zurich researchers have jointly open-sourced SliceGPT, a technique that compresses the weight matrices of large language models, reducing model size by approximately 25% while maintaining performance. Experimental results show that SliceGPT has been successfully applied to several large models, including LLAMA-2 70B, OPT 66B, and Phi-2, while preserving their zero-shot task performance.

The core of SliceGPT is computational invariance: applying an orthogonal transformation to each weight matrix, with the matching transformation applied to its inputs, leaves the network's outputs unchanged. SliceGPT exploits this to rotate the weight matrices into a basis where low-signal rows and columns can be sliced away, yielding substantial compression. In addition, the sliced models run directly on consumer-grade GPUs such as NVIDIA's RTX 4090 and RTX 4080 without any extra code optimization, which makes deployment more convenient.
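
To make this concrete, here is a minimal NumPy sketch of the two ingredients: computational invariance under an orthogonal transformation, and slicing away low-signal directions found by a PCA of calibration activations. It illustrates a single linear layer on synthetic data; it is not the TransformerCompression API, and all names (W, X, Q, Z) and sizes are made up for the example.

```python
# Minimal sketch of SliceGPT's two ingredients for a single linear layer.
# Not the TransformerCompression API; all names and sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n, r = 256, 1024, 48                           # hidden size, calibration samples, signal rank
W = rng.standard_normal((d, d))                   # weight matrix of one linear layer
X = (rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
     + 0.01 * rng.standard_normal((n, d)))        # synthetic low-rank calibration activations

# 1) Computational invariance: for any orthogonal Q (Q @ Q.T == I), rotating the
#    activations and the weights by the same Q leaves the layer output unchanged:
#    (X Q)(W Q)^T = X Q Q^T W^T = X W^T.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # a random orthogonal matrix
Y = X @ W.T
Y_rotated = (X @ Q) @ (W @ Q).T
assert np.allclose(Y, Y_rotated)

# 2) Slicing: choose Q from a PCA of the calibration activations so the trailing
#    directions carry little signal, then simply delete them (25% of the width here).
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)    # activation covariance, eigenbasis
Q_pca = eigvecs[:, ::-1]                          # principal directions, strongest first
k = int(0.75 * d)                                 # keep 75% of the hidden dimension
Z = Q_pca[:, :k]                                  # d x k "rotate then slice" matrix

W_sliced = W @ Z                                  # d x k weights: 25% fewer columns
Y_approx = (X @ Z) @ W_sliced.T                   # forward pass through the sliced layer
rel_err = np.linalg.norm(Y - Y_approx) / np.linalg.norm(Y)
print(f"kept {k}/{d} dimensions, relative output error: {rel_err:.4f}")
```

Because the synthetic activations have low-dimensional structure, much as SliceGPT relies on structure in real transformer activations, deleting 25% of the hidden dimension perturbs the layer output only slightly; applied across all layers, this is the intuition behind the roughly 25% size reduction reported above.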

In experiments, the researchers found SliceGPT's slicing procedure to be remarkably simple and efficient: compressing a model takes only a few hours on a single GPU, with no complex fine-tuning required. The sliced models maintain high-quality performance on generative tasks while improving throughput, with overall satisfactory results. The open-source release of SliceGPT offers a novel and effective method for compressing large models, substantially reducing deployment resources while preserving performance, and is expected to give developers and enterprises more convenient and efficient options for building on large models.

Open-source address: https://github.com/microsoft/TransformerCompression

Paper address: https://arxiv.org/abs/2401.15024
