Together AI Releases StripedHyena-7B, Outperforming Llama-2 7B

    baoshi.rao wrote:

    Together AI has released StripedHyena-7B, a new AI model that has drawn wide attention. The base model is StripedHyena-Hessian-7B (SH 7B), and a chat model, StripedHyena-Nous-7B (SH-N 7B), has also been released. StripedHyena builds on key lessons from a series of effective sequence modeling architectures created over the previous year, such as H3, Hyena, HyenaDNA, and Monarch Mixer.

    Researchers emphasize that StripedHyena trains, fine-tunes, and generates long sequences faster and with better memory efficiency than comparable Transformers. It is a hybrid architecture that combines attention with gated convolutions arranged in Hyena blocks. On short-sequence tasks, including those on the OpenLLM leaderboard, StripedHyena outperforms Llama-2 7B, Yi 7B, and the strongest Transformer alternatives, such as RWKV 14B.

    The model was evaluated on benchmarks covering both short-sequence tasks and long prompts. In perplexity scaling experiments on Project Gutenberg books, perplexity either saturates at 32k tokens or keeps decreasing beyond that length, which suggests the model can absorb information from longer prompts.
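
    A perplexity scaling experiment of this kind can be sketched as follows. This is only an illustration, not the researchers' evaluation code: the function `tail_perplexity`, the `toy_log_prob` model, and the constants `VOCAB` and `PERIOD` are all hypothetical. The idea is to score the same final tokens of a text while giving the model more and more preceding context.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: log p(token | context) from some language model.
LogProbFn = Callable[[Sequence[int], int], float]


def tail_perplexity(tokens: Sequence[int], log_prob: LogProbFn,
                    prefix_len: int, tail_len: int) -> float:
    """Perplexity of the last `tail_len` tokens, given up to `prefix_len`
    tokens of context before the tail."""
    tail_start = len(tokens) - tail_len
    ctx_start = max(0, tail_start - prefix_len)
    total = 0.0
    for i in range(tail_start, len(tokens)):
        total += log_prob(tokens[ctx_start:i], tokens[i])
    return math.exp(-total / tail_len)


VOCAB = 10   # assumed toy vocabulary size
PERIOD = 4   # assumed repetition period of the toy text


def toy_log_prob(context: Sequence[int], token: int) -> float:
    """Toy model: if it can see PERIOD tokens back, it bets 0.9 on a repeat;
    otherwise it falls back to a uniform guess."""
    if len(context) >= PERIOD:
        guess = context[-PERIOD]
        p = 0.9 if token == guess else 0.1 / (VOCAB - 1)
    else:
        p = 1.0 / VOCAB
    return math.log(p)


tokens = [1, 2, 3, 4] * 4  # a 16-token text that repeats every 4 tokens
for prefix_len in (0, 4, 12):
    ppl = tail_perplexity(tokens, toy_log_prob, prefix_len, tail_len=4)
    print(prefix_len, round(ppl, 3))
```

    With this toy model, perplexity drops once the prefix is long enough to expose the repetition and then stays flat, which mirrors the "saturates" case described above. A real experiment would replace `toy_log_prob` with scores from the actual model.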

    StripedHyena gets its efficiency from its hybrid structure, which combines attention with gated convolutions arranged in Hyena blocks. Researchers optimized this hybrid design using model grafting techniques, which allow the architecture to be changed during training.
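
    To give a feel for what a gated convolution does, here is a minimal pure-Python sketch. It is not the StripedHyena implementation: the function names, the filter, and the gate values are made up for illustration, and a real Hyena block would compute the gate and value streams from the input with learned projections and use long filters with a fast convolution.

```python
from typing import List


def causal_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution: output[t] depends only on signal[0..t]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        # kernel[k] weights the input k steps in the past
        for k in range(min(t + 1, len(kernel))):
            acc += kernel[k] * signal[t - k]
        out.append(acc)
    return out


def gated_conv(gate: List[float], value: List[float],
               kernel: List[float]) -> List[float]:
    """Convolve the value stream causally, then scale each position by a gate."""
    mixed = causal_conv(value, kernel)
    return [g * m for g, m in zip(gate, mixed)]


value = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.5]           # hypothetical filter: current input plus half the previous one
gate = [1.0, 0.0, 1.0, 1.0]   # hypothetical gate; a real model derives it from the input

print(gated_conv(gate, value, kernel))  # prints [1.0, 0.0, 4.0, 5.5]
```

    The convolution mixes information along the sequence, and the gate lets each position decide how much of that mixed signal to pass on; here the gate of 0.0 at the second position blocks it entirely.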

    Researchers highlight speed and memory efficiency as key advantages of StripedHyena in training, fine-tuning, and long-sequence generation. In end-to-end training on sequences of length 32k, 64k, and 128k, StripedHyena is 30%, 50%, and 100% faster, respectively, than optimized Transformer baselines (using FlashAttention v2 and custom kernels).
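
    One rough way to see why the gap widens with sequence length is to compare how the two kinds of operation scale. The sketch below assumes textbook cost models only (quadratic for attention, L log L for an FFT-based long convolution) and ignores constants, kernels, and every other layer, so it does not reproduce the measured numbers above; the function names are hypothetical.

```python
import math


def attention_cost(seq_len: int) -> float:
    """Assumed cost model: pairwise token interactions, quadratic in length."""
    return float(seq_len) ** 2


def fft_conv_cost(seq_len: int) -> float:
    """Assumed cost model: FFT-based long convolution, L * log2(L)."""
    return seq_len * math.log2(seq_len)


for seq_len in (32_768, 65_536, 131_072):
    ratio = attention_cost(seq_len) / fft_conv_cost(seq_len)
    print(f"L={seq_len:>7}: attention/conv cost ratio ~ {ratio:,.0f}")
```

    The ratio grows as the sequence gets longer. End-to-end speedups are much smaller than these ratios because a hybrid model still contains attention layers and other components whose cost this toy comparison leaves out.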

    In the future, researchers hope to make significant progress in multiple areas of the StripedHyena model. They plan to build larger models to handle longer contexts, pushing the boundaries of information understanding. Additionally, they aim to introduce multimodal support, enhancing the model's adaptability by allowing it to process and understand data from various sources such as text and images.

    Researchers also expect that additional computation (e.g., multiple heads in gated convolutions) could push StripedHyena's performance beyond that of Transformer models. This approach, inspired by linear attention, has been shown to improve model quality during training in architectures such as H3 and MultiHyena, while also offering advantages in inference efficiency. The project's blog post and model page give further details and credit all the researchers involved in this study.

    Project URL: https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B

    Blog: https://www.together.ai/blog/stripedhyena-7b
