Microsoft Azure AI Adds 40 New Large Models Including Phi and Jais

Posted in AI Insights by baoshi.rao
    Microsoft has officially announced the addition of 40 new models to its Azure AI cloud development platform, including Falcon, Phi, Jais, Code Llama, CLIP, Whisper V3, and Stable Diffusion, covering text, image, code, and speech generation, among other capabilities.

    Developers can quickly integrate these models into applications through APIs or SDKs; the platform also supports customization features such as fine-tuning on your own data and instruction optimization.
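
    For a rough sense of the integration path, here is a minimal sketch that calls a model deployed from the catalog as an Azure managed online endpoint over REST. The endpoint URL, API key, and request payload below are placeholders, and the exact payload schema depends on which model you deploy:

    ```python
    import requests

    # Placeholders: substitute the scoring URL and key shown for your own
    # deployed endpoint in the Azure AI portal.
    ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
    API_KEY = "<your-api-key>"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }

    # The request body is model-specific; this shape is only an example
    # of a text-generation style payload.
    payload = {"input_data": {"input_string": ["Write a haiku about the sea."]}}

    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    response.raise_for_status()
    print(response.json())
    ```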

    Additionally, developers can use keyword search to quickly find suitable models in Azure AI's "Model Marketplace." For example, typing "code" will display the relevant code models.

    Try it here: https://ai.azure.com/

    Here is a brief introduction to some notable new models.

    Whisper V3

    Whisper V3 is the latest speech model developed by OpenAI, trained on multilingual data consisting of 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio. It was trained for both speech recognition and speech translation, so it can transcribe audio in many languages as well as translate speech into English.
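
    To give a feel for the model's two tasks, here is a minimal sketch using the open-source openai-whisper package locally rather than the Azure-hosted deployment; "audio.mp3" is a placeholder path:

    ```python
    # pip install openai-whisper
    import whisper

    # Load the large-v3 checkpoint (weights are downloaded on first use).
    model = whisper.load_model("large-v3")

    # Speech recognition: transcribe the audio in its original language.
    result = model.transcribe("audio.mp3")
    print(result["text"])

    # Speech translation: the same model can translate speech into English.
    translated = model.transcribe("audio.mp3", task="translate")
    print(translated["text"])
    ```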

    Stable Diffusion

    Stable Diffusion is a text-to-image diffusion model developed by Stability AI, capable of generating various types of images such as sketches, oil paintings, cartoons, and 3D images. It is currently one of the strongest open-source diffusion models.

    Microsoft Azure AI will provide five different versions of the model: Stable-Diffusion-V1-4, Stable-Diffusion-2-1, Stable-Diffusion-V1-5, Stable-Diffusion-Inpainting, and Stable-Diffusion-2-Inpainting.
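
    As an illustration, the open-source v1.5 checkpoint can be run locally with the diffusers library; the Azure-hosted versions expose the same models behind managed endpoints instead. A minimal sketch, assuming a CUDA GPU is available:

    ```python
    # pip install diffusers transformers torch
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the Stable-Diffusion-V1-5 weights from the Hugging Face Hub.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Generate an image from a text prompt and save it to disk.
    image = pipe("an oil painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")
    ```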

    Phi

    Phi-1.5 is a 1.3-billion-parameter Transformer model. It was trained on the same data as Phi-1, supplemented with a new data source composed of various synthetic NLP texts.

    When evaluated on benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 stands out as one of the top-performing models with fewer than 10 billion parameters. This model can write poetry, draft emails, create stories, summarize text, and write Python code, among other tasks.

    Phi-2, with 2.7 billion parameters, shows significant improvements in reasoning capabilities and safety measures compared to Phi-1.5. Despite its smaller size relative to other Transformer-based models in the industry, it delivers impressive performance.
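
    As a sketch of what running Phi locally looks like, here is the public Phi-1.5 checkpoint loaded through Hugging Face transformers; the Azure catalog serves the same weights behind a managed endpoint:

    ```python
    # pip install transformers torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/phi-1_5"  # public Hugging Face checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Ask the model to complete a small Python function.
    prompt = "def fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```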

    Falcon

    The Falcon model is a large language model developed by the Technology Innovation Institute (TII) in Abu Dhabi, UAE. Trained on a dataset of 1 trillion tokens, it supports text generation, content summarization, and other tasks. The model is available in four variants: Falcon-40B, Falcon-40B-Instruct, Falcon-7B-Instruct, and Falcon-7B.
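
    A minimal sketch of the smallest instruction-tuned variant via a transformers pipeline, assuming a recent transformers release with native Falcon support:

    ```python
    # pip install transformers torch accelerate
    from transformers import pipeline

    # Falcon-7B-Instruct is the smallest instruction-tuned variant,
    # published by TII on the Hugging Face Hub.
    generator = pipeline(
        "text-generation",
        model="tiiuae/falcon-7b-instruct",
        device_map="auto",
    )

    result = generator(
        "Summarize in one sentence: Azure AI added 40 new foundation models.",
        max_new_tokens=60,
    )
    print(result[0]["generated_text"])
    ```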

    SAM

    SAM (Segment Anything Model) is an image segmentation model developed by Meta that can quickly segment images based on prompts. SAM was trained on a dataset of 11 million images and 1.1 billion masks.

    SAM supports zero-shot transfer to new image segmentation tasks and is currently offered in three versions: Facebook-Sam-Vit-Large, Facebook-Sam-Vit-Huge, and Facebook-Sam-Vit-Base.
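
    For illustration, the base variant can be prompted with a point through the transformers SAM classes; the image path and click coordinate below are placeholders:

    ```python
    # pip install transformers torch pillow
    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor

    model = SamModel.from_pretrained("facebook/sam-vit-base")
    processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

    image = Image.open("photo.jpg").convert("RGB")  # placeholder path

    # Prompt SAM with one 2D point on the object of interest
    # (an (x, y) pixel coordinate chosen for illustration).
    input_points = [[[450, 600]]]
    inputs = processor(image, input_points=input_points, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Rescale the predicted masks back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks,
        inputs["original_sizes"],
        inputs["reshaped_input_sizes"],
    )
    print(masks[0].shape)  # (points, masks per point, height, width)
    ```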

    CLIP

    CLIP is a multimodal AI model developed by OpenAI, trained on extensive image-text pairs to understand image content and associate it with natural language descriptions. Through joint representation learning of images and text, CLIP significantly enhances various computer vision tasks, including classification, object detection, image captioning, and more.

    Currently, there are three versions: OpenAI-CLIP-Image-Text-Embeddings-ViT-Base-Patch32, OpenAI-CLIP-ViT-Base-Patch32, and OpenAI-CLIP-ViT-Large-Patch14.
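
    As a sketch of the zero-shot classification pattern these versions enable, here is the base patch-32 checkpoint used through transformers; the image path and candidate labels are placeholders:

    ```python
    # pip install transformers torch pillow
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("photo.jpg")  # placeholder path
    labels = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)

    # Image-text similarity scores; softmax turns them into
    # zero-shot label probabilities.
    probs = outputs.logits_per_image.softmax(dim=1)
    print(dict(zip(labels, probs[0].tolist())))
    ```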

    Code Llama

    Code Llama is a code-focused model developed by Meta that can generate, review, and modify code from natural-language prompts. It is offered in eight versions, including CodeLlama-34b-Python and CodeLlama-13b-Instruct, and is one of the strongest open-source code models available today.
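
    To show the interface, here is a minimal sketch with the smallest Python-specialized checkpoint via transformers; the larger variants named above follow the same pattern:

    ```python
    # pip install transformers torch accelerate
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "codellama/CodeLlama-7b-Python-hf"  # public Hugging Face checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Ask the model to complete a function from a comment and signature.
    prompt = "# Check whether a number is prime\ndef is_prime(n):"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=80)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```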
