AI2 Releases Open Language Model OLMo, Claiming Performance Comparable to Llama 2
-
AI2's newly released Open Language Model (OLMo) framework aims to promote research and experimentation on large-scale language models. By offering training code, models, and evaluation code on Hugging Face and GitHub, AI2 aims to enable academics and researchers to collaboratively study the science of language models, explore the impact of new pre-training data subsets on downstream performance, and investigate new pre-training methods and training stability.
The first batch of models in this project includes four final variants at the 7B scale, corresponding to different architectures, optimizers, and training hardware, as well as one model at the 1B scale, all trained on at least 2T tokens. This is the first step in a long-term plan, with intentions to continue releasing larger-scale models, instruction-tuned models, and further variants. Each model comes with its complete training data, including the code that generates that data, as well as AI2's Dolma and WIMBD tools for analyzing the pre-training data. Also provided are full model weights, training code, training logs, training metrics in the form of Weights & Biases logs, and inference code. More than 500 checkpoints from each model's training run are also available as revisions on Hugging Face.
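As an illustration of how those checkpoint revisions can be pulled down, the sketch below uses the standard Hugging Face transformers API with the allenai/OLMo-7B repository; the revision string is a placeholder, since the actual checkpoint names are listed on the model page.

```python
# Sketch: pinning one of the intermediate OLMo training checkpoints that are
# published as Hugging Face revisions. The revision string below is a
# placeholder -- the real names are listed on the model repository page.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",
    revision="step-XXXX",    # hypothetical; substitute a listed checkpoint revision
    trust_remote_code=True,  # may be needed if your transformers version lacks built-in OLMo support
)
```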
In creating powerful open models, AI2 has drawn lessons from many other open and partially open models, using them as competitive baselines for OLMo. The project's technical report notes that the OLMo 7B model outperforms Llama 2 on generative tasks and reading-comprehension benchmarks (e.g., TruthfulQA), but slightly lags behind on popular question-answering tasks such as MMLU and BIG-bench Hard.
For the 1B OLMo model, AI2 used its Paloma benchmark and the checkpoints available on GitHub to explore the relationship between a model's ability to predict language and factors such as model scale. AI2 emphasizes that Paloma attempts to represent the many domains in which language models are used more evenly, by sampling uniformly across fields. The OLMo architecture incorporates many trends from recent literature, including omitting bias terms from layers (as in PaLM, for training stability), the SwiGLU activation function used by PaLM and Llama, Rotary Position Embeddings (RoPE), and a modified version of GPT-NeoX-20B's BPE-based tokenizer designed to reduce personally identifiable information.
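As a rough illustration of those architectural choices (not AI2's implementation, which is available in the OLMo GitHub repository), a bias-free SwiGLU feed-forward block and rotary position embeddings might be sketched in PyTorch as follows:

```python
# Illustrative sketch only -- not AI2's code. It shows the three architectural
# choices mentioned above: bias-free linear layers, a SwiGLU feed-forward
# block, and rotary position embeddings (RoPE).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation (as in PaLM and Llama)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Bias terms are omitted, following the stability argument noted above.
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a (batch, seq, heads, head_dim) tensor."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-dimension rotation frequencies and per-position angles.
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]  # (1, seq, 1, half)
    sin = angles.sin()[None, :, None, :]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```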
This release represents just the beginning for OLMo and its framework, with future plans to introduce work across different scales, modalities, datasets, safety measures, and evaluations. AI2 encourages the use of the OLMo models, providing straightforward installation steps and usage examples, and has indicated that future releases will include instruction-tuned models, complete training logs, and Weights & Biases (wandb) reports.
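As a sketch of the kind of quick-start usage described (assuming the ai2-olmo package, which at release provided the hf_olmo integration for transformers, and the allenai/OLMo-7B repository id; the prompt and generation parameters are illustrative):

```python
# Quick-start sketch. Assumes `pip install ai2-olmo transformers`; see AI2's
# repositories for the authoritative installation steps.
import hf_olmo  # noqa: F401 -- registers the OLMo architecture with transformers (assumption: provided by ai2-olmo)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")

# Generate a short continuation from an illustrative prompt.
prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```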
Blog URL: https://blog.allenai.org/olmo-open-language-model-87ccfc95f58
Project Portal: https://top.aibase.com/tool/olmo