Llemma is a large language model for mathematics. On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further fine-tuning.
The Llemma models show improved mathematical capabilities and can be adapted to various tasks through prompting or additional fine-tuning.
These models are particularly strong at chain-of-thought mathematical reasoning and at using computational tools for mathematics, such as Python and formal theorem provers.
Link: