FLM Family
FLM is a large language model jointly developed by the Cognitive Team (Cofe-AI) of BAAI together with Tsinghua University, ICT, Nanyang Technological University, and the University of Electronic Science and Technology. The project aims to build a cost-effective, fully open-source, and highly capable large model. The FLM series is currently in its second generation.
1. FLM-2
FLM-2 is a more ambitious attempt and the current, second generation of the series.
2. FLM-101B
FLM-101B inherits the structure of FreeLM and employs the growth strategy MSG (Masked Structural Growth) to cut training costs by more than 70%. It also uses loss prediction to determine optimal hyperparameters before the full-scale run. FLM-101B is a significant milestone: it not only validates the feasibility of the individual sub-technologies but also implements them successfully at the system level. We see the relationship between FLM-101B and MSG as analogous to that between GPT-3 and the Transformer architecture: it is not merely a matter of scaling up, but the first successful implementation at the system level.
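The core idea behind growth strategies such as MSG is to train a small model first and then expand it into a larger one while preserving its function, so the large model does not start learning from scratch. Below is a minimal PyTorch sketch of function-preserving width growth for a single linear layer; `grow_linear` is a hypothetical helper for illustration, not FLM's actual implementation.

```python
import torch
import torch.nn as nn

def grow_linear(layer: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Widen a Linear layer, zero-initializing the newly added weights.

    Because the new rows and columns start at zero, the grown layer
    produces exactly the same outputs as the original on inputs whose
    extra dimensions are zero: the growth step is function-preserving.
    """
    old_out, old_in = layer.weight.shape
    grown = nn.Linear(new_in, new_out, bias=layer.bias is not None)
    with torch.no_grad():
        grown.weight.zero_()
        grown.weight[:old_out, :old_in] = layer.weight
        if layer.bias is not None:
            grown.bias.zero_()
            grown.bias[:old_out] = layer.bias
    return grown

# Grow a 4->4 layer to 8->8 and verify that the function is preserved.
small = nn.Linear(4, 4)
big = grow_linear(small, new_in=8, new_out=8)
x = torch.randn(2, 4)
x_grown = torch.cat([x, torch.zeros(2, 4)], dim=-1)
assert torch.allclose(small(x), big(x_grown)[:, :4], atol=1e-6)
```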
For details, please refer to the paper FLM-101B: An Open LLM and How to Train It with $100K Budget.
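Loss prediction follows the same cost-saving spirit: instead of searching hyperparameters at the 101B scale, one runs cheap small-scale experiments and extrapolates the expected loss. The sketch below only illustrates this general idea with a fitted power law; all numbers are fabricated for illustration and do not come from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (model size in billions of params, final loss) pairs
# from small proxy runs; illustrative values only, not real results.
sizes = np.array([0.1, 0.3, 1.0, 3.0])
losses = np.array([3.90, 3.45, 3.05, 2.75])

def power_law(n, a, b, c):
    # Common scaling-law form: L(N) = a * N^(-b) + c.
    return a * np.power(n, -b) + c

params, _ = curve_fit(power_law, sizes, losses, p0=(1.0, 0.3, 2.0))
print(f"predicted loss at 101B params: {power_law(101.0, *params):.3f}")
```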
3. FreeLM
FreeLM is generation 0 of the series; its objective was to validate the feasibility of integrating knowledge-learning stages into language model training.
For details, please refer to the FreeLM paper.
4. Concepts for Large Model Development
Our team’s philosophy on the development of large models is:
- Both system capabilities and research capabilities are essential.
- Without system capabilities, it is impossible to develop large models, because costs cannot be controlled.
- Without research capabilities, one can only follow in the footsteps of others; now that the leaders in large models have chosen to close their source, further breakthroughs become impossible.
We welcome researchers with strong capabilities in both systems and research to contact us!
Easter egg: This page was generated by an early version of FLM-2, without further editing.