AI Foundation Model Support Platform and Evaluation Technology
November 1, 2022

Anchored in large-model development, this project aims to advance the frontier technologies of China's ultra-large-scale intelligent models and to promote AI-enabled economic and social development. It builds an internationally leading open-source technology system for foundation models and establishes an open AI innovation ecosystem centered on them.
Publications
To lower the computational cost of training large models, we focus on speeding up pre-training by progressively growing a small Transformer structure into a large one.
Yiqun Yao,
Zheng Zhang,
Jing Li,
Yequan Wang
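The progressive-growth idea in the abstract above can be illustrated with a minimal sketch. This is a hypothetical, simplified illustration (not the paper's actual method or code): it models a network as a stack of toy weight layers and doubles its depth by duplicating the already-trained stack, a common stacking-style growth strategy. All names (`init_layer`, `grow_by_stacking`) are invented for illustration.

```python
import copy
import random

def init_layer(dim, rng):
    """A toy 'layer', represented as a dim x dim weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

def grow_by_stacking(layers):
    """Double the depth by duplicating the trained stack (stacking-style growth),
    so the larger model starts from the smaller model's learned weights."""
    return layers + copy.deepcopy(layers)

rng = random.Random(0)
small = [init_layer(4, rng) for _ in range(2)]   # 2-layer "small" model
large = grow_by_stacking(small)                  # grown 4-layer "large" model
assert len(large) == 2 * len(small)
assert large[2] == small[0]  # new layers are initialized from trained ones
```

The point of such growth schedules is that early training steps are spent on a cheap shallow model, and the deeper model inherits its weights rather than starting from scratch.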
As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model.
Xiang Li,
Yiqun Yao,
Xin Jiang,
Xuezhi Fang,
Yequan Wang,
Zhongjiang He,
Zhongyuan Wang,
Xuelong Li,
Tiejun Huang