About me

I am a fourth-year PhD student in the Department of Computer Science and Technology at Tsinghua University, fortunate to be supervised by Professors Maosong Sun and Zhiyuan Liu. In the summer of 2019, I visited MILA and conducted research under Professor Jian Tang.

My current passion revolves around building SCALABLE solutions to AGI, meaning solutions that improve simply by adding more computation and data. These include:

  1. Scaling Models. These include:
    1. Scaling Laws. Ensuring that the growth of language models’ abilities is measurable and predictable.
    2. Scalable Supervision for LLMs. As we run out of pretraining data and step into super-human tasks, developing mechanisms for scalable oversight and dynamic evaluation becomes important.
    3. Scalable and Unified Foundation Models. Unifying modalities and unifying training pipelines; using generative models for everything.
    4. Scaling Up Models’ Reception. Long context, and ultimately lifelong context.
  2. Scaling Data. These include:
    1. Scaling Data for LLMs. Scaling pretraining data, human expert data, synthetic data, etc. Data is all you need for AGI.
    2. Scalable Data for LWMs. Large world models will be the next milestone. Have we prepared enough data for them?
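As a toy illustration of the scaling-law point, a power-law fit over model size makes loss extrapolation concrete. Everything below is hypothetical: the functional form L(N) = A / N^α, the constants, and the model sizes are illustrative, not taken from any particular paper.

```python
import math

# Hypothetical loss measurements generated from a power law in
# parameter count N: L(N) = A / N**alpha (no noise, for clarity).
true_A, true_alpha = 400.0, 0.34
sizes = [1e7, 1e8, 1e9, 1e10]
losses = [true_A / n ** true_alpha for n in sizes]

# In log space the law is linear: log L = log A - alpha * log N,
# so an ordinary least-squares fit recovers the exponent.
xs = [math.log(n) for n in sizes]
ys = [math.log(l) for l in losses]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
alpha_hat = -slope
A_hat = math.exp(my - slope * mx)

# Extrapolate to a larger, unseen model size: the "predictable" part.
pred_loss = A_hat / 1e11 ** alpha_hat
```

With clean data the fit recovers α exactly; in practice, noisy measurements and extra terms (e.g. an irreducible-loss offset) make the fit harder, which is exactly why predictability is a research goal rather than a given.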

In fact, I believe that we will not have achieved AGI until the model is capable of conducting scientific research independently. All the aforementioned points contribute to this objective.

The nature of intelligence has always interested me. Currently I think it might take the form:

\[I := - \int_{\mathcal{P}} E_0 \frac{\mathrm{d}E}{E}\]

where $I$ is the amount of intelligence, $E$ is information entropy, $E_0$ is the information entropy before applying intelligence, and $\mathcal{P}$ is the joint probability of the information over the world. Note that this definition only expresses a belief; it is neither rigorous nor verified.
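Under one possible (equally non-rigorous) reading, where the integral runs over the entropy $E$ from its initial value $E_0$ down to a final value $E_1$, the expression admits a simple closed form:

\[I = -\int_{E_0}^{E_1} E_0 \, \frac{\mathrm{d}E}{E} = E_0 \ln \frac{E_0}{E_1},\]

so $I$ is positive exactly when applying intelligence reduces entropy, and halving the entropy yields $I = E_0 \ln 2$.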

News!

🔥 2024.2 We posted an elegant work on training dynamics.

🔥 2024.2 We released MiniCPM, a small LLM with 2.4B non-embedding parameters that rivals Llama-13B and Mistral-7B.

Selected Publications

For other papers, please refer to my Google Scholar profile.

Selected Projects

[GitHub readme cards for selected repositories]

Awards

Master & PhD

Bachelor

  • Academic Excellence Award, 2016-2017, 2017-2018, 2018-2019.
  • Good Reading Scholarship, 2018-2019.
  • Zhuzhou Scholarship, 2017-2018.

Earlier

  • Silver Medal in the 32nd National Middle School Physics Competition