About me
I am a fourth-year PhD student in the Department of Computer Science and Technology, Tsinghua University. I am fortunate to be supervised by Professors Maosong Sun and Zhiyuan Liu. In the summer of 2019, I visited MILA and conducted research under Professor Jian Tang.
My current passion revolves around building SCALABLE solutions to AGI, meaning solutions that improve simply by adding more computation and data. This includes:
- Scaling Models. This includes:
  - Scaling Laws. Ensuring that the growth of language models' abilities is measurable and predictable.
  - Scalable Supervision for LLMs. As we run out of pretraining data and step into super-human tasks, developing mechanisms for scalable oversight and dynamic evaluation becomes important.
  - Scalable and Unified Foundation Models. Unifying modalities and unifying training pipelines; using generative models for everything.
  - Scaling Up Models' Reception. Long context and lifelong context.
- Scaling Data. This includes:
  - Scaling Data for LLMs. Scaling pretraining data, human expert data, synthetic data, etc. Data is all you need for AGI.
  - Scalable Data for LWMs. Large world models will be the next milestone. Have we prepared enough data for them?
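To make the "measurable and predictable" point concrete: scaling laws are typically checked by fitting a power law L(N) = a·N^(-b) to (model size, loss) pairs and extrapolating. Below is a minimal sketch with synthetic data; the function name and the numbers are my own illustration, not results from any of the papers listed here.

```python
# Minimal sketch: fit a power-law scaling curve loss = a * N**(-b)
# to (model size, loss) pairs via a linear fit in log-log space.
import numpy as np

def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b); returns (a, b)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic data following an exact power law, for illustration only.
sizes = np.array([1e7, 1e8, 1e9, 1e10])
losses = 5.0 * sizes ** -0.1

a, b = fit_power_law(sizes, losses)
# With the fitted (a, b), the loss of a larger model can be predicted
# before training it: predicted_loss = a * (1e11) ** (-b)
```

In practice the fit usually includes an irreducible-loss offset and is done on held-out compute budgets, but the log-log regression above is the core idea.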
In fact, I believe we will not have achieved AGI until models can conduct scientific research independently. All of the aforementioned points contribute to this objective.
The nature of intelligence has always interested me. Currently, I think it might take the form:

\[I := - \int_{\mathcal{P}} E_0 \frac{\mathrm{d}E}{E}\]

where $I$ is the amount of intelligence, $E$ is information entropy, $E_0$ is the information entropy before applying intelligence, and $\mathcal{P}$ is the joint probability of the information over the world. Note that this definition only expresses a belief; it is neither rigorous nor verified.
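Read literally, and under my own simplifying assumptions that $E_0$ is a constant and the integral runs along a single path from $E_0$ down to $E$ (the original leaves the domain $\mathcal{P}$ implicit), the integral evaluates to:

\[I = -E_0 \int_{E_0}^{E} \frac{\mathrm{d}E'}{E'} = E_0 \log \frac{E_0}{E},\]

i.e., intelligence would be proportional to the logarithmic reduction in entropy, weighted by the initial entropy. This is only one possible reading of the belief stated above.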
News!
🔥 2024.2 We posted an elegant work on training dynamics.
🔥 2024.2 We released MiniCPM, a small LLM with 2.4B non-embedding parameters that rivals Llama-13B and Mistral-7B.
Selected Publications
A simple and beautiful mechanistic interpretability work, providing a perspective for understanding fascinating phenomena in model scaling and data scaling, including Grokking, Double Descent, and Emergent Abilities.
Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
MiniCPM: Unveiling the Potential of End-side Large Language Models. Blog
A small LLM with 2.4B non-embedding parameters that rivals Llama-13B and Mistral-7B, accompanied by several scaling experiments. The WSD learning rate scheduler is proposed as a substitute for the cosine scheduler. Trending on GitHub and Hugging Face.
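The WSD (Warmup-Stable-Decay) schedule mentioned above can be sketched as follows. The three-stage shape is the idea from the work; the linear decay form and the stage lengths below are illustrative assumptions, not MiniCPM's exact hyperparameters.

```python
# Sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule:
# linear warmup to the peak rate, a long "stable" stage held at the
# peak, then a short decay stage back toward zero. The linear decay
# used here is one simple choice; the actual decay form may differ.

def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps):
    """Learning rate at `step` (0-indexed) under a WSD schedule."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: drop from peak_lr toward 0 over decay_steps.
    progress = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * max(0.0, 1.0 - progress)
```

Unlike a cosine schedule, the stable stage has no fixed endpoint, so training can be extended with more data and only a short decay run is needed to produce a deployable checkpoint.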
A benchmark that challenges AGI with humanity’s most eminent intellectual contests, serving as a beacon for future AGI development and a platform for studying scalable oversight.
Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens. Arxiv
A benchmark for super-long contexts. Long context is almost everything (best done efficiently).
Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun
Predicting Emergent Abilities with Infinite Resolution Evaluation. Preprint
The first work besides GPT-4 to achieve predictable scaling.
Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
Won’t Get Fooled Again: Answering Questions with False Premises
ACL Oral Presentation
Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun
Tool Learning with Foundation Models. Preprint
A 75-page study of LLM tool use.
Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, …(33 other co-authors), Zhiyuan Liu, Maosong Sun.
OpenPrompt: An Open-source Framework for Prompt-learning ACL 2022 Demo
ACL 2022 Best Demo Award
Ning Ding*, Shengding Hu*, Weilin Zhao*, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification. ACL 2022.
More than 200 citations
Shengding Hu, Ning Ding, Huadong Wang, Zhiyuan Liu, JinGang Wang, Juanzi Li, Wei Wu, Maosong Sun
Delta tuning: A comprehensive study of parameter-efficient methods for pre-trained language models. Preprint
Nature Machine Intelligence Cover Article
Ning Ding*, Yujia Qin*, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun
Graph Neural Networks: A Review of Methods and Applications. AI Open 2021.
More than 4000 citations
Jie Zhou*, Ganqu Cui*, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun.
For other papers, please refer to my Google Scholar.
Selected Projects
Awards
Master & PhD
Ranked 1/83 across the world; prize of $30,000.
National Scholarship 2021.
One of the highest awards at Tsinghua University.
Bachelor
- Academic Excellence Award in 2016-2017, 2017-2018, and 2018-2019.
- Good Reading Scholarship in 2018-2019.
- Zhuzhou Scholarship in 2017-2018.
Earlier
- Silver Medal in the 32nd National Middle School Physics Competition