Iteration 49. SmolLM2
04-11-2024
Goal
Does SmolLM2 improve over SmolLM?
Motivation
I have just seen the release of SmolLM2. The biggest change is that the model is prepared to handle inputs of 8192 tokens (and, I believe, even longer inputs of up to 20k tokens). The model also improves accuracy on some benchmarks, so replacing SmolLM with SmolLM2 is very likely to give free improvements.
Development
Tiny modifications to SmolLM2
I'm going to simply increase the `max_position_embeddings` from 8192 to 20480, just like I did with SmolLM, in both `config.json` and `tokenizer_config.json`.
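For reference, a minimal sketch of the edit, assuming the checkpoint has already been downloaded locally (the `MODEL_DIR` path is hypothetical; in `config.json` the field is `max_position_embeddings`, and the analogous cap in `tokenizer_config.json` is `model_max_length`):

```python
import json

# Hypothetical local path to the downloaded SmolLM2 snapshot.
MODEL_DIR = "SmolLM2-135M-Instruct"

# config.json stores the context length as `max_position_embeddings`.
config_path = f"{MODEL_DIR}/config.json"
with open(config_path) as f:
    config = json.load(f)
config["max_position_embeddings"] = 20480
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# tokenizer_config.json caps input length via `model_max_length`.
tok_path = f"{MODEL_DIR}/tokenizer_config.json"
with open(tok_path) as f:
    tok_config = json.load(f)
tok_config["model_max_length"] = 20480
with open(tok_path, "w") as f:
    json.dump(tok_config, f, indent=2)
```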
Experiment design
The idea is to run the exact same training experiment I recently did with SmolLM, but with the new SmolLM2. Hopefully we will see a faster decrease in training loss.
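Since SmolLM2 keeps the same architecture and size as SmolLM, the swap should amount to changing the model id while keeping the rest of the training script untouched; a minimal sketch using the public Hugging Face model ids:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The only change from the previous iteration is the model id.
model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # was HuggingFaceTB/SmolLM-135M-Instruct

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```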
Results
model | accuracy | pass_n | vote_2 | vote_1 |
---|---|---|---|---|
Qwen2.5-0.5B-Instruct | 10.32% | 27.62% | 19.38% | 16.88% |
SmolLM-135M-Instruct | 4.20% | 18.25% | 11.25% | 9.30% |
SmolLM2-135M-Instruct | 5.52% | 20.50% | 12.62% | 9.80% |
We can see a noticeable improvement when using SmolLM2 over SmolLM, but it is still far from the accuracy of the Qwen2.5 model.
Conclusion
SmolLM2 improves over SmolLM, but not enough to compete with Qwen2.5.
Next steps
TODO
- Compare the accuracy on the evaluation dataset against SmolLM