Iteration 18. Train models for submission
03-09-2024
Goal
Train a set of models with the submission in mind. This implies using all the available data for training and skipping the validation step.
Motivation
It is likely that training on all the available data would bring small improvements on the leaderboard. Solving one extra problem could be the difference between winning a prize and not.
Development
12k training steps could be a good duration when using the original ARC dataset. It is likely that when adding new data I could train for longer and get better results.
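For reference, a minimal sketch of what this schedule could look like, assuming a Hugging Face `Trainer` setup; every value except the 12k step budget is an illustrative assumption, not a confirmed setting:

```python
# Hypothetical training schedule for the submission models.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/submission",   # hypothetical output path
    max_steps=12_000,                 # baseline budget for the original ARC dataset
    per_device_train_batch_size=16,   # assumption, not tuned in this iteration
    learning_rate=1e-4,               # assumption
    evaluation_strategy="no",         # all data goes to training, no validation
    save_strategy="steps",
    save_steps=2_000,
)
```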
Initially I will use the whole ARC dataset. That way I gain an additional 100 evaluation tasks that could boost the leaderboard score. I will add new data once I verify that it is beneficial.
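For illustration, a minimal sketch of building the training pool from the full ARC dataset, assuming the Kaggle-style JSON layout; the file names are assumptions and may differ locally:

```python
import json

def load_tasks(path: str) -> dict:
    """Return a dict mapping task_id to its train/test examples."""
    with open(path) as f:
        return json.load(f)

train_tasks = load_tasks("arc-agi_training_challenges.json")
eval_tasks = load_tasks("arc-agi_evaluation_challenges.json")

# Training for submission: no held-out split, merge everything.
all_tasks = {**train_tasks, **eval_tasks}
print(f"Training on {len(all_tasks)} tasks")
```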
Using a LoRA rank higher than 32 might give better results, but I still have to verify it.
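A hedged sketch of where the rank would change, assuming a `peft`-based LoRA setup; the target modules and the alpha-to-rank ratio are assumptions:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                  # current rank; 128 and 512 are the candidates to test
    lora_alpha=64,         # assumption: keep the alpha/r ratio fixed when raising r
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```

Wrapping the base model with `get_peft_model(base_model, lora_config)` would then proceed as usual; only `r` (and `lora_alpha`) need to change between experiments.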
Use the biggest `max_seq_len` possible, because there will be long problems in the test set. I have found that I can use a `max_seq_len` of 10240 without any problem and with very little slowdown, because each prompt is processed independently, so prompts are not padded.
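A minimal sketch of why the large `max_seq_len` is nearly free, assuming a Hugging Face tokenizer; the model name is a placeholder:

```python
from transformers import AutoTokenizer

MAX_SEQ_LEN = 10240
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")  # placeholder model

def encode_prompt(prompt: str):
    # Each prompt is tokenized on its own, so short prompts keep their
    # natural length; nothing is padded up to MAX_SEQ_LEN.
    return tokenizer(prompt, truncation=True, max_length=MAX_SEQ_LEN,
                     return_tensors="pt").input_ids
```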
Results
Conclusion
Next steps
TODO
- Do we see improvements in LB score when increasing `max_seq_len`?
- Do we see improvements when increasing LoRA rank? Is 128 better than 32? Is 512 better than 128? What about training the whole model?
- Verify that I can use a rank-512 LoRA model in the submission without additional changes
- Do we see improvements when training with RE-ARC?
- Is it beneficial to use a bigger number of predictions, e.g. 256?