Iteration 15. Prompt tuning
14-04-2024
Goal
Can I improve the LB score by using prompt tuning?
Motivation
So far I have not seen leaderboard differences when using different models. I'm able to fine-tune them and learn the train data without any problem, but it is hard to improve the LB score. One interesting thing is that I'm able to overfit the train data using LoRA with just r=1.
Do I really need fine-tuning for this task? Maybe we can simply learn a good prompt and rely on the power of the model to do the task. I believe it is unlikely that, with the current scale of data (~1k), the model is going to learn a new task. Thus, using the smallest possible number of trainable parameters could be better.
Development
How to do prompt fine-tuning
- Fine-Tuning Models using Prompt-Tuning with Hugging Face’s PEFT Library. This seems very similar to the LoRA fine-tuning.
- Prompt tuning for causal language modeling, official Hugging Face
- Prompt tuning official documentation
Experiment design
The idea is to take Mistral-7B and do prompt tuning in a setup as similar as possible to the previous fine-tuning. I will compare the results to the previous iteration, which was centered on data.
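As a reference, here is a minimal sketch of how prompt tuning can be configured with the PEFT library. The base model name and the number of virtual tokens are assumptions based on the experiments reported below; the training loop itself is omitted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

# Assumed base model; the experiments below use Mistral-7B and Mixtral.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Only the virtual token embeddings are trained; the base model stays frozen.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=8,  # 8 and 64 are tried in the experiments below
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # a few thousand parameters vs 7B
```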
Inference
It seems that `load_adapter` does not work. I have to use `PeftModel` instead, and I created a new notebook to make these submissions.
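A minimal sketch of loading the prompt-tuning checkpoint with `PeftModel` for inference; the checkpoint path and prompt are hypothetical placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", device_map="auto"
)
# Hypothetical path to the prompt-tuning checkpoint saved during training.
model = PeftModel.from_pretrained(base_model, "output/prompt-tuning-checkpoint")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```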
Results
Shuffle the train dataset
When running multiple experiments with different learning rates and numbers of virtual tokens, I noticed that the train loss of the different runs followed a very similar pattern.
That made me realize that the training was not shuffling the data. So I may have to revisit the previous fine-tuning experiments and add data shuffling, as sketched below.
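Shuffling the training data is a one-liner with the `datasets` library; the file name here is a hypothetical placeholder, since the actual data path is not shown in this log.

```python
from datasets import load_dataset

# Hypothetical file name; replace with the actual train data path.
train_dataset = load_dataset("json", data_files="train.json")["train"]
# Shuffle once with a fixed seed so batches mix examples from the whole dataset.
train_dataset = train_dataset.shuffle(seed=42)
```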
The model trained with shuffled data consistently outperforms the other model.
No improvement on leaderboard
I have made a few submissions and the results did not improve over fine-tuning. Moreover, one of the submissions seemed very brittle: a tiny change, adding one space to the submission, resulted in very different predictions.
| base_model | virtual tokens | train steps | train loss | val loss | LB score |
|---|---|---|---|---|---|
| mistral | 8 | 250 | 0.97 | 1 | 0.61 |
| mistral | 64 | 250 | 1.1 | 1.15 | 0.61 |
| mistral | 8 | 1000 | 0.61 | 0.78 | 0.56 |
| mixtral | 8 | 1000 | 0.96 | 1 | 0.6 |
Conclusion
Prompt-tuning is not the solution we are looking for.