Iteration 19. Plan the last submissions
16-04-2024
Goal
Plan the last submissions to be as successful as possible.
Motivation
Only one day left until the end of the challenge, which means I have 5 submissions remaining. I'm far from the top positions, but there might be a shakeup.
It's 12:00 and until now I thought the competition was ending tomorrow and that I had 10 submissions left. I should pay more attention to the deadline.
Development
Number of epochs
- Mistral_v3 -> 450 steps, batch size 16 -> 7200 train samples, max steps 940: https://wandb.ai/guillermobarbadillo/huggingface/runs/ulrk5o38
- Mixtral_v5 -> 114 steps, batch size 16 -> 1824 train samples, max steps 1520: https://wandb.ai/guillermobarbadillo/huggingface/runs/b8rxqez9
- Mixtral-22b -> 450 steps, batch size 16 -> 7200 train samples, max steps 1000: https://wandb.ai/guillermobarbadillo/other_models/runs/kh0hr94u?nw=nwuserguillermobarbadillo
- public datasets -> 50 steps, batch size 16 -> 800 train samples, max steps 250: https://wandb.ai/guillermobarbadillo/datacentric_mistral/runs/8sc27bj9?nw=nwuserguillermobarbadillo
The data above uses the effective batch size: I have found that when using gradient accumulation, a step is counted each time the gradients are actually updated (one optimizer step), not on every forward pass.
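As a sanity check, here is a minimal sketch of the arithmetic behind those numbers; the per-device batch size and accumulation steps are assumptions, only the effective batch size of 16 is known from the runs:

```python
# Sketch: with gradient accumulation, one "step" = one optimizer update,
# so effective batch size = per_device_batch_size * gradient_accumulation_steps.
per_device_batch_size = 4        # assumed value, not confirmed in the logs
gradient_accumulation_steps = 4  # assumed: 4 * 4 = 16 effective batch size
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

steps = 450  # e.g. the Mistral_v3 run
print(steps * effective_batch_size)  # 7200 train samples, matching the run above
```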
The number of steps varies a lot between runs, from 50 to 450. I'm going to train for 500 steps and decide later which checkpoint to use for submission.
After visualizing the training metrics, I have decided to take the checkpoints from step 200.
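For reference, a minimal sketch of loading the step-200 checkpoint, assuming the Hugging Face Trainer checkpoint layout and a LoRA adapter; the model name and output path are hypothetical:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Assumption: the Trainer saved adapters under output_dir/checkpoint-<step>
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base, "output/checkpoint-200")
```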
Training data
I believe the best possible training data would be:
- mooney_test_with_gpt4,
- gemma_suppl_rewrite_curated_with_gpt4,
- high_quality_dataset_v2-4, but removing duplicated prompts
This results in around 1k training samples, so training for 500 steps at an effective batch size of 16 (8000 samples) amounts to around 8 epochs.
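A minimal sketch of how the merge and deduplication could look; the file names and the `prompt` field are assumptions, not the actual pipeline:

```python
import json

# Hypothetical file names for the three datasets listed above
paths = [
    "mooney_test_with_gpt4.json",
    "gemma_suppl_rewrite_curated_with_gpt4.json",
    "high_quality_dataset_v2-4.json",
]

seen, merged = set(), []
for path in paths:
    with open(path) as f:
        for sample in json.load(f):
            if sample["prompt"] not in seen:  # keep the first occurrence only
                seen.add(sample["prompt"])
                merged.append(sample)

print(len(merged))  # expect ~1k samples
```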
LoRA configuration
I'm going to use r=1, because training and validation loss were almost the same as with r=16, and the leaderboard score was good.
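A minimal sketch of the corresponding PEFT configuration; only r=1 is confirmed above, the other parameters are assumptions:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=1,                                  # rank 1 matched r=16 in loss and LB score
    lora_alpha=16,                        # assumed, not stated in the post
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```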
Models
These are my candidate models for the final ensemble:
- Mistral-7B
- Llama-13B
- Mistral-22B
- Mixtral-8x7B
I will train them all and later try to find the best possible combination.
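As a sketch of what searching for the best combination could look like, with a hypothetical validation_score function standing in for running each ensemble on validation data:

```python
from itertools import combinations
import random

candidates = ["Mistral-7B", "Llama-13B", "Mistral-22B", "Mixtral-8x7B"]

def validation_score(models):
    # Placeholder: in reality this would run the ensemble of `models`
    # on a validation set and return the competition metric.
    return random.random()

# Evaluate every non-empty subset of candidates and keep the best one
best = max(
    (combo for size in range(1, len(candidates) + 1)
     for combo in combinations(candidates, size)),
    key=validation_score,
)
print(best)
```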
Submission planning
I can only make 5 submissions:
- A single model with the maximum number of predictions possible; mistral_v3 is a candidate for this, e.g. mistral_v3x10
- The four models from previous training runs, using sampling (MMML v1)
- The above with the mean prompt
- The four models trained in this iteration, with one prediction each, using sampling; that would take 6 hours (MMML v2)
- The above with the mean prompt
The deadline is at 23:59 UTC, which is 2 hours later in my timezone. If I make the submission at 18:00, it has around 8 hours to run before the deadline; that should be enough, but there might be a GPU crunch.
Results
| submission | public LB | private LB |
|---|---|---|
| MMML v1 | 0.64 | 0.65 |
| MMML v1 + 0.63 prompt | 0.66 | 0.67 |
Conclusion
The challenge has ended; I'm in a provisional 20th position.
Next steps
Prepare a summary of the solution.
TODO
- How many predictions is it optimal to combine with the mean prompt? I believe that without the mean prompt, more is always better. I know that 4 predictions are better than 3 for Mixtral_v5.
- Submit a model with
- Is it better to make a lot of submissions with Mistral, or to combine different models? Intuition says to use different models.