Iteration 14. Speed up training with unsloth
01-09-2024
Goal
Can we speed up training using unsloth?
Motivation
In previous iterations I tried different learning rate schedules and batch sizes to speed up training, without success. The unsloth library might be an easy way to speed up training.
If I can speed up training, experiments will iterate faster, and at the same time I could afford a longer test-time fine-tuning when making a submission.
Hopefully this will be a very fast iteration.
Development
Install unsloth
I'm going to create a new local conda environment following their installation instructions.
I had to lower the Python version to 3.10; with 3.11 pip could not find a compatible distribution of xformers.
conda create --name arc-unsloth \
python=3.10 \
pytorch-cuda=12.1 \
pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers
conda activate arc-unsloth
python -c "import torch; print(torch.cuda.is_available())"
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
pip install wandb termcolor
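Before touching the training code, a quick sanity check of the new environment. This snippet is just my own check, not part of the unsloth instructions:

```python
# Sanity check for the arc-unsloth environment: fails fast if unsloth or CUDA
# are not correctly installed.
import torch
from unsloth import FastLanguageModel  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
```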
I accidentally removed conda while trying to clean my base environment :(
Regenerating the original arc environment:
conda create -n arc pytest rope pylint tqdm numpy pandas scikit-learn ipython ipykernel coverage ipywidgets matplotlib python=3.10 -y
conda activate arc
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install vllm
Fine-tuning script
As a first step I'm going to duplicate the fine-tuning script and adapt it to use unsloth. Once I validate that it works and is faster, I will look for a way to merge everything back into a single script.
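As a reference for the kind of change involved, below is a minimal sketch of how the duplicated script could load the model with unsloth and hand it to trl's SFTTrainer. The base model name, LoRA hyperparameters and toy dataset are placeholders, not the actual configuration of my fine-tuning script.

```python
from unsloth import FastLanguageModel  # import unsloth first so it can patch transformers
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Toy dataset just to make the sketch self-contained; the real script builds
# its own ARC prompts.
train_dataset = Dataset.from_dict({"text": ["### Task\n...\n### Answer\n..."]})

# unsloth loads the base model with its patched, faster implementation.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2-0.5B-Instruct",  # hypothetical base model
    max_seq_length=4096,
    dtype=None,        # let unsloth pick float16/bfloat16 for the GPU
    load_in_4bit=False,
)

# Drop-in replacement for peft's get_peft_model; hyperparameters are placeholders.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=7,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=16,
        learning_rate=1e-4,
        max_steps=100,
        logging_steps=10,
        output_dir="outputs",
        report_to="none",
    ),
)
trainer.train()
```

The rest of the training loop, logging and evaluation should stay the same, which is why I expect the unsloth version to be a small diff over the current script.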
Results
Conclusion
Next steps
TODO
- It only supports 1 GPU! https://github.com/unslothai/unsloth/issues/547 (see the device pinning sketch below)
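Since unsloth is single-GPU only, on a multi-GPU machine the fine-tuning script will probably need to pin a single device before torch is imported. A sketch of what I mean, with an arbitrary device index:

```python
# Pin the process to a single GPU so unsloth only sees one device.
# Must run before torch/unsloth are imported; "0" is an arbitrary choice.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should print 1
```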