
Fine-Tune LLaMA 3 on a Custom Dataset

Fine-tuning LLaMA 3, one of the most capable open language models available, is more approachable than it might seem. In this post we will walk through how to fine-tune LLaMA 3 on a custom dataset with the help of the Unsloth library.


Prerequisites

First, make sure a GPU is enabled; without one, fine-tuning the LLaMA 3 model will take a very long time. The dataset we will use is the cleaned Alpaca dataset, which follows the Alpaca format: each record contains an instruction, an input, and an output.
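Before going further, it helps to confirm that a CUDA GPU is actually visible and to see what a single record in this format looks like. Here is a minimal sketch; the example record below is illustrative, not taken from the dataset:

import torch

# Confirm that a CUDA-capable GPU is available before training.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# An illustrative record in the Alpaca format: instruction, optional input, output.
example_record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Unsloth is a library that speeds up fine-tuning of large language models.",
    "output": "Unsloth accelerates LLM fine-tuning.",
}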

Installing Required Packages

To fine-tune the LLaMA 3 model, the following packages have to be installed: Unsloth, Transformers, TRL, PEFT, Accelerate, and bitsandbytes. These libraries are needed for both training and inference.

Here’s the command to install the packages:

# Exact version pins may vary with the Unsloth release you use.
pip install unsloth
pip install transformers trl peft accelerate bitsandbytes

Loading and Configuring the Model

Next is loading and configuring the model. We will set the maximum sequence length to 2048 and the data type to float16, and we will load the LLaMA 3 model through Unsloth's FastLanguageModel, using 4-bit quantized weights to keep memory usage low.

Here’s the code:

import torch
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = torch.float16  # or None to let Unsloth auto-detect float16/bfloat16

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # 4-bit quantized LLaMA 3 8B from Unsloth
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = True,
)

Applying Parameter-Efficient Fine-Tuning using LoRA

Next, let's apply a parameter-efficient fine-tuning strategy called LoRA. We take the model, set the rank to 16, and specify the modules to adapt (the attention and MLP projection layers). We also set LoRA alpha to 16, LoRA dropout to 0, and the bias option to "none".

Here’s the code:

model = FastLanguageModel.get_peft_model(
    model, r = 16, lora_alpha = 16, lora_dropout = 0, bias = "none",
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
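As a quick optional sanity check, you can compare how many parameters are actually trainable after the LoRA wrapping; only the adapter weights should require gradients:

# Count trainable (LoRA) parameters versus the total parameter count.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")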

Defining the Prompt Template

Next on the agenda is defining the template that every training example will be formatted with. We will use the Alpaca prompt format, which includes an instruction, an input, and a response. We will also append the tokenizer's end-of-sequence token to each example so the model learns where a response ends.

Here’s the code:

# Alpaca-style prompt with positional slots for instruction, input, and response.
alpaca_prompt = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}\n"

Loading and Formatting the Dataset

We'll load the cleaned Alpaca dataset and apply the prompt template to each row, processing the rows in batches with the dataset's map method.

Here’s the code:

from datasets import load_dataset

EOS_TOKEN = tokenizer.eos_token  # appended so the model learns where a response ends

def formatting_prompts_func(examples):
    # Format each (instruction, input, output) triple with the Alpaca template.
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}

dataset = load_dataset("yahma/alpaca-cleaned", split = "train")  # cleaned Alpaca dataset
dataset = dataset.map(formatting_prompts_func, batched = True)
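Before training, it is worth printing one processed record to confirm that the template and the end-of-sequence token were applied correctly:

# Inspect the first formatted training example.
print(dataset[0]["text"])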

Setup and Training the Model

Next, we set up and train the model. We import the SFTTrainer from TRL and give it the model, tokenizer, training dataset, and maximum sequence length, along with the training arguments. The training arguments we set are the per-device batch size, gradient accumulation steps, warm-up steps, max steps, learning rate, logging steps, and weight decay.

Here’s the code:

from trl import SFTTrainer
from transformers import TrainingArguments

# Note: newer TRL versions move dataset_text_field and max_seq_length into SFTConfig;
# the layout below follows the older SFTTrainer keyword style.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 100,
        max_steps = 1000,
        learning_rate = 1e-4,
        logging_steps = 10,
        weight_decay = 0.01,
        output_dir = "outputs",
    ),
)

trainer.train()
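If you want to keep the trained LoRA adapters for later use, a minimal save step looks like the following; the directory name here is just an example:

# Save the LoRA adapter weights and the tokenizer to a local directory.
model.save_pretrained("llama3_lora_adapters")
tokenizer.save_pretrained("llama3_lora_adapters")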

Inference and Generation

Once the training is complete, we'll switch the model into inference mode and build an input prompt for generation.

Here’s the code:

# Switch Unsloth's model into its optimized inference mode.
FastLanguageModel.for_inference(model)

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Provide the first 10 elements of the Fibonacci series.",  # instruction
            "",  # input
            "",  # response - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs))
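Optionally, if you prefer to watch tokens appear as they are generated instead of decoding everything at the end, the TextStreamer from Transformers can be passed to generate:

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)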

 
