Revolutionizing Content Generation: The Dynamic Duo of RLHF Unleashed - PPO and PEFT Fine-Tuned LLMs
PART II
In Part 1, we discussed the basics of RLHF and inference with zero-shot, one-shot, and few-shot prompts. In this part, we will fine-tune a generative AI model (an instruct fine-tuned LLM) that will later serve as the reference model for the reward model. We are going to fine-tune FLAN-T5 using the Parameter-Efficient Fine-Tuning (PEFT) approach.
Now install the required packages for the LLM and datasets.
#installing necessary libraries
!pip install torch==1.13.1 --quiet
!pip install torchdata==0.5.1 --quiet
!pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1 \
    peft==0.3.0 --quiet
Import the necessary libraries.
#importing necessary libraries
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np
Load Dataset and LLM
We'll be using the DialogSum Hugging Face dataset. It includes over 10,000 dialogues with manually labeled summaries and topics.
#loading dataset from hugging face
dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(dataset_name)
dataset
Loading Hugging Face's pre-trained FLAN-T5 model and tokenizer. Note that we will be using the base version of FLAN-T5 (google/flan-t5-base). The torch_dtype=torch.bfloat16 setting determines the numeric precision, and hence the memory footprint, of this model.
#loading Model from hugging face
model_name='google/flan-t5-base'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
It is useful to count the model parameters and determine how many of them are trainable. The following function accomplishes this.
def get_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(get_trainable_model_parameters(original_model))
OUTPUT for full fine-tuning:
trainable model parameters: 247577856
all model parameters: 247577856
percentage of trainable model parameters: 100.00%
Perform Parameter-Efficient Fine-Tuning (PEFT)
Parameter-Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning. PEFT is a broad term encompassing techniques such as Low-Rank Adaptation (LoRA); it is distinct from prompt engineering, and in most contexts the term is used to refer to LoRA. LoRA enables efficient fine-tuning of models with minimal computational resources, sometimes even on a single GPU. After fine-tuning a base LLM for a specific task or use case with LoRA, a new "LoRA adapter" is produced, leaving the original LLM untouched. This adapter is notably small, often just a fraction of the LLM's size (MBs instead of GBs).
During inference, the LoRA adapter is combined with the original LLM to process requests. This approach offers the advantage of reusing the core LLM, resulting in reduced memory requirements for serving multiple tasks and use cases simultaneously.
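To make this concrete, here is a minimal NumPy sketch of the LoRA idea (illustrative only; the dimensions and variable names are hypothetical and not the PEFT library's internals). A frozen weight W is augmented by a low-rank product B @ A scaled by alpha / r, and merging that product into W gives the same output as applying the adapter separately:

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 4, 8  # hypothetical sizes and LoRA rank

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable LoRA matrix A
B = rng.normal(size=(d_out, r)) * 0.01  # trainable LoRA matrix B (zero-init in real LoRA)

x = rng.normal(size=(d_in,))

# Serving with a separate adapter alongside the frozen base:
h_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merging the adapter into the base weight gives the same result:
W_merged = W + (alpha / r) * (B @ A)
h_merged = W_merged @ x

assert np.allclose(h_adapter, h_merged)

# The adapter stores far fewer parameters than the full weight:
print(W.size, A.size + B.size)  # 4096 512
```

Because only A and B are stored, the adapter here holds 512 values against 4,096 in the full weight, which is why LoRA adapters are so cheap to ship and swap per task.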
Prepare the Dataset
We must convert the dialogue-summary (prompt-response) pairs into explicit instructions for the LLM: prepend the instruction Summarize the following conversation. to each dialogue, and append Summary: before the expected output.
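Applied to a toy dialogue (hypothetical content, plain strings only, no tokenizer), the template produces a prompt of the following shape:

```python
start_prompt = 'Summarize the following conversation.\n\n'
end_prompt = '\n\nSummary: '

dialogue = "#Person1#: Hi, how are you?\n#Person2#: Fine, thanks!"  # toy example
prompt = start_prompt + dialogue + end_prompt

print(prompt)
```

The model is trained to continue this prompt with the summary text, so the trailing Summary: marker is where generation picks up.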
def prepare_dataset(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

tokenized_datasets = dataset.map(prepare_dataset, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary'])
PEFT/LoRA Model for Fine-Tuning FLAN-T5
To set up the PEFT/LoRA model for fine-tuning, we wrap the base LLM with a fresh adapter of new layers/parameters. The base LLM remains frozen, and training is narrowed down to the adapter alone. Take a look at the LoRA configuration below, paying attention to the rank (r) hyper-parameter, which determines the dimension of the adapter being trained.
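Before running the configuration, the adapter size can be estimated by hand. This is a back-of-the-envelope sketch assuming google/flan-t5-base dimensions, with LoRA applied to the q and v projections of the encoder self-attention and the decoder self- and cross-attention:

```python
d_model = 768   # hidden size of flan-t5-base
r = 32          # LoRA rank

# q and v projections targeted by LoRA:
encoder_modules = 12 * 2       # 12 encoder blocks, self-attention only
decoder_modules = 12 * 2 * 2   # 12 decoder blocks, self- and cross-attention
num_modules = encoder_modules + decoder_modules  # 72 adapted matrices

# Each adapted module adds an r x d_model matrix A and a d_model x r matrix B.
params_per_module = r * d_model + d_model * r
lora_params = num_modules * params_per_module

print(lora_params)  # 3538944
```

This matches the trainable-parameter count reported by PEFT further down, which is a useful sanity check that the target_modules setting adapted exactly the layers we intended.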
from peft import LoraConfig, get_peft_model, TaskType
lora_config = LoraConfig(
    r=32,  # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5
)

peft_model = get_peft_model(original_model, lora_config)
print(get_trainable_model_parameters(peft_model))
OUTPUT for PEFT:
trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
Train PEFT Adapter
Define the training arguments and create a Trainer instance.
output_dir = f'./peft-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # Higher learning rate than full fine-tuning.
    num_train_epochs=10,
    logging_steps=1,
    max_steps=100        # Caps training at 100 steps regardless of num_train_epochs.
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

peft_trainer.train()
Saving the Tuned Model
peft_model_path = "./peft-tuned-output"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)
Inference Using the Fine-Tuned Model
from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
peft_model = PeftModel.from_pretrained(peft_model_base,
                                       './peft-tuned-output/',
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)

index = 300
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']
prompt = f"""
Summarize the following conversation.
{dialogue}
Summary: """
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

dash_line = '-' * 100
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{baseline_human_summary}')
print(dash_line)
print(f'PEFT MODEL: {peft_model_text_output}')
print(dash_line)
Pushing the Tuned Model to Hugging Face
from huggingface_hub import notebook_login
notebook_login()
peft_model.push_to_hub("flan-t5_fine_tuned_summarization")
