In the vast realm of machine learning, fine-tuning stands out as one of the most crucial techniques for adapting pre-trained models to new tasks. Ludwig, a deep learning toolkit, offers a diverse palette of fine-tuning strategies that cater to different needs. In this blog, we'll delve into these techniques, especially focusing on the Quantization-Based Fine-Tuning (QLoRA) method, as we explore the Code Alpaca project's efforts in instruction-based code generation using LLaMA models.
Before diving into the specifics of our journey with Code Alpaca, let's set the stage by understanding the three primary fine-tuning approaches provided by Ludwig:
Full Fine-Tuning:
Parameter Efficient Fine-Tuning (PEFT):
Quantization-Based Fine-Tuning (QLoRA):
Now that we've covered the fine-tuning landscape, it's time to introduce the Code Alpaca project and set the stage for our experiments.
To get started, we'll first import the necessary libraries and load the dataset:
```python import logging import os import torch import yaml from ludwig.api import LudwigModel import warnings import numpy as np; np.random.seed(123) import pandas as pd
df = pd.read_json("https://raw.githubusercontent.com/sahil280114/codealpaca/master/data/code_alpaca_20k.json") ```
With an aim to advance instruction-based code generation, Code Alpaca harnesses Ludwig's capabilities, starting with a zero-shot approach.
This method utilizes pre-trained models without any additional fine-tuning, leveraging their existing knowledge to generate code based on the given instructions.
The zero-shot model configuration looks like this:
```python
zero_shot_config = yaml.safe_load( """ model_type: llm base_model: openlm-research/open_llama_3b_v2
input_features: - name: instruction type: text
output_features: - name: output type: text
prompt: template: >- Below is an instruction that describes a task, paired with an input that may provide further context. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:
generation: temperature: 0.1 # Temperature is used to control the randomness of predictions. max_new_tokens: 512
preprocessing: split: type: fixed
quantization: bits: 4 """ ) ```
The zero_shot_config specifies the model type, base model, and other parameters like input features, prompt templates, generation settings, and quantization details. After setting this configuration, we proceed with training:
python
zero_shot_model = LudwigModel(config=zero_shot_config, logging_level=logging.INFO)
results = zero_shot_model.train(dataset=df, skip_save_model=False)
With a zero-shot model in place, the next step is to fine-tune it. For this, we employ the QLoRA technique combined with Ludwig's fine-tuning capabilities:
```python qlora_fine_tuning_config = yaml.safe_load( """ model_type: llm base_model: openlm-research/open_llama_3b_v2
input_features: - name: instruction type: text
output_features: - name: output type: text
prompt: template: >- Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:
generation: temperature: 0.1 max_new_tokens: 512
adapter: type: lora
quantization: bits: 4
trainer: type: finetune epochs: 20 batch_size: 1 eval_batch_size: 2 gradient_accumulation_steps: 16 learning_rate: 0.00001 optimizer: type: adam params: eps: 1.e-8 betas: - 0.9 - 0.999 weight_decay: 0 learning_rate_scheduler: warmup_fraction: 0.03 reduce_on_plateau: 0 """ )
model = LudwigModel(config=qlora_fine_tuning_config, logging_level=logging.INFO) results = model.train(dataset=df) model.save("fine_tuned_model") ```
The qlora_fine_tuning_config mirrors the zero-shot configuration but introduces additional parameters for the adapter type (LoRA), quantization settings, and various training parameters.
In our journey to understand the effectiveness of Zero Shot and Fine-Tuned models, it's essential to evaluate them on a diverse set of examples. Here, we'll dissect the results from a set of coding-related instructions.
We started by defining a set of coding instructions:
python
test_examples = pd.DataFrame([
{
"instruction": "Create an array of length 15 containing numbers divisible by 3 up to 45.",
"input": "",
},
{
"instruction": "Create a nested loop to print every combination of numbers between 0-9",
"input": ""
},
{
"instruction": "Generate a function that computes the sum of the numbers in a given list",
"input": "",
},
{
"instruction": "Create a class to store student names, ages and grades.",
"input": "",
},
{
"instruction": "Print out the values in the following dictionary.",
"input": "my_dict = {\n 'name': 'John Doe',\n 'age': 32,\n 'city': 'New York'\n}",
},
])
Next, we evaluated the two models on these test examples:
```python print('Zero Shot Model', '-' * 50) predictions = zero_shot_model.predict(test_examples)[0] for i, input_with_prediction in enumerate(zip(test_examples['instruction'], test_examples['input'], predictions['output_response'])): print('Zero Shot Example', i ) print(f"Instruction: {input_with_prediction[0]}") print(f"Input: {input_with_prediction[1]}") print(f"Generated Output: {input_with_prediction[2][0]}") print("\n\n")
print('Fine Tuned Model', '-' * 50) predictions = model.predict(test_examples)[0] for i, input_with_prediction in enumerate(zip(test_examples['instruction'], test_examples['input'], predictions['output_response'])): print('Fine Tuend Example', i ) print(f"Instruction: {input_with_prediction[0]}") print(f"Input: {input_with_prediction[1]}") print(f"Generated Output: {input_with_prediction[2][0]}") print("\n\n") ```
Building upon our understanding of Ludwig's fine-tuning techniques and our journey with Code Alpaca's LLaMA models, it's essential to evaluate the practical implications of our efforts. Zero-shot learning, while powerful, may sometimes falter in producing precise or optimal results. On the other hand, a fine-tuned model, honed for a specific task, often outperforms its zero-shot counterpart. Let's delve into some examples to see this difference in action.
Instruction: Create an array of length 15 containing numbers divisible by 3 up to 45.
Zero-Shot Output: ```c
int main() { int arr[15] = {0}; int i; for(i=0;i<15;i++) { arr[i] = i; } for(i=0;i<15;i++) { printf("%d\n",arr[i]); } return 0; } ```
Fine-Tuned Output:
javascript
var numbers = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42];
In this case, the zero-shot model produces C code that doesn't correctly create an array of numbers divisible by 3. The fine-tuned model, however, nails the instruction with a concise Javascript snippet.
Instruction: Create a nested loop to print every combination of numbers between 0-9
Zero-Shot Output:
java
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
System.out.print(i + " " + j + " ");
}
System.out.println();
}
Fine-Tuned Output:
pythoh
for i in range(10):
for j in range(10):
print(i, j)
Both the models get the essence of the instruction right, but the fine-tuned model generates a more concise Python code, emphasizing its proficiency.
Instruction: Generate a function that computes the sum of the numbers in a given list.
Zero-Shot Output:
python
def sum(numbers):
sum = 0
for number in numbers:
sum += number
return sum
Fine-Tuned Output:
python
def sum(numbers):
sum = 0
for number in numbers:
sum += number
return sum
Interestingly, both the zero-shot and fine-tuned models produce nearly identical outputs for this instruction.
Instruction: Create a class to store student names, ages and grades.
Zero-Shot Model Output: The output for this example seems to be truncated or not provided.
Fine-Tuned Model Output:
javascript
class student {
string name;
int age;
float grade;
};
In this example, the fine-tuned model provides a concise representation of a student class with attributes for name, age, and grade. The structure is clear, and it's evident that the fine-tuned model grasps the essence of object-oriented programming, offering a blueprint of a class.
Instruction: Print out the values in the following dictionary.
Input:
python
my_dict = {
'name': 'john doe',
'age': 32,
'city': 'new york'
}
Zero-Shot Model Output:
python
print(my_dict['name'])
print(my_dict['name'])
...
Fine-Tuned Model Output:
python
for key, value in my_dict.items():
print(key, value)
The fine-tuned model provides a more sophisticated solution. It uses a for loop to iterate over the dictionary items and prints all the key-value pairs. This approach ensures that all items in any dictionary of this structure will be printed, making it a more generalized and correct solution.
From the provided examples, it's evident that the fine-tuned model outperforms the zero-shot model. The fine-tuned model's solutions are not only accurate but also more comprehensive and tailored to the tasks. This underscores the importance of fine-tuning when adapting general language models to specific domains or tasks.
Harnessing Ludwig's fine-tuning techniques, especially the QLoRA method, we've made significant strides with the Code Alpaca project, pushing the boundaries of instruction-based code generation. With a combination of zero-shot modeling and fine-tuning, the LLaMA models present a promising future in the realm of automated code generation. The journey is just beginning, and the possibilities are endless.
Created 2023-09-01T23:02:01-07:00, updated 2023-11-04T20:02:03-07:00 · History · Edit