Training a GPT-Based Large Language Model (e.g. GPT-Neo) with Happy Transformer
Assuming you already know how to Use HappyTransformer for Text Generation, this TIL focuses on training a large language model on your own "stuff".
This script will:
- Initiate a generator
- Show what output it produces before training
- Train it
- Show what output it produces after training
- Save our trained model
Here is the script (save it as a .py file):
from happytransformer import HappyGeneration, GENSettings, GENTrainArgs
prompt = "To make a cheese sandwich "
print(prompt)
# 1. Initiate a generator
happy_gen = HappyGeneration(model_type="GPT-NEO", model_name="EleutherAI/gpt-neo-1.3B")
generator_args = GENSettings(no_repeat_ngram_size=2, do_sample=True, early_stopping=False, top_k=5, temperature=0.8, max_length=30)
# 2. Show what output it produces before training
print("RESULTS BEFORE FURTHER TRAINING...")
print("1:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
print("2:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
print("3:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
# 3. Train it
# Description of the training args is coming up soon
trainer_args = GENTrainArgs(learning_rate=8e-5, num_train_epochs=2, save_preprocessed_data=True, save_preprocessed_data_path="preprocessed-data.json", batch_size=4)
# input_filepath -- the path to a plain text file containing nothing but the text to train the model on.
happy_gen.train(input_filepath = "cheese_instructions.txt", args=trainer_args)
# 4. Show what output it produces after training
print("RESULTS AFTER TRAINING...")
print("1:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
print("2:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
print("3:")
result = happy_gen.generate_text(prompt, args=generator_args)
print(prompt)
print(result.text)
# 5. Save our trained model
happy_gen.save("model/")
Next time we initiate our generator, we can load our trained model by adding a 'load_path' argument:
happy_gen = HappyGeneration(model_type="GPT-NEO", model_name="EleutherAI/gpt-neo-1.3B", load_path="model/")
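Putting the pieces together, here is a minimal, self-contained sketch (reusing the prompt and GENSettings from the script above) that loads the saved model and generates from it:

'''python
# Minimal sketch: generate from the fine-tuned model saved to "model/".
# Reuses the same prompt and GENSettings as earlier in this TIL.
from happytransformer import HappyGeneration, GENSettings

happy_gen = HappyGeneration(model_type="GPT-NEO",
                            model_name="EleutherAI/gpt-neo-1.3B",
                            load_path="model/")

generator_args = GENSettings(no_repeat_ngram_size=2, do_sample=True,
                             early_stopping=False, top_k=5,
                             temperature=0.8, max_length=30)

result = happy_gen.generate_text("To make a cheese sandwich ", args=generator_args)
print(result.text)
'''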
Training Args
These are the parameters accepted by GENTrainArgs. A table of all of them is below, but first a discussion of the ones used in the example above:
trainer_args = GENTrainArgs(learning_rate=8e-5, num_train_epochs=2, save_preprocessed_data=True, save_preprocessed_data_path="preprocessed-data.json", batch_size=4)
I used a learning rate higher than the default (8e-5 instead of 5e-5), so that learning could be demonstrated quickly, even if it is possibly too aggressive.
I set save_preprocessed_data_path to preprocessed-data.json -- this produces an interesting JSON file, which demonstrates how the tokenization works (more on that soon).
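That file is just JSON, so you can peek at it with the standard json module. This is a minimal sketch; the exact structure of the file depends on your Happy Transformer version, so it only prints the first chunk rather than assuming a particular layout:

'''python
# Peek at the preprocessed training data written during happy_gen.train().
# The exact layout depends on the Happy Transformer version, so just print
# the start of it rather than assuming a structure.
import json

with open("preprocessed-data.json") as f:
    data = json.load(f)

print(json.dumps(data, indent=2)[:500])
'''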
Parameter | Default | Description |
---|---|---|
learning_rate | 5e-5 | How much the model's weights are adjusted per step. Too low and the model will take a long time to learn or get stuck in a suboptimal solution. Too high can cause divergent behavior. |
num_train_epochs | 3 | The number of times the training data is iterated over. |
batch_size | 1 | Number of training examples used per iteration. |
weight_decay | 0 | A type of regularization that prevents weights from getting too large, thus preventing overfitting. |
adam_beta1 | 0.9 | The beta1 parameter for the Adam with weight decay optimizer. |
adam_beta2 | 0.999 | The beta2 parameter for the Adam with weight decay optimizer. |
adam_epsilon | 1e-8 | The epsilon parameter for the Adam with weight decay optimizer. |
max_grad_norm | 1.0 | Used to prevent exploding gradients. Prevents the derivatives of the loss function from exceeding the absolute value of max_grad_norm. |
save_preprocessed_data | False | If True, the preprocessed (tokenized) training data is saved to save_preprocessed_data_path. |
save_preprocessed_data_path | "" | Path of the JSON file the preprocessed data is written to. |
load_preprocessed_data | False | If True, previously preprocessed data is loaded from load_preprocessed_data_path instead of preprocessing input_filepath. |
load_preprocessed_data_path | "" | Path of the JSON file to load preprocessed data from. |
preprocessing_processes | 1 | Number of processes used to preprocess the training data. |
fp16 | False | If True, enables half-precision training, which saves space by using 16 bits instead of 32 to store the model's weights. Only available when CUDA/a GPU is being used. |
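To make the table concrete, here is an illustrative GENTrainArgs that sets a few more of these parameters. The values are placeholders for demonstration, not recommendations:

'''python
# Illustrative only: a GENTrainArgs combining several parameters from the table.
# Values are examples, not recommendations -- the defaults are a sensible start.
from happytransformer import GENTrainArgs

careful_args = GENTrainArgs(
    learning_rate=5e-5,    # the default
    num_train_epochs=3,
    batch_size=1,
    weight_decay=0.01,     # a little regularization to discourage overfitting
    max_grad_norm=1.0,
    fp16=True,             # only works when training on a CUDA GPU
)
'''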
About the training data
For the purposes of this example, I generated a file of cheese sandwich making instructions, with one sentence per line.
I used the "spintax" syntax, and plugged it into the spintax-spinner here: spintax test editor - then used NimbleText to sort it, remove duplicates, and shuffle it. That gave me a lot of boring examples, for illustration purposes.
'''spintax
{you can|you should|anyone can} {construct|build|make} {a|a basic|your|your basic|a yummy} cheese sandwich {by|by just|by simply} {putting|placing|sticking|inserting|tucking} a {cheese slice|slice of cheese|cheese slice (or two)|slab of cheese} {between|in-between|inside} two {pieces|slices} of {bread|toast|your favorite bread}
'''
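If you would rather skip the web tools, here is a rough Python stand-in for that spin/sort/de-dupe/shuffle workflow. The spin() helper is my own hypothetical function, not part of Happy Transformer, the spintax test editor, or NimbleText:

'''python
# Rough Python stand-in for the spintax spinner + NimbleText steps above.
# The spin() helper is hypothetical -- the original workflow used web tools.
import random
import re

SPINTAX = ("{you can|you should|anyone can} {construct|build|make} "
           "{a|a basic|your|your basic|a yummy} cheese sandwich "
           "{by|by just|by simply} {putting|placing|sticking|inserting|tucking} "
           "a {cheese slice|slice of cheese|cheese slice (or two)|slab of cheese} "
           "{between|in-between|inside} two {pieces|slices} of "
           "{bread|toast|your favorite bread}")

def spin(text):
    """Replace each {a|b|c} group with one randomly chosen option."""
    return re.sub(r"\{([^{}]*)\}",
                  lambda m: random.choice(m.group(1).split("|")),
                  text)

# Generate candidate sentences, remove duplicates, shuffle, and save
# one sentence per line -- the format happy_gen.train() expects.
sentences = {spin(SPINTAX) for _ in range(500)}
sentences = list(sentences)
random.shuffle(sentences)
with open("cheese_instructions.txt", "w") as f:
    f.write("\n".join(sentences))
'''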