Foundations of LLMs Lesson 1
Notes for Lesson 1 of the W&B course on Foundations of LLMs.
Learning objectives:
Have a good mental model of when to train or fine-tune LLMs
Understand at a high level the key pieces needed to make it work successfully
Understand why and how to participate in the NeurIPS LLM Efficiency Challenge
LLMs are decoder-only models, trained to predict the next token (roughly, the next word) in a sequence.
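A minimal sketch of that next-token objective, assuming GPT-2 from Hugging Face transformers as a stand-in for any decoder-only LLM:

```python
# Next-token prediction with a decoder-only model (GPT-2 used only as a small example).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The quick brown fox"
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the causal LM loss:
# each position is trained to predict the *next* token in the sequence.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"next-token cross-entropy loss: {outputs.loss.item():.3f}")

# Generation is just repeated next-token prediction.
generated = model.generate(inputs["input_ids"], max_new_tokens=10)
print(tokenizer.decode(generated[0]))
```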
Pre-train (train from scratch) an LLM when you want full control over the training data.
Fine-tune an open-source LLM when you need more control than an API gives you and want cheaper inference.
Use commercial APIs when you need to reduce time to market.
Instruction tuning and RLHF
Instruction tuning: we give the model text as an instruction and expect it to follow that instruction.
Supervised fine-tuning (SFT): we provide the model with instruction-response pairs and train it to produce the desired responses.
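A minimal sketch of how an instruction-response pair is formatted for supervised fine-tuning; the Alpaca-style template below is one common convention and an assumption here, not something prescribed by the lesson:

```python
# Format an instruction/response pair into a single training string for SFT.
def format_example(instruction: str, response: str) -> str:
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
        f"{response}"
    )

example = format_example(
    instruction="Summarize: 'LLMs are decoder-only models trained to predict the next token.'",
    response="LLMs predict the next token using a decoder-only architecture.",
)
print(example)
# During SFT the model is trained with the usual next-token loss on this text;
# often the loss is masked so that only the response tokens contribute.
```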
ChatGPT started from a code model trained to help with programming; in later iterations it was trained with RLHF (discussed above) to align more closely with human preferences.
What is the goal, and what are the evaluation criteria?
Choose the model architecture and foundation model.
Create the right dataset.
Train or fine-tune the model on that data efficiently.
Quantization (representing weights in lower precision to fit the model into limited memory)
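A minimal sketch of quantized loading with transformers + bitsandbytes; the checkpoint name is just a placeholder, and the specific 4-bit settings are illustrative assumptions:

```python
# Load a causal LM with 4-bit quantized weights to fit it into limited GPU memory.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
# A 7B model that needs ~28 GB in fp32 now fits in roughly 4-5 GB of GPU memory.
```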
PEFT, parameter-efficient fine-tuning (instead of training the whole model, we train a small part of it or add new parameters); see the sketch after the lists below.
Techniques by model structure:
Adapters
LoRA: Low-Rank Adaptation
QLoRA: Quantized LoRA (uses a lower-precision base model while fine-tuning)
Techniques by feeding the data in different forms (learned prompts):
Prompt tuning
Prefix tuning
P-tuning
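A minimal sketch of both PEFT styles using the peft library: a LoRA adapter injected into the model structure, and a prompt-tuning config that instead learns virtual prompt tokens. The rank, alpha, and target modules are illustrative choices, not recommendations from the lesson.

```python
# Parameter-efficient fine-tuning with the peft library.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PromptTuningConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# "By model structure": inject low-rank update matrices into the attention layers.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # rank of the low-rank matrices
    lora_alpha=16,               # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
)
lora_model = get_peft_model(base_model, lora_config)
lora_model.print_trainable_parameters()  # typically <1% of the base model's parameters

# "By feeding data": prompt tuning learns a handful of virtual prompt tokens
# prepended to the input, leaving the base model's weights untouched.
prompt_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
```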
Task-specific fine-tuning: when we have a specific task, we can fine-tune the model to perform that task given a certain prompt or instruction.
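A small illustrative example (hypothetical data) of what task-specific training examples look like: every example uses the same fixed prompt pattern for one task, here sentiment classification, so the model learns to perform that task on cue.

```python
# Hypothetical task-specific fine-tuning data: one fixed prompt format, one task.
task_examples = [
    {"prompt": "Classify the sentiment: 'I loved this movie!'", "completion": "positive"},
    {"prompt": "Classify the sentiment: 'The plot was a mess.'", "completion": "negative"},
]
```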
RLHF (Reinforcement Learning from Human Feedback): a two-stage technique. First we train a reward model, teaching it what humans prefer; then we fine-tune the LLM with reinforcement learning to align it with those human preferences.
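A minimal sketch of stage one, training the reward model on human preference pairs. The pairwise (Bradley-Terry style) loss below is a common choice, shown here as an assumption with toy scores rather than the lesson's exact recipe.

```python
# Stage 1 of RLHF: a pairwise reward-model loss on human preference data.
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # Push the score of the human-preferred ("chosen") response above the rejected one:
    # loss = -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy scalar scores for a batch of 3 preference pairs (in practice these come
# from the reward model's forward pass on chosen/rejected responses).
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, -1.0])
print(reward_model_loss(chosen, rejected))  # lower when chosen outscores rejected

# Stage 2 then fine-tunes the LLM with reinforcement learning (commonly PPO,
# e.g. via the trl library) to maximize this learned reward.
```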