
Instruction Datasets for Fine-Tuning LLMs

What are Instruction Datasets for Fine-Tuning LLMs?

Instruction datasets are used to fine-tune LLMs. Fine-tuning on an instruction dataset is typically a supervised learning task: each example pairs an input string with an expected output string. The input and output strings follow a template known as an instruction dataset format (e.g., the [INST] and <<SYS>> markers of the Llama 2 chat template). ChatML from OpenAI and the Alpaca format from Stanford are other examples of instruction dataset formats. The following is the format Alpaca uses for examples that include context information (the input field below):

Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response: 
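The Python sketch below shows how one Alpaca-style record becomes the input string and expected output string described above. It assumes each record is a dict with the keys instruction, input and output, as in the Alpaca data release. The helper name to_training_pair and the newline placed after "### Response:" are choices made for this illustration. The exact whitespace varies between implementations.

```python
# Alpaca-style template for records that include an "input" (context) field.
# Assumption: a newline separates "### Response:" from the target text.
ALPACA_TEMPLATE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def to_training_pair(record: dict[str, str]) -> tuple[str, str]:
    """Hypothetical helper: turn one record into (input string, expected output string)."""
    prompt = ALPACA_TEMPLATE_WITH_INPUT.format(
        instruction=record["instruction"],
        input=record["input"],
    )
    return prompt, record["output"]


# A made-up record used only to demonstrate the formatting.
example = {
    "instruction": "Summarize the text in one sentence.",
    "input": "Instruction datasets pair a task description with the desired answer.",
    "output": "They map tasks to their expected answers.",
}

prompt, target = to_training_pair(example)
print(prompt + target)
```

During supervised fine-tuning, the model sees the prompt as its input and learns to produce the target text. Many training setups compute the loss only on the target tokens.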