Opening Remarks

Predibase offers an online service, built on LoRAX, for quickly fine-tuning and deploying open-source large models.

LoRAX (LoRA eXchange) is a framework that serves thousands of fine-tuned models on a single GPU, significantly reducing serving costs without compromising throughput or latency.

This tutorial walks through fine-tuning and inference in the cloud with Predibase; the accompanying Jupyter Notebook will be uploaded later.

The Predibase Service

Before we start, obtain an API key and your Tenant ID from the Predibase settings page; both are used in the SDK calls below.

api_token = "pb_xxxxxx"
tenant_id = "xxxx"
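Hardcoding secrets in a notebook is easy to leak. A small sketch that reads them from environment variables instead (the variable names PREDIBASE_API_TOKEN and PREDIBASE_TENANT_ID are my own choice, not an official convention):

```python
import os

# Fall back to the placeholder values when the variables are unset
api_token = os.environ.get("PREDIBASE_API_TOKEN", "pb_xxxxxx")
tenant_id = os.environ.get("PREDIBASE_TENANT_ID", "xxxx")
```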

Predibase gives new users $25 of free credit, which is plenty for small-scale experiments.


Installing dependencies

Install the Predibase SDK and the LoRAX client, plus ChatTool and the OpenAI SDK for chat-style inference:

pip install predibase
pip install lorax-client
pip install chattool
pip install openai

Fine-tuning in the cloud

Fine-tuning on the Predibase cloud service follows four basic steps: prepare the data, create the fine-tuning job, download the LoRA weights, and run inference.

Related documentation

Besides the officially recommended models, Predibase can also use HuggingFace models directly, provided that:

  • the model's tags include Text Generation and Transformer, and do not include custom_code;
  • the model uses a supported architecture, such as Llama, Mistral, or Qwen.
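The tag requirements can be expressed as a small predicate over a model's tag list (illustrative only; the sample tag lists below are assumptions, not live model-card lookups, and on the Hub the tags appear in lowercase as text-generation / transformers):

```python
def is_supported(tags):
    """Check Predibase's stated tag requirements for a HuggingFace model:
    tagged for text generation and transformers, and no custom_code."""
    return ("text-generation" in tags
            and "transformers" in tags
            and "custom_code" not in tags)

# Hypothetical tag lists for illustration
print(is_supported(["transformers", "safetensors", "qwen2", "text-generation"]))  # True
print(is_supported(["transformers", "text-generation", "custom_code"]))           # False
```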

For example, the tags of Qwen1.5-1.8B-Chat:

[Screenshot: model-card tags for Qwen1.5-1.8B-Chat]

Data preparation

We use the ProofNet dataset for the demonstration, with the natural-language statement as the prompt and the formal statement as the completion.

Download the dataset from its GitHub repository:

git clone https://github.com/zhangir-azerbayev/ProofNet.git

About the dataset

ProofNet is a benchmark for autoformalization and formal proving of undergraduate mathematics. The dataset contains 371 examples, each consisting of a formal theorem statement in Lean 3, a natural-language theorem statement, and a natural-language proof. The problems are drawn mainly from popular undergraduate pure-mathematics textbooks, covering real analysis, complex analysis, linear algebra, abstract algebra, and topology. ProofNet is intended as a challenging benchmark to drive progress in autoformalization and automated theorem proving.

More information about ProofNet is available in the GitHub repository.

Read the test and validation splits and inspect a sample:

import json
from pprint import pprint

# Read the test split
test_data = []
with open("ProofNet/benchmark/test.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        test_data.append(json.loads(line))

# Read the validation split
valid_data = []
with open("ProofNet/benchmark/valid.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        valid_data.append(json.loads(line))

# Print the dataset size and a sample record
print(len(test_data))
pprint(test_data[0])

Record count and a sample:

186
{'formal_statement': 'theorem exercise_1_1b\n'
'(x : ℝ)\n'
'(y : ℚ)\n'
'(h : y ≠ 0)\n'
': ( irrational x ) -> irrational ( x * y ) :=',
'id': 'Rudin|exercise_1_1b',
'nl_proof': '\\begin{proof}\n'
'\n'
' If $r x$ were rational, then $x=\\frac{r x}{r}$ would also '
'be rational.\n'
'\n'
'\\end{proof}',
'nl_statement': 'If $r$ is rational $(r \\neq 0)$ and $x$ is irrational, '
'prove that $rx$ is irrational.',
'src_header': 'import .common\n'
'\n'
'open real complex\n'
'open topological_space\n'
'open filter\n'
'open_locale real \n'
'open_locale topology\n'
'open_locale big_operators\n'
'open_locale complex_conjugate\n'
'open_locale filter\n'
'\n'
'\n'
'noncomputable theory\n'
'\n'}

Data processing

We first put the records into the OpenAI chat format, then convert them to the format LoRAX expects.

First, define the template functions:

def chat_pair_template(nl_statement, formal_statement):
    """Build a chat-pair dict with a system prompt, user question, and assistant answer."""
    return {
        "messages": [
            {
                "role": "system",
                "content": "You are a skilled mathematician specializing in LEAN, the powerful theorem prover."
            },
            {
                "role": "user",
                "content": "Please translate the natural language version to a LEAN version:\nNatural language version: " +
                           nl_statement
            },
            {
                "role": "assistant",
                "content": "LEAN version:\n" + formal_statement
            }
        ]
    }

def data_to_chat_pairs(data):
    """Convert a list of records into a list of chat pairs."""
    chats = []
    for item in data:
        # Build one chat pair per record with chat_pair_template
        chat_pair = chat_pair_template(item['nl_statement'], item['formal_statement'])
        chats.append(chat_pair)
    return chats

Then convert the data in bulk:

instruct_pairs = data_to_chat_pairs(test_data)
valid_pairs = data_to_chat_pairs(valid_data)

A sample looks like:

{'messages': [{'role': 'system', 'content': 'You are a skilled mathematician specializing in LEAN, the powerful theorem prover.'}, {'role': 'user', 'content': 'Please translate the natural language version to a LEAN version:\nNatural language version: If $r$ is rational $(r \\neq 0)$ and $x$ is irrational, prove that $rx$ is irrational.'}, {'role': 'assistant', 'content': 'LEAN version:\ntheorem exercise_1_1b\n(x : ℝ)\n(y : ℚ)\n(h : y ≠ 0)\n: ( irrational x ) -> irrational ( x * y ) :='}]}

Finally, convert to the LoRAX format:

lorax_entries = [item['messages'] for item in instruct_pairs]
lorax_entries = [
    {
        'prompt': item[0]['content'] + '\n' + item[1]['content'],
        'completion': item[2]['content']
    }
    for item in lorax_entries
]

valid_entries = [item['messages'] for item in valid_pairs]
valid_entries = [
    {
        'prompt': item[0]['content'] + '\n' + item[1]['content'],
        'completion': item[2]['content']
    }
    for item in valid_entries
]

pprint(lorax_entries[0])

A converted sample:

{'completion': 'LEAN version:\n'
'theorem exercise_1_1b\n'
'(x : ℝ)\n'
'(y : ℚ)\n'
'(h : y ≠ 0)\n'
': ( irrational x ) -> irrational ( x * y ) :=',
'prompt': 'You are a skilled mathematician specializing in LEAN, the powerful '
'theorem prover.\n'
'Please translate the natural language version to a LEAN version:\n'
'Natural language version: If $r$ is rational $(r \\neq 0)$ and $x$ '
'is irrational, prove that $rx$ is irrational.'}
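Before uploading, it is worth a quick sanity check that every record carries the two non-empty fields LoRAX expects (check_entries is a helper of my own, not part of any SDK):

```python
def check_entries(entries):
    """Assert every record has exactly a non-empty 'prompt' and 'completion'."""
    for i, entry in enumerate(entries):
        assert set(entry) == {"prompt", "completion"}, f"unexpected keys at index {i}"
        assert entry["prompt"].strip(), f"empty prompt at index {i}"
        assert entry["completion"].strip(), f"empty completion at index {i}"
    return len(entries)

# Illustrative record in the converted format
sample = [{
    "prompt": "You are a skilled mathematician...\nNatural language version: ...",
    "completion": "LEAN version:\ntheorem exercise_1_1b ...",
}]
print(check_entries(sample))  # 1
```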

Saving and uploading the data

Save the processed data as a JSONL file:

import json

def write_to_jsonl(data, file_path):
    with open(file_path, 'w', encoding='utf-8') as file:
        for entry in data:
            # Serialize each dict to a JSON string and write it out
            json_line = json.dumps(entry)
            file.write(json_line + '\n')  # one record per line, as JSONL requires

# Output file path
output_path = 'lorax_dataset.jsonl'

# Write the data
write_to_jsonl(lorax_entries, output_path)
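A matching reader makes it easy to verify the file round-trips before uploading (read_jsonl is a hypothetical helper, not part of the Predibase SDK):

```python
import json

def read_jsonl(file_path):
    """Read a JSONL file back into a list of dicts."""
    with open(file_path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Round-trip a tiny sample to confirm nothing is lost in serialization
demo = [{"prompt": "p1", "completion": "c1"}, {"prompt": "p2", "completion": "c2"}]
with open("demo_dataset.jsonl", "w", encoding="utf-8") as f:
    for entry in demo:
        f.write(json.dumps(entry) + "\n")

assert read_jsonl("demo_dataset.jsonl") == demo
print("round-trip ok")
```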

Upload the dataset to Predibase:

from predibase import Predibase, FinetuningConfig, DeploymentConfig

api_token = "pb_xxxxxx"
tenant_id = "xxxx"
pb = Predibase(api_token=api_token)

dataset = pb.datasets.from_file("./lorax_dataset.jsonl", name="proofnet")

Creating the fine-tuning job

First create an adapter repository; later we will reference adapters as proofnet-model-mistral/x:

# Create the adapter repository
repo = pb.repos.create(name="proofnet-model-mistral", description="ProofNet experiment", exists_ok=True)

Then launch the fine-tuning job with custom parameters, including the base model, number of epochs, LoRA rank, and learning rate:

# Launch a fine-tuning job with custom parameters; this call blocks until training finishes
# dataset = pb.datasets.get("proofnet")
adapter = pb.adapters.create(
    config=FinetuningConfig(
        base_model="mistral-7b",
        epochs=3,  # default: 3
        rank=16,  # default: 16
        learning_rate=0.0002  # default: 0.0002
    ),
    dataset=dataset,
    repo=repo,
    description="ProofNet experiment"
)

The output looks like:

Successfully requested finetuning of mistral-7b as `proofnet-model-mistral/2`. (Job UUID: e23f9e75-13e4-49d6-bca9-cd70ea9b9d6e).

Watching progress of finetuning job e23f9e75-13e4-49d6-bca9-cd70ea9b9d6e. This call will block until the job has finished. Canceling or terminating this call will NOT cancel or terminate the job itself.

Job is queued for execution. Time in queue: 0:00:01

Check the job's status:

# Fetch the adapter; blocks if training is still in progress
adapter = pb.adapters.get("proofnet-model-mistral/2")
adapter

When training finishes, the output looks like:

Adapter proofnet-model-mistral/2 is not yet ready.
Watching progress of finetuning job e23f9e75-13e4-49d6-bca9-cd70ea9b9d6e. This call will block until the job has finished. Canceling or terminating this call will NOT cancel or terminate the job itself.

Waiting to receive training metrics...

┌────────────┬────────────┬─────────────────┐
│ checkpoint │ train_loss │ validation_loss │
├────────────┼────────────┼─────────────────┤
│          1 │     0.8073 │              -- │
│          2 │     0.6523 │              -- │
│          3 │     0.5062 │              -- │
└────────────┴────────────┴─────────────────┘

Fine-tuning other models works the same way, for example Qwen:

# Create an adapter repository
repo = pb.repos.create(name="proofnet-model-qwen", description="ProofNet experiment", exists_ok=True)

# Start a fine-tuning job with custom parameters, blocks until training is finished
adapter = pb.adapters.create(
    config=FinetuningConfig(
        base_model="Qwen/Qwen1.5-7B",
        # hf_token="<YOUR HUGGINGFACE TOKEN>"  # Required for private Huggingface models
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    ),
    dataset=dataset,
    repo=repo,
    description="changing epochs, rank, and learning rate"
)

Checking results and downloading the weights

Now run inference with the fine-tuned adapter and compare it against the base model.

A validation-set example:

{'completion': 'LEAN version:\n'
'theorem exercise_1_1a\n'
' (x : ℝ) (y : ℚ) :\n'
' ( irrational x ) -> irrational ( x + y ) :=',
'prompt': 'You are a skilled mathematician specializing in LEAN, the powerful '
'theorem prover.\n'
'Please translate the natural language version to a LEAN version:\n'
'Natural language version: If $r$ is rational $(r \\neq 0)$ and $x$ '
'is irrational, prove that $r+x$ is irrational.'}

Test code:

input_prompt = f"<s>[INST] {valid_entries[0]['prompt']} [/INST] "

# Inference with the fine-tuned adapter
lorax_client = pb.deployments.client("mistral-7b")
print(lorax_client.generate(input_prompt, adapter_id="proofnet-model-mistral/2", max_new_tokens=1024).generated_text)

# Inference with the base model, no adapter
print(lorax_client.generate(input_prompt, max_new_tokens=1024).generated_text)

The model is only 7B; without fine-tuning it just emits the prompt again and again:

INST] You are a skilled mathematician specializing in LEAN, the powerful theorem prover.
Please translate the natural language version to a LEAN version:
Natural language version: If $r$ is rational $(r \neq 0)$ and $x$ is irrational, prove that $r+x$ is irrational. [/INST] 2.

[INST] You are a skilled mathematician specializing in LEAN, the powerful theorem prover.
Please translate the natural language version to a LEAN version:
Natural language version: If $r$ is rational $(r \neq 0)$ and $x$ is irrational, prove that $r+x$ is irrational. [/INST] 3.

The fine-tuned output is below. Unlike the base model it captures some of the right structure, but the model's knowledge is still insufficient:

theorem exercise_1_1_11 {r x : ℝ} (hr : r ≠ 0) (hx : x ≠ 0) :
irrational (r + x) :=

Download the LoRA weights. Here /1 is the adapter version number; fine-tuning the same repository multiple times produces multiple versions:

pb.adapters.download("proofnet-model-mistral/1")

Serving and inference

Docs: Serverless Endpoints

Predibase supports two ways to run inference: its official SDK and the OpenAI SDK.

The official SDK

Run inference through the official Predibase SDK, using mistral-7b-instruct-v0-2 as an example:

from predibase import Predibase, FinetuningConfig, DeploymentConfig

api_token = "pb_xxx"
tenant_id = "xxxxx"

pb = Predibase(api_token=api_token)
# Connected to Predibase as User(id=xxx, username=xxx)

lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2")  # insert the deployment name
resp = lorax_client.generate("[INST] What are some popular tourist spots in San Francisco? [/INST]")
print(resp.generated_text)

Output:

San Francisco, California is known for its unique blend of culture, history, and natural beauty. Here are some popular tourist spots that you may want to consider visiting:

Use a streaming response to receive the model's output incrementally:

for resp in lorax_client.generate_stream("[INST] What are some popular tourist spots in San Francisco? [/INST]"):
    if not resp.token.special:
        print(resp.token.text, end="", flush=True)

Load a fine-tuned adapter for inference, for example:

print(lorax_client.generate(input_prompt, adapter_id="news-summarizer-model/1", max_new_tokens=100).generated_text)

OpenAI-style inference

Predibase also exposes an OpenAI-compatible API; pass the adapter name as the model parameter.

Chat-style inference with chattool:

from chattool import Chat
import chattool

api_token = "pb_xxx"
tenant_id = "xxxxx"
chattool.api_key = api_token
chattool.api_base = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/mistral-7b-instruct/v1"

chat = Chat()
chat.model = ''  # adapter name; an empty string loads no adapter
chat.user("How many helicopters can a human eat in one sitting?")
chat.getresponse()
chat.print_log()

The conversation log:

---------------
user
---------------
How many helicopters can a human eat in one sitting?

---------------
assistant
---------------
It is not possible for a human to eat a helicopter in one sitting, as helicopters are not designed to be consumed by humans. They are aircraft with rotating blades and a main body, not food items.

Setting chat.model selects a fine-tuned adapter.
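Since the setup also installed the openai package, the same endpoint can be reached through the official OpenAI client. A sketch, not verified against a live deployment: the base_url mirrors the chattool configuration above, and the request only succeeds with valid credentials.

```python
from openai import OpenAI

api_token = "pb_xxx"
tenant_id = "xxxxx"

# Point the standard OpenAI client at the Predibase serving endpoint
client = OpenAI(
    api_key=api_token,
    base_url=f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/mistral-7b-instruct/v1",
)

# The adapter name goes in `model`; an empty string targets the base model
resp = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "How many helicopters can a human eat in one sitting?"}],
)
print(resp.choices[0].message.content)
```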

Inference with LoRAX

Project: https://github.com/predibase/lorax
LoRAX docs: https://loraexchange.ai/

LoRAX loads LoRA adapters dynamically at request time; see also the dedicated-deployments guide:

https://docs.predibase.com/user-guide/inference/dedicated_deployments

Related documentation:

Predibase console: https://app.predibase.com/
Predibase docs: https://docs.predibase.com/


LoRAX

Motivation:

  1. Serve fine-grained, domain-specific applications: one general LLM should cover the jobs that many small task-specific models do across scenarios
  2. Make full use of the GPU

https://zhuanlan.zhihu.com/p/684941271

A LoRAX fine-tuned model consists of two parts:

  • Base model: the pretrained large model shared by all adapters.
  • Adapters: task-specific adapter weights, loaded dynamically per request.
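The split is visible in the LoRA arithmetic itself: every adapter is a low-rank update B·A added on top of the shared base weight, so adapters of different ranks can share one copy of W. A toy sketch in plain Python (not LoRAX code; the matrices are made up):

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Shared base weight (4x4 identity), playing the role of the "base model"
W = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Two adapters with different LoRA ranks: W_eff = W + B @ A
adapter_r1 = {"A": [[1, 0, 0, 0]],                # rank 1: A is 1x4, B is 4x1
              "B": [[2], [0], [0], [0]]}
adapter_r2 = {"A": [[0, 1, 0, 0], [0, 0, 1, 0]],  # rank 2: A is 2x4, B is 4x2
              "B": [[0, 0], [3, 0], [0, 3], [0, 0]]}

def effective_weight(W, adapter):
    """The weight a request sees once its adapter's low-rank update is applied."""
    return matadd(W, matmul(adapter["B"], adapter["A"]))

W1 = effective_weight(W, adapter_r1)
W2 = effective_weight(W, adapter_r2)
print(W1[0][0], W2[1][1])  # → 3 4
```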

Its inference optimizations include PagedAttention and the SGMV kernel from Punica.

PagedAttention borrows paged memory management from operating systems to store the KV values of the attention computation efficiently. SGMV is similar to the MBGMM operation proposed in S-LoRA; the goal is to let adapters of different ranks be computed efficiently together in the same batch.
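To make the PagedAttention idea concrete, here is a toy bookkeeping sketch of a paged KV cache (my own class and numbers, nothing resembling the real CUDA kernels): logical token positions map onto fixed-size physical blocks that are allocated on demand rather than reserved up front.

```python
BLOCK_SIZE = 4  # tokens per block, analogous to an OS memory page

class PagedKVCache:
    """Toy illustration of paged KV-cache bookkeeping: each sequence keeps a
    block table mapping logical token slots to physical blocks."""
    def __init__(self):
        self.free_blocks = list(range(100))  # pool of physical block ids
        self.block_tables = {}               # seq_id -> list of block ids

    def append_token(self, seq_id, position):
        table = self.block_tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:       # current block full: grab a new one
            table.append(self.free_blocks.pop(0))
        block = table[position // BLOCK_SIZE]
        return (block, position % BLOCK_SIZE)  # physical slot for this token's K/V

cache = PagedKVCache()
slots = [cache.append_token("seq0", p) for p in range(6)]
print(slots)                       # first 4 tokens land in block 0, the next 2 in block 1
print(cache.block_tables["seq0"])  # [0, 1]
```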

Further reading on inference techniques, Continuous Batching and Paged Attention:
https://insujang.github.io/2024-01-07/llm-inference-continuous-batching-and-pagedattention/