[THUDM/ChatGLM-6B] How to fine-tune chatglm-6B-int4 on a Mac M2

2024-05-20
### Problems during inference

  1. When loading the model, it loads fine with `.float()`, but calling `.to("mps")` raises an error:
    
```python
model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).float()
```

File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl return forward_call(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 392, in forward output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(args, **kwargs) # type: ignore[misc] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 57, in forward weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 275, in extract_weight_to_half func = kernels.int4WeightExtractionHalf ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'


  2. I can run inference tests on the M2 with the original setup, but for actual fine-tuning there is no GPU available; I wanted to try `torch.device("mps")`, but it does not seem to run (see the availability check after this list)?

  3. Steps I have already tried, still without effect:
![image](https://github.com/THUDM/ChatGLM-6B/assets/19700467/a3bac13b-2933-4b3c-b5e8-bf7875c2307f)
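
Regarding problem 2, a quick way to check whether the MPS backend is even usable; a minimal sketch (this only verifies the backend exists, not that every operator ChatGLM needs is implemented on MPS):

```python
import torch

# Check whether this PyTorch build includes the MPS backend and whether the
# current macOS/hardware actually exposes it.
print(torch.backends.mps.is_built())      # compiled with MPS support?
print(torch.backends.mps.is_available())  # usable on this machine?

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)
```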

### Problems during fine-tuning

1. Download the chatglm-6B-int4 model and fine-tune on top of it;
2. Runtime environment:

Apple Mac M2, torch==2.1.0.dev20230507

3. Running `bash train.sh` fails:

```
Traceback (most recent call last):
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/main.py", line 433, in <module>
    main()
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/main.py", line 372, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 1635, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 2665, in training_step
    loss.backward()
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/__init__.py", line 204, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 226, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/__init__.py", line 204, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4)
```

### Answers


The error `RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4)` can be fixed by changing line 93 of quantization.py to `return grad_input.view(ctx.inp_shape), grad_weight.view(ctx.weight_shape), None, None, None`.
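
For context, here is a minimal, self-contained illustration of the PyTorch rule behind this error (hypothetical names and shapes, not the actual quantization.py code): `backward()` must return exactly one value per argument of `forward()`, with `None` for inputs that need no gradient, which is why the trailing `None` must be added:

```python
import torch

class ScaledLinear(torch.autograd.Function):
    """Toy stand-in for W8A16LinearCPU: forward takes 5 inputs."""

    @staticmethod
    def forward(ctx, inp, weight, scale, bit_width, cache):
        ctx.save_for_backward(inp, weight, scale)
        return inp @ (weight * scale).t()

    @staticmethod
    def backward(ctx, grad_output):
        inp, weight, scale = ctx.saved_tensors
        grad_input = grad_output @ (weight * scale)
        grad_weight = (grad_output.t() @ inp) * scale
        # 5 forward inputs -> 5 return values. Returning only 4 reproduces
        # "returned an incorrect number of gradients (expected 5, got 4)".
        return grad_input, grad_weight, None, None, None

x = torch.randn(2, 4, requires_grad=True)
w = torch.randn(3, 4, requires_grad=True)
out = ScaledLinear.apply(x, w, torch.tensor(0.5), 4, None)
out.sum().backward()  # works; drop one None above to reproduce the error
```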

---

Yes, that fixes it, and fine-tuning on the CPU now works. But if I make the following change in quantization.py, fine-tuning fails again (see the sketch after the traceback):

```diff
-            if self.device == torch.device("cpu"):
+            if self.device == torch.device("mps"):
```
 File "/Users/diaojunxian/anaconda3/envs/3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 56, in forward
    weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 274, in extract_weight_to_half
    func = kernels.int4WeightExtractionHalf
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
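
What seems to be happening, as a paraphrased sketch inferred from the tracebacks above (not the literal quantization.py source): only the CPU branch has a non-CUDA kernel, and `kernels` is `None` whenever the CUDA kernels cannot be loaded, so any non-CPU device, including `mps`, ends up dereferencing `None`:

```python
import torch

kernels = None  # what quantization.py is left with when no CUDA kernels load

def extract_weight_sketch(device: torch.device):
    # Paraphrased control flow: CPU gets its own compiled kernel; every
    # other device is assumed to be CUDA and goes through `kernels`.
    if device == torch.device("cpu"):
        return "compiled CPU kernel path (W8A16LinearCPU)"
    func = kernels.int4WeightExtractionHalf  # AttributeError when kernels is None
    return func

extract_weight_sketch(torch.device("mps"))  # raises the same AttributeError
```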
---

@duzx16 A chatglm-6B-int4 model trained with P-tuning purely on the CPU throws an error when loaded afterwards:

```
Traceback (most recent call last):
  File "/Users/diaojunxian/Library/Application Support/JetBrains/PyCharmCE2023.1/scratches/scratch.py", line 21, in <module>
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/anaconda3/envs/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1630, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ChatGLMModel' object has no attribute 'prefix_encoder'
```

I have specifically double-checked how the model is loaded, so what else could cause this error?

  1. `tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)`

  2. transformers version: transformers==4.28.1

Solved:

Add a config: `config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)`
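
Putting the fix together, loading a P-tuning v2 checkpoint would look roughly like this (a sketch following the ChatGLM-6B ptuning README; `CHECKPOINT_PATH` is a placeholder). Without `pre_seq_len` in the config, the model is built without a `prefix_encoder`, which is exactly the `AttributeError` above:

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

CHECKPOINT_PATH = "output/your-ptuning-checkpoint"  # placeholder path

# pre_seq_len makes ChatGLMModel create transformer.prefix_encoder
config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm-6b-int4", config=config, trust_remote_code=True)

# Load only the prefix-encoder weights from the P-tuning checkpoint
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.float().eval()  # CPU inference on the M2, as discussed above
```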

---

I hit the same problem during P-tuning fine-tuning, and made the following changes:

  1. `config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)`
  2. `return grad_input.view(ctx.inp_shape), grad_weight.view(ctx.weight_shape), None, None, None`

It still fails with the same error:

```
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4)
```

I am on Windows 10. @mrzrx @diaojunxian
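
One thing worth ruling out (an assumption on my part, not something confirmed in this thread): with `trust_remote_code=True`, the tracebacks above show the executed quantization.py lives under `~/.cache/huggingface/modules/transformers_modules/`, so an edit made to another copy of the file would have no effect. After loading the model, you can print which copy Python actually imported:

```python
import sys

# List every dynamically loaded quantization module and its file path, to
# verify the edited quantization.py is the one actually being executed.
for name, mod in sys.modules.items():
    if "transformers_modules" in name and "quantization" in name:
        print(name, "->", mod.__file__)
```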

---

Hello, how was the error `AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'` solved? Thanks.

---

@qianxianyang I did not actually solve it. I hit it because I forced `model.to("mps")`, which sent execution down the non-`device=cpu` branch. If your local environment has no CUDA, you can stick to `device == cpu`.

Also, if `kernels` is `None`, you will hit `AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'`; that means execution still took the `device=cuda` branch, so I suggest checking the code (a device-selection sketch follows).
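
In other words, on Apple Silicon the only branch with working int4 kernels is the CPU one. A small helper encoding that rule of thumb (hypothetical, just reflecting the advice in this thread):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA when present; otherwise fall back to CPU, because the
    int4 quantization kernels have no MPS implementation."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

print(pick_device())
```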

---

> Hello, how was the error `AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'` solved? Thanks.

Actually, the OP's approach solved it; the main thing is that you need to find your kernel path.

---

With an M2 Max with 96 GB of memory, can P-tuning be run directly at FP16 precision on PyTorch's MPS backend?

---

> With an M2 Max with 96 GB of memory, can P-tuning be run directly at FP16 precision on PyTorch's MPS backend?

I never actually got it running; it errors out when run on MPS. I later validated the whole process on the CPU instead.
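
For anyone who still wants to try MPS on a large-memory machine, the ChatGLM-6B README suggests loading the full (unquantized) model in half precision on MPS; a sketch, unverified on M2 and evidently not what worked for the poster above:

```python
from transformers import AutoModel, AutoTokenizer

# Full-precision checkpoint, halved to fp16 and moved to MPS; the int4
# checkpoint cannot take this path because its kernels are CPU/CUDA only.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().to("mps")
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```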