First, I trained the `chatglm-6b-int4` model with `train_chat.sh`. Then I tried to load the fine-tuned model following the method described at https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning#%E6%A8%A1%E5%9E%8B%E9%83%A8%E7%BD%B2. When executing `model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)`, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PrefixEncoder:
        size mismatch for embedding.weight: copying a param with shape torch.Size([8, 229376]) from checkpoint, the shape in current model is torch.Size([128, 229376]).
The full interactive session that reproduces the error:

root@VM-32-16-ubuntu:/mnt/nfs/ChatGLM-6B/ptuning# python3
Python 3.10.6 (main, Nov 2 2022, 18:53:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import torch
>>> from transformers import AutoConfig, AutoModel, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("/root/chatglm-6b-int4", trust_remote_code=True)
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
>>> config = AutoConfig.from_pretrained("/root/chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
>>> model = AutoModel.from_pretrained("/root/chatglm-6b-int4", config=config, trust_remote_code=True)
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 6
Using quantization cache
Applying quantization to glm layers
Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /root/chatglm-6b-int4 and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
>>> prefix_state_dict = torch.load("/mnt/nfs/chatglm_checkpoint/checkpoint-40/pytorch_model.bin")
>>> new_prefix_state_dict = {}
>>> for k, v in prefix_state_dict.items():
...     if k.startswith("transformer.prefix_encoder."):
...         new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
...
>>> new_prefix_state_dict
{'embedding.weight': tensor([[ 1.9268, 1.4873, 0.9009, ..., -0.9419, -0.3421, -0.3513],
[-0.4839, -0.1821, 1.0518, ..., 0.8515, -2.6099, -0.2716],
[ 0.6895, -0.0231, -0.3374, ..., -1.5180, -0.3101, 1.9832],
...,
[ 0.2471, -0.4341, 0.2673, ..., -0.4657, -0.3695, 0.4011],
[-0.2043, -0.4939, -1.4922, ..., -0.0732, -0.6814, -2.1821],
[ 1.5078, 1.1973, -0.9023, ..., 0.3872, -0.8471, 0.8122]],
device='cuda:0')}
>>> model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PrefixEncoder:
        size mismatch for embedding.weight: copying a param with shape torch.Size([8, 229376]) from checkpoint, the shape in current model is torch.Size([128, 229376]).
>>>
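For what it's worth, my reading of the mismatch (`torch.Size([8, 229376])` in the checkpoint vs. `torch.Size([128, 229376])` in the model) is that the checkpoint was trained with a prefix length of 8, while I am loading with `pre_seq_len=128`. Below is a minimal sketch of the same loading steps with the value matched to the checkpoint; `pre_seq_len=8` is an assumption inferred purely from the checkpoint shape, not something I have verified against my `train_chat.sh`:

```python
import torch
from transformers import AutoConfig, AutoModel

# ASSUMPTION: pre_seq_len=8, inferred from the checkpoint's embedding shape
# torch.Size([8, 229376]); it must match the PRE_SEQ_LEN used at training time.
config = AutoConfig.from_pretrained("/root/chatglm-6b-int4", trust_remote_code=True, pre_seq_len=8)
model = AutoModel.from_pretrained("/root/chatglm-6b-int4", config=config, trust_remote_code=True)

# Same prefix-encoder weight extraction as in the session above.
prefix_state_dict = torch.load("/mnt/nfs/chatglm_checkpoint/checkpoint-40/pytorch_model.bin")
new_prefix_state_dict = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
```

If that is indeed the cause, the `PRE_SEQ_LEN` used by the training script and the `pre_seq_len` passed to `AutoConfig.from_pretrained` simply need to stay consistent.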
Environment
- OS: Ubuntu 22.04
- Python: Python 3.10.6
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True