[THUDM/ChatGLM-6B][BUG/Help] Running bash ds_train_finetune.sh fails with: RuntimeError: expected scalar type Half but found Float


I want to fine-tune the chatglm-6b-int4-qe model with ds_train_finetune, but it fails with: RuntimeError: expected scalar type Half but found Float

  • The traceback from the run is as follows:
    
    05/11/2023 13:43:01 - WARNING - transformers_modules.chatglm-6b-int4-qe.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/main.py:431 in <module>                                │
    │                                                                                                  │
    │   428                                                                                            │
    │   429                                                                                            │
    │   430 if __name__ == "__main__":                                                                 │
    │ ❱ 431 │   main()                                                                                 │
    │   432                                                                                            │
    │                                                                                                  │
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/main.py:370 in main                                    │
    │                                                                                                  │
    │   367 │   │   #     checkpoint = last_checkpoint                                                 │
    │   368 │   │   model.gradient_checkpointing_enable()                                              │
    │   369 │   │   model.enable_input_require_grads()                                                 │
    │ ❱ 370 │   │   train_result = trainer.train(resume_from_checkpoint=checkpoint)                    │
    │   371 │   │   # trainer.save_model()  # Saves the tokenizer too for easy upload                  │
    │   372 │   │                                                                                      │
    │   373 │   │   metrics = train_result.metrics                                                     │
    │                                                                                                  │
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/trainer.py:1635 in train                               │
    │                                                                                                  │
    │   1632 │   │   inner_training_loop = find_executable_batch_size(                                 │
    │   1633 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
    │   1634 │   │   )                                                                                 │
    │ ❱ 1635 │   │   return inner_training_loop(                                                       │
    │   1636 │   │   │   args=args,                                                                    │
    │   1637 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
    │   1638 │   │   │   trial=trial,                                                                  │
    │                                                                                                  │
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/trainer.py:1904 in _inner_training_loop                │
    │                                                                                                  │
    │   1901 │   │   │   │   │   with model.no_sync():                                                 │
    │   1902 │   │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)                  │
    │   1903 │   │   │   │   else:                                                                     │
    │ ❱ 1904 │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)                      │
    │   1905 │   │   │   │                                                                             │
    │   1906 │   │   │   │   if (                                                                      │
    │   1907 │   │   │   │   │   args.logging_nan_inf_filter                                           │
    │                                                                                                  │
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/trainer.py:2647 in training_step                       │
    │                                                                                                  │
    │   2644 │   │   │   return loss_mb.reduce_mean().detach().to(self.args.device)                    │
    │   2645 │   │                                                                                     │
    │   2646 │   │   with self.compute_loss_context_manager():                                         │
    │ ❱ 2647 │   │   │   loss = self.compute_loss(model, inputs)                                       │
    │   2648 │   │                                                                                     │
    │   2649 │   │   if self.args.n_gpu > 1:                                                           │
    │   2650 │   │   │   loss = loss.mean()  # mean() to average on multi-gpu parallel training        │
    │                                                                                                  │
    │ /home/ecs-user/chatglm/ChatGLM-6B/ptuning/trainer.py:2679 in compute_loss                        │
    │                                                                                                  │
    │   2676 │   │   │   labels = inputs.pop("labels")                                                 │
    │   2677 │   │   else:                                                                             │
    │   2678 │   │   │   labels = None                                                                 │
    │ ❱ 2679 │   │   outputs = model(**inputs)                                                         │
    │   2680 │   │   # Save past state if it exists                                                    │
    │   2681 │   │   # TODO: this needs to be fixed and made cleaner later.                            │
    │   2682 │   │   if self.args.past_index >= 0:                                                     │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl  │
    │                                                                                                  │
    │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
    │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
    │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
    │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
    │   1502 │   │   # Do not call functions when jit is used                                          │
    │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
    │   1504 │   │   backward_pre_hooks = []                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py:15 in wrapped_fn       │
    │                                                                                                  │
    │   12 │                                                                                           │
    │   13 │   def wrapped_fn(*args, **kwargs):                                                        │
    │   14 │   │   get_accelerator().range_push(func.__qualname__)                                     │
    │ ❱ 15 │   │   ret_val = func(*args, **kwargs)                                                     │
    │   16 │   │   get_accelerator().range_pop()                                                       │
    │   17 │   │   return ret_val                                                                      │
    │   18                                                                                             │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py:1724 in forward    │
    │                                                                                                  │
    │   1721 │   │   if self.fp16_auto_cast():                                                         │
    │   1722 │   │   │   inputs = self._cast_inputs_half(inputs)                                       │
    │   1723 │   │                                                                                     │
    │ ❱ 1724 │   │   loss = self.module(*inputs, **kwargs)                                             │
    │   1725 │   │                                                                                     │
    │   1726 │   │   if self.zero_optimization_partition_weights():                                    │
    │   1727 │   │   │   # Disable automated discovery of external parameters                          │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl  │
    │                                                                                                  │
    │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
    │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
    │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
    │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
    │   1502 │   │   # Do not call functions when jit is used                                          │
    │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
    │   1504 │   │   backward_pre_hooks = []                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4-qe/modeling_chatg │
    │ lm.py:1158 in forward                                                                            │
    │                                                                                                  │
    │   1155 │   │   use_cache = use_cache if use_cache is not None else self.config.use_cache         │
    │   1156 │   │   return_dict = return_dict if return_dict is not None else self.config.use_return  │
    │   1157 │   │                                                                                     │
    │ ❱ 1158 │   │   transformer_outputs = self.transformer(                                           │
    │   1159 │   │   │   input_ids=input_ids,                                                          │
    │   1160 │   │   │   position_ids=position_ids,                                                    │
    │   1161 │   │   │   attention_mask=attention_mask,                                                │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl  │
    │                                                                                                  │
    │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
    │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
    │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
    │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
    │   1502 │   │   # Do not call functions when jit is used                                          │
    │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
    │   1504 │   │   backward_pre_hooks = []                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4-qe/modeling_chatg │
    │ lm.py:960 in forward                                                                             │
    │                                                                                                  │
    │    957 │   │   │   layer_past = past_key_values[i]                                               │
    │    958 │   │   │                                                                                 │
    │    959 │   │   │   if self.gradient_checkpointing and self.training:                             │
    │ ❱  960 │   │   │   │   layer_ret = torch.utils.checkpoint.checkpoint(                            │
    │    961 │   │   │   │   │   layer,                                                                │
    │    962 │   │   │   │   │   hidden_states,                                                        │
    │    963 │   │   │   │   │   position_ids,                                                         │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py:249 in checkpoint    │
    │                                                                                                  │
    │   246 │   │   raise ValueError("Unexpected keyword arguments: " + ",".join(arg for arg in kwar   │
    │   247 │                                                                                          │
    │   248 │   if use_reentrant:                                                                      │
    │ ❱ 249 │   │   return CheckpointFunction.apply(function, preserve, *args)                         │
    │   250 │   else:                                                                                  │
    │   251 │   │   return _checkpoint_without_reentrant(                                              │
    │   252 │   │   │   function,                                                                      │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/autograd/function.py:506 in apply        │
    │                                                                                                  │
    │   503 │   │   if not torch._C._are_functorch_transforms_active():                                │
    │   504 │   │   │   # See NOTE: [functorch vjp and autograd interaction]                           │
    │   505 │   │   │   args = _functorch.utils.unwrap_dead_wrappers(args)                             │
    │ ❱ 506 │   │   │   return super().apply(*args, **kwargs)  # type: ignore[misc]                    │
    │   507 │   │                                                                                      │
    │   508 │   │   if cls.setup_context == _SingleLevelFunction.setup_context:                        │
    │   509 │   │   │   raise RuntimeError(                                                            │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py:107 in forward       │
    │                                                                                                  │
    │   104 │   │   ctx.save_for_backward(*tensor_inputs)                                              │
    │   105 │   │                                                                                      │
    │   106 │   │   with torch.no_grad():                                                              │
    │ ❱ 107 │   │   │   outputs = run_function(*args)                                                  │
    │   108 │   │   return outputs                                                                     │
    │   109 │                                                                                          │
    │   110 │   @staticmethod                                                                          │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl  │
    │                                                                                                  │
    │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
    │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
    │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
    │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
    │   1502 │   │   # Do not call functions when jit is used                                          │
    │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
    │   1504 │   │   backward_pre_hooks = []                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4-qe/modeling_chatg │
    │ lm.py:609 in forward                                                                             │
    │                                                                                                  │
    │    606 │   │                                                                                     │
    │    607 │   │   # Layer norm at the begining of the transformer layer.                            │
    │    608 │   │   # [seq_len, batch, hidden_size]                                                   │
    │ ❱  609 │   │   attention_input = self.input_layernorm(hidden_states)                             │
    │    610 │   │                                                                                     │
    │    611 │   │   # Self attention.                                                                 │
    │    612 │   │   attention_outputs = self.attention(                                               │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl  │
    │                                                                                                  │
    │   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
    │   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
    │   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
    │ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
    │   1502 │   │   # Do not call functions when jit is used                                          │
    │   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
    │   1504 │   │   backward_pre_hooks = []                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/modules/normalization.py:190 in       │
    │ forward                                                                                          │
    │                                                                                                  │
    │   187 │   │   │   init.zeros_(self.bias)                                                         │
    │   188 │                                                                                          │
    │   189 │   def forward(self, input: Tensor) -> Tensor:                                            │
    │ ❱ 190 │   │   return F.layer_norm(                                                               │
    │   191 │   │   │   input, self.normalized_shape, self.weight, self.bias, self.eps)                │
    │   192 │                                                                                          │
    │   193 │   def extra_repr(self) -> str:                                                           │
    │                                                                                                  │
    │ /home/ecs-user/.local/lib/python3.8/site-packages/torch/nn/functional.py:2515 in layer_norm      │
    │                                                                                                  │
    │   2512 │   │   return handle_torch_function(                                                     │
    │   2513 │   │   │   layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, b  │
    │   2514 │   │   )                                                                                 │
    │ ❱ 2515 │   return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.c  │
    │   2516                                                                                           │
    │   2517                                                                                           │
    │   2518 def group_norm(                 

    RuntimeError: expected scalar type Half but found Float
    0%| | 0/3000 [00:02<?, ?it/s]
    [2023-05-11 13:43:05,765] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 31673
    [2023-05-11 13:43:05,766] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'main.py', '--local_rank=0', '--deepspeed', 'deepspeed.json', '--do_train', '--train_file', 'AdvertiseGen/train.json', '--test_file', 'AdvertiseGen/dev.json', '--prompt_column', 'content', '--response_column', 'summary', '--overwrite_cache', '--model_name_or_path', 'train_models/chatglm-6b-int4-qe', '--output_dir', './output/adgen-chatglm-6b-ft-128-2e-2', '--overwrite_output_dir', '--max_source_length', '64', '--max_target_length', '64', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '16', '--predict_with_generate', '--max_steps', '3000', '--logging_steps', '10', '--save_steps', '1000', '--learning_rate', '2e-2'] exits with return code = 1


### Expected Behavior

_No response_

### Steps To Reproduce

Run ds_train_finetune.sh:

```bash
LR=2e-2
MASTER_PORT=$(shuf -n 1 -i 10000-65535)
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

deepspeed --num_gpus=1 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path train_models/chatglm-6b-int4-qe \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR
```


### Environment

```markdown
- OS: Ubuntu 20.04
- Python: 3.8.10
- Transformers: 4.27.1
- PyTorch: 2.0.0+cu117
- CUDA Support: True
```

### Answers


--fp16


Is --fp16 a required option?

I tried adding --fp16, but then I get an Out of Memory error.

I then tried --quantization_bit 4 to reduce the memory footprint, but that triggers the RuntimeError: expected scalar type Half but found Float described above.

Finally I used the pre-quantized chatglm-6b-int4-qe model directly, but nothing changed; the same RuntimeError: expected scalar type Half but found Float still appears.


If you don't add --fp16 and use quantization_bit 4, then in the finetune code path use model = model.half() instead of model.float().
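
For context on why this helps: the int4-quantized kernels produce half-precision activations, while model.float() leaves modules such as LayerNorm with float32 weights, which is consistent with the traceback above ending in F.layer_norm. Below is a minimal, hypothetical sketch (not ChatGLM-6B code) that reproduces the same kind of dtype-mismatch failure:

```python
import torch

# Hypothetical minimal sketch, not ChatGLM-6B code: feed half-precision
# activations into a LayerNorm whose weights are still float32 (as after
# model.float()) and observe the dtype-mismatch error.
device = "cuda" if torch.cuda.is_available() else "cpu"

hidden_states = torch.randn(4, 4096, dtype=torch.half, device=device)  # half activations
layer_norm = torch.nn.LayerNorm(4096).to(device)                       # float32 weight/bias

try:
    layer_norm(hidden_states)
except RuntimeError as err:
    # On the reporter's setup this reads "expected scalar type Half but found
    # Float"; the exact wording can differ by device and PyTorch version.
    print(err)

# Making input and parameters share a dtype avoids the error; the fix in this
# thread keeps everything in half instead (model = model.half() in main.py).
print(layer_norm(hidden_states.float()).dtype)  # torch.float32
```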


Solved.

When using INT4 or INT8 quantization, you need to add a pre_seq_len setting in ds_train_finetune.sh, e.g. --pre_seq_len 128.

According to the source of main.py, when pre_seq_len is specified, P-Tuning v2 is used, which includes model = model.half(); when pre_seq_len is not specified, full finetuning is used with model = model.float().

Aren't you trying to do full-parameter finetuning? Why would you add pre_seq_len? That parameter is the soft prompt length for P-Tuning and only needs to be set when doing P-Tuning.


> Solved. When using INT4 or INT8 quantization, you need to add a pre_seq_len setting in ds_train_finetune.sh, e.g. --pre_seq_len 128. According to the source of main.py, when pre_seq_len is specified, P-Tuning v2 is used, which includes model = model.half(); when pre_seq_len is not specified, full finetuning is used with model = model.float().

> Aren't you trying to do full-parameter finetuning? Why would you add pre_seq_len? That parameter is the soft prompt length for P-Tuning and only needs to be set when doing P-Tuning.

🤣 Sorry, my understanding was off. The correct fix is to edit the relevant part of main.py and change model = model.float() to model = model.half():

```python
...
    if model_args.pre_seq_len is not None:
        # P-tuning v2
        model = model.half()
        model.transformer.prefix_encoder.float()
    else:
        # Finetune
        # model = model.float()
        model = model.half()
...
```
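
If it helps, here is a small, hypothetical sanity check (not part of the repo) that could be dropped into main.py right after this branch to confirm that every parameter is really float16 before training starts:

```python
from collections import Counter

import torch

# Hypothetical helper, not part of the ChatGLM-6B repo: report parameter dtypes
# so a stray float32 parameter (which would reintroduce the layer_norm dtype
# mismatch) is easy to spot after model = model.half().
def report_param_dtypes(model: torch.nn.Module) -> None:
    counts = Counter(p.dtype for p in model.parameters())
    print("parameter dtypes:", {str(k): v for k, v in counts.items()})
    for name, param in model.named_parameters():
        if param.dtype != torch.float16:
            print("not half:", name, param.dtype)

# Usage inside main.py, right after the branch above:
# report_param_dtypes(model)   # expect only torch.float16 entries
```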