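[THUDM/ChatGLM-6B][BUG/Help] Saved model parameters are very large when warmup_steps is enabled during full fine-tuning with DeepSpeed.

When doing full fine-tuning with DeepSpeed and enabling warmup_steps, the saved model parameters are very large.

Three model shards are saved, each about 23 GB, as shown in the figure.

But if I do not enable warmup_steps, only two shards are saved, each about 12 GB.

Has anyone run into this?

The training script is as follows: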

LR=1e-4
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file datapath \
    --validation_file datapath \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path modelpath \
    --output_dir outputpath \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 2000 \
    --logging_steps 10 \
    --save_steps 50 \
    --learning_rate $LR \
    --warmup_steps 100 \
    --save_total_limit 2 \
    --fp16

deepspeed.json

{
    "train_micro_batch_size_per_gpu": "auto",
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true
    }
}

Environment

- OS: CentOS Linux 7 (Core)
- Python: 3.8
- Transformers: 4.29.0dev
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): 11.7

GeorgeLuImmortal

How can this be solved?

GUORUIWANG

The model parameters written by torch.save share underlying memory, so they need to be cloned before saving.
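The effect described in the comment above can be reproduced in isolation. The following is a minimal sketch, not the ChatGLM-6B or DeepSpeed code path: `flat_buffer`, `param_view`, and `saved_size` are hypothetical names chosen for illustration, and whether this is the actual cause of the oversized checkpoints in this issue is the commenter's suggestion rather than something confirmed in the thread. It shows that `torch.save` serializes the whole storage behind a tensor view, and that `clone()` gives the tensor its own storage of exactly the needed size.

```python
import io

import torch


def saved_size(obj) -> int:
    """Serialize obj with torch.save into memory and return the byte count."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    return buf.getbuffer().nbytes


# Hypothetical stand-in for a large flat buffer that several parameters
# are views into (1,000,000 fp16 values, about 2 MB of storage).
flat_buffer = torch.zeros(1_000_000, dtype=torch.float16)

# A small "parameter" that is only a view into the large buffer.
param_view = flat_buffer[:1000]

# Saving the view writes the entire underlying storage.
size_view = saved_size({"w": param_view})

# Cloning first copies just the 1000 visible values into a new storage.
size_clone = saved_size({"w": param_view.clone()})

print(f"saved view:  {size_view} bytes")
print(f"saved clone: {size_clone} bytes")
```

Applied to a checkpoint, the same idea would amount to something like `{k: v.clone() for k, v in state_dict.items()}` before calling `torch.save`, assuming the state dict fits in memory a second time while it is being copied.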

dwdb

Could you share the versions of the packages you used?

Kouuh

Could you share the versions of the packages you used?

OS: CentOS Linux 7 (Core)
Python: 3.8
Transformers: 4.29.0dev
PyTorch: 2.0.0
CUDA Support (python -c "import torch; print(torch.cuda.is_available())"): 11.7

GeorgeLuImmortal

How can this be solved?

It has not been solved.

GeorgeLuImmortal

The original model files add up to only 12 GB, but after full-parameter fine-tuning there are two 12 GB files. What causes this?

NoobJiahua
