When doing full-parameter finetuning with DeepSpeed, enabling `warmup_steps` makes the saved model much larger: three shards are written, 23 GB each (see the screenshot).
But if I don't enable `warmup_steps`, only two shards are saved, 12 GB each.
Has anyone run into this before?
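To see what actually differs between the two checkpoints, something like the sketch below can be used (my own debugging snippet, not part of the training code; `inspect_checkpoint` and the checkpoint path are placeholders, and it assumes the shards are plain PyTorch `.bin` files as saved by the Transformers Trainer). Since 23 GB is roughly twice 12 GB, one thing worth checking is whether the larger shards hold fp32 tensors while the smaller ones are fp16:

```python
import glob
import os

import torch


def inspect_checkpoint(ckpt_dir):
    """Print each shard's size on disk, its tensor dtypes, and total tensor bytes."""
    for shard in sorted(glob.glob(os.path.join(ckpt_dir, "pytorch_model*.bin"))):
        state_dict = torch.load(shard, map_location="cpu")
        tensors = [t for t in state_dict.values() if torch.is_tensor(t)]
        dtypes = {str(t.dtype) for t in tensors}
        nbytes = sum(t.numel() * t.element_size() for t in tensors)
        print(f"{os.path.basename(shard)}: {nbytes / 1e9:.1f} GB, dtypes={dtypes}")


inspect_checkpoint("outputpath/checkpoint-50")  # run on both runs and compare
```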
Training script:
```bash
LR=1e-4
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file datapath \
    --validation_file datapath \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path modelpath \
    --output_dir outputpath \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 2000 \
    --logging_steps 10 \
    --save_steps 50 \
    --learning_rate $LR \
    --warmup_steps 100 \
    --save_total_limit 2 \
    --fp16
```
deepspeed.json:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
```
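As far as I understand, with ZeRO stage 2 each `--save_steps` checkpoint directory can also contain the partitioned optimizer state (typically under a `global_step*` subfolder), which is much larger than the fp16 weights themselves. A quick way to see where the bytes are going (hypothetical helper; `du` and the path are placeholders):

```python
import os


def du(ckpt_dir):
    """Walk a checkpoint directory and print the size of every file,
    to see whether the extra gigabytes are model shards or DeepSpeed
    optimizer-state files."""
    for root, _dirs, files in os.walk(ckpt_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            size_gb = os.path.getsize(path) / 1e9
            print(f"{size_gb:6.2f} GB  {os.path.relpath(path, ckpt_dir)}")


du("outputpath/checkpoint-50")  # placeholder path
```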
Environment:
- OS: CentOS Linux 7 (Core)
- Python: 3.8
- Transformers: 4.29.0dev
- PyTorch: 2.0.0
- CUDA (`python -c "import torch; print(torch.version.cuda)"`): 11.7