[THUDM/ChatGLM-6B][BUG] Empty predict results during evaluate

2024-05-21

evaluate.sh contents:

```bash
PRE_SEQ_LEN=128
CHECKPOINT=viewgen0421-chatglm-6b-pt-128-2e-2
STEP=5000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /home/workspace/data/dev.json \
    --test_file /home/workspace/data/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /home/workspace/chatglm/chatglm-6B \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

The log prints this warning: `Input length of input_ids is 512, but max_length is set to 512. This can lead to unexpected behavior. You should consider increasing max_new_tokens.`

This causes `rouge.get_scores` to raise `ValueError: Hypothesis is empty.` https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L328

Setting max_length = 1025 fixes the problem: https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L397

What is the cause of this?

The evaluate.sh arguments --max_source_length 512 --max_target_length 512 trigger it.
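The warning itself points at the likely cause: in transformers, `max_length` is a budget for the prompt plus the generated tokens, so an input truncated to 512 tokens with `max_length=512` leaves zero room for generation, and the decoded hypothesis comes back empty. That would also explain why 1025 works, presumably max_source_length + max_target_length + 1. A minimal sketch of the distinction, assuming the stock chatglm-6b checkpoint (the prompt string is illustrative, not from the thread):

```python
# Minimal sketch of max_length vs. max_new_tokens in transformers.
# Model path and prompt are illustrative, not taken from the thread.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

prompt = "some long input " * 64  # assume this truncates to 512 tokens
inputs = tokenizer(prompt, return_tensors="pt", truncation=True,
                   max_length=512).to(model.device)

# max_length counts prompt + generated tokens: with a 512-token prompt and
# max_length=512, generation stops immediately and the part of the output
# after the prompt decodes to an empty string.
out_empty = model.generate(**inputs, max_length=512)

# Either raise the total budget (source + target, hence 1025)...
out_ok = model.generate(**inputs, max_length=1025)
# ...or bound only the newly generated tokens, as the warning suggests.
out_ok2 = model.generate(**inputs, max_new_tokens=512)
```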

Environment
- OS: CentOS 8
- Python: 3.9
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support: True

Answers

Also found that excessive padding makes the output empty.
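The thread doesn't explain this one. One common cause with decoder-only models (an assumption here, not confirmed above, and not verified against ChatGLM-6B specifically) is right padding: pad tokens end up between the prompt and the position where generation starts, and the decoded continuation can come out empty or garbled. Left padding avoids that. A hedged sketch:

```python
# Hedged sketch: for batched generation with a causal LM, pad on the left
# so each prompt stays adjacent to its generated continuation.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

tokenizer.padding_side = "left"  # standard PreTrainedTokenizer attribute
prompts = ["short prompt", "a noticeably longer prompt in the same batch"]
batch = tokenizer(prompts, return_tensors="pt", padding=True,
                  truncation=True, max_length=512).to(model.device)
outputs = model.generate(**batch, max_new_tokens=512)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```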

I met the same issue!

Same question!! What is the relationship between PRE_SEQ_LEN, max_source_length, and max_target_length?

In my case, the model's predicted hypothesis contained only a single newline character, which caused the ROUGE calculation error, so we need to check the model output and skip empty outputs.


To solve this problem, you need to change ptuning/main.py#L327 to the following code:

```python
hypothesis = ' '.join(hypothesis)
reference = ' '.join(reference)
# Skip pairs where either side is empty (e.g. the model emitted only a
# newline); rouge.get_scores raises "ValueError: Hypothesis is empty."
if not hypothesis.strip() or not reference.strip():
    continue
scores = rouge.get_scores(hypothesis, reference)
```
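Note that this only stops the crash: skipped pairs simply drop out of the ROUGE average, so if most predictions are empty because of the generation-length budget discussed above, the reported scores are computed on an unrepresentative subset. The length fix is still needed so the model actually produces output.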
Ran into the same thing. Clearly a bug: the lengths used for eval and predict should be kept consistent with the parameters in train.sh, otherwise the tokenizer has problems and everything decoded after inference is all
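(This also speaks to the earlier question about the three parameters: PRE_SEQ_LEN is the length of the learned p-tuning prefix and has to match the checkpoint being loaded, max_source_length is the truncation limit for the input prompt, and max_target_length bounds the reply. One plausible reading of this last comment, not confirmed in the thread, is that evaluating with source/target lengths different from train.sh feeds the model inputs truncated and padded unlike anything it saw during p-tuning, which can degrade or empty the decoded predictions.)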