[PaddlePaddle/PaddleOCR] Error when training the text direction classifier

2024-05-13 299 views
Training fails with the following error:

Error Message Summary:

FatalError: Process abort signal is detected by the operating system.
  [TimeInfo: Aborted at 1686708594 (unix time) try "date -d @1686708594" if you are using GNU date ]
  [SignalInfo: SIGABRT (@0x403002b491a) received by PID 2836762 (TID 0x7f698fd8d3c0) from PID 2836762 ]

Traceback (most recent call last):
  File "tools/train.py", line 208, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 183, in main
    amp_level, amp_custom_black_list)
  File "/data/heys/钢筋算量符号/paddleocr_cls/tools/program.py", line 258, in train
    for idx, batch in enumerate(train_dataloader):
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 746, in __next__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 620, in _get_data
    data = self._data_queue.get(timeout=self._timeout)
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/multiprocessing/queues.py", line 105, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 534, in _thread_loop
    batch = self._get_data()
  File "/data/heys/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 636, in _get_data
    "pids: {}".format(len(failed_workers), pids))
RuntimeError: DataLoader 8 workers exit unexpectedly, pids: 2836759, 2836760, 2836761, 2836762, 2836763, 2836764, 2836765, 2836766

    data = self._reader.read_next_var_list()

SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed != true, but received killed:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

It looks like something is wrong with the data, but I don't know exactly where the problem is.

Answers

Check whether your data is prepared in the format that model training requires.
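One way to act on this is to scan the label file before starting training. The sketch below is a minimal, hypothetical checker (`check_label_file` is not part of PaddleOCR). It assumes that each line has the form image path, tab, label, that image paths are relative to `data_dir`, and that every label must appear in `label_list`. Adjust these assumptions if your setup differs.

```python
import os


def check_label_file(label_path, data_dir, label_list, sep="\t"):
    """Return a list of (line_number, problem) tuples found in one label file.

    Assumed line format (not verified against PaddleOCR internals):
        <image path relative to data_dir><TAB><label>
    """
    problems = []
    with open(label_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                problems.append((lineno, "empty line"))
                continue
            parts = line.split(sep)
            if len(parts) != 2:
                problems.append((lineno, "expected '<image path><TAB><label>'"))
                continue
            img_name, label = parts[0], parts[1].strip()
            # A label outside label_list cannot be mapped to a class index.
            if label not in label_list:
                problems.append((lineno, f"label {label!r} not in label_list"))
            # A missing image will fail when the data loader tries to read it.
            if not os.path.isfile(os.path.join(data_dir, img_name)):
                problems.append((lineno, f"image not found: {img_name}"))
    return problems
```

With the paths from this thread, a call might look like `check_label_file("./train_data/cls/cls_gt_train.txt", "./train_data/cls", ["0", "90", "180", "270"])`; an empty result means none of the checked problems were found. Temporarily setting `num_workers: 0` in the Train loader may also help, because the data is then loaded in the main process and the underlying exception is more likely to be shown directly.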

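"Check whether your data is prepared in the format that model training requires."

(image) This is my dataset file structure. (image) This is my data format. (image) This is the code I used to generate the data. Please help me see where the problem is.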

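"Check whether your data is prepared in the format that model training requires."

(image) These are the dataset paths and the corresponding label file paths in the configuration file.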

Currently only 0° and 180° classification is supported. For details, see the documentation: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/angle_class.md#1-%E6%96%B9%E6%B3%95%E4%BB%8B%E7%BB%8D

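"Currently only 0° and 180° classification is supported. For details, see the documentation: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/angle_class.md#1-%E6%96%B9%E6%B3%95%E4%BB%8B%E7%BB%8D"

I know, but I want to try four-class classification. So is this error caused by using four classes?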

Four-class classification requires code changes: both the data reading part and the network part need to be modified.

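"Four-class classification requires code changes: both the data reading part and the network part need to be modified."

Could you tell me specifically how to change them?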

Please have a look at the source code yourself. It is mainly the number-of-classes part, which defaults to 2 classes.

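"Please have a look at the source code yourself. It is mainly the number-of-classes part, which defaults to 2 classes."

My main problem is that I am not sure which .py file needs to be modified.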

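Change class_dim in the Head section of the configuration file, then change the label_list parameter used for label encoding and decoding. You can also do this directly in the configuration file by adding the parameter under the corresponding fields.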
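To illustrate what label_list and class_dim control, here is a simplified stand-in for label encoding and decoding. It is not PaddleOCR's actual ClsLabelEncode or ClsPostProcess code: `encode_label` and `decode_prediction` are hypothetical names, and the sketch assumes that a label is mapped to its position in label_list and that the head outputs one score per class.

```python
# Hypothetical four-class label list, in the same order as in the config file.
label_list = ["0", "90", "180", "270"]


def encode_label(label, label_list):
    """Map a label string to its class index; None marks a sample to skip."""
    if label not in label_list:
        return None
    return label_list.index(label)


def decode_prediction(scores, label_list):
    """Pick the highest-scoring class and return (label, score).

    `scores` stands for the per-class outputs of the head, so its length
    plays the role of class_dim and must match len(label_list).
    """
    if len(scores) != len(label_list):
        raise ValueError("class_dim must equal len(label_list)")
    best = max(range(len(scores)), key=scores.__getitem__)
    return label_list[best], scores[best]
```

Under these assumptions, a label file that contains "90" or "270" while label_list is still ['0', '180'] would produce samples that cannot be encoded, and a head with class_dim: 2 could never predict indices 2 or 3.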

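"Change class_dim in the Head section of the configuration file, then change the label_list parameter used for label encoding and decoding. You can also do this directly in the configuration file by adding the parameter under the corresponding fields."

Where do I change the label_list parameter for encoding and decoding? Or how exactly do I add the parameter to the configuration file?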

"Change class_dim in the Head section of the configuration file, then change the label_list parameter used for label encoding and decoding. You can also do this directly in the configuration file by adding the parameter under the corresponding fields."

I have already changed label_list in the configuration file to four classes:

Global:
  use_gpu: True
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/cls/mv3/
  save_epoch_step: 3
  # evaluation is run every 5000 iterations after the 4000th iteration
  eval_batch_step: [0, 1000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  label_list: ['0','90','180','270']

Architecture:
  model_type: cls
  algorithm: CLS
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.35
    model_name: small
  Neck:
  Head:
    name: ClsHead
    class_dim: 4

Loss:
  name: ClsLoss

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: ClsPostProcess

Metric:
  name: ClsMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/cls
    label_file_list:
      - ./train_data/cls/cls_gt_train.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ClsLabelEncode: # Class handling label
      - BaseDataAugmentation:
      - RandAugment:
      - ClsResizeImg:
          image_shape: [3, 48, 192]
      - KeepKeys:
          keep_keys: ['image', 'label'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 512
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/cls
    label_file_list:
      - ./train_data/cls/cls_gt_test.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ClsLabelEncode: # Class handling label
      - ClsResizeImg:
          image_shape: [3, 48, 192]
      - KeepKeys:
          keep_keys: ['image', 'label'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 512
    num_workers: 0

Didn't I already send you the code link? Just modify the source code directly, or modify the configuration file.

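"Didn't I already send you the code link? Just modify the source code directly, or modify the configuration file."

What you sent is the documentation on training and evaluating the classifier, which I have already read. So is my configuration file above suitable?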

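"Change class_dim in the Head section of the configuration file, then change the label_list parameter used for label encoding and decoding. You can also do this directly in the configuration file by adding the parameter under the corresponding fields."

Wasn't the link given here?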

I see it now.

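"Change class_dim in the Head section of the configuration file, then change the label_list parameter used for label encoding and decoding. You can also do this directly in the configuration file by adding the parameter under the corresponding fields."

"Wasn't the link given here?"

I see it now. How exactly should I modify it? Do I need to change the length of label_list?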
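On the question of length: whatever else needs changing, the number of entries in label_list and the value of class_dim should agree, since each label needs its own output class. A minimal sketch of such a consistency check follows. `check_cls_config` is a hypothetical helper, and it assumes the configuration has already been loaded into a nested dict with the Global.label_list and Architecture.Head.class_dim keys used in the configuration posted in this thread.

```python
def check_cls_config(config):
    """Return a list of human-readable inconsistencies in a cls config dict."""
    label_list = config["Global"]["label_list"]
    class_dim = config["Architecture"]["Head"]["class_dim"]
    errors = []
    # Each label should appear only once, otherwise indices are ambiguous.
    if len(set(label_list)) != len(label_list):
        errors.append("label_list contains duplicate labels")
    # The head needs exactly one output per label.
    if class_dim != len(label_list):
        errors.append(
            f"class_dim is {class_dim} but label_list has {len(label_list)} entries"
        )
    return errors


# Values taken from the four-class configuration in this thread.
four_class = {
    "Global": {"label_list": ["0", "90", "180", "270"]},
    "Architecture": {"Head": {"name": "ClsHead", "class_dim": 4}},
}
print(check_cls_config(four_class))
```

An empty list here only means these two settings agree with each other; it does not show that the label files or the rest of the pipeline are ready for four classes.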