Model: rec_svtr_large_ch. Training set size: 5.8M samples (vocab of 7,720 characters, Simplified + Traditional Chinese).
Problem description: after roughly 3-4 epochs of training, I repeatedly lowered the learning rate (down to 0.000004) and the batch size (down to 32), but after another full epoch the loss still would not go down (it keeps fluctuating around 1.8 - 2.3), and validation accuracy shows no improvement either.
Questions:
- For the SVTR large model, how low can the batch size and learning rate reasonably be set while still training effectively?
- I suspect my training set is the problem: it mixes long text lines with very short ones (even single characters), so the image sizes vary widely (see the image below). Is this setup reasonable? Do the images need to be kept close to (32, 320) for training to converge, or will it fail otherwise?
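On the second question, one way to quantify the concern is to measure how much a fixed 320x32 resize with `padding: False` stretches or squashes each sample. A minimal sketch (all function names here are my own, not PaddleOCR API) that tallies the horizontal distortion factor per image size:

```python
# Assumption: SVTRRecResizeImg with padding: False stretches every image
# to exactly 320x32, regardless of its native aspect ratio.
TARGET_H, TARGET_W = 32, 320
TARGET_RATIO = TARGET_W / TARGET_H  # 10.0

def distortion_factor(w: int, h: int) -> float:
    """Horizontal stretch applied by the fixed resize.

    1.0 means the native aspect ratio already matches 320/32; a single
    32x32 character crop comes out at 10.0, i.e. stretched 10x wider.
    """
    return TARGET_RATIO / (w / h)

def distortion_report(sizes) -> float:
    """sizes: iterable of (width, height) pairs from the training set.

    Returns the fraction of samples whose horizontal stretch or squash
    exceeds 2x, a rough proxy for 'heavily distorted' inputs.
    """
    factors = [distortion_factor(w, h) for w, h in sizes]
    heavy = sum(1 for f in factors if f > 2.0 or f < 0.5)
    return heavy / len(factors)

# Example: a single-char crop, a very long line, and a well-matched line.
print(distortion_report([(32, 32), (640, 32), (320, 32)]))  # 0.333...
```

If a large fraction of the 5.8M samples lands in the "heavy" bucket, enabling padding (or bucketing by aspect ratio) may matter more than any learning-rate change.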
Many thanks to the Baidu Paddle team for open-sourcing this work, and thanks in advance for any help!
Here is my configuration file:
```yaml
Global:
  use_gpu: True
  epoch_num: 2
  log_smooth_window: 100
  print_batch_step: 100
  save_model_dir: ./output/rec/rec_svtr_large_ch/
  save_epoch_step: 1
  # evaluation is run every 10000 iterations after the 0th iteration
  eval_batch_step: [0, 10000]
  cal_metric_during_train: True
  pretrained_model: ./output/rec/rec_svtr_large_ch/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words/ch/word_1.jpg
  # for data or label process
  character_dict_path: vocab.txt
  max_text_length: 40
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_svtr_large_ch.txt

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.99
  epsilon: 0.00000008
  weight_decay: 0.05
  no_weight_decay_name: norm pos_embed
  one_dim_param_no_weight_decay: true
  lr:
    name: Cosine
    learning_rate: 0.000004
    warmup_epoch: 0

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
    name: STN_ON
    tps_inputsize: [32, 64]
    tps_outputsize: [32, 320]
    num_control_points: 20
    tps_margins: [0.05, 0.05]
    stn_activation: none
  Backbone:
    name: SVTRNet
    img_size: [32, 320]
    out_char_num: 40
    out_channels: 384
    patch_merging: 'Conv'
    embed_dim: [192, 256, 512]
    depth: [3, 9, 9]
    num_heads: [6, 8, 16]
    mixer: ['Local','Local','Local','Local','Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global','Global','Global','Global','Global','Global']
    local_mixer: [[7, 11], [7, 11], [7, 11]]
    prenorm: False
  Neck:
    name: SequenceEncoder
    encoder_type: reshape
  Head:
    name: CTCHead

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/text_renderer/output/TCSynth_raw
    label_file_list: /home/aistudio/text_renderer/output/TCSynth_raw/train.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 320]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 48
    drop_last: True
    num_workers: 12

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/text_renderer/output/label/
    label_file_list: /home/aistudio/text_renderer/output/label/label.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 320]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 64
    num_workers: 2
```
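On the batch-size/learning-rate question, one common heuristic (not a PaddleOCR-documented rule, and the reference numbers below are my assumptions, not taken from this issue) is the linear scaling rule: keep the lr-to-batch-size ratio of a known-good recipe, rather than shrinking the lr to an arbitrarily small value, which can stall training entirely:

```python
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: learning rate proportional to total batch size."""
    return base_lr * new_batch / base_batch

# If the reference SVTR recipe trained at lr=5e-4 with a total batch of 512
# (assumed numbers -- check the config you started from), then a total
# batch of 48 suggests a learning rate of roughly:
print(scaled_lr(5e-4, 512, 48))  # 4.6875e-05
```

By this heuristic, the 0.000004 in the config is more than an order of magnitude below what a batch of 32-48 would call for, which could by itself explain a flat loss curve.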