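I have read the source code. When the infer_mode parameter of infer_kie_token_ser_re is False, the annotation box information from the val.json that labels the images in the folder is returned unchanged. What is the purpose of this design?
The content of val.json is as follows: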
b33.jpg [{"transcription": "No", "label": "question", "points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]], "id": 0, "linking": [[0, 1]]}, {"transcription": "12269563", "label": "answer", "points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]], "id": 1, "linking": [[0, 1]]}, {"transcription": "开票日期", "label": "question", "points": [[3014, 660], [3290, 660], [3290, 744], [3014, 744]], "id": 2, "linking": [[2, 3]]}, {"transcription": "2016年06月12日", "label": "answer", "points": [[3390, 680], [3882, 680], [3882, 748], [3390, 748]], "id": 3, "linking": [[2, 3]]}, {"transcription": "名称", "label": "question", "points": [[550, 868], [946, 868], [946, 928], [550, 928]], "id": 4, "linking": [[4, 5]]}, {"transcription": "深圳市购机汇网络有限公司", "label": "answer", "points": [[1046, 887], [1778, 887], [1778, 947], [1046, 947]], "id": 5, "linking": [[4, 5]]}, {"transcription": "纳税人识别号", "label": "question", "points": [[558, 960], [938, 960], [938, 1024], [558, 1024]], "id": 6, "linking": [[6, 7]]}, {"transcription": "440300083885931", "label": "answer", "points": [[1102, 976], [1878, 976], [1878, 1036], [1102, 1036]], "id": 7, "linking": [[6, 7]]}, {"transcription": "地址、电话", "label": "question", "points": [[550, 1056], [946, 1056], [946, 1112], [550, 1112]], "id": 8, "linking": [[8, 9]]}, {"transcription": "深圳市龙华新区民治街道民治大道展消科技大厦A12070755-23806606", "label": "answer", "points": [[1054, 1060], [2394, 1060], [2394, 1124], [1054, 1124]], "id": 9, "linking": [[8, 9]]}, {"transcription": "开户行及账号", "label": "question", "points": [[546, 1152], [938, 1152], [938, 1208], [546, 1208]], "id": 10, "linking": [[10, 11]]}, {"transcription": "中国工商银行股份有限公司深圳园岭支行4000024709200172809", "label": "answer", "points": [[1058, 1152], [2438, 1152], [2438, 1216], [1058, 1216]], "id": 11, "linking": [[10, 11]]}, {"transcription": "金额", "label": "question", "points": [[2882, 1204], [3138, 1204], [3138, 1272], [2882, 1272]], "id": 12, "linking": [[12, 13]]}, {"transcription": "¥2987.18", "label": 
"answer", "points": [[2966, 1884], [3326, 1884], [3326, 1976], [2966, 1976]], "id": 13, "linking": [[12, 13]]}, {"transcription": "税率", "label": "question", "points": [[3294, 1188], [3454, 1188], [3454, 1252], [3294, 1252]], "id": 14, "linking": [[14, 15]]}, {"transcription": "17%", "label": "answer", "points": [[3350, 1284], [3466, 1284], [3466, 1372], [3350, 1372]], "id": 15, "linking": [[14, 15]]}, {"transcription": "税颜", "label": "question", "points": [[3610, 1176], [3862, 1176], [3862, 1248], [3610, 1248]], "id": 16, "linking": [[16, 17]]}, {"transcription": "¥507.82", "label": "answer", "points": [[3710, 1864], [4030, 1864], [4030, 1956], [3710, 1956]], "id": 17, "linking": [[16, 17]]}, {"transcription": "价税合计", "label": "question", "points": [[562, 2060], [894, 2060], [894, 2148], [562, 2148]], "id": 18, "linking": [[18, 19]]}, {"transcription": "¥3495.00", "label": "answer", "points": [[3350, 1992], [3766, 1992], [3766, 2088], [3350, 2088]], "id": 19, "linking": [[18, 19]]}]
When I changed infer_mode to true, a different error occurred:
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1684757213 (unix time) try "date -d @1684757213" if you are using GNU date ]
[SignalInfo: SIGSEGV @.) received by PID 145099 (TID 0x7fb5630082c0) from PID 0 ***]
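Stepping through with breakpoints, I found that the error occurs when I pass the use_gpu parameter. My initial guess is that my cuDNN version does not match my CUDA version. I will troubleshoot this myself first and then ask again.
Digging further into the source: in the _load_ocr_info method of label_ops.py, when infer_mode is true the paddleocr.ocr method is called, using the ch_PP-OCRv3_rec_infer and ch_PP-OCRv3_det_infer models. The recognition results are poor: for paragraphs of text in the image, or for text that contains gaps, the recognition rate is very low. I have already trained the semantic entity recognition and relation extraction models on the invoice example. Do I also need to train the text detection and text recognition models?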
Finally, regarding the PaddleOCR example on Gitee:
https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/applications/%E5%8F%91%E7%A5%A8%E5%85%B3%E9%94%AE%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96.md#43-%E8%AF%AD%E4%B9%89%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB-semantic-entity-recognition
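In section 4.4.4, shouldn't infer_mode be true for model prediction and false for model inference? Also, could you add a complete service deployment document, based on this example or another one, using either PaddleHub or PaddleServing? I look forward to your reply; I have been stuck on this problem for a week.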