feat(ocr): add text recognition for PDF files

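- Implemented the recognizePdfText method of the ITesseractOcrService interface
- Added the PDFBox dependency for processing PDF files
- Implemented text extraction and cleanup for PDF files in TesseractOcrServiceImpl
- Added an API endpoint in WmsPurchasePlanController for recognizing text in PDF files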
2025-08-04 10:18:17 +08:00
parent 04bcf53116
commit 831695e236
4 changed files with 94 additions and 5 deletions
--- a/klp-wms/pom.xml
+++ b/klp-wms/pom.xml
@@ -28,7 +28,12 @@
            <artifactId>tess4j</artifactId>
            <version>5.11.0</version>
        </dependency>
-
+        <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
+        <dependency>
+            <groupId>org.apache.pdfbox</groupId>
+            <artifactId>pdfbox</artifactId>
+            <version>2.0.29</version>
+        </dependency>

    </dependencies>
 </project>