
Restricting Tesseract to a Specified Set of Characters

Date: 2019-07-30 02:13:30 | Source: IT技术 | Author: seo实验室小编
 


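Tesseract can be configured to perform OCR, and both OpenCV and Emgu (the C# version of OpenCV) integrate it.

In practice, however, misrecognitions are common: for example, "s" is read as "5", or "1" is read as "l" or "i". Setting the appropriate parameter restricts recognition to a specified set of characters.

Below is the Emgu API documentation for the relevant constructor: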

Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)

public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)

   Member of Emgu.CV.OCR.Tesseract

Summary:

Create a Tesseract OCR engine.

Parameters:

dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.

language: The language is (usually) an ISO 639-3 string or NULL will default to eng. It is entirely safe (and eventually will be efficient too) to call init multiple times on the same instance to change language, or just to reset the classifier. The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. Eg hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. Eg if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.

mode: OCR engine mode

whiteList: This can be used to specify a white list for OCR. e.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to only work with OcrEngineMode.OEM_TESSERACT_ONLY
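The language string format described above, [~]<lang>[+[~]<lang>]*, can be easier to follow when taken apart by hand. The following is only a rough sketch in plain C#; `LanguageSpec` and `Parse` are hypothetical names invented for illustration and are not part of Emgu or Tesseract. It assumes that a leading ~ simply marks a language as "do not load", as the documentation describes.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageSpec
{
    // Splits a language string such as "hin+~eng" into (code, suppressed) pairs.
    // Hypothetical helper: it only mirrors the documented format and does not call Tesseract.
    public static List<KeyValuePair<string, bool>> Parse(string spec)
    {
        var result = new List<KeyValuePair<string, bool>>();

        // Per the documentation, a null language defaults to "eng".
        foreach (string part in (spec ?? "eng").Split('+'))
        {
            // A leading '~' means this language should not be loaded.
            bool suppressed = part.Length > 0 && part[0] == '~';
            string code = suppressed ? part.Substring(1) : part;

            if (code.Length == 0)
            {
                throw new FormatException("Empty language code in: " + spec);
            }

            result.Add(new KeyValuePair<string, bool>(code, suppressed));
        }

        return result;
    }

    public static void Main()
    {
        foreach (var entry in Parse("hin+~eng"))
        {
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value ? "suppressed" : "loaded");
        }
        // Prints:
        // hin: loaded
        // eng: suppressed
    }
}
```

For "hin+eng" both entries would be reported as loaded, which matches the Hindi-plus-English example in the documentation.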

Tesseract tesseract = new Tesseract();

// path is the location of the language data; lang is the language
tesseract.Init(path, lang, Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY);

// Restrict recognition to digits
tesseract.SetVariable("tessedit_char_whitelist", "0123456789");

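The code above recognizes digits only, which can noticeably improve accuracy when the input is known to contain nothing but digits. Replacing "0123456789" with "abcdefghijklmnopqrstuvwxyz" restricts recognition to lowercase letters instead.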
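Typing a whitelist by hand makes it easy to drop a character, so it may help to generate the string from a range. The sketch below is plain C# with no Emgu dependency; `Whitelist`, `Range`, and `Unexpected` are hypothetical helpers introduced for illustration only. The resulting string would be passed as the second argument to `SetVariable("tessedit_char_whitelist", ...)` as in the code above. `Unexpected` is one possible way to check a result afterwards, assuming whitespace in the OCR output should be ignored.

```csharp
using System;
using System.Text;

public static class Whitelist
{
    // Builds a string containing every character from first to last, inclusive.
    public static string Range(char first, char last)
    {
        if (last < first)
        {
            throw new ArgumentException("last must not precede first");
        }

        var sb = new StringBuilder();
        // Iterate with an int so the loop cannot wrap around at char.MaxValue.
        for (int c = first; c <= last; c++)
        {
            sb.Append((char)c);
        }
        return sb.ToString();
    }

    // Returns the non-whitespace characters of text that are not in the whitelist.
    public static string Unexpected(string text, string whitelist)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (!char.IsWhiteSpace(c) && whitelist.IndexOf(c) < 0)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string digits = Range('0', '9');
        string lower = Range('a', 'z');

        Console.WriteLine(digits);                         // 0123456789
        Console.WriteLine(lower);                          // abcdefghijklmnopqrstuvwxyz
        Console.WriteLine(Unexpected("12l45\n", digits));  // l
    }
}
```

An empty return value from `Unexpected` means every recognized character was in the whitelist; it says nothing about whether the characters were recognized correctly.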

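Before setting the whitelist: [image]

After setting the whitelist: [image]

Awkwardly, it still seems to make mistakes...

But never mind those details.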
