• A First Look at Apache OpenNLP


    https://blog.csdn.net/Richard_vi/article/details/78909939

    Environment: IDEA + JDK 8 + Maven 3.5.2
    Create a new Maven project and add the OpenNLP dependency to the POM:

    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.4</version>
    </dependency>

    With that in place we can use the OpenNLP toolkit. Let's look at some examples:

        // detect sentence boundaries
        public static void SentenceDetect() throws IOException {
            String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
            InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin");
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
            is.close();
        }
    

      

    This is an English sentence-detection example. Before running it, download the sentence model (en-sent.bin); here I put it under E:\NLP_Practics\models.
    More models can be downloaded from:
    http://maven.tamingtext.com/opennlp-models/models-1.5/
    Let's look at the corresponding output:

    Hi. How are you?
    This is JD_Dog.
    He is my good friends.He is very kind.but he is no more handsome than me.
    

      Magic? Not really. This only demonstrates a ready-made, simple model. A model is abstracted from a large amount of training data, so the quality of the results depends on the model you use.
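Since a model is just the distillation of training data, OpenNLP also lets you train your own. Below is a minimal sketch using the training API from opennlp-tools 1.8.x; the tiny inline corpus and the output file name my-sent.bin are illustrative assumptions, and a usable model would need thousands of sentences:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

import opennlp.tools.sentdetect.SentenceDetectorFactory;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.SentenceSample;
import opennlp.tools.sentdetect.SentenceSampleStream;
import opennlp.tools.util.CollectionObjectStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.TrainingParameters;

public class TrainSentenceModel {

    // Train a sentence model from an in-memory corpus: one sentence per line,
    // an empty line separates documents (the format SentenceSampleStream expects).
    public static SentenceModel trainTiny() throws Exception {
        List<String> lines = Arrays.asList(
                "This is the first sentence.",
                "This is the second sentence.",
                "",
                "Here is another document.",
                "It also has two sentences.");
        ObjectStream<SentenceSample> samples =
                new SentenceSampleStream(new CollectionObjectStream<>(lines));

        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.CUTOFF_PARAM, "0"); // tiny corpus: keep every feature

        return SentenceDetectorME.train("en", samples,
                new SentenceDetectorFactory("en", true, null, null), params);
    }

    public static void main(String[] args) throws Exception {
        SentenceModel model = trainTiny();
        // Serialize it so it can be loaded later exactly like en-sent.bin above
        try (OutputStream out = new FileOutputStream("my-sent.bin")) {
            model.serialize(out);
        }
    }
}
```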
    Next, an English tokenization example:

        // divide words (tokenize)
        public static void Tokenize() throws IOException {
            InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin");
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String a : tokens)
                System.out.println(a);
            is.close();
        }
    

      Output:

    Hi
    .
    How
    are
    you
    ?
    This
    is
    Richard
    .
    Richard
    is
    still
    single
    .
    please
    help
    him
    find
    his
    girl
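Incidentally, not every OpenNLP tokenizer needs a .bin model: opennlp-tools also ships rule-based tokenizers that work out of the box, which is handy for a quick experiment. A minimal sketch:

```java
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class RuleTokenizers {
    public static void main(String[] args) {
        String text = "Hi. How are you?";
        // SimpleTokenizer splits wherever the character class changes (letters vs punctuation)
        String[] simple = SimpleTokenizer.INSTANCE.tokenize(text);
        System.out.println(String.join("|", simple)); // Hi|.|How|are|you|?
        // WhitespaceTokenizer splits on whitespace only, so punctuation stays attached
        String[] ws = WhitespaceTokenizer.INSTANCE.tokenize(text);
        System.out.println(String.join("|", ws));     // Hi.|How|are|you?
    }
}
```

The model-based TokenizerME used above generally handles real text (abbreviations, clitics) better, but these two require no download at all.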
    

      

    Complete test code:

    package package01;
    
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    
    public class Test01 {
    
        // detect sentence boundaries
        public static void SentenceDetect() throws IOException {
            String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
            InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin");
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
            is.close();
        }
    
        // divide words (tokenize)
        public static void Tokenize() throws IOException {
            InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin");
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String a : tokens)
                System.out.println(a);
            is.close();
        }
    
        public static void main(String[] args) throws IOException {
    //        Test01.SentenceDetect();
            Test01.Tokenize();
        }
    
    }
    

      

    https://github.com/godmaybelieve
  • Original post: https://www.cnblogs.com/yuyu666/p/15029427.html