Using the Paoding Chinese Word Segmentation Tool
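Step 1: Download and configure the Paoding archive
Download paoding-analysis-2.0.4-beta.zip and extract it to a local folder. (Note: the path must not contain Chinese characters.)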
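Edit the paoding-dic-home.properties file inside paoding-analysis.jar: remove the # in front of "#paoding.dic.home=dic", and replace the dic after the equals sign with the full local path of your dic folder, for example: E:/laboratory/learningcell/paoding-analysis-2.0.4-beta/dic (use forward slashes or doubled backslashes, because a single backslash is an escape character in .properties files).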
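Backslashes in the dictionary path are a common pitfall, because java.util.Properties treats a single backslash as an escape character. The minimal sketch below assumes Paoding reads this file with standard java.util.Properties semantics; the class DicHomeEscaping and its method parseDicHome are illustrative names, not part of Paoding. It parses a few variants of the paoding.dic.home line and prints what each value becomes, including the default line that is still commented out:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DicHomeEscaping {
    // Parses properties text the way java.util.Properties does and returns paoding.dic.home
    static String parseDicHome(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader does not fail in practice; wrap to keep the signature simple
            throw new IllegalStateException(e);
        }
        return p.getProperty("paoding.dic.home");
    }

    public static void main(String[] args) {
        // File text E:\laboratory\dic : single backslashes are consumed as escapes
        System.out.println(parseDicHome("paoding.dic.home=E:\\laboratory\\dic"));     // E:laboratorydic
        // File text E:\\laboratory\\dic : each doubled backslash becomes one backslash
        System.out.println(parseDicHome("paoding.dic.home=E:\\\\laboratory\\\\dic")); // E:\laboratory\dic
        // File text E:/laboratory/dic : forward slashes are kept as-is
        System.out.println(parseDicHome("paoding.dic.home=E:/laboratory/dic"));       // E:/laboratory/dic
        // Still commented out, so the property is not set at all
        System.out.println(parseDicHome("#paoding.dic.home=dic"));                    // null
    }
}
```

The first case shows why a Windows path copied verbatim from Explorer silently turns into a path that does not exist.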
(Note: to edit a file inside paoding-analysis.jar, first open the JAR with WinRAR, then edit paoding-dic-home.properties directly in Notepad or WordPad and save it.)
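As an alternative to editing the JAR by hand, the same change can be scripted with java.util.zip. This is a minimal sketch under two assumptions: the properties file sits at the root of the JAR under the name paoding-dic-home.properties, and the class name SetDicHome is hypothetical. It writes a new JAR instead of modifying the original in place, copying every other entry unchanged:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SetDicHome {
    // Assumed entry name: the properties file is taken to be at the JAR root
    static final String ENTRY = "paoding-dic-home.properties";

    // Uncomments (or overwrites) the paoding.dic.home line and points it at dicPath
    static String setDicHome(String props, String dicPath) {
        return props.replaceAll("(?m)^#?[ \\t]*paoding\\.dic\\.home[ \\t]*=.*$",
                Matcher.quoteReplacement("paoding.dic.home=" + dicPath));
    }

    // Copies srcJar to destJar (which must be a different file), rewriting only ENTRY
    static void rewriteJar(File srcJar, File destJar, String dicPath) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(srcJar));
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(destJar))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = readAll(in);
                if (entry.getName().equals(ENTRY)) {
                    // .properties files are ISO-8859-1 by convention
                    String props = new String(data, StandardCharsets.ISO_8859_1);
                    data = setDicHome(props, dicPath).getBytes(StandardCharsets.ISO_8859_1);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
    }

    // Reads the current ZIP entry to its end
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("Usage: java SetDicHome <source.jar> <target.jar> <dic path>");
            return;
        }
        rewriteJar(new File(args[0]), new File(args[1]), args[2]);
    }
}
```

For example, `java SetDicHome paoding-analysis.jar paoding-analysis-new.jar E:/laboratory/learningcell/paoding-analysis-2.0.4-beta/dic`, then use the new JAR in place of the old one. Pass the path with forward slashes so it survives the .properties escaping rules.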
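Step 2: Write a test program
- Create a new project and add the four JARs paoding-analysis.jar, commons-logging.jar, lucene-analyzers-2.2.0.jar and lucene-core-2.2.0.jar to it.
- Create a main class containing a small test program, as follows: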
import java.io.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import net.paoding.analysis.analyzer.PaodingAnalyzer;

public class FenciTest {
    public static void main(String[] args) {
        // PaodingAnalyzer loads its dictionaries from paoding.dic.home (configured in Step 1)
        Analyzer analyzer = new PaodingAnalyzer();
        File file = new File("F:/test1.txt");
        String docText = readText(file);
        if (docText == null) {
            return; // the file could not be read; readText has already printed the error
        }
        // The first argument is a field name, not the text itself; "content" is an arbitrary name
        TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(docText));
        try {
            Token t;
            // Lucene 2.2 API: next() returns null once the stream is exhausted
            while ((t = tokenStream.next()) != null) {
                System.out.println(t);
            }
            tokenStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Reads the whole file as GBK text, keeping line breaks; returns null on error
    private static String readText(File file) {
        StringBuffer buff = new StringBuffer();
        BufferedReader br = null;
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "GBK"));
            String line;
            while ((line = br.readLine()) != null) {
                buff.append(line).append("\r\n"); // restore the line break that readLine strips
            }
            return buff.toString();
        } catch (IOException e) { // also covers FileNotFoundException
            System.out.println(e);
            return null;
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }
}
Contents of test1.txt: 中华人民共和国 五四运动 唐瑶 你好吗
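readText decodes test1.txt as GBK, so the file must actually be saved in GBK; if it was saved as UTF-8, the segmenter receives garbled text and the tokens are meaningless. The self-contained sketch below (EncodingCheck and readAs are illustrative names, not part of Paoding or Lucene) writes a GBK file and decodes it both ways to show the difference:

```java
import java.io.*;

public class EncodingCheck {
    // Reads a file line by line with the given charset, like readText in FenciTest
    static String readAs(File file, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes for the first word of the sample text (People's Republic of China),
        // so this source compiles regardless of javac's default encoding
        String sample = "\u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd";
        File f = File.createTempFile("gbk-sample", ".txt");
        f.deleteOnExit();
        try (Writer w = new OutputStreamWriter(new FileOutputStream(f), "GBK")) {
            w.write(sample);
        }
        // Matching charset: the text comes back intact
        System.out.println(readAs(f, "GBK").equals(sample + "\n"));   // true
        // Wrong charset: these GBK bytes are not valid UTF-8, so the text is garbled
        System.out.println(readAs(f, "UTF-8").equals(sample + "\n")); // false
    }
}
```

If your test1.txt is saved as UTF-8, change "GBK" to "UTF-8" in readText accordingly.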
Output: [Figure: console output of FenciTest]
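If the program stops with NoClassDefFoundError or ClassNotFoundException instead of printing tokens, one of the JARs from Step 2 is probably missing from the project. A minimal sketch of a check (ClasspathCheck and missing are hypothetical names) that tries to load one representative class per JAR without initializing it; lucene-analyzers-2.2.0.jar is left out because FenciTest does not reference any of its classes directly:

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
    // Returns the class names that cannot be loaded from the current classpath
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate and load the class, do not run static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> absent = missing(
                "net.paoding.analysis.analyzer.PaodingAnalyzer", // paoding-analysis.jar
                "org.apache.commons.logging.LogFactory",         // commons-logging.jar
                "org.apache.lucene.analysis.Analyzer");          // lucene-core-2.2.0.jar
        System.out.println(absent.isEmpty() ? "All required classes found" : "Missing: " + absent);
    }
}
```

Run it with the same classpath as FenciTest; any class it reports as missing points to the JAR that still needs to be added.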