Software Engineering Assignment 2 201621044079 Han Ye

1. Gitee project address

https://gitee.com/HYSOUL/PersonalProject-Java/tree/master/

2. PSP table

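PSP2.1	Personal Software Process stage	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	30	30
· Estimate	Clarify the requirements and related factors, estimate the time cost of each stage	30	30
Development	Development	500	580
· Analysis	Requirements analysis (including learning new technologies)	50	100
· Design Spec	Write the design document	50	50
· Design Review	Design review	30	10
· Coding Standard	Coding standard	30	10
· Design	Detailed design	100	80
· Coding	Coding	150	180
· Code Review	Code review	30	20
· Test	Testing (self-testing, fixing code, committing changes)	60	130
Reporting	Reporting	80	90
·	Test report	30	30
·	Workload calculation	20	20
·	Process improvement plan	30	40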

3. Approach

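The requirements are as follows:

Count the characters in the file:
- Only ASCII characters need to be counted; Chinese characters can be ignored
- Spaces, horizontal tabs and line breaks all count as characters
Count the total number of words in the file. A word starts with at least 4 English letters followed by alphanumeric characters; words are delimited by separators and are case-insensitive.
- English letters: A-Z, a-z
- Alphanumeric characters: A-Z, a-z, 0-9
- Separators: spaces and non-alphanumeric symbols
- Example: file123 is a word, 1file is not. file, File and FILE are the same word
Count the valid lines in the file: every line that contains a non-whitespace character is counted.
Count the occurrences of each word in the file and output only the 10 most frequent. Among words with the same frequency, the one that comes first in dictionary order is output first.
Output to the file result.txt in dictionary order: for example, when windows95, windows98 and windows2000 all appear, windows2000 is output first
- Output words are all converted to lowercase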

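The requirements mainly call for string operations: splitting strings, checking their contents, and converting case.
Word frequencies are stored in a Map.
Finally, file reading and writing are needed.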

4. Design and implementation

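The code consists of two classes: a character processing class and a file processing class.

Character processing class
- getCharCount(): counts the characters
- getLineCount(): counts the valid lines
- getWordCount(): counts the words
- getWordFreq(): counts the word frequencies
File processing class
- readFile(): reads a file
- writeFile(): writes a file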
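
The file processing class itself is not reproduced in this post, so the following is only a minimal sketch of what readFile() and writeFile() could look like, assuming Java 8 or later and UTF-8 files. The class name FileIoSketch, the static method signatures and the choice of unchecked exceptions are illustrative assumptions, not the code in the repository.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the file processing class; all names are illustrative.
class FileIoSketch {

    // Read the whole file into one string, keeping line terminators intact
    // so that character and line counts stay accurate.
    static String readFile(String path) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create the file, or overwrite it if it already exists.
    static void writeFile(String path, String content) {
        try {
            Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wordcount", ".txt");
        writeFile(tmp.toString(), "hello world\n");
        System.out.println(readFile(tmp.toString()).length()); // prints 12
        Files.delete(tmp);
    }
}
```

Reading the file in one piece, rather than line by line, keeps the line breaks that the character count has to include.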

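5. Code walkthrough

1. The getCharCount() function

This function walks through the text one character at a time and counts those that qualify. Chinese characters do not count in this assignment, so only printable ASCII characters plus a few special ones (line feed, carriage return, tab) are counted.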

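	public int getCharCount() // count the characters in the file
	{
		charNum = 0; // reset the field so repeated calls do not accumulate
		for (int i = 0; i < text.length(); i++) {
			char c = text.charAt(i);
			// printable ASCII (32-126) plus line feed, carriage return and tab
			if ((c >= 32 && c <= 126) || c == '\n' || c == '\r' || c == '\t') {
				charNum++;
			}
		}
		return charNum;
	}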
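
The same rule can also be written without any instance state. The following is a sketch only, assuming Java 8 or later; the class name CharCountSketch and the static countChars method are hypothetical and not part of the project.

```java
// Hypothetical stateless variant of the character count; names are illustrative.
class CharCountSketch {

    // Count printable ASCII (32-126) plus tab, line feed and carriage return.
    static int countChars(String text) {
        return (int) text.chars()
                .filter(c -> (c >= 32 && c <= 126) || c == '\t' || c == '\n' || c == '\r')
                .count();
    }

    public static void main(String[] args) {
        // 'a', tab, 'b' and the line feed are counted; the Chinese character \u4e2d is not.
        System.out.println(countChars("a\tb\n\u4e2d")); // prints 4
    }
}
```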

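2. The getWordCount() function

This function counts the words. It first splits the text into tokens on whitespace, then checks each token against the word rules (length and format), and returns the total.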

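	public int getWordCount() // count the total number of words
	{
		wordCount = 0; // reset the field so repeated calls do not accumulate
		// Tokenize on whitespace only. The assignment also treats other
		// non-alphanumeric symbols as separators; this version does not handle them.
		String[] spWord = text.split("\\s+");
		for (int i = 0; i < spWord.length; i++) {
			if (spWord[i].length() < 4) { // a word needs at least 4 characters
				continue;
			}
			boolean startsWithLetters = true; // are the first four characters English letters?
			for (int j = 0; j < 4; j++) {
				char c = spWord[i].charAt(j);
				if (!(c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z')) {
					startsWithLetters = false;
					break;
				}
			}
			if (startsWithLetters) {
				wordCount++;
			}
		}
		return wordCount;
	}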
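
The requirements list every non-alphanumeric symbol as a separator, not just whitespace. One way to follow that rule is to split on runs of non-alphanumeric characters and then test each token against a regular expression. This is a sketch under that reading of the requirements; the class WordSplitSketch and the method extractWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tokenizer that follows the separator rule from the requirements.
class WordSplitSketch {

    // Return every valid word in the text, lowercased, in order of appearance.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        // Any run of characters outside A-Z, a-z, 0-9 acts as one separator.
        for (String token : text.split("[^A-Za-z0-9]+")) {
            // A word is four letters followed by any number of letters or digits.
            if (token.matches("[A-Za-z]{4}[A-Za-z0-9]*")) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "1file" starts with a digit and "abc" is too short, so both are dropped.
        System.out.println(extractWords("file123, 1file File;FILE abc"));
        // prints [file123, file, file]
    }
}
```

The word count is then the size of the returned list, and the same list can feed the frequency count.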

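3. The getLineCount() function

This function counts the valid lines. It splits the text into lines, stores them in a string array, and walks through the array. A line whose length is 0 after trimming whitespace is invalid; every other line is counted. The function returns the number of valid lines.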

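 public int getLineCount() { // count the valid lines
     ValidLine = 0; // reset the field so repeated calls do not accumulate
     String[] line = text.split("\n"); // split the text into one string per line
     for (int i = 0; i < line.length; i++) { // skip blank lines, count the rest
         if (line[i].trim().length() == 0)
             continue;
         ValidLine = ValidLine + 1;
     }
     return ValidLine;
 }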
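
Files may use different line terminators. The regex linebreak matcher \R (available since Java 8) covers \n, \r\n and a lone \r in one pattern. The following is a minimal sketch; LineCountSketch and countValidLines are hypothetical names.

```java
// Hypothetical stateless variant of the valid-line count; names are illustrative.
class LineCountSketch {

    // Count lines that contain at least one non-whitespace character.
    static int countValidLines(String text) {
        int count = 0;
        // \R matches any line terminator, treating \r\n as a single break.
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The lines holding only spaces or a tab are not counted.
        System.out.println(countValidLines("first\r\n   \n\t\nlast\n")); // prints 2
    }
}
```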

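4. The getWordFreq() function

This function counts word frequencies. It first checks whether each token is a word, using the same test as the word count, and stores the results in a Map. If the word is not yet in the Map, it is added with a value of 1; if it is already there, its value is increased by 1. Finally the Map entries are sorted as the requirements specify.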

	public List<Map.Entry<String, Integer>> getWordFreq() { // count word frequencies and return them sorted

		wordFreq = new HashMap<String, Integer>();

		String[] spWord = text.split("\\s+"); // tokenize on whitespace
		for (int i = 0; i < spWord.length; i++) {
			if (spWord[i].length() < 4) {
				continue;
			}
			boolean startsWithLetters = true; // are the first four characters English letters?
			for (int j = 0; j < 4; j++) {
				char c = spWord[i].charAt(j);
				if (!(c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z')) {
					startsWithLetters = false;
					break;
				}
			}
			if (startsWithLetters) {
				String word = spWord[i].toLowerCase(); // words are case-insensitive
				Integer count = wordFreq.get(word);
				wordFreq.put(word, count == null ? 1 : count + 1);
			}
		}

		List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(wordFreq.entrySet());
		Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {

			@Override
			public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
				// Sort by frequency (highest first), then by dictionary order.
				// equals() is needed here: == compares Integer references, not values.
				if (o1.getValue().equals(o2.getValue())) {
					return o1.getKey().compareTo(o2.getKey());
				}
				return o2.getValue().compareTo(o1.getValue());
			}

		});
		return list;
	}
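
The requirements ask for only the 10 most frequent words, while the function above returns the full sorted list. A compact way to count, sort and truncate in one pass is sketched below, assuming Java 8 or later. The class WordFreqSketch and the method topWords are hypothetical, and the input is assumed to be a list of already validated, lowercased words.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical top-N frequency helper; names are illustrative.
class WordFreqSketch {

    static List<Map.Entry<String, Integer>> topWords(List<String> words, int limit) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        // Higher count first; ties broken by dictionary order of the word.
        Comparator<Map.Entry<String, Integer>> byCountDesc =
                (a, b) -> Integer.compare(b.getValue(), a.getValue());
        Comparator<Map.Entry<String, Integer>> order =
                byCountDesc.thenComparing(Map.Entry::getKey);
        return freq.entrySet().stream()
                .sorted(order)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "windows98", "windows95", "windows2000", "file", "file");
        System.out.println(topWords(words, 10));
        // prints [file=2, windows2000=1, windows95=1, windows98=1]
    }
}
```

With equal counts, windows2000 comes before windows95 because the comparison is character by character and '2' sorts before '9'.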

6. Unit tests

Test file
Test code

	@Test
	public void testGetCharCount() throws IOException { // character count test
		FileDeal fd = new FileDeal();
		String text1 = fd.FileToString("text/text1.txt");
		WordDeal wd1 = new WordDeal(text1);
		int cn1 = wd1.getCharCount();
	}

	@Test
	public void testGetWordCount() throws IOException { // word count test
		FileDeal fd = new FileDeal();
		String text1 = fd.FileToString("text/text1.txt");
		WordDeal wd1 = new WordDeal(text1);
		int wn1 = wd1.getWordCount();
	}

	@Test
	public void testGetWordFreq() throws IOException { // word frequency test
		FileDeal fd = new FileDeal();
		String text1 = fd.FileToString("text/text1.txt");
		WordDeal wd1 = new WordDeal(text1);
		List wf1 = wd1.getWordFreq();
	}

	@Test
	public void testGetLineCount() throws IOException { // valid line count test
		FileDeal fd = new FileDeal();
		String text1 = fd.FileToString("text/text1.txt");
		WordDeal wd1 = new WordDeal(text1);
		int ln1 = wd1.getLineCount();
	}

Test screenshots

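7. Reflections

The hardest part of this assignment was following the conventions at every step. I never paid much attention to them when writing code before, so I spent quite a lot of time on that. I had also forgotten a good deal of Java syntax, and the word frequency function in particular, which uses a comparator, took a long time. My handling of exceptions is still not thorough enough, and I will pay more attention to it next time.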

Original article: https://www.cnblogs.com/HYSOUL/p/9664682.html