Python: Crawling Hot Words and Running Categorized Data Analysis - [Data Repair]

Date: 2020.02.01

Blog issue: 140

Saturday

　　[If you want to use the code in this post, please leave a comment below first (i.e., just let me know) before using it.]

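　　All related posts:

　　a. [Simple Preparation]

　　b. [Word Cloud + Data Import]

　　c. [Topology Data]

　　d. [Data Repair] (this post)

　　e. [Explanation Repair + Hot Word Citation]

　　f. [JSP Demo + Page Navigation]

　　g. [Hot Word Classification + Directory Generation]

　　h. [Hot Word Relationship Graph + Report Generation]

　　i. [App Development]

　　j. [Security Hardening]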

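　　Today I asked the teacher, and it seems the data I crawled before was wrong: I should not have crawled tags. After thinking it over carefully, I agree. So today we will crawl the high-frequency words in IT news instead!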

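　　I divided the work roughly into the following steps:

　　1. Choose the website to crawl

　　　　The previous site had tags, so I crawled according to them. That is actually unnecessary, because any IT news site will do! The site I crawled last time also had a serious problem: it cannot load much data, and after about 200 loads it basically freezes. So we should look for a list-style page with numbered pages, or at least a page with "next page" or "next article" links.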

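　　　　Here are some sites for reference:

　　　　　　(1) IT之家 (IT Home): about 700 items can be crawled, spanning roughly 7 days. I recommend crawling once a week, then merging and deduplicating the results. Some non-IT news is mixed in.

　　　　　　(2) 博客园 (cnblogs): recommended. About 3,000 items can be crawled in one go, spanning roughly 2 months and 4 days, so crawling every 2 months is enough. A small amount of non-IT news is mixed in, and each item contains relatively little text.

　　　　　　(3) DoNews (focused on the Internet industry)

　　　　　　(4) ZOL 中关村在线 (Zhongguancun Online): only one page, spanning two weeks. I recommend crawling every 13 days.

　　　　　　(5) IT界: 14,969 news items can be crawled at once, with a small amount of non-IT news mixed in. It is only suitable for a one-time crawl, and the earliest item is dated 2012-04-23.

　　　　　　(6) 51CTO (the site recommended last time; it has tags and keyword indexes)

　　　　　　(7) 走廊网: like the site above, it uses infinite scrolling and has the same drawback. It also has an IT category, but not all of that content is IT-related.

　　　　　　(8) 说IT资讯网: its data is all old (around 2011 at best). We want current hot words, so it is not recommended.

　　　　　　A blog post recommending foreign IT news sites: https://www.cr173.com/html/5311_1.html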

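　　2. Crawl the chosen site (goal: obtain the text content and the URL of each article)

　　　　In the end I decided to crawl cnblogs (crawling my own platform), because it has enough data. It falls short of the 100,000 items the teacher asked for, but the sites above are all about the same in that respect. If we really need a large amount of news data... there may well be a third post about re-crawling!

　　　　Analyze the cnblogs news URLs:

　　　　　　Page 1: https://news.cnblogs.com/

　　　　　　Page 2: https://news.cnblogs.com/n/page/2/

　　　　　　Page n (2 ≤ n ≤ 100): https://news.cnblogs.com/n/page/{n}/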
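　　　　The pattern fits in a small helper. This is a sketch only. The function name `page_url` is hypothetical, and the 1..100 bound is just the range observed above, not a documented limit of the site.

```python
BASE = "https://news.cnblogs.com"


def page_url(n: int) -> str:
    """Return the URL of news list page n, following the pattern above."""
    if n == 1:
        return BASE + "/"  # page 1 is the site root, not /n/page/1/
    if 2 <= n <= 100:
        return f"{BASE}/n/page/{n}/"
    raise ValueError(f"page {n} is outside the range 1..100 observed above")


# Every list page, in crawl order
all_pages = [page_url(n) for n in range(1, 101)]
print(all_pages[:2], all_pages[-1])
```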

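　　　　Next, analyze the data fields:

　　　　　We need the title, the body text, and the article's own URL. If you want to move between articles through "next article" links, you also need to crawl the URL of the next article.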
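　　　　On a site that only offers "next article" links, the crawl becomes a loop that keeps following the link until there is none. Below is a minimal sketch built around a hypothetical callback, `get_next(url)`. It is assumed to download the page and return that page's next-article URL, or None. The selector it would need depends on the site and is not shown. A visited set and a page cap keep the loop from running forever.

```python
from typing import Callable, Iterator, Optional


def follow_next_links(start: str,
                      get_next: Callable[[str], Optional[str]],
                      max_pages: int = 1000) -> Iterator[str]:
    """Yield article URLs by repeatedly following a "next article" link.

    Stops when get_next() returns None, when a URL repeats (a cycle),
    or after max_pages URLs.
    """
    visited = set()
    url: Optional[str] = start
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        yield url
        url = get_next(url)  # hypothetical: fetch the page, read its next link
```

Each yielded URL could then be handed to a single-page crawler such as the Oranpick class below.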

　　　　　The crawled record format is as follows:

import codecs


class News:
    title = ""
    info = ""
    link = ""

    def __init__(self, title, info, link):
        self.title = title
        self.info = info
        self.link = link

    # One record per line: title<TAB>info<TAB>link
    def __toString__(self):
        return self.title + "\t" + self.info + "\t" + self.link

    # Append this record to the file as one UTF-8 line
    def __toFile__(self, filePath):
        with codecs.open(filePath, "a+", 'utf-8') as f:
            f.write(self.__toString__() + "\n")

News.py

　　　　　　After processing, the data takes this format:

import codecs


class KeyWords:
    word = ""
    link = ""
    num = 0

    def __init__(self, word, link, num):
        self.word = word
        self.link = link
        self.num = num

    # One record per line: word<TAB>num<TAB>link
    def __toString__(self):
        return self.word + "\t" + str(self.num) + "\t" + self.link

    # Append this record to the file as one UTF-8 line
    def __toFile__(self, filePath):
        with codecs.open(filePath, "a+", 'utf-8') as f:
            f.write(self.__toString__() + "\n")

KeyWords.py

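　　　　Writing the crawler:

　　　　　　This tool took a long time to write, because crawling cnblogs requires logging in, and the login has a CAPTCHA. Did I find a tool that solves the CAPTCHA automatically? No! I just took a shortcut. I don't know Canvas code well yet and couldn't study it in depth (I had to get the data today). So how do we solve it? Think about it: rifles come in fully automatic and semi-automatic versions, so why not crawl semi-automatically? That's exactly what I did. Logging in requires clicking the CAPTCHA, so time.sleep() delays the next step. Once the CAPTCHA appears, I verify it by hand, and after that the data is crawled automatically. How long to wait depends on your machine and network speed. On a slow connection, make the wait longer.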
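　　　　A fixed time.sleep() either wastes time or runs out before you finish the CAPTCHA. An alternative is to poll a condition until it holds or a timeout expires. Here is a minimal sketch of that idea; the helper name `wait_until` is hypothetical.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 60.0,
               interval: float = 0.5) -> bool:
    """Call `condition` every `interval` seconds until it returns True.

    Returns True as soon as the condition holds, or False if `timeout`
    seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check, in case the condition became true during the final sleep
    return bool(condition())
```

In Oranpick, `time.sleep(15)` could then become `wait_until(lambda: "signin" not in self.profile.current_url, timeout=120)`. This assumes that a successful login redirects away from the sign-in page. Selenium's own `WebDriverWait(driver, timeout).until(lambda d: ...)` offers the same kind of polling.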

　　　　　　Single news page crawler class

import re
import time

import parsel
from selenium import webdriver
from selenium.webdriver.common.by import By

# [ Record type for a single crawled page ]
from itWords.retire.Kord import News


# [ Collection of special string-processing helpers ]
class StrSpecialDealer:
    # Get the text inside the current tag
    @staticmethod
    def getReaction(stri):
        strs = StrSpecialDealer.simpleDeal(str(stri))
        strs = strs[strs.find('>') + 1:strs.rfind('<')]
        return strs

    # Remove basic separators (spaces, tabs, line breaks)
    @staticmethod
    def simpleDeal(stri):
        strs = str(stri).replace(" ", "")
        strs = strs.replace("\t", "")
        strs = strs.replace("\n", "")
        strs = strs.replace("\r", "")
        return strs

    # Remove all HTML tags (a regex cannot loop forever on a '<' with no '>')
    @staticmethod
    def deleteRe(stri):
        return re.sub(r"<[^>]*>", "", str(stri))

    # Remove sentences that contain a date (the characters 年 "year" or 月 "month")
    @staticmethod
    def de_date(stri):
        strs = ""
        for st in str(stri).split("。"):
            if "年" in st or "月" in st:
                continue
            strs += st + "。"
        return strs.replace("。。", "。")

    # Keep only the sentences before the first one that contains a date
    @staticmethod
    def ut_date(stri):
        strs = ""
        for st in str(stri).split("。"):
            if "年" in st or "月" in st:
                break
            strs += st + "。"
        return strs.replace("。。", "。")

    # Remove footnote markers [0] .. [num-1]
    @staticmethod
    def beat(stri, num):
        strs = str(stri)
        for i in range(num):
            strs = strs.replace("[" + str(i) + "]", "")
        return strs


class Oranpick:
    basicURL = ""
    profile = None

    # ---[ Constructor: log in (CAPTCHA solved by hand), then open url ]
    def __init__(self, url):
        self.basicURL = url
        self.profile = webdriver.Firefox()
        self.profile.get("https://account.cnblogs.com/signin?returnUrl=https%3A%2F%2Fnews.cnblogs.com%2Fn%2F654191%2F")
        self.profile.find_element(By.ID, "LoginName").send_keys("初等变换不改变矩阵的秩")
        self.profile.find_element(By.ID, "Password").send_keys("password")  # your password
        time.sleep(2)
        self.profile.find_element(By.ID, "submitBtn").click()
        # Allow 15 s to solve the CAPTCHA by hand
        time.sleep(15)
        self.profile.get(url)

    # Point the browser at a new url
    def __reset__(self, url):
        self.basicURL = url
        self.profile.get(url)

    # ---[ Release the browser ]
    def __close__(self):
        self.profile.quit()

    # Get the HTML source of the current page
    def getHTMLText(self):
        return self.profile.page_source

    # Extract the title and body text of the news article
    def getNews(self):
        index_sel = parsel.Selector(self.getHTMLText())
        context = index_sel.css('#news_title a')[0].extract()
        context = StrSpecialDealer.getReaction(context)
        context = StrSpecialDealer.simpleDeal(context)
        conform = index_sel.css('#news_body')[0].extract()
        conform = StrSpecialDealer.deleteRe(conform)
        conform = StrSpecialDealer.simpleDeal(conform)
        return News(title=context, info=conform, link=self.basicURL)


def main():
    url = "https://news.cnblogs.com/n/654221/"
    ora = Oranpick(url)
    # print(ora.getNews().__toString__())


# main()

Oranpick.py

　　　　　　News list crawler class (collects article URLs)

import codecs
from urllib import request

import parsel

from itWords.retire.Oranpick import Oranpick


# [ Object for crawling consecutive pages ]
class Surapity:
    page = 1
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36'}
    basicURL = ""
    oran = None

    # ---[ Constructor ]
    def __init__(self):
        self.page = 1
        self.basicURL = "https://news.cnblogs.com/"
        self.oran = Oranpick("https://start.firefoxchina.cn/")

    def __close__(self):
        self.oran.__close__()

    # Move on to the next list page
    def __next__(self):
        self.page = self.page + 1
        self.basicURL = 'https://news.cnblogs.com/n/page/' + str(self.page) + '/'

    # Download the HTML of the current list page
    def getHTMLText(self):
        req = request.Request(url=self.basicURL, headers=self.headers)
        return request.urlopen(req).read().decode("utf-8")

    # Open every article linked from the current list page and save it
    def getMop(self, filePath):
        index_sel = parsel.Selector(self.getHTMLText())
        links = index_sel.css(".news_entry a::attr(href)").extract()
        for href in links:
            link = "https://news.cnblogs.com" + href
            self.oran.__reset__(link)
            self.oran.getNews().__toFile__(filePath)


# Empty the output file
def fileReset(filePath):
    with codecs.open(filePath, "w+", 'utf-8') as f:
        f.write("")


def main():
    filepath = "../../testFile/rc/news.txt"
    s = Surapity()
    fileReset(filepath)
    # Crawl list pages 1..100
    s.getMop(filepath)
    while s.page < 100:
        s.__next__()
        s.getMop(filepath)
    s.__close__()


if __name__ == "__main__":
    main()

Surapity.py

　　　　　　With this, the data can be crawled.

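　　3. Count Chinese word frequencies with jieba, an open-source Python package

　　　　jieba download page: https://pypi.org/project/jieba/

　　　　How I installed it (make sure the computer is online; you can also follow the installation instructions on the page above):

　　　　　　(1) Open PyCharm

　　　　　　(2) Right-click anywhere outside the menu bar, tool windows, and code editor, and choose "Open in Terminal"

　　　　　　(3) Enter a command (which one depends on your Python environment):

　　　　　　　　easy_install jieba　　any version (easy_install is deprecated)

　　　　　　　　pip install jieba　　Python 2 & Python 3

　　　　　　　　pip3 install jieba　　Python 3

　　　　　　(4) Wait for the installation to finish.

　　　　For usage, see the following posts (this post is not about jieba, so I won't go into detail):

　　　　　　(1) High-frequency word statistics with Python's jieba module

　　　　　　(2) Usage notes for the Python jieba library

　　　　　　(3) Python big data: jieba word segmentation and word-frequency statistics

　　　　Note:

　　　　　　Our use of jieba still has some problems. However, we only need high-frequency words, so any of its three modes (accurate, full, and search-engine mode) should work. Accurate mode is still recommended.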

　　4. Build the word-filtering part and wrap it in a class

　　　　Test file:

《2019年OPPO开放平台年度总结》正式发布
近日，OPPO开放平台通过官微平台发布了《2019年OPPO开放平台年度总结》。
这份年度总结对OPPO智能服务新生态的用户属性、用户偏好、市场增长，以及OPPO开放平台的技术能力和服务能力进行了详细的介绍，帮助开发者及合作伙伴挖掘数据背后的衍生价值，携手共创更优质的用户体验。
ColorOS全球月活超3.2亿，以优质年轻群体为主
根据《2019年OPPO开放平台年度总结》显示，目前ColorOS全球月活跃用户数已超过3.2亿，覆盖国家和地区超过140个。而在国内用户中，25岁~34岁的优质年轻群体占比更是高达63%，24岁以下用户占比为21%，足见OPPO手机设备深受年轻群体所喜爱。
正因如此，OPPO无论是硬件端的产品创新，还是软件端的“黑科技”研发，也都始终迎合年轻群体偏好。如在2019年10月上市的OPPO Reno Ace，其配置为骁龙855 Plus、65W超级闪充、90Hz电竞屏、最高12GB+256GB存储组合，2999元起的高性价比优势，让其开售5分钟销售额破亿，斩获全平台手机单品销量＆销售额双冠军。
此外，该产品搭载OPPO“五大系统能力开放引擎”之一的Hyper Boost，并与游戏厂商深度合作，更充分地发挥了硬件性能。OPPO Reno Ace高性价比的产品配置以及“黑科技”加持，让年轻消费者直呼“这很Ace！”。
OPPO开放平台携手合作伙伴共建智能服务新生态，打造优质用户体验
产品受到用户喜爱，同样也离不开智能服务新生态的建设。OPPO开放平台为了给用户带来更优质的产品体验，将其技术能力深度赋能给合作伙伴，携手合作伙伴合作共赢。
根据《2019年OPPO开放平台年度总结》显示，在OPPO开放平台的应用分发情况分析中，视频播放类、教育学习类、实用工具类APP是最受用户青睐的应用类别。
时代大环境下，OPPO积极建设视频功能迎合用户需求，OPPO短视频业务月活跃用户已突破6000万，每日人均使用时长超过50分钟，为优质的视频内容分发和应用分发，提供了可以结合用户手机操作偏好的又一大渠道。
在短视频类目的软件能力建设方面，OPPO也始终走在创新前沿。当抖音、快手等热门短视频类APP接入“五大系统能力开放引擎”之一的CameraUnit，调用OPPO手机核心功能“超级防抖”，就能够让用户直接拍摄出稳定、清晰的视频。
深度挖掘数据的衍生价值，OPPO早已不再是一家纯粹的手机公司
硬件产品受到年轻用户喜爱，软件能力不断创新，也让OPPO的业务线早已不再局限于手机制造。当前，OPPO已经建设了更为完善的开放生态，除了技术能力加持赋能合作伙伴，依托自身市场优势，也为应用、游戏、快应用、小游戏等产品分发推广和联运提供了更为广阔的发展空间，为各链端合作伙伴提供全方位的服务。
根据《2019年OPPO开放平台年度总结》显示，以OPPO软件商店和游戏中心的全球月活跃用户数已超过3亿，全球日分发次数也超过7.8亿次。同时，OPPO开放平台还在积极扩展自身的业务服务范围，并不断创新服务形式。以应用分发业务为例，通过数据赋能、活动赋能、素材A/B test、活动组建化赋能等形式，帮助开发者实现更加高效的APP运营。
除此之外，OPPO还在科技的各个领域积极探索。例如，在2019年12月19日的2019 OPPO开发者大会上发布IoT“启能行动”，将帮助更多品牌厂商快速实现产品的智能化。
此外，2020年OPPO将继续投入价值10亿资源，为应用、服务、内容、出海领域的优秀合作伙伴，提供开发、流量、营销推广等一系列的资源支持，全方位助力合作伙伴的业务发展；OPPO荣获中文机器阅读理解挑战赛DuReader 2019年度冠军，AI领域再次取得新突破……
由此可见，通过对多维度技术的持续、广泛的布局，OPPO早已不再是一家纯粹的手机公司。据OPPO创始人陈明永介绍，OPPO未来三年将投入500亿研发预算，持续关注5G、人工智能、AR、大数据等前沿技术，并着力构建底层硬件核心技术以及软件工程和系统能力。
OPPO开放平台作为B端业务的主要窗口，这份《2019年OPPO开放平台年度总结》的公布既能让行业窥见到OPPO综合能力的一方天地，也将吸引更多合作伙伴加入OPPO开放平台，合作共创新未来。
查看完整年度总结，请关注OPPO开放平台官方微信公众号“OPPO开发者”或微博“OPPO开放平台”。

ad.txt

　　　　The packaged class:

# High-frequency word analyzer for news paragraphs
import jieba
import jieba.analyse


class ToolToMakeHighWords:
    test_str = ""

    # Initialize with the text to analyze
    def __init__(self, test_str):
        self.test_str = str(test_str)

    # Load the text from a file with the given encoding
    def buildWithFile(self, filePath, encoding):
        with open(filePath, encoding=encoding) as file:
            self.test_str = file.read()

    def buildWithStr(self, test_str):
        self.test_str = test_str

    # Segment the text into words
    def getWords(self, isSimple, isAll):
        if isSimple:
            # Search-engine mode (returns a list)
            return jieba.lcut_for_search(self.test_str)
        # cut_all=True: full mode; False: accurate mode (returns a generator)
        return jieba.cut(self.test_str, cut_all=isAll)

    # Count word frequencies (ignoring single characters), sorted in descending order
    def getHighWords(self, words):
        data = {}
        for charas in words:
            if len(charas) < 2:
                continue
            data[charas] = data.get(charas, 0) + 1
        return sorted(data.items(), key=lambda x: x[1], reverse=True)

    # Top-num keywords ranked by TF-IDF, as (word, weight) pairs
    def selectObjGroup(self, num):
        return jieba.analyse.extract_tags(self.test_str, topK=num, withWeight=True, allowPOS=())

    # Top-num keywords ranked by TF-IDF, words only
    def selectWordGroup(self, num):
        return jieba.analyse.extract_tags(self.test_str, topK=num, allowPOS=())


def main():
    with open('../testFile/rc/ad.txt', encoding="utf-8") as file:
        file_context = file.read()
    ttmhw = ToolToMakeHighWords(file_context)
    print(ttmhw.selectWordGroup(2))


# Run the test only when executed directly, not when imported by Surapity.py
if __name__ == "__main__":
    main()

ToolToMakeHighWords.py
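　　　　getHighWords() is an ordinary frequency count, so the same result can be sketched with collections.Counter. To run without jieba installed, this sketch takes an already segmented token list (for example the output of jieba.lcut) instead of raw text. The function name `high_words` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def high_words(words: Iterable[str], min_len: int = 2) -> List[Tuple[str, int]]:
    """Count tokens with at least `min_len` characters, most frequent first."""
    counts = Counter(w for w in words if len(w) >= min_len)
    # most_common() with no argument returns every (word, count) pair,
    # sorted by count in descending order
    return counts.most_common()
```

most_common() lists equal counts in the order they were first seen. Ties therefore come out in the same order as with the stable sorted() call in getHighWords().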

　　　　[Screenshot: test output]

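　　5. Connect the classes to get the data we need

　　　　Tidy up the code above.

　　　　Modify the existing Surapity.py so that it computes the word statistics and records each URL while crawling: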

import codecs
from urllib import request

import parsel

from itWords.retire.Kord import KeyWords
from itWords.retire.Oranpick import Oranpick
from itWords.retire.highWords import ToolToMakeHighWords


# [ Object for crawling consecutive pages ]
class Surapity:
    page = 1
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36'}
    basicURL = ""
    oran = None

    # ---[ Constructor ]
    def __init__(self):
        self.page = 1
        self.basicURL = "https://news.cnblogs.com/"
        self.oran = Oranpick("https://start.firefoxchina.cn/")

    def __close__(self):
        self.oran.__close__()

    # Move on to the next list page
    def __next__(self):
        self.page = self.page + 1
        self.basicURL = 'https://news.cnblogs.com/n/page/' + str(self.page) + '/'

    # Download the HTML of the current list page
    def getHTMLText(self):
        req = request.Request(url=self.basicURL, headers=self.headers)
        return request.urlopen(req).read().decode("utf-8")

    # Open every article on the current list page, count its words,
    # and record each high-frequency word together with the article URL
    def getMop(self, filePath):
        index_sel = parsel.Selector(self.getHTMLText())
        links = index_sel.css(".news_entry a::attr(href)").extract()
        for href in links:
            link = "https://news.cnblogs.com" + href
            self.oran.__reset__(link)
            news = self.oran.getNews()
            # Analyze the article's title and body text
            ttm = ToolToMakeHighWords(news.title + "\n" + news.info)
            words = ttm.getHighWords(ttm.getWords(False, False))
            # Keep only words that occur more than 15 times (the list is sorted)
            for word, num in words:
                if num <= 15:
                    break
                KeyWords(word=word, link=link, num=num).__toFile__(filePath)


# Empty the output file
def fileReset(filePath):
    with codecs.open(filePath, "w+", 'utf-8') as f:
        f.write("")


def main():
    filepath = "../../testFile/rc/news.txt"
    s = Surapity()
    fileReset(filepath)
    # Crawl list pages 1..100
    s.getMop(filepath)
    while s.page < 100:
        s.__next__()
        s.getMop(filepath)
    s.__close__()


if __name__ == "__main__":
    main()

Surapity.py
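　　　　The filtering loop in getMop() relies on the word list being sorted. As soon as one count is 15 or less, every later count is too, so the loop can stop there. Below is a standalone sketch of that step; the function name `keyword_lines` is hypothetical. It produces lines in the same word<TAB>num<TAB>link format that KeyWords writes.

```python
from typing import List, Sequence, Tuple


def keyword_lines(ranked: Sequence[Tuple[str, int]],
                  link: str,
                  threshold: int = 15) -> List[str]:
    """Format (word, count) pairs as word<TAB>count<TAB>link lines.

    `ranked` must be sorted by count in descending order; only counts
    strictly greater than `threshold` are kept.
    """
    lines = []
    for word, count in ranked:
        if count <= threshold:
            break  # sorted input: every remaining count is also too small
        lines.append(f"{word}\t{count}\t{link}")
    return lines
```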

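　　　　[Screenshot: test output]

　　　　Note: this is only an intermediate step, and the data still needs to be aggregated. So far we only have, for each article, the words that occur more than 15 times.

　　　　The results above can already be imported into MySQL. If you'd rather not import from a file, use the SQL statements below. Don't forget to create the words table before running them.

　　　　Given the file, the SQL statements can be generated like this: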

import codecs

inPath = "../../testFile/rc/news.txt"
filePath = "../../testFile/rc/words_sql.txt"

# Each input line is word<TAB>num<TAB>link (the KeyWords format)
with open(inPath, mode='r', encoding='utf-8') as fw, \
        codecs.open(filePath, "w+", 'utf-8') as f:
    for line in fw:
        line = line.rstrip("\r\n")
        if not line:
            continue
        word, num, link = line.split("\t")
        # Double any single quotes so they cannot break the SQL string literal
        word = word.replace("'", "''")
        link = link.replace("'", "''")
        f.write("Insert into words values ('" + word + "'," + num + ",'" + link + "');\n")

SqlDeal.py

　　　　Download link for the corresponding database SQL file: https://files.cnblogs.com/files/onepersonwholive/words.zip

　　　　Then create the view keywords.

　　　　The view is defined as follows:

CREATE VIEW keywords AS
SELECT
    `words`.`word` AS `word`,
    sum(`words`.`num`) AS `num`
FROM
    `words`
GROUP BY
    `words`.`word`
ORDER BY
    `num` DESC;

keywords(View)
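　　　　You can check the table, the inserts, and the view without a MySQL server. The sketch below uses Python's built-in sqlite3 as a stand-in. The column types (TEXT, INTEGER, TEXT) are assumptions, because the post does not show the CREATE TABLE statement. The inserts are parameterized, which avoids the quoting problems of building SQL text by hand. The function name `load_and_rank` is hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_and_rank(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Load word<TAB>num<TAB>link lines into a words table and read the keywords view.

    sqlite3 stands in for MySQL here; the column types are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT, num INTEGER, link TEXT)")
        rows = []
        for line in lines:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            word, num, link = line.split("\t")
            rows.append((word, int(num), link))
        # Parameterized insert: the driver handles quoting, so a ' in a word is safe
        conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
        # Same aggregation as the keywords view above: total count per word
        conn.execute(
            "CREATE VIEW keywords AS "
            "SELECT word, SUM(num) AS num FROM words GROUP BY word"
        )
        # Sort in the outer query so the order is guaranteed
        return conn.execute("SELECT word, num FROM keywords ORDER BY num DESC").fetchall()
    finally:
        conn.close()
```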

　　　　[Screenshot: contents of the keywords view]

　　　　Then modify the Servlet from blog post #136:

package com.servlet;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

import com.dblink.basic.utils.SqlUtils;
import com.dblink.basic.utils.sqlKind.MySql_s;
import com.dblink.basic.utils.user.UserInfo;
import com.dblink.bean.BeanGroup;
import com.dblink.sql.DBLink;

@SuppressWarnings("unused")
public class ServletForWords extends HttpServlet {
    private static final long serialVersionUID = 1L;

    // Return every row of the keywords view as [{"name": word, "value": num}, ...]
    public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
    {
        request.setCharacterEncoding("utf-8");
        response.setCharacterEncoding("utf-8");
        response.setContentType("application/json");
        response.setHeader("Cache-Control", "no-cache");
        response.setHeader("Pragma", "no-cache");

        JSONArray jsonArray = new JSONArray();

        DBLink dbLink = new DBLink(new SqlUtils(new MySql_s("rc"), new UserInfo("root", "123456")));
        try {
            BeanGroup bg = dbLink.getSelect("Select * From keywords ").beans; // where num > 6
            int leng = bg.size();
            for (int i = 0; i < leng; ++i)
            {
                JSONObject jsonObject = new JSONObject();
                jsonObject.put("name", bg.get(i).get(0));
                jsonObject.put("value", bg.get(i).get(1));
                jsonArray.put(jsonObject);
            }
        } catch (SQLException e) {
            // Log the error and return the rows collected so far
            e.printStackTrace();
        } finally {
            dbLink.free();
        }

        ServletOutputStream os = response.getOutputStream();
        // Encode explicitly as UTF-8 so the bytes match the declared charset
        os.write(jsonArray.toString().getBytes(StandardCharsets.UTF_8));
        os.flush();
        os.close();
    }
}

ServletForWords.java
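　　　　The servlet returns a JSON array of {"name": word, "value": num} objects. To test the front end without the database, you can build the same payload in Python. The sketch below does that; the function name `words_to_json` is hypothetical. The key detail is that the bytes are encoded as UTF-8, matching the charset the servlet declares.

```python
import json
from typing import Iterable, Tuple


def words_to_json(rows: Iterable[Tuple[str, int]]) -> bytes:
    """Build the [{"name": word, "value": num}, ...] payload as UTF-8 bytes."""
    payload = [{"name": word, "value": num} for word, num in rows]
    # ensure_ascii=False writes Chinese characters as-is instead of \uXXXX
    # escapes, so the text must be encoded explicitly as UTF-8
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```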

　　　　[Screenshot: result]

Original post: https://www.cnblogs.com/onepersonwholive/p/12248077.html
