20183215 Lab 4 Report, "Python Programming"


    20183215 2019-2020-2 "Python Programming" Lab 4 Report

    Course: "Python Programming"
    Class: 1832
    Name: 董振龙
    Student ID: 20183215
    Instructor: 王志强
    Lab date: June 13, 2020
    Required/Elective: Public elective

    1. Lab Content

    Comprehensive Python practice: a first attempt at combining a web crawler with a GUI

    2. Procedure and Results

    First, I used wxFormBuilder to build the GUI for the crawler program:

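    wxFormBuilder generates the corresponding .py file. I opened PyCharm, copied that .py file into the project folder, and created a new .py file in the same folder containing a class that inherits from the generated GUI class. Its contents are roughly as follows: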

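    import wx
    import pawindow  # GUI module generated by wxFormBuilder
    class PAFrame(pawindow.PAzhihu):
        """Subclass of the generated frame; the event handlers are implemented here."""
        def __init__(self, parent):
            pawindow.PAzhihu.__init__(self, parent)
    def main():
        app = wx.App(False)  # False: do not redirect stdout/stderr to a window
        frame = PAFrame(None)
        frame.Show(True)
        app.MainLoop()  # start the GUI event loop
    if __name__ == "__main__":
        main()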
    

    Running it shows the window in the screenshot above. The next step is to write the corresponding event handlers.
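The handlers go in the subclass rather than in the generated file because wxFormBuilder typically emits each event handler as a stub that only calls `event.Skip()`, with a note to override it in a derived class, and regenerating the GUI overwrites the generated file. A minimal sketch of that pattern without wx, where `GeneratedFrame`, `MyFrame`, `FakeEvent` and the path value are all hypothetical stand-ins:

```python
class FakeEvent:
    """Hypothetical stand-in for a wx event object."""
    def __init__(self):
        self.skipped = False

    def Skip(self):
        self.skipped = True


class GeneratedFrame:
    """Hypothetical stand-in for the class wxFormBuilder writes into pawindow.py."""
    def __init__(self):
        self.path_text = ""

    # Generated stub: does nothing except let the event propagate.
    def select_file(self, event):
        event.Skip()


class MyFrame(GeneratedFrame):
    """Hand-written subclass: the real behaviour lives here, so regenerating the base loses nothing."""
    def select_file(self, event):
        self.path_text = "C:/images"  # hypothetical path a folder dialog would return


base_event, sub_event = FakeEvent(), FakeEvent()
GeneratedFrame().select_file(base_event)
frame = MyFrame()
frame.select_file(sub_event)
print(base_event.skipped, sub_event.skipped, frame.path_text)  # True False C:/images
```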
    Select a folder and show its path in the path text box (ospath):

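        def select_file(self, event):
            # open a folder-picker dialog (title means "Select folder") and show the chosen path
            dlg = wx.DirDialog(self, u"选择文件夹", style=wx.DD_DEFAULT_STYLE)
            if dlg.ShowModal() == wx.ID_OK:
                self.ospath.SetValue(dlg.GetPath())
            dlg.Destroy()  # release the dialog once it has been closed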
    

    Start crawling:

        # Module-level imports this method needs: os, re, json, time, requests, and "from urllib import parse".
        # answer(url) (fetch a URL, return parsed JSON) and header (request headers) are defined elsewhere in the project.
        def save_images(self, event):
            text = self.url_input.GetValue()
            text = re.sub(r"\D", "", text)[0:9]  # keep only the digits of the entered URL; the first 9 are the question ID
            api = ("https://www.zhihu.com/api/v4/questions/" + text + "/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset={}&platform=desktop&sort_by=default")
            answer_total = int(answer(api.format(0))['paging']['totals'])  # total number of answers
            dir_path = self.ospath.GetValue()  # folder chosen with select_file
            offset = 0
            while offset < answer_total:
                answer_row = answer(api.format(offset))  # one page of 5 answers
                offset += 5
                data = answer_row['data']
                if len(data) == 0:  # no more answers
                    break
                for data_ in data:
                    answer_content = data_['content']
                    img_urls = re.findall('src="(https://.*?)"', answer_content)  # extract image URLs with a regex
                    img_urls = list(set(img_urls))  # drop duplicates
                    print(json.dumps(img_urls))
                    if len(img_urls) == 0:  # this answer has no images; move on to the next answer
                        continue
                    for img_url in img_urls:
                        # use the URL path without slashes as the file name and save it in the chosen folder
                        local_path = parse.urlsplit(img_url)[2].replace("/", "")
                        with open(os.path.join(dir_path, local_path), 'wb') as f:
                            f.write(requests.get(img_url, headers=header).content)
                        time.sleep(1)  # pause between downloads
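The URL handling in `save_images` can be pulled out into small pure functions, which makes it testable without a GUI or network access. This is a sketch under stated assumptions: the helper names are hypothetical, the question ID is read from a `/question/<digits>` path segment instead of taking the first nine digits of the input (which breaks for IDs of other lengths), and the long `include=` parameter used above is omitted for brevity.

```python
import re
from urllib import parse

API = "https://www.zhihu.com/api/v4/questions/{qid}/answers"


def question_id(url):
    """Return the numeric ID after '/question/' in a Zhihu question URL, or None."""
    match = re.search(r"/question/(\d+)", url)
    return match.group(1) if match else None


def page_urls(qid, total, limit=5):
    """Yield one API URL per page of `limit` answers until `total` answers are covered."""
    for offset in range(0, total, limit):
        query = parse.urlencode({"limit": limit, "offset": offset,
                                 "platform": "desktop", "sort_by": "default"})
        yield API.format(qid=qid) + "?" + query


def image_urls(html):
    """Extract unique https image sources from an answer's HTML, keeping first-seen order."""
    return list(dict.fromkeys(re.findall(r'src="(https://.*?)"', html)))


def local_name(img_url):
    """Use the URL path with slashes removed as the file name, as save_images does."""
    return parse.urlsplit(img_url).path.replace("/", "")


qid = question_id("https://www.zhihu.com/question/123456789/answer/42")
urls = list(page_urls(qid, total=12))  # offsets 0, 5, 10
html = ('<img src="https://pic1.example.com/v2-abc.jpg">'
        '<img src="https://pic1.example.com/v2-abc.jpg">')
imgs = image_urls(html)
print(qid, len(urls), imgs, local_name(imgs[0]))
```

Using `dict.fromkeys` instead of `set` removes duplicates while keeping the images in page order, so repeated runs download them in the same sequence.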
    

    Result of a run:
    After that, I committed and pushed to Git; the complete code is here

    3. Problems Encountered and How They Were Solved

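    • Problem 1: I did not know how to open a folder-selection dialog.
    • Solution to problem 1: After searching on Baidu, I found a post by an experienced blogger and solved the problem by following his code.
    • Problem 2: I originally wanted the GUI to show progress while the images were being saved, but the GUI stayed unresponsive the whole time.
    • Solution to problem 2: Not solved. Searching online showed that this is a multithreading issue: the download loop runs on the GUI thread and blocks its event loop. However, my code was already largely finished, adding multithreading would have meant changing a lot of it, and I was worried about introducing unnecessary bugs, so I compromised on the GUI. The final interface has no progress bar and no image-saving progress display.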
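For reference, the usual remedy for problem 2 is to run the download loop in a worker thread and pass progress updates back to the GUI thread; in wxPython that hand-off is normally done with `wx.CallAfter`, because widgets should only be touched from the GUI thread. A minimal sketch using only the standard library, where `download_all`, `start_in_background`, `record` and `fake_download` are hypothetical; in the real program the callback would wrap something like `wx.CallAfter(gauge.SetValue, done)`, which is an assumption about how the GUI would be wired.

```python
import threading


def download_all(urls, fetch, report):
    """Fetch every URL in order, calling report(done, total) after each one."""
    total = len(urls)
    for done, url in enumerate(urls, start=1):
        fetch(url)
        report(done, total)


def start_in_background(urls, fetch, report):
    """Run download_all on a daemon thread so the caller (e.g. the GUI event loop) is not blocked."""
    worker = threading.Thread(target=download_all, args=(urls, fetch, report), daemon=True)
    worker.start()
    return worker


progress = []
lock = threading.Lock()


def record(done, total):
    # Stand-in for wx.CallAfter(...): store the update thread-safely.
    with lock:
        progress.append((done, total))


def fake_download(url):
    # Hypothetical stand-in for requests.get(url).content plus writing the file.
    return url.upper()


worker = start_in_background(["a", "b", "c"], fake_download, record)
worker.join()  # a GUI would not join; it keeps handling events while the worker runs
print(progress)  # [(1, 3), (2, 3), (3, 3)]
```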

    Reflections

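    This lab made me realize that what I know about Python is really just a drop in the bucket, and there is still a great deal I have not touched. Still, I believe I will gradually master more of it through self-study. I also want to sincerely thank Mr. Wang: his course gave me an in-depth understanding of Python and was more than three times as efficient as my earlier self-study. I learned a lot from Mr. Wang and from my classmates, from start to finish, going from the most basic variables and print all the way to databases and crawlers, from the simple to the advanced. I may not have mastered 100% of what the teacher taught, but I will keep learning; Python has left a permanent mark on my life. Mr. Wang's classes were excellent, and even without his teaching from now on, I will keep improving step by step. Life is short, I love Python.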


    Original article: https://www.cnblogs.com/mo-xiao-qi/p/13114580.html