• Python: web crawler


    I wrote a simple web crawler:

    #coding=utf-8
    from bs4 import BeautifulSoup
    import requests
    url = "http://www.weather.com.cn/textFC/hb.shtml"
    def get_temperature(url):
        headers = {
            'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
            'Upgrade-Insecure-Requests':'1',
            'Referer':'http://www.weather.com.cn/weather1d/10129160502A.shtml',
            'Host':'www.weather.com.cn'
        }
        res = requests.get(url,headers=headers)
        res.encoding = "utf-8"
        content = res.content  # raw response bytes, not yet decoded
        content = content.decode('UTF-8')  # decode the bytes as UTF-8 text
        #print(content)
    
        soup = BeautifulSoup(content,'lxml')
        conMidetab = soup.find('div',class_='conMidtab')
        conMidetab2_list = conMidetab.find_all('div',class_='conMidtab2')
        for table in conMidetab2_list:
            tr_list = table.find_all('tr')[2:]  # skip the two header rows
            province = ''
            min_temp = 0  # renamed from min to avoid shadowing the builtin
            for index, row in enumerate(tr_list):
                td_list = row.find_all('td')
                if index == 0:
                    # the first row also carries the province cell
                    province = td_list[0].text.replace('\n', '')
                    city = td_list[1].text.replace('\n', '')
                    min_temp = td_list[7].text.replace('\n', '')
                else:
                    # later rows have no province cell, so columns shift by one
                    city = td_list[0].text.replace('\n', '')
                    min_temp = td_list[6].text.replace('\n', '')
                print(province, city, min_temp)
            # province_list = tr_list[2]
            # td_list = province_list.find_all('td')
            # province_td = td_list[0]
            # province = province_td.text
            # #print(province.replace('\n',''))
    get_temperature(url)
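
The row handling in the crawler (the first data row of each table carries an extra province cell, so later rows are shifted by one column) can be tried offline. The following is only a minimal sketch: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without network access or third-party packages. `SAMPLE_HTML`, `RowCollector`, and `extract_min_temperatures` are hypothetical names, and the sample table is made-up placeholder data that only assumes the same column layout as the crawler above, not real content from weather.com.cn.

```python
from html.parser import HTMLParser

# Made-up sample with the layout the crawler assumes: two header rows,
# then one row per city; only the first city row has a province cell.
SAMPLE_HTML = """
<table>
<tr><td>header 1</td></tr>
<tr><td>header 2</td></tr>
<tr><td>Hebei</td><td>Shijiazhuang</td><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td><td>-3</td></tr>
<tr><td>Tangshan</td><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td><td>-4</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).replace('\n', ''))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_min_temperatures(html):
    """Return (province, city, min_temp) tuples for one table."""
    parser = RowCollector()
    parser.feed(html)
    results = []
    province = ''
    for index, cells in enumerate(parser.rows[2:]):  # skip header rows
        # The first row has a leading province cell, shifting columns by one.
        offset = 1 if index == 0 else 0
        if index == 0:
            province = cells[0]
        results.append((province, cells[offset], cells[offset + 6]))
    return results


print(extract_min_temperatures(SAMPLE_HTML))
```

Returning tuples instead of printing inside the loop makes the offset logic easy to check against a small fixture before pointing the crawler at the live page.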
  • Original post: https://www.cnblogs.com/e0yu/p/9505490.html