• Python 3 crawler code for scraping Zhaopin (智联招聘) job listings


    Here is the code; if you spot any problems, feel free to point them out in the comments.

    # -*- coding: utf-8 -*-
    """
    Created on Tue Aug  7 20:41:09 2018
    @author: brave-man
    blog: http://www.cnblogs.com/zrmw/
    """

    import json

    import requests

    def getDetails(url):
        """Fetch one page of job listings from the search API and save each record."""
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0'}
        res = requests.get(url, headers=headers)
        res.encoding = 'utf-8'
        # The endpoint returns JSON, so parse it directly; no HTML parser is needed.
        data = res.json()

        # Create (or truncate) the output file so every run starts from an empty file.
        try:
            with open('jobDetails.txt', 'w', encoding='utf-8'):
                print('Created file {} successfully'.format('jobDetails.txt'))
        except OSError:
            print('failure')

        for i in data['data']['results']:
            details = {'jobName': i['jobName'],
                       'salary': i['salary'],
                       'company': i['company']['name'],
                       'companyUrl': i['company']['url'],
                       'positionURL': i['positionURL']
                       }
            # print(details)
            toFile(details)

    def toFile(d):
        """Append one job record to jobDetails.txt as a single JSON line."""
        dj = json.dumps(d, ensure_ascii=False)
        try:
            with open('jobDetails.txt', 'a', encoding='utf-8') as f:
                f.write(dj + '\n')
                # print('successful')
        except OSError:
            print('Error')

    def main():
        url = 'https://fe-api.zhaopin.com/c/i/sou?pageSize=60&cityId=635&workExperience=-1&education=-1&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3&lastUrlQuery={"jl":"635","kw":"python","kt":"3"}'
        getDetails(url)

    if __name__ == "__main__":
        main()

    After running the code above, a txt file holding the job information, jobDetails.txt, is created in the same directory as the script.
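
    Since toFile writes each record as one JSON object per line, the file can be loaded back for further processing. The short snippet below is only an illustration added here, not part of the original post:

    import json

    # Load every record back from jobDetails.txt (one JSON object per line).
    with open('jobDetails.txt', encoding='utf-8') as f:
        jobs = [json.loads(line) for line in f if line.strip()]
    print('Loaded {} job records'.format(len(jobs)))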

    This only fetches a single page of job listings; code for building the URL and fetching every page will be added later.
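
    For now, here is a minimal sketch of what that multi-page loop could look like. It assumes the API accepts a start offset parameter alongside pageSize, which is an assumption and not something confirmed above:

    def getAllPages(base_url, pages=5):
        # Hypothetical pagination: treating start as an offset of pageSize * page
        # is an assumption about the API, not verified in the original post.
        for page in range(pages):
            getDetails(base_url + '&start={}'.format(page * 60))

    # Example: reuse the same search URL as in main().
    # getAllPages(url, pages=5)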

    The Zhaopin site does have one small pitfall: not every job posting's detail page uses Zhaopin's own page format. After clicking some postings, the link leads to the hiring company's own recruitment site; that case will be handled specifically when it comes up later.
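
    Until then, a simple first check (an addition for illustration, not part of the code above) is to look at the host of positionURL and flag postings whose URL already points outside zhaopin.com; server-side redirects still need the special handling mentioned above:

    from urllib.parse import urlparse

    def isZhaopinDetail(positionURL):
        # True only when the detail URL is hosted under a zhaopin.com domain;
        # postings whose URL already points to a company's own site return False.
        host = urlparse(positionURL).netloc
        return host == 'zhaopin.com' or host.endswith('.zhaopin.com')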

  • Original article: https://www.cnblogs.com/zrmw/p/9439905.html