Fetching All the Information for a News Article

Assignment source: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/2894

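Task:

Given the link to a news article, newsUrl, fetch all of the article's information:

Title, author, publishing unit, reviewer, source

Publication time: convert it to a datetime object

Click count:

newsUrl
newsId (extract it with a regular expression, re)
clickUrl (build it with str.format(newsId))
requests.get(clickUrl)
newClick (extract it with string processing or a regular expression)
int()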
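
The click-count steps listed above can be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, the URL template is assumed from the click-count URL for article 11131, and the response format (a fragment such as `$('#hits').html('246');`) is an assumption about what the API returns.

```python
import re

# Assumed URL template, modelled on the click-count URL for article 11131.
CLICK_URL_TEMPLATE = 'http://oa.gzcc.cn/api.php?op=count&id={}&modelid=80'


def extract_news_id(news_url):
    """Return the numeric id that precedes '.html' at the end of the URL."""
    match = re.search(r'/(\d+)\.html$', news_url)
    if match is None:
        raise ValueError('no news id found in ' + news_url)
    return match.group(1)


def build_click_url(news_id):
    """Fill the news id into the click-count API URL with str.format."""
    return CLICK_URL_TEMPLATE.format(news_id)


def parse_click_count(response_text):
    """Pull the hit count out of the API response text and convert it to int.

    Assumes the response contains a fragment like $('#hits').html('246');
    """
    match = re.search(r"hits'\)\.html\('(\d+)'\)", response_text)
    if match is None:
        raise ValueError('no click count found in response')
    return int(match.group(1))
```

With requests available, the pieces would chain together as `parse_click_count(requests.get(build_click_url(extract_news_id(newsUrl))).text)`.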

Wrap the whole process in a simple, clear function.
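
One possible shape for the metadata part of such a function is sketched here. It is kept free of network calls so it can be tried on a sample string. The layout of the metadata line (an ASCII colon after the time label, full-width colons after the other labels) and the dictionary key names are assumptions, not something the assignment specifies.

```python
import re
from datetime import datetime


def parse_show_info(info_text):
    """Parse the metadata line of a news page into a dict.

    Assumes a layout like
    '发布时间:2019-04-02 11:04:11 作者：... 审核：... 来源：...'
    """
    info = {}
    # The publication time is converted to a datetime object with strptime.
    time_match = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info_text)
    if time_match:
        info['publish_time'] = datetime.strptime(
            time_match.group(), '%Y-%m-%d %H:%M:%S')
    # Map the page's Chinese labels to English keys (key names are hypothetical).
    labels = {'作者': 'author', '审核': 'reviewer', '来源': 'source'}
    for token in info_text.split():
        label, sep, value = token.partition('：')
        if sep and label in labels:
            info[labels[label]] = value
    return info
```

Because the text is split on whitespace, a field value that itself contains a space would be cut off at the first space. The function could be fed the text of the page's `.show-info` element.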

The newsUrl is:

http://news.gzcc.cn/html/2019/xiaoyuanxinwen_0402/11131.html

The code is:

# -*- coding: utf-8 -*-

import requests
from datetime import datetime
from bs4 import BeautifulSoup

url = 'http://news.gzcc.cn/html/2019/xiaoyuanxinwen_0402/11131.html'
clickNumURL = 'http://oa.gzcc.cn/api.php?op=count&id=11131&modelid=80'


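def newsTime(shareinfo):
    # The first token is expected to look like "发布时间:YYYY-MM-DD"; keep the part after the ASCII colon
    news_date = shareinfo.split()[0].split(':')[1]
    news_clock = shareinfo.split()[1]
    dt = news_date + " " + news_clock
    # datetime.strptime converts a text string into a datetime object
    showtime = datetime.strptime(dt, "%Y-%m-%d %H:%M:%S")
    print("Publication time:", showtime)
    return showtime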



def click(click_num_url):
    return_click_num = requests.get(click_num_url)
    click_info = BeautifulSoup(return_click_num.text, 'html.parser')
    click_num = int(click_info.text.split('.html')[3].split("'")[1])
    print("Click count:", click_num)
    return click_num

resourses = requests.get(url)
resourses.encoding = 'UTF-8'
soup = BeautifulSoup(resourses.text, 'html.parser')

# BeautifulSoup's select() finds elements by class name and returns a list
print("\nNews title: " + soup.select('.show-title')[0].text)
# The splits below rely on the full-width colon ('：') between each label and its value
publishing_unit = soup.select('.show-info')[0].text.split()[4].split('：')[1]
print("Publishing unit:", publishing_unit)
writer = soup.select('.show-info')[0].text.split()[2].split('：')[1]
print("Author:", writer)
# Remove ideographic spaces (U+3000) from the article body
print("News content: " + soup.select('.show-content')[0].text.replace('\u3000', ''))
shareinfo = soup.select('.show-info')[0].text

newsTime(shareinfo)
click(clickNumURL)

Overall result (title, author, publishing unit, reviewer, source): [screenshot]


Original post: https://www.cnblogs.com/hesz/p/10648738.html