Python: Scraping Daily Northbound Capital Data

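Northbound capital (overseas money buying A-shares through the Shanghai- and Shenzhen-Hong Kong Stock Connect) has long been called "smart money", and it is known for left-side trading, that is, building positions before a trend reversal is confirmed. Many institutions and large investors now watch northbound flows and adjust their own trades accordingly; this seems to be an open secret by now. After the close of every trading day, Hong Kong Exchanges (HKEX) publishes that day's northbound shareholdings, so we can scrape this data ourselves and "copy the homework" of northbound capital.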

The scraping uses urllib and BeautifulSoup, both covered in my earlier post 《Python 学习笔记：获取网络数据》 (Python Study Notes: Fetching Web Data).
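To see how BeautifulSoup picks the values out of the page, here is a minimal offline sketch that runs the same kind of `find`/`find_all` calls as the scraper in this post against a small hand-written HTML fragment. The fragment, date, stock codes, and names are hypothetical; they only mimic the class names the scraper relies on (`input-searchDate`, `col-stock-code`, `mobile-list-body`, and so on), and the live HKEX page is much larger and may change. The sketch uses Python's built-in `'html.parser'` so lxml is not required, and the helper `column_values` is a hypothetical name introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only mimics the class names the scraper looks for
SAMPLE_HTML = """
<input class="input-searchDate" value="2021/07/23">
<table>
  <tr>
    <td class="col-stock-code"><div class="mobile-list-body">90000</div></td>
    <td class="col-stock-name"><div class="mobile-list-body">SAMPLE A</div></td>
    <td class="col-shareholding"><div class="mobile-list-body">1,234,567</div></td>
    <td class="col-shareholding-percent"><div class="mobile-list-body">1.25%</div></td>
  </tr>
  <tr>
    <td class="col-stock-code"><div class="mobile-list-body">90001</div></td>
    <td class="col-stock-name"><div class="mobile-list-body">SAMPLE B</div></td>
    <td class="col-shareholding"><div class="mobile-list-body">890</div></td>
    <td class="col-shareholding-percent"><div class="mobile-list-body">0.50%</div></td>
  </tr>
</table>
"""


def column_values(soup, td_class):
    """Text of the <div class="mobile-list-body"> inside each <td class=td_class>."""
    return [td.find('div', class_='mobile-list-body').get_text(strip=True)
            for td in soup.find_all('td', class_=td_class)]


# 'html.parser' ships with Python, so lxml is not needed for this demo
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

date = soup.find('input', class_='input-searchDate')['value']
codes = column_values(soup, 'col-stock-code')
names = column_values(soup, 'col-stock-name')
# "1,234,567" -> 1234567 and "1.25%" -> 1.25, mirroring the scraper's conversions
shareholding = [int(s.replace(',', '')) for s in column_values(soup, 'col-shareholding')]
percent = [float(s.rstrip('%')) for s in column_values(soup, 'col-shareholding-percent')]

print(date, codes, names, shareholding, percent)
```

Note that `class_='col-shareholding'` matches only cells whose class list contains exactly that class name, so it does not also pick up the `col-shareholding-percent` cells.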

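We scrape the Shanghai-Hong Kong Stock Connect and Shenzhen-Hong Kong Stock Connect data separately, then merge the two DataFrames and save the result as a CSV file.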
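The merge-and-save step can be sketched on its own with two tiny DataFrames standing in for the scraped Shanghai and Shenzhen tables. The rows below are hypothetical, and the extra `market` column is an assumption not present in the original code; it records which Connect each row came from once the two frames are stacked. `ignore_index=True` gives the combined frame a clean 0..n-1 index, and calling `to_csv` without a path returns the CSV text instead of writing a file, which makes the result easy to inspect.

```python
import pandas as pd

# Hypothetical rows standing in for the two scraped tables
sh = pd.DataFrame({'code': ['90000'], 'stock': ['SAMPLE A'],
                   'shareholding': [1234567], 'shareholding%': [1.25]})
sz = pd.DataFrame({'code': ['70001'], 'stock': ['SAMPLE B'],
                   'shareholding': [890], 'shareholding%': [0.5]})

# Tag each frame with its market (an added column), then stack them vertically
frames = [df.assign(market=m) for df, m in [(sh, 'SH'), (sz, 'SZ')]]
combined = pd.concat(frames, ignore_index=True)

# With no path argument, to_csv returns the CSV as a string
csv_text = combined.to_csv(index=False)
print(csv_text)
```

In the real script you would pass a file name to `to_csv` instead, exactly as the original does.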

Enough talk; here is the code.

Code

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Northbound shareholding pages: t=sh for Shanghai Connect, t=sz for Shenzhen Connect
urls = ['https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh',
        'https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sz']

dates = []
df_list = []

for url in urls:
    # Download the page and parse it (the 'lxml' parser requires the lxml package)
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'lxml')

    # The shareholding date shown in the page's search box
    date = soup.find('input', class_='input-searchDate')['value']
    dates.append(date)

    # Each table cell wraps its value in <div class="mobile-list-body">
    codes = [td.find('div', class_='mobile-list-body').string for td in soup.find_all('td', class_='col-stock-code')]
    names = [td.find('div', class_='mobile-list-body').string for td in soup.find_all('td', class_='col-stock-name')]
    # Shareholding is shown with thousands separators; drop the commas before int()
    shareholding = [int(td.find('div', class_='mobile-list-body').string.replace(',', '')) for td in soup.find_all('td', class_='col-shareholding')]
    # Percentage is shown with a trailing '%'; strip it before float()
    percent = [float(td.find('div', class_='mobile-list-body').string.strip('%')) for td in soup.find_all('td', class_='col-shareholding-percent')]

    df = pd.DataFrame(list(zip(codes, names, shareholding, percent)),
                      columns=['code', 'stock', 'shareholding', 'shareholding%'])
    df_list.append(df)

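# Only save when both pages report the same shareholding date,
# i.e. the Shanghai and Shenzhen data are from the same day
if dates[0] == dates[1]:
    # combine dataframe sh and dataframe sz
    output = pd.concat(df_list, ignore_index=True)
    # fname was never defined in the original; this name is just an example.
    # Remove any '/' from the date so it is a valid file name.
    fname = 'northbound_' + dates[0].replace('/', '') + '.csv'
    output.to_csv(fname, encoding='utf-8', index=False)
else:
    print('failed to get northbound data from web')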
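The list comprehensions above rely on `.string`, which returns `None` when a tag has more than one child node, and on `int()`/`float()`, which raise `ValueError` on blank or unexpected text. If the page ever serves such cells, converters along these lines would fail more gracefully. This is a minimal sketch assuming cells hold text like `1,234,567` and `1.25%`; the function names `parse_shares` and `parse_percent` are hypothetical.

```python
from typing import Optional


def parse_shares(text: Optional[str]) -> Optional[int]:
    """Turn text such as ' 1,234,567 ' into 1234567; None for missing/blank cells."""
    if text is None:
        return None
    cleaned = text.strip().replace(',', '')
    return int(cleaned) if cleaned else None


def parse_percent(text: Optional[str]) -> Optional[float]:
    """Turn text such as '1.25%' into 1.25; None for missing/blank cells."""
    if text is None:
        return None
    cleaned = text.strip().rstrip('%').strip()
    return float(cleaned) if cleaned else None


print(parse_shares(' 1,234,567 '), parse_percent('1.25%'), parse_shares(None))
```

Inside the scraper these would replace the inline `int(...)` and `float(...)` calls, for example `parse_shares(td.find('div', class_='mobile-list-body').get_text())`.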

Author: Yuki

Source: https://www.cnblogs.com/yukiwu/

Copyright of this article belongs to the author and cnblogs. Reposting is welcome, provided the source is credited with a link to the blog.

Original article: https://www.cnblogs.com/yukiwu/p/15049470.html