• 数据爬取与语音合成


    数据爬取与语音合成

    以python的网站数据爬取,并根据网站数据的内容调用百度音频合成接口进行生成音频文件

    一.基础工作

    在百度AI注册一个账号,获取我们需要的应用

    在这里插入图片描述

    在这里插入图片描述

    二.API函数接口的搭建

    #!/usr/bin/env python3
    # -*- coding:utf-8 -*-
    # Author LQ6H
    
    # baidu_api.py
    
    import os
    
    from aip import AipSpeech
    
    APP_ID = '20257660'
    API_KEY = 'QmnUG6DxYf0DFw1Fjx9IeLqk'
    SECRET_KEY = '******'
    
    client = AipSpeech(APP_ID,API_KEY,SECRET_KEY)
    
    def baidu(title,content):
    
        result = client.synthesis(content,'zh',1,{'vol':5})
    
        filename = os.path.join('./static/',title+'.mp3')
    
        if not isinstance(result,dict):
            with open(filename,'wb') as f:
                f.write(result)
            print(filename+"已存入")
    

    三.网站数据爬虫

    我们爬取网站:https://duanziwang.com/ 的数据

    #!/usr/bin/env python3
    # -*- coding:utf-8 -*-
    # Author LQ6H
    
    # spider.py
    
    import requests
    from baidu_api import baidu
    from lxml import etree
    
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
    }
    
    
    def parse(url):
    
        response = requests.get(url,headers=headers)
    
        response.encoding = 'utf-8'
    
        html = response.text
    
        root = etree.HTML(html)
    
        parse_html(root)
    
    
    def parse_html(root):
    
        article_list = root.xpath('/html/body/section/div/div/main/article')
    
        for article in article_list:
    
            title = article.xpath('./div[1]/h1/a/text()')[0]
            content = article.xpath('./div[2]/pre/code/text()')[0]
    
            baidu(title,content)
    
    
    
    def main():
        base_url = 'https://duanziwang.com/page/{}/'
    
        for i in range(2,5):
            url = base_url.format(i)
    
            parse(url)
    
    
    if __name__=="__main__":
    
        main()
    

    四.服务器搭建

    使用flask进行网站搭建

    #!/usr/bin/env python3
    # -*- coding:utf-8 -*-
    # Author LQ6H
    
    # server.py
    
    from flask import Flask,render_template
    import os
    
    
    app = Flask(__name__)
    
    @app.route('/test')
    def index():
        filepath_list = os.listdir('./static')
        path_list = []
        for path in filepath_list:
            path = '/static/'+path
            path_list.append(path)
    
        print(path_list)
    
        return render_template('./main.html',path_list=path_list)
    
    if __name__ == "__main__":
        app.run(debug=True)
    
    

    六.主页展示

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>段子</title>
    </head>
    <body>
    <h1>hello</h1>
    
             {% for path in path_list %}
             <audio controls>
                <source src="{{path}}" type="audio/mp3" />
            </audio>
            <h5>{{path}}</h5><br>
    
             {% endfor %}
    
    
    </body>
    </html>
    

    在这里插入图片描述

    代码链接:https://github.com/LQ6H/Python_spider

  • 相关阅读:
    permute
    ind2sub
    randi( )函数--MATLAB
    ABAQUS复合材料
    matlab中fix函数,floor函数,ceil函数
    在windows10中卸载软件和取消开机启动程序
    在linux的tomcat中配置https及自动跳转
    解决ubuntu无法远程连接
    谷歌浏览器 插件安装配置Momentum chrome
    0903——元类
  • 原文地址:https://www.cnblogs.com/LQ6H/p/13961173.html
Copyright © 2020-2023  润新知