• xml module


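    2. CSV files

    Comma-separated values (CSV; sometimes called character-separated values, because the delimiter does not have to be a comma) is a plain-text file format for storing tabular data (numbers and text).

    To work with data in this format, we read the file with the open function and split each line on the comma delimiter.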

    股票代码,股票名称,当前价,涨跌额,涨跌幅,年初至今
    SH601778,N晶科,6.29,+1.92,-43.94%,+43.94%
    SH688566,吉贝尔,52.66,+6.96,+15.23%,+122.29%
    ...
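
    A minimal sketch of the approach described above, using the two sample rows. To stay self-contained it reads from an in-memory io.StringIO object; with a real file, open(path, mode='r', encoding='utf-8') would take its place. The dictionary keys (code, name, price) are illustrative names chosen for this sketch, not part of the original data.

```python
import io

# Stand-in for a file opened with open(..., encoding='utf-8');
# the rows are the sample data shown above.
sample = io.StringIO(
    "股票代码,股票名称,当前价,涨跌额,涨跌幅,年初至今\n"
    "SH601778,N晶科,6.29,+1.92,-43.94%,+43.94%\n"
    "SH688566,吉贝尔,52.66,+6.96,+15.23%,+122.29%\n"
)

sample.readline()  # skip the header row
stocks = []
for line in sample:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    code, name, price, change, change_pct, ytd = line.split(",")
    stocks.append({"code": code, "name": name, "price": float(price)})

print(stocks)
```

    Splitting on "," only works while no field contains a comma itself; when fields may be quoted, the standard library csv module is the safer choice.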
    

    Exercise: download every image listed in the file and save each one using the username as the file name.

    ID,用户名,头像
    26044585,Hush,https://hbimg.huabanimg.com/51d46dc32abe7ac7f83b94c67bb88cacc46869954f478-aP4Q3V
    19318369,柒十一,https://hbimg.huabanimg.com/703fdb063bdc37b11033ef794f9b3a7adfa01fd21a6d1-wTFbnO
    15529690,Law344,https://hbimg.huabanimg.com/b438d8c61ed2abf50ca94e00f257ca7a223e3b364b471-xrzoQd
    18311394,Jennah·,https://hbimg.huabanimg.com/4edba1ed6a71797f52355aa1de5af961b85bf824cb71-px1nZz
    18009711,可洛爱画画,https://hbimg.huabanimg.com/03331ef39b5c7687f5cc47dbcbafd974403c962ae88ce-Co8AUI
    30574436,花姑凉~,https://hbimg.huabanimg.com/2f5b657edb9497ff8c41132e18000edb082d158c2404-8rYHbw
    17740339,小巫師,https://hbimg.huabanimg.com/dbc6fd49f1915545cc42c1a1492a418dbaebd2c21bb9-9aDqgl
    18741964,桐末tonmo,https://hbimg.huabanimg.com/b60cee303f62aaa592292f45a1ed8d5be9873b2ed5c-gAJehO
    30535005,TANGZHIQI,https://hbimg.huabanimg.com/bbd08ee168d54665bf9b07899a5c4a4d6bc1eb8af77a4-8Gz3K1
    31078743,你的老杨,https://hbimg.huabanimg.com/c46fbc3c9a01db37b8e786cbd7174bbd475e4cda220f4-F1u7MX
    25519376,尺尺寸,https://hbimg.huabanimg.com/ee29ee198efb98f970e3dc2b24c40d89bfb6f911126b6-KGvKes
    21113978,C-CLong,https://hbimg.huabanimg.com/7fa6b2a0d570e67246b34840a87d57c16a875dba9100-SXsSeY
    24674102,szaa,https://hbimg.huabanimg.com/0716687b0df93e8c3a8e0925b6d2e4135449cd27597c4-gWdv24
    30508507,爱起床的小灰灰,https://hbimg.huabanimg.com/4eafdbfa21b2f300a7becd8863f948e5e92ef789b5a5-1ozTKq
    12593664,yokozen,https://hbimg.huabanimg.com/cd07bbaf052b752ed5c287602404ea719d7dd8161321b-cJtHss
    16899164,一阵疯,https://hbimg.huabanimg.com/0940b557b28892658c3bcaf52f5ba8dc8402100e130b2-G966Uz
    847937,卩丬My㊊伴er彎,https://hbimg.huabanimg.com/e2d6bb5bc8498c6f607492a8f96164aa2366b104e7a-kWaH68
    31010628,慢慢即漫漫,https://hbimg.huabanimg.com/c4fb6718907a22f202e8dd14d52f0c369685e59cfea7-82FdsK
    13438168,海贼玩跑跑,https://hbimg.huabanimg.com/1edae3ce6fe0f6e95b67b4f8b57c4cebf19c501b397e-BXwiW6
    28593155,源稚生,https://hbimg.huabanimg.com/626cfd89ca4c10e6f875f3dfe1005331e4c0fd7fd429-9SeJeQ
    28201821,合伙哼哼,https://hbimg.huabanimg.com/f59d4780531aa1892b80e0ec94d4ec78dcba08ff18c416-769X6a
    28255146,漫步AAA,https://hbimg.huabanimg.com/3c034c520594e38353a039d7e7a5fd5e74fb53eb1086-KnpLaL
    30537613,配䦹,https://hbimg.huabanimg.com/efd81d22c1b1a2de77a0e0d8e853282b83b6bbc590fd-y3d4GJ
    22665880,日后必火,https://hbimg.huabanimg.com/69f0f959979a4fada9e9e55f565989544be88164d2b-INWbaF
    16748980,keer521521,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4
    30536510,“西辞”,https://hbimg.huabanimg.com/61cfffca6b2507bf51a507e8319d68a8b8c3a96968f-6IvMSk
    30986577,艺成背锅王,https://hbimg.huabanimg.com/c381ecc43d6c69758a86a30ebf72976906ae6c53291f9-9zroHF
    26409800,CsysADk7,https://hbimg.huabanimg.com/bf1d22092c2070d68ade012c588f2e410caaab1f58051-ahlgLm
    30469116,18啊全阿,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4
    15514336,W/小哥,https://hbimg.huabanimg.com/a30f5967fc0acf81421dd49650397de63c105b9ead1c-nVRrNl
    17473505,椿の花,https://hbimg.huabanimg.com/0e38d810e5a24f91ebb251fd3aaaed8bb37655b14844c-pgNJBP
    19165177,っ思忆゜♪,https://hbimg.huabanimg.com/4815ea0e4905d0f3bb82a654b481811dadbfe5ce2673-vMVr0B
    16059616,格林熊丶,https://hbimg.huabanimg.com/8760a2b08d87e6ed4b7a9715b1a668176dbf84fec5b-jx14tZ
    30734152,sCWVkJDG,https://hbimg.huabanimg.com/f31a5305d1b8717bbfb897723f267d316e58e7b7dc40-GD3e22
    24019677,虚无本心,https://hbimg.huabanimg.com/6fdfa9834abe362e978b517275b06e7f0d5926aa650-N1xCXE
    16670283,Y-雨后天空,https://hbimg.huabanimg.com/a3bbb0045b536fc27a6d2effa64a0d43f9f5193c177f-I2vHaI
    21512483,汤姆2,https://hbimg.huabanimg.com/98cc50a61a7cc9b49a8af754ffb26bd15764a82f1133-AkiU7D
    16441049,笑潇啸逍小鱼,https://hbimg.huabanimg.com/ae8a70cd85aff3a8587ff6578d5cf7620f3691df13e46-lmrIi9
    24795603,⁢⁢⁢⁢⁢v,https://hbimg.huabanimg.com/a7183cc3a933aa129d7b3230bf1378fd8f5857846cc5-3tDtx3
    29819152,妮玛士珍多,https://hbimg.huabanimg.com/ca4ecb573bf1ff0415c7a873d64470dedc465ea1213c6-RAkArS
    19101282,陈勇敢❤,https://hbimg.huabanimg.com/ab6d04ebaff3176e3570139a65155856871241b58bc6-Qklj2E
    28337572,爱意随风散,https://hbimg.huabanimg.com/117ad8b6eeda57a562ac6ab2861111a793ca3d1d5543-SjWlk2
    17342758,幸运instant,https://hbimg.huabanimg.com/72b5f9042ec297ae57b83431123bc1c066cca90fa23-3MoJNj
    18483372,Beau染,https://hbimg.huabanimg.com/077115cb622b1ff3907ec6932e1b575393d5aae720487-d1cdT9
    22127102,栽花的小蜻蜓,https://hbimg.huabanimg.com/6c3cbf9f27e17898083186fc51985e43269018cc1e1df-QfOIBG
    13802024,LoveHsu,https://hbimg.huabanimg.com/f720a15f8b49b86a7c1ee4951263a8dbecfe3e43d2d-GPEauV
    22558931,白驹过隙丶梨花泪う,https://hbimg.huabanimg.com/e49e1341dfe5144da5c71bd15f1052ef07ba7a0e1296b-jfyfDJ
    11762339,cojoy,https://hbimg.huabanimg.com/5b27f876d5d391e7c4889bc5e8ba214419eb72b56822-83gYmB
    30711623,雪碧学长呀,https://hbimg.huabanimg.com/2c288a1535048b05537ba523b3fc9eacc1e81273212d1-nr8M4t
    18906718,西霸王,https://hbimg.huabanimg.com/7b02ad5e01bd8c0a29817e362814666a7800831c154a6-AvBDaG
    31037856,邵阳的小哥哥,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4
    26830711,稳健谭,https://hbimg.huabanimg.com/51547ade3f0aef134e8d268cfd4ad61110925aefec8a-NKPEYX
    
    import os
    import requests

    # Create the images directory once, before the loop, if it does not exist yet
    os.makedirs("images", exist_ok=True)

    with open('files/mv.csv', mode='r', encoding='utf-8') as file_object:
        # Skip the header row
        file_object.readline()
        for line in file_object:
            line = line.strip()
            if not line:
                continue
            user_id, username, url = line.split(',')
            print(username, url)
            # 1. Download the image from the URL
            res = requests.get(
                url=url,
                headers={
                    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
                }
            )

            # 2. Write the image bytes to a file.
            # Usernames such as "W/小哥" contain a path separator, so replace it
            # to keep the file inside the images directory.
            safe_name = username.replace("/", "_").replace("\\", "_")
            with open("images/{}.png".format(safe_name), mode='wb') as img_object:
                img_object.write(res.content)
    

    3. INI files

    INI is short for initialization file. The format is commonly used for software configuration files, for example the MySQL configuration file.

    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    log-bin=py-mysql-bin
    character-set-server=utf8
    collation-server=utf8_general_ci
    log-error=/var/log/mysqld.log
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    
    [mysqld_safe]
    log-error=/var/log/mariadb/mariadb.log
    pid-file=/var/run/mariadb/mariadb.pid
    
    [client]
    default-character-set=utf8
    

    This format can be processed directly with open, but parsing it by hand is tedious, so Python provides a more convenient tool: the configparser module.

    import configparser
    
    config = configparser.ConfigParser()
    config.read('files/my.ini', encoding='utf-8')
    # config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.ini', encoding='utf-8')
    
    # 1. Get all sections
    """
    result = config.sections()
    print(result)  # ['mysqld', 'mysqld_safe', 'client']
    """
    
    # 2. Get the key/value pairs in a section
    """
    result = config.items("mysqld_safe")
    print(result)  # [('log-error', '/var/log/mariadb/mariadb.log'), ('pid-file', '/var/run/mariadb/mariadb.pid')]
    
    for key, value in config.items("mysqld_safe"):
        print(key, value)
    """
    
    # 3. Get the value of a key in a section
    """
    result = config.get("mysqld","collation-server")
    print(result)
    """
    
    # 4. Other operations

    # 4.1 Check whether a section exists
    # v1 = config.has_section("client")
    # print(v1)

    # 4.2 Add a section and set key/value pairs
    # config.add_section("group")
    # config.set('group','name','wupeiqi')
    # config.set('client','name','wupeiqi')
    # with open('files/new.ini', mode='w', encoding='utf-8') as f:
    #     config.write(f)

    # 4.3 Remove a section or an option
    # config.remove_section('client')
    # config.remove_option("mysqld", "datadir")
    # with open('files/new.ini', mode='w', encoding='utf-8') as f:
    #     config.write(f)
    
    • Read all sections

      import configparser
      
      config = configparser.ConfigParser()
      config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
      # config.read('my.conf', encoding='utf-8')
      ret = config.sections()
      print(ret) 
      
      >> Output
      ['mysqld', 'mysqld_safe', 'client']
      
    • Read the key/value pairs in a section

      import configparser
      
      config = configparser.ConfigParser()
      config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
      # config.read('my.conf', encoding='utf-8')
      item_list = config.items("mysqld_safe")
      print(item_list) 
      
      >> Output
      [('log-error', '/var/log/mariadb/mariadb.log'), ('pid-file', '/var/run/mariadb/mariadb.pid')]
      
    • Read a single value (by section + key)

      import configparser
      
      config = configparser.ConfigParser()
      config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
      
      value = config.get('mysqld', 'log-bin')
      print(value)
      
      >> Output
      py-mysql-bin
      
    • Check, add, and remove sections

      import configparser
      
      config = configparser.ConfigParser()
      config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
      # config.read('my.conf', encoding='utf-8')
      
      
      # Check whether a section exists
      has_sec = config.has_section('mysqld')
      print(has_sec)

      # Add a section
      config.add_section("SEC_1")
      # Set key/value pairs in the section
      config.set('SEC_1', 'k10', "123")
      config.set('SEC_1', 'name', "哈哈哈哈哈")

      config.add_section("SEC_2")
      config.set('SEC_2', 'k10', "123")
      # Write the result to a new file (utf-8, because one value contains non-ASCII text)
      with open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/xxoo.conf', 'w', encoding='utf-8') as f:
          config.write(f)


      # Remove a section
      config.remove_section("SEC_2")
      # Remove a key from a section
      config.remove_option('SEC_1', 'k10')
      with open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/new.conf', 'w', encoding='utf-8') as f:
          config.write(f)
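
    The read operations above can also be tried without a file on disk. The following is a small sketch, assuming an inline copy of part of the sample configuration; the port option and its fallback value "3306" are hypothetical and only illustrate what happens when a key is missing.

```python
import configparser

# Inline copy of part of the sample configuration shown earlier
text = """
[mysqld]
datadir=/var/lib/mysql
symbolic-links=0

[client]
default-character-set=utf8
"""

config = configparser.ConfigParser()
config.read_string(text)

sections = config.sections()
# getint converts the stored string to an int
links = config.getint("mysqld", "symbolic-links")
# "port" is not in the sample; fallback is returned instead of raising NoOptionError
port = config.get("mysqld", "port", fallback="3306")
has_charset = config.has_option("client", "default-character-set")

print(sections, links, port, has_charset)
```

    configparser stores every value as a string, so typed getters such as getint are the usual way to obtain numbers.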
      

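    4. XML files

    XML (Extensible Markup Language) is a simple markup language designed for storing and transporting data.

    • Storage: it can hold configuration, for example Java configuration files.
    • Transport: data is sent over the network in this format, for example early Ajax payloads and the SOAP protocol.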
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
        <country name="Singapore">
            <rank updated="yes">5</rank>
            <year>2026</year>
            <gdppc>59900</gdppc>
            <neighbor direction="N" name="Malaysia" />
        </country>
        <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    

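    Note: XML is used relatively rarely in Python development, so a general understanding is enough for now (later lessons on WeChat Pay and WeChat Official Account message handling exchange data as XML).

    For example: https://developers.weixin.qq.com/doc/offiaccount/Message_Management/Receiving_standard_messages.html

    4.1 Reading files and content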

    from xml.etree import ElementTree as ET

    # Open and parse the XML file
    tree = ET.parse("files/xo.xml")

    # Get the root element
    root = tree.getroot()

    print(root) # <Element 'data' at 0x7f94e02763b0>
    
    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    root = ET.XML(content)
    print(root)  # <Element 'data' at 0x7fdaa019cea0>
    

    4.2 Reading node data

    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein" id="999" >
            <rank>2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank>69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    # Get the root element <data>
    root = ET.XML(content)
    
    country_object = root.find("country")
    print(country_object.tag, country_object.attrib)
    gdppc_object = country_object.find("gdppc")
    print(gdppc_object.tag,gdppc_object.attrib,gdppc_object.text)
    
    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein">
            <rank>2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank>69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    # Get the root element <data>
    root = ET.XML(content)

    # Iterate over the children of <data>
    for child in root:
        # child.tag = country
        # child.attrib = {"name":"Liechtenstein"}
        print(child.tag, child.attrib)
        for node in child:
            print(node.tag, node.attrib, node.text)
    
    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein">
            <rank>2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank>69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    root = ET.XML(content)
    
    for child in root.iter('year'):
        print(child.tag, child.text)
    
    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein">
            <rank>2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank>69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    root = ET.XML(content)
    v1 = root.findall('country')
    print(v1)
    
    v2 = root.find('country').find('rank')
    print(v2.text)
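
    Chained find calls can also be written as a single path. A minimal sketch, assuming a trimmed copy of the sample document; the population tag is hypothetical and is only there to show what happens when nothing matches.

```python
from xml.etree import ElementTree as ET

content = """
<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2023</year>
    </country>
    <country name="Panama">
        <rank>69</rank>
        <year>2026</year>
    </country>
</data>
"""

root = ET.XML(content)

# A path walks several levels in one call: first <country>, then its <rank>
first_rank = root.find("country/rank").text

# [@name='...'] selects the child whose attribute has that value
panama_year = root.find("country[@name='Panama']/year").text

# findtext returns the text directly, or the default when nothing matches
missing = root.findtext("country/population", default="n/a")

print(first_rank, panama_year, missing)
```

    Note that find returns None when nothing matches, so calling .text on the result of a failed find raises AttributeError; findtext with a default avoids that.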
    

    4.3 Modifying and deleting nodes

    from xml.etree import ElementTree as ET
    
    content = """
    <data>
        <country name="Liechtenstein">
            <rank>2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
         <country name="Panama">
            <rank>69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    
    root = ET.XML(content)
    
    # Modify a node's text and attributes
    rank = root.find('country').find('rank')
    print(rank.text)
    rank.text = "999"
    rank.set('update', '2020-11-11')
    print(rank.text, rank.attrib)
    ############ Save to file ############
    tree = ET.ElementTree(root)
    tree.write("new.xml", encoding='utf-8')


    # Delete a node
    root.remove( root.find('country') )
    print(root.findall('country'))

    ############ Save to file ############
    tree = ET.ElementTree(root)
    tree.write("newnew.xml", encoding='utf-8')
    

    4.4 Building a document

    <home>
        <son name="儿1">
            <grandson name="儿11"></grandson>
            <grandson name="儿12"></grandson>
        </son>
        <son name="儿2"></son>
    </home>
    
    from xml.etree import ElementTree as ET
    
    # Create the root element
    root = ET.Element("home")

    # Create the elder son
    son1 = ET.Element('son', {'name': '儿1'})
    # Create the younger son
    son2 = ET.Element('son', {"name": '儿2'})

    # Create two grandsons under the elder son
    grandson1 = ET.Element('grandson', {'name': '儿11'})
    grandson2 = ET.Element('grandson', {'name': '儿12'})
    son1.append(grandson1)
    son1.append(grandson2)

    # Append the sons to the root element
    root.append(son1)
    root.append(son2)
    
    tree = ET.ElementTree(root)
    tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)
    
    <famliy>
        <son name="儿1">
            <grandson name="儿11"></grandson>
            <grandson name="儿12"></grandson>
        </son>
        <son name="儿2"></son>
    </famliy>
    
    from xml.etree import ElementTree as ET
    
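    # Create the root element
    root = ET.Element("famliy")


    # Create the elder son
    # (the standard library documentation advises using ET.SubElement rather than calling makeelement directly)
    son1 = root.makeelement('son', {'name': '儿1'})
    # Create the younger son
    son2 = root.makeelement('son', {"name": '儿2'})

    # Create two grandsons under the elder son
    grandson1 = son1.makeelement('grandson', {'name': '儿11'})
    grandson2 = son1.makeelement('grandson', {'name': '儿12'})

    son1.append(grandson1)
    son1.append(grandson2)


    # Append the sons to the root element
    root.append(son1)
    root.append(son2)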
    
    tree = ET.ElementTree(root)
    tree.write('oooo.xml',encoding='utf-8')
    
    <famliy>
    	<son name="儿1">
        	<age name="儿11">孙子</age>
        </son>
    	<son name="儿2"></son>
    </famliy>
    
    from xml.etree import ElementTree as ET
    
    
    # Create the root element
    root = ET.Element("famliy")


    # Create the elder son
    son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
    # Create the younger son
    son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})

    # Create one grandson under the elder son
    grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
    grandson1.text = '孙子'


    et = ET.ElementTree(root)  # build the document object
    et.write("test.xml", encoding="utf-8")
    
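    <user><![CDATA[你好呀]]></user>

    from xml.etree import ElementTree as ET

    # Create the root element
    root = ET.Element("user")
    # Note: xml.etree has no CDATA node type, so this text is escaped on write
    # (the file contains &lt;![CDATA[你好呀]]&gt;) rather than emitted as a real CDATA section
    root.text = "<![CDATA[你好呀]]>"

    et = ET.ElementTree(root)  # build the document object
    et.write("test.xml", encoding="utf-8")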
    

    Example:

    content = """<xml>
        <ToUserName><![CDATA[gh_7f083739789a]]></ToUserName>
        <FromUserName><![CDATA[oia2TjuEGTNoeX76QEjQNrcURxG8]]></FromUserName>
        <CreateTime>1395658920</CreateTime>
        <MsgType><![CDATA[event]]></MsgType>
        <Event><![CDATA[TEMPLATESENDJOBFINISH]]></Event>
        <MsgID>200163836</MsgID>
        <Status><![CDATA[success]]></Status>
    </xml>"""
    
    from xml.etree import ElementTree as ET
    
    info = {}
    root = ET.XML(content)
    for node in root:
        # print(node.tag,node.text)
        info[node.tag] = node.text
    print(info)
    

    6. Compressed files

    Python's built-in shutil module can create and extract archives.

    import shutil
    
    # 1. Create an archive
    """
    # base_name: path of the archive to create, without the format-specific extension
    # format: archive format, e.g. "zip", "tar", "gztar", "bztar", or "xztar".
    # root_dir: path of the directory to compress
    """
    # shutil.make_archive(base_name=r'datafile',format='zip',root_dir=r'files')


    # 2. Extract an archive
    """
    # filename: archive file to extract
    # extract_dir: directory to extract into
    # format: archive format
    """
    # shutil.unpack_archive(filename=r'datafile.zip', extract_dir=r'xxxxxx/xo', format='zip')
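
    The two calls can be exercised end to end in a temporary directory. This is only a sketch: the folder and file names (files, info.txt, unpacked) are placeholders chosen for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small folder to compress (names are placeholders)
    src = os.path.join(tmp, "files")
    os.makedirs(src)
    with open(os.path.join(src, "info.txt"), mode="w", encoding="utf-8") as f:
        f.write("hello")

    # make_archive appends the extension itself and returns the archive path
    archive_path = shutil.make_archive(
        base_name=os.path.join(tmp, "datafile"), format="zip", root_dir=src
    )

    # Extract into a separate directory
    out_dir = os.path.join(tmp, "unpacked")
    shutil.unpack_archive(filename=archive_path, extract_dir=out_dir, format="zip")

    archive_name = os.path.basename(archive_path)
    extracted = sorted(os.listdir(out_dir))
    with open(os.path.join(out_dir, "info.txt"), encoding="utf-8") as f:
        restored = f.read()

print(archive_name, extracted, restored)
```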
    

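    7. Paths

    7.1 Escaping

    Windows paths use \ as the separator, while Linux paths use /.

    In particular, on Windows a path such as D:\nxxx\txxx\x1 causes an error when written as an ordinary string literal, because \n (newline) and \t (tab) are escape sequences and \x1 is an incomplete hexadecimal escape; the Python interpreter cannot tell that they are meant as path components.

    So when writing a Windows path in code, there are generally two options:

    • Escape each backslash, for example: "D:\\nxxx\\txxx\\x1"
    • Prefix the string with r, for example: r"D:\nxxx\txxx\x1"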
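
    A quick check that both spellings produce the same string, and what happens without them (a minimal sketch; the variable names are illustrative):

```python
escaped = "D:\\nxxx\\txxx\\x1"   # every backslash doubled
raw = r"D:\nxxx\txxx\x1"          # r prefix: backslashes are kept literally

same = escaped == raw

# Without either fix, "\n" and "\t" turn into single control characters
broken = "D:\nxxx\txxx"
has_newline = "\n" in broken
has_backslash = "\\" in broken

print(same, has_newline, has_backslash)
```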

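    7.2 Current working directory

    If a project uses relative paths, always be aware of the directory the program is run from.

    For example, suppose demo.py is written in /Users/wupeiqi/PycharmProjects/CodeRepository/: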

    with open("a1.txt", mode='w', encoding='utf-8') as f:
        f.write("你好呀")
    

    Run it in the following two ways:

    • Option 1: the file is created in the /Users/wupeiqi/PycharmProjects/CodeRepository/ directory.

      cd /Users/wupeiqi/PycharmProjects/CodeRepository/
      python demo.py
      
    • Option 2: the file is created in the /Users/wupeiqi directory.

      cd /Users/wupeiqi
      python /Users/wupeiqi/PycharmProjects/CodeRepository/demo.py
      
    import os
    
    """
    # 1. Get the path of the currently running script, then its directory
    abs = os.path.abspath(__file__)
    print(abs) # /Users/wupeiqi/PycharmProjects/luffyCourse/day09/20.路径相关.py
    path = os.path.dirname(abs)
    print(path) # /Users/wupeiqi/PycharmProjects/luffyCourse/day09
    """
    base_dir = os.path.dirname(os.path.abspath(__file__))
    file_path = os.path.join(base_dir, 'files', 'info.txt')
    print(file_path)
    if os.path.exists(file_path):
        # The with statement closes the file automatically
        with open(file_path, mode='r', encoding='utf-8') as file_object:
            data = file_object.read()

        print(data)
    else:
        print('File path does not exist')
    

    7.3 File and path operations

    import shutil
    import os
    
    # 1. Get the absolute path of the current script
    """
    abs_path = os.path.abspath(__file__)
    print(abs_path)
    """
    
    # 2. Get the parent of the directory that contains a file
    """
    # path: any file path, for example abs_path from step 1
    base_path = os.path.dirname( os.path.dirname(path) )
    print(base_path)
    """
    
    # 3. Join paths
    """
    p1 = os.path.join(base_path, 'xx')
    print(p1)
    
    p2 = os.path.join(base_path, 'xx', 'oo', 'a1.png')
    print(p2)
    """
    
    # 4. Check whether a path exists
    """
    exists = os.path.exists(p1)
    print(exists)
    """
    
    # 5. Create a directory
    """
    os.makedirs(path)
    """
    """
    path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
    if not os.path.exists(path):
        os.makedirs(path)
    """
    
    # 6. Check whether a path is a directory
    """
    file_path = os.path.join(base_path, 'xx', 'oo', 'uuuu.png')
    is_dir = os.path.isdir(file_path)
    print(is_dir) # False
    
    folder_path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
    is_dir = os.path.isdir(folder_path)
    print(is_dir) # True
    
    """
    
    # 7. Delete a file (os.remove) or a directory tree (shutil.rmtree)
    """
    os.remove("path/to/file")
    """
    """
    path = os.path.join(base_path, 'xx')
    shutil.rmtree(path)
    """
    
    # 8. Copy a directory
    """
    shutil.copytree("/Users/wupeiqi/Desktop/图/csdn/","/Users/wupeiqi/PycharmProjects/CodeRepository/files")
    """
    
    # 9. Copy a file
    """
    shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/")
    shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/x.png")
    """
    
    # 10. Rename (move) a file or directory
    """
    shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/xxxx.png")
    shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/files","/Users/wupeiqi/PycharmProjects/CodeRepository/images")
    """
    
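
    Since the snippets above are commented out and depend on local paths, here is a runnable walk-through of the same operations inside a throwaway temporary directory. It is a sketch only; all folder and file names are placeholders.

```python
import os
import shutil
import tempfile

base_path = tempfile.mkdtemp()  # throwaway stand-in for a project directory

# Join paths and create nested folders
folder_path = os.path.join(base_path, "xx", "oo")
os.makedirs(folder_path, exist_ok=True)

# Create a file, then tell files and folders apart
file_path = os.path.join(folder_path, "a1.txt")
with open(file_path, mode="w", encoding="utf-8") as f:
    f.write("demo")
checks = (os.path.exists(file_path), os.path.isdir(file_path), os.path.isdir(folder_path))

# Copy the file, then rename the copy
copy_path = os.path.join(base_path, "copy.txt")
shutil.copy(file_path, copy_path)
renamed_path = os.path.join(base_path, "renamed.txt")
shutil.move(copy_path, renamed_path)

# Copy the whole folder tree, then delete the original tree
shutil.copytree(os.path.join(base_path, "xx"), os.path.join(base_path, "backup"))
shutil.rmtree(os.path.join(base_path, "xx"))

remaining = sorted(os.listdir(base_path))
print(checks, remaining)

shutil.rmtree(base_path)  # clean up the temporary directory
```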

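    Summary

    Today's material centred on file operations, so that you can process files of different formats with Python. It covers many topics and takes a while to work through, but each topic is simple; understand it and keep good notes to refer to during later development.

    1. Relative paths: the directory a program is run from can vary, which breaks relative paths. If you use a relative path, be sure you know the directory the program is run from.

    2. Absolute paths (recommended): do not hard-code file paths. Build the absolute path with the os module instead, so the project keeps working when it is moved to another directory or machine.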

      import os
      base_dir = os.path.dirname(os.path.abspath(__file__))
      file_path = os.path.join(base_dir, 'files', 'info.txt')
      
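    3. Path escaping

      • Paths written by hand: add the r prefix or escape each backslash with another \.
      • Paths built with os.path.join: separators are handled internally, so no manual escaping is needed.
    4. What is the difference between built-in functions, built-in modules, and third-party modules?

    5. How do you download and install a third-party module?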

      pip install module-name
      
      • requests: sends network requests.
      • openpyxl: processes Excel files.

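    1.1 json

    The json module is built into Python. It converts Python data into JSON-formatted data and converts JSON-formatted data back into Python data.

    JSON is a data format (in essence just a string, commonly used for transferring data over the network).

    # Data in Python's own format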
    data = [
        {"id": 1, "name": "武沛齐", "age": 18},
        {"id": 2, "name": "alex", "age": 18},
        ('wupeiqi',123),
    ]
    
    # JSON format
    value = '[{"id": 1, "name": "武沛齐", "age": 18}, {"id": 2, "name": "alex", "age": 18},["wupeiqi",123]]'
    

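    1.1.1 Core features

    What is JSON for?

    Cross-language data exchange. For example:
    	System A is written in Python and has types such as list and dict.
    	System B is written in Java and has types such as arrays and maps.
    
    	Different languages have different basic data types and formats.
    	
    	To make data exchange easier, everyone agrees on one format, JSON: each language converts its own data types to JSON and converts JSON data back into its own data types.
    

    Converting between Python data types and JSON:

    • Python data -> JSON, usually called serialization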

      import json
      
      data = [
          {"id": 1, "name": "武沛齐", "age": 18},
          {"id": 2, "name": "alex", "age": 18},
      ]
      
      res = json.dumps(data)
      print(res) # [{"id": 1, "name": "\u6b66\u6c9b\u9f50", "age": 18}, {"id": 2, "name": "alex", "age": 18}]
      
      # ensure_ascii=False keeps non-ASCII characters readable instead of escaping them
      res = json.dumps(data, ensure_ascii=False)
      print(res) # [{"id": 1, "name": "武沛齐", "age": 18}, {"id": 2, "name": "alex", "age": 18}]
      
    • JSON -> Python data, usually called deserialization

      import json
      
      data_string = '[{"id": 1, "name": "武沛齐", "age": 18}, {"id": 2, "name": "alex", "age": 18}]'
      
      data_list = json.loads(data_string)
      
      print(data_list)
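
    One consequence of the conversion is worth seeing directly: JSON has no tuple type, so a tuple is written as an array and comes back as a list. A minimal round-trip sketch, reusing a shortened version of the data from the start of this section:

```python
import json

data = [
    {"id": 1, "name": "武沛齐", "age": 18},
    ("wupeiqi", 123),
]

text = json.dumps(data, ensure_ascii=False)
restored = json.loads(text)

print(text)
print(restored)
# The tuple came back as a list, so the round trip is not an exact copy
print(restored == data)
```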
      

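    Exercises

    1. Build a website that returns JSON data to the user

      • Install the flask module, which helps us build a site quickly (already installed earlier)

        pip3 install flask
        
      • Write the site with flask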

        import json
        from flask import Flask
        
        app = Flask(__name__)
        
        
        def index():
            return "首页"
        
        
        def users():
            data = [
                {"id": 1, "name": "武沛齐", "age": 18},
                {"id": 2, "name": "alex", "age": 18},
            ]
            return json.dumps(data)
        
        
        app.add_url_rule('/index/', view_func=index, endpoint='index')
        app.add_url_rule('/users/', view_func=users, endpoint='users')
        
        if __name__ == '__main__':
            app.run()
        
    2. Send a network request, then fetch and process the JSON data.

      import json
      import requests
      
      url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=5&page_start=20"
      
      res = requests.get(
          url=url,
          headers={
              "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
          }
      )
      
      # Raw JSON text
      print(res.text)
      
      # Convert the JSON text to Python data
      data_dict = json.loads(res.text)
      print(data_dict)
      

    1.1.2 Type requirements

    Converting Python data to JSON places requirements on the data types. By default only the following are supported:

        +-------------------+---------------+
        | Python            | JSON          |
        +===================+===============+
        | dict              | object        |
        +-------------------+---------------+
        | list, tuple       | array         |
        +-------------------+---------------+
        | str               | string        |
        +-------------------+---------------+
        | int, float        | number        |
        +-------------------+---------------+
        | True              | true          |
        +-------------------+---------------+
        | False             | false         |
        +-------------------+---------------+
        | None              | null          |
        +-------------------+---------------+
    
    data = [
        {"id": 1, "name": "武沛齐", "age": 18},
        {"id": 2, "name": "alex", "age": 18},
    ]
    

    To support other types, define a custom JSONEncoder (for now a rough understanding is enough; this is covered again when it comes up in project work). For example:

    import json
    from decimal import Decimal
    from datetime import datetime
    
    data = [
        {"id": 1, "name": "武沛齐", "age": 18, 'size': Decimal("18.99"), 'ctime': datetime.now()},
        {"id": 2, "name": "alex", "age": 18, 'size': Decimal("9.99"), 'ctime': datetime.now()},
    ]
    
    
    class MyJSONEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, Decimal):
                return str(o)
            elif isinstance(o, datetime):
                # %m is the month; %M would be the minute
                return o.strftime("%Y-%m-%d")
            return super().default(o)
    
    
    res = json.dumps(data, cls=MyJSONEncoder)
    print(res)
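
    For simple cases, json.dumps also accepts a default function, which avoids writing a subclass. The sketch below is an alternative to the encoder class, not a replacement for it; the function name to_jsonable and the fixed date are chosen for this example so that the output is reproducible.

```python
import json
from datetime import datetime
from decimal import Decimal


def to_jsonable(o):
    # Called by json.dumps only for objects it cannot serialize itself
    if isinstance(o, Decimal):
        return str(o)
    if isinstance(o, datetime):
        return o.strftime("%Y-%m-%d")
    raise TypeError("Object of type {} is not JSON serializable".format(type(o).__name__))


row = {"id": 1, "size": Decimal("18.99"), "ctime": datetime(2021, 11, 11, 8, 30)}
res = json.dumps(row, default=to_jsonable)
print(res)
```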
    

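    1.1.3 Other functions

    The commonly used functions in the json module are:

    • json.dumps: serializes data to a string.

    • json.loads: deserializes a string into Python data.

    • json.dump: serializes data and writes it to a file (less common)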

      import json
      
      data = [
          {"id": 1, "name": "武沛齐", "age": 18},
          {"id": 2, "name": "alex", "age": 18},
      ]
      
      file_object = open('xxx.json', mode='w', encoding='utf-8')
      
      json.dump(data, file_object)
      
      file_object.close()
      
    • json.load: reads data from a file and deserializes it into Python data (less common)

      import json
      
      file_object = open('xxx.json', mode='r', encoding='utf-8')
      
      data = json.load(file_object)
      print(data)
      
      file_object.close()
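
    The two file-based functions can be combined into a round trip. A minimal sketch, assuming a temporary directory so that no file is left behind; ensure_ascii=False and indent=2 are optional arguments added here to keep the file human-readable.

```python
import json
import os
import tempfile

data = [{"id": 1, "name": "武沛齐", "age": 18}]

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xxx.json")

    with open(path, mode="w", encoding="utf-8") as file_object:
        json.dump(data, file_object, ensure_ascii=False, indent=2)

    with open(path, mode="r", encoding="utf-8") as file_object:
        raw_text = file_object.read()
        file_object.seek(0)  # rewind so json.load reads from the start
        loaded = json.load(file_object)

print(loaded == data, "武沛齐" in raw_text)
```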
      

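    1.2 Working with time

    • UTC/GMT: universal time.

    • Local time: the time in the local time zone.

    Python has two modules for working with time: time and datetime.

    1.2.1 time

    import time
    
    # Current timestamp (seconds since 1970-01-01 00:00:00 UTC)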
    v1 = time.time()
    print(v1)
    
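    # Offset of the local (non-DST) time zone, in seconds west of UTC
    v2 = time.timezone
    
    # Pause for n seconds, then continue with the following code.
    time.sleep(5)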
    

    1.2.2 datetime

    In everyday development, time values usually appear in one of the following three forms:

    • datetime

      from datetime import datetime, timezone, timedelta
      
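      v1 = datetime.now()  # current local time
      print(v1)
      
      tz = timezone(timedelta(hours=7))  # current time in UTC+7
      v2 = datetime.now(tz)
      print(v2)
      
      # Current UTC time (utcnow is deprecated since Python 3.12; datetime.now(timezone.utc) is the replacement)
      v3 = datetime.utcnow()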
      print(v3)
      
      from datetime import datetime, timedelta
      
      v1 = datetime.now()
      print(v1)
      
      # Adding to and subtracting from a time
      v2 = v1 + timedelta(days=140, minutes=5)
      print(v2)
      
      # datetime + timedelta
      
      from datetime import datetime, timezone, timedelta
      
      v1 = datetime.now()
      print(v1)
      
      v2 = datetime.utcnow()  # current UTC time
      print(v2)
      
      # Subtract one datetime from another to get the interval (they cannot be added)
      data = v1 - v2
      print(data.days, data.seconds / 60 / 60, data.microseconds)
      
      # datetime - datetime
      # datetime compared with datetime
      
    • String

      from datetime import datetime

      # String ---> datetime
      text = "2021-11-11"
      v1 = datetime.strptime(text,'%Y-%m-%d') # %Y: year, %m: month, %d: day.
      print(v1)
      
      # datetime ----> string
      v1 = datetime.now()
      val = v1.strftime("%Y-%m-%d %H:%M:%S")
      print(val)
      
    • Timestamp

      import time
      from datetime import datetime

      # Timestamp --> datetime
      ctime = time.time() # 11213245345.123
      v1 = datetime.fromtimestamp(ctime)
      print(v1)
      
      # datetime ---> timestamp
      v1 = datetime.now()
      val = v1.timestamp()
      print(val)
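
    The three forms can be chained to confirm that the conversions are inverses of each other. A small sketch with a fixed sample time (chosen for this example); it also shows that a timedelta keeps days and seconds separately, while total_seconds() gives the whole interval.

```python
from datetime import datetime, timedelta

# string -> datetime -> timestamp -> datetime -> string
text = "2021-11-11 08:30:00"
dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
ts = dt.timestamp()                 # naive datetime is interpreted as local time
back = datetime.fromtimestamp(ts)   # converted back to local time
round_trip = back.strftime("%Y-%m-%d %H:%M:%S")

# A timedelta splits its value into days/seconds/microseconds;
# total_seconds() gives the whole interval as one number
delta = timedelta(days=1, hours=2)
print(round_trip == text, delta.days, delta.seconds, delta.total_seconds())
```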
      


    Exercises

    1. Logging: write the user's input to a file whose name has the format year-month-day-hour-minute.txt

      from datetime import datetime
      
      while True:
          text = input("请输入内容:")
          if text.upper() == "Q":
              break
              
          current_datetime = datetime.now().strftime("%Y-%m-%d-%H-%M")
          file_name = "{}.txt".format(current_datetime)
          
          with open(file_name, mode='a', encoding='utf-8') as file_object:
              # End each entry with a newline so entries do not run together
              file_object.write(text + "\n")
              file_object.flush()
      
    2. User registration: write user information to Excel, with three columns: username, password, and registration time.

      import os
      import hashlib
      from datetime import datetime
      
      from openpyxl import load_workbook
      from openpyxl import workbook
      
      
      BASE_DIR = os.path.dirname(os.path.abspath(__file__))
      FILE_NAME = "db.xlsx"
      
      
      def md5(origin):
          hash_object = hashlib.md5("sdfsdfsdfsd23sd".encode('utf-8'))
          hash_object.update(origin.encode('utf-8'))
          return hash_object.hexdigest()
      
      
      def register(username, password):
          db_file_path = os.path.join(BASE_DIR, FILE_NAME)
          if os.path.exists(db_file_path):
              wb = load_workbook(db_file_path)
              sheet = wb.worksheets[0]
              next_row_position = sheet.max_row + 1
          else:
              wb = workbook.Workbook()
              sheet = wb.worksheets[0]
              next_row_position = 1
      
          user = sheet.cell(next_row_position, 1)
          user.value = username
      
          pwd = sheet.cell(next_row_position, 2)
          pwd.value = md5(password)
      
          ctime = sheet.cell(next_row_position, 3)
          ctime.value = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
      
          wb.save(db_file_path)
      
      
      def run():
          while True:
              username = input("请输入用户名:")
              if username.upper() == "Q":
                  break
              password = input("请输入密码:")
              register(username, password)
      
      
      if __name__ == '__main__':
          run()
      
      

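    1.3 Regular expressions

    When you are handed a large block of text and need to pull specific data out of it, regular expressions can do the job, for example extracting the email addresses and phone numbers in a text. The snippet here handles the phone number: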

    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    
    phone_list = re.findall(r"1[3589]\d{9}", text)  # inside [] the | is a literal character, so it is not needed
    print(phone_list)
    

    1.3.1 Regular expression syntax

    1. Characters
    • wupeiqi matches the literal text wupeiqi

      import re
      
      text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
      data_list = re.findall("wupeiqi", text)
      print(data_list) # ['wupeiqi', 'wupeiqi'] useful for counting how often a substring occurs
      
    • [abc] matches a single character that is a, b, or c.

      import re
      
      text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqaceiqiff"
      data_list = re.findall("[abc]", text)
      print(data_list) # ['b', 'a', 'a', 'a', 'b', 'b', 'a', 'c']
      
      import re
      
      text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqcceiqiff"
      data_list = re.findall("q[abc]", text)
      print(data_list) # ['qa', 'qc']
      
    • [^abc] matches any character other than a, b, or c.

      import re
      
      text = "你wffbbupceiqiff"
      data_list = re.findall("[^abc]", text)
      print(data_list)  # ['你', 'w', 'f', 'f', 'u', 'p', 'e', 'i', 'q', 'i', 'f', 'f']
      
    • [a-z] matches any character from a to z ([0-9] works the same way).

      import re
      
      text = "alexrootrootadmin"
      data_list = re.findall("t[a-z]", text)
      print(data_list)  # ['tr', 'ta']
      
    • . stands for any character except a newline.

      import re
      
      text = "alexraotrootadmin"
      data_list = re.findall("r.o", text)
      print(data_list) # ['rao', 'roo']
      
      import re
      
      text = "alexraotrootadmin"
      data_list = re.findall("r.+o", text) # greedy match
      print(data_list) # ['raotroo']
      
      import re
      
      text = "alexraotrootadmin"
      data_list = re.findall("r.+?o", text) # non-greedy match
      print(data_list) # ['rao', 'roo']
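The difference between greedy and non-greedy matching is easiest to see when a delimiter repeats. A small sketch on a made-up sample string (the text is an assumption, not taken from the examples above):

```python
import re

text = "<b>one</b><b>two</b>"   # made-up sample text

# .+ grabs as much as possible, so the match runs to the last </b>
greedy = re.findall(r"<b>.+</b>", text)
# .+? grabs as little as possible, so each tag pair is matched separately
lazy = re.findall(r"<b>.+?</b>", text)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```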
      
    • \w stands for a letter, digit, or underscore (by default it also matches Chinese characters).

      import re
      
      text = "北京武沛alex齐北  京武沛alex齐"
      data_list = re.findall(r"武\w+x", text)
      print(data_list) # ['武沛alex', '武沛alex']
      
    • \d stands for a digit

      import re
      
      text = "root-ad32min-add3-admd1in"
      data_list = re.findall(r"d\d", text)
      print(data_list) # ['d3', 'd3', 'd1']
      
      import re
      
      text = "root-ad32min-add3-admd1in"
      data_list = re.findall(r"d\d+", text)
      print(data_list) # ['d32', 'd3', 'd1']
      
    • \s stands for any whitespace character, including spaces and tabs.

      import re
      
      text = "root admin add admin"
      data_list = re.findall(r"a\w+\s\w+", text)
      print(data_list) # ['admin add']
      
    2. Quantifiers
    • * repeats zero or more times

      import re
      
      text = "他是大B个,确实是个大2B。"
      data_list = re.findall("大2*B", text)
      print(data_list) # ['大B', '大2B']
      
    • + repeats one or more times

      import re
      
      text = "他是大B个,确实是个大2B,大3B,大66666B。"
      data_list = re.findall(r"大\d+B", text)
      print(data_list) # ['大2B', '大3B', '大66666B']
      
    • ? repeats zero times or once

      import re
      
      text = "他是大B个,确实是个大2B,大3B,大66666B。"
      data_list = re.findall(r"大\d?B", text)
      print(data_list) # ['大B', '大2B', '大3B']
      
    • {n} repeats exactly n times

      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"151312\d{5}", text)
      print(data_list) # ['15131255789']
      
    • {n,} repeats n or more times

      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"\d{9,}", text)
      print(data_list) # ['442662578', '15131255789']
      
      
    • {n,m} repeats between n and m times

      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"\d{10,15}", text)
      print(data_list) # ['15131255789']
      
    3. Parentheses (groups)
    • Extracting part of a match

      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"15131(2\d{5})", text)
      print(data_list)  # ['255789']
      
      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来15131266666呀"
      data_list = re.findall(r"15(13)1(2\d{5})", text)
      print(data_list)  # [('13', '255789'), ('13', '266666')]
      
      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"(15131(2\d{5}))", text)
      print(data_list)  # [('15131255789', '255789')]
      
    • Extracting a region combined with an "or" condition

      import re
      
      text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"15131(2\d{5}|r\w+太)", text)
      print(data_list)  # ['root太', '255789']
      
      import re
      
      text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      data_list = re.findall(r"(15131(2\d{5}|r\w+太))", text)
      print(data_list)  # [('15131root太', 'root太'), ('15131255789', '255789')]
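When parentheses are only needed to scope the `|` alternatives and the whole match is wanted, a non-capturing group `(?:...)` avoids the nested tuples. A minimal sketch on a shortened version of the sample text:

```python
import re

text = "楼主15131root太牛15131alex逼了,手机号也可15131255789,搞起来呀"

# (?:...) groups the alternatives without capturing them,
# so findall returns the whole match rather than the group content
data_list = re.findall(r"15131(?:2\d{5}|r\w+太)", text)
print(data_list)  # ['15131root太', '15131255789']
```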
      
      Exercises
    1. Match QQ numbers with a regular expression

      [1-9]\d{4,}
      
    2. ID card numbers

      import re
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.findall(r"\d{17}[\dX]", text) # [\dX] is a character class, like [abc]
      print(data_list) # ['130429191912015219', '13042919591219521X']
      
      import re
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.findall(r"\d{17}(\d|X)", text)
      print(data_list) # ['9', 'X']
      
      import re
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.findall(r"(\d{17}(\d|X))", text)
      print(data_list) # [('130429191912015219', '9'), ('13042919591219521X', 'X')]
      
      import re
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.findall(r"(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]|X)", text)
      print(data_list) # [('130429', '1919', '12', '01', '521', '9'), ('130429', '1959', '12', '19', '521', 'X')]
      
    3. Mobile phone numbers

      import re
      
      text = "我的手机哈是15133377892,你的手机号是1171123啊?"
      data_list = re.findall(r"1[3-9]\d{9}", text)
      print(data_list)  # ['15133377892']
      
    4. Email addresses

      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      email_list = re.findall(r"\w+@\w+\.\w+", text)
      print(email_list) # ['442662578@qq.com和xxxxx']
      
      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      email_list = re.findall(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+", text, re.ASCII)
      print(email_list) # ['442662578@qq.com', 'xxxxx@live.com']
      
      
      import re
      
      text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      email_list = re.findall(r"\w+@\w+\.\w+", text, re.ASCII)
      print(email_list) # ['442662578@qq.com', 'xxxxx@live.com']
      
      import re
      
      text = "楼主太牛44266-2578@qq.com逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
      email_list = re.findall(r"(\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)", text, re.ASCII)
      print(email_list) # [('44266-2578@qq.com', '-2578', '', ''), ('442662578@qq.com', '', '', ''), ('xxxxx@live.com', '', '', '')]
      
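    5. Complete the code below. Fetching all comments on the page is already implemented; extract the email addresses from them.

      # Install two packages first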
      pip3 install requests
      pip3 install beautifulsoup4
      
      import re
      import requests
      from bs4 import BeautifulSoup
      
      res = requests.get(
          url="https://www.douban.com/group/topic/79870081/",
          headers={
              'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
          }
      )
      bs_object = BeautifulSoup(res.text, "html.parser")
      comment_object_list = bs_object.find_all("p", attrs={"class": "reply-content"})
      for comment_object in comment_object_list:
          text = comment_object.text
          print(text)
          # Continue from here: extract the email addresses from text
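One possible way to finish this exercise is sketched here. It reuses the ASCII-only pattern from the email exercise; the helper name `extract_emails` and the sample string are assumptions for illustration, and the sample stands in for `comment_object.text` so the sketch runs without network access.

```python
import re

# Same idea as the ASCII-only pattern from the email exercise, compiled once for reuse
EMAIL_PATTERN = re.compile(r"\w+@\w+\.\w+", re.ASCII)


def extract_emails(text):
    """Return every email-like substring found in text (hypothetical helper)."""
    return EMAIL_PATTERN.findall(text)


# Stand-in for comment_object.text (made-up comment)
sample = "楼主好人 442662578@qq.com 还有xxxxx@live.com谢谢"
print(extract_emails(sample))  # ['442662578@qq.com', 'xxxxx@live.com']
```

Inside the loop over `comment_object_list`, the same helper could be called as `extract_emails(text)` for each comment.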
      
      
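    4. Start and end anchors

    The examples above all extract data from somewhere inside a piece of text; a match anywhere in the text is enough.

    However, when the whole input must begin and end with the specified pattern, the following two characters are needed.

    • ^ start
    • $ end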
    import re
    
    text = "啊442662578@qq.com我靠"
    email_list = re.findall(r"^\w+@\w+\.\w+$", text, re.ASCII)  # \. matches a literal dot
    print(email_list) # []
    
    import re
    
    text = "442662578@qq.com"
    email_list = re.findall(r"^\w+@\w+\.\w+$", text, re.ASCII)  # \. matches a literal dot
    print(email_list) # ['442662578@qq.com']
    

    This is mostly used to validate the format of user input, for example:

    import re
    
    text = input("请输入邮箱:")
    email = re.findall(r"^\w+@\w+\.\w+$", text, re.ASCII)  # \. matches a literal dot
    if not email:
        print("邮箱格式错误")
    else:
        print(email)
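For validation, `re.fullmatch` is an alternative that requires the entire string to match, so the `^` and `$` anchors are not needed. A rough sketch, where `is_valid_email` is a hypothetical helper and the pattern is only a loose format check:

```python
import re


def is_valid_email(text):
    """Rough format check only (hypothetical helper); real addresses can be more varied."""
    return re.fullmatch(r"\w+@\w+\.\w+", text, re.ASCII) is not None


print(is_valid_email("442662578@qq.com"))        # True
print(is_valid_email("啊442662578@qq.com我靠"))   # False
print(is_valid_email("442662578@qqXcom"))        # False, the literal dot is required
```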
    
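    5. Special characters

    Because characters such as * . \ { } ( ) have special meanings in regular expressions, they must be escaped with a backslash when you want to match them literally, for example: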

    import re
    
    text = "我是你{5}爸爸"
    data = re.findall("你{5}爸", text)
    print(data) # []
    
    import re
    
    text = "我是你{5}爸爸"
    data = re.findall(r"你\{5\}爸", text)
    print(data)
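When the literal text comes from a variable or contains many special characters, `re.escape` can add the backslashes instead of writing them by hand. A minimal sketch with the same sample text:

```python
import re

text = "我是你{5}爸爸"

# re.escape returns the string with regex metacharacters backslash-escaped
pattern = re.escape("你{5}") + "爸"
data = re.findall(pattern, text)
print(data)  # ['你{5}爸']
```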
    

    1.3.2 The re module

    Python provides the re module for working with regular expressions and processing text.

    • findall: returns all of the matched data

      import re
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.findall(r"(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]|X)", text)
      print(data_list) # [('130429', '1919', '12', '01', '521', '9'), ('130429', '1959', '12', '19', '521', 'X')]
      
    • match: matches from the start of the string; returns a match object on success and None otherwise

      import re
      
      text = "大小逗2B最逗3B欢乐"
      data = re.match(r"逗\dB", text)
      print(data) # None
      
      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.match(r"逗\dB", text)
      if data:
          content = data.group() # "逗2B"
          print(content)
      
    • search: scans the whole string and returns the first match; returns None if nothing matches

      import re
      
      text = "大小逗2B最逗3B欢乐"
      data = re.search(r"逗\dB", text)
      if data:
          print(data.group())  # "逗2B"
      
    • sub: replaces the matched positions

      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.sub(r"\dB", "沙雕", text)
      print(data) # 逗沙雕最逗沙雕欢乐
      
      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.sub(r"\dB", "沙雕", text, count=1)  # replace only the first match
      print(data) # 逗沙雕最逗3B欢乐
      
    • split: splits the string at the matched positions

      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.split(r"\dB", text)
      print(data) # ['逗', '最逗', '欢乐']
      
      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.split(r"\dB", text, maxsplit=1)  # split at the first match only
      print(data) # ['逗', '最逗3B欢乐']
      
    • finditer: returns an iterator of match objects

      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.finditer(r"\dB", text)
      for item in data:
          print(item.group())
      
      import re
      
      text = "逗2B最逗3B欢乐"
      data = re.finditer(r"(?P<xx>\dB)", text)  # named group
      for item in data:
          print(item.groupdict())
      
      text = "dsf130429191912015219k13042919591219521Xkk"
      data_list = re.finditer(r"\d{6}(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\d{3}[\dX]", text)
      for item in data_list:
          info_dict = item.groupdict()
          print(info_dict)
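Named groups combine naturally with `datetime.strptime` from the earlier section. The following sketch turns the captured year, month, and day into formatted birthdays; the name `ID_PATTERN` and the `birthdays` list are assumptions introduced for illustration.

```python
import re
from datetime import datetime

# Compiled once; same pattern as the finditer example
ID_PATTERN = re.compile(r"\d{6}(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\d{3}[\dX]")

text = "dsf130429191912015219k13042919591219521Xkk"

birthdays = []
for item in ID_PATTERN.finditer(text):
    info = item.groupdict()
    # Join the three named groups and parse them as a date
    birthday = datetime.strptime(info["year"] + info["month"] + info["day"], "%Y%m%d")
    birthdays.append(birthday.strftime("%Y-%m-%d"))

print(birthdays)  # ['1919-12-01', '1959-12-19']
```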
      

    3.1 os

    import os
    
    # 1. Get the absolute path of the current script
    """
    abs_path = os.path.abspath(__file__)
    print(abs_path)
    """
    
    # 2. Get the upper-level directory of the current file
    """
    base_path = os.path.dirname( os.path.dirname(path) )  # path: a file path, e.g. abs_path from step 1
    print(base_path)
    """
    
    # 3. Join paths
    """
    p1 = os.path.join(base_path, 'xx')
    print(p1)
    
    p2 = os.path.join(base_path, 'xx', 'oo', 'a1.png')
    print(p2)
    """
    
    # 4. Check whether a path exists
    """
    exists = os.path.exists(p1)
    print(exists)
    """
    
    # 5. Create directories
    """
    os.makedirs(path)
    """
    """
    path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
    if not os.path.exists(path):
        os.makedirs(path)
    """
    
    # 6. Check whether a path is a directory
    """
    file_path = os.path.join(base_path, 'xx', 'oo', 'uuuu.png')
    is_dir = os.path.isdir(file_path)
    print(is_dir) # False
    
    folder_path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
    is_dir = os.path.isdir(folder_path)
    print(is_dir) # True
    
    """
    
    # 7. Delete a file or a directory
    """
    os.remove("path/to/file")
    """
    """
    import shutil

    path = os.path.join(base_path, 'xx')
    shutil.rmtree(path)
    """
    
    
    • listdir: lists the entries directly under a directory
    • walk: traverses a directory, including the files in all nested subdirectories
    import os
    
    """
    data = os.listdir("/Users/wupeiqi/PycharmProjects/luffyCourse/day14/commons")
    print(data)
    # ['convert.py', '__init__.py', 'page.py', '__pycache__', 'utils.py', 'tencent']
    """
    
    """
    To traverse every file under a folder, for example all mp4 files under a folder
    """
    
    data = os.walk("/Users/wupeiqi/Documents/视频教程/路飞Python/mp4")
    for path, folder_list, file_list in data:
        for file_name in file_list:
            file_abs_path = os.path.join(path, file_name)
            ext = file_abs_path.rsplit(".",1)[-1]
            if ext == "mp4":
                print(file_abs_path)
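The same traversal can be wrapped in a reusable function. This sketch uses `os.path.splitext` instead of `rsplit` and builds a throwaway directory tree so it runs anywhere; `find_files` and the file names are hypothetical.

```python
import os
import tempfile


def find_files(root, ext):
    """Collect paths under root whose extension equals ext (hypothetical helper)."""
    matches = []
    for path, folder_list, file_list in os.walk(root):
        for file_name in file_list:
            # splitext returns '' for names without a dot; rsplit would return the whole name
            if os.path.splitext(file_name)[1].lower() == ext:
                matches.append(os.path.join(path, file_name))
    return sorted(matches)


# Build a small temporary tree so the sketch is self-contained
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("a.mp4", "b.txt", os.path.join("sub", "c.MP4")):
        open(os.path.join(root, name), "w").close()
    found = [os.path.relpath(p, root) for p in find_files(root, ".mp4")]
    print(found)
```

Lower-casing the extension is a design choice here so that `c.MP4` is also picked up.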
    

    3.2 shutil

    import shutil
    
    # 1. Delete a directory
    """
    path = os.path.join(base_path, 'xx')
    shutil.rmtree(path)
    """
    
    # 2. Copy a directory
    """
    shutil.copytree("/Users/wupeiqi/Desktop/图/csdn/","/Users/wupeiqi/PycharmProjects/CodeRepository/files")
    """
    
    # 3. Copy a file
    """
    shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/")
    shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/x.png")
    """
    
    # 4. Rename (move) a file or directory
    """
    shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/xxxx.png")
    shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/files","/Users/wupeiqi/PycharmProjects/CodeRepository/images")
    """
    
    # 5. Create an archive
    """
    # base_name: path of the archive to create, without the extension
    # format: archive format, e.g. "zip", "tar", "gztar", "bztar", or "xztar".
    # root_dir: path of the directory to archive
    """
    # shutil.make_archive(base_name=r'datafile',format='zip',root_dir=r'files')
    
    
    # 6. Extract an archive
    """
    # filename: the archive file to extract
    # extract_dir: directory to extract into
    # format: archive format
    """
    # shutil.unpack_archive(filename=r'datafile.zip', extract_dir=r'xxxxxx/xo', format='zip')
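The two commented-out calls can be tried together as a round trip. This sketch works inside a temporary directory, so the folder and file names (`files`, `a.txt`, `out`) are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as work_dir:
    # Prepare a folder with one file to archive
    src = os.path.join(work_dir, "files")
    os.makedirs(src)
    with open(os.path.join(src, "a.txt"), "w", encoding="utf-8") as f:
        f.write("hello")

    # make_archive appends the extension itself and returns the archive path
    archive_path = shutil.make_archive(
        base_name=os.path.join(work_dir, "datafile"), format="zip", root_dir=src
    )

    # Extract into a separate folder and list what came out
    out_dir = os.path.join(work_dir, "out")
    shutil.unpack_archive(filename=archive_path, extract_dir=out_dir, format="zip")
    extracted = os.listdir(out_dir)
    print(os.path.basename(archive_path), extracted)
```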
    

    3.3 sys

    import sys
    
    # 1. Get the interpreter version
    """
    print(sys.version)
    print(sys.version_info)
    print(sys.version_info.major, sys.version_info.minor, sys.version_info.micro)
    """
    
    # 2. Module search path used for imports
    """
    print(sys.path)
    """
    
    
    • argv: the arguments passed after the python interpreter when a script is run
    import sys
    
    print(sys.argv)
    
    
    # [
    #       '/Users/wupeiqi/PycharmProjects/luffyCourse/day14/2.接受执行脚本的参数.py'
    # ]
    
    # [
    #     "2.接受执行脚本的参数.py"
    # ]
    
    # ['2.接受执行脚本的参数.py', '127', '999', '666', 'wupeiqi']
    
    # For example, implement a tool that downloads images.
    
    def download_image(url):
        print("下载图片", url)
    
    
    def run():
        # Read the arguments passed in by the user
        url_list = sys.argv[1:]
        for url in url_list:
            download_image(url)
    
    
    if __name__ == '__main__':
        run()
    

    3.4 random

    import random
    
    # 1. Random integer within a range (both ends included)
    v = random.randint(10, 20)
    print(v)
    
    # 2. Random float within a range
    v = random.uniform(1, 10)
    print(v)
    
    # 3. Pick one element at random
    v = random.choice([11, 22, 33, 44, 55])
    print(v)
    
    # 4. Pick several distinct elements at random
    v = random.sample([11, 22, 33, 44, 55], 3)
    print(v)
    
    # 5. Shuffle the list in place
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    random.shuffle(data)
    print(data)
    

    3.5 hashlib

    import hashlib
    
    hash_object = hashlib.md5()
    hash_object.update("武沛齐".encode('utf-8'))
    result = hash_object.hexdigest()
    print(result)
    
    import hashlib
    
    hash_object = hashlib.md5("iajfsdunjaksdjfasdfasdf".encode('utf-8'))
    hash_object.update("武沛齐".encode('utf-8'))
    result = hash_object.hexdigest()
    print(result)
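Passing the salt to `hashlib.md5(...)` and then calling `update` is equivalent to hashing the salt and the data concatenated. The sketch below checks that and shows how a stored digest might be verified at login; `check_password` and the sample passwords are hypothetical, and the salt string is the one used above.

```python
import hashlib
import hmac

SALT = "iajfsdunjaksdjfasdfasdf"   # same salt string as in the example above


def md5(origin):
    """Salted MD5 hex digest, following the same scheme as the example above."""
    hash_object = hashlib.md5(SALT.encode("utf-8"))
    hash_object.update(origin.encode("utf-8"))
    return hash_object.hexdigest()


def check_password(raw, stored_digest):
    # compare_digest compares in a way that does not depend on where the strings differ
    return hmac.compare_digest(md5(raw), stored_digest)


stored = md5("123456")                      # made-up password
print(check_password("123456", stored))     # True
print(check_password("654321", stored))     # False

# Constructor argument followed by update() hashes the concatenated bytes
print(md5("武沛齐") == hashlib.md5((SALT + "武沛齐").encode("utf-8")).hexdigest())  # True
```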
    

    3.6 configparser

  • Original article: https://www.cnblogs.com/bubu99/p/16324295.html