• Python Module_openpyxl_处理Excel表格


    目录

    前言

    Python处理Excel表格有多种方法,其中对于.xlsx后缀的Excel版本而言openpyxl绝对是一个非常棒的选择。在openpyxl中,一个Excel文件就是一个Workbook,一张Excel文件中的表就是一个Worksheet。当我们需要对一个Excel文件进行处理的时候,需要先获取到一个Workbook对象,再获取到一个Worksheet对象,对Worksheet对象中rows、columns进行数据处理,最后通过Workbook.save()方法将Workbook对象的内容写入到磁盘中。或者可以使用Openpyxl内置的ExcelWriter()方法来关联Workbook对象,最终实现写入。

    软件系统

    • 系统
      • Windows 8.1
    • 软件
      • Python 3.4.3

    Install openpyxl module

    使用Python3.4.3自带的软件包管理工具easy_install.exe来安装openpyxl模块
    Run(Ctrl+r) cmd

    cd %pythonRoot%Scripts
    easy_install.exe openpyxl

    Check:安装后导入openpyxl模块不会触发ImportError

    import openpyxl

    Sample code

    from openpyxl import Workbook
    wb = Workbook()
    
    # grab the active worksheet
    ws = wb.active
    
    # Data can be assigned directly to cells
    ws['A1'] = 42
    
    # Rows can also be appended
    ws.append([1, 2, 3])
    
    # Python types will automatically be converted
    import datetime
    ws['A2'] = datetime.datetime.now()
    
    # Save the file
    wb.save("sample.xlsx")

    Documentationhttp://openpyxl.readthedocs.org

    load_workbook()加载Excel文件

    我们将一个Excel文件称之为一个workbook,workbook中又包含了许多的worksheet(工作表)。我们可以通过workbook[‘sheetName’]来定位一个worksheet。
    将文件导入到内存

    load_workbook(filename, read_only=False, use_iterators=False, keep_vba=False, guess_types=False, data_only=False)
    In [7]: help(load_workbook)
    Help on function load_workbook in module openpyxl.reader.excel:
    
    load_workbook(filename, read_only=False, use_iterators=False, keep_vba=False, guess_types=False, data_only=False)
        Open the given filename and return the workbook
    
        :param filename: the path to open or a file-like object
        :type filename: string or a file-like object open in binary mode c.f., :class:`zipfile.ZipFile`
    
        :param read_only: optimised for reading, content cannot be edited
        :type read_only: bool 
    
        :param use_iterators: use lazy load for cells
        :type use_iterators: bool
    
        :param keep_vba: preseve vba content (this does NOT mean you can use it)
        :type keep_vba: bool
    
        :param guess_types: guess cell content type and do not read it from the file
        :type guess_types: bool
    
        :param data_only: controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet
        :type data_only: bool
    
        :rtype: :class:`openpyxl.workbook.Workbook`

    filename(str or file-like object):是一个Excel文件的路径或者是一个类文件对象。
    read_only(bool):只读模式,不可编辑文件。缺省为False
    use_iterators(bool):是否调用延迟加载。缺省为False
    keep_vba(bool):是否保持VBA的内容。缺省为False
    guess_type(bool):获取单元格内容的类型而且不能从文件中读取他。缺省为False
    date_only(bool):控制包含有公式的单元格是否有任何公式,或者存储着最后读取Excel表的读取时间
    Note
    When using lazy load, all worksheets will be class: {openpyxl.worksheet.iter_worksheet.IterableWorksheet} and the returned workbook will be read-only.

    In [29]: from openpyxl import load_workbook
    
    In [5]: getwb = load_workbook(filename=r"Handoff.xlsx")   #返回一个Workbook对象
    
    In [6]: getwb
    Out[6]: <openpyxl.workbook.workbook.Workbook at 0x4b7c030>

    getwb是一个Workbook对象,Workbook()是最基本的一个类,能够在内存中创建文件最后将文件内容写进磁盘。

    wbObject.get_sheet_names() 获取Excel表格名

    In [70]: getwb.get_sheet_names()    #返回一个Excel表名组成的列表
    Out[70]: ['NodeCount']
    
    In [75]: getwb.get_sheet_names()[0]
    Out[75]: 'NodeCount'

    Workbook[tableName] 定位Excel表

    Openpyxl模块支持类似字典键值对映射的方式,来获取表格的内容

    In [80]: sheetContent = getwb[getwb.get_sheet_names()[0]]
    
    In [84]: type(sheetContent)
    Out[84]: openpyxl.worksheet.worksheet.Worksheet      #返回一个Worksheet对象,用于存储表格内容

    WbObject.get_sheet_by_name(sheetName) 定位Excel表

    In [57]: sheet1 = getwb.get_sheet_by_name('NodeCount')

    Worksheet.rows和Worksheet.columns获取表格的行列值

    In [89]: sheetContent.rows
    Out[89]:
    ((<Cell NodeCount.A1>,
      <Cell NodeCount.B1>,
      <Cell NodeCount.C1>,
      <Cell NodeCount.D1>),
     (<Cell NodeCount.A2>,
      <Cell NodeCount.B2>,
      .
      .
      .
    In [90]: len(sheetContent.rows)
    Out[90]: 25
    
    In [93]: len(sheetContent.columns)
    Out[93]: 4

    获取Worksheet的行列数目

    In [115]: sheetContent.get_highest_row()
    
    In [117]: sheetContent.get_highest_column()

    通过切片运算符划分表格区域

    因为Worksheet.rows和Worksheet.columns都是Tuple数据类型,支持使用切片运算符。

    In [100]: type(sheetContent.rows),type(sheetContent.columns)
    Out[100]: (tuple, tuple)

    1.获取sheetContent的前两列内容

    In [103]: colA,colB = sheetContent.columns[:2]

    2.划分出一个二维区域

    In [112]: cells = sheetContent['A1':'C3']    #返回一个生成器对象
    
    In [113]: type(cells)
    Out[113]: generator

    get_column_letter将一个列的索引转化为列的字母

    get_column_letter(idx)
    Convert a column index into a column letter
    (3 -> ‘C’)

    In [122]: from openpyxl.cell import get_column_letter
    
    In [124]: for x in list(range(1,11)):
       .....:     ch = get_column_letter(x)
       .....:     print(ch)
       .....:
    A
    B
    C
    D
    E
    F
    G
    H
    I
    J

    Worksheet.cell().value 定位单元格的数据值

    基于给定的坐标(A1)返回一个单元格对象。
    cell(coordinate=None, row=None, column=None, value=None) method of openpyxl.worksheet.worksheet.Worksh
    Returns a cell object based on the given coordinates.

    Usage: cell(coodinate='A15') **or** cell(row=15, column=1)
    
    If `coordinates` are not given, then row *and* column must be given.
    
    Cells are kept in a dictionary which is empty at the worksheet
    creation.  Calling `cell` creates the cell in memory when they
    are first accessed, to reduce memory usage.
    
    :param coordinate: coordinates of the cell (e.g. 'B12')
    :type coordinate: string
    
    :param row: row index of the cell (e.g. 4)
    :type row: int
    
    :param column: column index of the cell (e.g. 3)
    :type column: int
    
    :raise: InsufficientCoordinatesException when coordinate or (row and column) are not given
    
    :rtype: :class:openpyxl.cell.Cell
    
    In [117]: sheetContent.cell("A1")
    Out[117]: <Cell NodeCount.A1>
    
    In [118]: sheetContent.cell("A1").value
    Out[118]: 'Cluster'
    
    In [120]: sheetContent.cell(row=1,column=2).value
    Out[120]: 'HI'

    注意:Excel表格的数据常常在其两边都伴有空格符,需要使用Str.strip()来去除多余的空格符。

    直接给单元格赋值

    将A列全部置为None

    In [127]: colALen = len(sheetContent.columns[0])
    
    In [128]: for i in list(range(1,colALen+1)):
       .....:     sheetContent.cell('A%s' % i).value = None
       .....:

    注意:当你为单元格赋值时,Excel的数据类型由赋值的数据类型决定

    Woeksheet.get_cell_collection() 获取全部的单元格数据值

    获取全部的cell的数值,但是没有顺序。
    get_cell_collection() method of openpyxl.worksheet.worksheet.Worksheet instance
    Return an unordered list of the cells in this worksheet.
    返回一个无序的包含了所有单元格的列表

    In [59]: sheetContent.get_cell_collection()

    enumerate(iterators)获取迭代器的索引和元素

    enumerate(iterable[, start]) -> iterator for index, value of iterable
    Return an enumerate object. iterable must be another object that supports
    iteration. The enumerate object yields pairs containing a count (from
    start, which defaults to zero) and a value yielded by the iterable argument.
    enumerate is useful for obtaining an indexed list:
    (0, seq[0]), (1, seq[1]), (2, seq[2]), …
    接收迭代器类型的实参,返回一个可以遍历的迭代器,包含了(索引,元素)的元组。

    In [46]: row1,row2 = sheetContent.rows[:2]
    
    In [49]: for index,cell in enumerate(row1):
       ....:     print(index,cell)
       ....:
    0 <Cell NodeCount.A1>
    1 <Cell NodeCount.B1>
    2 <Cell NodeCount.C1>
    3 <Cell NodeCount.D1>

    Workbook.save()保存所有操作,并生成新的Excel文件

    已指定的文件名保存这个Excel文件。
    save(filename) method of openpyxl.workbook.workbook.Workbook instance
    Save the current workbook under the given filename.
    Use this function instead of using an ExcelWriter.

    .. warning::
        When creating your workbook using `write_only` set to True,
        you will only be able to call this function once. Subsequents attempts to
        modify or save the file will raise an :class:`openpyxl.shared.exc.WorkbookAlreadySaved` exception.
    
    In [134]: getwb.save('test.xlsx')

    Workbook()创建一个新的Excel文件

    创建一个新的Workbook对象
    class Workbook(builtins.object)
    Workbook is the container for all other parts of the document.

    In [40]: from openpyxl import Workbook
    
    In [48]: outwb = Workbook()     #返回一个openpyxl.workbook.workbook.Workbook的对象
    
    In [49]: outwb
    Out[49]: <openpyxl.workbook.workbook.Workbook at 0x13665d0>

    wbObject.create_sheet() 创建一个Excel表格

    create_sheet(title=None, index=None) method of openpyxl.workbook.workbook.Workbook instance
    Create a worksheet (at an optional index).
    :param title: optional title of the sheet
    :type tile: unicode
    :param index: optional position at which the sheet will be inserted
    :type index: int
    title(unicode):创建新Excel表的标题
    index(int):新Excel表在Excel文件中插入的位置

    In [62]: newSheet = outwb.create_sheet('NewSheet',0)   #返回一个openpyxl.worksheet.worksheet.Worksheet对象
    
    In [63]: type(newSheet)
    Out[63]: openpyxl.worksheet.worksheet.Worksheet

    Worksheet.append() 逐行追加数值到单元格

    当前表格的最后一行追加一行数据。必须传递迭代器实参。
    append(iterable) method of openpyxl.worksheet.worksheet.Worksheet instance
    Appends a group of values at the bottom of the current sheet.

    * If it's a list: all values are added in order, starting from the first column
    * If it's a dict: values are assigned to the columns indicated by the keys (numbers or letters)
    
    :param iterable: list, range or generator, or dict containing values to append
    :type iterable: list/tuple/range/generator or dict
    
    Usage:
    
    * append(['This is A1', 'This is B1', 'This is C1'])   #添加一行三列
    *#or append({'A' : 'This is A1', 'C' : 'This is C1'})  #在指定的'A'和'C'列中添加一行
    *#or append({1 : 'This is A1', 3 : 'This is C1'})      #在指定的1、3行中添加一列
    
    :raise: TypeError when iterable is neither a list/tuple nor a dict
    
    In [131]: newSheet.append(['Test',1,2,3])

    在指定的列内添加一行添加行

    In [80]: newSheet.append({'A':'Add one row'})

    在指定的行中添加一列

    In [84]: newSheet.append({1:'Is A1',3:'Is C1'})

    ExcelWriter() 将Workbook对象写入Excel文件

    一般而言,通过Workbok.save()方法就可以将Workbook对象的内容写入到Excel中,openpyxl提供了ExcelWriter这一个更加强大的Excel写实现。

    In [88]: from openpyxl.writer.excel import ExcelWriter

    class ExcelWriter(builtins.object)
    Write a workbook object to an Excel file.返回一个ExcelWriter对象。

    In [92]: ewb = ExcelWriter(workbook=outwb)     #将Workbook关联到一个ExcelWriter,最后将Workbook的内容写入到磁盘中
    
    In [95]: newSheet.title='testSheet'
    
    In [96]: outwb.get_sheet_names()
    Out[96]: ['testSheet', 'Sheet']
    
    In [97]: for i in list(range(1,11)):
       ....:     newSheet.cell('A%s' % (i)).value = i
       ....:     newSheet.append({'B':i})
       ....:
    
    In [98]: ewb.save(filename='test.xlsx')     #一定要Call ExcelWriterObject.save()方法将Workbook写入到磁盘中。

    最后

    除了使用上述的方法来处理Excel文件的数据之外,openpyxl还提供了能修改Excel表格的样式的实现openpyxl.styles,这个我们下一篇再继续搞起。 :-)

  • 相关阅读:
    2021软件工程-第一周作业01准备工作
    String.matches()的用法
    Idea使用JSP出现404问题---已解决
    如何解决Tomcat启动闪退现象(环境配置没问题)
    解决启动java的web项目时端口占用问题
    IntelliJ IDEA 连接数据库 详细过程-包含使用jdbc连接数据库
    ApplicationContextAware接口认识
    RestTemplate 服务间接口调用
    maven相关
    FastJson:json字符串与Java对象转换
  • 原文地址:https://www.cnblogs.com/jmilkfan-fanguiju/p/11825196.html
Copyright © 2020-2023  润新知