题记:
最近做Python导入接口,用到xlrd包读取excel文件信息入库,获取合并单元格信息时遇到时而成功时而失败的情况,一开始用xls文件读取不了合并单元格信息,后来换用xlsx格式可以读取。但将导出的文件再重新导入时获取合并单元格信息又失败了。没办法去看看源码看能不能突破,说实话看源码真的头疼,去源码里面搜merge发现里面的 sheet.merged_cells 属性并没有什么特别的地方。后来读merged_cells 头部的注释是发现一个叫formatting_info的参数,然后问题迎刃而解。遂记之。
源码注释:
##
# List of address ranges of cells which have been merged.
# These are set up in Excel by Format > Cells > Alignment, then ticking
# the "Merge cells" box.
# <br> Note that the upper limits are exclusive: i.e. <tt>[2, 3, 7, 9]</tt> only
# spans two cells.
# <br> -- New in version 0.6.1. Extracted only if open_workbook(formatting_info=True).
# <br>How to deconstruct the list:
# <pre>
# for crange in thesheet.merged_cells:
# rlo, rhi, clo, chi = crange
# for rowx in xrange(rlo, rhi):
# for colx in xrange(clo, chi):
# # cell (rlo, clo) (the top left one) will carry the data
# # and formatting info; the remainder will be recorded as
# # blank cells, but a renderer will apply the formatting info
# # for the top left cell (e.g. border, pattern) to all cells in
# # the range.
# </pre>
merged_cells = []
读取合并单元格代码:
def read_excel_as_merge(self,file_name):
# 读取合并单元格信息,按照平铺方式展开,合并单元格按照合并单元格的最左上单元格填充 data = [] wb = xlrd.open_workbook(file_name,'r',formatting_info=True) sheet = wb.sheet_by_name(wb.sheet_names()[0]) merge = sheet.merged_cells for ri in range(sheet.nrows): # print(sheet.row_values(r)) row = [] for ci in range(sheet.ncols): v = sheet.cell(ri, ci).value for rg in merge: a, b, c, d = rg if ri >= a and ri < b and ci >= c and ci < d: v = sheet.cell(a, c).value row.append(v) data.append(row) return data,merge
能通过百度解决的问题,就不打扰别人;
能通过问人解决的问题,就不读源码;
能通过源码解决的问题,就不自己Coding;