• PyPDF2 encoding issue: PyPDF2.utils.PdfReadError: Illegal character in Name Object



    Reference: https://github.com/mstamy2/PyPDF2/issues/438

    Merging PDF files with PyPDF2 fails with the following error:

    Traceback (most recent call last):
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
        return NameObject(name.decode('utf-8'))
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "D:\projects\myproject\apps\backstage\views\usi_contract_manage_view.py", line 703, in post
        merge_pdf_result = merge_pdf(final_files, pdf_path)
      File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
        pdf_writer.write(new_file)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
        self._sweepIndirectReferences(externalReferenceMap, self._root)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, data[i])
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
        newobj = data.pdf.getObject(data)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
        retval = readObject(self.stream, self)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
        return DictionaryObject.readFromStream(stream, pdf)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
        value = readObject(stream, pdf)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
        return NameObject.readFromStream(stream, pdf)
      File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
        raise utils.PdfReadError("Illegal character in Name Object")
    PyPDF2.utils.PdfReadError: Illegal character in Name Object
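The root cause is the first exception: the bytes of a PDF name object are not valid UTF-8. A minimal reproduction outside PyPDF2 is shown below. For illustration only, it assumes that the offending name is a subset font name whose Chinese part, 宋体 (SimSun), was stored as raw GBK bytes rather than as `#xx` escapes. The prefix `/ABCDEF+` and the font are hypothetical and are not taken from the original PDF.

```python
# Hypothetical name bytes: an ASCII subset prefix followed by "宋体" encoded as GBK.
# This is an illustrative assumption, not the bytes from the original PDF.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    # The GBK lead byte at index 8 is not followed by a valid UTF-8 continuation byte
    print(exc)
    utf8_error = exc
else:
    utf8_error = None

# The same bytes decode cleanly when interpreted as GBK
decoded = raw_name.decode("gbk")
print(decoded)
```

With this hypothetical name, the UTF-8 error should report byte 0xcb at position 8, which matches the first error in the traceback. Decoding the same bytes as GBK succeeds, and that is the behaviour the GBK fallback depends on.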
    

    Locate the file that raised the error:

    File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484

    Original code around line 484 (in NameObject.readFromStream):

    try:
        return NameObject(name.decode('utf-8'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")

    Inside the except block, add a fallback that decodes the name as GBK:

    return NameObject(name.decode('gbk'))

    After the change:

    try:
        return NameObject(name.decode('utf-8'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        try:
            return NameObject(name.decode('gbk'))
        except (UnicodeEncodeError, UnicodeDecodeError) as e:
            # Name objects should represent irregular characters
            # with a '#' followed by the symbol's hex number
            if not pdf.strict:
                warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
                return NameObject(name)
            else:
                raise utils.PdfReadError("Illegal character in Name Object")
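The decoding order used by the patched `readFromStream` can be written as a standalone function. `decode_name` is a hypothetical helper and is not part of PyPDF2. `ValueError` stands in for `utils.PdfReadError` so that the sketch runs without the library installed.

```python
import warnings


def decode_name(name: bytes, strict: bool = True):
    """Hypothetical helper mirroring the patch: try UTF-8, then GBK,
    otherwise warn and keep the raw bytes (non-strict) or raise (strict)."""
    for encoding in ("utf-8", "gbk"):
        try:
            return name.decode(encoding)
        except UnicodeDecodeError:
            continue  # fall through to the next candidate encoding
    if not strict:
        # Mirrors the non-strict branch: warn and hand back the undecoded bytes
        warnings.warn("Illegal character in Name Object")
        return name
    # Stand-in for PyPDF2.utils.PdfReadError in this standalone sketch
    raise ValueError("Illegal character in Name Object")


print(decode_name(b"/Helvetica"))
print(decode_name(b"/" + "宋体".encode("gbk")))
```

Using GBK as the second choice is a heuristic. GBK accepts a wide range of byte pairs, so a name stored in some other encoding may decode as GBK into the wrong characters instead of raising an error.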

    An error is still raised after this change, so a second place also has to be modified:

    Lib/site-packages/PyPDF2/utils.py, line 238

    Original code:

    r = s.encode('latin-1')
    if len(s) < 2:
        bc[s] = r
    return r

    Modified code:

    try:
        r = s.encode('latin-1')
    except UnicodeEncodeError:
        # Characters outside Latin-1 (e.g. CJK text) cannot be encoded; fall back to UTF-8
        r = s.encode('utf-8')
    if len(s) < 2:
        bc[s] = r
    return r

    Source: https://blog.csdn.net/kmesky/article/details/102695520

  • Original post: https://www.cnblogs.com/mysick/p/12726582.html
Copyright © 2020-2023  润新知