Traceback (most recent call last):
  File "/Users/*******.py", line 37, in <module>
    BtcSpider().run()
  File "/Users/******.py", line 34, in run
    self.parse_data(data)
  File "/Users/******.py", line 21, in parse_data
    xpath_data = etree.HTML(data)
  File "src/lxml/etree.pyx", line 3161, in lxml.etree.HTML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
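I was scraping a forum whose pages declare <meta http-equiv="Content-Type" content="text/html; charset=gb2312">, yet on my Mac the fetched pages only decode correctly as UTF-8. Passing the decoded string to etree.HTML for XPath parsing then fails with the error above.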
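lxml refuses a Python str that carries an encoding declaration, because the text has already been decoded; bytes input is accepted. So the fix is to encode the string back to bytes before parsing. Here is the code: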
xpath_data = etree.HTML(data.encode('utf-8'))
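A minimal sketch of the failure and the fix, assuming the page text begins with an XML declaration that names an encoding (lxml's check looks for that declaration in str input). The sample markup and the function names parse_str_directly and parse_as_bytes are hypothetical. The explicit etree.HTMLParser(encoding="utf-8") is an addition not in the original fix; it simply tells lxml how the bytes were encoded, which may help if non-ASCII text comes out garbled.

```python
from lxml import etree

# Hypothetical sample page: it starts with an XML declaration naming an
# encoding, which is what makes lxml reject it as a str.
PAGE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html><body><div class="title">hello</div></body></html>'
)


def parse_str_directly(text: str):
    # Passing a str that carries an encoding declaration makes lxml raise
    # ValueError before any parsing happens.
    return etree.HTML(text)


def parse_as_bytes(text: str):
    # Encode to bytes first, as in the fix above. The explicit parser
    # (an assumption, not part of the original fix) states that the bytes
    # are UTF-8, matching how they were produced.
    parser = etree.HTMLParser(encoding="utf-8")
    return etree.HTML(text.encode("utf-8"), parser=parser)


if __name__ == "__main__":
    try:
        parse_str_directly(PAGE)
    except ValueError as exc:
        print("str input failed:", exc)

    root = parse_as_bytes(PAGE)
    print(root.xpath('//div[@class="title"]/text()'))
```

Encoding to UTF-8 is consistent with how the page was decoded in the first place, so lxml sees the same text it would have seen as a str, just without the str-plus-declaration conflict.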
Problem solved.