How to handle response content that is wrapped in an HTML comment

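When writing a crawler, you sometimes get a response in which the content you want is wrapped in an HTML comment, so normal XPath queries cannot see it. Here is how I handled it.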

Step 1: select the node that holds the commented-out content. Adjust the XPath to match your own page.

 thread_node_list = '//code[@id = "pagelet_html_frs-list/pagelet/thread_list"]'

Step 2: strip the comment markers and parse the result back into HTML.

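 from lxml import html  # provides html.fromstring

 # Method of the spider class; response is a Scrapy response.
 def get_tag(self, xpath_str, response):
     # Select the node whose content is commented out.
     thread_node_list = response.xpath(xpath_str)
     if not thread_node_list:
         # Nothing matched, so there is nothing to uncomment.
         return None
     item = thread_node_list[0]
     # Serialize the node to a string and drop the comment markers.
     thread_new_html = item.extract().replace('<!--', '').replace('--></code>', '</code>')
     # Parse the cleaned string back into an lxml element and return it.
     new_thread_node_list = html.fromstring(thread_new_html)
     return new_thread_node_list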

The returned node is an lxml element, so you can query it directly with its xpath() method.
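To try the idea without a live crawl, here is a minimal standalone sketch that uses only lxml. The sample page, the `thread_list` id, the `thread` class, and the helper name `uncomment_node` are all made up for illustration; they are not from the original site or from Scrapy.

```python
from lxml import html

# Hypothetical page: the list we want is hidden inside an HTML comment.
SAMPLE_PAGE = """
<html><body>
<code id="thread_list"><!--<ul><li class="thread">first</li><li class="thread">second</li></ul>--></code>
</body></html>
"""


def uncomment_node(page_source, xpath_str):
    """Return the first node matching xpath_str with its comment markers removed."""
    tree = html.fromstring(page_source)
    nodes = tree.xpath(xpath_str)
    if not nodes:
        return None
    # Serialize the node, drop the comment markers, and parse it again.
    raw = html.tostring(nodes[0], encoding="unicode", with_tail=False)
    cleaned = raw.replace("<!--", "").replace("-->", "")
    return html.fromstring(cleaned)


node = uncomment_node(SAMPLE_PAGE, '//code[@id="thread_list"]')
# The formerly commented-out elements are now ordinary nodes.
titles = node.xpath('.//li[@class="thread"]/text()')
print(titles)
```

Note that this sketch removes every `-->` in the node, which is fine for a single wrapping comment but would also strip markers from any nested comments you might want to keep.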

Original post: https://www.cnblogs.com/wzbk/p/11050094.html