A problem with consecutive xpath calls in lxml

# Extract the fields of each job record from Zhaopin (智联招聘)
# XPath of the salary field: //*[@id="newlist_list_content_table"]/table[2]/tbody/tr[1]/td[4]

from lxml import etree

selector = etree.HTML(html)
# First xpath call; type(tables) == list
tables = selector.xpath('//div[@class="newlist_list_content"]/table[@class="newlist"]/tr[1]')

for tr in tables:
    # Serialize the row and parse it again so that it becomes a document of its own
    tr = etree.HTML(etree.tostring(tr))
    # Second xpath call
    job = tr.xpath('//td[@class="zwmc"]//a[1]/text()')
    com_name = tr.xpath('//td[@class="gsmc"]/a[1]/text()')
    salary = tr.xpath('//td[@class="zwyx"]/text()')
    address = tr.xpath('//td[@class="gzdd"]/text()')

    print(job)
    print(com_name)
    # .....

The code above prints the content I wanted.

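Before that I took a long detour. Without the following line the code kept going wrong: every iteration of the loop printed the same content.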

tr = etree.HTML(etree.tostring(tr))  # serialize the element with tostring() (which returns bytes), then build a new element from it with etree.HTML()

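The problem shows up when xpath is called a second time, on an element returned by the first call. After adding the line above, it works.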

I looked for material online, but everything I found was introductory and did not help me understand what was going on.

So I wrote a test specifically for this situation.

et = etree.HTML(html)
# pdb.set_trace()
htmlTree = etree.XPathEvaluator(et)
# case 1
tables = htmlTree('//div[@class="newlist_list_content"]/table[@class="newlist"]/tr[1]')

table = tables[1].xpath('//td[@class="gzdd"]/text()')
table2 = tables[2].xpath('//td[@class="gzdd"]/text()')
# case 2
ee = htmlTree('//div[@class="newlist_list_content"]/table[@class="newlist"]/tr[1]//td[@class="gzdd"]/text()')

# Result
# print(table == ee)  # True
print(table2 == ee)  # True
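One way to see what this test is actually observing is to select the `td` elements themselves instead of their text, and ask the tree where each one sits. The following is a minimal sketch on made-up markup (three hypothetical tables, class names as in the post). It relies on `getroottree()` and `getpath()`, which lxml provides on elements and element trees respectively.

```python
from lxml import etree

# Hypothetical sample markup with three rows, so tables[1] and tables[2] exist.
html = """
<div class="newlist_list_content">
  <table class="newlist"><tr><td class="gzdd">Beijing</td></tr></table>
  <table class="newlist"><tr><td class="gzdd">Shanghai</td></tr></table>
  <table class="newlist"><tr><td class="gzdd">Shenzhen</td></tr></table>
</div>
"""

et = etree.HTML(html)
tree = et.getroottree()
tables = et.xpath('//div[@class="newlist_list_content"]/table[@class="newlist"]/tr[1]')

# Same query as case 1, but returning elements rather than text nodes.
cells = tables[1].xpath('//td[@class="gzdd"]')

# getpath() returns an absolute XPath for each matched element.
paths = [tree.getpath(td) for td in cells]

# Keep only the matches that really live under tables[1].
row_path = tree.getpath(tables[1])
own = [p for p in paths if p.startswith(row_path + '/')]

print(paths)                 # locations of all matched cells
print(len(cells), len(own))  # matches in total vs. matches inside tables[1]
```

If most of the reported paths lie outside the row that xpath was called on, the query was evaluated against the whole document rather than against that row.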

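Summary:

1) Case 1 (two consecutive xpath calls) and case 2 (a single xpath call on the whole document) return the same result;

2) Converting the element to a string and parsing it into a new element makes the second xpath call behave as expected;

3) I am a beginner and do not yet have an established approach to problems like this. I would appreciate it if someone experienced could point out how to approach them.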
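A possible alternative to re-parsing, offered as a sketch rather than as the definitive fix: start the second expression with `.//` instead of `//`. The leading dot makes the path relative to the element xpath is called on, so the row does not need to be serialized and parsed again. The markup, job titles, company names and salaries below are invented for illustration; only the class names and variable names follow the post.

```python
from lxml import etree

# Hypothetical sample markup; the field values are made up.
html = """
<div class="newlist_list_content">
  <table class="newlist"><tr>
    <td class="zwmc"><div><a>Python Engineer</a></div></td>
    <td class="gsmc"><a>Example Co.</a></td>
    <td class="zwyx">8000-10000</td>
    <td class="gzdd">Beijing</td>
  </tr></table>
  <table class="newlist"><tr>
    <td class="zwmc"><div><a>Data Analyst</a></div></td>
    <td class="gsmc"><a>Sample Ltd.</a></td>
    <td class="zwyx">6000-8000</td>
    <td class="gzdd">Shanghai</td>
  </tr></table>
</div>
"""

selector = etree.HTML(html)
tables = selector.xpath('//div[@class="newlist_list_content"]/table[@class="newlist"]/tr[1]')

records = []
for tr in tables:
    # './/' searches only below the current row; no re-parsing needed.
    job = tr.xpath('.//td[@class="zwmc"]//a[1]/text()')
    com_name = tr.xpath('.//td[@class="gsmc"]/a[1]/text()')
    salary = tr.xpath('.//td[@class="zwyx"]/text()')
    address = tr.xpath('.//td[@class="gzdd"]/text()')
    records.append((job, com_name, salary, address))

for record in records:
    print(record)
```

Besides avoiding the extra serialization, this keeps each `tr` attached to the original tree, so `getparent()` and similar navigation still work on it.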

Original post: https://www.cnblogs.com/zqctzk/p/8862671.html