Python Web, Week 4: Programs that Surf the Web (Using Python to Access Web Data)
1. Understanding HTML

1. The simplest crawler
```
import urllib

# Python 2: urllib.urlopen returns a file-like object for the page
fhand = urllib.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print line.strip()
```
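The snippet above is Python 2. As a minimal sketch, assuming Python 3 (where `urlopen` lives in `urllib.request` and the response yields bytes rather than text), the same loop might look like this. The helper name `fetch_lines` and the UTF-8 default are assumptions for illustration, not part of the course code.

```python
from urllib.request import urlopen


def fetch_lines(url, encoding="utf-8"):
    """Return the stripped text lines of the document at `url`.

    `fetch_lines` is a hypothetical helper; the encoding default is an
    assumption, since a real page may declare a different charset.
    """
    lines = []
    with urlopen(url) as fhand:      # file-like response object
        for raw in fhand:            # iterating yields bytes, one line at a time
            lines.append(raw.decode(encoding, errors="replace").strip())
    return lines


# Usage (needs network access):
# for line in fetch_lines('http://www.dr-chuck.com/page1.htm'):
#     print(line)
```

Returning a list instead of printing inside the loop keeps the fetching step separate from whatever is done with the lines afterwards.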
2. Crawling a page with Python vs. visiting it directly

3. Scrape

2. Parsing HTML with BeautifulSoup

1. This time, take the easy route: BeautifulSoup

2. Installing BeautifulSoup

1. Download it from http://www.crummy.com/software/BeautifulSoup/#Download

2. Unzip the downloaded file and copy it into the C:\Python27 directory

3. In CMD, cd into that directory and run python setup.py install

3. A first try at BeautifulSoup (and also a first try at a Python library)
```
import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Retrieve all anchor tags and print each href (None if missing)
tags = soup('a')
for tag in tags:
    print tag.get('href', None)
```
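Notes:

1. Pass a parser name ("html.parser" here) as the second argument to BeautifulSoup, after the HTML.

2. Note the import style: from bs4 import BeautifulSoup.

More BeautifulSoup tutorials: http://cuiqingcai.com/1319.html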
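To make concrete what `soup('a')` and `tag.get('href', None)` collect, here is a rough stdlib-only sketch in Python 3 that gathers the same `href` values with `html.parser.HTMLParser` instead of BeautifulSoup. It is a stand-in for illustration, not a description of how BeautifulSoup works internally; the names `LinkCollector` and `extract_hrefs` and the sample HTML are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, or None when the attribute is absent."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))


def extract_hrefs(html):
    """Hypothetical helper: return the href values of all anchors in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Made-up sample markup: one anchor with an href, one without
sample = '<p><a href="http://example.com/a">A link</a> <a name="top">no link</a></p>'
for href in extract_hrefs(sample):
    print(href)
```

The second anchor has no `href`, so it shows up as `None`, matching the default passed to `tag.get` in the BeautifulSoup version.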

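4. raw_input() vs. input()

In Python 2, raw_input() reads the console input as-is and returns it as a string, whatever the user types.

input(), by contrast, expects a valid Python expression, so a string must be typed in quotes; otherwise it raises an error (a NameError for a bare word, a SyntaxError for text that is not valid Python).

Unless you specifically need that evaluation, prefer raw_input().

Because input() evaluates what is typed, entering 1 + 3 at an input() prompt returns the int 4.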
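In Python 2, `input()` behaves like `eval(raw_input())`. The sketch below, written so that it also runs on Python 3, imitates both on a string standing in for what the user typed. `fake_raw_input` and `fake_input` are hypothetical names used only for this illustration.

```python
def fake_raw_input(typed):
    """Mimic Python 2 raw_input(): hand back the typed text unchanged."""
    return typed


def fake_input(typed):
    """Mimic Python 2 input(): evaluate the typed text as a Python expression."""
    return eval(typed)


print(repr(fake_raw_input("1 + 3")))   # the string '1 + 3'
print(repr(fake_input("1 + 3")))       # the int 4
print(repr(fake_input("'hello'")))     # quoted text evaluates to a string

# Unquoted text fails: a bare word is an undefined name,
# and two bare words are not a valid expression at all.
for bad in ("hello", "hello world"):
    try:
        fake_input(bad)
    except (NameError, SyntaxError) as err:
        print(bad, "->", type(err).__name__)
```

Since `eval` runs whatever is typed, reading with `raw_input()` and converting explicitly with `int()` or `float()` is the safer habit. In Python 3 the evaluating version is gone and `raw_input()` has been renamed `input()`.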

5. Advanced BeautifulSoup usage (homework 1)

http://python-data.dr-chuck.net/comments_222777.html

Sum the comment counts on the page above:
```
import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Find every <span> whose class is "comments"
sc = soup.select('span[class="comments"]')

Sum = 0
Count = 0
for span in sc:
    # print 'span', span
    # print 'Attr:', span.attrs
    # print 'Contents:', span.contents[0]
    Sum += int(span.contents[0])  # extract the number inside the span
    Count += 1

print 'Count:', Count
print 'Sum:', Sum
```
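For a quick check without network access or BeautifulSoup installed, the same count-and-sum logic can be sketched with the standard library alone (Python 3). The class `CommentSpanParser`, the helper `count_and_sum`, and the sample rows are invented for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser


class CommentSpanParser(HTMLParser):
    """Collect the integer text of every <span class="comments"> element."""

    def __init__(self):
        super().__init__()
        self.in_comments = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Mirror the selector span[class="comments"]: exact attribute match
        if tag == "span" and dict(attrs).get("class") == "comments":
            self.in_comments = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_comments = False

    def handle_data(self, data):
        if self.in_comments and data.strip():
            self.values.append(int(data))


def count_and_sum(html):
    """Hypothetical helper: return (count, sum) of the comment spans in `html`."""
    parser = CommentSpanParser()
    parser.feed(html)
    parser.close()
    return len(parser.values), sum(parser.values)


# Made-up sample rows; the last span has a different class and is skipped
sample = (
    '<tr><td>Ann</td><td><span class="comments">97</span></td></tr>'
    '<tr><td>Bob</td><td><span class="comments">3</span></td></tr>'
    '<tr><td>Cy</td><td><span class="other">50</span></td></tr>'
)
count, total = count_and_sum(sample)
print('Count:', count)
print('Sum:', total)
```

This sketch assumes the spans are not nested and contain only a number; a CSS selector in BeautifulSoup copes with messier markup far more gracefully.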
PS:

After switching from Python 3 to Python 2, a "Non-ASCII character" error appeared.

To fix it, add this as the first line of the source file:
```
#coding:utf-8
```
Or add:
#-*- coding: UTF-8 -*-
Original post: https://www.cnblogs.com/moonache/p/5112088.html
