• First post: doing some simple things with VB.NET (1)


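    Most of the web-crawler articles you can find online use Python, and a few use C# (quietly: so VB.NET can do it too). This post covers the first step: fetching a web page.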

    Before using the code, import the following:

    Imports System.IO, System.IO.Compression, System.Text, System.Net

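    Before writing the program, open a browser (I use Chrome), visit any web page, and press F12 to look at the request headers. Copy the User-Agent for later; you can also copy Accept, Cookie and so on (they are not needed in this post).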

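    Create the request with HttpWebRequest and get the response with HttpWebResponse. Then deal with compression: the server may send the body gzip- or deflate-compressed (this is what Imports System.IO.Compression is for).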

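    Finally you have the real response stream; read it with a StreamReader and you have the page's HTML source. This approach handles pages encoded as UTF-8. For pages encoded as GB2312 or GBK, one change is needed: pass Encoding.GetEncoding("gbk") as the second argument to the StreamReader constructor.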
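    To make the GBK change concrete, here is a minimal sketch that reads an in-memory stream instead of a real response. ReadAllText and EncodingSketch are hypothetical names used only for illustration, and the sketch assumes .NET Framework, where "gbk" is available out of the box.

```vbnet
Imports System.IO
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: reads a whole stream as text in the named encoding.
    Public Function ReadAllText(source As Stream, encodingName As String) As String
        ' Assumption: running on .NET Framework. On .NET Core / .NET 5+,
        ' "gbk" is only available after calling
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Dim enc As Encoding = Encoding.GetEncoding(encodingName)
        Using sr As New StreamReader(source, enc)
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Simulate a GBK-encoded response body in memory.
        Dim gbk As Encoding = Encoding.GetEncoding("gbk")
        Using ms As New MemoryStream(gbk.GetBytes("中文网页"))
            Console.WriteLine(ReadAllText(ms, "gbk"))
        End Using
    End Sub
End Module
```

    Reading the same bytes with Encoding.UTF8 would produce garbled text, which is the symptom to look for when a page's encoding has been guessed wrong.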

    Here is the code:

    Public Function GetHttpContent(url As String) As String
        Try
            Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
            With req
                ' User-Agent copied from the browser's developer tools
                .UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
                .Accept = "*/*"
                .Method = "GET"
                .Timeout = 300000 ' milliseconds (5 minutes)
                .Headers.Add("Accept-Encoding", "gzip, deflate")
            End With
            Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
                Dim body As Stream = resp.GetResponseStream()
                ' Wrap the body in a decompressor that matches Content-Encoding
                Select Case resp.ContentEncoding.ToLowerInvariant()
                    Case "gzip"
                        body = New GZipStream(body, CompressionMode.Decompress)
                    Case "deflate"
                        body = New DeflateStream(body, CompressionMode.Decompress)
                End Select
                ' Disposing the reader also disposes the underlying stream(s)
                Using sr As New StreamReader(body, Encoding.UTF8)
                    Return sr.ReadToEnd()
                End Using
            End Using
        Catch ex As Exception
            ' Any failure (bad URL, network error, timeout) yields an empty string
            Return ""
        End Try
    End Function
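    As an alternative to choosing the decompression stream by hand, HttpWebRequest has an AutomaticDecompression property that lets the framework do it. The following is a sketch of that variant, not a replacement for the function above: BuildRequest, FetchText and AutoDecompressSketch are hypothetical names, the URL is a placeholder, and error handling is left out to keep it short.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Module AutoDecompressSketch
    ' Hypothetical helper: builds a GET request that asks the framework
    ' to decompress gzip/deflate responses on its own.
    Public Function BuildRequest(url As String) As HttpWebRequest
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        req.Method = "GET"
        req.Accept = "*/*"
        ' With this set, the Accept-Encoding header is sent automatically
        ' and the response stream is already decompressed when read.
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Return req
    End Function

    ' Hypothetical helper: sends the request and reads the body as UTF-8.
    Public Function FetchText(url As String) As String
        Dim req As HttpWebRequest = BuildRequest(url)
        Using resp As WebResponse = req.GetResponse()
            Using sr As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' Placeholder URL (an assumption); replace with the page to fetch.
        Console.WriteLine(FetchText("https://example.com/").Length)
    End Sub
End Module
```

    With this variant there is no need to add the Accept-Encoding header manually or to inspect ContentEncoding; the trade-off is less control over how each encoding is handled.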

    (My skills are limited; if anything in the code is incomplete or wrong, please point it out.)

  • Original post: https://www.cnblogs.com/woshilxcdexuesheng/p/11414764.html