Version: HttpClient 3.1
1. GET method
Step 1: Create a client, much like opening a web page in a browser.
HttpClient httpClient = new HttpClient();
Step 2: Create a GET method for the URL of the page you want to fetch.
GetMethod getMethod = new GetMethod("http://www.baidu.com");
Step 3: Execute the request and get the response status code; 200 means the request succeeded.
int statusCode = httpClient.executeMethod(getMethod);
Step 4: Get the page source as raw bytes.
byte[] responseBody = getMethod.getResponseBody();
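Those are the four main steps. There is of course a lot more to handle, such as the page's character encoding: the body comes back as bytes, so you have to decode it with the charset the page actually uses.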
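import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

public static String spiderHtml() throws Exception {
    //URL url = new URL("http://top.baidu.com/buzz?b=1");

    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod("http://top.baidu.com/buzz?b=1");
    try {
        int statusCode = client.executeMethod(method);
        if (statusCode != HttpStatus.SC_OK) {
            System.err.println("Method failed: " + method.getStatusLine());
        }

        // The page is served as GBK, so decode the raw bytes with that charset
        byte[] body = method.getResponseBody();
        String html = new String(body, "gbk");
        // The original listing is cut off here; returning the decoded page is assumed
        return html;
    } finally {
        // Release the connection, as the POST example does
        method.releaseConnection();
    }
}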
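The "gbk" in the listing above is hard-coded, which only works if you already know the page's encoding. Here is a minimal sketch, using only the JDK, of picking the charset from a Content-Type header value instead. The class and method names (CharsetSketch, charsetOf, decode) are made up for illustration, and the ISO-8859-1 fallback is an assumption rather than something the original post specifies. It also only handles the plain `charset=name` form, not pages that declare their encoding solely in a meta tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {

    // Extracts the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=GBK". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix (8 characters)
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Decodes a response body using the charset named in the header.
    // The ISO-8859-1 fallback is an assumption made for this sketch.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // "\u767e\u5ea6" is the two-character string for Baidu, escaped so the
        // source file compiles the same way under any file encoding
        byte[] body = "\u767e\u5ea6".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8"));
    }
}
```

With HttpClient 3.1 the header value would come from the response, and the method object also offers getResponseCharSet() and getResponseBodyAsString(), which may be enough when the server reports the charset correctly.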
2. POST method
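import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

HttpClient httpClient = new HttpClient();
// UrlPath is the target URL, defined elsewhere
PostMethod postMethod = new PostMethod(UrlPath);
// Use the default retry handler for this request
postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
// Form fields to submit
NameValuePair[] postData = new NameValuePair[2];
postData[0] = new NameValuePair("username", "xkey");
postData[1] = new NameValuePair("userpass", "********");
postMethod.setRequestBody(postData);
try {
    int statusCode = httpClient.executeMethod(postMethod);
    if (statusCode == HttpStatus.SC_OK) {
        byte[] responseBody = postMethod.getResponseBody();
        // No charset given, so this decodes with the platform default
        String html = new String(responseBody);
        System.out.println(html);
    }
} catch (Exception e) {
    System.err.println("Page could not be accessed");
} finally {
    // Always release the connection, even on failure
    postMethod.releaseConnection();
}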
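To make it clearer what a name/value form submission like the one above puts on the wire, here is a rough JDK-only sketch of building an application/x-www-form-urlencoded body by hand. FormBodySketch and encodeForm are hypothetical names, the UTF-8 charset is an assumption, and the password value is invented for the demo. This is not how HttpClient implements it internally, just the general encoding idea.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {

    // Joins the fields as name=value&name=value, percent-encoding each
    // name and value. Insertion order is kept when a LinkedHashMap is passed.
    static String encodeForm(Map<String, String> fields, String charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), charset))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            // Rethrow unchecked so callers need no throws clause
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xkey");
        fields.put("userpass", "p@ss word"); // hypothetical value for illustration
        // prints: username=xkey&userpass=p%40ss+word
        System.out.println(encodeForm(fields, "UTF-8"));
    }
}
```

Note that special characters such as `@` and spaces get escaped, which is why you should not concatenate raw values into the body yourself.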
Related links: http://blog.csdn.net/acceptedxukai/article/details/7030700
http://www.cnblogs.com/modou/articles/1325569.html