• Basic steps for fetching a web page with URLConnection


      Based on Core Java, Volume II, Chapter 3: Networking.

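    Note: HttpURLConnection, a subclass of URLConnection, is widely used for network client programming on Android. Together with Apache HttpClient it is one of the two main client implementations there, and Google officially recommends HttpURLConnection.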


    In the simplest case, the URL class alone is enough to fetch a web page:

    URL url = new URL("http://www.baidu.com");
    InputStream is = url.openStream();
    Scanner sc = new Scanner(is);
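The three lines above never close the stream and decode the page with the platform's default charset. A minimal sketch of the same idea that reads the whole resource and cleans up after itself; the class and method names are hypothetical, and UTF-8 is an assumption (ideally the charset comes from the Content-Type header):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;

class OpenStreamDemo {
    // Reads every line of the resource at `url` into one String.
    // Assumes the content is UTF-8; try-with-resources closes the Scanner
    // and the underlying stream even if an exception is thrown.
    static String readAll(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream is = url.openStream();
             Scanner sc = new Scanner(is, "UTF-8")) {
            while (sc.hasNextLine()) {
                sb.append(sc.nextLine()).append('\n');
            }
            // Scanner swallows read errors; surface them instead of
            // silently returning a truncated page.
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return sb.toString();
    }
}
```

Usage, for example: `String html = OpenStreamDemo.readAll(new URL("http://www.baidu.com"));`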
    URLConnection, however, offers much more control. The basic steps are as follows:

    1. Call the openConnection method of the URL class to obtain the URLConnection object:
    URLConnection connection = url.openConnection();
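A small sketch of step 1 (class and method names are hypothetical): openConnection only creates the object, and for an http URL that object is an HttpURLConnection, the subclass mentioned in the note at the top.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class OpenConnectionDemo {
    // Step 1 only builds the connection object; nothing is sent over
    // the network until connect() or a method that needs the response.
    static URLConnection open(String spec) throws IOException {
        return new URL(spec).openConnection();
    }

    // For http: (and https:) URLs the returned object is an HttpURLConnection,
    // which adds HTTP-specific methods such as setRequestMethod and getResponseCode.
    static boolean isHttp(URLConnection connection) {
        return connection instanceof HttpURLConnection;
    }
}
```

A caller that needs HTTP features would cast: `HttpURLConnection http = (HttpURLConnection) OpenConnectionDemo.open("http://www.baidu.com");`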


    2. Set any request properties, using the methods
    setDoInput
    setDoOutput
    setIfModifiedSince
    setUseCaches
    setAllowUserInteraction
    setRequestProperty
    setConnectTimeout
    setReadTimeout
    Core Java discusses these methods later in the same section and in its API notes.
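A sketch of step 2, assuming Java 9+ for InputStream.readAllBytes. It sets both timeouts, disables caching and adds an Accept-Language request header, then shows that request properties can no longer be changed once the connection is open. The class name and the tiny local echo server (built on the JDK's com.sun.net.httpserver package so the sketch can run offline) are assumptions, not part of the original program:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class RequestPropertyDemo {
    // Step 2: configure the request, then connect and read the reply.
    static String fetchWithProperties(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(5000);   // ms allowed to establish the connection
        conn.setReadTimeout(5000);      // ms allowed to wait for data once connected
        conn.setUseCaches(false);       // do not answer from an installed cache
        conn.setRequestProperty("Accept-Language", "zh-CN");
        conn.connect();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Request properties are frozen once connected: setting one
    // afterwards throws IllegalStateException.
    static boolean rejectsLateProperty(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.connect();
        try {
            conn.setRequestProperty("Accept-Language", "en");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } finally {
            conn.getInputStream().close();
        }
    }

    // Hypothetical local stand-in server: echoes the request's
    // Accept-Language header back as the response body.
    static HttpServer startEchoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String lang = exchange.getRequestHeaders().getFirst("Accept-Language");
            byte[] body = String.valueOf(lang).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```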


    3. Connect to the remote resource by calling the connect method.
    connection.connect();
    Besides making a socket connection to the server, this method also queries the server for header information.
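In practice step 3 is often implicit: methods that need the response, such as getResponseCode, getHeaderField and getInputStream, connect on their own, and calling connect again on an already-open connection is ignored. A sketch with hypothetical class and method names and a hypothetical local server for offline testing:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ImplicitConnectDemo {
    // Never calls connect(): getResponseCode() connects on demand.
    static int statusWithoutConnect(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // A second connect() on an already-open connection is ignored.
    static int statusAfterDoubleConnect(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.connect();
            conn.connect(); // no-op
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Hypothetical local stand-in server: answers every request with 200 "ok".
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```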


    4. After connecting to the server, you can query the header information. Two methods, getHeaderFieldKey and
    getHeaderField, enumerate all fields of the header. The method getHeaderFields gets a standard Map object
    containing the header fields. For your convenience, the following methods query standard fields:
    getContentType
    getContentLength
    getContentEncoding
    getDate
    getExpiration
    getLastModified
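A sketch of step 4 using the index-based pair getHeaderFieldKey/getHeaderField. For HTTP, index 0 holds the status line with a null key (the `null : HTTP/1.1 200 OK` entry in the output further down), so the loop starts at 1. The class name, the made-up X-Demo header and the local server are assumptions for the example:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class HeaderEnumerationDemo {
    // Walks the header fields by index. Index 0 is the status line (null key),
    // so start at 1 and stop at the first null key. Keys compare
    // case-insensitively; a repeated key (e.g. Set-Cookie) keeps its last value.
    static Map<String, String> headersByIndex(URLConnection conn) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int n = 1; ; n++) {
            String key = conn.getHeaderFieldKey(n);
            if (key == null) {
                break;
            }
            headers.put(key, conn.getHeaderField(n));
        }
        return headers;
    }

    // Hypothetical local stand-in server: sends a Content-Type,
    // an X-Demo header and a 5-byte body.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; charset=utf-8");
            exchange.getResponseHeaders().add("X-Demo", "42");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```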


    5. Finally, you can access the resource data. Use the getInputStream method to obtain an input stream for reading the
    information. (This is the same input stream that the openStream method of the URL class returns.) The other method,
    getContent, isn’t very useful in practice. The objects that are returned by standard content types such as
    text/plain and image/gif require classes in the com.sun hierarchy for processing. You could register your own
    content handlers, but Core Java does not discuss that technique.
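Reading the stream correctly also means decoding its bytes with the right charset. A sketch that reads the whole body and decodes it with the charset named in Content-Type; charsetOf is a hypothetical helper with simplified parsing, falling back to ISO-8859-1 (the old HTTP/1.1 default for text) is an assumption, and the local server exists only for offline testing. Requires Java 9+ for InputStream.readAllBytes:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReadBodyDemo {
    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value such as "text/html;charset=utf-8". Returns `fallback` if the
    // parameter is missing or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException unknownOrIllegal) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Step 5: read the whole body and decode it with the declared charset.
    static String readBody(URLConnection conn) throws IOException {
        Charset cs = charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), cs);
        }
    }

    // Hypothetical local stand-in server: returns two Chinese characters
    // encoded as UTF-8, labelled text/html;charset=utf-8.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "\u767e\u5ea6".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html;charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```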


    package com.ljh.corejava;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.List;
    import java.util.Map;
    import java.util.Scanner;
    
    public class URLConnectionTest {
        public static void main(String[] args) {
            try {
                // 1. Create the URLConnection.
                URL url = new URL("http://www.baidu.com");
                URLConnection connection = url.openConnection();
    
                // 2. Set connection properties (timeouts in milliseconds).
                connection.setConnectTimeout(10000);
                connection.setReadTimeout(10000);
    
                // 3. Connect.
                connection.connect();
    
                // 4a. Headers, approach one: get all header fields, then iterate.
                Map<String, List<String>> headers = connection.getHeaderFields();
                for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
                    System.out.println(entry.getKey() + " : ");
                    for (String value : entry.getValue()) {
                        System.out.println(value + " , ");
                    }
                }
    
                // 4b. Headers, approach two: use the convenience methods.
                System.out.println("----------");
                System.out.println("getContentType: " + connection.getContentType());
                System.out.println("getContentLength: " + connection.getContentLength());
                System.out.println("getContentEncoding: " + connection.getContentEncoding());
                System.out.println("getDate: " + connection.getDate());
                System.out.println("getExpiration: " + connection.getExpiration());
                System.out.println("getLastModifed: " + connection.getLastModified());
                System.out.println("----------");
    
                // 5. Read the content. The page declares charset=utf-8, so decode it
                //    as UTF-8 rather than with the platform default. Closing the
                //    Scanner also closes the underlying InputStream.
                try (InputStream is = connection.getInputStream();
                     Scanner sc = new Scanner(is, "UTF-8")) {
                    while (sc.hasNextLine()) {
                        System.out.println(sc.nextLine());
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
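Because the program declares connection as a plain URLConnection, it never sees the HTTP status code, and for a 4xx or 5xx response getInputStream throws an IOException (FileNotFoundException for 404). A hedged variant that casts to HttpURLConnection, checks getResponseCode first and reads an error body from getErrorStream; the class name, the "status body" return format, UTF-8 decoding and the local test server are all assumptions (Java 9+ for readAllBytes):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class StatusAwareFetch {
    // Returns "<status> <body>". For 4xx/5xx responses the body comes
    // from getErrorStream(), because getInputStream() would throw.
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(10000);
        try {
            int status = conn.getResponseCode();
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            String body = "";
            if (in != null) { // getErrorStream() may return null
                try (InputStream s = in) {
                    body = new String(s.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return status + " " + body;
        } finally {
            conn.disconnect();
        }
    }

    // Hypothetical local stand-in server: "/" answers 200 "ok",
    // any other path answers 404 "missing".
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            boolean root = exchange.getRequestURI().getPath().equals("/");
            byte[] body = (root ? "ok" : "missing").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(root ? 200 : 404, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```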
    
    The output is as follows:

    null : 
    HTTP/1.1 200 OK , 
    Expires : 
    Sat, 12 Oct 2013 16:16:20 GMT , 
    Set-Cookie : 
    H_PS_PSSID=; path=/; domain=.baidu.com , 
    BDSVRTM=0; path=/ , 
    BAIDUID=2CD6C90F50267C4DD25F1DA90D209AB5:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com , 
    Connection : 
    Keep-Alive , 
    Server : 
    BWS/1.0 , 
    Cache-Control : 
    private , 
    Date : 
    Sat, 12 Oct 2013 16:16:51 GMT , 
    BDQID : 
    0xad03b06b916405ea , 
    Vary : 
    Accept-Encoding , 
    Transfer-Encoding : 
    chunked , 
    P3P : 
    CP=" OTI DSP COR IVA OUR IND COM " , 
    BDPAGETYPE : 
    1 , 
    Content-Type : 
    text/html;charset=utf-8 , 
    BDUSERID : 
    0 , 
    ----------
    getContentType: text/html;charset=utf-8
    getContentLength: -1
    getContentEncoding: null
    getDate: 1381594611000
    getExpiration: 1381594580000
    getLastModifed: 0
    ----------
    <!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title>百度一下,你就知道</title><style >html,body{height:100%}html{overflow-y:auto}#wrapper{position:relative;_position:;min-height:100%}#content{padding-bottom:100px;text-align:center}#ftCon{height:100px;position:absolute;bottom:44px;text-align:center;100%;margin:0 auto;z-index:0;overflow:hidden}#ftConw{720px;margin:0 auto}body{font:12px arial;text-align:;background:#fff}body,p,form,ul,li{margin:0;padding:0;list-style:none}body,form,#fm{position:relative}td{text-align:left}img{border:0}a{color:#00c}a:active{color:#f60}.bg{background-image:url(http://s1.bdstatic.com/r/www/cache/static/global/img/icons_db1b0e67.png);background-repeat:no-repeat;_background-image:url(http://s1.bdstatic.com/r/www/cache/static/global/img/icons_190dda05.gif)}.c-icon{display:inline-block;14px;height:14px;vertical-align:text-bottom;font-style normal;overflow:hidden;background:url(http://s1.bdstatic.com/r/www/cache/static/global/img/icons_db1b0e67.png) no-repeat 0 0;_background-image:url(http://s1.bdstatic.com/r/www/cache/static/global/img/icons_190dda05.gif)}.c-icon-triangle-down-blue{background-position:-480px -24px}.c-icon-chevron-unfold2{background-position:-504px -24px}#u{color:#999;padding:4px 10px 5px 0;text-align:right}#u a{margin:0 5px}#u .reg{margin:0}#m{720px;margin:0 auto}#nv a,#nv b,.btn,#lk{font-size:14px}#fm{padding-left:110px;text-align:left;z-index:1}input{border:0;padding:0}#nv{height:19px;font-size:16px;margin:0 0 4px;text-align:left;text-indent:137px}.s_ipt_wr{418px;height:30px;display:inline-block;margin-right:5px;background-position:0 -288px;border:1px solid #b6b6b6;border-color:#9a9a9a #cdcdcd #cdcdcd #9a9a9a;vertical-align:top}.s_ipt{405px;height:22px;font:16px/22px arial;margin:5px 0 0 7px;background:#fff;outline:0;-webkit-appearance:none}.s_btn{95px;height:32px;padding-top:2px9;font-size:14px;background-color:#ddd;background-position:0 
-240px;cursor:pointer}.s_btn_h{background-position:-240px -240px}.s_btn_wr{97px;height:34px;display:inline-block;background-position:-120px -240px;*position:relative;z-index:0;vertical-align:top}#lg img{vertical-align:top;margin-bottom:3px}#lk{margin:33px 0}#lk span{font:14px "宋体"}#lm{height:60px}#lh{margin:16px 0 5px;word-spacing:3px}.tools{position:absolute;top:-4px;*top:10px;right:7px}#mHolder{62px;position:relative;z-index:296;display:none}#mCon{height:18px;line-height:18px;position:absolute;cursor:pointer}#mCon span{color:#00c;cursor:default;display:block}#mCon .hw{text-decoration:underline;cursor:pointer;display:inline-block}#mCon .pinyin{display:inline-block}#mCon .c-icon-chevron-unfold2{margin-left:5px}#mMenu a{100%;height:100%;display:block;line-height:22px;text-indent:6px;text-decoration:none;filter:none9}#mMenu,#user ul{box-shadow:1px 1px 2px #ccc;-moz-box-shadow:1px 1px 2px #ccc;-webkit-box-shadow:1px 1px 2px #ccc;filter:progid:DXImageTransform.Microsoft.Shadow(Strength=2,Direction=135,Color="#cccccc")9}#mMenu{56px;border:1px solid #9b9b9b;list-style:none;position:absolute;right:27px;top:28px;display:none;background:#fff}#mMenu a:hover{background:#ebebeb}#mMenu .ln{height:1px;background:#ebebeb;overflow:hidden;font-size:1px;line-height:1px;margin-top:-1px}#cp,#cp a{color:#666}#seth{display:none;behavior:url(#default#homepage)}#setf{display:none}#sekj{margin-left:14px}#shouji{margin-right:14px}</style><script >function h(obj){obj.style.behavior='url(#default#homepage)';var a = obj.setHomePage('http://www.baidu.com/');}</script></head><body><div id="wrapper"><div id="content"><div id="u"><a href="http://www.baidu.com/gaoji/preferences.html" name="tj_setting">搜索设置</a>|<a href="https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" name="tj_login" id="lb" onclick="return false;">登录</a><a href="https://passport.baidu.com/v2/?reg&regType=1&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" target="_blank" name="tj_reg" class="reg">注册</a></div><div 
id="m"><p id="lg"><img src="http://www.baidu.com/img/270%EF%BC%8F129_f01c4e74c976a9e3baf6cfa8a3f97c8f.gif" width="270" height="129" usemap="#mp"><map name="mp"><area shape="rect" coords="0,1,270,129" href="http://www.baidu.com/s?wd=%E9%87%8D%E9%98%B3%E8%8A%82" target="_blank" title="九九重阳,关爱老人"onmousedown="return ns_c({'fm':'behs','tab':'bdlogo'})"></map></p><p id="nv"><a href="http://news.baidu.com">新&nbsp;闻</a> <b>网&nbsp;页</b> <a href="http://tieba.baidu.com">贴&nbsp;吧</a> <a href="http://zhidao.baidu.com">知&nbsp;道</a> <a href="http://music.baidu.com">音&nbsp;乐</a> <a href="http://image.baidu.com">图&nbsp;片</a> <a href="http://v.baidu.com">视&nbsp;频</a> <a href="http://map.baidu.com">地&nbsp;图</a></p><div id="fm"><form name="f" action="/s"><span class="bg s_ipt_wr"><input type="text" name="wd" id="kw" maxlength="100" class="s_ipt"></span><input type="hidden" name="rsv_bp" value="0"><input type="hidden" name="rsv_spt" value="3"><input type="hidden" name="ie" value="utf-8"><span class="bg s_btn_wr"><input type="submit" value="百度一下" id="su" class="bg s_btn" onmousedown="this.className='bg s_btn s_btn_h'" onmouseout="this.className='bg s_btn'"></span></form><span class="tools"><span id="mHolder"><div id="mCon"><span>输入法</span></div></span></span><ul id="mMenu"><li><a href="#" name="ime_hw">手写</a></li><li><a href="#" name="ime_py">拼音</a></li><li class="ln"></li><li><a href="#" name="ime_cl">关闭</a></li></ul></div><p id="lk"><a href="http://baike.baidu.com">百科</a> <a href="http://wenku.baidu.com">文库</a> <a href="http://www.hao123.com">hao123</a><span>&nbsp;|&nbsp;<a href="http://www.baidu.com/more/">更多&gt;&gt;</a></span></p><p id="lm"></p></div></div><div id="ftCon"><div id="ftConw"><p ><a id="seth" onClick="h(this)" href="/" onmousedown="return ns_c({'fm':'behs','tab':'homepage','pos':0})">把百度设为主页</a><a id="setf" href="http://www.baidu.com/cache/sethelp/index.html" onmousedown="return ns_c({'fm':'behs','tab':'favorites','pos':0})" target="_blank">把百度设为主页</a><span 
id="sekj"><a href="http://weishi.baidu.com/?shouye" target="_blank" onmousedown="return ns_c({'fm':'behs','tab':'bdbrwlk','pos':1})">安装百度卫士</a></span></p><p id="lh"><a href="http://e.baidu.com/?refer=888" onmousedown="return ns_c({'fm':'behs','tab':'btlink','pos':2})">加入百度推广</a>&nbsp;|&nbsp;<a onmousedown="return ns_c({'fm':'behs','tab':'tj_bang'})" href="http://top.baidu.com">搜索风云榜</a>&nbsp;|&nbsp;<a onmousedown="return ns_c({'fm':'behs','tab':'tj_about'})" href="http://home.baidu.com">关于百度</a>&nbsp;|&nbsp;<a onmousedown="return ns_c({'fm':'behs','tab':'tj_about_en'})" href="http://ir.baidu.com">About Baidu</a></p><p id="cp">&copy;2013&nbsp;Baidu&nbsp;<a href="/duty/" name="tj_duty">使用百度前必读</a>&nbsp;京ICP证030173号&nbsp;<img src="http://www.baidu.com/cache/global/img/gs.gif"></p></div></div></div></body><script>var bds={se:{},comm : {ishome : 1,sid : "",user : "",username : "",sugHost : "http://suggestion.baidu.com/su",loginAction : []}}</script><script type="text/javascript" src="http://s1.bdstatic.com/r/www/cache/static/global/js/home_f813a739.js" charset="utf-8"></script><script>var bdUser = null;var w=window,d=document,n=navigator,k=d.f.wd,a=d.getElementById("nv").getElementsByTagName("a"),isIE=n.userAgent.indexOf("MSIE")!=-1&&!window.opera;(function(){if(/q=([^&]+)/.test(location.search)){k.value=decodeURIComponent(RegExp["x241"])}})();if(n.cookieEnabled){bds.se.sug();};function addEV(o, e, f){if(w.attachEvent){o.attachEvent("on" + e, f);}else if(w.addEventListener){ o.addEventListener(e, f, false);}}function G(id){return d.getElementById(id);}function ns_c(q){var p = encodeURIComponent(window.document.location.href), sQ = '', sV = '', mu='', img = window["BD_PS_C" + (new Date()).getTime()] = new Image();for (v in q) {sV = q[v];sQ += v + "=" + sV + "&";} mu= "&mu=" + p ;img.src = "http://nsclick.baidu.com/v.gif?pid=201&pj=www&rsv_sid=&" + sQ + "path="+p+"&t="+new Date().getTime();return true;}if(/bdime=[12]/.test(d.cookie)){document.write('<script src="' + 
"http://s1.bdstatic.com/r/www/cache/static/ime/js/openime_ceac1c4e.js" + '" charset="utf-8"></script>');}(function(){var u = G("u").getElementsByTagName("a"), nv = G("nv").getElementsByTagName("a"), lk = G("lk").getElementsByTagName("a"), un = "";var tj_nv = ["news","tieba","zhidao","mp3","img","video","map"];var tj_lk = ["baike","wenku","hao123","more"];un = bds.comm.user == "" ? "" : bds.comm.user;function _addTJ(obj){addEV(obj, "mousedown", function(e){var e = e || window.event;var target = e.target || e.srcElement;ns_c({'fm':'behs','tab':target.name||'tj_user','un':encodeURIComponent(un)});});}for(var i = 0; i < u.length; i++){_addTJ(u[i]);}for(var i = 0; i < nv.length; i++){nv[i].name = 'tj_' + tj_nv[i];}for(var i = 0; i < lk.length; i++){lk[i].name = 'tj_' + tj_lk[i];}})();(function() {var links = {'tj_news': ['word', 'http://news.baidu.com/ns?tn=news&cl=2&rn=20&ct=1&ie=utf-8'],'tj_tieba': ['kw', 'http://tieba.baidu.com/f?ie=utf-8'],'tj_zhidao': ['word', 'http://zhidao.baidu.com/search?pn=0&rn=10&lm=0'],'tj_mp3': ['key', 'http://music.baidu.com/search?fr=ps&ie=utf-8'],'tj_img': ['word', 'http://image.baidu.com/i?ct=201326592&cl=2&nc=1&lm=-1&st=-1&tn=baiduimage&istype=2&fm=&pv=&z=0&ie=utf-8'],'tj_video': ['word', 'http://video.baidu.com/v?ct=301989888&s=25&ie=utf-8'],'tj_map': ['wd', 'http://map.baidu.com/?newmap=1&ie=utf-8&s=s'],'tj_baike': ['word', 'http://baike.baidu.com/search/word?pic=1&sug=1&enc=utf8'],'tj_wenku': ['word', 'http://wenku.baidu.com/search?ie=utf-8']};var domArr = [G('nv'), G('lk'),G('cp')],kw = G('kw');for (var i = 0, l = domArr.length; i < l; i++) {domArr[i].onmousedown = function(e) {e = e || window.event;var target = e.target || e.srcElement,name = target.getAttribute('name'),items = links[name],reg = new RegExp('^\s+|\s+x24'),key = kw.value.replace(reg, '');if (items) {if (key.length > 0) {var wd = items[0], url = items[1],url = url + ( name === 'tj_map' ? encodeURIComponent('&' + wd + '=' + key) : ( ( url.indexOf('?') > 0 ? '&' : '?' 
) + wd + '=' + encodeURIComponent(key) ) );target.href = url;} else {target.href = target.href.match(new RegExp('^http://.+.baidu.com'))[0];}}name && ns_c({'fm': 'behs','tab': name,'query': encodeURIComponent(key),'un': encodeURIComponent(bds.comm.user || '') });};}})();addEV(w,"load",function(){k.focus()});w.onunload=function(){};</script><script type="text/javascript" src="http://s1.bdstatic.com/r/www/cache/static/global/js/tangram-1.3.4c1.0_07038476.js"></script><script type="text/javascript" src="http://s1.bdstatic.com/r/www/cache/static/user/js/u_ec0ebfe1.js" charset="utf-8"></script><script>try{document.cookie="WWW_ST=;expires=Sat, 01 Jan 2000 00:00:00 GMT";baidu.on(document.forms[0],"submit",function(){var _t=new Date().getTime();document.cookie = "WWW_ST=" + _t +";expires=" + new Date(_t + 10000).toGMTString()})}catch(e){}</script></html><!--20b2cb72348ecb0c--><!--09807388711509496586101300-->
    <script> var _trace_page_logid = 0980738871; </script>
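A few of the convenience values above need interpreting. getContentLength returns -1 because the response carries no Content-Length header (it uses Transfer-Encoding: chunked), getContentEncoding returns null because there is no Content-Encoding header, and getDate, getExpiration and getLastModified return milliseconds since 1970-01-01 00:00:00 GMT, with 0 meaning the header is absent. A small sketch (hypothetical class name, java.time from Java 8+) that turns those numbers back into HTTP-style dates:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HeaderDateDemo {
    // getDate(), getExpiration() and getLastModified() return milliseconds
    // since 1970-01-01T00:00:00Z, or 0 when the header is absent.
    static String toHttpDate(long epochMillis) {
        if (epochMillis == 0) {
            return "(header absent)";
        }
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Values copied from the program output above.
        System.out.println(toHttpDate(1381594611000L)); // getDate: Sat, 12 Oct 2013 16:16:51 GMT
        System.out.println(toHttpDate(1381594580000L)); // getExpiration: Sat, 12 Oct 2013 16:16:20 GMT
        System.out.println(toHttpDate(0L));             // getLastModified: (header absent)
    }
}
```

The two formatted dates match the raw Date and Expires headers printed earlier, which confirms the unit is milliseconds.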

  • Original article: https://www.cnblogs.com/eaglegeek/p/4557984.html