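Today I wrote a crawler to scrape Baidu search results. The links it extracts look like the one below. They open fine in a browser, but they are Baidu redirect links, not the real links I want: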
http://www.baidu.com/link?url=xwnhqq0ofbRBQnHyAmPSV93YEiNCZQ5M_nMkEpFgraWSrWZ1-ZtLZGR9jgqBjEc-Olb80TlBmpFAd0H1s7tuXq
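The real link can be obtained with jsoup: turn off redirect following and read the Location header of the response.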
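import org.jsoup.Connection;
import org.jsoup.Jsoup;

String url = "http://www.baidu.com/link?url=xwnhqq0ofbRBQnHyAmPSV93YEiNCZQ5M_nMkEpFgraWSrWZ1-ZtLZGR9jgqBjEc-Olb80TlBmpFAd0H1s7tuXq";
int timeoutMillis = 60000; // 60-second timeout
// Do not follow the redirect, so the redirect response itself can be inspected
Connection.Response res = Jsoup.connect(url).timeout(timeoutMillis).method(Connection.Method.GET).followRedirects(false).execute();
// The Location header carries the real target URL
String realUrl = res.header("Location");
System.out.println(realUrl);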
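One thing to watch: the Location header is only present on a redirect response, so the value read from it can be null, and HTTP also allows it to be a relative URL. The following is a minimal sketch of the same idea using only the JDK's HttpURLConnection, with both cases handled. The class name RedirectResolver and its two method names are made up for illustration and are not part of jsoup or of the original code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectResolver {

    /**
     * Turns a Location header value into an absolute URL.
     * Returns null when the header is missing or blank.
     * Throws IllegalArgumentException if either string is not a valid URI.
     */
    static String resolveLocation(String requestUrl, String location) {
        if (location == null || location.trim().isEmpty()) {
            return null;
        }
        // URI.resolve returns an absolute URI unchanged and resolves
        // a relative one against the URL that was requested.
        return URI.create(requestUrl).resolve(location.trim()).toString();
    }

    /**
     * Sends one GET request without following redirects.
     * Returns the redirect target, or null if the response is not a 3xx.
     */
    static String fetchRedirectTarget(String requestUrl, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(requestUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            if (status < 300 || status >= 400) {
                return null; // not a redirect, so there is no target to report
            }
            return resolveLocation(requestUrl, conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling RedirectResolver.fetchRedirectTarget(url, 60000) with the Baidu link above plays the same role as the jsoup snippet. Whether a given search-result link actually answers with a redirect depends on the server, so the caller should be ready for a null result.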
Alternatively, let jsoup follow the redirect to the target site and fetch the page content directly:
Document document = Jsoup.connect(url).timeout(60000).method(Connection.Method.GET).followRedirects(true).get();