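In Heritrix 3.1.0 Source Code Analysis (23) we looked at how Heritrix 3.1.0 extends the HttpClient component's HttpConnection object and its management interface, HttpConnectionManager.
The HttpConnection object creates the socket connection, but at that point nothing has been written to the output stream or read from the input stream. How does the HttpClient component implement this, and how does Heritrix 3.1.0 extend it?
When we request a page with the HttpClient component, we create a GetMethod or a PostMethod object depending on whether the request is a GET or a POST (there are of course other request methods, which browsers do not support for the time being).
These request classes implement a common interface, HttpMethod, which declares the methods every request must implement. The interface declares quite a few methods; to make them easier to follow, they can be divided logically into a Request-related part and a Response-related part. The important ones are listed below.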
    public interface HttpMethod {

        // ---------------------------------------------------------------- Queries

        // Response-related part
        boolean validate();

        int getStatusCode();

        byte[] getResponseBody() throws IOException;

        String getResponseBodyAsString() throws IOException;

        InputStream getResponseBodyAsStream() throws IOException;

        int execute(HttpState state, HttpConnection connection)
            throws HttpException, IOException;

        void releaseConnection();

        boolean getDoAuthentication();

        void setDoAuthentication(boolean doAuthentication);

        public HttpMethodParams getParams();

        public void setParams(final HttpMethodParams params);

        public AuthState getHostAuthState();

        public AuthState getProxyAuthState();

        boolean isRequestSent();
    }
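When we execute a request, what actually gets called is the execute method of the class implementing this interface.
The interface is implemented by an abstract class, HttpMethodBase, which provides the methods shared by all of its subclasses (that is, by all request methods). These mainly handle the socket output and input streams, and the most important of them is execute.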
    /**
     * Executes this method using the specified <code>HttpConnection</code> and
     * <code>HttpState</code>.
     *
     * @param state {@link HttpState state} information to associate with this
     *        request. Must be non-null.
     * @param conn the {@link HttpConnection connection} used to execute
     *        this HTTP method. Must be non-null.
     *
     * @return the integer status code if one was obtained, or <tt>-1</tt>
     *
     * @throws IOException if an I/O (transport) error occurs
     * @throws HttpException if a protocol exception occurs.
     */
    public int execute(HttpState state, HttpConnection conn)
        throws HttpException, IOException {

        LOG.trace("enter HttpMethodBase.execute(HttpState, HttpConnection)");

        // this is our connection now, assign it to a local variable so
        // that it can be released later
        this.responseConnection = conn;

        checkExecuteConditions(state, conn);
        this.statusLine = null;
        this.connectionCloseForced = false;

        conn.setLastResponseInputStream(null);

        // determine the effective protocol version
        if (this.effectiveVersion == null) {
            this.effectiveVersion = this.params.getVersion();
        }

        // write the request to the socket output stream
        writeRequest(state, conn);
        this.requestSent = true;

        // read the response from the socket input stream
        readResponse(state, conn);

        // the method has successfully executed
        used = true;

        return statusLine.getStatusCode();
    }
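In the method above, writeRequest(state, conn) writes to the output stream and readResponse(state, conn) reads from the input stream.
Writing the request in writeRequest(state, conn) amounts to assembling the request data. This is the entry point Heritrix 3.1.0 hooks into: it rewrites the HttpMethodBase class and adds its own logic, including the writing of cookies and form parameters (I will come back to this part when analyzing Heritrix 3.1.0's custom cookie and form handling).
Besides the shared logic above, writeRequest also calls boolean writeRequestBody(HttpState state, HttpConnection conn), which is usually overridden by subclasses.
The subclasses of the abstract class HttpMethodBase provide the implementation specific to each request method. Here I only analyze the two classes defined by Heritrix 3.1.0 itself, HttpRecorderGetMethod and HttpRecorderPostMethod.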
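The flow described above is essentially a template method: the base class fixes the order (write the request, then read the response) and leaves the request body to subclasses. The following is only a minimal sketch of that shape, not the commons-httpclient code. All class names (`SketchMethodBase`, `SketchGet`, `SketchPost`, `TemplateMethodDemo`) are hypothetical, and in-memory streams stand in for the socket streams that HttpConnection would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo entry point; not part of HttpClient or Heritrix.
class TemplateMethodDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream(
                "HTTP/1.0 200 OK\r\n\r\nhello".getBytes(StandardCharsets.US_ASCII));
        SketchMethodBase post = new SketchPost("/login", "user=a&pass=b");
        System.out.println("status: " + post.execute(out, in));
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}

// Stand-in for HttpMethodBase: owns the fixed write-then-read sequence.
abstract class SketchMethodBase {
    protected final String path;
    private boolean requestSent = false;
    private int statusCode = -1;

    SketchMethodBase(String path) { this.path = path; }

    abstract String getName();

    // Hook for extra headers; the base class adds none.
    protected void addRequestHeaders(StringBuilder headers) { }

    // Hook for the body; the default writes nothing (enough for GET).
    protected boolean writeRequestBody(OutputStream out) throws IOException {
        return true;
    }

    public final int execute(OutputStream out, InputStream in) throws IOException {
        writeRequest(out);      // output stream side
        requestSent = true;
        readResponse(in);       // input stream side
        return statusCode;
    }

    private void writeRequest(OutputStream out) throws IOException {
        StringBuilder head = new StringBuilder();
        head.append(getName()).append(' ').append(path).append(" HTTP/1.0\r\n");
        addRequestHeaders(head);
        head.append("\r\n");
        out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
        writeRequestBody(out);  // subclass-specific part
        out.flush();
    }

    // Reads only the status line, e.g. "HTTP/1.0 200 OK".
    private void readResponse(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') line.append((char) b);
        }
        String[] parts = line.toString().split(" ");
        if (parts.length < 2) throw new IOException("malformed status line: " + line);
        statusCode = Integer.parseInt(parts[1]);
    }

    public boolean isRequestSent() { return requestSent; }
}

class SketchGet extends SketchMethodBase {
    SketchGet(String path) { super(path); }
    @Override String getName() { return "GET"; }
}

class SketchPost extends SketchMethodBase {
    private final byte[] body;

    SketchPost(String path, String form) {
        super(path);
        this.body = form.getBytes(StandardCharsets.US_ASCII);
    }

    @Override String getName() { return "POST"; }

    @Override protected void addRequestHeaders(StringBuilder headers) {
        headers.append("Content-Length: ").append(body.length).append("\r\n");
    }

    @Override protected boolean writeRequestBody(OutputStream out) throws IOException {
        out.write(body);
        return true;
    }
}
```

In this sketch `execute` is final so that subclasses can only customize the hooks. The real HttpMethodBase does not make that restriction, which is what lets Heritrix override `execute` itself.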
    public class HttpRecorderGetMethod extends GetMethod {

        protected static Logger logger =
            Logger.getLogger(HttpRecorderGetMethod.class.getName());

        /**
         * Instance of http recorder method.
         */
        protected HttpRecorderMethod httpRecorderMethod = null;

        public HttpRecorderGetMethod(String uri, Recorder recorder) {
            super(uri);
            this.httpRecorderMethod = new HttpRecorderMethod(recorder);
        }

        protected void readResponseBody(HttpState state,
                HttpConnection connection)
                throws IOException, HttpException {
            // We're about to read the body. Mark transition in http recorder.
            this.httpRecorderMethod.markContentBegin(connection);
            super.readResponseBody(state, connection);
        }

        protected boolean shouldCloseConnection(HttpConnection conn) {
            // Always close connection after each request. As best I can tell, this
            // is superfluous -- we've set our client to be HTTP/1.0. Doing this
            // out of paranoia.
            return true;
        }

        public int execute(HttpState state, HttpConnection conn)
                throws HttpException, IOException {
            // Save off the connection so we can close it on our way out in case
            // httpclient fails to (We're not supposed to have access to the
            // underlying connection object; am only violating contract because
            // see cases where httpclient is skipping out w/o cleaning up
            // after itself).
            this.httpRecorderMethod.setConnection(conn);
            return super.execute(state, conn);
        }

        protected void addProxyConnectionHeader(HttpState state,
                HttpConnection conn)
                throws IOException, HttpException {
            super.addProxyConnectionHeader(state, conn);
            this.httpRecorderMethod.handleAddProxyConnectionHeader(this);
        }
    }
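Besides the URL string, the constructor of this class takes a Recorder recorder object, which it uses to initialize the member HttpRecorderMethod httpRecorderMethod. That object in turn holds two members, Recorder httpRecorder and HttpConnection connection. The overridden methods of HttpRecorderGetMethod call the parent method of the same name and, in addition, the corresponding method of httpRecorderMethod. These calls set its HttpConnection connection member and call back into the Recorder httpRecorder object to prepare for reading the input stream.
HttpRecorderPostMethod extends PostMethod and follows essentially the same logic as HttpRecorderGetMethod, so I will not analyze it separately.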
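The "notify the recorder, then delegate to the parent" pattern can be illustrated in isolation. The sketch below uses hypothetical stand-ins (`SketchRecorder`, `SketchGetMethod`, `RecordingGetMethod`, `RecorderHookDemo`) rather than the real Heritrix Recorder and HttpRecorderMethod classes, which do considerably more. It assumes the recorder only needs to remember the byte offset at which the response body starts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo entry point; not part of HttpClient or Heritrix.
class RecorderHookDemo {
    public static void main(String[] args) throws IOException {
        byte[] raw = "HTTP/1.0 200 OK\r\n\r\nhello".getBytes(StandardCharsets.US_ASCII);
        SketchRecorder recorder = new SketchRecorder(new ByteArrayInputStream(raw));
        String body = new RecordingGetMethod(recorder).readResponse();
        System.out.println("content begins at byte " + recorder.getContentBegin());
        System.out.println("body = " + body);
    }
}

// Stand-in recorder: counts bytes read and remembers where the body starts.
class SketchRecorder {
    private final InputStream in;
    private long position = 0;
    private long contentBegin = -1;   // -1 means "not marked yet"

    SketchRecorder(InputStream in) { this.in = in; }

    int read() throws IOException {
        int b = in.read();
        if (b != -1) position++;
        return b;
    }

    void markContentBegin() { contentBegin = position; }

    long getContentBegin() { return contentBegin; }
}

// Stand-in for GetMethod: reads the header section, then the body.
class SketchGetMethod {
    protected final SketchRecorder recorder;

    SketchGetMethod(SketchRecorder recorder) { this.recorder = recorder; }

    public String readResponse() throws IOException {
        readHeaders();
        return readResponseBody();
    }

    // Consumes bytes up to and including the blank line "\r\n\r\n".
    private void readHeaders() throws IOException {
        final String end = "\r\n\r\n";
        int matched = 0;
        while (matched < end.length()) {
            int b = recorder.read();
            if (b == -1) return;
            if (b == end.charAt(matched)) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
    }

    protected String readResponseBody() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = recorder.read()) != -1) sb.append((char) b);
        return sb.toString();
    }
}

// Mirrors the shape of HttpRecorderGetMethod.readResponseBody:
// mark the transition first, then let the parent do the actual reading.
class RecordingGetMethod extends SketchGetMethod {
    RecordingGetMethod(SketchRecorder recorder) { super(recorder); }

    @Override
    protected String readResponseBody() throws IOException {
        recorder.markContentBegin();
        return super.readResponseBody();
    }
}
```

The subclass adds no reading logic of its own. It only inserts a callback at the point where the headers have been consumed and the body has not, which is the same point at which HttpRecorderGetMethod calls markContentBegin.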
---------------------------------------------------------------------------
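This Heritrix 3.1.0 source code analysis series is my own original work.
If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯
Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048387.html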