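In Heritrix 3.1.0 Source Code Analysis (23), we looked at how Heritrix 3.1.0 extends HttpClient's HttpConnection connection object and its management interface, HttpConnectionManager.
The HttpConnection object opens the socket, but at that point nothing has been written to the socket's output stream or read from its input stream. How does HttpClient handle this, and how does Heritrix 3.1.0 extend it? The entry point is HttpClient's HttpMethod interface (excerpt):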
```java
package org.apache.commons.httpclient;

import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.httpclient.auth.AuthState;
import org.apache.commons.httpclient.params.HttpMethodParams;

// Excerpt of the HttpClient 3.x HttpMethod interface; many methods are omitted.
public interface HttpMethod {
    // ---------------------------------------------------------------- Queries
    // Response-related methods
    boolean validate();
    int getStatusCode();
    byte[] getResponseBody() throws IOException;
    String getResponseBodyAsString() throws IOException;
    InputStream getResponseBodyAsStream() throws IOException;
    int execute(HttpState state, HttpConnection connection)
        throws HttpException, IOException;
    void releaseConnection();
    boolean getDoAuthentication();
    void setDoAuthentication(boolean doAuthentication);
    public HttpMethodParams getParams();
    public void setParams(final HttpMethodParams params);
    public AuthState getHostAuthState();
    public AuthState getProxyAuthState();
    boolean isRequestSent();
}
```
HttpMethodBase, the abstract base class that implements HttpMethod, implements execute as follows:

```java
/**
 * Executes this method using the specified <code>HttpConnection</code> and
 * <code>HttpState</code>.
 *
 * @param state {@link HttpState state} information to associate with this
 *        request. Must be non-null.
 * @param conn the {@link HttpConnection connection} to use to execute
 *        this HTTP method. Must be non-null.
 *
 * @return the integer status code if one was obtained, or <tt>-1</tt>
 *
 * @throws IOException if an I/O (transport) error occurs
 * @throws HttpException if a protocol exception occurs.
 */
public int execute(HttpState state, HttpConnection conn)
    throws HttpException, IOException {

    LOG.trace("enter HttpMethodBase.execute(HttpState, HttpConnection)");

    // this is our connection now, assign it to a local variable so
    // that it can be released later
    this.responseConnection = conn;

    checkExecuteConditions(state, conn);
    this.statusLine = null;
    this.connectionCloseForced = false;

    conn.setLastResponseInputStream(null);

    // determine the effective protocol version
    if (this.effectiveVersion == null) {
        this.effectiveVersion = this.params.getVersion();
    }

    // write the request to the socket's output stream
    writeRequest(state, conn);
    this.requestSent = true;
    // read the response from the socket's input stream
    readResponse(state, conn);
    // the method has successfully executed
    used = true;

    return statusLine.getStatusCode();
}
```
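In the method above, writeRequest(state, conn) writes to the output stream and readResponse(state, conn) reads from the input stream.
writeRequest(state, conn) essentially assembles the request data and writes it out. This is exactly where Heritrix 3.1.0 hooks in: it provides its own rewritten version of the HttpMethodBase class with custom logic, including how cookies and form parameters are written (that part will be covered later, when we analyze Heritrix 3.1.0's custom cookie and form handling).
Besides this common logic, writeRequest also calls boolean writeRequestBody(HttpState state, HttpConnection conn), which is normally implemented by subclasses.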
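The write-then-read order inside execute can be illustrated with a minimal, self-contained sketch. MiniHttpMethod and its methods are hypothetical and heavily simplified (only a Host header, no body, no chunked encoding, no keep-alive); it is not HttpClient code. In-memory streams stand in for the socket's streams so the example runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for HttpMethodBase; not the HttpClient API. */
public class MiniHttpMethod {
    private final String path;
    private final String host;
    private boolean requestSent = false;

    public MiniHttpMethod(String path, String host) {
        this.path = path;
        this.host = host;
    }

    /** Same order as HttpMethodBase.execute: write the request, then read the response. */
    public int execute(InputStream in, OutputStream out) throws IOException {
        writeRequest(out);        // the real code writes to the socket's output stream
        requestSent = true;
        return readResponse(in);  // the real code reads from the socket's input stream
    }

    /** Assembles the request line and headers, ending with the blank line. */
    protected void writeRequest(OutputStream out) throws IOException {
        String head = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    /** Parses only the status line, e.g. "HTTP/1.0 200 OK" yields 200. */
    protected int readResponse(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        String statusLine = reader.readLine();
        if (statusLine == null) {
            throw new IOException("connection closed before the status line");
        }
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IOException("malformed status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public boolean isRequestSent() {
        return requestSent;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for the socket output
        InputStream in = new ByteArrayInputStream(               // stands in for the socket input
                "HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        MiniHttpMethod method = new MiniHttpMethod("/index.html", "example.org");
        int status = method.execute(in, out);
        System.out.println("status = " + status);
        System.out.print(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

A BufferedReader is acceptable here only because nothing after the status line is parsed; code that goes on to read the body must not let a reader buffer past the headers.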
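The split between shared header logic in writeRequest and the subclass hook writeRequestBody is an instance of the template-method pattern. The sketch below is hypothetical: SketchMethodBase, SketchGet and SketchPost are illustrative names, signatures are simplified (no HttpState or HttpConnection), and the path and Host header are hard-coded. In HttpClient 3.x the body-writing override lives in EntityEnclosingMethod, the superclass of PostMethod and PutMethod, while GetMethod keeps the base class default of writing no body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical template-method sketch; names echo HttpMethodBase, signatures are simplified. */
abstract class SketchMethodBase {
    protected abstract String getName();

    /** Shared logic: request line, headers, blank line, then the body hook. */
    public final void writeRequest(OutputStream out) throws IOException {
        StringBuilder head = new StringBuilder();
        head.append(getName()).append(" / HTTP/1.0\r\n");
        writeRequestHeaders(head);
        head.append("\r\n");
        out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
        writeRequestBody(out);
        out.flush();
    }

    protected void writeRequestHeaders(StringBuilder head) {
        head.append("Host: example.org\r\n");
    }

    /** Default: no body, as for GET. Returns true once the body has been written. */
    protected boolean writeRequestBody(OutputStream out) throws IOException {
        return true;
    }
}

class SketchGet extends SketchMethodBase {
    @Override
    protected String getName() { return "GET"; }
}

class SketchPost extends SketchMethodBase {
    private final byte[] body;

    SketchPost(String formData) {
        this.body = formData.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    protected String getName() { return "POST"; }

    @Override
    protected void writeRequestHeaders(StringBuilder head) {
        super.writeRequestHeaders(head);
        head.append("Content-Type: application/x-www-form-urlencoded\r\n");
        head.append("Content-Length: ").append(body.length).append("\r\n");
    }

    /** Subclass hook: write the form body after the headers. */
    @Override
    protected boolean writeRequestBody(OutputStream out) throws IOException {
        out.write(body);
        return true;
    }
}

public class WriteRequestSketch {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new SketchPost("q=heritrix").writeRequest(out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Making writeRequest final keeps the wire layout in one place, so a subclass can only change headers and the body through the designated hooks.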
Heritrix 3.1.0 extends HttpClient's GetMethod with its own HttpRecorderGetMethod class:

```java
package org.archive.httpclient;

import java.io.IOException;
import java.util.logging.Logger;

import org.apache.commons.httpclient.HttpConnection;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.methods.GetMethod;
import org.archive.util.Recorder;

public class HttpRecorderGetMethod extends GetMethod {

    protected static Logger logger =
        Logger.getLogger(HttpRecorderGetMethod.class.getName());

    /**
     * Instance of http recorder method.
     */
    protected HttpRecorderMethod httpRecorderMethod = null;

    public HttpRecorderGetMethod(String uri, Recorder recorder) {
        super(uri);
        this.httpRecorderMethod = new HttpRecorderMethod(recorder);
    }

    protected void readResponseBody(HttpState state, HttpConnection connection)
            throws IOException, HttpException {
        // We're about to read the body. Mark transition in http recorder.
        this.httpRecorderMethod.markContentBegin(connection);
        super.readResponseBody(state, connection);
    }

    protected boolean shouldCloseConnection(HttpConnection conn) {
        // Always close connection after each request. As best I can tell, this
        // is superfluous -- we've set our client to be HTTP/1.0. Doing this
        // out of paranoia.
        return true;
    }

    public int execute(HttpState state, HttpConnection conn)
            throws HttpException, IOException {
        // Save off the connection so we can close it on our way out in case
        // httpclient fails to (We're not supposed to have access to the
        // underlying connection object; am only violating contract because
        // see cases where httpclient is skipping out w/o cleaning up
        // after itself).
        this.httpRecorderMethod.setConnection(conn);
        return super.execute(state, conn);
    }

    protected void addProxyConnectionHeader(HttpState state, HttpConnection conn)
            throws IOException, HttpException {
        super.addProxyConnectionHeader(state, conn);
        this.httpRecorderMethod.handleAddProxyConnectionHeader(this);
    }
}
```
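Besides the URL string, the constructor takes a Recorder recorder, which is used to initialize the member HttpRecorderMethod httpRecorderMethod. That object holds two members: a Recorder httpRecorder and an HttpConnection connection. The overridden methods in HttpRecorderGetMethod (the same pattern applies to HttpRecorderPostMethod) mostly call the parent method of the same name and delegate to httpRecorderMethod: execute stores the HttpConnection in its connection member, readResponseBody calls markContentBegin(connection) before the body is read, which calls back into the Recorder httpRecorder (preparation for recording the input stream), and addProxyConnectionHeader calls handleAddProxyConnectionHeader. The exception is shouldCloseConnection, which does not call the parent and simply returns true, so the connection is closed after every request.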
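The purpose of markContentBegin, per the code comment above, is to mark the transition from headers to body in the recorder. The following is a simplified, hypothetical model of that idea: SketchRecorder and SketchRecorderGetMethod are illustrative names, not Heritrix's Recorder or HttpRecorderMethod API, and the recorder simply keeps an in-memory copy of every byte read plus the offset at which the body starts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical recorder: copies every byte read and remembers where the body starts. */
class SketchRecorder {
    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();
    private long contentBegin = -1;

    InputStream wrap(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public int read() throws IOException {
                int b = super.read();
                if (b != -1) recorded.write(b);
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n > 0) recorded.write(buf, off, n);
                return n;
            }
        };
    }

    void markContentBegin() { contentBegin = recorded.size(); }
    long getContentBegin() { return contentBegin; }
    byte[] getRecorded() { return recorded.toByteArray(); }
}

/** Hypothetical GET method that delegates to the recorder, in the spirit of HttpRecorderGetMethod. */
class SketchRecorderGetMethod {
    private static final byte[] END_OF_HEADERS = {'\r', '\n', '\r', '\n'};
    private final SketchRecorder recorder;

    SketchRecorderGetMethod(SketchRecorder recorder) { this.recorder = recorder; }

    /** Consumes the status line and headers byte by byte, so nothing past the blank line is read. */
    protected void readResponseHeaders(InputStream in) throws IOException {
        int matched = 0; // length of the "\r\n\r\n" prefix seen so far
        while (matched < END_OF_HEADERS.length) {
            int b = in.read();
            if (b == -1) throw new IOException("stream ended inside the headers");
            if (b == END_OF_HEADERS[matched]) matched++;
            else matched = (b == '\r') ? 1 : 0;
        }
    }

    /** Marks the header/body transition in the recorder, then reads the body. */
    protected byte[] readResponseBody(InputStream in) throws IOException {
        recorder.markContentBegin();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) body.write(buf, 0, n);
        return body.toByteArray();
    }

    byte[] execute(InputStream raw) throws IOException {
        InputStream in = recorder.wrap(raw);
        readResponseHeaders(in);
        return readResponseBody(in);
    }
}

public class RecorderSketch {
    public static void main(String[] args) throws IOException {
        String response = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello";
        SketchRecorder recorder = new SketchRecorder();
        byte[] body = new SketchRecorderGetMethod(recorder).execute(
                new ByteArrayInputStream(response.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("body = " + new String(body, StandardCharsets.US_ASCII));
        System.out.println("body starts at offset " + recorder.getContentBegin());
    }
}
```

Because the recorder sees the raw bytes, a single recorded stream can later be split into headers and body using the marked offset, without re-reading from the network.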
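The Heritrix 3.1.0 Source Code Analysis series is the author's original work.
Please credit the source when reposting: cnblogs (博客园), 刺猬的温驯
Original link: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048387.html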