Runtime Inline Cache

Start with compilation. Taking C1 as an example, suppose the Java code contains an invoke instruction:

void foo() {
    base.vcall();
}

When C1 compiles foo and reaches base.vcall(), it generates an HIR instruction, Invoke, which is later lowered to a LIR_OpJavaCall:

void LIRGenerator::do_Invoke(Invoke* x) {
  ...
  switch (x->code()) {
    case Bytecodes::_invokestatic:
      __ call_static(target, result_register,
                     SharedRuntime::get_resolve_static_call_stub(),
                     arg_list, info);
      break;
    case Bytecodes::_invokespecial:
    case Bytecodes::_invokevirtual:
    case Bytecodes::_invokeinterface:
      // for loaded and final (method or class) target we still produce an inline cache,
      // in order to be able to call mixed mode
      if (x->code() == Bytecodes::_invokespecial || x->target_is_final()) {
        __ call_opt_virtual(target, receiver, result_register,
                            SharedRuntime::get_resolve_opt_virtual_call_stub(),
                            arg_list, info);
      } else {
        __ call_icvirtual(target, receiver, result_register,
                          SharedRuntime::get_resolve_virtual_call_stub(),
                          arg_list, info);
      }
      break;
    case Bytecodes::_invokedynamic: {
      __ call_dynamic(target, receiver, result_register,
                      SharedRuntime::get_resolve_static_call_stub(),
                      arg_list, info);
      break;
    }
    default:
      fatal("unexpected bytecode: %s", Bytecodes::name(x->code()));
      break;
  }

}

Note call_icvirtual: its ciMethod is target, but the address the call ultimately jumps to (dest) is SharedRuntime::get_resolve_virtual_call_stub. The next lowering step is:

void LIR_Assembler::ic_call(LIR_OpJavaCall* op) {
  __ ic_call(op->addr());
  add_call_info(code_offset(), op->info());
  assert((__ offset() - NativeCall::instruction_size + NativeCall::displacement_offset) % BytesPerWord == 0,
         "must be aligned");
}
void MacroAssembler::ic_call(address entry, jint method_index) {
  RelocationHolder rh = virtual_call_Relocation::spec(pc(), method_index);
  movptr(rax, (intptr_t)Universe::non_oop_word());
  call(AddressLiteral(entry, rh));
}

ic_call expands to two instructions:

foo:
  mov rax, non_oop
  call addr(SharedRuntime::get_resolve_virtual_call_stub)

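So the code finally generated for this vcall() is the two instructions above. It does not call vcall directly; it first goes through get_resolve_virtual_call_stub. That stub checks whether base.vcall has so far been a monomorphic call. If it has, the stub finds the real address of vcall (call it real_vcall), marks the call site as monomorphic (set_to_monomorphic), and returns that address. It also patches the non_oop in the mov, replacing it with the cached value for the resolved target (real_call_receiver below, which the callee later compares against the receiver's klass), so the instructions above become: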

foo:
  mov rax, real_call_receiver
  call real_vcall
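
The transition from the unresolved call site to the patched one can be pictured with a small toy model. This is only a sketch, not HotSpot code: Klass, Receiver, CallSite, resolve_stub, direct_call and the other names are hypothetical stand-ins, a null pointer plays the role of non_oop, and a function pointer plays the role of the patched call destination.

```cpp
#include <cassert>

// Toy model, not HotSpot code. Every name here is hypothetical.
struct Klass {
    const char* name;
    int (*vcall_impl)();          // the "real_vcall" for this class
};

struct Receiver {
    const Klass* klass;
};

struct CallSite;
using Target = int (*)(CallSite&, const Receiver&);

struct CallSite {
    const Klass* cached_klass;    // models the immediate in "mov rax, ..."
    Target       target;          // models the destination of "call ..."
    int          resolve_count;   // how often the resolve stub ran
};

// Models the patched state: the call goes straight to the real method.
int direct_call(CallSite&, const Receiver& receiver) {
    return receiver.klass->vcall_impl();
}

// Models the resolve stub: find the real target, patch the site, finish the call.
int resolve_stub(CallSite& site, const Receiver& receiver) {
    ++site.resolve_count;
    site.cached_klass = receiver.klass;  // patch the mov (nullptr models non_oop)
    site.target = direct_call;           // patch the call destination
    return site.target(site, receiver);
}

// A freshly compiled call site: nothing cached, call points at the resolve stub.
CallSite make_unresolved_site() {
    return CallSite{nullptr, resolve_stub, 0};
}

int invoke(CallSite& site, const Receiver& receiver) {
    return site.target(site, receiver);
}

int derived_vcall() { return 42; }

// Usage: the first call resolves and patches, the second goes direct.
void demo_resolve_once() {
    Klass derived{"Derived", derived_vcall};
    Receiver obj{&derived};
    CallSite site = make_unresolved_site();
    int first = invoke(site, obj);
    int second = invoke(site, obj);
    assert(first == 42);
    assert(second == 42);
    assert(site.resolve_count == 1);
    assert(site.cached_klass == &derived);
}
```

In this model the resolve stub runs exactly once per call site; every later call pays only for the indirect jump, which is the point of patching the caller.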

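That is roughly all there is on the caller side. The callee needs a check as well: once the caller has been patched, the callee must verify that the receiver is still the one the caller assumed. Again, start with compilation.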

void LIR_Assembler::emit_op0(LIR_Op0* op) {
  switch (op->code()) {
    ...
    case lir_std_entry:
      // init offsets
      offsets()->set_value(CodeOffsets::OSR_Entry, _masm->offset());
      _masm->align(CodeEntryAlignment);
      if (needs_icache(compilation()->method())) {
        check_icache();
      }
      offsets()->set_value(CodeOffsets::Verified_Entry, _masm->offset());
      _masm->verified_entry(compilation()->directive()->BreakAtExecuteOption);
      if (needs_clinit_barrier_on_entry(compilation()->method())) {
        clinit_barrier(compilation()->method());
      }
      build_frame();
      offsets()->set_value(CodeOffsets::Frame_Complete, _masm->offset());
      break;
    ...
}

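When the vcall method is compiled, the IC check instructions are emitted first at the start of the method, then the Verified entry of vcall, and finally the clinit barrier instructions. The rest of the method follows, so the code now looks roughly like this: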

foo:
  mov rax, non_oop
  call addr(SharedRuntime::get_resolve_virtual_call_stub) ; assume resolve has not run yet

vcall:
  // method entry
  mov rscratch1, addr(receiver, klass_offset)
  cmp rscratch1, rax
  jne get_ic_miss_stub
  ... // clinit_barrier
  // method body
  ...

check_icache is where the inline cache (IC for short from here on) check is inserted.

int LIR_Assembler::check_icache() {
  Register receiver = FrameMap::receiver_opr->as_register();
  Register ic_klass = IC_Klass;
  ...
  int offset = __ offset();
  __ inline_cache_check(receiver, IC_Klass);
  ...
  return offset;
}
void C1_MacroAssembler::inline_cache_check(Register receiver, Register iCache) {
  ...
  if (UseCompressedClassPointers) {
    load_klass(rscratch1, receiver, tmp_load_klass);
    cmpptr(rscratch1, iCache);
  } else {
    cmpptr(iCache, Address(receiver, oopDesc::klass_offset_in_bytes()));
  }
  jump_cc(Assembler::notEqual,
          RuntimeAddress(SharedRuntime::get_ic_miss_stub()));
  ...
}
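
The check that inline_cache_check emits can be modeled in a few lines. This is a sketch under simplifying assumptions, not HotSpot code: Klass, Receiver, Stats, unverified_entry and verified_entry are hypothetical names, the ic_klass parameter stands in for the value the caller left in rax, and the miss path only counts the miss instead of re-patching anything.

```cpp
#include <cassert>

// Toy model, not HotSpot code. Every name here is hypothetical.
struct Klass { const char* name; };
struct Receiver { const Klass* klass; };

struct Stats {
    int body_runs = 0;   // times the method body executed
    int misses = 0;      // times the miss path was taken
};

// Models the verified entry point: no check, just the method body.
void verified_entry(Stats& stats) {
    ++stats.body_runs;
}

// Models the unverified entry point. ic_klass plays the role of rax.
// Returns true on an IC hit, false when control would go to the miss stub.
bool unverified_entry(const Receiver& receiver, const Klass* ic_klass, Stats& stats) {
    if (receiver.klass != ic_klass) {    // cmp + jne get_ic_miss_stub
        ++stats.misses;
        return false;
    }
    verified_entry(stats);               // fall through past the check
    return true;
}

// Usage: one call with the cached klass, one with a different klass.
void demo_hit_and_miss() {
    Klass a{"A"};
    Klass b{"B"};
    Receiver ra{&a};
    Receiver rb{&b};
    Stats stats;
    bool hit = unverified_entry(ra, &a, stats);    // same klass: hit
    bool miss = unverified_entry(rb, &a, stats);   // different klass: miss
    assert(hit);
    assert(!miss);
    assert(stats.body_runs == 1);
    assert(stats.misses == 1);
}
```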

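IC_Klass is rax, the same register the caller's call site loads. inline_cache_check loads the klass of the current receiver object and compares it with rax. If they are equal, the IC hits and nothing else happens. If they differ, control goes to get_ic_miss_stub, which may patch the code at the caller depending on whether vcall is an interface call or a virtual call. Suppose it is a virtual call: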

foo:
  mov rax, non_oop
  call vtable_stub

vcall:
  // method entry: unverified_entry_point
  mov rscratch1, addr(receiver, klass_offset)
  cmp rscratch1, rax
  jne get_ic_miss_stub
  // verified_entry_point
  ... // clinit_barrier
  // method body
  ...

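The caller is then patched to call a vtable_stub. The vtable_stub works much like C++ virtual dispatch: look up the virtual table, find the target method, and call it, which is considerably slower than the direct call used before. You may have noticed that once the caller has been changed this way, the callee's IC check instructions are no longer needed, so the vtable_stub jumps straight to verified_entry_point, the code address that skips the IC check. The matching concept, unverified_entry_point, is the code address that includes the IC check.

There is also a supplementary article worth reading, https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall, which I only found after writing this; it is more detailed and more precise.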
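
The vtable_stub path described above can be sketched as ordinary table-based dispatch. This is a toy model under stated assumptions, not HotSpot code: Klass, Receiver, VerifiedEntry, vtable_stub and the slot index are hypothetical, and each table entry stands for a method's verified entry point, so no klass comparison happens on the way in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not HotSpot code. Every name here is hypothetical.
struct Receiver;
using VerifiedEntry = int (*)(const Receiver&);  // entry that skips the IC check

struct Klass {
    const char* name;
    std::vector<VerifiedEntry> vtable;
};

struct Receiver {
    const Klass* klass;
};

int base_vcall(const Receiver&)    { return 1; }
int derived_vcall(const Receiver&) { return 2; }

// Models a vtable stub for one vtable slot: load klass, index the table, call.
int vtable_stub(const Receiver& receiver, std::size_t slot) {
    VerifiedEntry entry = receiver.klass->vtable.at(slot);
    return entry(receiver);
}

// Usage: one call site, two receiver types, no cached klass involved.
void demo_vtable_dispatch() {
    const std::size_t kVcallSlot = 0;   // hypothetical slot index of vcall
    Klass base{"Base", {base_vcall}};
    Klass derived{"Derived", {derived_vcall}};
    Receiver rb{&base};
    Receiver rd{&derived};
    assert(vtable_stub(rb, kVcallSlot) == 1);
    assert(vtable_stub(rd, kVcallSlot) == 2);
}
```

The extra load and indirect call on every invocation are what make this path slower than the monomorphic case, in exchange for handling any receiver type.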

Original article: https://www.cnblogs.com/kelthuzadx/p/16038179.html