• scoreboarding


    Reference docs:

    https://en.wikipedia.org/wiki/Scoreboarding

    SSC_course_5_Scoreboard_ex.pdf

    1, what is scoreboarding

    Scoreboarding is a centralized method for dynamically scheduling a pipeline so that instructions can execute out of order when there are no conflicts and the required hardware is available.

    The name comes from how it decides whether an action is ready to go: it consults a set of status tables, much like reading the scoreboard at a baseball game.

    2, principle

    In a scoreboard, the data dependencies of every instruction are logged. Instructions are released only when the scoreboard determines that there are no conflicts with previously issued and incomplete instructions.

    The logging is not encoded in the instruction itself; the log is recorded while the instruction moves through the pipeline, so scoreboarding should be thought of as part of the pipeline.

    3, scoreboarding stages and each stage’s responsibilities

    After being fetched, an instruction goes through 4 stages: issue, read, execute and write back.

    1) issue

    what to do here: Check which registers will be read and written by this instruction. The instruction stalls until the functional unit it needs is free and every in-flight instruction that intends to write to the same register has completed.

    issue = ID + structural/WAW hazard check

    2) read

    what to do here: After an instruction has been issued and correctly allocated to the required hardware module, the instruction waits until all operands become available.

    The read stage avoids RAW hazards. For it to proceed, Rj and Rk must both be Yes (these flags are defined in section 4).

    3) execute

    what to do here: When all operands have been fetched, the functional unit starts its execution.

    4) write back

    what to do here: In this stage the result is written to its destination register.

    Before writing, the functional unit must wait until every earlier instruction that still needs the old value of the destination register has read its operands.

    This check avoids WAR hazards.
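    The three data hazards that the stages above guard against can be told apart from the registers two instructions read and write. The following is a minimal sketch; the `Instr` record and the `hazards` helper are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple, Set


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, two sources."""
    op: str
    dst: str
    src1: str
    src2: str


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the data hazards that `later` has with respect to `earlier`."""
    found = set()
    if earlier.dst in (later.src1, later.src2):
        found.add("RAW")  # later reads what earlier writes: handled in the read stage
    if later.dst in (earlier.src1, earlier.src2):
        found.add("WAR")  # later overwrites what earlier reads: handled at write back
    if later.dst == earlier.dst:
        found.add("WAW")  # both write the same register: handled at issue
    return found


mult = Instr("MULT", "F4", "F2", "F2")
addd = Instr("ADDD", "F2", "F0", "F6")
print(hazards(mult, addd))  # {'WAR'}
```

    Swapping the two instructions turns the same register overlap into a RAW hazard, which is why program order matters for the checks.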

    4, data structure

    Scoreboarding maintains 3 status tables: instruction status, functional unit status and register result status.

    An example is shown below:

    [image: example of the three status tables]

    1) instruction status

    Records which of the 4 stages above each instruction is in.

    2) register result status

    Records, for each register, which functional unit will write to it.

    3) functional unit status

    Each functional unit maintains 9 fields to indicate its status:

    • Busy: Indicates whether the unit is being used or not
    • Op: Operation to perform in the unit (e.g. MUL, DIV or MOD)
    • Fi: Destination register -- which register would be written
    • Fj, Fk: Source register numbers (src1 and src2)
    • Qj, Qk: Functional units that will produce the source registers Fj, Fk, i.e. which units will generate the src1 and src2 values
    • Rj, Rk: Flags indicating that Fj, Fk are ready and have not yet been read, i.e. whether src1 and src2 are available

    An example would look like this:

    [image: example functional unit status]

    See the reference PDF for details.
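    As a rough illustration, the three tables could be modelled as follows. This is a sketch under assumptions: the class and variable names, the unit names (`Integer`, `Mult1`, ...) and the even-numbered `F` registers are all made up for the example and do not come from the reference material.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    """The four scoreboard stages an instruction passes through."""
    ISSUE = 1
    READ = 2
    EXECUTE = 3
    WRITE_BACK = 4


@dataclass
class FUStatus:
    """One row of the functional unit status table (9 fields)."""
    busy: bool = False
    op: Optional[str] = None   # operation to perform
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing fj, None if no unit is pending
    qk: Optional[str] = None   # unit producing fk, None if no unit is pending
    rj: bool = False           # fj is ready and not yet read
    rk: bool = False           # fk is ready and not yet read


# Instruction status: instruction index -> stage it has reached.
instruction_status: Dict[int, Stage] = {}
# Functional unit status: one record per unit (unit names are assumptions).
fu_status: Dict[str, FUStatus] = {
    name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")
}
# Register result status: register -> unit that will write it, or None.
register_result: Dict[str, Optional[str]] = {f"F{n}": None for n in range(0, 32, 2)}

# State right after issuing "MULT F4, F2, F2" to Mult1 on an otherwise idle machine.
instruction_status[0] = Stage.ISSUE
fu_status["Mult1"] = FUStatus(busy=True, op="MULT", fi="F4", fj="F2", fk="F2",
                              rj=True, rk=True)
register_result["F4"] = "Mult1"
print(len(fields(FUStatus)))  # 9
```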

    5, algorithm as pseudocode functions

    Each stage of scoreboarding can be implemented as follows:

    1) issue

     function issue(op, dst, src1, src2)
        wait until (!Busy[FU] AND !Result[dst]); // FU can be any functional unit that can execute operation op

    -- Conditions: (1) the FU is not in use; (2) no other active instruction writes to the same destination register, i.e. no WAW hazard

        Busy[FU] ← Yes;
        Op[FU] ← op;
        Fi[FU] ← dst;
        Fj[FU] ← src1;
        Fk[FU] ← src2;
        Qj[FU] ← Result[src1];
        Qk[FU] ← Result[src2];
        Rj[FU] ← Qj[FU] == 0;
        Rk[FU] ← Qk[FU] == 0;
        Result[dst] ← FU;
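    The issue step can be sketched in Python as below. This is an illustrative translation, not a reference implementation: `new_fu` and `try_issue` are hypothetical names, `None` stands in for the pseudocode's `0`, and instead of blocking on "wait until" the function returns `False` so the caller can retry on a later cycle.

```python
def new_fu():
    """Create an idle functional unit status record."""
    return {"Busy": False, "Op": None, "Fi": None, "Fj": None, "Fk": None,
            "Qj": None, "Qk": None, "Rj": False, "Rk": False}


def try_issue(fus, result, fu, op, dst, src1, src2):
    """Try to issue an instruction to unit `fu`; return False if it must stall."""
    unit = fus[fu]
    if unit["Busy"] or result.get(dst) is not None:
        return False  # structural hazard (unit busy) or WAW hazard (dst pending)
    unit.update(Busy=True, Op=op, Fi=dst, Fj=src1, Fk=src2,
                Qj=result.get(src1), Qk=result.get(src2))
    # An operand is ready when no functional unit is still going to produce it.
    unit["Rj"] = unit["Qj"] is None
    unit["Rk"] = unit["Qk"] is None
    result[dst] = fu  # record that this unit will write dst
    return True


fus = {"Mult1": new_fu(), "Add": new_fu()}
result = {}
print(try_issue(fus, result, "Mult1", "MULT", "F4", "F2", "F2"))  # True
print(try_issue(fus, result, "Add", "ADDD", "F4", "F0", "F6"))    # False (WAW on F4)
print(try_issue(fus, result, "Add", "ADDD", "F6", "F4", "F0"))    # True
print(fus["Add"]["Qj"], fus["Add"]["Rj"])                         # Mult1 False
```

    The last line shows the RAW dependence being recorded at issue time: the adder knows that F4 will come from `Mult1` and marks that operand as not ready.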

    2) read

     function read_operands(FU)
        wait until (Rj[FU] AND Rk[FU]);
    -- Condition: Rj and Rk are both Yes
        Rj[FU] ← No;
        Rk[FU] ← No;
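    A polling version of the read stage might look like the sketch below. The function name `try_read_operands` and the dictionary layout of the status record are assumptions made for this example.

```python
def try_read_operands(unit):
    """Read both operands if they are ready; return False if the unit must wait."""
    if not (unit["Rj"] and unit["Rk"]):
        return False  # RAW hazard: at least one producer has not written back yet
    # Mark both operands as consumed so a later writer of Fj/Fk is not blocked.
    unit["Rj"] = False
    unit["Rk"] = False
    return True


# Status record of an adder waiting for F4, which Mult1 has not produced yet.
add = {"Fj": "F4", "Fk": "F0", "Qj": "Mult1", "Qk": None, "Rj": False, "Rk": True}
print(try_read_operands(add))  # False
add["Rj"] = True               # what the write back of Mult1 would do
print(try_read_operands(add))  # True
print(add["Rj"], add["Rk"])    # False False
```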

    3) execute

     function execute(FU)
        // Execute whatever FU must do

    4) write back

     function write_back(FU)
        wait until (∀f {(Fj[f]≠Fi[FU] OR Rj[f]=No) AND (Fk[f]≠Fi[FU] OR Rk[f]=No)})
    -- Condition: no functional unit f still has to read the old value of the destination register Fi[FU], i.e. no WAR hazard
        foreach f do
            if Qj[f]=FU then Rj[f] ← Yes;
            if Qk[f]=FU then Rk[f] ← Yes;
        Result[Fi[FU]] ← 0; // 0 means no FU generates the register's result
        Busy[FU] ← No;
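    The write-back check is the least obvious of the four, so here is a small sketch that mirrors the pseudocode step by step. The names `unit` and `try_write_back` are hypothetical, `None` replaces the pseudocode's `0`, and the scenario (a divider that has not yet read F6 while the adder wants to overwrite F6) is constructed by hand for the example.

```python
def unit(**fields):
    """Build a functional unit status record; unspecified fields are idle defaults."""
    record = {"Busy": False, "Op": None, "Fi": None, "Fj": None, "Fk": None,
              "Qj": None, "Qk": None, "Rj": False, "Rk": False}
    record.update(fields)
    return record


def try_write_back(fus, result, fu):
    """Write the result of `fu` if no WAR hazard exists; return False to stall."""
    dst = fus[fu]["Fi"]
    for other in fus.values():
        # A unit with Rj/Rk == Yes on dst has not read the old value yet.
        if (other["Fj"] == dst and other["Rj"]) or (other["Fk"] == dst and other["Rk"]):
            return False
    for other in fus.values():
        # Wake up units that were waiting for this result.
        if other["Qj"] == fu:
            other["Rj"] = True
        if other["Qk"] == fu:
            other["Rk"] = True
    result[dst] = None  # no unit is going to write dst any more
    fus[fu]["Busy"] = False
    return True


fus = {
    # DIVD F10, F0, F6: still waiting for F0 from Mult1, so F6 is not read yet.
    "Divide": unit(Busy=True, Op="DIVD", Fi="F10", Fj="F0", Fk="F6",
                   Qj="Mult1", Rj=False, Rk=True),
    # ADDD F6, F8, F2: operands already read, result ready to be written.
    "Add": unit(Busy=True, Op="ADDD", Fi="F6", Fj="F8", Fk="F2"),
}
result = {"F10": "Divide", "F6": "Add"}
print(try_write_back(fus, result, "Add"))  # False (WAR on F6)
fus["Divide"]["Rk"] = False                # pretend DIVD has now read its operands
print(try_write_back(fus, result, "Add"))  # True
```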

    Again, the algorithm above may look odd, but it makes complete sense after working through the example in the PDF.

    6, typical scoreboarding structure

    [image: typical scoreboarding structure]

    2 FP multipliers, 1 FP adder, 1 FP divider, 1 integer unit

    7, scoreboarding limitation

    (1) stall on name dependencies

    For example,

            MULT F4, F2, F2

            ADDD F2, F0, F6

    With the destination of ADDD renamed, the same computation can be written as:

            MULT F4, F2, F2

            ADDD F8, F0, F6

    but scoreboarding cannot tell the difference. To the scoreboard, the first pair has a WAR hazard on F2, so ADDD must stall at write back until MULT has read F2.

    Likewise, the scoreboard treats a name dependence between two writes to the same register as a WAW hazard and stalls at issue.

    (Tomasulo’s algorithm removes this limitation through reservation stations and register renaming.)
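    To see why renaming removes the stall, the sketch below gives every write a fresh register name while reads keep using the latest mapping. It is a simplified illustration of the renaming idea only, not Tomasulo’s algorithm; the function `rename` and the `P0, P1, ...` physical register names are invented for the example.

```python
from itertools import count


def rename(program):
    """Rename registers so that every instruction writes a fresh register."""
    fresh = (f"P{i}" for i in count())
    mapping = {}  # architectural register -> current physical register

    def lookup(reg):
        # A register that has not been written yet gets a name on first use.
        if reg not in mapping:
            mapping[reg] = next(fresh)
        return mapping[reg]

    renamed = []
    for op, dst, src1, src2 in program:
        s1, s2 = lookup(src1), lookup(src2)  # read sources before remapping dst
        mapping[dst] = next(fresh)           # every write gets a new name
        renamed.append((op, mapping[dst], s1, s2))
    return renamed


program = [("MULT", "F4", "F2", "F2"), ("ADDD", "F2", "F0", "F6")]
renamed = rename(program)
print(renamed)
# [('MULT', 'P1', 'P0', 'P0'), ('ADDD', 'P4', 'P2', 'P3')]
```

    After renaming, ADDD writes `P4` while MULT reads `P0`, so the two instructions no longer share a register and the WAR hazard is gone.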

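    (2) no forwarding hardware: a dependent instruction can read a result only after it has been written back to the register file.

    (3) instruction parallelism is limited by the number of functional units.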

  • Original article: https://www.cnblogs.com/freshair_cnblog/p/8657479.html