中国科学院大学数字集成电路作业开源——大作业
0、说明
本次大作业难度不算大,但是还是花了不少时间,主要是以前学习DSP时其实对FFT的理解仍然是肤浅的,没有深入思考过用代码实现的问题,而借着这次大作业的机会,我深入的梳理了一遍蝶形单元的组成形式,DIT-FFT和DIF-FFT,IFFT和FFT的关系等,可以说收获颇丰,也说明了一切知识的理解的加深都是建立在实践上的,通过理论与实践的结合才能够构建完整的认识。
当然,这次实现的FFT只是采取了最简单的形式,这方面的水是很深的,很多专业人士在这方面投入了很多心血,本文的方案仅为一个启发式的简单方案。
1、64点FFT处理器
FT用于快速计算离散傅立叶变换(DFT)。
长为N的序列x(n)的DFT定义为:
相应的序列X(k)的IDFT定义为
这里DFT和IDFT定义均忽略前面常数因子。
设计一个FFT处理器时序逻辑电路,计算64点FFT和IFFT(N = 64)。模块整体采用流水线结构实现,能够处理连续多组输入数据。
顶层模块名为fft_64,输入输出功能定义:
名称 | 方向 | 位宽 | 描述 |
---|---|---|---|
clk | I | 1 | 系统时钟 |
rst_n | I | 1 | 系统异步复位,低电平有效 |
inv | I | 1 | 模式控制,0表示FFT运算,1表示IFFT运算 |
valid_in | I | 1 | 输入数据有效指示,高电平有效 |
sop_in | I | 1 | 每组输入数据(64个数)第一个有效数据指示,高电平有效 |
x_re | I | 16 | 输入数据实部,二进制补码定点格式 |
x_im | I | 16 | 输入数据虚部,二进制补码定点格式 |
valid_out | O | 1 | 输出数据有效指示,高电平有效 |
sop_out | O | 1 | 每组输出数据(64个数)第一个有效数据指示,高电平有效 |
y_re | O | 16 | 输出数据实部,二进制补码定点格式 |
y_im | O | 16 | 输出数据虚部,二进制补码定点格式 |
设计要求:
l Verilog实现代码可综合,给出详细设计文档、综合以及仿真结果。
l 在每组数据处理过程中,inv信号值保持不变。
l 计算过程进行适当精度控制,保证输出结果精确度,输出定点格式(精度范围)可以根据需要进行调整,需要对计算结果进行误差分析。
设计思路:
为了实现FFT和IFFT计算,首先需要明确FFT/IFFT的计算结构。8位基2 DIT-FFT结构图如下:
作业要求实现流水线结构的64位FFT/IFFT运算电路,针对上图进行分析有如下要点:
(1)FFT/IFFT运算需要在装入全部数据后进行,由于运算电路为串行输入,因此必须要通过FIFO进行装入,待数据全部装入完成后开始运算
(2)FFT/IFFT运算结果的输出是并行同时输出的,由于运算电路为串行输出,因此必须要通过FIFO将数据由并行转成串行输出,即结果计算完成后装入FIFO然后依次排出
(以上的两个FIFO需要配合counter.v进行控制,FIFO输入数据后开始启动计时器,历经64拍后装满/排空FIFO)
(3)FFT/IFFT运算模块由多个蝶形单元组合而成,每个运算单元示意图如下:
蝶形单元的输入输出满足:
其中,\(q = p + 2^m\),每一个蝶形单元运算时,进行了一次乘法和两次加法。每一级中,均有 N/2 个蝶形单元。
(4)FFT/IFFT运算模块共用中间的蝶形运算单元阵列,仅在输入输出处理上有不同,这样方便于减少逻辑资源的消耗。IFFT复用FFT的阵列需要在输出进行实部和虚部的交换,以及实部和虚部需要除以64(右移6位)。
(5)输入时需要对数据进行重排,规律如下:
将顺序的二进制码倒序后便可以得到重排后的输入顺序,通过matlab的dec2bin,fliplr和bin2dec三个函数实现64位FFT/IFFT的输入排序,将matlab的计算结果导出用于verilog中的输入重排。
对于64位的FFT/IFFT模块,总共需要例化32*6=192个蝶形单元,采用generate方法进行生成,为此需要明确蝶形单元之间互联,以及对应的旋转因子的数量关系,在matlab中通过butterfly函数描述蝶形单元(见附录butterfly.m),并基于butterfly函数完成FFT/IFFT运算,与matlab自带的FFT/IFFT函数的计算进行比较(见附录my_fft.m,my_ifft.m),每个单元的旋转因子量化为16位定点数,在小数的基础上乘以0x2000然后取整得到(见附录wnr.m)。
关于各个蝶形单元互联的顺序,64位FFT共有6层,m为层数,i为组数,j为蝶形单元组内编号。可以分析出每个蝶形的输入/出上节点编号,输入/出下节点编号以及旋转因子编号分别为:
输入/出上节点编号:(i-1)* 2^m + j
输入/出下节点编号:(i-1)* 2^m + j + 2^(m-1)
旋转因子编号:2^(6-m) * (j-1)
以下图为例,8位fft共有3层,红框框出的蝶形其层数m=1,组数i=3,组内编号j=1。其上节点编号(2-1)* 2^1 + 1 – 1 = 5,即为从上往下数第5个节点(因为输入已经进行了重排,所以这里的编号都是顺序的),下编号节点编号5+2^(1-1) = 6,旋转因子编号2^(3-1) * (1-1) = 0,所以旋转因子是\(W_N^0\)。
关于流水线结构实现的问题,经过调研,FFT流水线结构有延迟反馈型DF,延迟转换型DC,串行转换型SC,单流前向型SFF等。本次采用基2的DIT-FFT,延迟转换型DC结构。
整个系统的整体架构如下图所示:
参考资料:
菜鸟教程网《Verilog FFT 设计》
知乎网《数字芯片IP设计——FFT加速核设计(一)》
《A Survey on Pipelined FFT Hardware Architecture》
《基于流水线的FFT快速计算方法与实现技术》
代码实现:
butterfly.v
/**************** butter unit *************************
Xm(p) ------------------------> Xm+1(p)
- ->
- -
-
- -
- ->
Xm(q) ------------------------> Xm+1(q)
Wn -1
*//////////////////////////////////////////////////////
module butterfly (
input clk,
input rst_n,
input en,
input signed [15:0] xp_re, // Xm(p)
input signed [15:0] xp_im,
input signed [15:0] xq_re, // Xm(q)
input signed [15:0] xq_im,
input signed [15:0] factor_re, // Wnr
input signed [15:0] factor_im,
output vld,
output signed [15:0] yp_re, // Xm+1(p)
output signed [15:0] yp_im,
output signed [15:0] yq_re, // Xm+1(q)
output signed [15:0] yq_im
);
reg [2:0] en_r;
always @(posedge clk or negedge rst_n) begin
if(rst_n == 1'b0) begin
en_r <= 0;
end
else begin
en_r <= {en_r[1:0],en};
end
end
reg signed [31:0] xq_wnr_re0;
reg signed [31:0] xq_wnr_re1;
reg signed [31:0] xq_wnr_im0;
reg signed [31:0] xq_wnr_im1;
reg signed [31:0] xp_re_d;
reg signed [31:0] xp_im_d;
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
xp_re_d <= 0;
xp_im_d <= 0;
xq_wnr_re0 <= 0;
xq_wnr_re1 <= 0;
xq_wnr_im0 <= 0;
xq_wnr_im1 <= 0;
end
else if (en) begin
xq_wnr_re0 <= xq_re * factor_re;
xq_wnr_re1 <= xq_im * factor_im;
xq_wnr_im0 <= xq_re * factor_im;
xq_wnr_im1 <= xq_im * factor_re;
xp_re_d <= {{4{xp_re[15]}}, xp_re[14:0], 13'b0};
xp_im_d <= {{4{xp_im[15]}}, xp_im[14:0], 13'b0};
end
end
reg signed [31:0] xq_wnr_re;
reg signed [31:0] xq_wnr_im;
reg signed [31:0] xp_re_d1;
reg signed [31:0] xp_im_d1;
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
xq_wnr_re <= 0;
xq_wnr_im <= 0;
xp_re_d1 <= 0;
xp_im_d1 <= 0;
end
else if (en_r[0]) begin
xp_re_d1 <= xp_re_d;
xp_im_d1 <= xp_im_d;
xq_wnr_re <= xq_wnr_re0 - xq_wnr_re1;
xq_wnr_im <= xq_wnr_im0 + xq_wnr_im1;
end
end
reg signed [31:0] yp_re_r;
reg signed [31:0] yp_im_r;
reg signed [31:0] yq_re_r;
reg signed [31:0] yq_im_r;
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
yp_re_r <= 0;
yp_im_r <= 0;
yq_re_r <= 0;
yq_im_r <= 0;
end
else if (en_r[1]) begin
yp_re_r <= xp_re_d1 + xq_wnr_re;
yp_im_r <= xp_im_d1 + xq_wnr_im;
yq_re_r <= xp_re_d1 - xq_wnr_re;
yq_im_r <= xp_im_d1 - xq_wnr_im;
end
end
assign yp_re = {yp_re_r[31],yp_re_r[13+15:13]};
assign yp_im = {yp_im_r[31],yp_im_r[13+15:13]};
assign yq_re = {yq_re_r[31],yq_re_r[13+15:13]};
assign yq_im = {yq_im_r[31],yq_im_r[13+15:13]};
assign vld = en_r[2];
endmodule
counter.v
module counter (
input clk,
input rst_n,
input [6:0] thresh,
input start,
input valid,
output reg not_zero,
output reg full
);
reg [6:0] cnt;
reg cur_state, nxt_state;
reg cnt_en;
parameter START = 0;
parameter STOP = 1;
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
cur_state <= STOP;
cnt_en <= 0;
end
else begin
cur_state <= nxt_state;
end
end
always @(cur_state or start or full) begin
case (cur_state)
STOP : if(start == 1'b1) begin
nxt_state = START;
cnt_en = 1'b1;
end
else nxt_state = STOP;
START : if(full == 1'b1) begin
nxt_state = STOP;
cnt_en = 1'b0;
end
else nxt_state = START;
default : nxt_state = STOP;
endcase
end
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
cnt <= 0;
full <= 1'b0;
not_zero <= 1'b0;
end
else if (cnt == thresh) begin
cnt <= 0;
full <= 1'b1;
not_zero <= 1'b0;
end
else if (valid == 1'b1 && cnt_en == 1'b1) begin
cnt <= cnt + 1;
full <= 0;
not_zero <= 1'b1;
end
else begin
cnt <= cnt;
full <= 1'b0;
not_zero <= 1'b0;
end
end
endmodule
fft_64.v
module fft_64 (
input clk,
input rst_n,
input inv,
input valid_in,
input sop_in,
input [15:0] x_re,
input [15:0] x_im,
output valid_out,
output sop_out,
output [15:0] y_re,
output [15:0] y_im
);
// operating data
wire signed [15:0] xm_re [6:0] [63:0];
wire signed [15:0] xm_im [6:0] [63:0];
reg signed [15:0] xm_re_buf [63:0];
reg signed [15:0] xm_im_buf [63:0];
// enable control
wire [6:0] en_ctrl;
// input buffer
integer k;
always @(posedge clk or negedge rst_n) begin
if(rst_n == 1'b0) begin
for (k = 0; k <= 63; k=k+1 ) begin
xm_re_buf[k] <= 0;
xm_im_buf[k] <= 0;
end
end
else begin
for (k = 0; k <= 62; k=k+1 ) begin
xm_re_buf[k+1] <= xm_re_buf[k];
xm_im_buf[k+1] <= xm_im_buf[k];
end
xm_re_buf[0] <= x_re;
xm_im_buf[0] <= x_im;
end
end
// input ctrl
parameter TIME_THRESH_IN = 62;
counter counter_u1(
.clk(clk),
.rst_n(rst_n),
.thresh(TIME_THRESH_IN),
.start(sop_in),
.valid(valid_in),
.not_zero(),
.full(en_ctrl[0])
);
// output buffer
reg signed [15:0] ym_re_buf [63:0];
reg signed [15:0] ym_im_buf [63:0];
always @(posedge clk or negedge rst_n) begin
if (rst_n == 1'b0) begin
for (k = 0; k<=63 ; k=k+1 ) begin
ym_re_buf[k] <= 0;
ym_im_buf[k] <= 0;
end
end
else if (en_ctrl[6]) begin
for (k = 0; k<=63 ; k=k+1 ) begin
ym_re_buf[k] <= xm_re[6][k];
ym_im_buf[k] <= xm_im[6][k];
end
end
else begin
for (k = 0; k<=62; k=k+1 ) begin
ym_re_buf[k] <= ym_re_buf[k+1];
ym_im_buf[k] <= ym_im_buf[k+1];
end
end
end
// output ctrl
parameter TIME_THRESH_OUT = 64;
counter counter_u2(
.clk(clk),
.rst_n(rst_n),
.thresh(TIME_THRESH_OUT),
.start(sop_out),
.valid(1'b1),
.not_zero(valid_out),
.full()
);
assign sop_out = en_ctrl[6];
reg signed [15:0] y_re_out,y_im_out;
always@(ym_re_buf[0] or ym_im_buf[0]) begin
if (inv == 1'b0) begin
y_re_out = ym_re_buf[0];
y_im_out = ym_im_buf[0];
end
else if(inv == 1'b1) begin
y_re_out = (ym_im_buf[0] >>> 6);
y_im_out = (ym_re_buf[0] >>> 6);
end
end
assign y_re = y_re_out;
assign y_im = y_im_out;
// Wnr, multiplied by 0x2000
wire signed [15:0] factor_re[31:0];
wire signed [15:0] factor_im[31:0];
assign factor_re[0] = 16'h2000;
assign factor_im[0] = 16'h0000;
assign factor_re[1] = 16'h1FD8;
assign factor_im[1] = 16'hFCDD;
assign factor_re[2] = 16'h1F62;
assign factor_im[2] = 16'hF9C1;
assign factor_re[3] = 16'h1E9F;
assign factor_im[3] = 16'hF6B5;
assign factor_re[4] = 16'h1D90;
assign factor_im[4] = 16'hF3C1;
assign factor_re[5] = 16'h1C38;
assign factor_im[5] = 16'hF0EA;
assign factor_re[6] = 16'h1A9B;
assign factor_im[6] = 16'hEE38;
assign factor_re[7] = 16'h18BC;
assign factor_im[7] = 16'hEBB3;
assign factor_re[8] = 16'h16A0;
assign factor_im[8] = 16'hE95F;
assign factor_re[9] = 16'h144C;
assign factor_im[9] = 16'hE743;
assign factor_re[10] = 16'h11C7;
assign factor_im[10] = 16'hE564;
assign factor_re[11] = 16'h0F15;
assign factor_im[11] = 16'hE3C7;
assign factor_re[12] = 16'h0C3E;
assign factor_im[12] = 16'hE26F;
assign factor_re[13] = 16'h094A;
assign factor_im[13] = 16'hE160;
assign factor_re[14] = 16'h063E;
assign factor_im[14] = 16'hE09D;
assign factor_re[15] = 16'h0322;
assign factor_im[15] = 16'hE027;
assign factor_re[16] = 16'h0000;
assign factor_im[16] = 16'hE000;
assign factor_re[17] = 16'hFCDD;
assign factor_im[17] = 16'hE027;
assign factor_re[18] = 16'hF9C1;
assign factor_im[18] = 16'hE09D;
assign factor_re[19] = 16'hF6B5;
assign factor_im[19] = 16'hE160;
assign factor_re[20] = 16'hF3C1;
assign factor_im[20] = 16'hE26F;
assign factor_re[21] = 16'hF0EA;
assign factor_im[21] = 16'hE3C7;
assign factor_re[22] = 16'hEE38;
assign factor_im[22] = 16'hE564;
assign factor_re[23] = 16'hEBB3;
assign factor_im[23] = 16'hE743;
assign factor_re[24] = 16'hE95F;
assign factor_im[24] = 16'hE95F;
assign factor_re[25] = 16'hE743;
assign factor_im[25] = 16'hEBB3;
assign factor_re[26] = 16'hE564;
assign factor_im[26] = 16'hEE38;
assign factor_re[27] = 16'hE3C7;
assign factor_im[27] = 16'hF0EA;
assign factor_re[28] = 16'hE26F;
assign factor_im[28] = 16'hF3C1;
assign factor_re[29] = 16'hE160;
assign factor_im[29] = 16'hF6B5;
assign factor_re[30] = 16'hE09D;
assign factor_im[30] = 16'hF9C1;
assign factor_re[31] = 16'hE027;
assign factor_im[31] = 16'hFCDD;
// input intial, change order
assign xm_re[0][0] = xm_re_buf[0];
assign xm_re[0][1] = xm_re_buf[32];
assign xm_re[0][2] = xm_re_buf[16];
assign xm_re[0][3] = xm_re_buf[48];
assign xm_re[0][4] = xm_re_buf[8];
assign xm_re[0][5] = xm_re_buf[40];
assign xm_re[0][6] = xm_re_buf[24];
assign xm_re[0][7] = xm_re_buf[56];
assign xm_re[0][8] = xm_re_buf[4];
assign xm_re[0][9] = xm_re_buf[36];
assign xm_re[0][10] = xm_re_buf[20];
assign xm_re[0][11] = xm_re_buf[52];
assign xm_re[0][12] = xm_re_buf[12];
assign xm_re[0][13] = xm_re_buf[44];
assign xm_re[0][14] = xm_re_buf[28];
assign xm_re[0][15] = xm_re_buf[60];
assign xm_re[0][16] = xm_re_buf[2];
assign xm_re[0][17] = xm_re_buf[34];
assign xm_re[0][18] = xm_re_buf[18];
assign xm_re[0][19] = xm_re_buf[50];
assign xm_re[0][20] = xm_re_buf[10];
assign xm_re[0][21] = xm_re_buf[42];
assign xm_re[0][22] = xm_re_buf[26];
assign xm_re[0][23] = xm_re_buf[58];
assign xm_re[0][24] = xm_re_buf[6];
assign xm_re[0][25] = xm_re_buf[38];
assign xm_re[0][26] = xm_re_buf[22];
assign xm_re[0][27] = xm_re_buf[54];
assign xm_re[0][28] = xm_re_buf[14];
assign xm_re[0][29] = xm_re_buf[46];
assign xm_re[0][30] = xm_re_buf[30];
assign xm_re[0][31] = xm_re_buf[62];
assign xm_re[0][32] = xm_re_buf[1];
assign xm_re[0][33] = xm_re_buf[33];
assign xm_re[0][34] = xm_re_buf[17];
assign xm_re[0][35] = xm_re_buf[49];
assign xm_re[0][36] = xm_re_buf[9];
assign xm_re[0][37] = xm_re_buf[41];
assign xm_re[0][38] = xm_re_buf[25];
assign xm_re[0][39] = xm_re_buf[57];
assign xm_re[0][40] = xm_re_buf[5];
assign xm_re[0][41] = xm_re_buf[37];
assign xm_re[0][42] = xm_re_buf[21];
assign xm_re[0][43] = xm_re_buf[53];
assign xm_re[0][44] = xm_re_buf[13];
assign xm_re[0][45] = xm_re_buf[45];
assign xm_re[0][46] = xm_re_buf[29];
assign xm_re[0][47] = xm_re_buf[61];
assign xm_re[0][48] = xm_re_buf[3];
assign xm_re[0][49] = xm_re_buf[35];
assign xm_re[0][50] = xm_re_buf[19];
assign xm_re[0][51] = xm_re_buf[51];
assign xm_re[0][52] = xm_re_buf[11];
assign xm_re[0][53] = xm_re_buf[43];
assign xm_re[0][54] = xm_re_buf[27];
assign xm_re[0][55] = xm_re_buf[59];
assign xm_re[0][56] = xm_re_buf[7];
assign xm_re[0][57] = xm_re_buf[39];
assign xm_re[0][58] = xm_re_buf[23];
assign xm_re[0][59] = xm_re_buf[55];
assign xm_re[0][60] = xm_re_buf[15];
assign xm_re[0][61] = xm_re_buf[47];
assign xm_re[0][62] = xm_re_buf[31];
assign xm_re[0][63] = xm_re_buf[63];
assign xm_im[0][0] = xm_im_buf[0];
assign xm_im[0][1] = xm_im_buf[32];
assign xm_im[0][2] = xm_im_buf[16];
assign xm_im[0][3] = xm_im_buf[48];
assign xm_im[0][4] = xm_im_buf[8];
assign xm_im[0][5] = xm_im_buf[40];
assign xm_im[0][6] = xm_im_buf[24];
assign xm_im[0][7] = xm_im_buf[56];
assign xm_im[0][8] = xm_im_buf[4];
assign xm_im[0][9] = xm_im_buf[36];
assign xm_im[0][10] = xm_im_buf[20];
assign xm_im[0][11] = xm_im_buf[52];
assign xm_im[0][12] = xm_im_buf[12];
assign xm_im[0][13] = xm_im_buf[44];
assign xm_im[0][14] = xm_im_buf[28];
assign xm_im[0][15] = xm_im_buf[60];
assign xm_im[0][16] = xm_im_buf[2];
assign xm_im[0][17] = xm_im_buf[34];
assign xm_im[0][18] = xm_im_buf[18];
assign xm_im[0][19] = xm_im_buf[50];
assign xm_im[0][20] = xm_im_buf[10];
assign xm_im[0][21] = xm_im_buf[42];
assign xm_im[0][22] = xm_im_buf[26];
assign xm_im[0][23] = xm_im_buf[58];
assign xm_im[0][24] = xm_im_buf[6];
assign xm_im[0][25] = xm_im_buf[38];
assign xm_im[0][26] = xm_im_buf[22];
assign xm_im[0][27] = xm_im_buf[54];
assign xm_im[0][28] = xm_im_buf[14];
assign xm_im[0][29] = xm_im_buf[46];
assign xm_im[0][30] = xm_im_buf[30];
assign xm_im[0][31] = xm_im_buf[62];
assign xm_im[0][32] = xm_im_buf[1];
assign xm_im[0][33] = xm_im_buf[33];
assign xm_im[0][34] = xm_im_buf[17];
assign xm_im[0][35] = xm_im_buf[49];
assign xm_im[0][36] = xm_im_buf[9];
assign xm_im[0][37] = xm_im_buf[41];
assign xm_im[0][38] = xm_im_buf[25];
assign xm_im[0][39] = xm_im_buf[57];
assign xm_im[0][40] = xm_im_buf[5];
assign xm_im[0][41] = xm_im_buf[37];
assign xm_im[0][42] = xm_im_buf[21];
assign xm_im[0][43] = xm_im_buf[53];
assign xm_im[0][44] = xm_im_buf[13];
assign xm_im[0][45] = xm_im_buf[45];
assign xm_im[0][46] = xm_im_buf[29];
assign xm_im[0][47] = xm_im_buf[61];
assign xm_im[0][48] = xm_im_buf[3];
assign xm_im[0][49] = xm_im_buf[35];
assign xm_im[0][50] = xm_im_buf[19];
assign xm_im[0][51] = xm_im_buf[51];
assign xm_im[0][52] = xm_im_buf[11];
assign xm_im[0][53] = xm_im_buf[43];
assign xm_im[0][54] = xm_im_buf[27];
assign xm_im[0][55] = xm_im_buf[59];
assign xm_im[0][56] = xm_im_buf[7];
assign xm_im[0][57] = xm_im_buf[39];
assign xm_im[0][58] = xm_im_buf[23];
assign xm_im[0][59] = xm_im_buf[55];
assign xm_im[0][60] = xm_im_buf[15];
assign xm_im[0][61] = xm_im_buf[47];
assign xm_im[0][62] = xm_im_buf[31];
assign xm_im[0][63] = xm_im_buf[63];
// butter instantiation
genvar m,i,j;
generate
for (m = 0; m <= 5; m=m+1) begin:stage
for (i = 0; i <= (1<<(5-m))-1 ; i=i+1) begin:group
for (j = 0; j <= (1<<m)-1 ; j=j+1) begin:unit
butterfly butterfly_u(
.clk(clk),
.rst_n(rst_n),
.en(en_ctrl[m]),
.xp_re(xm_re[m][(i<<(m+1))+j]),
.xp_im(xm_im[m][(i<<(m+1))+j]),
.xq_re(xm_re[m][(i<<(m+1))+j+(1<<m)]),
.xq_im(xm_im[m][(i<<(m+1))+j+(1<<m)]),
.factor_re(factor_re[(1<<(5-m))*j]),
.factor_im(factor_im[(1<<(5-m))*j]),
.vld(en_ctrl[m+1]),
.yp_re(xm_re[m+1][(i<<(m+1))+j]),
.yp_im(xm_im[m+1][(i<<(m+1))+j]),
.yq_re(xm_re[m+1][(i<<(m+1))+j+(1<<m)]),
.yq_im(xm_im[m+1][(i<<(m+1))+j+(1<<m)])
);
end
end
end
endgenerate
endmodule
testbench
module testbench ();
reg clk;
reg rst_n;
reg inv;
reg valid_in;
reg sop_in;
reg [15:0] x_re;
reg [15:0] x_im;
wire valid_out;
wire sop_out;
wire signed [15:0] y_re;
wire signed [15:0] y_im;
initial begin
clk <= 1'b0;
rst_n <= 1'b0;
sop_in <= 1'b0;
valid_in <= 1'b0;
inv <= 1;
x_re <= 0;
x_im <= 0;
#30
rst_n <= 1'b1;
sop_in <= 1'b1;
valid_in <= 1'b1;
x_re <= 64;
x_im <= 64;
#20
sop_in <= 1'b0;
x_re <= 63;
x_im <= 63;
#20
x_re <= 62;
x_im <= 62;
#20
x_re <= 61;
x_im <= 61;
#20
x_re <= 60;
x_im <= 60;
#20
x_re <= 59;
x_im <= 59;
#20
x_re <= 58;
x_im <= 58;
#20
x_re <= 57;
x_im <= 57;
#20
x_re <= 56;
x_im <= 56;
#20
x_re <= 55;
x_im <= 55;
#20
x_re <= 54;
x_im <= 54;
#20
x_re <= 53;
x_im <= 53;
#20
x_re <= 52;
x_im <= 52;
#20
x_re <= 51;
x_im <= 51;
#20
x_re <= 50;
x_im <= 50;
#20
x_re <= 49;
x_im <= 49;
#20
x_re <= 48;
x_im <= 48;
#20
x_re <= 47;
x_im <= 47;
#20
x_re <= 46;
x_im <= 46;
#20
x_re <= 45;
x_im <= 45;
#20
x_re <= 44;
x_im <= 44;
#20
x_re <= 43;
x_im <= 43;
#20
x_re <= 42;
x_im <= 42;
#20
x_re <= 41;
x_im <= 41;
#20
x_re <= 40;
x_im <= 40;
#20
x_re <= 39;
x_im <= 39;
#20
x_re <= 38;
x_im <= 38;
#20
x_re <= 37;
x_im <= 37;
#20
x_re <= 36;
x_im <= 36;
#20
x_re <= 35;
x_im <= 35;
#20
x_re <= 34;
x_im <= 34;
#20
x_re <= 33;
x_im <= 33;
#20
x_re <= 32;
x_im <= 32;
#20
x_re <= 31;
x_im <= 31;
#20
x_re <= 30;
x_im <= 30;
#20
x_re <= 29;
x_im <= 29;
#20
x_re <= 28;
x_im <= 28;
#20
x_re <= 27;
x_im <= 27;
#20
x_re <= 26;
x_im <= 26;
#20
x_re <= 25;
x_im <= 25;
#20
x_re <= 24;
x_im <= 24;
#20
x_re <= 23;
x_im <= 23;
#20
x_re <= 22;
x_im <= 22;
#20
x_re <= 21;
x_im <= 21;
#20
x_re <= 20;
x_im <= 20;
#20
x_re <= 19;
x_im <= 19;
#20
x_re <= 18;
x_im <= 18;
#20
x_re <= 17;
x_im <= 17;
#20
x_re <= 16;
x_im <= 16;
#20
x_re <= 15;
x_im <= 15;
#20
x_re <= 14;
x_im <= 14;
#20
x_re <= 13;
x_im <= 13;
#20
x_re <= 12;
x_im <= 12;
#20
x_re <= 11;
x_im <= 11;
#20
x_re <= 10;
x_im <= 10;
#20
x_re <= 9;
x_im <= 9;
#20
x_re <= 8;
x_im <= 8;
#20
x_re <= 7;
x_im <= 7;
#20
x_re <= 6;
x_im <= 6;
#20
x_re <= 5;
x_im <= 5;
#20
x_re <= 4;
x_im <= 4;
#20
x_re <= 3;
x_im <= 3;
#20
x_re <= 2;
x_im <= 2;
#20
x_re <= 1;
x_im <= 1;
#20
valid_in <= 1'b0;
end
always #10 clk <= ~clk;
integer w_file;
initial w_file = $fopen("G:/Modelsim_Project/homework_final_fft/fft_output.txt");
always@(posedge clk) begin
if(valid_out == 1'b1) begin
$fwrite(w_file,"%d,%d\n",y_re,y_im);
end
end
fft_64 fft_64_u1(
.clk(clk),
.rst_n(rst_n),
.inv(inv),
.valid_in(valid_in),
.sop_in(sop_in),
.x_re(x_re),
.x_im(x_im),
.valid_out(valid_out),
.sop_out(sop_out),
.y_re(y_re),
.y_im(y_im)
);
endmodule
仿真结果:
比较matlab自带fft与自己通过butterfly函数实现的fft的计算结果
比较matlab自带ifft与自己通过butterfly函数实现的ifft的计算结果
蝶形单元仿真结果
matlab验证与结果:
r = 1;
wnr = cos(pi/4*r) - 1i*sin(pi/4*r) ;
x1 = 11 + 2i;
x2 = 21 + 5i;
x1_d = x1 + x2 * wnr;
x2_d = x1 - x2 * wnr;
x3 = 61 + 3i;
x4 = 41 + 6i;
x3_d = x3 + x4 * wnr;
x4_d = x3 - x4 * wnr;
有一点舍入的误差,但基本上准确
在仿真下逐层查看FFT的计算结果,并与matlab计算得到的每层的结果进行对比(以前6个点为例子):
第一层:
FFT模块实部xm_re[1][5:0]
FFT模块虚部xm_im[1][5:0]
Matlab第一层结果
第二层:
FFT模块实部xm_re[2][5:0]
FFT模块虚部xm_re[2][5:0]
Matlab第二层结果
第三层:
FFT模块实部xm_re[3][5:0]
FFT模块虚部xm_re[3][5:0]
Matlab第三层结果
第四层:
FFT模块实部xm_re[4][5:0]
FFT模块虚部xm_re[4][5:0]
Matlab第四层结果
第五层:
FFT模块实部xm_re[5][4:0]
FFT模块虚部xm_re[5][4:0]
Matlab第五层结果
第六层:
FFT模块实部xm_re[6][4:0]
FFT模块虚部xm_re[6][4:0]
Matlab第六层结果
最后一层的结果装入输出buffer中并串行输出到y_re,y_im
将verilog计算的结果与matlab中浮点数计算的结果进行比较,结果如下图所示,可见虽然定点化带来了一定的精度损失,但结果仍然是正确的,且误差很小。
整体时序:
将IFFT的结果导出与matlab计算的浮点数结果比较,由于IFFT输出值相对较小,所以定点化带来的误差更加明显,但仍然在一个可以接受的范围内
整体时序:
附录:
butterfly.m
function [yp,yq] = butterfly(xp,xq,wnr)
%BUTTERFLY 此处显示有关此函数的摘要
% 此处显示详细说明
yp = xp + xq * wnr;
yq = xp - xq * wnr;
end
my_fft.m
% DIT-FFT
clear;
close all;
clc;
N = 64;
m = log2(N);
%% 产生旋转系数
for r=0:N/2-1
Wnr(r+1) = cos(2*pi/N*r) - 1i*sin(2*pi/N*r) ;
end
%% 产生输入x
for r = 1:N
x(r) = r + 1i*r;
end
%% 反序操作
d = bin2dec(fliplr(dec2bin([0:N-1],m)))+1;
xm{1} = x(d);
%% my_fft
for mm = 1:m % mm为层数
for i = 1:2^(m-mm) % i为组数
for j = 1:2^(mm-1) % j为蝶形在组内的编号
k = (i-1) * 2^mm + j;
n = 2^(m-mm)*(j-1);
[xm{mm+1}(k),xm{mm+1}(k+2^(mm-1))] = butterfly(xm{mm}(k),xm{mm}(k+2^(mm-1)),Wnr(n+1));
end
end
end
%% matlab_fft
fft2 = fft(x);
%% 画图比较
figure(1)
plot(1:N, abs(fft2));
figure(2)
plot(1:N, abs(xm{m+1}), 'r');
my_ifft.m
% DIF-IFFT
clear;
close all;
clc;
N = 64;
m = log2(N);
%% 产生旋转系数
for r=0:N/2-1
Wnr(r+1) = cos(2*pi/N*r) - 1i*sin(2*pi/N*r) ;
end
%% 产生输入x
for r = 1:N
x(r) = r + 1i*r;
end
%% 反序操作
d = bin2dec(fliplr(dec2bin([0:N-1],m)))+1;
xm{1} = x(d);
%% my_ifft
for mm = 1:m % mm为层数
for i = 1:2^(m-mm) % i为组数
for j = 1:2^(mm-1) % j为蝶形在组内的编号
k = (i-1) * 2^mm + j;
n = 2^(m-mm)*(j-1);
[xm{mm+1}(k),xm{mm+1}(k+2^(mm-1))] = butterfly(xm{mm}(k),xm{mm}(k+2^(mm-1)),Wnr(n+1));
end
end
end
for i = 1:N
re = real(xm{m+1}(i));
im = imag(xm{m+1}(i));
xm{m+1}(i) = im + re*1i;
end
xm{m+1} = xm{m+1}/N;
%% matlab_ifft
ifft2 = ifft(x);
%% 画图比较
plot(1:N, abs(ifft2));
hold on
plot(1:N, abs(xm{m+1}), 'r');
wnr.m
clear all;close all;clc;
%=======================================================
% Wnr calcuting
%=======================================================
N = 64;
for r = 0:N/2-1
Wnr_factor = cos(2*pi/N*r) - 1i*sin(2*pi/N*r) ;
Wnr_integer = floor(Wnr_factor * 2^13) ;
if (real(Wnr_integer)<0)
Wnr_real = real(Wnr_integer) + 2^16 ; %负数的补码
else
Wnr_real = real(Wnr_integer) ;
end
if (imag(Wnr_integer)<0)
Wnr_imag = imag(Wnr_integer) + 2^16 ;
else
Wnr_imag = imag(Wnr_integer);
end
Wnr{2*r+1} = dec2hex(Wnr_real); %实部
Wnr{2*r+2} = dec2hex(Wnr_imag); %虚部
end