caffe模型转pytorch---LSTM
本文官方链接https://www.cnblogs.com/yanghailin/p/15599428.html,未经授权勿转载
先来个总结:
具体的可以看博客:
https://www.cnblogs.com/yanghailin/p/15599428.html
caffe提取权重搭建pytorch网络,实现lstm转换。
pytorch1.0,cuda8.0,libtorch1.0
pytorch1.0上面是可以的,精度一致,但是转libtorch的时候也没有问题,没有任何提示,转pt是没有问题的。
但是就是最后精度不对,找到问题就是lstm那层精度不对。上面一层精度还是对的。无解。
然后又试验了pytorch1.1.0环境,没问题。
github提的issue:
https://github.com/pytorch/pytorch/issues/68864
之前完成了几个网络的caffe转pytorch。
refinenet https://www.cnblogs.com/yanghailin/p/13096258.html
refinedet https://www.cnblogs.com/yanghailin/p/12965695.html
上面那个是提取caffe权重然后转libtorch,下面是直接对应的pytorch版本转libtorch,大量的后处理用libtorch完成,后来同事也完成了直接拿caffe权重转libtorch。
无出意外,上面的都是需要编译caffe的python接口完成。但是一般的工程场景是我们只用caffe的c++,有时候没有对应的python工程。然后编译python接口并调用会有一些麻烦。
后来我想为啥我们要多此一举,直接用caffe跑前向的c++工程难道不行吗?
其实是可以的,只是caffe的源码复杂,一开始看不懂。
本系列的博客就是直接用caffe的c++工程直接提取权重,搭建同样的pytorch网络,把caffe权重填充过来就可以直接运行跑前向推理。
我是这么处理的,首先编译caffe lstm的cpu版本,可以在clion里面debug,我是在/caffe_ocr/tools/caffe.cpp 把caffe.cpp原有的都删了,然后换上了lstm 跑前向推理的代码,这样编译出来的caffe源码。
然后我就可以打断点调试了。
caffe源码是一个高度抽象的工程,通过Layer作为基类,其他所有算法模块都是在这个Layer上派生出来的。
net类是一个很重要的类,它管理与统筹了整个网络,在net类中可以拿到网络所有中间feature map结果,可以拿到每个层对应的权重。
由于我的目的是需要转lstm到pytorch。所有把lstm这个算子实现方法整明白至关重要。一看不知道,再看直接傻眼。lstm实现真是复杂啊!它内部自己整了一个net类!!!双向lstm就是整了2个net类,派生于RecurrentLayer这个类。
lstm原理的话就是那6个公式,看这个博客就可以:
https://www.jianshu.com/p/9dc9f41f0b29
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
本文并不打算仔细讲解caffe源码与lstm具体实现方式。后面有机会单独开一个博客吧。
本文具体讲解从caffemodel提取各个层的权重。权重是一般是很大的一个矩阵,比如[64,3,7,7], 需要把这些权重保存起来供Python读取。
一开始我也在c++想有啥办法和Python numpy一样的方便处理矩阵,想过了用json,xml或者直接用caffe自带的blob类,但是不会用啊!用caffe的proto应该是可以的,但是不会用。
然后就用最直接的方法吧,就是把权重直接一行一个保存在本地txt中,文件命名就直接是该层的层名,比如该层层名是conv1,那么就是conv1_weight_0.txt,conv1_weight_1.txt。第一行放形状,比如64,3,7,7。
由于权重也是以blob形式存在的,所以我在blob源码里面加上了保存该blob数据到本地txt的函数,只需要提供保存的地址就可以。如下:
void save_data_to_txt(const string path_txt,bool b_save_shape = true)
{
std::ofstream fOut(path_txt);
if (!fOut)
{
std::cout << "Open output file faild." << std::endl;
}
if(b_save_shape)
{
for(int i=0;i<shape_.size();i++)
{
fOut << shape_[i];
if(i == shape_.size()-1)
{
fOut<<std::endl;
}else
{
fOut<<",";
}
}
}
const Dtype* data_vec = cpu_data();
for (int i = 0; i < count_; ++i) {
fOut << data_vec[i] << std::endl;
}
fOut.close();
}
下面直接上我的代码,保存每层权重到txt的代码如下:
std::cout<<"\n\n\n\n============2021-11-18======================================="<<std::endl;
shared_ptr<Net<float> > net_ = classifier.get_net(); //这里是从跑前向的类里面拿Net类
vector<shared_ptr<Layer<float> > > layers = net_->layers(); //拿到每层Layer算子的指针
vector<shared_ptr<Blob<float> > > params = net_->params();//拿到所有权重指针
vector<vector<Blob<float>*> > bottom_vecs_ = net_->bottom_vecs();//拿到所有bottom feature map
vector<vector<Blob<float>*> > top_vecs_ = net_->top_vecs();//拿到所有top feature map //注意这里面的layers和bottom_vecs_ top_vecs_都是一一对应的
std::cout<<"size layer=" << layers.size()<<std::endl;
std::cout<<"size params=" << params.size()<<std::endl;
string path_save_dir = "/data_1/Yang/project/save_weight/";
for(int i=0;i<layers.size();i++)
{
shared_ptr<Layer<float> > layer = layers[i];
string name_layer = layer->layer_param().name();//当前层层名
std::cout<<i<<" layer_name="<<name_layer<<" type="<<layer->layer_param().type()<<std::endl;
int bottom_name_size = layer->layer_param().bottom().size();
std::cout<<"=================bottom================"<<std::endl;
if(bottom_name_size>0)
{
for(int ii=0;ii<bottom_name_size;ii++)
{
std::cout<<ii<<" ::bottom name="<<layer->layer_param().bottom(ii)<<std::endl;
Blob<float>* ptr_blob = bottom_vecs_[i][ii];
std::cout<<"bottom shape="<<ptr_blob->shape_string()<<std::endl;
}
} else{
std::cout<<"no bottom"<<std::endl;
}
std::cout<<"=================top================"<<std::endl;
int top_name_size = layer->layer_param().top().size();
if(top_name_size>0)
{
for(int ii=0;ii<top_name_size;ii++)
{
std::cout<<ii<<" ::top name="<<layer->layer_param().top(ii)<<std::endl;
Blob<float>* ptr_blob = top_vecs_[i][ii];
std::cout<<"top shape="<<ptr_blob->shape_string()<<std::endl;
}
} else{
std::cout<<"no top"<<std::endl;
}
vector<shared_ptr<Blob<float> > > params = layer->blobs();
std::cout<<"=================params ================"<<std::endl;
std::cout<<"params size= "<<params.size()<<std::endl;
if(0 == params.size())
{
std::cout<<"has no params"<<std::endl;
} else
{
for(int j=0;j<params.size();j++)
{
std::cout<<"params_"<<j<<" shape="<<params[j]->shape_string()<<std::endl;
params[j]->save_data_to_txt(path_save_dir + name_layer + "_weight_" + std::to_string(j)+".txt");
}
}
std::cout<<std::endl;
}
//这里是为了对比caffe和pytorch的某一层输出是否一致,先保存caffe的某层feature map输出。
string name_aim_top = "premuted_fc";
const shared_ptr<Blob<float>> feature_map = net_->blob_by_name(name_aim_top);
bool b_save_shape = false;
std::cout<<"featuremap shape="<<std::endl;
std::cout<<feature_map->shape_string()<<std::endl;
feature_map->save_data_to_txt("/data_1/Yang/project/myfile/blob_val/"+name_aim_top+".txt",b_save_shape);
看caffe网络的话,可以直接把prototxt文件复制到网页上面查看。
http://ethereon.github.io/netscope/quickstart.html
这样看比较直观。
这里需要特别注意的是一个,就地操作。就是比如图上网络连在一起的conv1,conv1_bn,conv1_scale,conv1_relu由于它们的bottom和top名字一样,导致经过该层的运算结果直接会覆盖bottom,就是共用了一块内存。
这里是一个坑,之前一个同事也在做类似的工作,然后不同框架之间对比检查精度,发现刚开始的几层精度就对不上了,苦苦找问题找了一周都没有找到,最后让我帮忙看了看,我看了大半天才发现是这个就地操作导致的,你想拿conv1的feature map的结果是拿不到,你拿的实际已经是经过了conv1,conv1_bn,conv1_scale,conv1_relu这4步操作之后的结果了!
以上,就会生成每层权重,如果该层有多个权重,就直接是文件名末尾计数0,1,2来区分的,命名方式是layerName+_weight_cnt.txt。文件txt第一行是权重的shape,比如64,64,1,1。
完事之后,在Python端,我先写了一个脚本,读取txt把这些权重保存在一个字典里面。
import os
import numpy as np
#这个类主要是为了能够多重字典赋值
class AutoVivification(dict):
"""Implementation of perl's autovivification feature."""
def __getitem__(self, item):
try:
return dict.__getitem__(self, item)
except KeyError:
value = self[item] = type(self)()
return value
def get_weight_numpy(path_dir):
out = AutoVivification()
list_txt = os.listdir(path_dir)
for cnt,txt in enumerate(list_txt):
print(cnt, " ", txt)
txt_ = txt.replace(".txt","")
layer_name, idx = txt_.split("_weight_")
path_txt = path_dir + txt
with open(path_txt, 'r') as fr:
lines = fr.readlines()
data = []
shape_line = []
for cnt_1, line in enumerate(lines):
if(0 == cnt_1):
shape_line = []
shape_line = line.strip().split(",")
else:
data.append(float(line))
shape_line = map(eval, shape_line)
data = np.array(data).reshape(shape_line)
# new_dict = {}
out[layer_name][int(idx)] = data
return out
if __name__ == "__main__":
path_dir = "/data_1/Yang/project/save_weight/"
out = get_weight_numpy(path_dir)
conv1_weight = out['conv1'][0]
conv1_bias = out['conv1'][1]
下面直接给出把caffe保存的权重怼到搭建的pytorch 层上:
# coding=utf-8
import torch
import torchvision
from torch import nn
import torch.nn.functional as F
import cv2
import numpy as np
from weight_numpy import get_weight_numpy
class lstm_general(nn.Module): # SfSNet = PS-Net in SfSNet_deploy.prototxt
def __init__(self):
super(lstm_general, self).__init__()
# self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
self.data_bn = nn.BatchNorm2d(3)
self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
self.conv1_bn = nn.BatchNorm2d(64)
self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias = False)
self.layer_64_1_bn2 = nn.BatchNorm2d(64)
self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
self.layer_64_1_bn3 = nn.BatchNorm2d(64)
self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
self.layer_128_1_bn1 = nn.BatchNorm2d(256)
self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
self.layer_128_1_bn2 = nn.BatchNorm2d(128)
self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
self.layer_128_1_bn3 = nn.BatchNorm2d(128)
self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)
self.last_bn = nn.BatchNorm2d(512)
# self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)
self.fc1x1_r2_v2_a = nn.Linear(200,7118)
def forward(self, inputs):
# x = F.relu(self.bn1_1(self.conv1_1(inputs)))
x = self.data_bn(inputs)
x = F.relu(self.conv1_bn(self.conv1(x)))
x = self.conv1_pool(x) #[1,64,8,80]
x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x))) # 1 64 8 80
layer_64_1_conv1 = x
x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))
x = self.layer_64_1_conv3(x)
layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
layer_64_3_sum = x + layer_64_1_conv_expand #1 256 8 80
x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
layer_128_1_bn1 = x
x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
x = self.layer_128_1_conv3(x) #1, 512, 8, 80
layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1) #1, 512, 8, 80
layer_128_4_sum = x + layer_128_1_conv_expand
x = F.relu(self.last_bn(layer_128_4_sum))
x = F.dropout(x, p=0.7, training=False) #1 512 8 80
x = x.permute(3,0,1,2) # 80 1 512 8
x = x.reshape(80,1,512*8)
#
# merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)
lstm_out,(_,_) = self.lstm_lr(x) #(80,1,200)
out = self.fc1x1_r2_v2_a(lstm_out) #(80,1,7118)
return out
def save_tensor(tensor_in,path_save):
tensor_in = tensor_in.contiguous().view(-1,1)
np_tensor = tensor_in.cpu().detach().numpy()
# np_tensor = np_tensor.view()
np.savetxt(path_save,np_tensor,fmt='%.12e')
def access_pixels(frame):
print(frame.shape) # shape内包含三个元素:按顺序为高、宽、通道数
height = frame.shape[0]
weight = frame.shape[1]
channels = frame.shape[2]
print("weight : %s, height : %s, channel : %s" % (weight, height, channels))
with open("/data_1/Yang/project/myfile/blob_val/img_stand_python.txt", "w") as fw:
for row in range(height): # 遍历高
for col in range(weight): # 遍历宽
for c in range(channels): # 便利通道
pv = frame[row, col, c]
fw.write(str(int(pv)))
fw.write("\n")
def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
img_h, img_w, _ = img.shape
if img_h < 2 or img_w < 2:
return
# if 32 == img_h and 320 == img_w:
# return img
ratio_now = img_w * 1.0 / img_h
if ratio_now <= ratio:
mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
mask[0:img_h,0:img_w,:] = img
else:
mask = np.ones((int(img_w*1.0/ratio), img_w, 3), dtype=np.uint8) * 255
mask[0:img_h, 0:img_w, :] = img
mask_stand = cv2.resize(mask,(stand_w, stand_h),interpolation=cv2.INTER_LINEAR)
# access_pixels(mask_stand)
return mask_stand
if __name__ == '__main__':
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
net = lstm_general()
# net.eval()
index = 0
print("*" * 50)
for name, param in list(net.named_parameters()):
print(str(index) + ':', name, param.size())
index += 1
print("*" * 50)
##搭建完网络就可以通过这里看到网络所需要的参数名字
for k, v in net.state_dict().items():
print(k)
print(v.shape)
# print(k,v)
print("@" * 50)
# aaa = np.zeros((400,1))
path_dir = "/data_1/Yang/project/OCR/3rdlib/caffe_ocr_2021/myfile/save_weight/"
weight_numpy_dict = get_weight_numpy(path_dir)
from torch import from_numpy
state_dict = {}
state_dict['data_bn.running_mean'] = from_numpy(weight_numpy_dict["data_bn"][0] / weight_numpy_dict["data_bn"][2])
state_dict['data_bn.running_var'] = from_numpy(weight_numpy_dict["data_bn"][1] / weight_numpy_dict["data_bn"][2])
state_dict['data_bn.weight'] = from_numpy(weight_numpy_dict['data_scale'][0])
state_dict['data_bn.bias'] = from_numpy(weight_numpy_dict['data_scale'][1])
state_dict['conv1.weight'] = from_numpy(weight_numpy_dict['conv1'][0])
state_dict['conv1.bias'] = from_numpy(weight_numpy_dict['conv1'][1])
state_dict['conv1_bn.running_mean'] = from_numpy(weight_numpy_dict["conv1_bn"][0] / weight_numpy_dict["conv1_bn"][2])
state_dict['conv1_bn.running_var'] = from_numpy(weight_numpy_dict["conv1_bn"][1] / weight_numpy_dict["conv1_bn"][2])
state_dict['conv1_bn.weight'] = from_numpy(weight_numpy_dict['conv1_scale'][0])
state_dict['conv1_bn.bias'] = from_numpy(weight_numpy_dict['conv1_scale'][1])
state_dict['layer_64_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv1'][0])
state_dict['layer_64_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][0] / weight_numpy_dict["layer_64_1_bn2"][2])
state_dict['layer_64_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][1] / weight_numpy_dict["layer_64_1_bn2"][2])
state_dict['layer_64_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][0])
state_dict['layer_64_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][1])
state_dict['layer_64_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv2'][0])
state_dict['layer_64_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][0] / weight_numpy_dict["layer_64_1_bn3"][2])
state_dict['layer_64_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][1] / weight_numpy_dict["layer_64_1_bn3"][2])
state_dict['layer_64_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][0])
state_dict['layer_64_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][1])
state_dict['layer_64_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv3'][0])
state_dict['layer_64_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv_expand'][0])
state_dict['layer_128_1_bn1.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][0] / weight_numpy_dict["layer_128_1_bn1"][2])
state_dict['layer_128_1_bn1.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][1] / weight_numpy_dict["layer_128_1_bn1"][2])
state_dict['layer_128_1_bn1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][0])
state_dict['layer_128_1_bn1.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][1])
state_dict['layer_128_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv1'][0])
state_dict['layer_128_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][0] / weight_numpy_dict["layer_128_1_bn2"][2])
state_dict['layer_128_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][1] / weight_numpy_dict["layer_128_1_bn2"][2])
state_dict['layer_128_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][0])
state_dict['layer_128_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][1])
state_dict['layer_128_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv2'][0])
state_dict['layer_128_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][0] / weight_numpy_dict["layer_128_1_bn3"][2])
state_dict['layer_128_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][1] / weight_numpy_dict["layer_128_1_bn3"][2])
state_dict['layer_128_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][0])
state_dict['layer_128_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][1])
state_dict['layer_128_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv3'][0])
state_dict['layer_128_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv_expand'][0])
state_dict['last_bn.running_mean'] = from_numpy(weight_numpy_dict["last_bn"][0] / weight_numpy_dict["last_bn"][2])
state_dict['last_bn.running_var'] = from_numpy(weight_numpy_dict["last_bn"][1] / weight_numpy_dict["last_bn"][2])
state_dict['last_bn.weight'] = from_numpy(weight_numpy_dict['last_scale'][0])
state_dict['last_bn.bias'] = from_numpy(weight_numpy_dict['last_scale'][1])
## caffe i f o g
## pytorch i f g o
ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0]) # [400,4096]
ww_200_if = ww[:200,:] #[200,4096]
ww_100_o = ww[200:300,:] #[100,4096]
ww_100_g = ww[300:400,:]#[100,4096]
ww_cat_ifgo = torch.cat((ww_200_if,ww_100_g,ww_100_o),0)
state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo
bb = from_numpy(weight_numpy_dict['lstm1x_r2'][1]) # [400]
bb_200_if = bb[:200]
bb_100_o = bb[200:300]
bb_100_g = bb[300:400]
bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
state_dict['lstm_lr.bias_ih_l0'] = bb_cat_ifgo
ww = from_numpy(weight_numpy_dict['lstm1x_r2'][2]) # [400,100]
ww_200_if = ww[:200, :] # [200,100]
ww_100_o = ww[200:300, :] # [100,100]
ww_100_g = ww[300:400, :] # [100,100]
ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
state_dict['lstm_lr.weight_hh_l0'] = ww_cat_ifgo
state_dict['lstm_lr.bias_hh_l0'] = from_numpy(np.zeros((400)))
##########################################
ww = from_numpy(weight_numpy_dict['lstm2x_r2'][0]) # [400,4096]
ww_200_if = ww[:200, :] # [200,4096]
ww_100_o = ww[200:300, :] # [100,4096]
ww_100_g = ww[300:400, :] # [100,4096]
ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
state_dict['lstm_lr.weight_ih_l0_reverse'] = ww_cat_ifgo
bb = from_numpy(weight_numpy_dict['lstm2x_r2'][1]) # [400]
bb_200_if = bb[:200]
bb_100_o = bb[200:300]
bb_100_g = bb[300:400]
bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
state_dict['lstm_lr.bias_ih_l0_reverse'] = bb_cat_ifgo
ww = from_numpy(weight_numpy_dict['lstm2x_r2'][2]) # [400,100]
ww_200_if = ww[:200, :] # [200,100]
ww_100_o = ww[200:300, :] # [100,100]
ww_100_g = ww[300:400, :] # [100,100]
ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
state_dict['lstm_lr.weight_hh_l0_reverse'] = ww_cat_ifgo
state_dict['lstm_lr.bias_hh_l0_reverse'] = from_numpy(np.zeros((400)))
state_dict['fc1x1_r2_v2_a.weight'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][0])
state_dict['fc1x1_r2_v2_a.bias'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][1])
####input########################################
path_img = "/data_2/project/1.jpg"
img = cv2.imread(path_img)
# access_pixels(img)
img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)
img_stand = img_stand.astype(np.float32)
# img = (img / 255. - config.DATASET.MEAN) / config.DATASET.STD
img_stand = img_stand.transpose([2, 0, 1])
img_stand = img_stand[None,:,:,:]
img_stand = torch.from_numpy(img_stand)
img_stand = img_stand.type(torch.FloatTensor)
img_stand = img_stand.to(device)
# img_stand = img_stand.view(1, *img.size())
#######net##########################
net.load_state_dict(state_dict)
net.cuda()
net.eval()
preds = net(img_stand)
print("out shape=",preds.shape)
torch.save(net.state_dict(), './lstm_model.pth')
# name_top_caffe_layer = "fc1x_a" #"merge_lstm_rlstmx" #"#"data_bn"
# path_save = "/data_1/Yang/project/myfile/blob_val/" + name_top_caffe_layer + "_torch.txt"
# save_tensor(preds, path_save)
aaa = 0
这里需要注意一下caffe里面的bn层有三个参数,前面两个是均值和方差,第三个参数是一个系数,均值和方差都需要除以这个系数,这个系数是一个固定值999.982
caffe中的scale层就是图中下面这个公式系数。
这里还需要讲下lstm这个算法。在caffe中设定的time_step为80,设定的hidden为100,输入到lstm之前的feature map大小是80,1,512,8.
然后我通过层的权重看到lstm有3个权重,大小分别是[400,4096] [400] [400,100]
lstm通过查看源码发现有参数的就是2个全连接层,[400,4096] [400] 这两个是对输入进行inner所需要的参数,400是100*4得到的,至于为什么是4,这个需要看lstm原理,这里简单说下就是用h,x有4组相乘。
[400,100]是隐含h进行inner所需要的权重。
查看pytorch手册关于lstm介绍。
https://pytorch.org/docs/1.0.1/nn.html?highlight=lstm#torch.nn.LSTM。输入参数介绍。
然后根据输入参数,单独写了一个lstm算子测试看看:
import torch
import torch.nn as nn
# rnn = nn.LSTM(512*8, 100, 1, False)
# input = torch.randn(80, 1, 512*8)
#
# output, (hn, cn) = rnn(input)
#
#
# for name,parameters in rnn.named_parameters():
# print(name,':',parameters.size())
# # parm[name]=parameters.detach().numpy()
#
# aa = 0
rnn = nn.LSTM(512*8, 100, 1, bidirectional=True)
input = torch.randn(80, 1, 512*8)
output, (hn, cn) = rnn(input)
print("out shape=",output.shape)
for name,parameters in rnn.named_parameters():
print(name,':',parameters.size())
# parm[name]=parameters.detach().numpy()
aa = 0
输出如下:
('out shape=', (80, 1, 200))
('weight_ih_l0', ':', (400, 4096))
('weight_hh_l0', ':', (400, 100))
('bias_ih_l0', ':', (400,))
('bias_hh_l0', ':', (400,))
('weight_ih_l0_reverse', ':', (400, 4096))
('weight_hh_l0_reverse', ':', (400, 100))
('bias_ih_l0_reverse', ':', (400,))
('bias_hh_l0_reverse', ':', (400,))
Process finished with exit code 0
可以看到pytorch的lstm所需要的参数基本与caffe一致,不过caffe的一个lstm参数是3个,pytorch的lstm参数是4个,显然是因为caffe隐含层的inner没用偏置,到时候直接把一个pytorch的偏置放为0就可以!
然而事情并不是一帆风顺的,上面给出的代码是成功的,但是在此之前我把所有的参数都怼上,但是精度是不对的。后面仔细看lstm源码,发现caffe的计算顺序:
lstm_unit_layer.cpp
template <typename Dtype>
void LSTMUnitLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const int num = bottom[0]->shape(1);//1
const int x_dim = hidden_dim_ * 4;
const Dtype* C_prev = bottom[0]->cpu_data();
const Dtype* X = bottom[1]->cpu_data();
const Dtype* cont = bottom[2]->cpu_data();
Dtype* C = top[0]->mutable_cpu_data();
Dtype* H = top[1]->mutable_cpu_data();
for (int n = 0; n < num; ++n) { //1
for (int d = 0; d < hidden_dim_; ++d) {//100
const Dtype i = sigmoid(X[d]);
const Dtype f = (*cont == 0) ? 0 :
(*cont * sigmoid(X[1 * hidden_dim_ + d]));weight_ih_l[k] – the learnable input-hidden weights of the \text{k}^{th}k
th
layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the \text{k}^{th}k
th
layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the \text{k}^{th}k
th
layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the \text{k}^{th}k
th
layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
const Dtype o = sigmoid(X[2 * hidden_dim_ + d]);
const Dtype g = tanh(X[3 * hidden_dim_ + d]);
const Dtype c_prev = C_prev[d];
const Dtype c = f * c_prev + i * g;
C[d] = c;
const Dtype tanh_c = tanh(c);
H[d] = o * tanh_c;
}
C_prev += hidden_dim_;
X += x_dim;
C += hidden_dim_;
H += hidden_dim_;
++cont;
}
}
发现caffe的计算顺序是ifog。
看pytorch说明文档介绍权重的顺序是
weight_ih_l[k] – the learnable input-hidden weights of the \text{k}^{th}k
th
layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the \text{k}^{th}k
th
layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the \text{k}^{th}k
th
layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the \text{k}^{th}k
th
layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
有点儿不一样,那么我只需要把caffe的权重顺序改下和pytorch一致试试。所有就有了上面的代码:
## caffe i f o g
## pytorch i f g o
ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0]) # [400,4096]
ww_200_if = ww[:200,:] #[200,4096]
ww_100_o = ww[200:300,:] #[100,4096]
ww_100_g = ww[300:400,:]#[100,4096]
ww_cat_ifgo = torch.cat((ww_200_if,ww_100_g,ww_100_o),0)
state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo
这样一整,成功了,精度一致!! 给出测试精度的代码。
不同框架下验证精度 https://www.cnblogs.com/yanghailin/p/15593614.html
给出我跑出结果的代码:
# -*- coding: utf-8
import torch
from torch import nn
import torch.nn.functional as F
import cv2
import numpy as np
import os
from chn_tab import chn_tab
class lstm_general(nn.Module): # SfSNet = PS-Net in SfSNet_deploy.prototxt
def __init__(self):
super(lstm_general, self).__init__()
# self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
self.data_bn = nn.BatchNorm2d(3)
self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
self.conv1_bn = nn.BatchNorm2d(64)
self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias = False)
self.layer_64_1_bn2 = nn.BatchNorm2d(64)
self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
self.layer_64_1_bn3 = nn.BatchNorm2d(64)
self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
self.layer_128_1_bn1 = nn.BatchNorm2d(256)
self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
self.layer_128_1_bn2 = nn.BatchNorm2d(128)
self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
self.layer_128_1_bn3 = nn.BatchNorm2d(128)
self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)
self.last_bn = nn.BatchNorm2d(512)
# self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)
self.fc1x1_r2_v2_a = nn.Linear(200,7118)
def forward(self, inputs):
# x = F.relu(self.bn1_1(self.conv1_1(inputs)))
x = self.data_bn(inputs)
x = F.relu(self.conv1_bn(self.conv1(x)))
x = self.conv1_pool(x) #[1,64,8,80]
x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x))) # 1 64 8 80
layer_64_1_conv1 = x
x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))
x = self.layer_64_1_conv3(x)
layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
layer_64_3_sum = x + layer_64_1_conv_expand #1 256 8 80
x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
layer_128_1_bn1 = x
x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
x = self.layer_128_1_conv3(x) #1, 512, 8, 80
layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1) #1, 512, 8, 80
layer_128_4_sum = x + layer_128_1_conv_expand
x = F.relu(self.last_bn(layer_128_4_sum))###acc ok
x = F.dropout(x, p=0.7, training=False) #1 512 8 80
x = x.permute(3,0,1,2) # 80 1 512 8
x = x.reshape(80,1,512*8)###acc ok
#
# merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)
lstm_out,(_,_) = self.lstm_lr(x) #(80,1,200)
return lstm_out
out = self.fc1x1_r2_v2_a(lstm_out) #(80,1,7118)
return out
def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
img_h, img_w, _ = img.shape
if img_h < 2 or img_w < 2:
return
# if 32 == img_h and 320 == img_w:
# return img
ratio_now = img_w * 1.0 / img_h
if ratio_now <= ratio:
mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
mask[0:img_h,0:img_w,:] = img
else:
mask = np.ones((int(img_w*1.0/ratio), img_w, 3), dtype=np.uint8) * 255
mask[0:img_h, 0:img_w, :] = img
mask_stand = cv2.resize(mask,(stand_w, stand_h),interpolation=cv2.INTER_LINEAR)
# access_pixels(mask_stand)
return mask_stand
if __name__ == '__main__':
path_model = "/data_1/everyday/1118/pytorch_lstm_test/lstm_model.pth"
path_img = "/data_2/project_202009/chejian/test_data/model_test/rec_general/1.jpg"
blank_label = 7117
prev_label = blank_label
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
img = cv2.imread(path_img)
img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)
img_stand = img_stand.astype(np.float32)
img_stand = img_stand.transpose([2, 0, 1])
img_stand = img_stand[None, :, :, :]
img_stand = torch.from_numpy(img_stand)
img_stand = img_stand.type(torch.FloatTensor)
img_stand = img_stand.to(device)
net = lstm_general()
checkpoint = torch.load(path_model)
net.load_state_dict(checkpoint)
net.cuda()
net.eval()
# traced_script_module = torch.jit.trace(net, img_stand)
# traced_script_module.save("./lstm.pt")
preds = net(img_stand)
# print("out shape=", preds.shape)
preds_1 = preds.squeeze()
# print("preds_1 out shape=", preds_1.shape)
val, pos = torch.max(preds_1,1)
pos = pos.cpu().numpy()
rec = ""
for predict_label in pos:
if predict_label != blank_label and predict_label != prev_label:
# print("predict_label=",predict_label)
print(chn_tab[predict_label])
rec += chn_tab[predict_label]
prev_label = predict_label
# print("rec=",rec)
print(rec)
弄成功了,但是只高兴了一天。
我最终目的是能在c++下面跑,于是转libtorch,本来我以为这是轻而易举的事情,但是事情并没有那么简单。
我发现我的libtorch代码经过lstm这层之后精度就对不上了,在此之前都是可以对上的。!!!无解。
可能和版本有关系,因为我用高版本的libtorch之前是转成功一个crnn的,是没有问题的。
https://github.com/wuzuowuyou/crnn_libtorch
这个是pytorch1.7版本的,而我现在是用的1.0版本的。我试了很久发现还是精度不对,这就无法解决了,也不知道从何下手去解决这个问题。翻遍了pytorch github上面的issue,没人遇到和我一样的问题。。。除非看pytorch源码去找问题,这太难了。
在pytorch的github提了issue
https://github.com/pytorch/pytorch/issues/68864
我知道这也会石沉大海的。
以下是我凌乱的,未完工的代码:
#include <torch/script.h> // One-stop header.
#include "torch/torch.h"
#include "torch/jit.h"
#include <memory>
#include "opencv2/opencv.hpp"
#include <queue>
#include <dirent.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;
// cv::Mat m_stand;
#define TABLE_SIZE 7117
static string chn_tab[TABLE_SIZE+1] = {"啊","阿","埃"
。。。
。。。
。。。
"0","1","2","3","4","5","6","7","8","9",
":",";","<","=",">","?","@",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
"[","\\","]","^","_","`",
"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"{","|","}","~",
" "};
bool LstmImgStandardization_src_1(const cv::Mat &src, const float &ratio, int standard_w, int standard_h, cv::Mat &dst)
{
if(src.empty())return false;
float width=src.cols;
float height=src.rows;
float a=width/ height;
if(a <=ratio)
{
Mat mask(height, ratio*height, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
//cv::resize(dst, dst, cv::Size(standard_w,standard_h));
cv::resize(dst, dst, cv::Size(standard_w,standard_h),0,0,cv::INTER_AREA);
return true;
}
bool lstm_img_standardization(cv::Mat src, cv::Mat &dst,float ratio)
{
if(src.empty())return false;
double width=src.cols;
double height=src.rows;
double a=width/height;
if(a <=ratio)//6
{
Mat mask(height, ratio*height, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
// cv::resize(dst, dst, cv::Size(360,60));
cv::resize(dst, dst, cv::Size(320,32));
return true;
}
//torch::Tensor pre_img(cv::Mat &img)
//{
// cv::Mat m_stand;
// float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
// lstm_img_standardization(img, m_stand, ratio);
//
// std::vector<int64_t> sizes = {m_stand.rows, m_stand.cols, m_stand.channels()};
// torch::TensorOptions options = torch::TensorOptions().dtype(torch::kByte);
// torch::Tensor tensor_image = torch::from_blob(m_stand.data, torch::IntList(sizes), options);
// // Permute tensor, shape is (C, H, W)
// tensor_image = tensor_image.permute({2, 0, 1});
//
//
// // Convert tensor dtype to float32, and range from [0, 255] to [0, 1]
// tensor_image = tensor_image.toType(torch::ScalarType::Float);
//
//
//// tensor_image = tensor_image.div_(255.0);
//// // Subtract mean value
//// for (int i = 0; i < std::min<int64_t>(v_mean.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].sub_(v_mean[i]);
//// }
//// // Divide by std value
//// for (int i = 0; i < std::min<int64_t>(v_std.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].div_(v_std[i]);
//// }
// //[c,h,w] --> [1,c,h,w]
// tensor_image.unsqueeze_(0);
// std::cout<<tensor_image;
// return tensor_image;
//}
bool pre_img(cv::Mat &img, torch::Tensor &input_tensor)
{
static cv::Mat m_stand;
float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
lstm_img_standardization(img, m_stand, ratio);
m_stand.convertTo(m_stand, CV_32FC3);
// imshow("m_stand",m_stand);
// waitKey(0);
// Mat m_stand_new;
// m_stand.convertTo(m_stand_new, CV_32FC3);
// int rowNumber = m_stand_new.rows; //行数
// int colNumber = m_stand_new.cols*m_stand_new.channels(); //列数 x 通道数=每一行元素的个数
// std::ofstream out_file("/data_1/everyday/1123/img_acc/after_CV_32FC3-float-111.txt");
// //双重循环,遍历所有的像素值
// for (int i = 0; i < rowNumber; i++) //行循环
// {
// uchar *data = m_stand_new.ptr<uchar>(i); //获取第i行的首地址
// for (int j = 0; j < colNumber; j++) //列循环
// {
// // ---------【开始处理每个像素】-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// std::cout<<"==m_stand.convertTo(m_stand, CV_32FC3);=="<<std::endl;
// while(1);
int stand_row = m_stand.rows;
int stand_cols = m_stand.cols;
input_tensor = torch::from_blob(
m_stand.data, {stand_row, stand_cols, 3}).toType(torch::kFloat);
input_tensor = input_tensor.permute({2,0,1});
input_tensor = input_tensor.unsqueeze(0);//.to(torch::kFloat);
// std::cout<<input_tensor;
return true;
}
void GetFileInDir(string dirName, vector<string> &v_path)
{
DIR* Dir = NULL;
struct dirent* file = NULL;
if (dirName[dirName.size()-1] != '/')
{
dirName += "/";
}
if ((Dir = opendir(dirName.c_str())) == NULL)
{
cerr << "Can't open Directory" << endl;
exit(1);
}
while (file = readdir(Dir))
{
//if the file is a normal file
if (file->d_type == DT_REG)
{
v_path.push_back(dirName + file->d_name);
}
//if the file is a directory
else if (file->d_type == DT_DIR && strcmp(file->d_name, ".") != 0 && strcmp(file->d_name, "..") != 0)
{
GetFileInDir(dirName + file->d_name,v_path);
}
}
}
string str_replace(const string &str,const string &str_find,const string &str_replacee)
{
string str_tmp=str;
size_t pos = str_tmp.find(str_find);
while (pos != string::npos)
{
str_tmp.replace(pos, str_find.length(), str_replacee);
size_t pos_t=pos+str_replacee.length();
string str_sub=str_tmp.substr(pos_t,str_tmp.length()-pos_t);
size_t pos_tt=str_sub.find(str_find);
if(string::npos != pos_tt)
{
pos =pos_t + str_sub.find(str_find);
}else
{
pos=string::npos;
}
}
return str_tmp;
}
string get_ans(const string path)
{
int pos_1 = path.find_last_of("_");
int pos_2 = path.find_last_of(".");
string ans = path.substr(pos_1+1,pos_2-pos_1-1);
ans = str_replace(ans,"@","/");
return ans;
}
bool save_tensor_txt(torch::Tensor tensor_in_,string path_txt)
{
#include "fstream"
ofstream outfile(path_txt);
torch::Tensor tensor_in = tensor_in_.clone();
tensor_in = tensor_in.view({-1,1});
tensor_in = tensor_in.to(torch::kCPU);
auto result_data = tensor_in.accessor<float, 2>();
for(int i=0;i<result_data.size(0);i++)
{
float val = result_data[i][0];
// std::cout<<"val="<<val<<std::endl;
outfile<<val<<std::endl;
}
return true;
}
int main()
{
std::string path_pt = "/data_1/everyday/1118/pytorch_lstm_test/lstmunidirectional20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm10000.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm.pt";
std::string path_img_dir = "/data_1/2020biaozhushuju/2021_rec/general/test";//"/data_1/everyday/1118/pytorch_lstm_test/test_data";
int blank_label = 7117;
std::ifstream list("/data_1/everyday/1123/list.txt");
int standard_w = 320;
int standard_h = 32;
// vector<string> v_path;
// GetFileInDir(path_img_dir, v_path);
// for(int i=0;i<v_path.size();i++)
// {
// std::cout<<i<<" "<<v_path[i]<<std::endl;
// }
torch::Device m_device(torch::kCUDA);
// torch::Device m_device(torch::kCPU);
std::shared_ptr<torch::jit::script::Module> m_model = torch::jit::load(path_pt);
torch::NoGradGuard no_grad;
m_model->to(m_device);
std::cout<<"success load model"<<std::endl;
int cnt_all = 0;
int cnt_right = 0;
double start = getTickCount();
string file;
while(list >> file)
{
file = "/data_1/everyday/1123/img/bxd_39_发动机号码.jpg";
cout<<cnt_all++<<" :: "<<file<<endl;
string jpg=".jpg";
string::size_type idx = file.find( jpg );
if ( idx == string::npos )
continue;
int pos_1 = file.find_last_of("_");
int pos_2 = file.find_last_of(".");
string answer = file.substr(pos_1+1,pos_2-pos_1-1);
cv::Mat img = cv::imread(file);
// int rowNumber = img.rows; //行数
// int colNumber = img.cols*img.channels(); //列数 x 通道数=每一行元素的个数
// std::ofstream out_file("/data_1/everyday/1123/img_acc/libtorch_img.txt");
// //双重循环,遍历所有的像素值
// for (int i = 0; i < rowNumber; i++) //行循环
// {
// uchar *data = img.ptr<uchar>(i); //获取第i行的首地址
// for (int j = 0; j < colNumber; j++) //列循环
// {
// // ---------【开始处理每个像素】-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// while(1);
torch::Tensor tensor_input;
pre_img(img, tensor_input);
tensor_input = tensor_input.to(m_device);
tensor_input.print();
std::cout<<tensor_input[0][2][12][25]<<std::endl;
std::cout<<tensor_input[0][1][15][100]<<std::endl;
std::cout<<tensor_input[0][0][16][132]<<std::endl;
std::cout<<tensor_input[0][1][17][156]<<std::endl;
std::cout<<tensor_input[0][2][5][256]<<std::endl;
std::cout<<tensor_input[0][0][14][205]<<std::endl;
save_tensor_txt(tensor_input, "/data_1/everyday/1124/acc/libtorch_input-100.txt");
torch::Tensor output = m_model->forward({tensor_input}).toTensor();
output.print();
// output = output.squeeze();//80,7118
// output.print();
save_tensor_txt(output, "/data_1/everyday/1124/acc/libtorch-out-100.txt");
//// std::cout<<output<<std::endl;
while(1);
//
torch::Tensor index = torch::argmax(output,1).cpu();//.to(torch::kInt);
index.print();
// std::cout<<index<<std::endl;
// while(1);
int prev_label = blank_label;
string result;
auto result_data = index.accessor<long, 1>();
for(int i=0;i<result_data.size(0);i++)
{
// std::cout<<result_data[i]<<std::endl;
int predict_label = result_data[i];
if (predict_label != blank_label && predict_label != prev_label )
{
{
result = result + chn_tab[predict_label];
}
}
prev_label = predict_label;
}
cout << "answer: " << answer << endl;
cout << "result : " << result << endl;
imshow("src",img);
waitKey(0);
// while(1);
}
// for(int i=0;i<v_path.size();i++)
// {
// cnt_all += 1;
// std::string path_img = v_path[i];
// string ans = get_ans(path_img);
// std::cout<<i<<" path="<<path_img<<" ans="<<ans<<std::endl;
// cv::Mat img = cv::imread(path_img);
// torch::Tensor input = pre_img(img, v_mean, v_std, standard_w, standard_h);
// input = input.to(m_device);
// torch::Tensor output = m_module.forward({input}).toTensor();
//
// std::string rec = get_label(output);
//#if 1 //for show
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// cv::imshow("img",img);
// cv::waitKey(0);
//#endif
//
//#if 0 //In order to test the accuracy
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// if(ans == rec)
// {
// cnt_right += 1;
// }
// std::cout<<"cnt_right="<<cnt_right<<std::endl;
// std::cout<<"cnt_all="<<cnt_all<<std::endl;
// std::cout<<"ratio="<<cnt_right * 1.0 / cnt_all<<std::endl;
//#endif
// }
// double time_cunsume = ((double)getTickCount() - start) / getTickFrequency();
// std::cout<<"ave time="<< time_cunsume * 1.0 / cnt_all * 1000 <<"ms"<<std::endl;
return 0;
}
------------------2021年11月25日10:18:54
早上来看到github有人回复建议我升级到最新版本看看。
没办法,我本地有pytorch1.1.0, cuda10.0, libtorch1.1.0的环境,我就直接用这个环境再来一遍,先生成pth,看模型输出是正确的,然后再生成pt,然后配置libtorch的cmakelist,然后再跑,发现没问题!!!
也就是说确实是libtorch1.0的问题了。无解。
这里再配上我cmakelist
cmake_minimum_required(VERSION 2.6)
project(libtorch_lstm_1.1.0)
set(CMAKE_BUILD_TYPE Debug)
set(CMAKE_BUILD_TYPE Debug CACHE STRING "set build type to debug")
#add_definitions(-std=c++11)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
#set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
# cuda10
include_directories(${CMAKE_SOURCE_DIR}/3rdparty/cuda/include)
link_directories(${CMAKE_SOURCE_DIR}/3rdparty/cuda/lib64)
###libtorch1.1.0
set(TORCH_ROOT ${CMAKE_SOURCE_DIR}/3rdparty/libtorch)
set(CMAKE_PREFIX_PATH ${CMAKE_SOURCE_DIR}/3rdparty/libtorch)
include_directories(${TORCH_ROOT}/include)
include_directories(${TORCH_ROOT}/include/torch/csrc/api/include)
link_directories(${TORCH_ROOT}/lib)
#OpenCv3.4.10
set(OPENCV_ROOT ${CMAKE_SOURCE_DIR}/3rdparty/opencv-3.4.10)
include_directories(${OPENCV_ROOT}/include)
link_directories(${OPENCV_ROOT}/lib)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
add_executable(libtorch_lstm ${PROJECT_SOURCE_DIR}/lstm.cpp)
target_link_libraries(libtorch_lstm opencv_calib3d opencv_core opencv_imgproc opencv_highgui opencv_imgcodecs)
target_link_libraries(libtorch_lstm torch c10 caffe2)
target_link_libraries(libtorch_lstm nvrtc cuda)
#target_link_libraries(crnn c10 c10_cuda torch torch_cuda torch_cpu "-Wl,--no-as-needed -ltorch_cuda")
add_definitions(-O2 -pthread)
#include <torch/script.h> // One-stop header.
#include "torch/torch.h"
#include "torch/jit.h"
#include <memory>
#include "opencv2/opencv.hpp"
#include <queue>
#include <dirent.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;
// cv::Mat m_stand;
#define TABLE_SIZE 7117
static string chn_tab[TABLE_SIZE+1] = {"啊","阿","埃","挨","哎","唉",
。。。
。。。
。。。
"∴","♂","♀","°","′","″","℃","$","¤","¢","£","‰","§","№","☆","★",
"○","●","◎","◇","◆","□","■","△","▲","※","→","←","↑","↓","〓",
"⒈","⒉","⒊","⒋","⒌","⒍","⒎","⒏","⒐","⒑","⒒","⒓","⒔","⒕","⒖",
"⒗","⒘","⒙","⒚","⒛","⑴","⑵","⑶","⑷","⑸","⑹","⑺","⑻","⑼","⑽","⑾",
"⑿","⒀","⒁","⒂","⒃","⒄","⒅","⒆","⒇","①","②","③","④","⑤","⑥","⑦",
"⑧","⑨","⑩","㈠","㈡","㈢","㈣","㈤","㈥","㈦","㈧","㈨","㈩",
"Ⅰ","Ⅱ","Ⅲ","Ⅳ","Ⅴ","Ⅵ","Ⅶ","Ⅷ","Ⅸ","Ⅹ","Ⅺ","Ⅻ",
"!",""","#","¥","%","&","'","(",")","*","+",",","-",".","/",
"0","1","2","3","4","5","6","7","8","9",":",";","<","=",">","?",
"@","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O",
"P","Q","R","S","T","U","V","W","X","Y","Z","[","\","]","^","_",
"`","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
"p","q","r","s","t","u","v","w","x","y","z","{","|","}"," ̄",
"!","\"","#","$","%","&","'","(",")","*","+",",","-",".","/", //========ascii========//
"0","1","2","3","4","5","6","7","8","9",
":",";","<","=",">","?","@",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
"[","\\","]","^","_","`",
"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"{","|","}","~",
" "};
bool LstmImgStandardization_src_1(const cv::Mat &src, const float &ratio, int standard_w, int standard_h, cv::Mat &dst)
{
if(src.empty())return false;
float width=src.cols;
float height=src.rows;
float a=width/ height;
if(a <=ratio)
{
Mat mask(height, ratio*height, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
//cv::resize(dst, dst, cv::Size(standard_w,standard_h));
cv::resize(dst, dst, cv::Size(standard_w,standard_h),0,0,cv::INTER_AREA);
return true;
}
bool lstm_img_standardization(cv::Mat src, cv::Mat &dst,float ratio)
{
if(src.empty())return false;
double width=src.cols;
double height=src.rows;
double a=width/height;
if(a <=ratio)//6
{
Mat mask(height, ratio*height, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
// cv::resize(dst, dst, cv::Size(360,60));
cv::resize(dst, dst, cv::Size(320,32));
return true;
}
//torch::Tensor pre_img(cv::Mat &img)
//{
// cv::Mat m_stand;
// float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
// lstm_img_standardization(img, m_stand, ratio);
//
// std::vector<int64_t> sizes = {m_stand.rows, m_stand.cols, m_stand.channels()};
// torch::TensorOptions options = torch::TensorOptions().dtype(torch::kByte);
// torch::Tensor tensor_image = torch::from_blob(m_stand.data, torch::IntList(sizes), options);
// // Permute tensor, shape is (C, H, W)
// tensor_image = tensor_image.permute({2, 0, 1});
//
//
// // Convert tensor dtype to float32, and range from [0, 255] to [0, 1]
// tensor_image = tensor_image.toType(torch::ScalarType::Float);
//
//
//// tensor_image = tensor_image.div_(255.0);
//// // Subtract mean value
//// for (int i = 0; i < std::min<int64_t>(v_mean.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].sub_(v_mean[i]);
//// }
//// // Divide by std value
//// for (int i = 0; i < std::min<int64_t>(v_std.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].div_(v_std[i]);
//// }
// //[c,h,w] --> [1,c,h,w]
// tensor_image.unsqueeze_(0);
// std::cout<<tensor_image;
// return tensor_image;
//}
bool pre_img(cv::Mat &img, torch::Tensor &input_tensor)
{
static cv::Mat m_stand;
float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
lstm_img_standardization(img, m_stand, ratio);
m_stand.convertTo(m_stand, CV_32FC3);
// imshow("m_stand",m_stand);
// waitKey(0);
// Mat m_stand_new;
// m_stand.convertTo(m_stand_new, CV_32FC3);
// int rowNumber = m_stand_new.rows; //行数
// int colNumber = m_stand_new.cols*m_stand_new.channels(); //列数 x 通道数=每一行元素的个数
// std::ofstream out_file("/data_1/everyday/1123/img_acc/after_CV_32FC3-float-111.txt");
// //双重循环,遍历所有的像素值
// for (int i = 0; i < rowNumber; i++) //行循环
// {
// uchar *data = m_stand_new.ptr<uchar>(i); //获取第i行的首地址
// for (int j = 0; j < colNumber; j++) //列循环
// {
// // ---------【开始处理每个像素】-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// std::cout<<"==m_stand.convertTo(m_stand, CV_32FC3);=="<<std::endl;
// while(1);
int stand_row = m_stand.rows;
int stand_cols = m_stand.cols;
input_tensor = torch::from_blob(
m_stand.data, {stand_row, stand_cols, 3}).toType(torch::kFloat);
input_tensor = input_tensor.permute({2,0,1});
input_tensor = input_tensor.unsqueeze(0);//.to(torch::kFloat);
// std::cout<<input_tensor;
return true;
}
void GetFileInDir(string dirName, vector<string> &v_path)
{
DIR* Dir = NULL;
struct dirent* file = NULL;
if (dirName[dirName.size()-1] != '/')
{
dirName += "/";
}
if ((Dir = opendir(dirName.c_str())) == NULL)
{
cerr << "Can't open Directory" << endl;
exit(1);
}
while (file = readdir(Dir))
{
//if the file is a normal file
if (file->d_type == DT_REG)
{
v_path.push_back(dirName + file->d_name);
}
//if the file is a directory
else if (file->d_type == DT_DIR && strcmp(file->d_name, ".") != 0 && strcmp(file->d_name, "..") != 0)
{
GetFileInDir(dirName + file->d_name,v_path);
}
}
}
string str_replace(const string &str,const string &str_find,const string &str_replacee)
{
string str_tmp=str;
size_t pos = str_tmp.find(str_find);
while (pos != string::npos)
{
str_tmp.replace(pos, str_find.length(), str_replacee);
size_t pos_t=pos+str_replacee.length();
string str_sub=str_tmp.substr(pos_t,str_tmp.length()-pos_t);
size_t pos_tt=str_sub.find(str_find);
if(string::npos != pos_tt)
{
pos =pos_t + str_sub.find(str_find);
}else
{
pos=string::npos;
}
}
return str_tmp;
}
string get_ans(const string path)
{
int pos_1 = path.find_last_of("_");
int pos_2 = path.find_last_of(".");
string ans = path.substr(pos_1+1,pos_2-pos_1-1);
ans = str_replace(ans,"@","/");
return ans;
}
bool save_tensor_txt(torch::Tensor tensor_in_,string path_txt)
{
#include "fstream"
ofstream outfile(path_txt);
torch::Tensor tensor_in = tensor_in_.clone();
tensor_in = tensor_in.view({-1,1});
tensor_in = tensor_in.to(torch::kCPU);
auto result_data = tensor_in.accessor<float, 2>();
for(int i=0;i<result_data.size(0);i++)
{
float val = result_data[i][0];
// std::cout<<"val="<<val<<std::endl;
outfile<<val<<std::endl;
}
return true;
}
int main()
{
std::string path_pt = "/data_1/everyday/1125/lstm/lstm20211125.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstmunidirectional20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm10000.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm.pt";
std::string path_img_dir = "/data_1/2020biaozhushuju/2021_rec/general/test";//"/data_1/everyday/1118/pytorch_lstm_test/test_data";
int blank_label = 7117;
std::ifstream list("/data_1/everyday/1123/list.txt");
int standard_w = 320;
int standard_h = 32;
// vector<string> v_path;
// GetFileInDir(path_img_dir, v_path);
// for(int i=0;i<v_path.size();i++)
// {
// std::cout<<i<<" "<<v_path[i]<<std::endl;
// }
torch::Device m_device(torch::kCUDA);
// torch::Device m_device(torch::kCPU);
std::shared_ptr<torch::jit::script::Module> m_model = torch::jit::load(path_pt);
torch::NoGradGuard no_grad;
m_model->to(m_device);
std::cout<<"success load model"<<std::endl;
int cnt_all = 0;
int cnt_right = 0;
double start = getTickCount();
string file;
while(list >> file)
{
file = "/data_2/project_202009/chejian/test_data/model_test/rec_general/1.jpg";
cout<<cnt_all++<<" :: "<<file<<endl;
string jpg=".jpg";
string::size_type idx = file.find( jpg );
if ( idx == string::npos )
continue;
int pos_1 = file.find_last_of("_");
int pos_2 = file.find_last_of(".");
string answer = file.substr(pos_1+1,pos_2-pos_1-1);
cv::Mat img = cv::imread(file);
// int rowNumber = img.rows; //行数
// int colNumber = img.cols*img.channels(); //列数 x 通道数=每一行元素的个数
// std::ofstream out_file("/data_1/everyday/1123/img_acc/libtorch_img.txt");
// //双重循环,遍历所有的像素值
// for (int i = 0; i < rowNumber; i++) //行循环
// {
// uchar *data = img.ptr<uchar>(i); //获取第i行的首地址
// for (int j = 0; j < colNumber; j++) //列循环
// {
// // ---------【开始处理每个像素】-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// while(1);
torch::Tensor tensor_input;
pre_img(img, tensor_input);
tensor_input = tensor_input.to(m_device);
tensor_input.print();
std::cout<<tensor_input[0][2][12][25]<<std::endl;
std::cout<<tensor_input[0][1][15][100]<<std::endl;
std::cout<<tensor_input[0][0][16][132]<<std::endl;
std::cout<<tensor_input[0][1][17][156]<<std::endl;
std::cout<<tensor_input[0][2][5][256]<<std::endl;
std::cout<<tensor_input[0][0][14][205]<<std::endl;
save_tensor_txt(tensor_input, "/data_1/everyday/1124/acc/libtorch_input-100.txt");
torch::Tensor output = m_model->forward({tensor_input}).toTensor();
output.print();
output = output.squeeze();//80,7118
output.print();
// save_tensor_txt(output, "/data_1/everyday/1124/acc/libtorch-out-100.txt");
////// std::cout<<output<<std::endl;
// while(1);
//
torch::Tensor index = torch::argmax(output,1).cpu();//.to(torch::kInt);
index.print();
// std::cout<<index<<std::endl;
// while(1);
int prev_label = blank_label;
string result;
auto result_data = index.accessor<long, 1>();
for(int i=0;i<result_data.size(0);i++)
{
// std::cout<<result_data[i]<<std::endl;
int predict_label = result_data[i];
if (predict_label != blank_label && predict_label != prev_label )
{
{
result = result + chn_tab[predict_label];
}
}
prev_label = predict_label;
}
cout << "answer: " << answer << endl;
cout << "result : " << result << endl;
imshow("src",img);
waitKey(0);
// while(1);
}
// for(int i=0;i<v_path.size();i++)
// {
// cnt_all += 1;
// std::string path_img = v_path[i];
// string ans = get_ans(path_img);
// std::cout<<i<<" path="<<path_img<<" ans="<<ans<<std::endl;
// cv::Mat img = cv::imread(path_img);
// torch::Tensor input = pre_img(img, v_mean, v_std, standard_w, standard_h);
// input = input.to(m_device);
// torch::Tensor output = m_module.forward({input}).toTensor();
//
// std::string rec = get_label(output);
//#if 1 //for show
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// cv::imshow("img",img);
// cv::waitKey(0);
//#endif
//
//#if 0 //In order to test the accuracy
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// if(ans == rec)
// {
// cnt_right += 1;
// }
// std::cout<<"cnt_right="<<cnt_right<<std::endl;
// std::cout<<"cnt_all="<<cnt_all<<std::endl;
// std::cout<<"ratio="<<cnt_right * 1.0 / cnt_all<<std::endl;
//#endif
// }
// double time_cunsume = ((double)getTickCount() - start) / getTickFrequency();
// std::cout<<"ave time="<< time_cunsume * 1.0 / cnt_all * 1000 <<"ms"<<std::endl;
return 0;
}
这里再说下遇到的一些坑,因为不同框架之间做转移,就是需要对比每一个环节的精度,一开始遇到一个问题精度对不上,然后一步步找问题,看哪一个环节精度开始不对的,最终定位在两边opencv imread之后的图像像素就开始不一样了!
原来是opencv版本不一样,一个版本是opencv3.3的,一个是opencv3.4.10的。所以做这些还需要版本严格一致,要不然会带来意想不到的问题。