In this section, we will cover the same ground as the TVMC tutorial, but show how it is done with the Python API. Upon completion of this section, we will have used the Python API for TVM to accomplish the following tasks:
- Compile a pretrained ResNet-50 v2 model for the TVM runtime
- Run a real image through the compiled model, and interpret the output and model performance
- Tune the model on a CPU using TVM
- Re-compile an optimized model using the tuning data collected by TVM
- Run the image through the optimized model, and compare the output and model performance
TVM is a deep learning compiler framework, with a number of different modules available for working with deep learning models and operators. In this tutorial we will work through how to load, compile, and optimize a model using the Python API.
We begin by importing a number of dependencies, including onnx for loading and converting the model, helper utilities for downloading test data, the Python Image Library for working with image data, numpy for pre- and post-processing of the image data, the TVM Relay framework, and the TVM graph executor.
import onnx
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
import tvm.relay as relay
import tvm
from tvm.contrib import graph_executor
Download and Load the ONNX Model
model_url = (
    "https://github.com/onnx/models/raw/main/"
    "vision/classification/resnet/model/"
    "resnet50-v2-7.onnx"
)
model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
onnx_model = onnx.load(model_path)
# Seed numpy's RNG to get consistent results
np.random.seed(0)
Download, Preprocess, and Load the Test Image
As with TVMC, model input and output can be handled in numpy's .npz format. Here we download the image data, convert it to a numpy array, and feed it to the model as input.
img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")
# Our input image is in HWC layout while ONNX expects CHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))
# Normalize according to the ImageNet input specification
imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev
# Add the batch dimension, as we are expecting 4-dimensional input: NCHW.
img_data = np.expand_dims(norm_img_data, axis=0)
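If you want to mirror the TVMC flow, which passes inputs around as .npz files, you could optionally save the preprocessed array at this point. A minimal sketch; the imagenet_cat.npz file name is just an illustration:
# Optionally save the preprocessed input in .npz format, mirroring the
# TVMC workflow. The file name here is illustrative.
np.savez("imagenet_cat.npz", data=img_data)
# It can be loaded back later with:
# img_data = np.load("imagenet_cat.npz")["data"]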
Compile the Model With Relay
The next step is to compile the ResNet model. We import the model into Relay using from_onnx. We then build the model into a TVM library with standard optimizations, and finally create a TVM graph executor module from that library.
target = "llvm"
The input name may vary across model types; a tool such as Netron can be used to inspect the model's input name.
# The input name may vary across model types. You can use a tool
# like Netron to check input names
input_name = "data"
shape_dict = {input_name: img_data.shape}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))
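At this point the compiled module lives only in memory. If you want to reuse it later without recompiling, it can be written to disk with export_library and loaded back with tvm.runtime.load_module. A minimal sketch; the .so file name is illustrative:
# Optionally save the compiled module to disk and reload it later.
lib.export_library("resnet50-v2-7-tvm.so")
loaded_lib = tvm.runtime.load_module("resnet50-v2-7-tvm.so")
module = graph_executor.GraphModule(loaded_lib["default"](dev))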
Execute on the TVM Runtime
Now that we have compiled the model, we can use the TVM runtime to make predictions with it. To run the model and make predictions, we need two things:
- The compiled model, which we just produced
- Valid input to the model on which to make predictions
dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
Collect Basic Performance Data
We want to collect some basic performance data associated with this unoptimized model and compare it to the tuned model later. To help account for CPU noise, we run the computation in multiple batches with multiple repetitions, then gather some basic statistics on the mean, median, and standard deviation.
import timeit
timing_number = 10
timing_repeat = 10
unoptimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
unoptimized = {
    "mean": np.mean(unoptimized),
    "median": np.median(unoptimized),
    "std": np.std(unoptimized),
}
print(unoptimized)
Output:
{'mean': 48.89584059594199, 'median': 48.16241894150153, 'std': 2.2564635214327597}
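As an alternative to hand-rolling the timeit loop, recent TVM releases also expose a benchmark helper directly on GraphModule. A hedged sketch; check that your TVM version includes it, and note it reports times in seconds rather than milliseconds:
# Alternative timing via GraphModule.benchmark (recent TVM releases).
benchmark_result = module.benchmark(dev, number=timing_number, repeat=timing_repeat)
print(benchmark_result)  # prints mean, median, and std of the measured runs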
Postprocess the output
As mentioned previously, each model will have its own particular way of providing output tensors.
In our case, we need to run some post-processing to render the output of ResNet-50 v2 into a more human-readable form, using a lookup table provided for the model.
from scipy.special import softmax
# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")
with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]
# Open the output and read the output tensor
scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
Output:
class='n02123045 tabby, tabby cat' with probability=0.621105
class='n02123159 tiger cat' with probability=0.356377
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
For reference, this should produce output along these lines:
# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
Tune the model
In some cases, we might not get the expected performance when running inference with our compiled module. In cases like this, we can make use of the auto-tuner to find a better configuration for our model and gain a boost in performance.
Tuning in TVM refers to the process by which a model is optimized to run faster on a given target. This differs from training or fine-tuning in that it does not affect the accuracy of the model, only the runtime performance. As part of the tuning process, TVM will try running many different operator implementation variants to see which perform best. The results of these runs are stored in a tuning records file.
In its simplest form, tuning requires you to provide three things:
- the target specification of the device you intend to run this model on
- the path to an output file in which the tuning records will be stored
- the model to be tuned
import tvm.auto_scheduler as auto_scheduler
from tvm.autotvm.tuner import XGBTuner
from tvm import autotvm
Set up some basic parameters for the runner. The runner takes compiled code generated with a specific set of parameters and measures its performance.
- number specifies the number of different configurations we will test
- repeat specifies how many measurements we will take of each configuration
- min_repeat_ms specifies how long a configuration test needs to run for. If the number of repeats falls under this time, it will be increased. This option is necessary for accurate tuning on GPUs and is not required for CPU tuning; setting this value to 0 disables it.
- timeout places an upper limit on how long to run training code for each tested configuration
number = 10
repeat = 1
min_repeat_ms = 0 # since we're tuning on a CPU, can be set to 0
timeout = 10 # in seconds
# create a TVM runner
runner = autotvm.LocalRunner(
    number=number,
    repeat=repeat,
    timeout=timeout,
    min_repeat_ms=min_repeat_ms,
    enable_cpu_cache_flush=True,
)
Create a simple structure for holding tuning options. We use an XGBoost algorithm to guide the search. For a production job, you will want to set the number of trials to be larger than the value of 20 used here: 1500 is recommended for CPUs, 3000-4000 for GPUs. The required number of trials can depend on the particular model and processor, so it's worth spending some time evaluating performance across a range of values to find the best balance between tuning time and model optimization. Because running tuning is time-intensive, we set the number of trials to 20, but do not recommend a value this small.
The early_stopping parameter is the minimum number of trials to run before a condition that stops the search early can be applied.
The measure_option parameter indicates where the trial code will be built, and where it will be run. In this case, we use the LocalRunner we just created and a LocalBuilder.
The tuning_records option specifies a file to write the tuning data to.
tuning_option = {
    "tuner": "xgb",
    "trials": 20,
    "early_stopping": 100,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet-50-v2-autotuning.json",
}
# begin by extracting the tasks from the onnx model
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
# Tune the extracted tasks sequentially.
for i, task in enumerate(tasks):
    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
    tuner_obj = XGBTuner(task, loss_type="rank")
    tuner_obj.tune(
        n_trial=min(tuning_option["trials"], len(task.config_space)),
        early_stopping=tuning_option["early_stopping"],
        measure_option=tuning_option["measure_option"],
        callbacks=[
            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
        ],
    )
Output:
[Task 1/25] Current/Best: 85.05/ 251.80 GFLOPS | Progress: (20/20) | 7.68 s Done.
[Task 2/25] Current/Best: 91.92/ 209.11 GFLOPS | Progress: (20/20) | 6.07 s Done.
[Task 3/25] Current/Best: 95.80/ 219.97 GFLOPS | Progress: (20/20) | 6.27 s Done.
[Task 4/25] Current/Best: 166.34/ 237.16 GFLOPS | Progress: (20/20) | 7.10 s Done.
[Task 5/25] Current/Best: 81.68/ 260.01 GFLOPS | Progress: (20/20) | 6.13 s Done.
[Task 6/25] Current/Best: 41.35/ 242.81 GFLOPS | Progress: (20/20) | 6.41 s Done.
[Task 7/25] Current/Best: 75.99/ 240.20 GFLOPS | Progress: (20/20) | 5.48 s Done.
[Task 8/25] Current/Best: 123.49/ 216.88 GFLOPS | Progress: (20/20) | 9.69 s Done.
[Task 9/25] Current/Best: 53.55/ 230.81 GFLOPS | Progress: (20/20) | 16.94 s Done.
[Task 10/25] Current/Best: 86.86/ 240.26 GFLOPS | Progress: (20/20) | 5.03 s Done.
[Task 11/25] Current/Best: 191.19/ 257.60 GFLOPS | Progress: (20/20) | 6.02 s Done.
[Task 12/25] Current/Best: 94.22/ 225.94 GFLOPS | Progress: (20/20) | 6.71 s Done.
[Task 13/25] Current/Best: 127.52/ 220.16 GFLOPS | Progress: (20/20) | 6.42 s Done.
[Task 14/25] Current/Best: 239.47/ 252.94 GFLOPS | Progress: (20/20) | 18.66 s Done.
[Task 15/25] Current/Best: 62.80/ 260.21 GFLOPS | Progress: (20/20) | 13.09 s Done.
[Task 16/25] Current/Best: 86.70/ 194.14 GFLOPS | Progress: (20/20) | 5.30 s Done.
[Task 17/25] Current/Best: 101.12/ 257.36 GFLOPS | Progress: (20/20) | 6.23 s Done.
[Task 18/25] Current/Best: 130.45/ 248.23 GFLOPS | Progress: (20/20) | 6.19 s Done.
[Task 19/25] Current/Best: 26.57/ 237.67 GFLOPS | Progress: (20/20) | 7.63 s Done.
[Task 20/25] Current/Best: 140.13/ 179.09 GFLOPS | Progress: (20/20) | 14.41 s Done.
[Task 21/25] Current/Best: 49.51/ 199.20 GFLOPS | Progress: (20/20) | 11.11 s Done.
[Task 22/25] Current/Best: 193.76/ 228.26 GFLOPS | Progress: (20/20) | 5.81 s Done.
[Task 23/25] Current/Best: 61.72/ 257.58 GFLOPS | Progress: (20/20) | 9.12 s Done.
[Task 25/25] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
[Task 25/25] Current/Best: 4.54/ 42.50 GFLOPS | Progress: (20/20) | 22.68 s
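The tuning log now contains one line per measured configuration. If you only want to keep the best record found for each task, autotvm ships a helper for pruning the log. A minimal sketch; the output file name is illustrative, and availability may vary by TVM version:
# Keep only the best configuration per tuned task (optional).
autotvm.record.pick_best("resnet-50-v2-autotuning.json", "resnet-50-v2-autotuning-best.json")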
Compiling an Optimized Model with Tuning Data
The output of the tuning process above was stored in resnet-50-v2-autotuning.json. The compiler will use these results to generate high-performance code for the model on your specified target.
Now that tuning data for the model has been collected, we can re-compile the model using optimized operators to speed up our computations.
with autotvm.apply_history_best(tuning_option["tuning_records"]):
    with tvm.transform.PassContext(opt_level=3, config={}):
        lib = relay.build(mod, target=target, params=params)
dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))
Output:
/home/workspace/tvm/tvm/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
warnings.warn(
Verify that the optimized model runs and produces the same results:
dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
Output:
class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
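Beyond eyeballing the top-5 labels, you could also compare the raw scores numerically, assuming you stashed a copy of the unoptimized scores in a variable such as unoptimized_scores before recompiling (a hypothetical name, not part of the script above):
# Hypothetical numerical check against scores saved from the unoptimized run.
# unoptimized_scores must have been stored before the model was recompiled.
np.testing.assert_allclose(scores, unoptimized_scores, rtol=1e-4)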
Comparing the Tuned and Untuned Models
We want to collect some basic performance data associated with this optimized model to compare it to the unoptimized model. Depending on your underlying hardware, number of iterations, and other factors, you should see a performance improvement when comparing the optimized model to the unoptimized model.
import timeit
timing_number = 10
timing_repeat = 10
optimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}
print("optimized: %s" % (optimized))
print("unoptimized: %s" % (unoptimized))
Output:
optimized: {'mean': 41.897965169046074, 'median': 41.06571790762246, 'std': 2.092901884526126}
unoptimized: {'mean': 48.89584059594199, 'median': 48.16241894150153, 'std': 2.2564635214327597}
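To summarize the comparison, you can also compute the speedup directly from the two dictionaries (a minimal sketch using the values printed above):
# Relative improvement of the tuned model over the untuned baseline.
speedup = unoptimized["mean"] / optimized["mean"]
print("speedup: %.2fx" % speedup)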