1、git clone the tensorflow/serving and tensorflow source code
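For example (a minimal sketch; which release branch to check out in each repository depends on the TF Serving version being built):

git clone https://github.com/tensorflow/serving.git
git clone https://github.com/tensorflow/tensorflow.git
# then check out matching release branches in both repositories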
2、Building with --config=cuda fails because TensorFlow is not configured for GPU support
ERROR: /root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD:4:1: Traceback (most recent call last):
    File "/root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD", line 4
        error_gpu_disabled()
    File "/root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/error_gpu_disabled.bzl", line 3, in error_gpu_disabled
        fail("ERROR: Building with --config=c...")
ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support. Please re-run ./configure and enter 'Y' at the prompt to build with GPU support.
ERROR: no such target '@local_config_cuda//crosstool:toolchain': target 'toolchain' not declared in package 'crosstool' defined by /root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD
INFO: Elapsed time: 0.093s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Solution:
export TF_NEED_CUDA="1"
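For reference, the --config=cuda build that triggers this error is typically invoked along these lines (a sketch; exact bazel flags vary by setup):

bazel build -c opt --config=cuda //tensorflow_serving/model_servers:tensorflow_model_server

With TF_NEED_CUDA exported in the same shell before running bazel, this check should pass.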
3、It looks like none of the ./configure settings take effect, so all of the non-default CUDA, cuDNN, NCCL, etc. paths have to be supplied manually.
Solution: set them through environment variables:
export PATH=$PATH:/env/bazel-0.15.0/bin
export TF_NEED_CUDA="1"
export CUDNN_INSTALL_PATH="/usr/local/cudnn7.3_cuda9.0"
export CUDA_INSTALL_PATH="/usr/local/cuda-9.0"
export TF_CUDA_VERSION="9.0"
export TF_CUDNN_VERSION="7"
export TF_NCCL_VERSION="2.2"
export NCCL_INSTALL_PATH="/env/nccl_2.2.13-1+cuda9.0_x86_64"
export TEST_TMPDIR=/home
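To avoid retyping these in every shell, one option (a sketch; the file name tf_serving_env.sh is just an illustrative choice) is to keep the exports above in a small script and source it before each bazel invocation, so bazel and the configure logic see the same values:

source ./tf_serving_env.sh   # hypothetical helper script containing the exports above
bazel build -c opt --config=cuda //tensorflow_serving/model_servers:tensorflow_model_server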
4、nvcc reports a CUDA version that does not match TF_CUDA_VERSION.
ERROR: no such package '@local_config_cuda//crosstool': Traceback (most recent call last):
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 1447
        _create_local_cuda_repository(repository_ctx)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 1187, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 909, in _get_cuda_config
        _cuda_version(repository_ctx, cuda_toolkit_path, c...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 492, in _cuda_version
        auto_configure_fail(("CUDA version detected from nvc...)))
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 317, in auto_configure_fail
        fail((" %sCuda Configuration Error:%...)))
Cuda Configuration Error: CUDA version detected from nvcc (8.0.61) does not match TF_CUDA_VERSION (9.0)
INFO: Elapsed time: 0.785s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded)
Solution: point the build at the correct CUDA toolkit location:
export CUDA_TOOLKIT_PATH="/usr/local/cuda-9.0"
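To see which nvcc the configuration logic is picking up (the mismatch above means it found a CUDA 8.0 toolkit rather than 9.0), a few standard checks help:

which nvcc
nvcc --version
/usr/local/cuda-9.0/bin/nvcc --version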
5、Bazel cannot find gcc when setting up the C++ toolchain
ERROR: no such package '@local_config_cc//': Traceback (most recent call last):
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/cc_configure.bzl", line 56
        configure_unix_toolchain(repository_ctx, cpu_value, overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 477, in configure_unix_toolchain
        _find_generic(repository_ctx, "gcc", "CC", overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 459, in _find_generic
        auto_configure_fail(msg)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/lib_cc_configure.bzl", line 109, in auto_configure_fail
        fail((" %sAuto-Configuration Error:%...)))
Auto-Configuration Error: Cannot find gcc or CC (gcc -std=gnu99); either correct your path or set the CC environment variable
ERROR: Analysis of target '//tensorflow_serving/model_servers:tensorflow_model_server' failed; build aborted: no such package '@local_config_cc//': [same traceback and Auto-Configuration Error repeated]
INFO: Elapsed time: 2.579s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (4 packages loaded)
Solution:
export CC=/usr/bin/gcc
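The variable has to be visible to the bazel invocation, so export it in the same shell that runs the build, and confirm a working compiler actually exists at that path:

ls -l /usr/bin/gcc
gcc --version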
6、Linking fails: version.o must be recompiled with -fPIC
ERROR: /search/odin/zhangliang/code/serving-1.14/serving/tensorflow_serving/model_servers/BUILD:356:1: Linking of rule '//tensorflow_serving/model_servers:tensorflow_model_server' failed (Exit 1)
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow_serving/model_servers/_objs/tensorflow_model_server/tensorflow_serving/model_servers/version.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
bazel-out/k8-opt/bin/tensorflow_serving/model_servers/_objs/tensorflow_model_server/tensorflow_serving/model_servers/version.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 697.810s, Critical Path: 331.33s
INFO: 3321 processes: 3321 local.
FAILED: Build did NOT complete successfully
Solution:
The failure occurs while building tensorflow_model_server_main_lib; on closer inspection, it is the linkstamp compilation of "version.cc" that breaks, with the hint that -fPIC should be added.
A simple workaround is to drop the linkstamp and hard-code the version string, as in the BUILD and main.cc changes below:
BUILD
cc_library(
    name = "tensorflow_model_server_main_lib",
    srcs = [
        "main.cc",
    ],
    #hdrs = [
    #    "version.h",
    #],
    #linkstamp = "version.cc",
    visibility = [
        ":tensorflow_model_server_custom_op_clients",
        "//tensorflow_serving:internal",
    ],
    deps = [
        ":server_lib",
        "@org_tensorflow//tensorflow/c:c_api",
        "@org_tensorflow//tensorflow/core:lib",
        "@org_tensorflow//tensorflow/core/platform/cloud:gcs_file_system",
        "@org_tensorflow//tensorflow/core/platform/hadoop:hadoop_file_system",
        "@org_tensorflow//tensorflow/core/platform/s3:s3_file_system",
    ],
)
main.cc
//#include "tensorflow_serving/model_servers/version.h" ... if (display_version) { std::cout << "TensorFlow ModelServer: " << "r1.12" << " " << "TensorFlow Library: " << TF_Version() << " "; return 0; }