Zexin Li

Please keep honest, open, patient, happy and visionary.

Synopsis

For anyone who needs the GPU version of TensorFlow running on NVIDIA Jetson.

Prerequisite checklist

  1. Check the model of your embedded board.
  2. Prepare a virtual Python environment (e.g., miniforge3). Following NVIDIA's official guideline and installing with sudo is not recommended, because it may mess up the system Python environment.
  3. Check the pre-built binaries on Jetson binaries. If an appropriate prebuilt wheel exists, download it, and make sure its file name follows the format “tensorflow-2.10.0+nv22.10-cp38-cp38-linux_aarch64.whl”.
  4. Run the following code:
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    sudo apt-get install python3-pip
    # for AGX with Jetpack 5.0.2, python 3.8.10 is suggested.
    conda create -n pytorch_env python=3.8.10
    conda activate pytorch_env
    pip install pip testresources setuptools
    # install by wheel file (example for Jetpack 5.0.2)
    pip install tensorflow-2.10.0+nv22.10-cp38-cp38-linux_aarch64.whl
    # For tensorflow 1.x, tensorflow-1.15.5+nv22.12-cp38-cp38-linux_aarch64.whl is suggested.
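
The wheel name encodes which interpreter and platform it targets, so it can be checked before running pip. The following is a minimal sketch, assuming the standard wheel naming scheme (name-version-python-abi-platform); the helper names parse_wheel_name and matches_interpreter are hypothetical and not part of pip.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its naming-scheme components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=sys.version_info, machine="aarch64"):
    """Check the CPython tag and the CPU architecture of a wheel filename."""
    info = parse_wheel_name(filename)
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return info["python"] == expected and info["platform"].endswith(machine)


wheel = "tensorflow-2.10.0+nv22.10-cp38-cp38-linux_aarch64.whl"
print(parse_wheel_name(wheel))
print(matches_interpreter(wheel, version_info=(3, 8, 10)))
```

Running it inside the activated conda environment (with the default version_info) shows whether the environment's Python matches the wheel before pip rejects it.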

Install from source code (not recommended)

Unless you really need a specific version of TensorFlow, building a TensorFlow wheel from source is unnecessary: it is time-consuming, bug-prone, and poorly documented.

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
# the remaining steps follow the usual TensorFlow source build (the specific procedure is omitted here)

For TensorFlow, refer to the English (en) or Chinese (zh) guide.

Synopsis

For anyone who needs the GPU version of PyTorch running on NVIDIA Jetson.

Prerequisite checklist

  1. Check the model of your embedded board.
  2. Prepare a virtual Python environment (e.g., miniforge3). Following NVIDIA's official guideline and installing with sudo is not recommended, because it may mess up the system Python environment.
  3. Check the pre-built binaries on Jetson binaries. If an appropriate prebuilt wheel exists, download it, and make sure its file name follows the format “torch-1.10.0-cp36-cp36m-linux_aarch64.whl”.
  4. Run the following code:
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    sudo apt-get install python3-pip
    # for Jetpack version >= 5.0.2, python 3.8.10 is suggested.
    conda create -n pytorch_env python=3.8.10
    conda activate pytorch_env
    pip install pip testresources setuptools
    # install by wheel file (example for newest Jetpack 5.1.2)
    # News: PyTorch 2.0.0 prebuilt is supported right now.
    pip install torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
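
After the wheel is installed, it is worth confirming that PyTorch actually sees the GPU. A minimal sketch, assuming it is run inside the activated environment; the function name torch_gpu_report is hypothetical, and it returns None when torch is not installed instead of failing.

```python
import importlib.util


def torch_gpu_report():
    """Describe the installed PyTorch build, or return None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    report = {
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name of the first visible CUDA device (the integrated GPU on a Jetson board).
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(torch_gpu_report())
```

If cuda_available is False on the board, the installed wheel is most likely a CPU-only build rather than one of the Jetson wheels.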

Install from source code (not recommended)

Unless you really need a specific version of PyTorch, building a PyTorch wheel from source is unnecessary: it is time-consuming, bug-prone, and poorly documented.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python3-dev python3-pip libopenblas-base libopenmpi-dev libomp-dev libopenblas-dev
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran

# Download a newer version of cmake for building PyTorch from source
wget https://github.com/Kitware/CMake/releases/download/v3.28.3/cmake-3.28.3-linux-aarch64.tar.gz
tar -xvzf cmake-3.28.3-linux-aarch64.tar.gz
export PATH=/path/to/cmake-3.28.3-linux-aarch64/bin:$PATH

# avalanche-lib requires torch.distributed, so the Jetson pre-built wheels cannot be used
export USE_NCCL=0
export USE_DISTRIBUTED=1
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="7.2;8.7" # "7.2;8.7" for JetPack 5 wheels for Xavier/Orin
export PYTORCH_BUILD_VERSION=2.1.0 # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
export PYTORCH_BUILD_NUMBER=1

# Build GPU-enabled PyTorch from source for v2.1.0
git clone --recursive --branch v2.1.0 https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
pip install scikit-build
pip install ninja
python setup.py bdist_wheel
python setup.py install
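
The build switches exported above can also be collected in one place, which makes it harder to forget one of them when the build is restarted in a fresh shell. A minimal sketch: the helper pytorch_build_env is hypothetical, the values "7.2" (Xavier) and "8.7" (Orin) come from the comment above, and the TX2 and Nano entries are added here as an assumption for illustration.

```python
import os

# Compute capability per Jetson family. "7.2" and "8.7" appear in the build
# notes; "6.2" (TX2) and "5.3" (Nano) are extra entries added for illustration.
JETSON_CUDA_ARCH = {"nano": "5.3", "tx2": "6.2", "xavier": "7.2", "orin": "8.7"}


def pytorch_build_env(boards, version, base_env=None):
    """Return a copy of the environment with the PyTorch build switches set."""
    env = dict(os.environ if base_env is None else base_env)
    archs = sorted({JETSON_CUDA_ARCH[board] for board in boards}, key=float)
    env.update({
        "USE_NCCL": "0",
        "USE_DISTRIBUTED": "1",
        "USE_QNNPACK": "0",
        "USE_PYTORCH_QNNPACK": "0",
        "TORCH_CUDA_ARCH_LIST": ";".join(archs),
        "PYTORCH_BUILD_VERSION": version,  # without the leading 'v'
        "PYTORCH_BUILD_NUMBER": "1",
    })
    return env


env = pytorch_build_env(["xavier", "orin"], "2.1.0")
print(env["TORCH_CUDA_ARCH_LIST"])
```

The resulting dictionary can be handed to subprocess.run(["python", "setup.py", "bdist_wheel"], cwd="pytorch", env=env) so the build always sees the same settings.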

For TensorFlow, refer to the English (en) or Chinese (zh) guide.

Talk is cheap, show me the code

Note: these commands are for an x86-based server, not for the Jetson itself.

For Jetpack 4.6

conda create -n py36 python=3.6
conda activate py36
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

For Jetpack 5.0+

conda create -n py38 python=3.8.10
conda activate py38
conda install pytorch==1.13.0 cudatoolkit=11.8 -c pytorch -c conda-forge

Synopsis

Rescue your Ubuntu PC/server from a black/purple screen.

Checklists [Hardware check]

  1. Restart and press [F2/F6/F8/F10] to try to enter the BIOS. Check that the CPU and memory are in good status.
  2. Unplug the HDMI/DP cable from the GPU, connect it to the motherboard instead, and reboot. If you can now get into the system without problems, a likely reason is that recovery mode temporarily disabled the GPU, and a normal reboot will re-enable it.

Checklists [Software check]

  1. Select Boot Ubuntu in the BIOS.
  2. Press [Esc] and [e] to enter GRUB while booting (if the GRUB menu is visible).
  3. Use the arrow keys to select [Advanced Mode] and press [e] to edit the configuration.
  4. Use the arrow keys to move to the end of the line that starts with
    "linux /boot/vmlinuz***"
    and type nomodeset there to temporarily disable the GPU driver:
    "... ro quiet splash nomodeset ..."
  5. Once you are in the system, fix the GPU driver (or whatever else is broken); a reboot should then solve the problem.

Last Hope

If the above methods still don’t save your Ubuntu, locating exactly what went wrong and fixing it is likely to be difficult. Probably the most efficient solution is to reinstall Ubuntu (do this with caution, and only after backing up your data).

Synopsis

Rescue your PC/server after the CUDA driver has crashed.

Prerequisite

  1. Host machine: a PC/laptop/server with Intel/AMD x86 architecture and Ubuntu installed, with network access.
  2. Physical disk: more than 10GB available.
  3. sudo privileges

Remove old driver

sudo apt purge 'nvidia*'
sudo apt purge 'cuda*'
sudo apt clean
sudo apt update
sudo apt upgrade

Follow the instructions on the official NVIDIA website

For example: NVIDIA

Reboot and configure environment variables

sudo reboot
export PATH=/usr/local/cuda/bin:$PATH
# or write into ~/.bashrc
# verify the installation with nvcc
nvcc --version
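
The same check can be scripted, which helps when several machines have to be verified. A minimal sketch, assuming nvcc prints a line containing "release X.Y"; the sample string below is illustrative, not output captured from a real machine, and both function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from the 'release X.Y' part of nvcc's output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Return the CUDA release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


sample = "Cuda compilation tools, release 11.8, V11.8.89"  # illustrative line
print(parse_nvcc_release(sample))
print(installed_cuda_release())
```

A None result from installed_cuda_release() usually means the PATH export above has not taken effect in the current shell.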

Download the cuDNN package and copy its files into the CUDA path

wget ...
sudo dpkg -i ...
ls | grep cudnn | xargs -I{} sudo cp {} /usr/local/cuda/include
ls | grep cudnn | xargs -I{} sudo cp {} /usr/local/cuda/lib64

Or download it from the official NVIDIA site: https://developer.nvidia.com/cudnn

Verification

sudo apt-get install libfreeimage3 libfreeimage-dev # install dependencies
cp -r /usr/src/cudnn_samples_v8/ ~
cd ~/cudnn_samples_v8/
cd mnistCUDNN/
make
./mnistCUDNN

Problems on NVIDIA Jetson Development Kit

  1. Reference environment variables:
    import os
    os.environ['TF_CPP_MIN_VLOG_LEVEL'] = '1' # Low TensorFlow log verbosity; raise it (up to 10) when locating bugs.
    os.environ['CUDA_CACHE_MAXSIZE'] = "2147483648" # Enlarge the CUDA cache to avoid repeated, long-lasting JIT compilation.
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' # Avoid allocating all memory at once in one process.
    os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1' # Use unified memory to reduce data transfer time.
    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0' # Boost performance by disabling GPU garbage collection; enable it only when hitting OOM.
  2. pip3 install h5py is very slow or fails to compile: use the root account to install the pip dependencies.
  3. local_cuda_not_found: switch to a known verified version combination.
  4. c++ compiling error + cannot write file: at least 32GB of extra disk storage is needed; the internal storage of NVIDIA Jetson is insufficient for the large amount of intermediate files produced while compiling TensorFlow.
  5. c++ compiling error + process xx killed: an OOM error; on NVIDIA Jetson TX2/Nano, set up 8GB of swap first.
  6. cannot import name ‘function_pb2’: change the current directory; do not run import tensorflow from inside the TensorFlow source tree.
  7. Compilation and pip installation succeed, but the tests do not pass and hang while executing the test files: the default CUDA/CUDNN versions shipped with Jetpack may be incompatible with the versions in the official TensorFlow guide. Possible solutions: (1) use a known tested version combination; (2) download the matching CUDA/CUDNN version from Jetpack, then compile TensorFlow; (3) ask for official help on the NVIDIA forum.
  8. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for tensorflow<=2.2, downgrade the numpy version
    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
  9. Runtime error “CUDA driver version is insufficient for CUDA runtime version”: cuda10.2+cudnn7.0 are incompatible; re-create the soft link to cuda9.0+cudnn7.0 and compile again.
  10. CUDA_UNKNOWN_ERROR may occur after running a long TensorFlow Python script: possibly an internal TensorFlow bug or a memory problem. Possible solution: reboot the board; pip uninstall tensorflow; pip install tensorflow-xxx.whl
  11. TensorFlow performance bug: GPU initialization takes extremely long on TX2 (e.g., initializing ResNet50 training on TX2 takes over 20 min): set the environment variable with export CUDA_CACHE_MAXSIZE="2147483648" and run the TensorFlow code twice.
  12. When using unified memory, ‘NvMapReserveOp 0x80000001 failed [22]’ is displayed: limit the memory TensorFlow allocates.
    import tensorflow as tf
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2 # or another small value less than 1.0
    config.gpu_options.experimental.use_unified_memory = True
    with tf.compat.v1.Session(config=config) as s:
        pass # replace with your program
  13. Performance bug: the warning "W tensorflow/core/common_runtime/bfc_allocator.cc:311] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature." appears frequently: define the environment variable
    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0'
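
Because these variables are read while TensorFlow and CUDA initialize, the safe order is to set them before the first import of tensorflow. A minimal sketch, assuming a subset of the values from item 1; the helper apply_env is hypothetical, and it leaves anything the user has already exported untouched.

```python
import os

# Values taken from the environment-variable list in item 1.
JETSON_TF_ENV = {
    "TF_FORCE_GPU_ALLOW_GROWTH": "true",
    "CUDA_CACHE_MAXSIZE": "2147483648",
    "TF_ENABLE_GPU_GARBAGE_COLLECTION": "0",
}


def apply_env(settings, environ=os.environ):
    """Set each variable unless it is already defined; return what was applied."""
    applied = {}
    for key, value in settings.items():
        if key not in environ:
            environ[key] = value
            applied[key] = value
    return applied


print(apply_env(JETSON_TF_ENV))
# import tensorflow as tf  # import only after the environment is prepared
```

Keeping existing values means a one-off override from the shell (for example, a different cache size) still wins over the defaults in the script.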

Synopsis

Q: Who needs to install tensorflow from source code?
A: 1. tensorflow developers (or anyone who wants to improve it); 2. developers who need a specific version of tensorflow but cannot find a download for it. Note: developers who only use the tensorflow API do not need to build from source; it is suggested to use the official NVIDIA tensorflow image, or to download a prebuilt .whl file and install it with pip.

Install Tensorflow 1.x from Source Code

JetsonHacks offers an installation guide on GitHub

Install Tensorflow 2.x from Source Code

There are currently almost no concrete instructions for building Tensorflow 2.x from source. If you run into any problem, please see the section below: [Potential Problem].

Prerequisite

  1. Read the official guides to decide on the target version: the official tensorflow build guide and NVIDIA's guide to building tensorflow.
    Combinations currently known to build and run:
  2. Jetpack 3.3 (python 3.5/3.6) + CUDA 10.2 + CUDNN 7.1 + tensorflow r2.2;
  3. Jetpack 4.6 (python 3.6) + CUDA 10.2 + CUDNN 7.1 + tensorflow r2.2;
  4. (recommended) Jetpack 4.6 (python 3.6) + CUDA 10.2 + CUDNN 8.2 + tensorflow r2.4.
  5. Flash the board correctly (start from a clean OS): use the official NVIDIA SDK Manager to install Jetpack [How to flash Jetson board]. For instance, TX2 supports Jetpack versions 4.6, 4.5.1, 4.5, and 3.3. Note: versions 4.5+ may lead to CUDA/CUDNN incompatibility.
  6. Install python dependencies
    For jetpack 4.5+
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    sudo apt-get install python3-pip
    sudo pip3 install -U pip testresources setuptools==49.6.0
    sudo pip3 install -U --no-deps numpy==1.18.5 future==0.18.2 mock==3.0.5 keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py==2.10.0
    For jetpack 3.3
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    curl -fsSL https://bootstrap.pypa.io/pip/3.5/get-pip.py | python3.5 # pip has dropped python 3.5 support, so pip for python3.5 cannot be installed through apt
    sudo pip3 install -U testresources setuptools
    sudo pip3 install -U --no-deps numpy future mock keras_preprocessing keras_applications gast protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py
    Note: python dependencies may conflict and cause a core dump on import. It is best to check by importing these dependencies in an interactive python3 session. One common workable solution is to downgrade numpy to 1.18.5.
  7. Install JDK dependencies
    # see https://docs.bazel.build/versions/main/install-ubuntu.html
    # For jetpack 4.5+ Ubuntu 18.04 (LTS) uses OpenJDK 11 by default:
    sudo apt-get install openjdk-11-jdk
    # For jetpack 3.3 Ubuntu 16.04 (LTS) uses OpenJDK 8 by default:
    sudo apt-get install openjdk-8-jdk
  8. Install bazel
    Build bazel from source; the required version is given by .bazelversion in the tensorflow source tree (set BAZEL_VERSION to it first)
    mkdir bazel-$BAZEL_VERSION
    cd bazel-$BAZEL_VERSION
    wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-dist.zip
    unzip bazel-$BAZEL_VERSION-dist.zip
    rm bazel-$BAZEL_VERSION-dist.zip
    ./compile.sh
    sudo cp output/bazel /usr/local/bin
    bazel version
  9. Enable swap on boards with less than 16GB of memory
    Otherwise compilation may run out of memory (OOM) -> Error: c++ compiling error *** Killed
    fallocate -l 8G swapfile
    chmod 600 swapfile
    mkswap swapfile
    sudo swapon swapfile
    swapon -s
    Or directly use the script offered by JetsonHacks
    wget https://raw.githubusercontent.com/jetsonhacks/installTensorFlowTX2/master/createSwapfile.sh
    ./createSwapfile.sh -d /experiment -s 8
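
Before starting a long build it helps to check how much memory and swap the board really has. A minimal sketch that parses /proc/meminfo-style text; both thresholds are assumptions chosen for illustration (MemTotal reports less than the nominal size, so 12 GiB is used to separate 8GB boards from 16GB ones, and 7.5 GiB stands for "roughly the 8G swap file created above"), and the function names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Return {field: kibibytes} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_recommended(meminfo, ram_threshold_gib=12.0, swap_target_gib=7.5):
    """True when the board is a small-memory one and has too little swap."""
    kib_per_gib = 1024 * 1024
    ram_gib = meminfo.get("MemTotal", 0) / kib_per_gib
    swap_gib = meminfo.get("SwapTotal", 0) / kib_per_gib
    return ram_gib < ram_threshold_gib and swap_gib < swap_target_gib


if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print(swap_recommended(parse_meminfo(handle.read())))
```

If it prints True, create the swap file first; starting the build without it risks the "Killed" compile error described in step 9.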

Compilation/Installation/Testing

  1. Get source code
    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow
    git checkout r2.2 # for Jetpack 3.3
  2. Configuration/Compilation
    Note: compilation is very time-consuming; run it in a session that survives disconnects (e.g., tmux).
    # Configure the correct path of the python interpreter and enable CUDA
    ./configure
    # Set the output_base path (default is ~/.cache/); otherwise building across disks is much slower than building on a single disk
    bazel --output_base=/experiment/tensorflow_pkg build --verbose_failures --config=noaws --config=cuda //tensorflow/tools/pip_package:build_pip_package
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /experiment/tensorflow_tmp
  3. Install tensorflow .whl
    pip install /experiment/tensorflow_tmp/tensorflow-$version-$tags.whl
  4. Testing
    Smoke test
    python3
    >>> import tensorflow
    >>> exit()
    GPU/MNIST
    # Refer to two files under https://github.com/peterlee0127/tensorflow-nvJetson/tree/master/tf-test
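
Between the bare import and the full MNIST run, a quick look at the visible GPUs catches a CPU-only build early. A minimal sketch, assuming TensorFlow 2.x; the function name tensorflow_gpu_report is hypothetical, and it returns None when tensorflow is not installed. Run it from a directory outside the TensorFlow source tree.

```python
import importlib.util


def tensorflow_gpu_report():
    """Describe the installed TensorFlow build, or return None if it is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {"version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


print(tensorflow_gpu_report())
```

An empty "gpus" list after a CUDA-enabled build suggests that the CUDA option was not enabled during ./configure or that the CUDA libraries cannot be loaded at runtime.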

Potential Problem

  1. pip3 install h5py is very slow or fails to compile: use the root account to install the pip dependencies.
  2. local_cuda_not_found: switch to one of the verified version combinations listed above.
  3. c++ compiling error + cannot write file: at least 32GB of extra disk storage is needed; the internal storage of NVIDIA Jetson is insufficient for the large amount of intermediate files produced while compiling TensorFlow.
  4. c++ compiling error + process xx killed: an OOM error; on NVIDIA Jetson TX2/Nano, set up 8GB of swap first.
  5. cannot import name ‘function_pb2’: change the current directory; do not run import tensorflow from inside the TensorFlow source tree.
  6. Compilation and pip installation succeed, but the tests do not pass and hang while executing the test files: the default CUDA/CUDNN versions shipped with Jetpack may be incompatible with the versions in the official TensorFlow guide. Possible solutions: (1) use a known tested version combination from the list above; (2) download the matching CUDA/CUDNN version from Jetpack, then compile TensorFlow; (3) ask for official help on the NVIDIA forum.
  7. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for tensorflow<=2.2, downgrade the numpy version
    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
  8. Runtime error “CUDA driver version is insufficient for CUDA runtime version”: cuda10.2+cudnn7.0 are incompatible; re-create the soft link to cuda9.0+cudnn7.0 and compile again.
  9. CUDA_UNKNOWN_ERROR may occur after running a long TensorFlow Python script: possibly an internal TensorFlow bug or a memory problem. Possible solution: reboot the board; pip uninstall tensorflow; pip install tensorflow-xxx.whl
  10. TensorFlow performance bug: GPU initialization takes extremely long on TX2 (e.g., initializing ResNet50 training on TX2 takes over 20 min): set the environment variable with export CUDA_CACHE_MAXSIZE="2147483648" and run the TensorFlow code twice.
  11. import numpy/tensorflow core dumped: downgrade numpy to <=1.18.5
  12. h5py installation gets stuck and never compiles successfully, reporting ‘Cython is not installed’ even though Cython is installed: downgrade numpy to <=1.18.5
Weekly Contest 301

Congratulations to 4A!
2909 / 24327 = 11.96%
Finish Time: 1:03:06
Q1: 0:03:38 -> easy loop
Q2: 0:15:22 -> greedy and map
Q3: 0:57:14 -> greedy and map (use python) (take too much time)
Q4: 1:03:06 -> very straightforward math

Weekly Contest 301

5742 / 23561 = 24.37%
Finish Time: 1:26:18
Q1: 0:10:42 -> easy greedy (take too much time)
Q2: 0:27:39 -> priority queue (take too much time)
Q3: 1:16:18 (2 errors) -> crash car problem
Q4: N/A -> dynamic programming

Weekly Contest 286

2766 / 21339 = 12.96%
Finish Time: 1:10:21
Q1: 0:06:04 -> easy loop (take too much time)
Q2: 0:12:41 -> greedy
Q3: 1:10:21 -> modified binary search (slow implementation)
Q4: N/A -> dynamic programming

Weekly Contest 283

5044 / 19916 = 25.33%
Finish Time: 1:09:06
Q1: 0:10:51 -> easy loop (1 bug)
Q2: 0:17:31 -> easy loop + map
Q3: 0:44:06 -> binary search (4 bugs) (long long error)
Q4: N/A -> Trie tree or easy math

Weekly Contest 282

3828 / 15143 = 25.28%
Finish Time: 1:35:40
Q1: 0:02:36 -> easy loop
Q2: 0:05:17 -> map
Q3: 1:10:40 (5 errors) -> modified binary search
Q4: N/A -> dynamic programming

Synopsis

The NVIDIA Jetson Development Toolkit is based on the arm64 architecture. Consequently, many system libraries and deep learning libraries cannot be found and installed as easily as on the x86 platform. To solve such dependency problems once and for all, NVIDIA offers an official flashing tool (Jetpack). This article focuses on the technical details of flashing Jetpack 4.6.

JetsonHacks offers flashing tutorial videos for different versions (Jetpack 4.2 installation, Jetpack 3.0 installation); they differ in some details but are similar to Jetpack 4.6. For older versions (e.g., Jetpack 3.0), please refer to the Jetson TX2 Flashing Tutorial.

Prerequisite

  1. Host machine: a PC/laptop with Intel/AMD x86 architecture and Ubuntu 18.04 installed, with network access.
  2. Physical disk: more than 40GB available.
  3. Jetpack installation package: apply for an NVIDIA developer account, then download Jetpack (the newest version, 4.6, is recommended) from NVIDIA according to the board's hardware model.
  4. A Micro-USB data cable, an HDMI cable, a screen, a mouse, and a keyboard.

Installing Jetpack

  1. Power off the development board, unplug its network cable, and connect the board to the host computer with the data cable. Double-click to run the Jetpack 4.6 installer (or install it from the command line), then run the command sdkmanager (make sure not to run it as root). After logging in with your NVIDIA developer account, tick host and target, and select the device hardware manually.
  2. Create 2 new folders to store the library files and the target image files.
  3. Start the download and wait patiently for the OS to be installed ([Note] the development board is still powered off at this time).
  4. SDK Manager will ask you to choose between (auto flash) and (manual flash). Select (manual flash) and switch the board to Recovery Mode, for instance (Jetson TX2):
    (a) Ensure that the development board is initially powered off and that the Micro-USB cable is properly connected
    (b) Power on the development board, press power to boot, then quickly press and hold the recovery key without releasing it, press the reset key, and release the recovery key after 2 seconds
    (c) At this point there should be 2 green lights on
  5. To verify that the board is in the forced recovery state, run the command lsusb on the host. If Nvidia Corp appears in the device list, it worked; otherwise go back and repeat step (4).
  6. Connect the HDMI cable and the screen to the development board. Then click flash in sdkmanager.
  7. Wait patiently for the development board to initialize the Ubuntu system (configure the account and password). SDK Manager will ask you to choose the connection method and enter your account password. In this case, choose the USB connection with the default IP address, and enter the account/password of the Ubuntu system on the development board.
  8. Wait patiently, for a long time, until all libraries are installed. Click exit to quit sdkmanager.

Further Configuration

Connect to the development board and check its IP address with ifconfig | grep eth0 -a1. It can then be accessed over ssh and configured like a normal Ubuntu machine.

Potential Problems and Solutions

  1. Slow downloading:
    Switch to a network environment that can bypass the firewall (GFW) and has sufficient bandwidth.
  2. Strange issues reported by sdkmanager during installation (e.g., dependency errors)
    Replace the host machine with a clean Ubuntu 18.04 installation; most problems of this kind come from system configuration conflicts, which are very difficult to locate and fix.
  3. Performance bugs (some CPU cores on the board are idle)
    In /boot/extlinux/extlinux.conf, rewrite isolcpus=1-2 to isolcpus= (refer to Performance bug solution to TX2).

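Synopsis

Q: Who needs to install tensorflow from source code?

A: 1. tensorflow developers (who want to improve its code); 2. developers who need a specific version of tensorflow but cannot find a download for it. Developers who only use the tensorflow API do not need to compile it; it is recommended to use the official NVIDIA image, or to download an official prebuilt .whl file and install it with pip.

Install tensorflow 1.x from source code

Jetsonhacks provides an installation guide on GitHub

Install tensorflow 2.x from source code

There are currently almost no public tutorials for compiling 2.x, so expect to hit many pitfalls. If you run into problems, see [Potential Problems] at the end of this post.

Preparation

  1. Check the official guides to decide which version to install: the official tensorflow build guide and NVIDIA's guide to building tensorflow

Combinations currently tested to compile and run:

  1. Jetpack 3.3 (python 3.5/3.6) + CUDA 10.2 + CUDNN 7.1 + tensorflow r2.2;

  2. Jetpack 4.6 (python 3.6) + CUDA 10.2 + CUDNN 7.1 + tensorflow r2.2;

  3. (recommended) Jetpack 4.6 (python 3.6) + CUDA 10.2 + CUDNN 8.2 + tensorflow r2.4.

  4. Flash the board correctly: use the official NVIDIA SDK Manager to flash Jetpack [Flashing tutorial]. Taking TX2 as an example, the Jetpack versions currently available are 4.6, 4.5.1, 4.5, and 3.3. Note: versions 4.5+ ship newer CUDA/CUDNN versions, which may cause incompatibility.

  5. Install python dependencies
    For jetpack 4.5+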
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    sudo apt-get install python3-pip
    sudo pip3 install -U pip testresources setuptools==49.6.0
    sudo pip3 install -U --no-deps numpy==1.18.5 future==0.18.2 mock==3.0.5 keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py==2.10.0

    For jetpack 3.3
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    curl -fsSL https://bootstrap.pypa.io/pip/3.5/get-pip.py | python3.5 # pip has dropped python 3.5 support, so pip cannot be installed through apt
    sudo pip3 install -U testresources setuptools
    sudo pip3 install -U --no-deps numpy future mock keras_preprocessing keras_applications gast protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py

Note: installing the wrong versions of the python dependencies may cause conflicts and a core dump on import. After installing, open an interactive python3 session and try importing these packages. One common fix is to downgrade numpy to 1.18.5.

  1. Install JDK dependencies

    # see https://docs.bazel.build/versions/main/install-ubuntu.html
    # For jetpack 4.5+ Ubuntu 18.04 (LTS) uses OpenJDK 11 by default:
    sudo apt-get install openjdk-11-jdk
    # For jetpack 3.3 Ubuntu 16.04 (LTS) uses OpenJDK 8 by default:
    sudo apt-get install openjdk-8-jdk
  2. Install bazel
    Compile or download bazel; for the version number, refer to .bazelversion
    mkdir bazel-$BAZEL_VERSION
    cd bazel-$BAZEL_VERSION
    wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-dist.zip
    unzip bazel-$BAZEL_VERSION-dist.zip
    rm bazel-$BAZEL_VERSION-dist.zip
    ./compile.sh
    sudo cp output/bazel /usr/local/bin
    bazel version
  3. Mount a swap area; otherwise an out-of-memory error may occur (compilation uses more than 8GB of memory) -> Error: c++ compiling error *** Killed
    fallocate -l 8G swapfile
    chmod 600 swapfile
    mkswap swapfile
    sudo swapon swapfile
    swapon -s # the output should list a swap with priority -1

    Alternatively, use the swap creation script provided by jetsonhack directly
    wget https://raw.githubusercontent.com/jetsonhacks/installTensorFlowTX2/master/createSwapfile.sh
    ./createSwapfile.sh -d /experiment -s 8

Compilation/Installation/Testing

  1. Get the tensorflow source code
    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow
    git checkout r2.2 # for Jetpack 3.3
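  2. Configuration/Compilation (compilation is extremely time-consuming: in full power mode, TX2 takes about 13h and AGX about 6h. It is recommended to start a session that survives disconnects (e.g., tmux); losing the connection halfway through compilation is truly painful)

    # Configure the correct python path; answer y to the CUDA option and press Enter to skip everything else
    ./configure
    # Remember to set the output_base path; building across disks is much slower than building on a single disk. The default is ~/.cache/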
    bazel --output_base=/experiment/tensorflow_pkg build --verbose_failures --config=noaws --config=cuda //tensorflow/tools/pip_package:build_pip_package
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /experiment/tensorflow_tmp
  3. Install tensorflow .whl
    pip install /experiment/tensorflow_tmp/tensorflow-$version-$tags.whl
  4. Testing

    python3
    >>> import tensorflow # smoke test
    >>> exit()
    GPU/mnist test # see https://github.com/peterlee0127/tensorflow-nvJetson/tree/master/tf-test

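Potential Problems

  1. pip3 install h5py fails to compile: install all pip3 dependencies with the root account
  2. local_cuda_not_found compile error: switch to a known working version (this error may occur with tensorflow r2.5 + Jetpack 4.5+)
  3. c++ compiling error + cannot write file: an external storage device of at least 32GB is needed; the internal storage of the NVIDIA Jetson series (usually 32GB) cannot hold the large amount of intermediate files produced while compiling tensorflow
  4. c++ compiling error + process xx killed: on TX2/Nano, always set up an 8GB swap area first, otherwise an out-of-memory error will occur
  5. cannot import name ‘function_pb2’: change the current directory; do not import tensorflow from inside the tensorflow source tree
  6. Compilation and pip installation succeed, but the tests do not pass and hang while executing the test files: the CUDA/CUDNN versions installed by Jetpack by default do not match the versions recommended by the official tensorflow guide, so compatibility problems of this kind may appear. Possible solutions: (1) compile with a recommended version combination; (2) download/compile the matching CUDA/CUDNN version from Jetpack, then compile tensorflow; (3) ask on the NVIDIA developer forum, although there is currently no good official solution either.
  7. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for tensorflow<=2.2, the numpy version may be too new to be compatible; downgrade numpy with conda/pip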
    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
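  8. Runtime error: “CUDA driver version is insufficient for CUDA runtime version”. A manually installed cuda10.2+cudnn7.0 pair conflicts. Re-link to cuda9.0+cudnn7.0 and compile again, and everything works.
  9. After running a long python tensorflow script, any further tensorflow script may fail with CUDA_UNKNOWN_ERROR and exit abnormally. This is possibly a bug in tensorflow itself or a memory problem. Solution: reboot, then pip uninstall tensorflow; pip install tensorflow-xxx.whl
  10. tensorflow takes a very long time to start the GPU (e.g., starting ResNet50 training on TX2 takes more than 20 min to load): set the environment variable export CUDA_CACHE_MAXSIZE="2147483648"
  11. import numpy/tensorflow core dumped: downgrade numpy to <=1.18.5
  12. h5py installation compiles for a very long time and never succeeds, reporting something like ‘Cython is not installed’ although Cython is installed: downgrade numpy to <=1.18.5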