Zexin Li

Installing TensorFlow 2.x from Source Code on NVIDIA Jetson

Synopsis

Q: Who needs to install TensorFlow from source code?
A: 1. TensorFlow developers (or anyone who wants to improve it); 2. developers who need a specific TensorFlow version but cannot obtain a prebuilt package for it. Note: if you only use the TensorFlow API, you do not need to build from source; it is recommended to use the official NVIDIA TensorFlow image, or to download a prebuilt .whl file and install it with pip.

Install TensorFlow 1.x from Source Code

JetsonHacks provides an installation guide on GitHub.

Install TensorFlow 2.x from Source Code

There are currently almost no concrete instructions for building TensorFlow 2.x from source. If you run into a problem, see the section below: [Potential Problem].

Prerequisite

  1. Read the official guides to check the target version: the official TensorFlow build guide and NVIDIA's official guide for building TensorFlow.
    The following combinations are currently known to build and run:
  2. JetPack 3.3 (Python 3.5/3.6) + CUDA 10.2 + cuDNN 7.1 + TensorFlow r2.2;
  3. JetPack 4.6 (Python 3.6) + CUDA 10.2 + cuDNN 7.1 + TensorFlow r2.2;
  4. (Recommended) JetPack 4.6 (Python 3.6) + CUDA 10.2 + cuDNN 8.2 + TensorFlow r2.4.
  5. Flash the board correctly (start from a clean OS): use the official NVIDIA SDK Manager to install JetPack [How to flash Jetson board]. For instance, the TX2 supports JetPack 4.6, 4.5.1, 4.5, and 3.3. Note: JetPack 4.5 and later may lead to CUDA/cuDNN incompatibilities.
  6. Install Python dependencies
    For JetPack 4.5+:
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    sudo apt-get install python3-pip
    sudo pip3 install -U pip testresources setuptools==49.6.0
    sudo pip3 install -U --no-deps numpy==1.18.5 future==0.18.2 mock==3.0.5 keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py==2.10.0
    For JetPack 3.3:
    sudo apt-get update
    sudo apt install python3-dev python3-pip
    sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
    # Current pip releases have dropped Python 3.5 support and the apt package cannot provide a working pip for Python 3.5, so fetch the 3.5-specific installer
    curl -fsSL https://bootstrap.pypa.io/pip/3.5/get-pip.py | python3.5
    sudo pip3 install -U testresources setuptools
    sudo pip3 install -U --no-deps numpy future mock keras_preprocessing keras_applications gast protobuf pybind11 cython pkgconfig
    sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py
    Note: Python dependencies may conflict with each other and cause a core dump on import. It is best to verify them by importing each dependency in an interactive python3 session. One common workaround is to downgrade numpy to 1.18.5.
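
The interactive check described in the note can also be scripted. The sketch below is a minimal example and not part of the original procedure: it imports each dependency in a separate child interpreter, so that a core dump in one module does not take down the checker itself. The MODULES list and the check_import helper are illustrative names, and the import names (for example google.protobuf for the protobuf package) are assumptions that may need adjusting on your board.

```python
import subprocess
import sys

# Import names for the packages installed above. These are assumptions:
# a package name and its import name can differ (protobuf -> google.protobuf).
MODULES = ["numpy", "future", "mock", "keras_preprocessing",
           "keras_applications", "gast", "google.protobuf",
           "pybind11", "Cython", "h5py"]


def check_import(module, python=sys.executable):
    """Import `module` in a child interpreter and return (ok, returncode)."""
    proc = subprocess.run([python, "-c", "import " + module],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return proc.returncode == 0, proc.returncode


if __name__ == "__main__":
    for name in MODULES:
        ok, code = check_import(name)
        if ok:
            status = "ok"
        elif code < 0:
            # On POSIX a negative code means the child was killed by a signal.
            status = "crashed (signal %d)" % -code
        else:
            status = "import failed (exit code %d)" % code
        print("%-20s %s" % (name, status))
```

A negative return code means the child process was killed by a signal, which is how a core dump on import would show up. The script avoids f-strings so that it should also run on Python 3.5.
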
  7. Install JDK dependencies
    # Reference: https://docs.bazel.build/versions/main/install-ubuntu.html
    # JetPack 4.5+: Ubuntu 18.04 (LTS) uses OpenJDK 11 by default
    sudo apt-get install openjdk-11-jdk
    # JetPack 3.3: Ubuntu 16.04 (LTS) uses OpenJDK 8 by default
    sudo apt-get install openjdk-8-jdk
  8. Install Bazel
    Build Bazel from source. The required version is given in the .bazelversion file at the root of the tensorflow source tree.
    # BAZEL_VERSION must be set to the version string from tensorflow/.bazelversion
    mkdir bazel-$BAZEL_VERSION
    cd bazel-$BAZEL_VERSION
    wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-dist.zip
    unzip bazel-$BAZEL_VERSION-dist.zip
    rm bazel-$BAZEL_VERSION-dist.zip
    ./compile.sh
    sudo cp output/bazel /usr/local/bin
    bazel version
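
The Bazel commands in this step rely on a BAZEL_VERSION value that has to come from the TensorFlow source tree. As a hedged sketch, the version can be read from the .bazelversion file and turned into the download URL used by wget. This assumes the TensorFlow source has already been checked out; the function name bazel_dist_url and the checkout path "tensorflow" are hypothetical.

```python
from pathlib import Path

# URL pattern copied from the wget command in this step.
URL_TEMPLATE = ("https://github.com/bazelbuild/bazel/releases/download/"
                "{v}/bazel-{v}-dist.zip")


def bazel_dist_url(tensorflow_dir):
    """Return (version, dist zip URL) for the Bazel version pinned by TensorFlow."""
    version = Path(tensorflow_dir, ".bazelversion").read_text().strip()
    return version, URL_TEMPLATE.format(v=version)


if __name__ == "__main__":
    checkout = Path("tensorflow")  # assumed location of the source checkout
    if (checkout / ".bazelversion").exists():
        version, url = bazel_dist_url(checkout)
        print("export BAZEL_VERSION=" + version)
        print(url)
    else:
        print("no .bazelversion found under", checkout)
```

The printed export line can be pasted into the shell before running the build commands for Bazel.
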
  9. Enable swap on boards with less than 16GB of memory
    Otherwise the build may run out of memory (OOM) and fail with: c++ compiling error *** Killed
    fallocate -l 8G swapfile
    chmod 600 swapfile
    mkswap swapfile
    sudo swapon swapfile
    swapon -s
    Alternatively, use the script provided by JetsonHacks:
    wget https://raw.githubusercontent.com/jetsonhacks/installTensorFlowTX2/master/createSwapfile.sh
    chmod +x createSwapfile.sh
    ./createSwapfile.sh -d /experiment -s 8
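
Whether swap still needs to be enabled can be checked before starting the build. The sketch below is one possible check, assuming a Linux /proc/meminfo. The helper names are hypothetical, and the default threshold of 15 GiB is an assumption that approximates the 16GB rule above, since MemTotal reports slightly less than the nominal RAM size.

```python
def parse_meminfo(text):
    """Parse the content of /proc/meminfo into {field: value}, mostly in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_recommended(info, ram_threshold_gib=15):
    """True when no swap is configured and RAM is below the threshold."""
    ram_kib = info.get("MemTotal", 0)
    swap_kib = info.get("SwapTotal", 0)
    return swap_kib == 0 and ram_kib < ram_threshold_gib * 1024 * 1024


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = None
    if info is None:
        print("/proc/meminfo is not available on this system")
    else:
        print("RAM  (GiB): %.1f" % (info.get("MemTotal", 0) / 1048576))
        print("Swap (GiB): %.1f" % (info.get("SwapTotal", 0) / 1048576))
        print("Enable swap first:", swap_recommended(info))
```
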

Compilation/Installation/Testing

  1. Get source code
    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow
    git checkout r2.2 # for JetPack 3.3
  2. Configuration/Compilation
    Note: the build is very time-consuming; run it in a session that survives disconnection (e.g. tmux).
    # Set the correct path of the Python interpreter and enable CUDA when prompted
    ./configure
    # Set --output_base (default is under ~/.cache/) so the build output stays on the same disk as the source; building across disks is much slower
    bazel --output_base=/experiment/tensorflow_pkg build --verbose_failures --config=noaws --config=cuda //tensorflow/tools/pip_package:build_pip_package
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /experiment/tensorflow_tmp
  3. Install the TensorFlow .whl
    pip install /experiment/tensorflow_tmp/tensorflow-$version-$tags.whl
  4. Testing
    Smoke test
    python3
    >>> import tensorflow
    >>> exit()
    GPU/MNIST
    # See the two files under https://github.com/peterlee0127/tensorflow-nvJetson/tree/master/tf-test
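
Beyond a bare import, a slightly stronger smoke test is to list the visible GPUs and run one small matrix multiplication. The sketch below is an illustration only and is not one of the test files from the repository referenced above. It assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the function name gpu_smoke_test is hypothetical. Run it from a directory outside the TensorFlow source tree.

```python
def gpu_smoke_test():
    """Import TensorFlow, list GPUs, and run one small matmul in eager mode."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        return {"imported": False, "error": str(exc)}

    gpus = tf.config.list_physical_devices("GPU")
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    product = tf.matmul(a, a)
    expected = [[7.0, 10.0], [15.0, 22.0]]
    return {
        "imported": True,
        "version": tf.__version__,
        "gpus": [gpu.name for gpu in gpus],
        "matmul_ok": product.numpy().tolist() == expected,
    }


if __name__ == "__main__":
    for key, value in gpu_smoke_test().items():
        print("%-10s %s" % (key, value))
```

An empty gpus list means TensorFlow was imported but cannot see the GPU, which points to a CUDA/cuDNN configuration problem rather than a failed build.
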

Potential Problem

  1. pip3 install h5py is very slow or fails to compile: install the pip dependencies as root.
  2. local_cuda_not_found: switch to one of the verified version combinations listed above.
  3. c++ compiling error together with "cannot write file": the build needs at least 32GB of extra disk space. The internal storage of an NVIDIA Jetson is too small for the large number of intermediate files generated while compiling TensorFlow.
  4. c++ compiling error together with "process xx killed": an OOM error. On NVIDIA Jetson TX2/Nano, set up 8GB of swap first.
  5. cannot import name ‘function_pb2’: change to a different directory; do not run import tensorflow from inside the TensorFlow source tree.
  6. Compilation and pip installation succeed, but the tests fail or hang while executing the test files: the default CUDA/cuDNN versions in JetPack may be incompatible with the versions in the official TensorFlow guide. Possible solutions: (1) use one of the known tested combinations above; (2) download the matching CUDA/cuDNN version from JetPack, then recompile TensorFlow; (3) ask for official help on the NVIDIA forum.
  7. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for TensorFlow <= 2.2, downgrade numpy:
    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
  8. Runtime error “CUDA driver version is insufficient for CUDA runtime version”: cuda10.2+cudnn7.0 are incompatible; re-create the symbolic link to point to cuda9.0+cudnn7.0 and compile again.
  9. CUDA_UNKNOWN_ERROR may occur when running a long TensorFlow Python script: possibly a TensorFlow internal bug or a memory problem. Possible solution: reboot the board, then pip uninstall tensorflow and pip install tensorflow-xxx.whl.
  10. TensorFlow performance bug: GPU initialization takes extremely long on TX2 (e.g., initializing ResNet50 training on TX2 takes over 20 min): set the environment variable with export CUDA_CACHE_MAXSIZE="2147483648" and run the TensorFlow code twice.
  11. import numpy/tensorflow core dumped: downgrade to numpy<=1.18.5.
  12. h5py installation hangs and cannot be compiled, reporting ‘Cython is not installed’ even though Cython is installed: downgrade to numpy<=1.18.5.
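
The disk-space failure in item 3 can be caught before a long build starts. The sketch below is a minimal pre-flight check using only the Python standard library. The helper name has_build_space is hypothetical, and the 32 GiB default simply mirrors the figure quoted in item 3.

```python
import shutil


def has_build_space(path, required_gib=32):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024 ** 3


if __name__ == "__main__":
    # Assumption: replace "." with the directory passed to --output_base.
    build_dir = "."
    free_gib = shutil.disk_usage(build_dir).free / 1024 ** 3
    print("free space: %.1f GiB" % free_gib)
    print("enough for the build:", has_build_space(build_dir))
```
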