Zexin Li

Please keep honest, open, patient, happy and visionary.

TensorFlow issues & solutions

Problems on the NVIDIA Jetson Development Kit

  1. Relevant environment variables:
    import os

    # Set these before importing tensorflow so that they take effect.
    os.environ['TF_CPP_MIN_VLOG_LEVEL'] = '1' # Low verbose-logging level for TensorFlow; raise it (up to 10) when you need to locate bugs.
    os.environ['CUDA_CACHE_MAXSIZE'] = "2147483648" # Enlarge the CUDA JIT cache (2 GiB) to avoid repeating long JIT compilations.
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' # Grow GPU memory on demand instead of allocating it all at once in one process.
    os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1' # Use unified memory to reduce data transfer time.
    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0' # Disable GPU garbage collection for speed; re-enable it only if you hit OOM.
  2. pip3 install h5py is very slow or fails to compile: use the root account to install the pip dependencies.
  3. local_cuda_not_found: switch to one of the verified, known-good versions listed above.
  4. C++ compile error plus "cannot write file": at least 32 GB of extra disk storage is needed. The NVIDIA Jetson's internal storage is too small for the large number of intermediate files produced while compiling TensorFlow.
  5. C++ compile error plus "process xx killed": this is an OOM error. On NVIDIA Jetson TX2/Nano, set up 8 GB of swap first.
  6. cannot import name ‘function_pb2’: change the current directory; do not run import tensorflow from inside the TensorFlow source tree.
  7. TensorFlow compiles and installs with pip, but fails its tests or hangs while running the test files: the default CUDA/cuDNN versions shipped with JetPack may be incompatible with the versions in the official TensorFlow guide. Possible solutions: (1) use the known tested versions above; (2) download the matching CUDA/cuDNN versions from JetPack, then compile TensorFlow; (3) ask for official help on the NVIDIA forum.
  8. C++ compilation of rule ‘//tensorflow/python:bfloat16_lib’ failed (Exit 1): for tensorflow<=2.2, downgrade numpy:
    pip install 'numpy<1.19.0'
    # conda install 'numpy<1.19.0'
  9. Runtime error: “CUDA driver version is insufficient for CUDA runtime version”: cuda10.2+cudnn7.0 are incompatible; re-create the soft link so that it points to cuda9.0+cudnn7.0 and compile again.
  10. Long-running Python TensorFlow scripts may hit CUDA_UNKNOWN_ERROR: possibly an internal TensorFlow bug or a memory problem. Possible solution: reboot the board, then pip uninstall tensorflow and pip install tensorflow-xxx.whl.
  11. TensorFlow performance bug: GPU initialization takes extremely long on TX2 (e.g., initializing ResNet50 training on TX2 takes over 20 min): set the environment variable with export CUDA_CACHE_MAXSIZE="2147483648" and run the TensorFlow code twice, so that the first run fills the JIT cache.
  12. When using unified memory, ‘NvMapReserveOp 0x80000001 failed [22]’ is displayed: limit the memory that TensorFlow allocates.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2 # or another small value below 1.0
    config.gpu_options.experimental.use_unified_memory = True
    with tf.compat.v1.Session(config=config) as s:
        pass # your program goes here
  13. Performance bug, with the following warning in the log: "W tensorflow/core/common_runtime/bfc_allocator.cc:311] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature." Fix: define the environment variable:
    os.environ['TF_ENABLE_GPU_GARBAGE_COLLECTION'] = '0'
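
The build prerequisites from items 4, 5 and 8 (extra disk space, swap, numpy version) can be checked before starting a long compile. The script below is only a minimal sketch, not part of the original workflow: the names `preflight`, `parse_swap_total_gb` and `numpy_version_ok` are hypothetical helpers, the thresholds are copied from the items above, and it assumes a Linux board where `/proc/meminfo` is readable.

```python
import shutil
from itertools import takewhile

MIN_FREE_DISK_GB = 32    # item 4: extra disk space for intermediate build files
MIN_SWAP_GB = 8          # item 5: swap suggested for Jetson TX2/Nano
MAX_NUMPY = (1, 19, 0)   # item 8: tensorflow<=2.2 needs numpy<1.19.0


def parse_swap_total_gb(meminfo_text):
    """Return SwapTotal from /proc/meminfo text in GiB (0.0 if absent)."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB
            return kib / (1024 * 1024)
    return 0.0


def numpy_version_ok(version):
    """Return True if a version string such as '1.18.5' is below 1.19.0."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '1.19' to (1, 19, 0) before comparing
    return tuple(parts) < MAX_NUMPY


def preflight(build_dir=".", meminfo_path="/proc/meminfo"):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    free_gb = shutil.disk_usage(build_dir).free / 1024 ** 3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append("only %.1f GB free in %s, want %d GB"
                        % (free_gb, build_dir, MIN_FREE_DISK_GB))

    try:
        with open(meminfo_path) as f:
            swap_gb = parse_swap_total_gb(f.read())
    except OSError:
        swap_gb = None  # not Linux, or the file is unreadable: skip the check
    # An 8 GB swap file reports slightly under 8 GiB, so allow a small margin.
    if swap_gb is not None and swap_gb < MIN_SWAP_GB - 0.1:
        problems.append("only %.1f GB swap, want %d GB" % (swap_gb, MIN_SWAP_GB))

    try:
        import numpy
    except ImportError:
        numpy = None  # numpy is not installed yet: nothing to compare
    if numpy is not None and not numpy_version_ok(numpy.__version__):
        problems.append("numpy %s found, tensorflow<=2.2 needs numpy<1.19.0"
                        % numpy.__version__)

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Pass the directory that will hold the build output as `build_dir`, for example an external SSD mount point, since the internal storage of the Jetson is the part that runs out. The numpy check only applies when building tensorflow<=2.2.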