Zexin Li

Please keep honest, open, patient, happy and visionary.

Learning from others

  1. A road to become a good researcher in computer architecture & security from Mr. Yicheng Zhang.
  2. [zh] Ten things to keep in mind when racing a paper deadline with your advisor from Dr. Yiran Chen.
  3. [zh] Teachings from Wei Li.
  4. Tips about writing systems papers.
  5. [zh] How to report bugs effectively.
  6. How To Ask Questions The Smart Way.
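  7. [zh] How computer systems conference papers are reviewed from Dr. Haibo Chen.
  8. [zh] Five-year PhD retrospective series.
  9. [zh] A systems researcher's climb from Dr. Haibo Chen.
  10. [zh] CS self-learning guide.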

History

A pretty comprehensive NVIDIA Jetson AGX Xavier cheatsheet
Jetson Install PyG
Jetson Build Torchvision
Jetson PyTorch Bypass Distributed Errors
Jetson change swap memory
Jetson mount nvme disk
Jetson set huggingface cache
Jetson install GPU TensorFlow
Jetson install GPU PyTorch
Configure a same virtual environment on server as Jetson
Tensorflow issues & solution
Flash NVIDIA Jetson TX2
NVIDIA Jetson install tensorflow 2.x from source code
[zh] NVIDIA Jetson install tensorflow 2.x from source code
[zh] Flash NVIDIA Jetson TX2

Synopsis

Sometimes we need to migrate a user's home directory from /home to a new disk, e.g., when the boot disk is full. This post is a note on that process.

Prerequisite checklist

Check that the new disk is mounted correctly with df -h. If it is not, mount it with sudo mount ${DEVICE_NAME} ${MOUNT_POINT}.
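As a rough sketch of the same check in script form, the snippet below uses Python's standard library to test whether a path is an active mount point before anything is moved. The function name is_mounted and the example path /mnt/newdisk are illustrative assumptions, not part of the original note.

```python
import os


def is_mounted(mount_point: str) -> bool:
    """Return True if mount_point is an existing directory and a mount point."""
    return os.path.isdir(mount_point) and os.path.ismount(mount_point)


if __name__ == "__main__":
    # "/mnt/newdisk" is a hypothetical stand-in for ${MOUNT_POINT}.
    for path in ("/", "/mnt/newdisk"):
        print(path, "mounted" if is_mounted(path) else "not mounted")
```

If the function returns False for the target, mount the disk first; moving the home directory into an unmounted directory would put the data back on the boot disk.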

Move user home directory

sudo mv /home/${USER_NAME} ${MOUNT_POINT}
sudo ln -s ${MOUNT_POINT}/${USER_NAME} /home/${USER_NAME}

(Important) Make sure the permissions are correct for the new home directory

The new home directory should have mode 755, and .ssh and authorized_keys should have modes 700 and 600 respectively. Otherwise, SSH key-based login will fail.

sudo chmod 755 ${MOUNT_POINT}/${USER_NAME}
sudo chmod 700 ${MOUNT_POINT}/${USER_NAME}/.ssh
sudo chmod 600 ${MOUNT_POINT}/${USER_NAME}/.ssh/authorized_keys
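To confirm that the three modes above were actually applied, a small check along these lines may help. It is a minimal sketch: check_home_permissions is a hypothetical helper, and it only compares permission bits, not ownership.

```python
import os
import stat

# Expected modes from the note above: home 755, .ssh 700, authorized_keys 600.
EXPECTED_MODES = {
    "": 0o755,
    ".ssh": 0o700,
    os.path.join(".ssh", "authorized_keys"): 0o600,
}


def check_home_permissions(home: str) -> list:
    """Return (path, actual_mode, expected_mode) for every mismatching path."""
    problems = []
    for relative, expected in EXPECTED_MODES.items():
        path = os.path.join(home, relative) if relative else home
        if not os.path.exists(path):
            continue  # nothing to check, e.g. no authorized_keys yet
        # os.stat follows symlinks, so /home/<user> -> new disk is resolved.
        actual = stat.S_IMODE(os.stat(path).st_mode)
        if actual != expected:
            problems.append((path, oct(actual), oct(expected)))
    return problems
```

Calling check_home_permissions on the new home directory returns an empty list when all three modes match, and one tuple per path that still needs a chmod.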

Synopsis

For anyone who needs the GPU version of PyG on NVIDIA Jetson (or a MacBook with Apple Silicon): please refer to this link.

Synopsis

For anyone who needs the GPU version of Torchvision on NVIDIA Jetson, where torchvision cannot simply be installed with pip.

Prerequisite checklist

  1. Check model of embedded board
  2. Prepare a virtual Python environment (e.g., miniforge3). Following NVIDIA's official guideline and installing with sudo is not recommended, since it may mess up the system Python environment.
  3. Check the pre-built binaries on the Jetson binaries page. If an appropriate prebuilt wheel exists, download it. Make sure to rename it to a format like “torch-1.10.0-cp36-cp36m-linux_aarch64.whl”.
  4. Install PyTorch via this blog.
  5. Query Torchvision version compatible with current PyTorch version.

Install by source code

Refer to this post on the NVIDIA forum.

After installing PyTorch from NVIDIA's wheels, build torchvision from source to avoid compatibility issues.

sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
git clone --branch <version> https://github.com/pytorch/vision torchvision # see below for version of torchvision to download
cd torchvision
export BUILD_VERSION=0.x.0 # where 0.x.0 is the torchvision version
# for instance: torchvision 0.16.0 corresponds to PyTorch v2.1.0
python setup.py install --user
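Step 5 of the checklist, matching a torchvision release to the installed PyTorch, can be sketched as a small lookup. The table below is only an illustrative subset of release pairs, and torchvision_series_for is a hypothetical helper; confirm the exact tag against the torchvision repository before cloning.

```python
# Illustrative subset of PyTorch -> torchvision release series pairs.
TORCH_TO_TORCHVISION = {
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
    "1.13": "0.14",
    "2.0": "0.15",
    "2.1": "0.16",
}


def torchvision_series_for(torch_version: str) -> str:
    """Map a PyTorch version string to the matching torchvision release series.

    Only the major and minor components are used, so suffixes after them
    (patch numbers, local build tags) are ignored.
    """
    major, minor = torch_version.split(".")[:2]
    key = f"{major}.{minor}"
    if key not in TORCH_TO_TORCHVISION:
        raise KeyError(f"no torchvision series listed for PyTorch {key}")
    return TORCH_TO_TORCHVISION[key]
```

On the board, the input would be the string reported by torch.__version__; the result gives the release series to look for among the torchvision tags.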

Synopsis

For anyone running PyTorch-based HuggingFace model inference on NVIDIA Jetson.

Problem

When running PyTorch HuggingFace based model inference on NVIDIA Jetson, the following error may occur:

  File "BERT.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased").to(device)
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2359, in from_pretrained
    if is_fsdp_enabled():
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 118, in is_fsdp_enabled
    return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

Solution

This error occurs because the PyTorch wheel built for Jetson is compiled without distributed support. One quick workaround is to bypass the distributed check.

Change file "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py" line 118 from

return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1

to

return False
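An alternative that avoids editing files inside site-packages is to add the missing attribute at runtime, before from_pretrained is called. The sketch below is an assumption about how one might do that, not the fix used in this post; ensure_is_initialized is a hypothetical helper that works on any module-like object.

```python
def ensure_is_initialized(dist_module) -> bool:
    """Add a stub is_initialized() that returns False if the module lacks one.

    Returns True if a stub was added, False if the attribute already existed.
    """
    if hasattr(dist_module, "is_initialized"):
        return False
    dist_module.is_initialized = lambda: False
    return True


# Intended use on the Jetson (assumed, run before loading the model):
#   import torch.distributed
#   ensure_is_initialized(torch.distributed)
```

The stub only reports that no process group is initialized, which is the same answer the edited line 118 gives; it does not add any distributed functionality.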

Synopsis

For anyone who needs to access ImageNet-1k from the HuggingFace Hub.

Prerequisite checklist

  1. Enough free disk space (>150 GB)
  2. pip inside an Anaconda environment
  3. Run the following code:
    pip install datasets
    # use the Linux screen command so the download survives a dropped session
    # refer to https://www.geeksforgeeks.org/screen-command-in-linux-with-examples/
    screen -S download
    # get a token from https://huggingface.co/settings/tokens
    huggingface-cli login
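The disk-space prerequisite can be checked before starting the download. This is a minimal sketch: has_enough_space is a hypothetical helper, the 150 GB threshold comes from the checklist above, and the path to check is assumed to be wherever the datasets cache lives (by default under the home directory).

```python
import os
import shutil


def has_enough_space(path: str, required_gb: float = 150.0) -> bool:
    """Return True if the filesystem holding path has more than required_gb free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb > required_gb


if __name__ == "__main__":
    # Assumption: the dataset cache is on the same filesystem as the home directory.
    home = os.path.expanduser("~")
    print("enough space" if has_enough_space(home) else "not enough space")
```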

Download ImageNet-1k with Token Access

# refer to https://huggingface.co/datasets/imagenet-1k
# refer to https://discuss.huggingface.co/t/imagenet-1k-is-not-available-in-huggingface-dataset-hub/25040
from datasets import load_dataset
# note: newer versions of datasets use token=True in place of use_auth_token=True
dset = load_dataset('imagenet-1k', split='train', use_auth_token=True)

Jtop

sudo jtop

Command

sudo swapoff -a

sudo mount /dev/nvme0n1 /experiments