Zexin Li


Jetson PyTorch Bypass Distributed Errors

Synopsis

For anyone who needs to run HuggingFace Transformers model inference with PyTorch on NVIDIA Jetson.

Problem

When running HuggingFace Transformers model inference with PyTorch on an NVIDIA Jetson device, the following error may occur:

  File "BERT.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased").to(device)
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2359, in from_pretrained
    if is_fsdp_enabled():
  File "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py", line 118, in is_fsdp_enabled
    return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'

Solution

This error occurs because the PyTorch wheel built for Jetson is compiled without distributed support, so torch.distributed lacks attributes such as is_initialized. A quick workaround is to bypass the distributed check in transformers.

Change line 118 of "/experiment/miniforge3/envs/dytransformer/lib/python3.8/site-packages/transformers/modeling_utils.py" from

1
return torch.distributed.is_initialized() and strtobool(os.environ.get("ACCELERATE_USE_FSDP", "False")) == 1

to

return False
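Hard-coding `return False` works, but it disables the FSDP check for everyone using that environment. A more defensive variant first asks whether distributed support is available at all, which is the real condition that fails on Jetson wheels. The sketch below is illustrative, not the actual transformers code: the `strtobool` stand-in and the `SimpleNamespace` stub that mimics a distributed-less `torch.distributed` are assumptions introduced for the example.

```python
import os
from types import SimpleNamespace


def strtobool(val: str) -> int:
    """Minimal stand-in for distutils.util.strtobool (removed in Python 3.12)."""
    return 1 if val.lower() in ("y", "yes", "t", "true", "on", "1") else 0


def is_fsdp_enabled(dist, environ=os.environ) -> bool:
    """Defensive version of the transformers check: a wheel built without
    distributed support may not expose is_initialized at all, so guard
    with is_available() (falling back to False if even that is missing)."""
    if not getattr(dist, "is_available", lambda: False)():
        return False
    return dist.is_initialized() and strtobool(environ.get("ACCELERATE_USE_FSDP", "False")) == 1


# Stub mimicking torch.distributed on a wheel built without distributed support:
no_dist = SimpleNamespace(is_available=lambda: False)
print(is_fsdp_enabled(no_dist))  # False: the guard short-circuits before is_initialized
```

The same idea can be applied to the file edit above: replacing line 118 with a guarded check instead of a bare `return False` keeps FSDP working on builds that do have distributed support.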