Hello everyone,
Context:
I am currently facing a headache-inducing issue with integrating Flash Attention 2 for LLM training.
I run a Python script locally, which is then executed remotely. Without Flash Attention, the code runs fine and allows fetching data, training models, etc.
For the Flash Attention integration, I carefully followed the installation steps from the GitHub repo (and I am fairly confident they are OK). The remote instance the code runs on is an AWS EC2 instance. The venv is built via pip here: /root/.clearml/venvs-builds/3.9.
Issue:
At some point during the task run, it fails with this error:
File "/root/.clearml/venvs-builds/3.9/task_repository/....git/...", line 252, in fit
model = AutoModelForCausalLM.from_pretrained(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3233, in from_pretrained
config = cls._check_and_enable_flash_attn_2(config, torch_dtype=torch_dtype, device_map=device_map)
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1273, in _check_and_enable_flash_attn_2
raise ImportError(
ImportError: Flash Attention 2 is not available. Please refer to the documentation of
None for installing it. Make sure to have at least the version 2.1.0
2023-11-08 21:48:05
Process failed, exit code 1
However, the installation of the flash_attn package itself succeeded:
Successfully installed MarkupSafe-2.1.3 einops-0.7.0 filelock-3.13.1 flash-attn-2.3.3 fsspec-2023.10.0 jinja2-3.1.2 mpmath-1.3.0 networkx-3.2.1 ninja-1.11.1.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.52 nvidia-nvtx-cu12-12.1.105 packaging-23.2 sympy-1.12 torch-2.1.0 triton-2.1.0 typing-extensions-4.8.0
The package is installed AFTER the Task initialization (before the actual training script runs) using this small snippet of code:
import subprocess

venv_python = "/root/.clearml/venvs-builds/3.9/bin/python"
install_command = (
    f"{venv_python} -m pip install --upgrade pip && "
    f"{venv_python} -m pip install flash-attn --no-build-isolation"
)
# check=True makes the task fail loudly if the install itself errors out
subprocess.run(install_command, shell=True, check=True)
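For debugging, it may help to check, from inside the failing task itself, which interpreter is actually running and whether flash_attn is visible to it (if the task runs under a different interpreter than the one pip installed into, transformers would report it as unavailable). A minimal diagnostic sketch; nothing here is ClearML-specific, and the paths are only illustrative:

```python
import importlib.util
import sys

def check_package(name):
    """Return (found, origin) for a package as seen by THIS interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin

if __name__ == "__main__":
    # If this prints something other than
    # /root/.clearml/venvs-builds/3.9/bin/python, pip installed into a
    # different environment than the one executing the task.
    print("interpreter:", sys.executable)
    found, origin = check_package("flash_attn")
    print("flash_attn found:", found, "at", origin)
```

Running this right before the AutoModelForCausalLM.from_pretrained call would show whether the import failure is an environment mismatch or something else (e.g. the package being installed after the process had already imported transformers).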
From that point, I was very confused. I then decided to launch another EC2 instance and reproduce the same step (loading an LLM with Flash Attention 2). I connected to the running Docker container using the Dev Containers VS Code extension. When I ran the same piece of code, pointing the command at the venv, it worked.
Conclusion:
I am thus extremely confused: the task fails on a specific part of my training script, while running the same portion of the script inside the Docker container itself works. Does anyone have an idea?
My first guess was that the package had been installed into the wrong location (another venv, etc.). However, when I uninstalled the package, the code running in the dev container failed too, so in my opinion the installation was done correctly.
I know the use case is very specific, but any help would be greatly appreciated 🙂
Thank you,