Hello everyone,
Context:
I am currently facing a headache-inducing issue with integrating Flash Attention 2 for LLM training.
I run a Python script locally, which is then executed remotely. Without Flash Attention, the code runs fine and allows fetching data, training models, etc.
For the Flash Attention integration, I carefully followed the installation steps from the GitHub repo (and I am fairly confident they are OK). The remote instance the code runs on is an AWS EC2 instance. The venv is built via pip here: /root/.clearml/venvs-builds/3.9.
Issue:
At some point during the task run, it fails with this error:
File "/root/.clearml/venvs-builds/3.9/task_repository/....git/...", line 252, in fit
model = AutoModelForCausalLM.from_pretrained(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3233, in from_pretrained
config = cls._check_and_enable_flash_attn_2(config, torch_dtype=torch_dtype, device_map=device_map)
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1273, in _check_and_enable_flash_attn_2
raise ImportError(
ImportError: Flash Attention 2 is not available. Please refer to the documentation of
None for installing it. Make sure to have at least the version 2.1.0
2023-11-08 21:48:05
Process failed, exit code 1
However, the installation of the flash_attn package itself succeeded:
Successfully installed MarkupSafe-2.1.3 einops-0.7.0 filelock-3.13.1 flash-attn-2.3.3 fsspec-2023.10.0 jinja2-3.1.2 mpmath-1.3.0 networkx-3.2.1 ninja-1.11.1.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.52 nvidia-nvtx-cu12-12.1.105 packaging-23.2 sympy-1.12 torch-2.1.0 triton-2.1.0 typing-extensions-4.8.0
The package is installed AFTER the Task initialization (before the actual training script runs) using this small snippet of code:
import subprocess

venv_python = "/root/.clearml/venvs-builds/3.9/bin/python"
install_command = (
    f"{venv_python} -m pip install --upgrade pip && "
    f"{venv_python} -m pip install flash-attn --no-build-isolation"
)
# check=True makes the task fail loudly if the install itself errors out
subprocess.run(install_command, shell=True, check=True)
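For debugging, it may help to check, from inside the failing task itself, which interpreter is actually running and whether flash_attn is visible to it (if the task runs under a different interpreter than the one pip installed into, transformers would report it as unavailable). A minimal diagnostic sketch; nothing here is ClearML-specific, and the paths are only illustrative:

```python
import importlib.util
import sys

def check_package(name):
    """Return (found, origin) for a package as seen by THIS interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin

if __name__ == "__main__":
    # If this prints something other than
    # /root/.clearml/venvs-builds/3.9/bin/python, pip installed into a
    # different environment than the one executing the task.
    print("interpreter:", sys.executable)
    found, origin = check_package("flash_attn")
    print("flash_attn found:", found, "at", origin)
```

Running this right before the AutoModelForCausalLM.from_pretrained call would show whether the import failure is an environment mismatch or something else (e.g. the package being installed after the process had already imported transformers).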
From that point, I was very confused. I then decided to launch another EC2 instance and reproduce the same step (loading an LLM with Flash Attention 2). I connected to the running Docker container using the Dev Containers VS Code extension. When I ran the same piece of code, pointing the command at the venv, it worked.
Conclusion:
I am thus extremely confused: the task fails on a specific part of my training script, while running the same portion of the script inside the Docker container itself works. Does anyone have an idea?
My first guess was that the package had been installed into the wrong location (another venv, etc.). However, when I uninstalled the package, the code running in the dev container failed too, so in my opinion the installation was done correctly.
I know the use case is very specific, but any help would be greatly appreciated 🙂
Thank you,