Answered
Hi All, I Was Trying To Use Clearml-Task To Run A Custom Docker(With Poetry To Install All The Python Dependencies And Activated The Environment) Using Clearml Gpu, But It Seems Like Clearml Always Create A Virtual Environment And Run The Python Script Fr

Hi all, I was trying to use clearml-task to run a custom Docker image (with Poetry installing all the Python dependencies and the environment activated) on ClearML GPU compute, but it seems that ClearML always creates a virtual environment and runs the Python script from /root/.clearml/venvs-builds/3.10/bin/python. Is there a way to have clearml-task use the already-activated custom virtual environment in my Docker image and run the scripts from there, instead of always creating a new venv that inherits from the system site packages? I noticed that clearml.conf has a configuration option, agent.docker_use_activated_venv, but I am not sure how to enable it from clearml-task.

  
  
Posted 9 months ago

Answers 38


That's the right place, but write the key the way you would for a Hydra override, which in your case I think should be "accelerator.gpu".

You can also change `allow_omegaconf_edit` in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change `allow_omegaconf_edit`, the edit in the UI is ignored).

  
  
Posted 9 months ago

Okay, when I run main.py on my local machine, I can use python main.py experiment=example.yaml to override the accelerator to the GPU option. But it seems the --args experiment=example.yaml in clearml-task didn't work, so do I have to modify it manually in the UI?

clearml-task \
    --project fluoro-motion-detection \
    --name uniformer-test \
    --repo git@github.com:imperative-care-campbell/algorithms-python.git \
    --branch SW-956-Fluoro-Motion-Detection \
    --script fluoro_motion_detection/src/run/main.py \
    --args experiment=example.yaml \
    --docker mzhengtelos/algorithm-ml:pyenv \
    --docker_args "--env CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$PYTHON_ENV_DIR --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" \
    --queue test-gpu
  
  
Posted 9 months ago


I am using hydra in main.py

  
  
Posted 9 months ago

Hi @EnchantingPenguin77

"... but it seems like clearml always create a virtual environment"

Yes, that's correct, but the new venv inside the container inherits from the system packages (so if nothing changes, it does nothing).

"Is there a way that I can have the clearml-task to automatically activated a virtual environment use the activated custom virtual environment in my docker and run the scripts"

You can, but the "correct" way to work with Python and containers is to actually install everything on the system (not in a venv).
That said, just set this env variable to point to the Python binary inside your venv in the container:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/root/venv/bin/python
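
One way to hand this variable to the container is as a repeated --env KEY=VALUE option, the same form used with --docker_args elsewhere in this thread. A minimal Python sketch, assuming the venv interpreter lives at /root/venv/bin/python as in the line above; build_docker_args is a hypothetical helper for illustration and not part of ClearML:

```python
import shlex


def build_docker_args(env):
    """Render a dict of environment variables as repeated `--env KEY=VALUE` options.

    Hypothetical helper for illustration; not part of the ClearML API.
    """
    parts = []
    for key, value in env.items():
        # Quote each assignment so values containing spaces survive shell parsing.
        parts.append("--env " + shlex.quote(key + "=" + value))
    return " ".join(parts)


docker_args = build_docker_args(
    {"CLEARML_AGENT_SKIP_PIP_VENV_INSTALL": "/root/venv/bin/python"}
)
print(docker_args)
```

The resulting string is what would be given to clearml-task through --docker_args.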

  
  
Posted 9 months ago

The queue is empty when I run the task

  
  
Posted 9 months ago

you should have a gpu argument there, set it to true

  
  
Posted 9 months ago

Not the file, the UI

  
  
Posted 9 months ago

It was pending the whole day yesterday, but today it's able to run the task

  
  
Posted 9 months ago

See: Add an experiment hyperparameter
and add gpu: True

  
  
Posted 9 months ago

I see, like that?
image

  
  
Posted 9 months ago

I did pass --args to the clearml-task command for this run, but it looks like the Docker container didn't pick it up
image

  
  
Posted 9 months ago

And how did you connect your example.yaml?

  
  
Posted 9 months ago

There is nothing in the queue or on the worker
image

  
  
Posted 9 months ago

Well, I do not think you set your PyTorch Lightning to use CUDA:

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
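
The warning above names the two Trainer arguments involved. A small sketch of deriving them from a single flag, assuming a boolean such as the gpu hyperparameter discussed in this thread; trainer_device_kwargs is a hypothetical helper, and the Trainer call is left as a comment so the snippet runs without Lightning installed:

```python
def trainer_device_kwargs(use_gpu, devices=1):
    """Return the accelerator/devices keyword arguments named in the Lightning warning."""
    accelerator = "gpu" if use_gpu else "cpu"
    return {"accelerator": accelerator, "devices": devices}


kwargs = trainer_device_kwargs(use_gpu=True)
print(kwargs)
# With Lightning installed, these would be passed as:
#   trainer = Trainer(**kwargs)
```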
  
  
Posted 9 months ago

is it displaying that it is running anything?

  
  
Posted 9 months ago

I was trying to run python main.py experiment=example.yaml

  
  
Posted 9 months ago

@EnchantingPenguin77 can you provide the full log?

  
  
Posted 9 months ago

I actually have aborted it

  
  
Posted 9 months ago

@AgitatedDove14 I'm trying to run ClearML GPU compute (RTX 3080) with pytorch-lightning but keep getting a CUDA error. Is there any specific CUDA/Ubuntu/torch/Python version required? I tried several different versions but can't make it work.

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 as telos_algorithms

  File "/code/.venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1013, in _run_stage
    with isolate_rng():
  File "/.pyenv/versions/3.10.9/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/code/.venv/lib/python3.10/site-packages/lightning/pytorch/utilities/seed.py", line 42, in isolate_rng
    states = _collect_rng_states(include_cuda)
  File "/code/.venv/lib/python3.10/site-packages/lightning/fabric/utilities/seed.py", line 115, in _collect_rng_states
    states["torch.cuda"] = torch.cuda.get_rng_state_all()
  File "/code/.venv/lib/python3.10/site-packages/torch/cuda/random.py", line 39, in get_rng_state_all
    results.append(get_rng_state(i))
  File "/code/.venv/lib/python3.10/site-packages/torch/cuda/random.py", line 22, in get_rng_state
    _lazy_init()
  File "/code/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
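
CUDA error 804 (cudaErrorCompatNotSupportedOnDevice) indicates that the CUDA forward-compatibility path was attempted on hardware that does not support it. That often points at a mismatch between the host driver and the CUDA version in the image rather than at the training code, though this is an assumption about this particular setup. A small diagnostic sketch to run inside the container; cuda_report is a hypothetical helper, and it assumes only that torch may or may not be importable:

```python
def cuda_report():
    """Collect the versions relevant to a driver/runtime mismatch."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version torch was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # False when no usable GPU/driver is visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }


report = cuda_report()
print(report)
```

Comparing built_with_cuda with the driver version that nvidia-smi reports on the host is a reasonable first check.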
  
  
Posted 9 months ago

I've added gpu:True to my hydra config file but the GPU is still not used

  
  
Posted 9 months ago

I see, it seems the --args for the script weren't passed to the Docker container:

--script fluoro_motion_detection/src/run/main.py \
--args experiment=example.yaml \
  
  
Posted 9 months ago

It seems the CPU is working on something. I saw the usage spiking periodically, but I didn't run any task this morning

  
  
Posted 9 months ago

Click on the Task it is running and abort it. It seems to be stuck, and I guess this is why the others are not pulled

  
  
Posted 9 months ago

@AgitatedDove14 Is there any reason why you mentioned that the "correct" way to work with python and containers is to actually install everything on the system (not venv)?

  
  
Posted 9 months ago

Notice you should be able to override them in the UI (under the Args section)

  
  
Posted 9 months ago

@AgitatedDove14 Yes, I can see the worker:
image

  
  
Posted 9 months ago

I got the same CUDA issue after being able to use the GPU
image

  
  
Posted 9 months ago

Here it is @AgitatedDove14

  
  
Posted 9 months ago

Yes, because when a container is executed, the agent creates a new venv that inherits from the system-wide installed packages, but it cannot inherit from an existing venv, or "understand" that one exists and where it is.
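
To check which interpreter a task actually ran with, the script itself can report it. A minimal sketch using only the standard library; interpreter_info is a hypothetical helper, and the in_venv check relies on sys.prefix differing from sys.base_prefix inside an environment created with the venv module:

```python
import sys


def interpreter_info():
    """Describe the running interpreter and whether it lives inside a venv."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Inside a venv, prefix points at the venv and base_prefix at the base install.
        "in_venv": sys.prefix != sys.base_prefix,
    }


info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

Printing this at the top of the entry script makes it visible in the task log whether the run used the agent-created venv or the interpreter from the image.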

  
  
Posted 9 months ago