A colleague fixed my server, and I can confirm that the fix works!
However, to use conda as package manager I need a docker image that provides conda.
Yes, I do not want to rely on the clearml-agent. Afaik the clearml-sdk in the container does the downloading and since a host directory is mounted, it is mirrored there. If it was possible to not mount the host directory, everything would be contained 🙂
Interesting: This command fails (with an error similar to the one I posted above) in conda version 4.7.12 but runs just fine in version 4.9.2: conda create --name test-pytorch python=3.8 cudatoolkit=11.1 -c conda-forge
However, because of the import carla it is added to the task requirements and clearml-agent tries to install it, although it is meant to be included at runtime.
I just updated my server to 1.0 and now the services agent is stuck in restarting:
When I select many experiments, it deletes only some of them and shows an error message that some could not be deleted. But if I only select a few, everything works fine.
Thank you. Will try that!
ManiacalLizard2 Maybe you are using the enterprise version with the vault? I suppose the enterprise version is running differently, but I don't have experience with it.
For the open-source version, each clearml-agent is using its own clearml.conf
And clearml-agent should pull these datasets from network storage...
Hi TimelyMouse69
Thank you for answering, but I do not think these methods allow me to modify anything that is set in clearml.conf. Rather, they just do logging.
AgitatedDove14 I have the problem that "debug samples" are not shown anymore after running many iterations. What's appropriate to use here: A colleague told me increasing task_log_buffer_capacity worked. Is this the right way? What is the difference to file_history_size?
What do you mean by "Why not add the extra_index_url to the installed packages part of the script?"
I tried to run the task with detect_with_conda_freeze: false instead of true and got
Executing Conda: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch 'pip<20.2' --quiet --json
Pass
Conda: Trying to install requirements:
['pytorch~=1.8.0']
Executing Conda: /home/tim/miniconda3/condabin/conda env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envh7rq4qmc.yml --quiet --json
Conda error: Unsati...
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist and it will work.
Thank you very much. I tested it on a different machine now and it works as intended. So there must be something misconfigured with this one machine.
Okay, it seems like it just takes some time to delete and to reflect in the WebUI. So when I try to delete again, actually a deletion process seems already to be running in the background.
By host you mean the machine on which the agent is running? How does clearml-agent find the cuda_version?
==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c p...
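The excerpt above appears to follow the layout of conda's history file (conda-meta/history inside the environment): a "==> timestamp <==" header followed by a "# cmd:" line, with "--" separators that look like grep output. A minimal sketch for pulling the timestamped commands out of such text could look like this. The function name parse_history is hypothetical, and the sample reuses only the two complete commands from the excerpt.

```python
import re
from typing import List, Tuple

# Header lines look like: ==> 2021-03-11 12:50:38 <==
HEADER = re.compile(r"^==> (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) <==$")
CMD_PREFIX = "# cmd:"


def parse_history(text: str) -> List[Tuple[str, str]]:
    """Return (timestamp, command) pairs found in conda-history-style text."""
    entries = []
    timestamp = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            # Remember the timestamp until the matching "# cmd:" line shows up.
            timestamp = header.group(1)
        elif line.startswith(CMD_PREFIX) and timestamp is not None:
            entries.append((timestamp, line[len(CMD_PREFIX):].strip()))
            timestamp = None
        # Any other line (e.g. the "--" separators) is ignored.
    return entries


sample = """==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
"""

entries = parse_history(sample)
for when, command in entries:
    print(when, "|", command)
```

Listing the commands in order like this makes it easier to see which cudatoolkit version the agent asked conda for at each step.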
Do you know how I can make sure I do not have CUDA installed, or a broken installation of it?
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0
Gives CPU version
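One way to check which variant conda actually resolved is to look at the Build column that conda list pytorch prints. The sketch below assumes the pytorch channel's usual build-string convention (for example py3.8_cuda11.1_cudnn8.0.5_0 versus py3.8_cpu_0). The function name and the sample strings are illustrative and not taken from this thread.

```python
import re


def pytorch_build_variant(build_string: str) -> str:
    """Classify a conda build string as 'cuda <version>', 'cpu', or 'unknown'.

    Assumption: CUDA builds contain 'cuda<version>' and CPU builds contain
    a standalone 'cpu' token, as on the pytorch conda channel.
    """
    cuda = re.search(r"cuda(\d+(?:\.\d+)*)", build_string)
    if cuda:
        return f"cuda {cuda.group(1)}"
    if re.search(r"(?:^|_)cpu(?:_|$)", build_string):
        return "cpu"
    return "unknown"


# Illustrative build strings, not output from the machine discussed here.
for build in ("py3.8_cuda11.1_cudnn8.0.5_0", "py3.8_cpu_0", "py38_0"):
    print(build, "->", pytorch_build_variant(build))
```

If the build string reports cpu even though cudatoolkit is pinned in the environment file, the solver picked the CPU package, which matches the result described above.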
Thank you. Yes we need to wait for carla to spin up.
But the problems seem to be recurring
AgitatedDove14 SuccessfulKoala55 Could you briefly explain whether clearml supports no-copy add for datasets?
Ah, I see. Any way to make the UI recognize it as a file server?