And in the WebUI I can see arguments similar to the second print statement's.
Ok, I just wanted to make sure I had configured my agent properly. Just to confirm: I have to set it on all agents, right?
Seems possible because I didn't know I had to specify an entrypoint somewhere. I will do some additional tests.
Okay, thanks for the info! I am currently not using k8s, but may be good to know for the future.
For me this does not work (at least with nested tqdm bars, did not try single ones yet).
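For reference, this is the kind of nested-bar setup I mean (a minimal, hypothetical reproduction, not my actual script):
```
from time import sleep
from tqdm import tqdm

# Outer bar over epochs, inner bar over steps; the inner bar is the one
# that renders badly for me in the console output.
for epoch in tqdm(range(3), desc="epochs", position=0):
    for step in tqdm(range(50), desc="steps", position=1, leave=False):
        sleep(0.01)
```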
Thanks a lot. I somehow missed this.
Local execution output:
ClearML Task: created new task id=855948f5d73c47e2ae37bb821385e15b
======> WARNING! Git diff to large to store (2190kb), skipping uncommitted changes <======
ClearML results page:
uploading artifact
done uploading artifact
2021-02-05 16:24:56,112 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 16:24:58,499 - clearml.Task - INFO - Finished uploading
The script is intended to be used something like this:
script.py train my_model --steps 10000 --checkpoint-every 10000
or
script.py test my_model --steps 1000
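For context, that command layout maps onto roughly the following argparse setup (a simplified, hypothetical sketch of my script's CLI, not the full code):
```
import argparse

parser = argparse.ArgumentParser(prog="script.py")
subparsers = parser.add_subparsers(dest="command", required=True)

# train and test share the model name and --steps; only train checkpoints.
train = subparsers.add_parser("train")
train.add_argument("model")
train.add_argument("--steps", type=int, default=10000)
train.add_argument("--checkpoint-every", type=int, default=10000)

test = subparsers.add_parser("test")
test.add_argument("model")
test.add_argument("--steps", type=int, default=1000)

args = parser.parse_args()
print(args)
```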
When you say it is an SDK parameter, does that mean I only have to specify it on the computer I start the task from? So a clearml-agent would read this parameter from the task itself.
Python 3.8.8, clearml 1.0.2
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=1_llvm
absl-py=0.12.0=pypi_0
aiostream=0.4.2=pypi_0
attrs=20.3.0=pypi_0
blas=1.0=mkl
bzip2=1.0.8=h7b6447c_0
ca-certificates=2020.10.14=0
cached-property=1.5.2=pypi_0
cachetools=4.2.1=pypi_0
certifi=2020.6.20=py37_0
chardet=4.0.0=pypi_0
clearml=0.17.4=pypi_0
cloudpickle=1.6.0=py_0
cudatoolkit=11.1.1=h6406543_8
cycler...
I used the wrong docker container. The container I used had version 11.4. Interestingly, the override from clearml.conf and the CUDA_VERSION environment variable did not work there.
With the correct docker container everything works fine. Shame on me.
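For anyone hitting the same thing, these are the two override points I had tried (a sketch only; the key names come from the agent section of the default clearml.conf shipped with clearml-agent, so double-check against your version):
```
# clearml.conf (agent section) -- force the CUDA/cuDNN versions used for wheel resolution
# alternatively, set the CUDA_VERSION environment variable before starting the agent
agent {
    cuda_version: "11.1"
    cudnn_version: "8.0"
}
```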
Ah, it actually is also a string with remote_execution, but still not what it should be.
Alright, thank you. I will try to debug further
Can you tell me which Python version is running on the agent/docker and which docker image?
Perfect, thanks! The only issue left is that .ssh seems to be used even when I provide SSH_AUTH_SOCK. I created an issue here: https://github.com/allegroai/clearml-agent/issues/45
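For context, by providing SSH_AUTH_SOCK I mean the usual ssh-agent forwarding pattern instead of mounting ~/.ssh, roughly like this (a generic docker sketch with a placeholder image name, not the exact command clearml-agent builds):
```
# Forward the host's ssh-agent socket into the container instead of mounting ~/.ssh
docker run --rm \
    -v "$SSH_AUTH_SOCK":/ssh-agent \
    -e SSH_AUTH_SOCK=/ssh-agent \
    my-image ssh -T git@github.com
```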
Okay, I didn't know that. I just saw that VSCode seems to use a similar setup for their docker devcontainers.
Is there a way for me to configure/add the run arguments for the docker run call?
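(For anyone finding this later: extra arguments for the agent's docker run call can apparently be set via the extra_docker_arguments key in the agent section of clearml.conf; a sketch with made-up arguments, check your clearml-agent version's default config:)
```
# clearml.conf (agent section) -- arguments appended to the agent's `docker run` call
agent {
    extra_docker_arguments: ["--ipc=host", "-v", "/data:/data"]
}
```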
What exactly do you mean by docker run permissions?
Yes, but this seems pretty reasonable to assume imo.
I am using https://hub.docker.com/layers/nvidia/cuda/11.8.0-base-ubuntu22.04/images/sha256-88b85c6edd089acdf0cb7f3be020a1e812b009bafaf92c1715ab6677bd997ef1?context=explore which has Python 3.10.6 if I remember correctly.
Hi TimelyMouse69, thank you for your answer.
I use 3.10.8 locally and 3.10.6 remotely. Everything is run in a docker container, locally and remotely on the docker-agent (exactly the same docker image).
Thank you for looking into the disappearing dev suffix. It seems like this should be the reason why pip tries to install a stable version of 1.14, which only exists as a nightly.
Bonus question: Is there some clearml-agent mode that does not do "some magic" and instead just installs exactly what is shown in the "INSTALLED PACKAGES" editor in the web UI?
Here is some code that shows exactly what goes wrong. I do local execution only, so it seems not to be related to remote execution as I thought, but rather to clearml.Task:
args = parser.parse_args()
print(args) # FIRST OUTPUT
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Tas...
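Roughly, the pattern boils down to this (a minimal, self-contained sketch with a hypothetical project/task name and only a few of my arguments; it relies on ClearML's automatic argparse connection):
```
import argparse

from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("command", choices=["train", "test"])
parser.add_argument("--steps", type=int, default=1000)
parser.add_argument("--nvidia-docker", action="store_true")
args = parser.parse_args()
print(args)  # FIRST OUTPUT: types exactly as argparse defined them

# Task.init() picks up the already-parsed argparse arguments as hyperparameters.
task = Task.init(project_name="debug", task_name="argparse-types")
print(args)  # SECOND OUTPUT: compare with the Args section in the WebUI
```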
I just want to avoid ClearML leaving files lying around. Btw: a better default behavior in my opinion would be to delete tasks only after their files have been deleted, and to delete the task regardless only with the force option!