But this is not the data I want
It should be possible somehow, as they are attached to the Task and displayed in the Task's results tab
SuccessfulKoala55, in the meantime, while trying that, I encountered something weird. I am running a ClearML agent with the following:
clearml-agent daemon --detached --docker --gpus 0,1,2,3 --dynamic-gpus --queue kenny_1_gpu_queue=1
But for some reason, although all the GPUs are free and no other agent is running on the machine, only one task is executed at a time instead of 4. Why is that?
SuccessfulKoala55
Well, I've removed the requirement altogether and it no longer fails on this (TF is provided by the image anyway, AFAIK), but now I get the following:
Any ideas?
Needless to say, when running locally this works with no problem. Also, the nvcr.io/nvidia/tensorflow:21.02-tf2-py3 image is able to run TRT
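By the way, instead of deleting the requirement by hand each time, the pin can be relaxed from the script itself; a minimal sketch, assuming the NVIDIA image really does ship TensorFlow (project/task names are just placeholders):

from clearml import Task

# Record "tensorflow" without a version pin, so the agent accepts whatever
# build the container already ships (e.g. the +nv builds pip can't resolve).
# Must be called before Task.init().
Task.add_requirements("tensorflow")

task = Task.init(project_name="examples", task_name="trt-run")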
I was told not to kill the process; also, finding it on my own seems very un-user-friendly
AgitatedDove14, could it be that GitHub is not synchronized? I can only find versions up to 1.2.0.rc3 in it.
clearml-agent daemon --detached --gpus 0,1,2 --dynamic-gpus --queue 2_gpu_queue=2 --docker --stop
A task can also have plots - for example, 2D scatter plots and histograms
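For reference, this is the kind of reporting I mean; a quick sketch using the standard Logger calls (all names and values here are illustrative):

from clearml import Task
import numpy as np

task = Task.init(project_name="examples", task_name="plot-demo")
logger = task.get_logger()

# 2D scatter plot: an Nx2 array of (x, y) points
logger.report_scatter2d(
    title="scatter", series="random", iteration=0,
    scatter=np.random.rand(50, 2), xaxis="x", yaxis="y", mode="markers",
)

# Histogram over a 1D array of values
logger.report_histogram(
    title="histogram", series="random", iteration=0,
    values=np.random.randn(100), xaxis="value", yaxis="count",
)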
I am also running from an NVIDIA container and I get:
ERROR: No matching distribution found for tensorflow==2.4.0+nv
clearml_agent: ERROR: Could not install task requirements!
The docker image is nvcr.io/nvidia/tensorflow:21.10-tf2-py3
What should I do?
We think we fixed it.
The problem seemed to be a path containing '//', which ClearML did not handle well.
To reproduce: create two tasks with the same project name (where the project name contains '//') and you will get the same error.
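A minimal repro sketch, in case it helps (the project name below is made up; the '//' inside it is the point):

from clearml import Task

# Two tasks under the same project whose name contains '//'
for name in ("task_a", "task_b"):
    t = Task.init(project_name="group//subproject", task_name=name)
    t.close()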
TimelyPenguin76
Wouldn't task.mark_failed() followed by task.close() work?
docker mode + services mode
I'd like, if possible, a command line (like the one I just sent) that recognizes the specific worker brought up in this manner and kills only it
Well, on the first task it grabs, it opens a separate WORKER:gpu0 worker entry as expected, while the agent itself stays as WORKER:dgpu0,1,2,3
But the other tasks in the queue won't start, and once the first task completes, the following ones run not on WORKER:gpu0 but on WORKER:dgpu0,1,2,3, using only 1 GPU (even though the task execution says it runs on WORKER:gpu0)
Thanks, but I am not talking about scalars. I am talking about plots I've reported to ClearML using .report_histogram, .report_scatter2d, or .report_table,
with a self-hosted ClearML server
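For anyone who finds this later: a sketch of the retrieval I was after, assuming a recent clearml SDK where Task.get_reported_plots() is available (the key names below are my assumption about the returned event dicts):

from clearml import Task

task = Task.get_task(task_id="<your-task-id>")  # placeholder id

# Each entry describes one reported plot (scatter2d, histogram, table, ...);
# the figure itself is Plotly-style JSON carried inside the event dict.
for plot in task.get_reported_plots():
    print(plot.get("metric"), plot.get("variant"))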
SuccessfulKoala55, on another note, I'm also getting:
ERROR: Could not find a version that satisfies the requirement pandas==1.3.4 (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23....
Yes, fail it and then close it
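For completeness, the two calls in order; a minimal sketch (names and the status_reason text are only examples):

from clearml import Task

task = Task.init(project_name="examples", task_name="to-fail")
# ... the failure condition is detected somewhere here ...
task.mark_failed(status_reason="validation loss diverged")
task.close()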