Well, the requirements were filled in automatically, not by me.
SuccessfulKoala55 I've tried manually changing the TF version, but it fails. I get:
import tensorflow as tf
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/tensorflow/__init__.py", line 435, in <module>
_ll.load_library(_main_dir)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 153, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/py...
SuccessfulKoala55, while I try that, I've run into something weird. I am running a ClearML agent with the following command:
clearml-agent daemon --detached --docker --gpus 0,1,2,3 --dynamic-gpus --queue kenny_1_gpu_queue=1
But for some reason, although all the GPUs are free and no other agent is running on the machine, only one task is executed at a time instead of 4. Why is that?
We think we fixed it.
The problem seemed to be a path containing //, which ClearML did not handle well.
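In case it helps anyone hitting the same thing, here is a minimal sketch of cleaning up such a path before handing it to ClearML. It assumes the duplicate slashes are accidental and that collapsing them is safe for your storage backend; `collapse_slashes` and the example paths are hypothetical, not part of ClearML.

```python
import re


def collapse_slashes(path: str) -> str:
    """Collapse runs of '/' into a single '/', keeping a leading 'scheme://' intact."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No URL scheme found: treat the whole string as a plain path.
        scheme, rest = "", path
    return scheme + sep + re.sub(r"/{2,}", "/", rest)


# Hypothetical paths, for illustration only.
print(collapse_slashes("/data//models/run1"))
print(collapse_slashes("s3://bucket//a///b"))
```

The scheme is split off first because a plain normalization would also collapse the `//` in `s3://`.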
Latest allegro POC server (saips)
Well, for the first task it grabs, it opens a separate WORKER:gpu0 worker entry as expected, while the agent itself stays as WORKER:dgpu0,1,2,3,
but the other tasks in the queue won't start. Once the first task completes, the following ones are run not on WORKER:gpu0 but on WORKER:dgpu0,1,2,3 instead, using only 1 GPU (the task execution says it runs on WORKER:gpu0).
Yes, fail it and then close it
Partially. I wanted to get logs with a level of ERROR and above, but using APIClient I've managed to get the reports anyway. Thanks.
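For the "ERROR and above" part, one option is to filter on the client side once the log events are fetched. This is only a sketch: it assumes each event is a dict with "level" and "msg" keys, which is an assumption about the payload and worth checking against what your server actually returns. `events_at_or_above` and the sample events are hypothetical.

```python
# Standard Python logging severities, used here to order the level names.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def events_at_or_above(events, min_level="ERROR"):
    """Return the events whose 'level' is at least min_level.

    Events with a missing or unknown level are treated as lowest severity.
    """
    threshold = LEVELS[min_level]
    return [
        ev for ev in events
        if LEVELS.get(str(ev.get("level", "")).upper(), 0) >= threshold
    ]


# Hypothetical events, standing in for whatever the API call returned.
sample = [
    {"level": "INFO", "msg": "epoch 1 done"},
    {"level": "ERROR", "msg": "failed to load library"},
    {"level": "critical", "msg": "worker lost"},
]
for ev in events_at_or_above(sample):
    print(ev["level"], ev["msg"])
```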
I am not sure what you mean. This is text; when I grab it from the artifact via Python and print it, newlines are printed as expected.
Scalars are only some of the results a Task can have.
I want to access their data
I was told not to kill the process. Also, finding it on my own seems very user-unfriendly.
SuccessfulKoala55
Well, I've removed the requirement altogether and now it no longer fails on this (TF is provided anyway via the image, AFAIK), but now I get the following:
Any ideas?
*Needless to say, when running locally this works with no problem. Also, the nvcr.io/nvidia/tensorflow:21.02-tf2-py3 image is able to run TRT.
It should be possible somehow, as they are attached to the Task and displayed in the Task's results tab
clearml-agent daemon --detached --gpus 0,1,2 --dynamic-gpus --queue 2_gpu_queue=2 --docker --stop
Thanks. But I am not talking about scalars. I am talking about plots I've reported to ClearML using .report_histogram, .report_scatter2d, or .report_table.
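To make clear what I'm after, here is a rough sketch of the kind of access I mean. It assumes the plot events, however they are fetched, are dicts carrying a Plotly figure serialized as JSON under a "plot_str" key, next to "metric" and "variant"; that layout is an assumption on my side, and `extract_traces` and the sample event are hypothetical.

```python
import json


def extract_traces(plot_events):
    """Return (metric, variant, trace) tuples for every trace in every plot event."""
    traces = []
    for ev in plot_events:
        figure = json.loads(ev["plot_str"])  # assumed: a Plotly figure as JSON
        for trace in figure.get("data", []):
            traces.append((ev.get("metric"), ev.get("variant"), trace))
    return traces


# Hypothetical event, shaped like the assumption described above.
sample_events = [
    {
        "metric": "predictions",
        "variant": "scatter",
        "plot_str": json.dumps(
            {"data": [{"type": "scatter", "x": [1, 2], "y": [3, 4]}], "layout": {}}
        ),
    }
]
for metric, variant, trace in extract_traces(sample_events):
    print(metric, variant, trace["x"], trace["y"])
```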
Hi SuccessfulKoala55,
failed. I read in the docs that I can use mark_failed.
How should I use it correctly with task.close()?
TimelyPenguin76
Wouldn't task.mark_failed() followed by task.close() work?