SuccessfulKoala55 I've tried manually changing the TF version but it fails. I get:
import tensorflow as tf
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/tensorflow/init.py", line 435, in <module>
_ll.load_library(_main_dir)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 153, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/py...
SuccessfulKoala55 On another note, I'm also getting
ERROR: Could not find a version that satisfies the requirement pandas==1.3.4 (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23....
docker mode + services mode
Yes, fail it and then close it
My own agent.
I want to clarify:
I was asking whether such a feature exists (one that limits the number of simultaneous service tasks that can be brought up when using services mode) and, if so, how I can use it.
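(For context, the kind of invocation I have in mind - just a sketch, since I'm assuming the agent's --services-mode flag accepts a maximum-tasks argument, which is exactly the feature I'm asking about:)
# hypothetical: limit the services-mode agent to 4 simultaneous tasks
clearml-agent daemon --queue services --docker --detached --services-mode 4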
Well, the requirements were filled in automatically, not by me
SuccessfulKoala55
Well, I've removed the requirement altogether and it no longer fails on this (AFAIK TF is provided by the image anyway), but now I get the following:
Any ideas?
*Needless to say, this works with no problem when running locally. Also, the http://nvcr.io/nvidia/tensorflow:21.02-tf2-py3 image is able to run TRT
We think we fixed it.
The problem seemed to be a path containing //, which clearml does not handle well
I am not sure what you mean. This is text; when I grab it from the artifact via Python and print it, newlines are printed as expected
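For reference, this is roughly how I read it (a minimal sketch; the task ID and artifact name below are placeholders):
from clearml import Task

# placeholder task ID and artifact name, just to show how the text is retrieved
task = Task.get_task(task_id="<task-id>")
text = task.artifacts["my_text_artifact"].get()  # returns the stored text
print(text)  # newlines come out as expected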
AgitatedDove14, could it be that the GitHub repo is not synchronized? I can find only up to 1.2.0.rc3 in it.
clearml-agent daemon --detached --gpus 0,1,2 --dynamic-gpus --queue 2_gpu_queue=2 --docker --stop
Try making two tasks, both with the same project name (where the project name contains '//'), and you will get the same error.
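Something like this should reproduce it (a sketch; the project and task names are made up):
from clearml import Task

# two tasks under the same project name, where the name contains '//'
t1 = Task.init(project_name="demo//project", task_name="repro_1")
t1.close()
t2 = Task.init(project_name="demo//project", task_name="repro_2")
t2.close()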
It should be possible somehow, as they are attached to the Task and displayed in the Task's results tab
But this is not the data I want
A task can also have plots - for example 2d scatter plots and histograms
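For example, something along these lines is what I'm after (a sketch, assuming Task.get_reported_scalars() and Task.get_reported_plots() are available in the installed clearml version):
from clearml import Task

task = Task.get_task(task_id="<task-id>")  # placeholder task ID
scalars = task.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}
plots = task.get_reported_plots()      # reported plot entries (plotly JSON)
print(scalars)
print(plots)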
SuccessfulKoala55, in the meantime, while I try that, I've encountered something weird. I am using a clearml agent with the following:
clearml-agent daemon --detached --docker --gpus 0,1,2,3 --dynamic-gpus --queue kenny_1_gpu_queue=1
But for some reason, although all the GPUs are free and no other agent is on the machine, only one task is executed at a time instead of 4. Why is that?
I'd like, if possible, a command line (like the one I just sent) that recognizes the specific worker that was brought up in this manner and kills only it
I want to access their data
project name is: RemoteStorage06/saips06/rdekel/hackathon_baselines/DATA_DIR/
No strange characters as far as I can tell
Latest allegro POC server (saips)