This is my environment installed from the env file. Training works just fine here:
But I do not have anything linked correctly, since I rely on conda to install cuda/cudnn for me.
Thank you! I agree with CostlyOstrich36, that is what I meant by a false sense of security 🙂
Thank you SuccessfulKoala55 so actually only the file-server needs to be secured.
You can add and remove clearml-agents to/from the clearml-server anytime.
CharmingPuppy6 These threads may also be interesting for you: https://clearml.slack.com/archives/CTK20V944/p1614867532303700 https://clearml.slack.com/archives/CTK20V944/p1617963053397600
Perfect! That sounds like a good solution for me.
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
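For what it's worth, the aggregation itself can be done client-side. A minimal sketch, assuming you have already pulled each experiment's y-values into plain lists (e.g. via `Task.get_task(task_id=...).get_reported_scalars()`), with the helper name being my own:

```python
import statistics

def aggregate_series(runs, reducer=statistics.mean):
    """Combine the same scalar series from several experiments point-wise.

    `runs` is a list of y-value lists (one per experiment); `reducer`
    is applied across experiments at each step index.
    """
    return [reducer(values) for values in zip(*runs)]

# Example: loss curves from three hypothetical experiments
losses = [
    [0.9, 0.6, 0.4],
    [1.1, 0.8, 0.5],
    [1.0, 0.7, 0.6],
]
mean_curve = aggregate_series(losses)              # point-wise mean
max_curve = aggregate_series(losses, reducer=max)  # point-wise max
```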
AnxiousSeal95 This bug seems to be affecting me. I just tried forcing clearml-agent to install clearml-agent==1.4.1 in the docker and now it works.
Btw: clearml-agent uses `pip install clearml-agent -U` to install clearml-agent in the docker. However, in my opinion it should use the same version that the host machine runs, instead of the newest one.
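A possible workaround sketch: I believe the agent's docker entrypoint can be told which version to install via an environment variable, but I am not certain of the exact name, so treat this as an assumption to verify against the agent docs:

```shell
# assumption: the docker entrypoint honors this variable when it
# pip-installs clearml-agent inside the container
export CLEARML_AGENT_UPDATE_VERSION="==1.4.1"
clearml-agent daemon --queue default --docker
```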
I will create a minimal example.
Thank you. Seems like this is not the best solution: https://serverfault.com/questions/132970/can-i-automatically-add-a-new-host-to-known-hosts#comment622492_132973
AgitatedDove14 I have the problem that "debug samples" are not shown anymore after running many iterations. What's appropriate to use here? A colleague told me increasing `task_log_buffer_capacity` worked. Is this the right way? What is the difference to `file_history_size`?
Thanks for answering. I don't quite get your explanation. You mean if I have 100 experiments and I start up another one (experiment "101"), then experiment "0" logs will get replaced?
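To make my understanding of the rotation concrete (this is my assumption about the behavior, not confirmed ClearML internals): `file_history_size` seems to act like a ring buffer over debug-sample slots, so newer iterations overwrite the oldest ones:

```python
def debug_sample_slot(iteration, file_history_size):
    """Which history slot an iteration's debug sample would occupy.

    With a ring buffer of `file_history_size` slots, iteration N reuses
    slot N % file_history_size, overwriting the oldest sample there.
    """
    return iteration % file_history_size

# With file_history_size=100, iteration 101 lands in slot 1,
# replacing the sample that iteration 1 stored there.
```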
MortifiedDove27 Sure did, but I do not understand it very well. Else I would not be asking here for an intuitive explanation 🙂 Maybe you can explain it to me?
Thanks, that makes sense. Can you also explain what `task_log_buffer_capacity` does?
But would this not have to be a server parameter instead of a clearml.conf parameter then? Maybe someone from clearml can confirm MortifiedDove27 's explanation?
AgitatedDove14 Could you elaborate?
I have a related question: I read here that 4GB is an HTTP limitation and ClearML will not chunk single files. I take from that that ClearML did not want to, or there was no need to, implement its own solution so far. But what about models that are larger than 4GB?
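As a client-side workaround sketch (not a ClearML feature; the chunk size and helper name are my own), one could split an oversized model file into parts under the limit before upload and concatenate them after download:

```python
import os

CHUNK_SIZE = 4 * 1024**3 - 1  # stay just under the 4 GB HTTP limit

def split_file(path, chunk_size=CHUNK_SIZE):
    """Split a large model file into upload-sized chunks.

    Returns the list of chunk paths (path.part0, path.part1, ...).
    Reassemble by concatenating the parts in order.
    """
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            part_path = f"{path}.part{index}"
            with open(part_path, "wb") as dst:
                dst.write(data)
            parts.append(part_path)
            index += 1
    return parts
```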
No problem. Sounds like a good solution, no need to implement something that has already been implemented somewhere else 🙂
Now for some reason everything is gone .. 😕
Seems like some experiments cannot be deleted
```
[2021-05-07 10:52:00,282] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.058s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", lin...
```
I got the error again. Seems to happen only when I try to delete "large" experiments.
Seems to happen only while the cleanup_service is running!
Yea, something like this seems to be the best solution.