Hmm, you could automate the cleanup: iterate through the folders, and for each one check whether a matching experiment exists — if it does, skip it; if it doesn't, delete the folder.
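Something along these lines should work (a rough sketch only — it assumes the folder names are task IDs and the storage path is just an example, adjust to your setup):
import shutil
from pathlib import Path
from clearml import Task

STORAGE_ROOT = Path("/opt/clearml/data/fileserver/my_project")  # example path, change to yours

for folder in STORAGE_ROOT.iterdir():
    if not folder.is_dir():
        continue
    try:
        Task.get_task(task_id=folder.name)  # raises if no such experiment exists
    except Exception:
        print(f"No experiment found for {folder.name}, deleting")
        shutil.rmtree(folder)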
Hi @<1855782479961526272:profile|CleanBee5> , I think you're using the old repository.
None is what you need 🙂
Hi JuicyOtter4, I'm not sure you can disable specific scalars, but you can disable auto logging for specific frameworks using the method shown here:
https://www.youtube.com/watch?v=etGjxOKG9lo
Hope this helps 🙂
Hi @<1689446563463565312:profile|SmallTurkey79> , when this happens, do you see anything in the API server logs? How is the agent running, on top of K8s or bare metal? Docker mode or venv?
Hi @<1529633468214939648:profile|CostlyElephant1> , it looks like that's the environment setup. Can you share the full log?
GreasyPenguin14 Hi!
I wish I could help, but I'm afraid I'll need to ask AnxiousSeal95 for some help with that. Please hold tight until he's able to help out 🙂
GreasyLeopard35, what happens if you try to run the command the agent is trying to run yourself?
That's the controller. I would guess that if you fetch the controller you can get its ID as well.
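Something like this, for example (just a sketch — the project/task names are placeholders for wherever your controller lives):
from clearml import Task

controller = Task.get_task(project_name="My Pipelines", task_name="my pipeline controller")
print(controller.id)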
Hi, SteepDeer88
For example, if the experiment I am cloning has no docker image and parameters set, will that make the agent ignore the ones I set in clearml.conf?
No, the experiment should run in docker mode if the agent was started with the --docker flag
https://clear.ml/docs/latest/docs/references/sdk/task#taskenqueue
Is this what you're looking for?
Also you can enqueue it through the API
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksenqueue
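For example, via the SDK it would look roughly like this (the task ID and queue name are placeholders):
from clearml import Task

task = Task.get_task(task_id="<task-id>")
Task.enqueue(task, queue_name="default")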
Hi, where did you get the instructions that specify 'trains'? Everything should be switched to 'clearml'
Hi @<1717350310768283648:profile|SplendidFlamingo62> , are you using a self hosted server or the community?
Hi @<1797800418953138176:profile|ScrawnyCrocodile51> , you can set the repository using Task.set_repo
Although if you use Task.init it will automatically detect the repository from the script. If you don't want to execute the code on your machine, you can use Task.execute_remotely
And finally, you can use `Task.set_base_do...
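Roughly something like this (the repo URL and queue name are just examples):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
task.set_repo(repo="https://github.com/your-org/your-repo.git", branch="main")
task.execute_remotely(queue_name="default")  # stops the local run and enqueues the task for an agent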
How did you set the output URI?
REMOTE MACHINE:
- git ssh key is located at ~/.ssh/id_rsa
Is this also mounted into the docker itself?
from clearml import Task
task = Task.init(project_name="examples", task_name="hello world")
print("hello world!")
Hi @<1562610699555835904:profile|VirtuousHedgehong97> , I think you can mount a shared folder between the EC2 instances to use as cache. ClearML hashes data, so it can tell whether what it has in its cache is still relevant.
Hi @<1719524641879363584:profile|ThankfulClams64> , what do you mean regarding ClearML GPU Compute? Do you mean the Genesis autoscaler?
I think it is one of the parameters of the task. Fetch a Task and see what properties the artifact has 🙂
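For example, something like this (the task ID is a placeholder):
from clearml import Task

task = Task.get_task(task_id="<task-id>")
for name, artifact in task.artifacts.items():
    print(name, artifact.type, artifact.url)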
Looks like a network issue.
As a side note, I would suggest removing & revoking all credentials you've pasted here 🙂
That's a good question. If you're not running in docker mode, the agent machine that runs the experiment needs to have CUDA/cuDNN installed. If you're running in docker mode, you need to select a docker image that already has those installed 🙂
Hi, How did you deploy?
RotundSquirrel78 , you can go to localhost:8080/version.json
Hi @<1726047624538099712:profile|WorriedSwan6> , ClearML uses ElasticSearch & MongoDB as databases for all of that information. I suggest checking online for backup procedures of these databases in K8s
ColorfulRaven45 Hi!
I'm afraid this is kind of a plotly limitation. Currently you can switch to full screen view OR hit the maximize graph button (it shows on hover) for a better view.
We'd be happy to take a suggestion though 🙂
DepravedSheep68, Hi 🙂
Can you try connecting to the instance and checking on the state of the dockers inside?
Hi @<1775332375794814976:profile|WhimsicalChimpanzee6> , the webUI uses the API under the hood. You can trigger a pipeline via the webUI and see what happens in the developer tools (F12)
You still have the AWS/GCP autoscalers which are great 🙂
The highlighted line is exactly that. Instead of client.tasks.get_all() I think it would be along the lines of client.debug.ping()
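Roughly like this (assuming the debug endpoint is exposed the same way through the APIClient):
from clearml.backend_api.session.client import APIClient

client = APIClient()
print(client.debug.ping())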